Chinese language
The Chinese language (中文) is a member of the Sino-Tibetan family of languages. It is related to Tibetan and Burmese. It is not genetically related to Korean, Vietnamese, Thai or Japanese, though these languages (like other Asian languages) were strongly influenced by Chinese in the course of history. Korean and Japanese both have writing systems which contain Chinese characters. Like those two languages, Vietnamese contains many Chinese loanwords.
The notion of a "Chinese language" may seem at first to be a fiction. The term "Chinese" is applied to the classical written language known as "wen2 yan2" (文言, lit. literary language), which was used by Confucius, as well as to the modern standard known as "bai2 hua4" (白話, lit. vernacular). It also covers many different spoken variations which may be mutually unintelligible. The spoken language of Beijing, for example, is very different from the conversational language of Hong Kong.
Nevertheless, there are good reasons for using a collective name. The most important one is that Chinese people themselves consider the language to be a unified entity. A further reason is that the boundaries between the different variations of Chinese are not sharp. For example, in writing an informal love letter, one may use informal "bai hua." In writing a newspaper article, the language used is different and starts to include aspects of "wen yan." In writing a ceremonial document, one would use even more "wen yan." The language used in the ceremonial document may be completely different from that of the love letter, but there is a socially accepted continuum between the two.
There are similar continuums in the spoken language. A person living in Taiwan, for example, would commonly mix pronunciations, phrases, and words from Mandarin and Min-nan, and these mixtures would be considered socially appropriate under many circumstances. A person living in Hong Kong would use different combinations of Mandarin, colloquial Cantonese, and written Cantonese depending on the social situation.
Another distinctive aspect of the Chinese language is the complex relationship between the various spoken varieties and the various written varieties. Chinese is written using a "logographic" script in which one character represents one word element. A Chinese text written in "bai hua" is generally readable by most educated Chinese, but the relationship between written and spoken Chinese is complicated. For example, an educated person in Hong Kong would be able to write a text in written formal Cantonese which is readable by a Mandarin speaker. However, that written formal Cantonese, while similar to written formal Mandarin, would be very different from a word-for-word transcription of what the Cantonese speaker would say, and would also be different from written colloquial Cantonese. If written formal Cantonese is different from spoken Cantonese, one might ask where people learn it; the answer is that they learn it in school.
Spoken variations of modern Chinese
Linguists classify the variations in spoken Chinese into seven groups. Within these groups there are many subgroups, many of which are mutually unintelligible. The amount of "linguistic consciousness" also varies between the groups. For example, a speaker of Cantonese living in Hong Kong tends to feel a great deal of common identity with a speaker of Cantonese living in Taishan, even though these two varieties of Cantonese may be almost mutually unintelligible. By contrast, a Wu speaker in Hangzhou generally does not think of themselves as belonging to the same group as a Shanghainese speaker in Shanghai, even though the two varieties are linguistically similar. One can see this even in the naming. In the first case, the Hong Kong and Taishan speakers would both claim to be speaking Cantonese, while in the second case only the person from Shanghai would claim to be speaking Shanghainese.
There are also great differences in how intelligibility varies with geography. Mandarin dialects are remarkably uniform, with people living hundreds of kilometers from each other able to communicate intelligibly. In Fujian, people living ten kilometers away from each other can be speaking mutually unintelligible variations of Min.
- Mandarin: This is the native dialect of Chinese people living in northern China and Sichuan province. It is the basis for the official spoken language, which is called putonghua in the People's Republic of China and guoyu in the Republic of China (i.e. Taiwan).
One distinctive feature of Mandarin is the loss of tones in comparison to Middle Chinese and the other dialects. As a result, many words which are monosyllabic in other dialects are expressed as combinations of syllables in Mandarin.
- Wu dialect: spoken in the provinces of Jiangsu and Zhejiang. Wu includes Shanghainese
- Hakka dialect: spoken by the Hakka people in Southern China
- Min dialect: this is spoken in Fujian and Taiwan and includes Hokkien and Taiwanese
- Cantonese: spoken in Guangdong Province, Hong Kong, Macao, Taiwan, all over Southeast Asia and by Overseas Chinese
- Xiang dialect:
- Gan dialect:
The Chinese Written Language
This has been moved to a separate page; see Chinese written language
Chinese grammar
All dialects share a similar grammar system, different from the one employed by European languages. All words have only one grammatical form, as the language lacks conjugation, declension, or a tense system. Concepts like plural or past tense are expressed in a syntactical way:
Tenses are indicated by adverbs of time ("yesterday", "later") and a particle (le) indicating completion of an action or change of state (along with several other context-dependent meanings). Particles are also used to form questions; the syntax of a question is exactly the same as a declarative statement (basically SUBJECT - VERB - OBJECT) with only the appended particle making it a question. Plural meaning has to be inferred from context, since the Chinese language doesn't provide any lexical means of expressing this concept for most nouns (apart from giving exact numbers).
Because of the lack of inflections, Chinese grammar may appear quite simple to a speaker who is used to inflected languages such as the Romance languages. However, Chinese has features of its own which make the grammar complex; for example, the notion of a "perfective", which signifies the degree to which the action of a verb was completed.
Computer processing of Chinese
The computerized processing of Chinese characters involves some special issues both in input and character encoding schemes.
Chinese encoding systems
- guobiao (國標, an abbreviation for the Chinese National Standard), which is used in Mainland China. All guobiao standards are prefixed by GB; the latest version is GB18030, which is a one-, two- or four-byte encoding.
- Big5, which is used in Taiwan and Hong Kong, is a one- or two-byte encoding.
- Unicode is not well accepted by the Chinese government. The Chinese government mandates that software must support the GB18030 encoding to be legally sold in China. Some say this is purely a political move of protectionism.
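The byte widths mentioned in the list above can be checked with a short sketch. It assumes that Python's built-in codecs named "gb18030" and "big5" are acceptable stand-ins for the encodings described here; the helper name encoded_length and the three sample characters are chosen only for illustration.

```python
# Sample characters: an ASCII letter, a common Chinese character, and a
# rare character from outside the Basic Multilingual Plane (U+20000).
SAMPLES = ["A", "中", "\U00020000"]

def encoded_length(ch, codec):
    """Return how many bytes ch occupies in codec, or None if it cannot be encoded."""
    try:
        return len(ch.encode(codec))
    except UnicodeEncodeError:
        return None

if __name__ == "__main__":
    for codec in ("gb18030", "big5", "utf-8"):
        # One length per sample character, in the order of SAMPLES.
        print(codec, [encoded_length(ch, codec) for ch in SAMPLES])
```

Under these assumptions the GB18030 column should show one, two and four bytes, while Big5 has no code at all for the third character.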
Because guobiao is used in Mainland China while Big5 is used in Taiwan and Hong Kong, guobiao text is usually displayed using simplified characters and Big5 text using traditional characters. There is no mandated connection between the encoding system and the font used to display the characters, but in practice font and encoding are tied together. For example, one cannot map traditional Chinese glyphs to the GB encoding without compromising the meaning of some characters. Some simplifications merge several characters with different meanings and usages into one simpler common form. Mapping many-to-one, that is, displaying a Big5 encoding with simplified glyphs, is easy. Mapping one-to-many, that is, assigning traditional glyphs to the GB encoding, is tricky, because whichever glyph is picked will be the wrong choice in some usages. Technically one can map simplified glyphs to the Big5 encoding, but such a product would not find a profitable market and hence is practically non-existent. Unlike Unicode, which assigns different codes to simplified and traditional characters, neither Big5 nor guobiao supports both traditional and simplified characters simultaneously. GB18030 may be an exception, because it was designed to cover the full Unicode range.
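The difference in character repertoire can be probed with a small sketch. It assumes Python's built-in "gb2312" codec as a representative of the older, simplified-only guobiao standard, and uses the pair 國 / 国 ("country") as an example; the helper name can_encode is hypothetical.

```python
def can_encode(text, codec):
    """Return True if every character of text has a code in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

if __name__ == "__main__":
    forms = {"traditional": "國", "simplified": "国"}
    for label, ch in forms.items():
        # Unicode gives the two forms two different code points.
        print(label, ch, "U+%04X" % ord(ch))
        for codec in ("gb2312", "big5", "gb18030"):
            print("   ", codec, can_encode(ch, codec))
```

With these codecs, the traditional form should be rejected by gb2312 and accepted by big5, while gb18030 accepts both forms. Results for other characters depend on the exact tables shipped with the codec.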
One interesting problem in Chinese data processing is the conversion between traditional and simplified Chinese. As stated in the paragraph above, the one-to-many and many-to-one conversions are tricky. The traditional-to-simplified (many-to-one) conversion is simple, but information is sometimes lost, so a round-trip conversion often results in data loss. The simplified-to-traditional (one-to-many) conversion often requires usage context or a list of common phrases to resolve conflicts.
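A minimal sketch of both directions follows. The tables are tiny and purely illustrative (a real converter needs thousands of entries), and the function names and the longest-match phrase lookup are assumptions made for this example, not a description of any particular product.

```python
# Traditional -> simplified: many-to-one, so a plain lookup is enough.
# Both 發 ("to issue") and 髮 ("hair") become 发.
T2S = {"發": "发", "髮": "发", "頭": "头", "財": "财"}

# Simplified -> traditional: one-to-many, so a context-free default
# is sometimes wrong and phrase-level overrides are needed.
S2T_DEFAULT = {"发": "發", "头": "頭", "财": "財"}
S2T_PHRASES = {"头发": "頭髮", "理发": "理髮"}

def to_simplified(text):
    """Convert character by character; unknown characters pass through."""
    return "".join(T2S.get(ch, ch) for ch in text)

def to_traditional(text, use_phrases=True):
    """Convert using the longest matching phrase first, then per-character defaults."""
    max_len = max(len(p) for p in S2T_PHRASES)
    out = []
    i = 0
    while i < len(text):
        matched = False
        if use_phrases:
            # Try the longest candidate phrase first, down to length 2.
            for n in range(min(max_len, len(text) - i), 1, -1):
                chunk = text[i:i + n]
                if chunk in S2T_PHRASES:
                    out.append(S2T_PHRASES[chunk])
                    i += n
                    matched = True
                    break
        if not matched:
            out.append(S2T_DEFAULT.get(text[i], text[i]))
            i += 1
    return "".join(out)

if __name__ == "__main__":
    original = "頭髮"                      # "hair (on the head)"
    simplified = to_simplified(original)
    print(simplified)
    print(to_traditional(simplified, use_phrases=False))  # naive round trip
    print(to_traditional(simplified))                     # phrase-aware round trip
```

In this sketch the naive round trip picks the default 發 for 发 and so does not return the original text, while the phrase table recovers it. Phrases missing from the table would still fall back to the per-character guess.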
History of Chinese
Deciphering the history of Chinese poses an interesting problem: how does one determine the pronunciation of a language which is not written phonetically? The effort that has been devoted to solving this problem is a testimony to the ingenuity of linguists.
Archaic Chinese
Much of the western work in reconstructing Archaic Chinese comes from Bernhard Karlgren, whose work is based on the forms of the characters.
Middle Chinese
Linguists are confident that they have a good reconstruction of what Middle Chinese sounded like. The evidence for the pronunciation of Middle Chinese comes from two sources: modern dialect variations and rhyming dictionaries.
Modern Chinese
The transition from "wen yan" to "bai hua"
The creation of a "national language"
Educating Mandarin
Character simplification
The Future of Chinese
Weblinks:
- Zhongwen.com: Chinese to English dictionary and other resources presented in English; searchable by English meanings; Chinese text displayed as graphics (i.e. does not require any Chinese font).
- Cantonese Help Sheets: Learn written Chinese and spoken Cantonese with this print-friendly site.
- Chinese to English Dictionary: searchable by English meanings; Chinese text in Big5 code (i.e. requires Chinese font).
- Chinese Linguistics: Sites on Chinese linguistics (in English).
- Chinese Characters Dictionary: supports Japanese, Korean, Cantonese, Hakka etc.
- Cantonese Talking Syllabary: in Chinese; requires Big5 font.
- [1]: Listing of Chinese dialects in Ethnologue
- Chinese Translator: Professional translations from English to Chinese