Chinese language

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by 24.27.58.16 (talk) at 22:56, 22 June 2002 (*link to Taiwanese language). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

The Chinese language is a member of the Sino-Tibetan family of languages. It is related to Tibetan and Burmese. It is not genetically related to Korean, Vietnamese, Thai or Japanese, though these languages (like other Asian languages) were strongly influenced by Chinese in the course of history. Korean and Japanese both have writing systems which contain Chinese characters. Along with those two languages, Vietnamese contains many Chinese loanwords.

The notion of a "Chinese Language" may seem at first to be a fiction. The term "Chinese" is applied to the classical written language known as "wen2 yan2" (文言, lit. literary language), which was used by Confucius, as well as to the modern written standard known as "bai2 hua4" (白話, lit. vernacular). It also covers many different spoken varieties, some of which are mutually unintelligible. The spoken language of Beijing, for example, is very different from the conversational language of Hong Kong.

Nevertheless, there are good reasons for using a collective name. Chinese themselves consider the language to be a unified entity, and there are good reasons for treating it as such. The most important is that the boundaries between the different variations of Chinese are not sharp. For example, in writing an informal love letter, one may use informal "bai hua." In writing a newspaper article, the language used is different and starts to include aspects of "wen yan." In writing a ceremonial document, one would use even more "wen yan." The language used in the ceremonial document may be completely different from that of the love letter, but there is a socially accepted continuum between the two.

There are similar continuums in spoken language. A person living in Taiwan, for example, would commonly mix pronunciations, phrases, and words from Mandarin and Min-nan, and these mixtures would be considered socially appropriate under many circumstances. A person living in Hong Kong would use different combinations of Mandarin, colloquial Cantonese, and written Cantonese depending on the social situation.

Another distinctive aspect of the Chinese language is the complex relationship between the various spoken varieties and the various written varieties. Chinese is written using a "logographic" script in which one character represents one word element. It is generally the case that a Chinese text written in "bai hua" would be readable by most educated Chinese, but again the relationship between written and spoken Chinese is complicated. For example, an educated person in Hong Kong would be able to write a text in formal written Cantonese which is readable by a Mandarin speaker. However, that formal written Cantonese, while similar to formal written Mandarin, would be very different from a word-for-word transcription of what the Cantonese speaker would say, and would also be different from written colloquial Cantonese. One might ask where, if formal written Cantonese is different from spoken Cantonese, the reader learns formal written Cantonese; the answer is that it is learned in school.


Spoken variations of modern Chinese

Linguists classify the variations in spoken Chinese into seven groups. Within these groups there are many subgroups, many of which are mutually unintelligible. The amount of "linguistic consciousness" also varies between the groups. For example, a speaker of Cantonese living in Hong Kong tends to feel a great deal of common identity with a speaker of Cantonese living in Taishan, even though these two varieties of Cantonese may be almost mutually unintelligible. By contrast, a Wu speaker in Hangzhou generally does not feel part of the same group as a Shanghainese speaker in Shanghai, even though the two varieties are linguistically similar. One can see this even in the naming. In the first case, the speakers from Hong Kong and Taishan would both claim to be speaking Cantonese, while in the second case only the person from Shanghai would claim to be speaking Shanghainese.

There are also great differences in the geographical variation of intelligibility. Mandarin dialects are remarkably uniform: people living hundreds of kilometers from each other are able to communicate intelligibly. In Fujian, people living ten kilometers away from each other may speak mutually unintelligible variations of Min.

One distinctive feature of Mandarin is the loss of tones in comparison to Middle Chinese and the other dialects. The result of this is that many words which are mono-syllabic in other dialects are expressed as combinations of syllables in Mandarin.


The Chinese Written Language

The Chinese writing system is logographic, i.e. each character expresses a word part. Originally, the characters were little pictures depicting what was meant. This, however, proved inconvenient (as you can imagine - try to depict "philosophy"!). There are still a number of characters which can be traced back to such pictorial characters, but many characters used today are compositions of other, simpler characters.

Chinese scholars identify several types of compounds, including "meaning-meaning" compounds, in which each element of the character contributes to the meaning, and "sound-meaning" compounds, in which one component indicates the kind of concept the character describes, and the other hints at the pronunciation (though, as the spoken language has evolved since the characters were standardized, these hints are often quite useless and sometimes directly misleading).

For example, the character for "country" (國 'guo2') is commonly explained as consisting of the outer square (囗), which represents the wall/fortress, the radical 'ge1' (戈, meaning lance, a weapon), which represents defense, the radical 'mouth' (口 kou3), which represents population, and a horizontal stroke (一), which represents land. On this explanation the character falls in the meaning-meaning category. As another example, the character for "mother" (媽 'ma1', first tone in Mandarin) consists of one component meaning "female" (女) and another one meaning "horse" (馬) - now this doesn't mean Chinese view mothers as female horses! The first component (or "radical") simply tells that the character denotes a female entity, whereas the second acts as a pronunciation guide by referring to the word for "horse", which is also pronounced 'ma', though with a different tone.

Every character has a "radical", or most fundamental component, and this design principle is exploited by Chinese dictionaries: full characters are ordered according to their initial radical (for which there are only about 200 possibilities) and the number of strokes they consist of (a more detailed discussion of this can be found in the entry on ideographic writing systems).
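The radical-then-strokes ordering described above can be illustrated with a minimal sketch. The tiny INDEX table and the function name dictionary_order are hypothetical and exist only for this illustration; the radical numbers follow the 214-radical Kangxi scheme, which is an assumption about which radical table a given dictionary uses.

```python
# Hypothetical mini-index for illustration only:
# character -> (Kangxi radical number, radical, strokes in addition to the radical)
INDEX = {
    "媽": (38, "女", 10),
    "他": (9, "人", 3),
    "國": (31, "囗", 8),
    "好": (38, "女", 3),
    "你": (9, "人", 5),
    "吃": (30, "口", 3),
}


def dictionary_order(chars):
    """Sort characters by radical number, then by additional stroke count."""
    return sorted(chars, key=lambda ch: (INDEX[ch][0], INDEX[ch][2]))


if __name__ == "__main__":
    for ch in dictionary_order(INDEX):
        number, radical, extra = INDEX[ch]
        print(f"{ch}  radical {number} ({radical}) + {extra} strokes")
```

A real dictionary would need a complete radical table and a tie-breaking rule for characters that share both radical and stroke count; this sketch leaves such ties in input order.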

This principle is also exploited by everybody learning to write Chinese: the vast number of Chinese characters can be memorized much more easily if they are mentally decomposed into their constituent radicals. How many characters there are is the subject of heated discussion. In the 18th century, European scholars claimed the total tally to be about 80,000. This number, however, is exaggerated: one of the most comprehensive dictionaries (the Kangxi Dictionary 康熙字典) lists more than 40,000 characters. One reason for the large number of characters is that the count includes all of the different characters used in the different variations of Chinese. Popular estimates say that about 3,000 characters are needed to read a Chinese newspaper, and that 4,000 to 5,000 constitute a decent education.

Classification of characters

One can classify characters into character sets of which the following are in common use:

  • "bai hua"
  • "wen yan"
  • "written colloquial Cantonese" - Cantonese is unique in that it has a commonly used written character system which is different from "bai hua" or "wen yan"
  • "dialectal characters"

Character forms

There are currently two standards for printed Chinese characters. One is the Traditional Writing System, used in Hong Kong, Taiwan and by Overseas Chinese. The People's Republic of China (and also Singapore) uses the Simplified Writing System, which uses simplified forms for some of the more complicated characters. In addition, most Chinese use some personal simplifications when writing letters in cursive.

The Chinese characters are also used to write the Chinese numerals.

Transcription and Romanization

The official standard transcription of Putonghua into the Latin alphabet is Pinyin, though other systems are still sometimes used, such as the older Wade-Giles. Other Chinese languages are transliterated with more or less ad hoc systems, sometimes without a clear standard, sometimes with several.
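This article writes Pinyin with trailing tone numbers ("wen2 yan2", "bai2 hua4"). The following is a minimal sketch of converting that notation to the tone-mark form. The function name numbered_to_marked is hypothetical, and the placement rule used (mark "a" or "e" if present, the "o" of "ou", otherwise the last vowel) is the commonly taught rule of thumb, assumed here rather than taken from this article.

```python
import unicodedata

# Combining diacritics for tones 1-4 (macron, acute, caron, grave).
TONE_MARKS = {"1": "\u0304", "2": "\u0301", "3": "\u030c", "4": "\u0300"}
VOWELS = "aeiou\u00fc"  # includes u-umlaut


def numbered_to_marked(syllable: str) -> str:
    """Convert one syllable such as 'hua4' to its tone-mark form."""
    if not syllable or syllable[-1] not in "12345":
        return syllable  # no tone number: return unchanged
    base, tone = syllable[:-1], syllable[-1]
    if tone == "5":
        return base  # neutral tone carries no mark
    lower = base.lower()
    if "a" in lower:
        idx = lower.index("a")
    elif "e" in lower:
        idx = lower.index("e")
    elif "ou" in lower:
        idx = lower.index("ou")
    else:
        positions = [i for i, ch in enumerate(lower) if ch in VOWELS]
        if not positions:
            return base
        idx = positions[-1]
    marked = base[: idx + 1] + TONE_MARKS[tone] + base[idx + 1:]
    # Compose letter + combining mark into a single precomposed character.
    return unicodedata.normalize("NFC", marked)


if __name__ == "__main__":
    print(" ".join(numbered_to_marked(s) for s in "wen2 yan2 bai2 hua4".split()))
```

The sketch handles one syllable at a time and does not validate that the input is a legal Pinyin syllable.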

Chinese grammar

All dialects share a similar grammatical system, which is different from the one employed by European languages: all words have only one grammatical form, and neither conjugation nor declension nor a tense system exists. Concepts like "plural" or "past tense" have to be expressed syntactically:

Tenses are indicated by adverbs of time ("yesterday", "later") and a number of particles indicating, e.g., completion of an action. Particles are also used to form questions: The syntax of questions is exactly the same as in declarative statements (basically, SUBJECT - VERB - OBJECT). Only the appended particle makes it a question. Plural meaning most of the time has to be inferred from context, since the Chinese language doesn't provide any lexical means of expressing this concept for most nouns (apart from giving exact numbers, which is, of course, possible).

Thus Chinese grammar is generally quite simple compared to that of the Romance languages. However, some subtle grammatical features which are unique to Chinese serve to enrich the grammar; for example, the notion of a "perfective" which signifies the degree to which a verb was completed.

Computer processing of Chinese

The computerized processing of Chinese characters involves some special issues both in input and character encoding schemes.

Chinese encoding systems

  • guobiao (國標, an abbreviation for Chinese National Standard), which is used in Mainland China. All guobiao standards are prefixed with GB; the latest version is GB18030, which is a one-, two- or four-byte encoding.
  • big5, which is used in Taiwan and Hong Kong, is a one- or two-byte encoding.
  • Unicode is not well accepted by the Chinese government. The Chinese government mandates that software must support the GB18030 encoding to be legally sold in China. Some say this is purely a political move of protectionism.
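The byte widths mentioned in the list above can be observed with Python's standard codecs. This is a minimal sketch: the helper name encoded_lengths is hypothetical, and the codec names "gb18030", "big5" and "utf-8" are the ones shipped with Python, which may differ in detail from other vendors' implementations of the same standards.

```python
def encoded_lengths(text, codecs=("gb18030", "big5", "utf-8")):
    """Return {codec name: number of bytes}, or None if the text cannot be encoded."""
    result = {}
    for name in codecs:
        try:
            result[name] = len(text.encode(name))
        except UnicodeEncodeError:
            result[name] = None
    return result


if __name__ == "__main__":
    # An ASCII letter, a common character, and a rare character
    # from outside the Basic Multilingual Plane.
    for sample in ("A", "\u4e2d", "\U00020000"):
        print(repr(sample), encoded_lengths(sample))
```

ASCII text occupies one byte in all three encodings, a common character occupies two bytes in GB18030 and Big5, and a rare character outside the two-byte range needs four bytes in GB18030 and cannot be encoded in Big5 at all.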

Because guobiao is used in Mainland China while big5 is used in Taiwan and Hong Kong, guobiao is usually displayed using simplified characters and big5 is usually displayed using traditional characters. There is no mandated connection between the encoding system and the font used to display the characters, but in practice font and encoding are almost always tied together.

The reason is that simplification sometimes maps several characters with different meanings and usages onto a single, simpler common form. One can technically map simplified glyphs onto the Big5 encoding (many-to-one), but such a product would not find a profitable market and is hence practically non-existent. Assigning traditional glyphs to the GB encoding (one-to-many) is tricky, because whichever traditional glyph is picked for a code, it will be the wrong choice in some usages; one cannot do it without compromising the meaning of some characters.

Unlike Unicode, which assigns different codes to simplified and traditional characters, neither Big5 nor the original guobiao encoding supports both traditional and simplified characters simultaneously. GB18030 is an exception, because it was designed to cover the full Unicode repertoire.
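The difference in character repertoire between the encodings can be checked directly. The sketch below assumes Python's built-in "gb2312" codec as a stand-in for the original guobiao encoding, and the helper name supported_by is hypothetical. It tests the simplified and traditional forms of the character for "language" (语 and 語).

```python
def supported_by(char, codecs=("gb2312", "big5", "gb18030", "utf-8")):
    """Return the names of the codecs that can encode the given text."""
    supported = []
    for name in codecs:
        try:
            char.encode(name)
        except UnicodeEncodeError:
            continue  # this codec has no code for the character
        supported.append(name)
    return supported


if __name__ == "__main__":
    for ch in ("语", "語"):
        print(ch, supported_by(ch))
```

Under these assumptions the simplified form is rejected by Big5 and the traditional form by GB2312, while GB18030 and UTF-8 accept both.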

One interesting problem in Chinese data processing is the conversion between traditional and simplified Chinese. As stated in the paragraph above, the one-to-many and many-to-one mappings make this tricky. The traditional-to-simplified (many-to-one) conversion is simple, but information is sometimes lost, so a round-trip conversion often results in data loss. The simplified-to-traditional (one-to-many) conversion often requires usage context or a list of common phrases to resolve conflicts.
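The asymmetry between the two directions can be sketched with a toy converter. The three-entry character table and two-entry phrase table below are hypothetical illustrations, not a usable conversion database, and the function names are likewise assumptions. The example relies on the fact that simplified 发 corresponds to both 發 (as in 發展, "develop") and 髮 (as in 頭髮, "hair").

```python
# Toy traditional -> simplified table (many-to-one).
T2S = {"頭": "头", "發": "发", "髮": "发"}

# Inverted table: simplified -> list of traditional candidates (one-to-many).
S2T = {}
for trad, simp in T2S.items():
    S2T.setdefault(simp, []).append(trad)

# Toy phrase table supplying the context that resolves ambiguous characters.
PHRASES = {"头发": "頭髮", "发展": "發展"}


def to_simplified(text):
    """Character-by-character lookup is enough in this direction."""
    return "".join(T2S.get(ch, ch) for ch in text)


def to_traditional(text):
    """Try two-character phrases first, then unambiguous single characters."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in PHRASES:
            out.append(PHRASES[pair])
            i += 2
            continue
        candidates = S2T.get(text[i], [])
        # Ambiguous or unknown characters are left unchanged rather than guessed.
        out.append(candidates[0] if len(candidates) == 1 else text[i])
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(to_traditional("头发"), to_traditional("发展"))
    print(to_traditional(to_simplified("髮")))  # round trip loses the original
```

Leaving an ambiguous character unconverted is only one possible design choice; a production converter would more likely use a large phrase dictionary or statistical context to pick a candidate.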

History of Chinese

Deciphering the history of Chinese poses an interesting problem: how does one know the pronunciation of a language which is not written phonetically? The effort that has been devoted to solving this problem is a testimony to the ingenuity of linguists.

Archaic Chinese

Much of the Western work in reconstructing Archaic Chinese derives from Bernhard Karlgren, whose reconstruction is based on the forms of the characters.

Middle Chinese

Linguists are confident of having a good reconstruction of what Middle Chinese sounded like. The evidence for the pronunciation of Middle Chinese comes from two sources: modern dialect variations and rhyming dictionaries.

Modern Chinese

The transition from "wen yan" to "bai hua"

The creation of a "national language"

Educating Mandarin

Character simplification

The Future of Chinese

Weblinks:

Zhongwen.com: Chinese to English dictionary and other resources presented in English; searchable by English meanings; Chinese text displayed as graphics (i.e. does not require any Chinese font).
Chinese to English Dictionary: searchable by English meanings; Chinese text in Big5 code (i.e. requires Chinese font).
Chinese Linguistics: Sites on Chinese linguistics (in English).
Chinese Characters Dictionary: supports Japanese, Korean, Cantonese, Hakka etc.
Cantonese Talking Syllabary: in Chinese; requires Big5 font.