
Talk:List of languages by number of native speakers

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Quadell (talk | contribs) at 13:43, 5 August 2005 (Chart request). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

More Urdu speakers than Punjabi speakers? Urdu is spoken mainly in Pakistan and some parts of India. In Pakistan, it is the mother tongue of only about 10% of the population. Even the most conservative estimates put Punjabi speakers at about 50% of Pakistan's population. The population of Pakistan is about 160 million, which means there are about 80 million Punjabi speakers in Pakistan alone. This does not include about 25 million Punjabi speakers in India, and there are also Punjabi speakers in other countries. The total Punjabi-speaking population should be around 110 million, not 57 million. The Urdu population should be around 25 million, not 60 million.
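The arithmetic in the comment above can be checked with a short sketch. It uses only the approximate figures quoted in that comment; the variable names are illustrative, and the speakers in "other countries" and the Urdu speakers in India are not quantified in the comment, so they are left out here.

```python
# Approximate figures quoted in the comment above (not independently verified).
PAKISTAN_POPULATION = 160_000_000
PUNJABI_SHARE_PAKISTAN = 0.50    # "most conservative" estimate from the comment
URDU_SHARE_PAKISTAN = 0.10       # mother-tongue share from the comment
PUNJABI_SPEAKERS_INDIA = 25_000_000

punjabi_pakistan = round(PAKISTAN_POPULATION * PUNJABI_SHARE_PAKISTAN)
punjabi_total = punjabi_pakistan + PUNJABI_SPEAKERS_INDIA
urdu_pakistan = round(PAKISTAN_POPULATION * URDU_SHARE_PAKISTAN)

print(f"Punjabi, Pakistan only:   {punjabi_pakistan:,}")
print(f"Punjabi, Pakistan + India: {punjabi_total:,}")
print(f"Urdu, Pakistan only:      {urdu_pakistan:,}")
```

Pakistan and India together give 105 million Punjabi speakers, so the comment's figure of about 110 million implies roughly 5 million more elsewhere. The 16 million Urdu figure covers Pakistan only, so the comment's 25 million implies roughly 9 million further speakers, presumably in India.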

Filipino diaspora

There are sizeable communities of Filipinos in Canada, Japan, Spain, Italy, Hong Kong, etc. Please research this well. This is well known among Filipinos. If sources are demanded, I will supply them, but these facts are well known. I myself live in Japan, and there is a large community of Filipinos here. --Jondel 05:35, 6 Jun 2005 (UTC)

(I'm just joining this discussion, not sure how it started.) You're right that there are sizeable Filipino communities in those five countries, plus maybe 100 more, but none are noteworthy. "Gusto ko mag-abroad" Filipinos travel to wherever they can. My suggestion would be a note saying "Filipino communities can be found in most countries of Asia, North America, South America, and Europe." There should also be a link to an article about Filipino emigrant communities. That is what is particularly noteworthy about the Filipino foreign presence, and it would be a very interesting article, IMO. Gronky 17:14, 2005 Jun 8 (UTC)
Not sure what qualifies as "noteworthy" but according to the last census (2001), Canada has 174,060 for whom Tagalog is the first language. Even the estimates for the percentage of English as a first language in Canada are optimistic. See http://www.answers.com/topic/language-in-canada for the full breakdown of languages in Canada. 70.71.9.244 04:42, 25 Jun 2005 (UTC)Josh Oakes

341,000,000 total English speakers?!

According to these statistics, the United States - with nearly 296 million people - must contain 87% of the world's English speakers!

US Population: 295,734,134

+ United Kingdom: 60,441,457

+ Canada: 32,805,041

+ Australia: 20,090,437

= 409,071,069 people.

I'm sure that not all of those 409 million people speak English as their first language, but the vast majority do, and there are people in other countries who speak English as their first language. How do you get 341 million English speakers? Rmisiak 05:56, 6 Jun 2005 (UTC)
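As a quick check of the sum and the 87% figure above, here is a minimal sketch using only the population numbers quoted in this comment. The dictionary and constant names are illustrative.

```python
# Population figures exactly as quoted in the comment above.
populations = {
    "United States": 295_734_134,
    "United Kingdom": 60_441_457,
    "Canada": 32_805_041,
    "Australia": 20_090_437,
}
LISTED_ENGLISH_SPEAKERS = 341_000_000  # total given in the article's table

total = sum(populations.values())
us_share = populations["United States"] / LISTED_ENGLISH_SPEAKERS

print(f"{total:,}")        # 409,071,069
print(f"{us_share:.1%}")   # 86.7%
```

This compares whole populations with a count of first-language speakers, so it only shows that the comment's addition is right, not that the table is wrong.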

  • The case of Portuguese is even worse. Number of speakers worldwide (article): 176,000,000. Real number of speakers in Brazil (alone): 184,000,000. So don't complain. The number of native speakers of Portuguese is 208 million; with bilinguals, 218 million. BTW, don't count the whole population of a country as native speakers... just a piece of advice. -Pedro 06:22, 6 Jun 2005 (UTC)

As a data point, the U.S. Census Bureau reports (in 2000) that of 262,375,000 residents of the United States aged 5 and above, 215,424,000 speak only English at home, 28,101,000 (10.7%) speak Spanish at home, and 18,851,000 (7.2%) use some other language. This being so, I don't think the Summer Institute of Linguistics' estimate of 341 million first-language speakers of English worldwide is very far off. (The statistics on their website look awfully well thought out and well researched.) Using the population figures given above, 82.1% of the United States + 80% of Canada (a guess) + 95% of Britain and Australia (more guessing) gives a total for the four countries of 345 million, very close to the 341 million in the SIL table. If 8% of the population is under age 5, and maybe shouldn't be counted, that more than offsets the native English speakers in Ireland, New Zealand, South Africa, etc. No, I think the number (for English) looks pretty accurate. Frjwoolley 16:38, 9 Jun 2005 (UTC)
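The estimate above can be reproduced with a short sketch. The census counts and populations are the ones quoted in this thread. The 80% and 95% shares are the commenter's own guesses and are labeled as such in the code, and the variable names are illustrative.

```python
# U.S. Census 2000 figures quoted above (residents aged 5 and over).
census_total_5plus = 262_375_000
english_only_at_home = 215_424_000
us_english_share = english_only_at_home / census_total_5plus  # about 0.821

# Total populations quoted earlier in the thread.
populations = {
    "United States": 295_734_134,
    "United Kingdom": 60_441_457,
    "Canada": 32_805_041,
    "Australia": 20_090_437,
}

# Assumed first-language shares: only the U.S. value is derived from data;
# the others are the commenter's guesses.
assumed_share = {
    "United States": round(us_english_share, 3),
    "Canada": 0.80,          # guess
    "United Kingdom": 0.95,  # guess
    "Australia": 0.95,       # guess
}

estimate = sum(populations[c] * assumed_share[c] for c in populations)
print(f"US share: {us_english_share:.1%}")
print(f"Estimate: {estimate / 1e6:.1f} million")
```

Under these assumptions the four-country total comes to roughly 345 million, in line with the figure in the comment. Changing any of the guessed shares by a few points moves the total by several million.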

Delete Page

Hi,

I'm wondering whether it's worth keeping these sorts of pages in regards to language ranking? Statistics vary wildly depending on source and nothing seems reliable. Is it really worth keeping with so many different figures for each language?

Sukh 10:32, 6 Jun 2005 (UTC)

About the Portuguese language, don't forget to mention that there is an enormous population of lusophones (Portuguese, Brazilian, Cape Verdean, Angolan emigrants) throughout the world. Portuguese emigrants alone number around 5 million, counting first- and second-generation luso-descendants in countries like France, Luxembourg, Andorra, Venezuela, South Africa, Australia, the U.S., Canada, Hawaii, Belgium, the Netherlands, the U.K., Germany, Switzerland, Spain, Argentina, Dubai, etc. They are well organised to keep their heritage. So I think that the list is correct concerning Portuguese speakers around the world as a first and second language. Filipe

Suggestion

That the countries listed under each language be marked in bold where the majority of the population speak that language. It looks rather silly having both e.g. the United States, and Germany, listed with equal emphasis as places where English is spoken - MPF 23:20, 6 Jun 2005 (UTC)

  • That's a good idea... but what about bilinguals? For that bold listing, what should we count? There are countries where a language is spoken, but mostly by bilinguals (I'm not talking about people learning languages). -Pedro 18:43, 7 Jun 2005 (UTC)

Locked table

Please do not modify the table contents, as they come directly from a specific source. If you have information from another source, please discuss it here first. Thanks. —Cantus 02:11, Jun 7, 2005 (UTC)

What source would that be, if we may humbly ask? Argyrios 05:44, 7 Jun 2005 (UTC)
I didn't alter any table (except adding countries and a more realistic number!), but you constantly revert my edits. Mr. Cantus, if you say Wikipedia is unreliable, it really is with that kind of incorrect data from your sacred source. So we are trying to correct it! -Pedro 19:05, 7 Jun 2005 (UTC)

Marked with mild warning

I think that it is obvious that the information in the table is highly inaccurate and that there are no objective criteria for listing a country as a significant habitat for first language speakers. Christ, the CIA world factbook specifically lists English as a common "SECOND LANGUAGE" in Denmark, yet Cantus keeps reinserting it and other equally absurd countries. Whatever your source is, Cantus, cough it up -- it's not accurate. As others have noted. Argyrios 05:59, 7 Jun 2005 (UTC)

Read the article. The source is there, as it has always been. —Cantus 02:21, Jun 8, 2005 (UTC)

This is deeply silly, I agree. We should distinguish countries where a language is predominant from ones where it is a historical minority from ones where there is a recent diaspora from ones where a lot of people speak it as a second language (the last, I think, should not be listed at all). How on earth is it useful to say that Portuguese is spoken in Luxembourg? john k 06:38, 8 Jun 2005 (UTC)

  • Since it is the language of 1/7 of the population and it is gaining some status, just like Spanish is gaining in the US. And even Russian is gaining in Portugal (honestly, I see Russian as more important in Portugal than Mirandese, a historical and local language with some official status: no company would ever use Mirandese, but some use Russian in ads, there are Russian newspapers, etc.). In the diaspora, the Portuguese have a tendency to become assimilated into a country's culture, and the language is seen as very important, so everyone is capable of speaking the official language and they probably also speak it among themselves in public places. I think we should mark in bold where it is official (there's the problem of Africa, where the official languages French, English and Portuguese are still not widely spoken), and italicize where it is spoken in the recent diaspora by more than 1% of the population or by more than 100,000 people (or 1,000,000). The rest (historical unofficial languages) should be listed normally. -Pedro 11:50, 8 Jun 2005 (UTC)
Right now, I think that most of the people who have commented here agree that the current table is problematic. The warning is very mild and basically says not to trust the info in the table (which directly contradicts MANY other articles here at Wikipedia, including English Language and Arabic, for example). Please do not remove the warning unless your vote creates a majority in favor of removing the warning, and do not revert a deletion of the warning unless there is a majority in favor of having it.
Wikipedia is democratic.
For the sake of fairness, I will not re-add the warning until a few more people vote.
For reference, the warning looks like this:

Votes in Favor of Keeping the Warning (3)

  1. Argyrios 08:57, 8 Jun 2005 (UTC)
  2. john k 13:47, 8 Jun 2005 (UTC)
  • Just in case the last isn't accepted. -Pedro 11:50, 8 Jun 2005 (UTC)

Votes in Favor of Not Having the Warning (0)

  1. We don't need the disputecheck after the changes that were made. -Pedro 11:53, 15 Jun 2005 (UTC)
I agree that it is much, MUCH better now, but I am not quite willing to change my vote yet until someone explains to me why English is 4th, when all the resources I check list it as 2nd or 3rd. Can someone provide a source for fourth? Argyrios 01:33, 19 Jun 2005 (UTC)
  • The same occurs with Portuguese that is always placed 6th and German where it is often the 9th. That is due to Arabic... it has a problem not only if it is a unified language but also native language of whom in those countries (I've no opinion in this, Arabic for me is the real Greek, I dont understand a single word, nor It is a subject of my interrest. I dont know why some say it is a single language or various, dont forget that the writing system counts in the definition of a language. The "spoken Arabic" thing that I find in wikipedia doesnt sound very good to me or even neutral, BUT I REALLY DONT KNOW! Sorry, sorry to the experts. IMO, it seems something to devide Arabic, although often the result is exactly the reverse. The problem of chinese is more simple to be solved and splited. English maybe really behind Chinese and Hindi, because these people (sorry, no offense to Indians and Chinese, ppl that I fully respect) reproduce like rabbits (maybe worse than rabbits - countries with more than a 1000000000 people O_O that should be the population of the all world!), and that doesnt give any value to these languages. Just the savage economies of today give value to these rabbit societies, one of the errors and problems of our world today. Maybe Hindi became more spoken that English the other day, because of reproduction. But many solve the problem of the world language due to rabbit societies, using "the most spoken western languages". Again, Arabic: I would like to see a real debate between neutral native speakers of different countries and respective native experts about their language. these Ppl shouldnt be fanatic and ppl that are not pro-western world. Maybe some dialects are languages, and in some places Arabic is a second language. A native would help! -Pedro 02:44, 19 Jun 2005 (UTC)

Votes for adding footnotes with corrected numbers (1) - this won't change the original table.

  1. Pedro 11:50, 8 Jun 2005 (UTC)
    What's the point of this? If the list is wrong, we should scrap it and make another one. The list at List of the most spoken native languages looks marginally better to me. john k 13:47, 8 Jun 2005 (UTC)
The list is definitely wrong on a number of issues. I'm Danish and it is completely unsubstantiated to include English as an important language in Denmark (no matter what SIL says.) English is merely taught in Danish schools as a secondary language, but not used in everyday interaction. I don't find the inclusion of Japan, Italy, Germany, Honduras or Venezuela under "English" very convincing either. On a more problematic note, the list is not cross-checked with the articles about the individual languages. E.g. Armenian is listed as 6 mill. speakers but the article on the Armenian language says 9 mill. (is one of them counting a diaspora?) Xhosa is listed as 6.9 mill. while the article about Xhosa says 7.9 mill., Farsi is listed as 31 mill. while the article says 62-110 mill. It seems like we have to get a better list (and cross-check it). --Valentinian 07:25, 9 Jun 2005 (UTC)
    • They have included France as the third most important country for Portuguese, and that is not true; that place belongs to Angola (which in the near future will be in second place for native speakers). Mozambique also has more first-language speakers than France (from where some are returning to Portugal). It should be listed as fourth. Besides, how can France be more important than São Tomé and Príncipe and other places that use Portuguese? I think the percentage is a more reliable way to order countries. Besides, there are other forgotten countries that I included and Mr. Cantus removed. Is Mbundu really called Mbundu in English??? I thought that the interaction was made through Portuguese, where the term Quimbundo (or Kimbundu in English/common Portuguese), the Mbundu language, is preferable for several tempting reasons. But we should use what is really used in English. I really think that list was made in a rush. About the problem of English: I have never been to Denmark, although I plan to go. But I also think much of the data for English (the language of the people who made the list, incredible!!!) is unreliable. BTW, isn't English really used in Sweden or Finland? -Pedro 10:07, 9 Jun 2005 (UTC)

By the way the reasons for using Kimbundu instead of Mbundu are:

  • Mbundu are a group of related languages
  • Mbundu is similar to bunda (ass)
  • Mbundu is an ethnicity
  • Kimbundu means the Mbundu language, and it is the term that is often used in Angola.

The article for the language doesn't even exist. o_O The Portuguese article (a stub for Kimbundu) was developed by a Brazilian woman and me. [1]-Pedro 01:05, 12 Jun 2005 (UTC)

  • Okay. Completely disregarding the African language discussion, I would like to discuss the matter of the Arabic "languages" with you. As far as the standard core of the language is concerned, they're practically all the same. Even though most North Africans are ethnic Berbers, whether they are aware of their heritage or not, the vast majority speak Arabic as a first language. I think the 250 million speakers figure for Arabic is quite accurate. It may even exceed that amount. The various spoken varieties of Arabic as listed by Ethnologue are nothing more than dialects. In written form, anyone can understand another. It's just like the relation between West Flemish and standard Dutch. I don't trust the number of second-language speakers, though; it is absurd. The SIL Ethnologue has a tendency to publish false and unreliable information. They classify the Chakma language as Bengali-Assamese in the Ethnologue, when it is clearly Tibeto-Burman. If anything, it is a language which was heavily influenced by Bengali due to the close proximity of the speakers of the two distinct languages. Eric July 1, 2005 10:56 (UTC)

That's what I'm afraid of. People think it is the same language, they have the same writing system, and some people think these are different languages... I would prefer the official and popular notion of a language to that of some groundbreaking specialists. But the cases of Chinese and Arabic seem complicated. People have a tendency to compare them to the fate of the Romance languages, but that's fate, and a unique case, and it cannot be compared to anything. -Pedro 5 July 2005 22:09 (UTC)

What does "they're nothing more than dialects" mean, Eric? Are you using the term in the POV sense to mean a socially dispreferred form? In linguistics, when two versions of a language are called dialects, it means that there are noticeable differences, but that their speakers can functionally communicate with each other. That is, the Queen's English is "nothing more than a dialect". What "functionally" means is of course a matter of some opinion, but by comparison with existing standards, it would mean better than speakers of Portuguese, Spanish, and Italian can communicate (unless we want to lump those together!). In a linguistic sense, Arabic is not a single language. As for having the same writing system, that's assuming a lot. Many people cannot write well enough to communicate effectively, and many more cannot write at all. For them, Arabic is not a 'language' in any practical sense of the word. Also, Japanese visiting China can communicate in writing, but that doesn't mean we should classify their languages as related. In special cases like Arabic, we might want to list pan-Arabic, as we have it, for the social sense of the word, plus the various Arabic languages separated for the linguistic sense. Or, we could have an entry for Standard Arabic, which is a unified language. This is what you're talking about with being able to understand each other in writing. However, the number of speakers is substantially less (by a good 100 million) than the total for all Arabic. (Again, what good does it do to speak 'Arabic', if you can't understand someone else who also speaks 'Arabic'?)
(As for Chakma, yes, Ethnologue makes lots of mistakes like that. But I wonder if what they call the Chak language of Bangladesh might be your language?) kwami 2005 July 5 23:04 (UTC)

Where spoken - needs much quality review

39 languages are listed as spoken in the United States. That would be true only if the criteria were extremely low. These low criteria haven't been applied evenly; if they were, there would be 30+ languages in most developed countries. The criteria must be stated if the information is to be useful, and I suggest setting them at a higher level. How about 1% of a country's population? Gronky 16:28, 2005 Jun 8 (UTC)
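The proposed 1% rule is easy to state precisely. The sketch below is one possible reading of it, not an agreed criterion. The function name `meets_threshold` is hypothetical, and the two test cases reuse figures quoted elsewhere on this page (Spanish at home in the U.S. per the 2000 census figures above, and Tagalog as a first language in Canada per the 2001 census figure above).

```python
def meets_threshold(speakers: int, population: int, threshold: float = 0.01) -> bool:
    """Return True if speakers make up at least `threshold` of the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return speakers / population >= threshold


# Spanish in the United States: 28,101,000 of 262,375,000 residents aged 5+.
print(meets_threshold(28_101_000, 262_375_000))   # True (about 10.7%)

# Tagalog in Canada: 174,060 first-language speakers of 32,805,041 people.
print(meets_threshold(174_060, 32_805_041))       # False (about 0.5%)
```

A fixed percentage treats large and small countries alike, which is why a later comment in this thread suggests combining it with an absolute floor such as 100,000 speakers.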

The whole thing is silly. I've removed a bunch of the sillier ones (and perhaps some less silly ones...). This should be a list of places where the language is spoken to a significant extent, not any place where somebody who speaks that language may live. john k 15:27, 9 Jun 2005 (UTC)

  • I don't know what silly data you removed, but the new data for Portuguese was even sillier, so I've altered the data for Portuguese, and I hope you all do the same for other languages. I've put the countries with official status plus those where it is spoken by more than 1% of the population (this using: <small> </small>). -Pedro 16:01, 9 Jun 2005 (UTC)

More: I've altered the number of speakers, but I didn't want to alter the countries' order, so I've used the same number as the language that the article considers more spoken. Although the number of Portuguese speakers is a bit bigger, those languages can also have bigger numbers. -Pedro 16:03, 9 Jun 2005 (UTC)

Hey Pedro - as I said, I wasn't sure I'd done it completely well. That being said, I like the way you've done it, dividing between major countries where it is either spoken by most people or is an official language, and other countries where there are "significant communities." I'd suggest that we do this for all of them. As for what silly data I removed - Denmark and Germany as English-speaking countries? There was much else that was silly. john k 21:04, 9 Jun 2005 (UTC)
  • Ok, we need the same for other languages! -Pedro 22:00, 9 Jun 2005 (UTC)
I've been going through and changing it - BTW, has anybody noticed the really strangely low numbers for Kurdish (called "Kurdi," giving it a red link, and how Turkey is not mentioned as a country where Kurds live?) john k 23:43, 9 Jun 2005 (UTC)

Also, what are we to do with languages like Awadhi, which is often considered a dialect of Hindi? john k 23:47, 9 Jun 2005 (UTC)

Kurdish - like Arabic - is divided into a load of different languages by the Ethnologue. Hence the numbers. No speaker of the Ethnologue's "Kurdi" lives in Turkey - it's the Iraqi dialect... Mustafaa 23:49, 9 Jun 2005 (UTC)

Yes, but we give one Arabic, and one Farsi, even though those languages are also divided by Ethnologue. john k 23:57, 9 Jun 2005 (UTC)

Oh, I wasn't endorsing Ethnologue! By all means change it. I'm just pointing out the source of the problem (and no doubt of other problems.) - Mustafaa 22:02, 10 Jun 2005 (UTC)

Inconsistency in title

The article is headed "List of languages by total speakers", but it only actually makes reference to "Native total speakers". Shouldn't this be changed?

I don't think so. -Pedro 19:35, 15 Jun 2005 (UTC)
Yes, it should be changed. There are well over 341 million English speakers, many in countries where learning English as a second language is commonplace.
Speaking of which, I would be interested in seeing a table of language speakers all together, not just those who speak it natively. --tomf688(talk) 22:24, Jun 18, 2005 (UTC)
Learning English doesn't mean English is a second language; there are several languages that people learn! Don't be biased. And learning English often means NOTHING. Most people won't speak it or will speak a very rudimentary version of English. Your idea is close to stupidity, because learning a language doesn't mean that the language is someone's second language. People learn several languages; I would count at least half a dozen! So stupid! Second-language speakers are people who speak English in countries where English is an official language but isn't their first language. These people use English as a second language. -Pedro 23:10, 18 Jun 2005 (UTC)
Um. Huh? --tomf688(talk) 02:55, Jun 19, 2005 (UTC)

Okay, a few points:

  • A second language is any language that someone learns that is not their native language. If someone speaks Spanish natively and learns English, French, and Italian, then those last three are all second languages.
  • Yes, I am biased in using English as an example. English is my native language, so I will naturally choose that.
  • Saying that learning English "means NOTHING" is silly. Learning any language is beneficial.

I was just using English as an example anyway... calm down. --tomf688(talk) 03:10, Jun 19, 2005 (UTC)

    • It means nothing because often people won't use it and will forget it, and often people who learn it will never speak it. And for many it will mean nothing in their lives. -Pedro 11:05, 19 Jun 2005 (UTC)
  • A better example of why a LEARNING LANGUAGE is not the same as a SECOND LANGUAGE: Cape Verdeans have their native language, a creole; most speak Portuguese too (and speak it like natives, even if they aren't); but many also speak French (taught as a foreign language), which they learn in school. Do you think that the status of Portuguese and French in Cape Verde is the same? No! Portuguese is used in school, in communication, in many fields, but the creole is the "native language", Portuguese is the "second language" and French is the main "learning language". -Pedro 22:46, 19 Jun 2005 (UTC)

I agree, the title doesn't match the content. Boraczek 09:13, 20 Jun 2005 (UTC)

I agree with Pedro here. As another example: in India, many people (but a minority of the total population) learn English. In the Netherlands, a much larger percentage of the population learns English (I would guess). In neither case is it the people's first language. But to act as though learning English in the Netherlands is anything like as significant as learning it in India seems highly dubious to me. There is a difference between countries where the language of government is not the native language of most people in the country, and the fact that people in a lot of, for instance, European countries learn English in school. It is comparing apples and oranges. john k 01:52, 21 Jun 2005 (UTC)

This is confusing!

Somebody needs to sort through all this and NUMBER these, so confusion doesn't arise. There are messed-up listings, with one language being 96, for instance, while the one just below it is 91. It would help if these languages were numbered. I didn't know what to put as Haitian Creole's ranking in its info box, so I just took a guess. Revolución 04:22, 17 Jun 2005 (UTC)

I agree, a # column would help here Argyrios 01:32, 19 Jun 2005 (UTC)
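A numbered column of the kind requested above can be derived mechanically from the speaker counts, which keeps the ranks consistent whenever a figure changes. The sketch below is illustrative only: the five figures are numbers quoted (and disputed) elsewhere on this page, and the ranks are positions within this small sample, not world rankings.

```python
# Speaker figures as quoted in this discussion (disputed, sample only).
quoted = [
    ("Punjabi", 57_000_000),
    ("English", 341_000_000),
    ("Urdu", 60_000_000),
    ("Portuguese", 176_000_000),
    ("Italian", 62_000_000),
]

# Sort by speaker count, largest first, then number the rows from 1.
ranked = sorted(quoted, key=lambda row: row[1], reverse=True)
numbered = [
    (rank, name, speakers)
    for rank, (name, speakers) in enumerate(ranked, start=1)
]

for rank, name, speakers in numbered:
    print(f"{rank:>2}  {name:<12}{speakers:>13,}")
```

Generating the number from the sort order, rather than typing it by hand, avoids the situation described above where row 96 sits above row 91.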


Italian and dialects

According to this article, there are 62.000.000 Italian speakers. Then it says there are 8.000.000 Lombard speakers, 2.000.000 Venetian speakers, and so on. I am wondering: do you also count the speakers of Lombard among the Italian speakers or not? I mean, I live in Lombardy and I speak both Italian and a dialect of Lombard (yes, because there are many Lombard dialects, mutually intelligible at least), but I consider Italian my first language. Do you count me twice?

--Suhardian 19:18, 19 Jun 2005 (UTC)

Should now be clearer. The figure includes 55 million first- and second-language speakers, out of an Italian population of 58 million, but Ethnologue says 'Possibly nearly half the population do not use Standard Italian as first language.' kwami 03:59, 2005 Jun 27 (UTC)

I don't know when Ethnologue was last updated, but the claim that half the population do not use Standard Italian as a first language is outrageous. It may have been true in the thirties, but it is definitely not true now.

I think that this page may be appreciated for the attempt. It is true that sources may vary. Concerning Italian dialects and languages, one has to refer to the international standard classification ISO 639. This contains the Sicilian language and the Sardinian language, which are widely spoken in parts of Italy and have significant communities worldwide, but also the Lombard language (ISO 639-3 code 'lmo'). On the other hand, it is true that Lombardy has 8 million inhabitants, but this does not mean that there are 8 million speakers, and I would be surprised if somebody claimed it. Nevertheless, Lombard is spoken by a significant community, and even if Romansh and Ladin are well-known and well-codified languages with many similarities to Lombard, Lombard may be eligible as a language listed here. --Gmelfi July 21 2005.

What a mess

Oh, wonderful - now we've got everybody moving their favorite languages about at will. Until somebody finds an authoritative source, we should just leave the numbers as they were. And that means you, Punjabi. john k 01:48, 21 Jun 2005 (UTC)

Javanese

More Javanese speakers than Indonesian and Malay speakers? That can't be possible; Javanese is not the official language. Also, most of the people who speak Javanese also speak Indonesian. I think they are bilingual, so .....

What does the official language have to do with anything? At any rate, this is supposedly a list of first language speakers, in which case Javanese surely has more speakers than Indonesian.

john k 14:47, 25 Jun 2005 (UTC)

Using consistent figures

Since everyone seems to be complaining about where the data are coming from, and most of the languages have Ethnologue (15th ed.) as a source, I'm extending Ethnologue data to the major languages as well. (The credited source for this page, SIL, are the publishers of Ethnologue.) Another source would be fine, but mixing up sources without documentation will continue to be a mess. Coupla points:

  • I broke up Punjabi, because Lahnda and Gurmukhi are closer to other languages on this list than they are to each other.
  • However, I did not attempt to break up Chinese or Arabic into individual languages. I also did not verify the speaker data for Chinese, as I cannot find an Ethnologue total for all Sinitic languages.
  • I unified Hindustani and Malay.
  • However, I did not unify some of the smaller mutually intelligible dialects, like Czech/Slovak.
  • Swahili's now at the very bottom, with 0.7 million native speakers. However, I think this must be wrong: more likely it's 0.7 million ethnic Swahili, as others have certainly adopted the language, especially in the cities. The number of second language speakers is rural, so it may be an underestimate as well.

kwami 03:52, 2005 Jun 27 (UTC)

  • I think that arbitrarily deciding what to unify and what to split is not very neutral.

A Portuguese speaker can read Galician, with some difficulty, but an Urdu speaker can't read Hindi. Portuguese and Galician are split, yet Urdu and Hindi are united?!?!?! I think languages with massive writing differences and historical disconnection should be separated. That includes Hindi/Urdu, Portuguese/Galician, and other similar cases, despite their limited mutual intelligibility in the spoken or written form. I don't know what the problem with Czech and Slovak is... The problem of Chinese and Arabic is still under debate. These share a unified writing system, but it seems to be separate from the spoken language. Languages are a mess because of politics and academics with tendencies (political, lack of knowledge, prejudice). -Pedro 1 July 2005 16:02 (UTC)

If I started writing English in the Arabic alphabet due to my faith, and you couldn't read it, should I then claim that we speak different languages? Of course not. Likewise, Cherokee is written in both its own script and the Latin, but is a single language nonetheless. Hindi and Urdu are two national standards of what are essentially the same language. They use different scripts, and different technical vocabulary, but the day-to-day language is very similar. In India, illiterate people will claim to speak Hindi or Urdu based on their religion — that is, the difference has little to do with how they actually speak. (Of course, the literate are educated with different standards, but what they speak at home doesn't change much because of that.) From a linguistic standpoint, they speak the same language (if perhaps different registers or dialects) because they can understand each other without (much) problem. Writing is superficial. For most of us, most of the time, language is speech. That's its essence, not writing. Similarly, Taiwanese Mandarin and mainland Mandarin speakers cannot read each other's texts, because of the simplification of the writing system on the mainland. Should Mandarin then be considered two separate languages?
(BTW, Mandarin speakers cannot read Cantonese, or can read it only with difficulty. The Chinese languages do not share a unified writing system; rather, the Chinese people all read and write Mandarin, with a few exceptions such as Hong Kong comic books and American Chinatown newspapers. Claiming this makes them a single language is a bit like claiming Middle Korean is a Chinese dialect because in the 14th century, the Korean people read and wrote in Chinese.)
I agree that the data are inconsistent. Ethnologue is inconsistent. What we really need to do is to test all of the world's major languages. Any people that can communicate with each other at, say, FSI level 3 or above would be counted as speaking the same language, whatever the politicians say. Any "language" where people cannot communicate at that level would be split up. This has been done for some languages, such as the Mayan languages of Guatemala. However, I have never heard of anyone doing the research and applying a uniform standard to the whole planet. Until someone does, we're stuck with making educated guesses and fudging things in an attempt to be consistent. Maybe Portuguese and Galician should be unified. I considered that, but I know little Portuguese and no Galician, so I left it alone. Likewise, perhaps several of the Scandinavian languages should be unified. Arabic should be split up, but I am not the one to decide how. It would be nice to use a common definition as to what is a 'language' for all entries on the list. The changes I made are a start in that direction, but there's still a lot to do. kwami 2005 July 5 07:45 (UTC)

Wait, you unified Hindustani and Malay?? Huh?? john k 5 July 2005 02:54 (UTC)

From personal experience, it's as easy or easier to switch from Malaysian to Indonesian Malay than from American to British English, so yes, Malay's clearly a single language (or rather, the two official standards are, whatever the status of their many regional forms). As for Hindustani, as above. kwami

Oh...you mean you joined together Indonesian and Malay, and also joined together Hindi and Urdu. I thought you were saying you joined Hindustani with Malay, which you must admit would be crazy. Were you the one who joined Czech and Slovak, as well? In terms of Arabic, I'd strongly advise against splitting it up. john k 5 July 2005 18:05 (UTC)

I believe I did. I haven't gotten to most of the smaller languages, so it's still a bit of a mess.
To be consistent, Arabic should be split up. Either that or merge Spanish, Catalan, Portuguese, Galician, and Italian into a single language, and do the same with Russian, Ukrainian, Polish, Czechoslovak, Serbocroatian, and Bulgarian. Many forms of Arabic do not have functional intelligibility with each other, which the Slavic languages almost do. Perhaps we could also list Standard Arabic, which has a good 100 million speakers (or more) even though it has no native speakers. But, like I said, I'm not the one to do this, at least not without a better reference than Ethnologue.
I imagine that a primary use of this list is to answer the question, 'If I learn language X, how many people will I be able to communicate with?' We imply that learning 'Arabic' will allow you to communicate with 450 million, which is an exaggeration. kwami 2005 July 5 18:44 (UTC)

A couple of points: 1) written languages - Arabic forms all share a common written language - the colloquial forms do not have their own standardized written forms. This is quite in contrast to the Romance and Slavic languages. It is more like the case with Chinese (which is, admittedly, divided up now). 2) Political facts - the basic fact is that there is no simple way to define what a language is. In some cases, political distinctions are silly. Whatever the Moldovan government may call it, the language that they speak is virtually identical with Romanian. But it gets a lot more difficult as you go towards finer distinctions. The Scandinavian languages are all, in my understanding, mutually intelligible. But they have long separate histories, and are considered to be separate languages. I don't see what is to be gained from merging them. The basic fact is that we simply cannot be consistent based on linguistic criteria, because political criteria are nearly as important. Obviously, Swedish and Danish are a lot closer to each other, and more mutually comprehensible, than Moroccan Arabic and Iraqi Arabic are to one another. But politically, Danes and Swedes insist that their languages are separate, while Moroccans and Iraqis insist that their languages are the same. That has to count for something. john k 5 July 2005 20:01 (UTC)

There's a very simple way to define what a language is: put two people in a room, and see if they can communicate effectively, the same way you would test if someone is a fluent non-native speaker. Moldovan is, of course, Romanian. As for Scandinavian, not all dialects are functionally intelligible (to the point of some people being clueless as to what others are saying), but the distinctions don't follow national boundaries very well. We have two definitions of language here: social/political and linguistic. Both are important, of course, but neither is absolute: I know Moroccans who say they speak a different language than Standard Arabic, and certainly different from Iraqi, and Danes who say their language is essentially the same as Norwegian (or some dialects of it, at least).

Having looked into Arabic, I will note, for reference, that these are the divisions of Arabic with more than one million speakers given by Ethnologue:

  • Egyptian Spoken Arabic – 44,406,000
  • Algerian Spoken Arabic – 21,097,000
  • Moroccan Spoken Arabic – 19,480,600
  • Sudanese Spoken Arabic – 18,986,000
  • Sa’idi Spoken Arabic – 18,900,000
  • Mesopotamian Spoken Arabic – 15,100,000
  • North Levantine Spoken Arabic – 14,309,537
  • Najdi Spoken Arabic – 9,863,520
  • Tunisian Spoken Arabic – 9,247,800
  • Sanaani Spoken Arabic – 7,600,000
  • Ta’izzi-Adeni Spoken Arabic – 6,869,000
  • North Mesopotamian Spoken Arabic – 6,300,000
  • South Levantine Spoken Arabic – 6,145,000
  • Hijazi Spoken Arabic – 6,000,000
  • Libyan Spoken Arabic – 4,505,000
  • Hassaniyya – 2,787,625
  • Gulf Spoken Arabic – 2,338,600
  • Eastern Egyptian Bedawi Spoken Arabic – 1,610,000

(john k 5 July 2005 20:52 (UTC))
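
As a rough cross-check, the Ethnologue figures in the list above can be added up to see how many speakers these varieties account for together. This is only a sketch: the numbers are copied from the list as given, and the variable names are made up for illustration.

```python
# Ethnologue speaker counts for Arabic varieties above one million,
# copied from the list above. Names are shortened for readability.
arabic_varieties = {
    "Egyptian": 44_406_000,
    "Algerian": 21_097_000,
    "Moroccan": 19_480_600,
    "Sudanese": 18_986_000,
    "Sa'idi": 18_900_000,
    "Mesopotamian": 15_100_000,
    "North Levantine": 14_309_537,
    "Najdi": 9_863_520,
    "Tunisian": 9_247_800,
    "Sanaani": 7_600_000,
    "Ta'izzi-Adeni": 6_869_000,
    "North Mesopotamian": 6_300_000,
    "South Levantine": 6_145_000,
    "Hijazi": 6_000_000,
    "Libyan": 4_505_000,
    "Hassaniyya": 2_787_625,
    "Gulf": 2_338_600,
    "Eastern Egyptian Bedawi": 1_610_000,
}

# Combined total of the listed varieties only (smaller varieties are not included).
combined = sum(arabic_varieties.values())
print(f"{len(arabic_varieties)} varieties, {combined:,} speakers combined")

# Share of each variety within the listed total, largest first.
for name, speakers in sorted(arabic_varieties.items(), key=lambda kv: -kv[1]):
    print(f"{name:<24} {speakers:>12,} {speakers / combined:6.1%}")
```

The combined figure covers only the varieties listed here, so it is a lower bound on what Ethnologue counts as Arabic overall.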

  • I agree with you, John, although the behaviour of these governments is political rather than linguistic.

Case of the Romance languages: Portuguese, Spanish and Italian are not as similar as they may sometimes look. These languages diverge at their base, but they share the same past and most of their lexicon. (The Middle Ages made a mess of the uniformity of these languages. French, too, is not as divergent from the others as it may look. Romanian is the most different: I understand almost nothing of it, whereas I have no difficulty understanding most Romance languages.) That's why it is not hard for a speaker of one language to learn another very quickly, and an ordinary person can understand something of the other language (but not enough to hold a moderate conversation). The relation between Portuguese and Galician is different: these have the same base, but sociolinguistically they form two different languages. They have different and independent orthographies, and the government says the languages are different.

Examples:

English "The house and the cat"
Spanish "La casa y el gato"
Galician "A casa e o gato"
Portuguese "A casa e o gato"

English "The image of the house"
Spanish "La imagen de la casa"
Galician "A imaxe da casa"
Portuguese "A imagem da casa"

Portuguese and Galician diverge in termination, plus Galician retains old Portuguese characteristics, such as:

EN: two - thing - all the (fem.)
GL: dous - cousa - todalas
PT: dois - coisa - todas as

Portuguese and Galician also diverge in other endings, such as PT -ão vs. GL -ón, PT -vel vs. GL -ble, etc., although the Galician pronunciation is still found in Northern Portugal, where it is seen as part of the local dialects. Some Galician dialects are heavily influenced by Spanish and sound like portuñol, while others, such as those of rural people (especially from the border) and of Galician fishermen (even in places as far away as A Coruña), are remarkably familiar (I'm talking about accents now). There are also large orthographic differences: the Portuguese "J" and "G" (i.e. ge, gi) is the Galician "X"; it isn't a significant difference in speech, but in writing it is a big difference. Other differences are the Galician "ñ" and "ll" and the Portuguese "nh" and "lh", which are just orthographic differences without any difference in speech. Both populations also tend to see the two dialects as different languages. So as you can see, the case is complex. And I see no prospect of a merger: most Galicians aren't like the Catalans or the Basque people in the protection of their culture. Plus, the language is more similar to Spanish than Catalan. -Pedro 5 July 2005 23:24 (UTC)

I advise keeping Arabic together, if only out of practicality. Arabic is a dialect continuum to an extent that Chinese simply isn't, and there really is no well-defined way to determine dialect boundaries (let alone count the speakers of each dialect); the Ethnologue's splits massively exceed anything that mutual intelligibility could justify (Algerian and Moroccan and Tunisian, for example, are totally mutually intelligible) and mutual intelligibility is itself currently in flux, as a result of the spread of pan-Arab mass media in recent decades. - Mustafaa 7 July 2005 22:09 (UTC)

What happened to Esperanto?

On the "Esperanto language" part of this website, it clearly says that there are about 1.6 million speakers. So, why isn't it on the list?

Esperanto has only a thousand or so native speakers, so it's well below the cutoff by even the most optimistic accounts. (I have a suspicion that Swahili's above a million, and that the Ethnologue figures are wrong, but that needs confirmation. Meanwhile Swahili has been left on the list.)
That's assuming this list is based on the number of native speakers. If it's based on total speakers, then by all means let's add Esperanto. kwami 2005 July 5 07:45 (UTC)
Is there a page here that lists languages in general? Some languages were not meant to be spoken as a first language...
There are a couple lists, which aren't very well distinguished. There's a List of the most spoken native languages, and this list, which is for 'total speakers', by which was meant to be 'total number of native speakers'. Not sure how that's supposed to be different from the previous one. Either they should be merged, or they should cover separate topics. (Personally, I think they should be merged regardless.) I've started adding second language speakers (even if I haven't gotten to most) because that fits the title. If you want to turn this list into a true List of languages by total speakers, and include every language spoken by more than 1M, go ahead, though it might be polite to discuss it here first. However, we should add more than just Esperanto, and I don't think I'm up to the task! kwami 2005 July 7 19:01 (UTC)

More Dutch speakers

I would say about 23 million people speak Dutch rather than 20 million:

  • Netherlands 16 million
  • Belgium 6 million
  • Suriname, Antilles, other communities 1 million
  • total 23 million

[anonymous] I don't know where you get those numbers. Where does the 1 million figure come from? Populations:

  • Antilles 212,226
  • Aruba 103,000
  • Suriname 438,144

The population of these three regions doesn't reach one million, and Dutch is definitely not a national language in these places. Plus, the Netherlands and Belgium have a lot of immigrants. So I believe the 20 million figure more than the 23 million one. -Pedro 5 July 2005 11:27 (UTC)
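
The arithmetic behind this objection can be sketched as follows. The figures are the ones quoted in the two comments above; the dictionary names are only illustrative.

```python
# The original breakdown for Dutch speakers, in millions (from the first comment).
claimed_millions = {
    "Netherlands": 16,
    "Belgium": 6,
    "Suriname, Antilles, other communities": 1,
}

# Total populations quoted in the reply for the overseas territories.
overseas_population = {
    "Antilles": 212_226,
    "Aruba": 103_000,
    "Suriname": 438_144,
}

claimed_total = sum(claimed_millions.values())
overseas_total = sum(overseas_population.values())

print(f"Claimed total: {claimed_total} million")
print(f"Overseas population combined: {overseas_total:,}")
print(f"Reaches one million? {overseas_total >= 1_000_000}")
```

Note that these are whole populations rather than speaker counts, so the number of Dutch speakers there can only be lower.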

There are, for example, probably about 12 million native speakers in the Netherlands, once you subtract immigrants, Frisian, Limburgish, and other Germanic language speakers. Suriname definitely isn't natively Dutch speaking; some of the languages there are closer to English (if you can even recognize gbe for 'leave' as derived from English!). So 20 million may be a rather generous number. kwami 2005 July 5 17:50 (UTC)

Germany is not listed...

Am I the only one who noticed Germany's absence? It's definitely in the top 100...

Standard German is currently at #10. kwami 2005 July 5 07:45 (UTC)
  • Why standard German? The Language is known as German! -Pedro 5 July 2005 11:20 (UTC)
Because it does not include many lects that are also known as German: various forms of Saxon, Swiss, etc. "German", as the word is commonly used in English, isn't really a language. I put a note in with the numbers instead, parallel to other languages. kwami 2005 July 5 17:53 (UTC)

Persian is listed incorrectly

The total # for Persian speakers is very incorrect. No speakers of Persian as a second language in Iran are listed (giving a false total number), and may I remind all that Dari (Afghanistan's official language) is in fact Persian. It's not even a dialect of Persian. It is Persian. It is also the 2nd language in Tajikestan and some other areas in Central Asia.--Zereshk 6 July 2005 10:41 (UTC)

  • It seems a dialect of Persian:

The syntax of Afghanistan's Persian does not differ greatly from Iran's Persian (locally called Farsi), but the stress accent is less prominent in Afghanistan's Persian than in Iran's Persian. To mark attribution, spoken Afghan Persian uses the suffix -ra. The vowel system also differs from that of Iranian Persian, to some degree. I guess you are using the term dialect to refer to a similar language without a writing system.-Pedro 6 July 2005 11:37 (UTC)

There is far far more difference between the English spoken in Australia, the UK, and The US, than the difference between Persian in Afghanistan and Persian in Iran. There is far more difference between the Persian spoken in Tehran and that of Yazd, than Iran and Afghanistan.
Either you have to break up Persian into its various dialects that vary from town to town in Iran (and therefore not list Persian at all on the list here), or list it correctly.
I vote for the latter. Dari is Farsi (Persian). And I have been schooled in Persian as my mother tongue.--Zereshk 6 July 2005 14:27 (UTC)
How different is Tajik? I agree that both Afghan and Iranian Persian should be listed together. john k 6 July 2005 15:31 (UTC)
As literary standards, Farsi & Dari may be identical, but as spoken languages, they differ. However, as far as I know, there is no interruption in intelligibility between local varieties of "Farsi", "Dari", and "Tajiki", and the real distinction is political. The question is whether the geographic extremes can understand each other - maybe a bit like Italian? Also, if we're going on Ethnologue numbers, make sure Hazaragi is included.
Few of the languages listed include numbers of non-native speakers, so it's not just Persian. kwami 2005 July 6 17:53 (UTC)
  1. I'm not sure about the current status of Tajik, since Tajikestan was heavily Russified by the USSR. But I am pretty sure about Afghanistan.
  2. Almost every language listed above Persian gives the total figure by adding the 2nd language speakers to the native ones. If we do that, we can get the following rough number for Persian language speakers (Native and Non Native), according to the CIA Factbook, by the following formula, as of 2005, not including Tajikestan:

Total = [N + NN in Iran] + [50% of Afghanistan] + [10% of the UAE] + [a conservative estimate for Iranian expatriates in Canada, the US, and Europe]

= 68,017,860 + 0.5(29,928,987) + 0.1(2,563,212 ) + 1,000,000

= 84,238,675

Note: Add to this figure 7,163,506 if Tajikestan is decided to be added. See Tajik language for aiding your decision. Also add same minority from Uzbekistan.--Zereshk 6 July 2005 22:25 (UTC)
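
The formula above can be checked with a few lines of Python. This is just a sketch of the arithmetic as stated in the comment; the variable names are made up, and the percentages and the expatriate figure are the comment's own assumptions rather than sourced speaker counts.

```python
# 2005 population figures as quoted in the comment above (attributed there to the CIA Factbook).
iran = 68_017_860          # native + non-native speakers, i.e. the whole population
afghanistan = 29_928_987   # 50% counted as Persian (Dari) speakers
uae = 2_563_212            # 10% counted as Persian speakers
expatriates = 1_000_000    # the comment's "conservative estimate" for Canada, the US, and Europe

total = iran + 0.5 * afghanistan + 0.1 * uae + expatriates
print(f"Total without Tajikestan: {round(total):,}")

# Optional addition mentioned in the note: the population of Tajikestan.
tajikestan = 7_163_506
with_tajikestan = round(total) + tajikestan
print(f"Total with Tajikestan:    {with_tajikestan:,}")
```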

  • my POV: Honestly, I'm shocked to see that there are so many problems with several languages. Why doesn't the UN or UNESCO create something like "The human language rights", to protect the unity of languages against political intervention? A language's purpose is just communication, not to be used as a weapon or for nationalistic bias. Just my 2 cents. -Pedro 7 July 2005 22:58 (UTC)

Final formula for Persian re-estimate

This new estimate will add the first language and second language speakers of Persian, similar to other languages on the table. Estimates are according to 2005 CIA population figures:

34,689,168 + 33,328,751 (1st + 2nd language speakers in Iran according to CIA)
+ 14,964,494 (the 50% of Afghans that speak Dari according to CIA as main language)
+ 3,437,000 (Total estimate of Diaspora in US, Turkey, UAE, Iraq, Germany, UK, Canada, France, India, Australia, Syria, Russia)[2]
-----------
Native Total = 53,090,614
Non Native Total = 33,328,751
Grand Total = 86,419,365
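
For anyone who wants to re-run the sums, here is a minimal sketch that adds the components exactly as listed above (variable names are illustrative only). Adding them this way gives a native total of 53,090,662 and a grand total of 86,419,413, each 48 higher than the totals written above, a difference that is negligible at this level of precision.

```python
# Components as listed in the estimate above.
iran_first_language = 34_689_168
iran_second_language = 33_328_751
afghan_dari_main_language = 14_964_494   # the 50% of Afghans, counted as native in the estimate
diaspora = 3_437_000                     # counted as native in the estimate

native_total = iran_first_language + afghan_dari_main_language + diaspora
non_native_total = iran_second_language
grand_total = native_total + non_native_total

print(f"Native total:     {native_total:,}")
print(f"Non-native total: {non_native_total:,}")
print(f"Grand total:      {grand_total:,}")
```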

Note:

  1. The Tajik have not been included in this estimate. I don't know whether to include them or not. Their language is Persian, but their script has been cyrillicized, I understand.
  2. The diaspora figure, although not based on the best of sources, is a good estimate. I've seen other estimates of the Iranian diaspora. They are in the same exact range. In fact the Baptist estimate is conservative, since it doesn't mention Iranians living in Israel (both the President and Defense Min of Israel are Iranian).
  3. I vote that this new total figure be inserted into the list. --Zereshk 7 July 2005 21:59 (UTC)
I believe the 50% figure for Afghanistan is not the number of native speakers. Afghanistan is about 50% Pashtun, and there are other ethnicities as well. Persian is (if I remember correctly) somewhere around 30%. 'Main language' includes lots of people who speak some other language locally or at home, but use Persian when communicating outside of their group. kwami 2005 July 7 22:17 (UTC)

I only quoted the CIA website directly where it says of Afghanistan: Language: Afghan Persian or Dari (official) 50%, Pashtu (official) 35%, Turkic languages (primarily Uzbek and Turkmen) 11%, 30 minor languages (primarily Balochi and Pashai) 4%. And besides, even the Pashtun are Iranian as well, and their language a Persian dialect.--Zereshk 7 July 2005 22:30 (UTC)

"Iranian" does not mean "Persian". Honestly, that was the main reason for changing the name of Persia to Iran! Saying Pashtun is a dialect of Persian because they're both Iranian is like saying English is a dialect of German because they're both Germanic.
I'm not so sure the CIA is any better as a source than Ethnologue. The Wikipedia article for Afghanistan lists Pashtun 42%, Tajik (Persian) 27%, Hazara (Persian) 9%, Uzbek 9%, Aimak (Persian) 4%, Turkmen 3%, Baloch 2%, other 4% (these are also CIA figures, actually). That puts Persian speakers at 40%, assuming that language follows ethnicity. If Pashtun is spoken by only 35%, then ~20% of Pashtuns have given up their language, which I'd like to see stated explicitly in a reliable source before accepting. In the Afghan demography article, it states that "Dari is spoken by more than one-third of the population as a first language and serves as a lingua franca for most Afghans", consistent with the ethnic figures, but elsewhere says that it's the language of 50% (also a CIA figure).
Native: 51% of Iran (68M), 40% of Afghanistan (30M), 80% of Tajikistan (7.2M), 5% of Uzbekistan (27M), = 53.8M.
Your source lists 350k in the UAE, but Ethnologue has only 80k specifically Persian speakers. Perhaps your source is counting all Iranians? Likewise, 500k instead of 800k for Turkey, 230k for Iraq (equivalent), 90k Germany (equivalent), 900k USA (rather than 1.5M). Also about a million Dari in Pakistan weren't counted, and 40k Tajik in Russia (not significant). This gives a total native speaking population of ~57M, somewhat above your estimate. As for non-native speakers, we can assume that most of the remaining population of Iran, Afghanistan, and Tajikistan speak Persian fluently: That's about 44M, assuming half of non-Persian Afghans are fluent, which is just a guess on my part (I've heard that many Pashtun do not speak Dari well, especially women). So the total would be somewhere around 100M speakers. Of course, this is a rough estimate, and will vary significantly depending on how many Afghans speak Dari fluently. If all did, we'd have just a tad under 110M, which therefore is the upper limit.
Oops, I got signed out again, and only my IP address appeared as a signature. Do the rest of you have that problem, of only being able to sign in for a few minutes at a time?
Anyway, I believe that a significant number of that 1M refugees in Pakistan have now gone home, or am I wrong? The ethnic total was under 57M anyway, so given this uncertainty, maybe we should count it as 56M? —kwami 2005 July 8 03:31 (UTC)

Final formula for Persian re-estimate -- Part 2

  1. I think it is better to avoid counting the Afghan refugees in Pakistan (1M) and Iran (2M). Sooner or later they will go back, and the figures will be revised accordingly.
  2. Ethnologue's estimate of Iranians in the UAE is outdated. There has been a massive influx of Iranians migrating to the UAE in the past 10 years. Even the CIA puts the Iranian population currently at roughly 250,000 in Dubai, Sharjah, and Abu Dhabi. I think the 350,000 is therefore quite accurate.
  3. However, Ethnologue does mention Oman and Qatar, which I didn't count.
  4. I think that Kwami is right. We should consider putting 40% of Afghanistan's population as native speakers. However I do think we should count the rest of Afghanistan as 2nd language speakers, since it is the official language of the country anyway.

Hence the revised estimate (according to the combined sources of Ethnologue, CIA, and [3] ) is:

34,689,168 + 33,328,751 (1st + 2nd language speakers in Iran according to CIA)
11,971,595 + 17,957,393 (40% Afghans that speak Dari as main language + rest that speak it as second language)
1,000,000 in Pakistan (according to Ethnologue)
3,535,000 (Total estimate of Iranian diaspora in US, Turkey, UAE, Iraq, Germany, UK, Canada, France, India, Australia, Syria, Russia, Oman, Qatar, Kuwait, Israel)
1,181,452 (Tajiks in Uzbekistan)
5,723,641 + 1,439,865 (1st + 2nd language speakers in Tajikestan according to CIA)
-----------
Native Total = 58,100,856
Non Native Total = 52,726,009
Grand Total = 110,826,865
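
The revised totals can be reproduced with the short sketch below. The figures are those listed above; the data layout and names are made up for illustration. Entries given as a single number (Pakistan, diaspora, Tajiks in Uzbekistan) are treated as native speakers here, which is the reading that reproduces the stated native total.

```python
# (label, native speakers, non-native speakers), copied from the list above.
components = [
    ("Iran", 34_689_168, 33_328_751),
    ("Afghanistan (40% native, rest second language)", 11_971_595, 17_957_393),
    ("Pakistan", 1_000_000, 0),
    ("Iranian diaspora", 3_535_000, 0),
    ("Tajiks in Uzbekistan", 1_181_452, 0),
    ("Tajikestan", 5_723_641, 1_439_865),
]

native_total = sum(native for _, native, _ in components)
non_native_total = sum(non_native for _, _, non_native in components)
grand_total = native_total + non_native_total

print(f"Native total:     {native_total:,}")
print(f"Non-native total: {non_native_total:,}")
print(f"Grand total:      {grand_total:,}")

# Given how uncertain the inputs are, a figure rounded to the nearest million
# is probably the most that can be claimed.
print(f"Rounded:          {round(grand_total, -6):,}")
```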

Should we go ahead with this number?--Zereshk 8 July 2005 09:30 (UTC)

This seems like an acceptable number to me. SouthernComfort 8 July 2005 10:52 (UTC)
I thought we were going to avoid the 1M refugees in Pakistan? That's what the number in Ethnologue is from. (Was the population estimate for Afghanistan revised downward to reflect the refugees, or are they being counted twice?) Also, we should round off to the nearest million. Our estimates differ by tens of millions, so indicating thousands is silly.
Also, just because Dari is official in Afghanistan, that doesn't mean everyone speaks it. Pashtun is also official. Probably 99% of the population speaks one of those two languages, but I'd be more comfortable if we had some evidence for the actual number of Dari speakers, rather than just assuming it's universal. Rather like French in Canada: we wouldn't want to count the entire country as French speakers just because it's official. (As far as I know, even reading common knowledge stuff like Nat Geo, there are large segments of the Pashtun population that don't speak Dari.) kwami 2005 July 9 18:17 (UTC)

If Ethnologue's 1 million in Pakistan are the refugees (which I thought they weren't), then they should be taken off, which they have been, as I can see. However, Dari is more official than Pashtu. This is according to the Afghanistan page, the CIA, and the fact that the official language in Kabul and of the govt of Karzai (a Pashtun himself) is Dari. --Zereshk 00:53, 10 July 2005 (UTC)

I can find nothing in either of those sources supporting your claim that "Dari is more official than Pashtu". They are co-official, as near as I can tell. From what I understand, Dari is more widely used as a second language than Pashtun is. However, in the Wikipedia Dari (Afghanistan) article, it says that 60% of Afghans speak Dari, 12M less than we have now, and the total for Persian would be just under 100M. I don't know the source for the Dari article's data, but we shouldn't claim that all Afghans speak it without evidence. kwami 01:32, 2005 July 10 (UTC)
National Geographic 2004 [4] breaks down the ethnic composition as "Pashtun (38 percent), Tajik (25 percent), Hazara (19 percent), and Uzbek (6 percent)". Another 4% Aimak (itself not a very reliable figure) would indeed put the native Persian-speaking total at almost half: 48%. So our figures range from "over a third" to "50%" and everything in between, and we have no data for the number of non-native speakers. (It's an assumption that all Iranians and Tajiks speak Persian, but not an unreasonable one. It is unreasonable in the case of Afghanistan.) We really need some reliable data, if it exists. kwami 01:53, 2005 July 10 (UTC)
Here's a 2001 source from Human Rights Watch that seconds NatGeo's figures: [5]. Another, here, has 46-48% (with the number of Aimaq uncertain). However, these figures are calculated from a 1995 census estimate of only 17 million! By their own admission, that's way off, and makes me doubt the NatGeo figures. kwami
  1. The Afghanistan embassy in Canada names Dari as the "lingua franca" [6]
  2. I still think the CIA is more accurate and up to date than any other available source. It is credible enough (see their contributors). What is interesting is that it lists Persian native speakers at 58% of the population in Iran. That would add another 5.4 million to the current 57 million. And they do list 50% for Dari speakers. Therefore, it seems the 57 million is quite an underestimate.--Zereshk 23:12, 10 July 2005 (UTC)
First, "lingua franca" doesn't mean everyone speaks it. I've been in plenty of countries where I've gone to the market, and the market women don't speak the national lingua franca. Second, the CIA isn't a very credible source. Did you know that the UK only became an independent country in 1809? That is, according to the CIA factsheet, before a British journalist pointed out their error to them. It appears that the 50% figures are from ten years ago, when Afghanistan was reported to have only half the population it does now. Many of the missing were presumably refugees, many of them Pashtun. Many of the rest were simply not counted, and who knows who they were. All we can say is that somewhere between 35% and 50% of the population speaks Dari natively, and a large but unknown number speak it as a second language. kwami 23:28, 2005 July 10 (UTC)
  • Still, they don't list Pashtu as the lingua franca. And why do I see Dari on the front pages of Afghan official websites instead of Pashtu?
  • And you still have to account for the missing 5.4M, which the CIA lists for 2005 and which is quite plausible.--Zereshk 23:44, 10 July 2005 (UTC)

Czech and Slovak languages

Since there's a request to point out inaccuracies, I'd like to point out that Czech and Slovak are separate languages, and should not be listed together. While similar, neither one of them qualifies as a dialect of the other, and they have evolved independently (see Slovak language#Relationships to other languages for a discussion on the topic). There are certainly more arguments for listing them separately than there are for listing Bulgarian and Macedonian separately, and unlike that example, both Czechs and Slovaks will agree that their languages are separate. Since I'm not familiar with the sources used for the article, I'd like to request that someone with knowledge of where to find reliable data separate the two. Thanks. --Aramգուտանգ 8 July 2005 04:47 (UTC)

Are we going to declare that every dialect with an army is a separate language? When people learn a language, they're interested in where it's spoken, and who they can speak to with it. I don't speak much, but I picked up a little Slovak while in that country. When I went to Czechia, all of a sudden I was speaking Czech. The words coming out of my mouth hadn't changed, only the country I was saying them in had. Sure, there are differences in standardization, but they are clearly two standardized dialects of the same language. Your arguments for separating them (neither a dialect of the other, and evolving independently) could equally be made for separating Bostonian and Brooklynese. As for people's conception, that's a social distinction, not a linguistic one (at least not in the narrow sense of the word). Separate standards would argue for Canadian and Usonian being listed separately. As for the statement "both Czechs and Slovaks will agree that their languages are separate", it's false. True, some people do hold that opinion. I'm sure you could dig up people who'd say the same about US and UK English. But I've also met Slovaks and Czechs who state flatly that they're dialects (usually, of course, that the other is "just a dialect" of their language!) This certainly deserves mention, as any difference in national or regional standardization does, but as a note to a single Czechoslovak language entry. kwami 2005 July 8 06:30 (UTC)
I have lived in the Czech Republic for 8 years, and I am a fluent speaker of Czech, yet I would never claim to be able to speak Slovak. The ability to understand a language by a speaker of another language does not make one a dialect of the other. True, the languages are mutually intelligible to the extent that on Czech TV news reports from Slovakia are in Slovak, and any speaker of Czech, including myself, can understand them, but this does not imply they're the same. Especially in the case of Slavic languages which are very similar in both vocabulary and grammar, differentiating languages is often based on subtle differences. Such differences, however, are significantly strengthened when the speakers of one language claim a separate cultural identity from speakers of the other, as is the case with Czech and Slovak. As an example, consider Eastern Armenian and Western Armenian. I am a native speaker of Eastern Armenian, yet I have incredible difficulty understanding a person speaking in Western Armenian, more so than I do in understanding Slovak. Yet, E. & W. Armenian are considered to be dialects, not separate languages, perhaps in part due to the fact that the groups of speakers both claim the same cultural identity. Of course, there's a flip side to this argument, as can be illustrated by the Moldavian language, but in the case of Czech and Slovak, it's quite clear that they are generally recognised as separate languages. Comparing the differences between them to differences between UK and US English is especially out of place. --Aramգուտանգ 8 July 2005 09:57 (UTC)
You've just demonstrated, very nicely, that Czech and Slovak are dialects of the same language. The description you've given is almost the definition of what dialects are: Not the same, but readily intelligible. If common understanding differs from objective reality, then that warrants a footnote, but not division of the language. Cultural identity does not define a language, mutual intelligibility does. As for Armenian, many people say that Armenian is two separate languages, for exactly the reasons you give. Ethnologue does not reflect that, but again, Ethnologue is rather sloppy. kwami

This is an argument that's come up repeatedly, and will continue to come up. We have two lists of languages by native speakers. Should we make one a list of languages by cultural identification, separating Czechoslovak and unifying Chinese, and the other a list of languages by the criterion of mutual intelligibility? That should make everyone happy. kwami 2005 July 9 17:51 (UTC)

Kwami, intelligibility just can't be the definition of a language. As our Romance languages article notes, standard Italian and Castilian Spanish are mutually intelligible. But Sicilian and Piemontese are not...yet Italian and Spanish are universally considered "separate languages," while Sicilian and Piemontese are almost as universally considered "dialects of Italian," (although this is perhaps inaccurate of the north-Italian dialects, which are more closely related to the Langue d'Oc than to standard Italian, supposedly). I think that the distinction has more to do with standardization and institutionalization. "A language is a dialect with an army, a navy, and a police force," as my advisor likes to quote some linguist he talked to as saying. One might add that a language has television broadcasts, a literary tradition, education taught in schools, and so forth. john k 9 July 2005 19:08 (UTC)
You'll notice that several Italian "dialects" are listed as separate languages. As for Italian and Spanish being mutually intelligible, they are, to a degree. But which degree of intelligibility warrants classifying two lects as dialects of the same language? Turkish and Azeri are listed separately, and have a large degree of intelligibility. All I ask is that we use a reasonably consistent criterion.
True, intelligibility isn't the only definition of what constitutes a language, but it is one of them. But by the criterion of self identification, Zhuang is a Chinese dialect, even though it's a member of the Kadai language family. If we separate Czech and Slovak because some native speakers (hardly all) consider them to be separate languages, should we also include Zhuang in Chinese, because almost all of its speakers consider it to be a dialect of Chinese? Should we then say that Chinese belongs to both the Sino-Tibetan and Kadai language families?
By your definition, most African and American languages are no longer languages, because they aren't taught in schools and aren't used for journalism or TV broadcasts. This ties into the 19th century idea that civilized people speak "languages", and savages speak "dialects". kwami 20:02, 2005 July 9 (UTC)

This is a whole new low with regard to sociolinguistics on Wikipedia - that Czech and Slovak are not merely grouped together, but wholly equated. While trivially useful for some readers to have some sort of an overview of which foreign languages can be grouped together, and (I presume) a fun exercise for linguists, it's also out of touch with reality because it blatantly ignores the behaviour and thoughts of the people speaking those dialects. You simply cannot claim the high ground "genetically they're the same, so there!" and expect for people to just accept it. --Joy [shallot] 9 July 2005 19:15 (UTC)

Then disambiguate them as you did for Serbocroatian. You can always improve an article by editing it yourself! (Actually, they're not wholly equated. They're linked to two separate articles.)
Since we're ranking languages by number of speakers, dividing them up into their constituent dialects makes them look less important than they are. kwami

We have two contrary tendencies here: distinguishing languages genealogically, and distinguishing them culturally. This is going to continue to create conflicts, until we decide on one or the other - or create two lists. When I first saw this article, Chinese was listed as a single language, but there were half a dozen Italian "dialects" listed as separate languages. That's just silly: Italian has about the diversity of Cantonese. We should go one way or the other. I've tried to make the list somewhat more consistent, but of course haven't been able to do everything. If you don't like the direction I've gone, fine: Do something better. But let's at least make it internally consistent. kwami 20:02, 2005 July 9 (UTC)

A couple of points: 1) the Italian dialects are, according to Ethnologue, not even all that closely related to each other. Calling Sardinian an Italian dialect would appear to be technically inaccurate. And, according to Ethnologue, Piemontese, Lombardese, and so forth are closer to French and the Langue d'Oc than they are to Italian. As far as general standards, I'd suggest that the presumption should be that languages listed as separate languages by Ethnologue should be treated as separate languages. However, if languages listed by Ethnologue as separate languages are often considered to be the same language, especially for political reasons, we should unify them. Thus, Eastern and Western Farsi, or Gheg and Tosk Albanian, should get unified despite being separate languages on Ethnologue. Probably this goes for Arabic as well, if only because differentiating the dialects is so difficult. This should perhaps also be done for some of the Hindi dialects like Awadhi or Haryanvi (but probably not Punjabi or the Bihar dialects). I'm not sure what should be done about Hindi and Urdu, but they should probably be separated out again, as well. Czech and Slovak should definitely be separated, because the languages are listed as separate on Ethnologue, and are not normally considered to be actually the same language. john k 20:16, 9 July 2005 (UTC)

Sardinian is pretty universally considered a separate language, so it's not really relevant to the discussion of Italian dialects. But you're trying to have it both ways. Why should we separate Italian dialects to be "technically accurate", if we unify other lects that are generally considered to be the same language, which Italian is? I believe Panjabi is now unified, despite the fact that Ethnologue classifies W. Panjabi as closer to Sindhi, and E. Panjabi as closer to Hindi, than they are to each other. This was because of vociferous opposition to dividing it up. If we're going to include lects within Hindi that are more divergent than Urdu is, but separate out Urdu, then we're going on sociological criteria alone. If that's the case, we shouldn't follow Ethnologue. Shouldn't we also list "Bosnian" as a separate language and unify all of Chinese?
I think we need to cover our bases. Either (1) have two lists; (2) follow self identification, but note that (a) Bosnian is essentially the same language as Serbian and Croatian and Czech and Slovak are essentially the same language, and (b) there are significant differences among Chinese, Arabic, Italian, German, Igbo, and Armenian "dialects"; or (3) follow mutual intelligibility, but note that Serbian/Bosnian/Croatian, Czech/Slovak, Hindi/Urdu, and Mandarin/Cantonese are separate standards and often considered separate languages. I just don't think we should follow one criterion for some languages and a different one for others. kwami
I see you've gone ahead and done it. When I get the chance, I'll unify Chinese, Hindi (all Indic dialect continuum lects not given official status in Indian constitution --> Hindi), and Italian, and separate Serbocroatian and Malay. (Should Malay be three languages, Malaysian, Indonesian, and Malay, the latter for Brunei and Singapore?)
Also unified Fulani, Quechua, etc. Sorry the edit was anonymous, Wikipedia signed me out in the middle of it! kwami 01:00, 2005 July 10 (UTC)
A couple of points: 1) For Punjabi, other sources (e.g. Britannica) give a completely different (and really more sensible) division of the Indo-Aryan languages, which only has one group for Punjabi. 2) In terms of the other dialects, I'm not sure there's reason to remove them. I think it would make sense for Hindi, Italian, German, and so forth to give the number of total speakers of the broader language, but to also list the major subsidiary languages separately as well. We should just note when we are doing this. We could do the same thing for Chinese, I think, although Mandarin is such a distinct dialect/language that we should perhaps note both the total number of Chinese speakers and the total number of Mandarin speakers. I certainly don't think we should remove all the entries for the southern Chinese languages/dialects. In terms of separating languages, I'm not sure either Serbo-Croatian or Malay should be separated out. For Serbo-Croatian, maybe, since there are different writing systems involved (which alphabet do Bosnian Muslims use, by the way?) And what's the deal with "Malaysian" as a language? I've never heard this claim. I've always heard the language of Malaysia called "Malay." At any rate, there's no reason to be dogmatic about this. I think the best thing to do is be flexible. As I said before, use Ethnologue as a baseline for when separate languages exist (e.g. don't separate out Moldovan), but unify in instances where the Ethnologue divisions are contrary to generally accepted usage, like East and West Farsi. I think we should be especially careful about removing information - does any good really come from removing the different Chinese dialects? john k 02:35, 10 July 2005 (UTC)
Why not try for some semblance of consistency? If Czech and Slovak are separate languages, fine: Your criterion is speaker identification. Then all of Chinese is a single language. If you want to split up Chinese, fine: Your criterion is mutual intelligibility. Then Czechoslovak is a single language. As for Malay, no, in Malaysia it's called Bahasa Malaysia (Malaysian language). The language is called Malay (Bahasa Melayu) in Brunei. Both Malaysians and Indonesians tend to get rather indignant when told that they're the same language. Most people I've met insist that they're separate, despite the fact that I'm speaking "Indonesian" on one side of the border, "Malaysian" on the other side, and the words out of my mouth haven't changed (except perhaps for a couple, like 'shan't' vs. 'won't' in English). Arguing for half of one and half of the other is just being wishy-washy.
If you want this article to be taken seriously, then apply the same criterion to all languages in a rational manner. If you want to split Czechoslovak and unify Arabic, then split Malay and Serbocroatian, and unify Chinese. What does the writing system have to do with anything, except people's conception of their language, which you say should form the basis of classification? You say you don't want to remove information, but you don't hesitate to remove the fact that Czechoslovak and Hindustani, as spoken, are essentially single languages. That's also information. I'll let the revert go for a while, but eventually we need to decide what our working definition of "language" is. I tried mutual intelligibility, and was reverted because that's 'not what language is'. I've now tried speaker identification, and have been reverted because that's 'not what language is'. Maybe there's a third way someone wants to work out (having both in one list, like you suggested above, so languages are counted double; or having two separate lists [which is what we had prior to this revert]); I'm not going to bother myself. However, I do expect some sort of a half-way intelligent standard. The current mish-mash is not acceptable in an encyclopedia. kwami 04:35, 2005 July 10 (UTC)
Yes, consistency is exactly what my initial request entailed. If Czech/Slovak are together, then so should Russian/Ukrainian, which are mutually intelligible to a similar extent. Yet grouping the latter pair would not be acceptable to most people, thus showing that intelligibility is not a good criterion. If words thou placest archaic in manner queer within sentence, stayest the union not understood? Perhaps the ability of the speaker of a language to speak the other is a better criterion. Cultural identity is yet another, however with more obvious pitfalls. You must also keep in mind what kind of information people are looking for when they go to a page called "list of languages by total speakers". It seems that the highly ambiguous definition of "what is generally accepted to be a separate language" is the most preferable one, which is a kind of amalgam of the above. No matter what the criterion is, however, consistency is crucial. --Aramգուտանգ 08:05, 10 July 2005 (UTC)
Yes, Aram, exactly. Intelligibility alone is clearly not a good criterion. By that standard, as noted before, Italian and Spanish would have to be combined. Also all the Scandinavian languages. All the East Slavic languages. And so on. Cultural identity alone, though, is just as bad. The southern Chinese dialects are clearly defined and quite distinct from Mandarin. We need to use a combination of all these things and use the basic idea of what is generally considered a language. I do think that in some instances counting double would be appropriate - we list "Chinese" for all the Chinese languages, then list the various dialects separately; we list "Hindustani" for Hindi, Urdu, and all the closely related dialects (Awadhi, Haryanvi, the Bihari dialects, and so forth), and then list them all separately as well. Have "Malay" give the total for Malay+Indonesian, but then give Indonesian a separate entry. That kind of thing... john k 16:11, 10 July 2005 (UTC)
I agree about Slavic, so either cultural identification (what I had done) or John's suggestion of doubling up would work. As for the Chinese "dialects" (which most Chinese insist are dialects and not languages, despite what Europeans feel they "should" think), there's always the second article based (partially) on mutual intelligibility, which I referred to in the intro. Is someone going to follow up with John's suggestion? Because my last edit was at least consistent, and preferable to the current situation. kwami 21:54, 2005 July 10 (UTC)

Confusion over name of the page

So are we ranking by Native speakers or Total speakers? If it's by total, why does it say otherwise in the first sentence of the page? If it is by Native, then what is this page: List of the most spoken native languages ? Please clarify.--Zereshk 8 July 2005 22:32 (UTC)

Yeah, it's stupid, they're both of native speakers. We need to work this out, I think. john k 9 July 2005 00:11 (UTC)

If it is by total, then we should start re-ranking the page accordingly. Right now, it's ranked by natives. I'll be working on that.--Zereshk 9 July 2005 12:02 (UTC)

I'm the one who started adding second language speakers to this page. Before that, it was entirely native speaking populations. It was the "total number of native speakers", by which the author meant all native speakers of a language, in all countries. The other page is simply a list lifted from the 13th edition of Ethnologue, and might actually be a copyright violation.
If you want to order this list by total (1st + 2nd language) speakers, there are two major problems:
  • We do not have 2nd language data for most languages,
  • Estimates of 2nd language numbers are even less reliable than those for native speakers, so if we rank by the 1st+2nd total, we will get into lots of edit wars by people claiming that language X is exaggerated, and Y really has more speakers, based on varying definitions of what a second-language speaker is.
I think it's safer to order the list by number of native speakers.
I have another suggestion, in the Czechoslovak discussion above: Since our main disagreements center around what's a language and what's a dialect, why not use one list for languages defined by linguistic criteria (intelligibility), and one defined by cultural identification? That is, in one Czechoslovak would be unified and Chinese broken up, and in the other Czechoslovak would be broken up and Chinese unified? kwami 2005 July 9 18:05 (UTC)

I'm fine one way or the other. But I don't think getting estimates of 2nd language speakers would be too difficult. The list can be according to any predefined definition. In any case, all we must do is define what we want to be listed here, and stick to it.--Zereshk 9 July 2005 18:39 (UTC)

I tend to think Chinese and Czech/Slovak should both be broken up. I'd suggest that only in cases like Arabic, where it's not only a complicated question of whether it's one or several languages, but also hard to figure out how exactly to divide it up into separate languages, that we should keep them together. I'd also suggest just redirecting the other page to here, and explaining that this page is listing native speakers. In terms of 2nd language speakers, one problem is how to define "2nd language" - is it anybody who has any knowledge of the language at all? Or is it more specific than that? Is a Dane who speaks some English because he studied it in school the same as a Yoruba who speaks English as a second language and has to use it in everyday communication? Better to avoid the whole question, I think. john k 9 July 2005 19:01 (UTC)

Someone went ahead and reordered the list according to 1st+2nd language speakers, but since we don't (yet) have the data for 2nd language speakers for most languages, I reverted. However, 2nd language data is of interest when you're considering how useful a language is.
If you want to separate Czech and Slovak because of speaker identification, why would we even want to break up Arabic? Or Chinese? Or Italian? They are single languages by cultural definition. kwami 20:09, 2005 July 9 (UTC)
  • I've talked about this earlier; please don't confuse a second language with a language being learned. These are two different subjects. Finding second-language users is not hard! Finding users who are still learning a language is difficult. -Pedro 01:07, 10 July 2005 (UTC)

Language families

I've added a column for language families - so far, I've mostly only had a chance to add in the broadest families - Indo-European, Austronesian, &c., but hopefully we can add in the more specific branches over the next few days. I think this should be useful, especially for the less well known languages which we don't have specific articles about. As far as I can tell, 18 language families (Indo-European, Uralic, Altaic, Afro-Asiatic, Niger-Congo, Nilo-Saharan, Sino-Tibetan, Dravidian, Tai-Kadai, Hmong-Mien, Austro-Asiatic, Austronesian, Japonic, Quechuan, Aymaran, Uto-Aztecan, Mayan, and Tupian) and 1 language isolate (Korean) are represented among the languages with more than one million speakers. john k 9 July 2005 01:30 (UTC)

And Tupian. kwami 2005 July 9 18:21 (UTC)
Yup, forgot that. john k 9 July 2005 18:55 (UTC)

The language called "Persian" is known internationally and domestically within Iran as "Farsi."

Yup, and in English it's called "Persian". Lots of languages (Spanish, Chinese, French, Japanese) have English or Anglicized names. Even when the native name is sometimes used in English (such as "Ivrit" for "Hebrew", "Bahasa" for "Indonesian", or "Italiano", "kiSwahili", "Russki", etc.), this should be noted in the dedicated articles, but isn't necessary here.
Actually, "Farsi" isn't a good name internationally, because many speakers outside of Iran refer to their language as Dari or Tajiki. kwami 21:22, 2005 July 12 (UTC)

More accurate sources

The CIA, in the World Factbook, has a list of the most used native languages:

  • Chinese, Mandarin 13.69%,
  • Spanish 5.05%,
  • English 4.84%,
  • Hindi 2.82%,
  • Portuguese 2.77%,
  • Bengali 2.68%,
  • Russian 2.27%,
  • Japanese 1.99%,
  • German, Standard 1.49%,
  • Chinese, Wu 1.21% (2004 est.)

note: percents are for "first language" speakers only. See reference

Internet World Stats also has a list of the people able to speak each language, including second languages (it is ordered by internet users, though):

  • English 1,109,729,839
  • Chinese 1,316,007,412
  • Japanese 128,137,485
  • Spanish 389,587,559
  • German 96,141,368
  • French 375,066,442
  • Korean 75,189,128
  • Italian 58,608,565
  • Portuguese 227,621,437
  • Dutch 24,218,157

See the stats.

The English numbers seem a bit high compared with other reports, but they claim to have accurate data. :? --Bisho 15:33, 14 July 2005 (UTC)
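Since the Internet World Stats list quoted above is ordered by internet users rather than by speakers, here is a minimal sketch that re-sorts the same ten figures by speaker count. The figures are copied from the list as posted; the names `speakers` and `rank_by_speakers` are made up for illustration, and nothing here checks whether the source numbers are reliable.

```python
# Figures exactly as quoted from Internet World Stats in the post above.
speakers = {
    "English": 1_109_729_839,
    "Chinese": 1_316_007_412,
    "Japanese": 128_137_485,
    "Spanish": 389_587_559,
    "German": 96_141_368,
    "French": 375_066_442,
    "Korean": 75_189_128,
    "Italian": 58_608_565,
    "Portuguese": 227_621_437,
    "Dutch": 24_218_157,
}


def rank_by_speakers(table):
    """Return (language, count) pairs sorted from most to fewest speakers."""
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


ranked = rank_by_speakers(speakers)
for position, (language, count) in enumerate(ranked, start=1):
    print(f"{position:2d}. {language:<11}{count:>15,}")
```

Sorted this way, Chinese moves ahead of English, and French and Portuguese move ahead of Japanese and German.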

Language defined by cultural identification/self designation

Okay, despite all the talk, no one's fixed this article. I'm reverting to the last version to define language by speaker identification (keeping Akira's edits), since most people feel that intelligibility tests are unworkable. I'm the one who added the Chinese "dialects" in the first place, and I don't have any problem removing my own additions to this article.

By all means, please add the individual Chinese "dialects" back in if you like, but keep the main heading and make a note under it. I might do that myself. No need to go to a lot of work; the info is all in the page history from when I added it the first time. And put Malay, Czechoslovak, and Serbocroatian back in if you like, as additional info - it's all there in the page history.

If our conception of language is to be cultural or self identification, then we shouldn't mix in intelligibility tests, unless it's added as additional information, and cross referenced. We need some sort of consistency in an encyclopedia article, not just whatever feels right for everyone's favorite language. kwami 22:40, 2005 July 14 (UTC)

I say group Chinese and Arabic (except Maltese and other separated Arabic languages), adding a linguistics note in these cases. Group Swiss German with the rest of German. Split Malay/Indonesian, Hindi/Urdu, etc. -Pedro 23:37, 14 July 2005 (UTC)

That's pretty much what we have now with the last revert. (Maltese isn't populous enough to include anyway.) kwami 00:45, 2005 July 15 (UTC)


Turkish language

The Turkish language has many more total speakers than is written on this page. Azeri, Kyrgyz, Kazakh, Uzbek, Turkmen and other Turkic languages have only dialectal differences from Turkish. And they are called Turkish, as in Azeri Turkish or Kyrgyz Turkish, instead of Azeri or Kyrgyz. So, the total number of Turkish speakers is 165.61 million according to this page, and in fact it is almost 250 million with the Turks living as minorities all around the world and living in autonomous Turk regions, especially in the Russian Federation and China.

We are considering including both "language" by the criterion of mutual intelligibility and, by double listing, "language" by social/national convention. Turkish is certainly one of the languages we need to consider.
If you can provide evidence that in general Turks consider their 'dialects' to be a single language, as the Arabs and Chinese do, then we should make the change you suggest even without the double listing. However, this needs to be the general conception of Turkic language speakers, and not just a political platform of pan-Turkish nationalists.
Meanwhile, you've changed the numbers of just Osmanli. Do you have documentation? Everybody and their brother wants to maximize the numbers for their favorite language, so we're automatically reverting such edits unless they're supported. (Read above for the interminable discussion for Persian.) It appears at first glance that you're counting Turkish Kurds as native Turkish speakers.
kwami 23:57, 2005 July 16 (UTC)

Korean

Someone just revised the Korean population upward to 71M, but left no ref. However, this looks about right: S Korea 48.4, N Korea 18-20 (officially 23, not considering the famine), China 1.9 (probably not counting recent refugees), USA 1.8, Japan 0.7, Canada and Australia together 0.1. (Few Russian Koreans still speak the language.) This gives us 70.9 million using the lower estimate for North Korea. Perhaps a million or so more wouldn't be unreasonable, but I don't know how the Wikipedia article gets 78. kwami 06:44, 2005 July 19 (UTC)
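The Korean total above can be checked with a quick sum. This is only a sketch of the arithmetic in the post: the per-country figures (in millions) are the ones kwami gives, North Korea is taken at both ends of the 18-20 range, and the variable names are made up for illustration.

```python
# Native Korean speakers in millions, as listed in the post above.
korean_speakers_millions = {
    "South Korea": 48.4,
    "China": 1.9,
    "USA": 1.8,
    "Japan": 0.7,
    "Canada and Australia": 0.1,
}

# The post gives 18-20 million for North Korea (official figure: 23).
north_korea_low, north_korea_high = 18.0, 20.0

outside_north = sum(korean_speakers_millions.values())
low_total = round(outside_north + north_korea_low, 1)
high_total = round(outside_north + north_korea_high, 1)

print(f"Low estimate:  {low_total} million")
print(f"High estimate: {high_total} million")
```

The low estimate reproduces the 70.9 million in the post; taking 20 million for North Korea adds two million more, which still falls well short of 78.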

Thai Language

Why are there only 20M Thai native speakers while the population is 67M right now? There are several dialects in Thailand, but everyone can use standard Thai, including old people and adolescents.

The numbers should probably be updated. But the 20M refers to native speakers, not everyone who can speak the language. (Take a look at Vietnamese, Burmese, Tagalog.) Many Thai speak something closer to Lao, or are otherwise considered to speak distinct languages, and have been counted appropriately; "Siamese" speakers are less than half the Thai population. (However, 80% of ethnic Chinese are counted as Thai speakers.) There are also quite a few non-Thai languages in Thailand: 3M Malays, a million Khmer, a million Hakka Chinese, etc.
However, the distinctions are Ethnologue distinctions based on intelligibility, which is not the standard we are following for this article. Perhaps all Thai, or at least all non-Lao Thai, should be counted? Do "Northeastern Thai" speakers consider their language to be Standard Thai, or Lao? kwami 01:58, 2005 July 21 (UTC)
I don't think there's any linguistic justification for regarding Thai and Isan as the same language, whatever the criterion: Thai and Lao are always treated as separate languages, and Isan is much closer to the latter. Mark1 06:45, 1 August 2005 (UTC)
I've gone ahead and added 15M Isan to Lao. Isan had not been counted at all! Split it up if you like; Lao proper has 3.2M native + 0.8M 2nd speakers = 4M total (1991 UBS) if you do. kwami 11:07, 2005 August 2 (UTC)
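As a rough sketch of what the combined entry amounts to, assuming the 15M Isan figure is simply added to the native-speaker count for Lao proper (all figures in millions, taken from the post above; the variable names are made up for illustration):

```python
# Figures in millions, as given in the post above (Lao proper: 1991 UBS).
lao_proper_native = 3.2
lao_proper_second = 0.8
isan_native = 15.0

# Lao proper on its own: native plus second-language speakers.
lao_proper_total = round(lao_proper_native + lao_proper_second, 1)

# Combined entry, assuming Isan is counted as native Lao.
combined_native = round(lao_proper_native + isan_native, 1)
combined_total = round(combined_native + lao_proper_second, 1)

print(f"Lao proper total:    {lao_proper_total} million")
print(f"Lao + Isan (native): {combined_native} million")
print(f"Lao + Isan (total):  {combined_total} million")
```

Splitting the entry back up would just mean listing the 15.0 and the 3.2 on separate rows.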

Bosnian and Serbian separate?

It is absurd to separate Bosnian and Serbian. Both the written and spoken languages are, as far as I am aware, virtually identical, and about half of the population of Bosnia are Serbs, who would be surprised to learn that they do not speak Serbian. I'd suggest that, given this confusion, we should merge Serbo-Croatian back together into a single language. john k 05:42, 2 August 2005 (UTC)

I thought we weren't going by mutual intelligibility tests? They're separate national languages. Presumably the Serbs in Bosnia would say that they speak Serbian, not Bosnian. In fact, I believe that's how they were counted. It's also absurd to separate Malaysian and Indonesian, Belorusan and Russian, Turkish and Azeri, etc etc, but these divisions are accepted. Who among us is going to bestow the status of "language" on a particular lect? kwami 08:46, 2005 August 2 (UTC)

Dividing up the table

I just split up the table both for ease of navigation and for ease of editing. As for the numbers I picked, it's a logarithmic scale: languages with 10^6 (1 million) speakers, 10^6.5 (~3 million) speakers, 10^7 (10 million) speakers, 10^7.5 (~30 million) speakers, 10^8 (100 million) speakers or more. That way there are similar numbers of entries in each table, though the first is rather shorter and the last somewhat longer than the others. Anyway, that's why the 3 and 30 are there, in case anyone thinks they're odd numbers to use. kwami 08:55, 2005 August 2 (UTC)
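To illustrate the half-decade scale described above, here is a small sketch. It assumes the tables use the rounded cutoffs of 1, 3, 10, 30 and 100 million mentioned in the post; the exact half powers of ten are computed alongside for comparison. The names `CUTOFFS_MILLIONS` and `table_for` are made up for illustration.

```python
# Exact half-decade powers of ten, converted to millions of speakers.
exponents = [6, 6.5, 7, 7.5, 8]
exact_millions = [10 ** e / 1e6 for e in exponents]

# Rounded cutoffs as described in the post (assumed to be what the tables use).
CUTOFFS_MILLIONS = [1, 3, 10, 30, 100]


def table_for(speakers):
    """Return the lower cutoff (in millions) of the table a language falls in.

    Returns None for languages with fewer than one million speakers.
    """
    millions = speakers / 1e6
    chosen = None
    for cutoff in CUTOFFS_MILLIONS:
        if millions >= cutoff:
            chosen = cutoff
    return chosen


for e, exact, rounded in zip(exponents, exact_millions, CUTOFFS_MILLIONS):
    print(f"10^{e}: about {exact:.2f} million (table cutoff {rounded} million)")
```

For example, a language with 70.9 million speakers lands in the 30 million table, and one with 20 million lands in the 10 million table.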

Bajar?

There is no "Bajar" language listed in Ethnologue for Malaysia or Indonesia. There are too many speakers for it to be Bajaw, and Banjar should be included in Malay. Any ideas? If not, we should probably delete this. kwami 10:41, 2005 August 2 (UTC)

Maithili

Why is Maithili split out, but the other Bihari languages (e.g. Bhojpuri) are included in Hindi? I would suggest that the Bihari and Rajasthani languages are perhaps distinct enough, and considered distinct enough, from Hindi to warrant not being included. This in contrast to, say, Awadhi, which is usually considered a dialect. john k 06:08, 4 August 2005 (UTC)

All Bihari dialects are counted by many as Hindi, but since 2003 Maithili has had official status. I agree this is bizarre, but no more bizarre than giving Urdu separate status. I think we need to ask: do Maithili speakers consider their language to be Hindi, or separate? Do speakers of the other Bihari dialects consider their languages to be Hindi, part of a Bihari language with Maithili, or separate? I don't know. Do you have evidence as to how this is perceived? If Maithili is perceived as distinct, but Bhojpuri is still perceived as a Hindi dialect, then the article is fine as it is. kwami 06:25, 2005 August 4 (UTC)
How about we put Maithili back in Hindi until we figure this out? It won't be the only language with more than one officially recognized standard; think of "Chinese" & Cantonese. Perhaps Maithili hasn't been used officially long enough for attitudes to have changed much in this area. kwami 06:33, 2005 August 4 (UTC)
BTW, I don't believe Rajasthani is considered a single language. Marwari may be, but splitting it out and leaving the rest in Hindi would be like splitting out Maithili. kwami 06:58, 2005 August 4 (UTC)

Chart request

Would someone skilled in m:EasyTimeline be willing to make a chart of these? – Quadell (talk) (sleuth) 13:43, August 5, 2005 (UTC)