Wikipedia talk:Statistics

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Enchanter (talk | contribs) at 17:19, 28 August 2002 (Suggestions for measuring numbers of articles.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

OK, somebody has to say this: The fact that we are patting ourselves on the back for intentionally undercounting our articles is just plain silly. I just now went and looked under "short pages" at all 28 pages with exactly 100 bytes, and 13 of them contained a comma. Not a single one of them deserves to be called an article, but almost half are counted. Next I looked at all 33 pages with exactly 200 bytes, and 27 of those contained a comma. A few of them (not eighty percent!) might be considered articles under an extremely lenient definition of article, but does anyone outside of Wikipedia consider a single, brief paragraph to be an article? Are ANY of Britannica's articles under 500 bytes?

I estimate our median article size as 1000 bytes, because that's the size of our 18943rd longest page according to long pages. (18943 would be the median of 37886 total articles.) To my mind, a conservative count of articles would set a 1000-byte minimum, rather than a 1000-byte median, which would cut our total article count in half. But no matter how we count articles, let us at least prominently post the median size of the articles which are included in the count. And please, please don't call the count "unimpeachable". (For reference, my little tirade (including this sentence) is 1367 bytes long, i.e. rather longer than our median article.)

--Fritzlein 02:55 Aug 17, 2002 (PDT)
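The rank arithmetic behind that median estimate can be checked with a small sketch. This is only an illustration: the function names and the sample sizes are made up, and it assumes the "long pages" listing is simply every counted page sorted longest first.

```python
def middle_rank(total_pages):
    """1-based rank of the middle page in a longest-first listing."""
    return (total_pages + 1) // 2


def size_at_middle(sizes_longest_first):
    """Return (rank, size) of the page at the middle rank.

    For an even number of pages this picks the larger of the two
    middle sizes, because the list is sorted longest first.
    """
    rank = middle_rank(len(sizes_longest_first))
    return rank, sizes_longest_first[rank - 1]


# 37886 counted articles put the middle page at rank 18943.
print(middle_rank(37886))

# Hypothetical sizes in bytes, longest first (not real page data).
sample = [5000, 3000, 1000, 400, 100, 50]
print(size_at_middle(sample))
```

If the page at that rank is 1000 bytes long, then roughly half of the counted pages are at or below 1000 bytes, which is why a 1000-byte minimum would cut the count about in half.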

I agree.

There really needs to be a more conservative total article count that makes a distinction between encyclopedia articles and almanac articles and that excludes certain additional pages. What particularly troubles me is that there are now thousands of year almanac pages and that most of them can't even be considered almanac articles because they are just templates.

I therefore propose the following (in addition to the current criteria): 1) any page linked to centuries should be excluded from the total article count and should be given its own line in special:statistics, at least until most of these pages become almanac articles (the vast majority are either templates or templates with one or two entries); 2) any page with a link to Wikipedia:Disambiguation should be excluded from the count; 3) any page that is less than 500 bytes should be excluded from the count (E. coli is 610 bytes); and 4) there should be three "total article counts" for everything not excluded by the above: one for anything with the strings list, chart, timeline or table in its title (these would be "almanac-like" articles), one for everything left over (these would be "encyclopedia articles") and one grand total count that would still be the number displayed on the Main Page.

Our current count is exaggerating the true number of articles we have and is harming the project as a result. We need to be honest with our article counts and very conservative -- otherwise we will lose credit with passers-by who are at first impressed by our article count but then find out that it is bloated. --mav 13:36 Aug 28, 2002 (PDT)
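To see whether the four proposed criteria are easy to compute, here is a rough sketch of how they might be applied. It is not the Wikipedia software's actual counting code. The `Page` record, the category names, and the rule that "linked to centuries" means "has a link whose title ends in century" are all assumptions made for illustration, and the existing criteria are left out.

```python
from collections import Counter
from dataclasses import dataclass, field

ALMANAC_WORDS = ("list", "chart", "timeline", "table")
MIN_BYTES = 500  # proposal 3: shorter pages are not counted


@dataclass
class Page:
    # Hypothetical page record, not a real database row.
    title: str
    text: str
    links: set = field(default_factory=set)


def classify(page):
    """Assign a page to one bucket, checking the exclusions first."""
    # Proposal 1 (assumed reading): the page links to a century page.
    if any(link.lower().endswith("century") for link in page.links):
        return "century_linked"
    # Proposal 2: the page links to Wikipedia:Disambiguation.
    if "Wikipedia:Disambiguation" in page.links:
        return "disambiguation"
    # Proposal 3: the page is under the byte minimum.
    if len(page.text.encode("utf-8")) < MIN_BYTES:
        return "short"
    # Proposal 4: split what is left by words in the title.
    if any(word in page.title.lower() for word in ALMANAC_WORDS):
        return "almanac"
    return "encyclopedia"


def tally(pages):
    """Count pages per bucket and add the grand total of counted articles."""
    counts = Counter(classify(p) for p in pages)
    counts["grand_total"] = counts["almanac"] + counts["encyclopedia"]
    return counts
```

One caveat with a plain substring test on titles: "table" also matches "Vegetable" and "list" matches "Socialist", so a real implementation would probably want whole-word matching.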

I think that sounds pretty reasonable. --Brion 13:47 Aug 28, 2002 (PDT)

As long as the criteria you select are easily computable, I'm happy to make whatever change in the software is necessary to reflect a better count. I also don't think anyone is making any claims about the accuracy of the count--the statistics page itself is careful to point out that these are just estimates. But I agree, a more conservative estimate is entirely warranted. --LDC

Great! While you are at it, a link to Wikipedia:What is an article under the first occurrence of that word on the special stats page would be nice. --mav
I never thought I'd say this, but I'd like something to be more conservative. :-) (Just the article count, not any of Bush's cabinet). --KQ
From a random sampling of pages, I would say that something like a third of our pages would truly count as useful articles in the eyes of a new user (that agrees with earlier observations from Kajakit on the mailing list). That would mean we have something like 10,000-15,000 'useful articles' in the database. We could proxy that by counting, say, articles over 2000 characters long. At the time of writing, that would give 12,471 'useful articles' compared to a headline number on the main page of 39,654.
The 2000 character threshold has the advantage of being long enough to cover most of the non-articles according to the criteria suggested by mav (century pages, disambiguation pages etc) automatically.
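As a quick check on the figures above, and a sketch of how running totals at several size thresholds might be kept, something like the following would do. The function names and sample sizes are invented for illustration; only the two counts 12,471 and 39,654 come from this discussion.

```python
def count_over(sizes, threshold):
    """Number of pages strictly longer than the threshold."""
    return sum(1 for size in sizes if size > threshold)


def running_totals(sizes, thresholds):
    """Map each threshold to the number of pages longer than it."""
    return {t: count_over(sizes, t) for t in thresholds}


def share(part, total):
    """Fraction of the headline count that passes a threshold."""
    return part / total


# Figures quoted in the comment: 12,471 pages over 2000 characters
# out of a headline count of 39,654.
print(round(share(12471, 39654), 3))

# Hypothetical page sizes in characters (not real data).
sample_sizes = [150, 480, 900, 2100, 2500, 7000]
print(running_totals(sample_sizes, [500, 1000, 2000]))
```

The quoted figures work out to a little under a third of the headline count, consistent with the random-sample estimate.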
I would not like to see the headline count on the main page reduced - I think that would be confusing for new users and perhaps a bit demotivating for the rest of us. We could consider changing the main page wording to something like:
... We started in January 2001 and are already working on 6,998,054 articles, with more being added and improved all the time. We want to make over 100,000 complete articles, so let's get to work! Anyone, including you, can edit any article ....
That would let us keep the headline count without creating the suggestion that they are all finished, polished articles. We could keep a running total of '2000 character articles', and perhaps other sizes too, on the statistics pages for those that are interested.
Enchanter 17:19 Aug 28, 2002 (PDT)