Wikipedia:Categorization

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Fennec (talk | contribs) at 04:10, 2 June 2004 (tweak Comics). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Overview

This page will serve as a central operating point for the different categorization projects.

All existing categories are listed here.

See also: m:MediaWiki User's Guide: Using Categories, Wikipedia:Category, Wikipedia:Category schemes, Lists of articles by category and Wikipedia:Categories for deletion

Formatting

At the beginning or at the end of articles (with interlanguage links)?

Category tags should, in all cases, be the very first thing listed on the article. →Raul654 17:27, 30 May 2004 (UTC)
Strictly against this. Categories should be listed last in the article. Why? We had the same discussion with interlanguage links: it's just not pretty for a new user when he clicks edit to see a bunch of categories. Put them at the end, please. --denny vrandečić 20:50, May 30, 2004 (UTC)
They should be above the interlanguage links at the bottom in my opinion. Angela. 21:46, 30 May 2004 (UTC)
Agree. --denny vrandečić 21:52, May 30, 2004 (UTC)
Agree. James F. (talk) 21:56, 30 May 2004 (UTC)
Agree (at the end of articles). -- User:Docu
Agree, end is the place. Jamesday 01:25, 1 Jun 2004 (UTC)
Yep. Dori | Talk 05:28, Jun 1, 2004 (UTC)
Another vote for the bottom, either before or after the w:xx's. Hajor 18:40, 1 Jun 2004 (UTC)
Agree, just the position where I chose to put them myself before coming here. andy 20:18, 1 Jun 2004 (UTC)
Disagree. Category seems more natural at the top, IMHO. Quadell (talk) 17:53, Jun 1, 2004 (UTC)
First. Interlanguage links make more sense at the bottom because they're numerous and intimidating and not going to be edited (or understood) by newer users. Categories are the opposite: not numerous, not intimidating and all users should be able to easily understand and add categories, so they shouldn't be hidden at the bottom. RADICALBENDER 00:22, 2 Jun 2004 (UTC)
They should go at the bottom, after the inter-language links. IMHO, nothing, not a table or an image, should go before the first line(s) of text in the article, just to keep from intimidating a new user. Gentgeen 00:37, 2 Jun 2004 (UTC)

Hierarchicalization

Categories should be kept as hierarchical as possible. That is, the number of top-level categories should be kept to a minimum.

Most importantly, an article should only link to the most specific categories it is in. Thus, Paul McCartney should link not to People, nor to People of Britain, nor to Musicians, but to British musicians, or rather The Beatles. The reason for this is that The Beatles should be hierarchicalized as follows (arrows denote parentage; linked items are articles; the rest are categories):

Paul McCartney
      |                            -----> Musicians -----
      V                           /                      \
The Beatles -> British musicians -                        ---> People
                                  \                      /
                                   -> People of Britain -

Thus, it's known that British musicians are not only musicians and British people, but also People. Which is obvious. Thus, the number of actual category links on the article page can be kept to a minimum.
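
The "most specific categories only" rule can be illustrated with a small sketch. It assumes a hypothetical PARENTS table that mirrors the diagram above and treats The Beatles as a category for the purpose of the example; the function names are illustrative and are not part of MediaWiki.

```python
# Hypothetical parent links copied from the diagram above (an assumption,
# not real Wikipedia data): category -> its direct parent categories.
PARENTS = {
    "The Beatles": ["British musicians"],
    "British musicians": ["Musicians", "People of Britain"],
    "Musicians": ["People"],
    "People of Britain": ["People"],
}

def ancestors(category, parents):
    """Return every category reachable by following parent links upward."""
    found = set()
    stack = [category]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

def most_specific(tags, parents):
    """Drop any tag that is already implied as an ancestor of another tag."""
    implied = set()
    for tag in tags:
        implied |= ancestors(tag, parents)
    return [tag for tag in tags if tag not in implied]
```

Under these assumptions, tagging Paul McCartney with People, Musicians and The Beatles reduces to The Beatles alone, because the other two are reachable from it.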

Another example:

Harry Potter   Albus Dumbledore
   |                  |      |
   |                  |      |
   |   /--------------/      |
   |   |                     V
   |   |         Hogwarts teachers --->Harry Potter characters
   V   V                                ^   |       |
Gryffindors-----------------------------/   |       V
                                            V    Harry Potter
                           Fictional characters     |
                                                    V
                                              Fictional universes

Note that an article may have more than one parent. Thus, the structure is not a strict top-down hierarchy. (And thus absolutely not equivalent to the old subpage model.)

Disagree with "Most importantly, an article should only link to the most specific category it is in." Most people will place items in their simple (food, say) category as their first choice, and articles routinely belong in multiple categories. An article should be in as many categories as it belongs in. If you want a table of contents, create a specific table of contents system, which you can also do with category tags. Category tags themselves are not solely for making a table of contents. The description above is perhaps a fair start for a description of TOCCategories. It's not a good start for a description of how to use category tags. See Category extraction below, which this proposed hierarchy directly contradicts. Jamesday 01:25, 1 Jun 2004 (UTC)
Putting an article in "as many categories as it belongs in" would create tremendous and largely pointless clutter. It's much more useful to have Category:Fictional characters as a hierarchy than as one big, flat list, which is exponentially more likely to turn into kibble. Further, there's absolutely no contradiction with #Category extraction: categories can be extracted recursively. For instance, in the above example, extracting Fictional characters would extract Harry Potter characters, and thus Hogwarts teachers, and thus Albus Dumbledore. This allows for a very fine degree of control. There's frankly no reason to add data that can be autogenerated by the software.
True, tools don't exist to do this right now, but the process isn't that difficult. Take the following pseudocode (given that the add operation on sets does nothing when we add an item that's already in there, like doing a $hash{$item} = 1 in Perl), which simply performs a breadth-first search of the category graph starting at the given element, then adds all articles contained in that subset of the category space:
Queue.enqueue(category_to_extract) 
while (!Queue.empty)
  this_category = Queue.dequeue();
  CategorySet.add(this_category)
  foreach (subcategory in this_category)
    if (!CategorySet.in(subcategory) and !Queue.in(subcategory))
      Queue.enqueue(subcategory) # if we haven't seen it before
foreach (category in CategorySet)
  foreach (article in category)
    ArticleSet.add(article)
foreach (article in ArticleSet)
   Extract(article)
Once the new-format database dumps come out, I'll see how implementing this works out.
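For comparison, here is a minimal runnable sketch of the same breadth-first extraction in Python. The SUBCATEGORIES and ARTICLES tables are hypothetical toy data based on the Harry Potter example, and extract_articles is an illustrative name; nothing here reflects an existing MediaWiki tool.

```python
from collections import deque

# Hypothetical toy data (assumption): category -> direct subcategories,
# and category -> articles filed directly in it.
SUBCATEGORIES = {
    "Fictional characters": ["Harry Potter characters"],
    "Harry Potter characters": ["Hogwarts teachers", "Gryffindors"],
}
ARTICLES = {
    "Hogwarts teachers": ["Albus Dumbledore"],
    "Gryffindors": ["Harry Potter", "Albus Dumbledore"],
}

def extract_articles(root, subcategories, articles):
    """Breadth-first walk down from root; return all articles reached."""
    seen = {root}            # categories already queued or visited
    queue = deque([root])
    while queue:
        category = queue.popleft()
        for sub in subcategories.get(category, []):
            if sub not in seen:      # enqueue each category only once
                seen.add(sub)
                queue.append(sub)
    found = set()                    # a set, so duplicates collapse
    for category in seen:
        found.update(articles.get(category, []))
    return found
```

Using one set for both "visited" and "queued" replaces the separate CategorySet and Queue membership checks in the pseudocode, and it also keeps the walk from looping if the category graph contains a cycle.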
Furthermore, by simply doing repeated parent-lookups, we perform a similar search in the opposite direction---up toward root categories, not down toward articles---and extract something like the following:
Fictional universes
     |
     V
   Harry Potter--\
                 |           /---->Hogwarts employees--\
                 V           |                         |
   Harry Potter characters---+                         +-->Albus Dumbledore
     ^                       |                         |
     |                       \---->Gryffindors---------/
Fictional characters
Which could even be written thusly, by graph-distance from the article:
  • Albus Dumbledore
    • Hogwarts employees, Gryffindors
      • Harry Potter characters
        • Harry Potter, Fictional characters
          • Fictional universes
Pseudocode follows (to produce the above list, not the generalized DAG).
foreach (category of my_article)
  Queue.enqueue( [category,1] )
while (!Queue.empty)
  [this_category,depth] = Queue.dequeue();
  CategorySet.add( [this_category,depth] )
  if (this_category has no parents)
    this_category.setFlag(is_a_root)
  foreach (super_category containing this_category)
    if (!CategorySet.in(super_category) and !Queue.in(super_category))
      Queue.enqueue( [super_category,depth+1] )
      maxdepth = max(depth+1,maxdepth)
    else 
      [super_category,old_depth] = CategorySet.retrieve(super_category)
      if (old_depth > depth+1)
        CategorySet.add( [super_category,depth+1] )
foreach ([category,depth] in CategorySet)
  outputSet[depth].add(category)
foreach (depth in 1..maxdepth)
  output depth;
  foreach (category in outputSet[depth])
    output category;
The idea is to trace it back to one or more root categories. Root categories should be made with this in mind. There's no real reason, for instance, to make Mathematics and natural sciences simply to contain Natural sciences and Mathematics, when starting the extraction with the set { Mathematics, Natural sciences } would work exactly the same.
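The upward search can likewise be sketched in Python. This is only an illustration: the PARENTS table is hypothetical data transcribed from the diagram above, and ancestors_by_depth is an invented name. Because breadth-first search reaches each category first along a shortest path, the sketch does not need the depth-correction branch of the pseudocode.

```python
from collections import deque

# Hypothetical parent links taken from the diagram above (assumption):
# page or category -> the categories that directly contain it.
PARENTS = {
    "Albus Dumbledore": ["Hogwarts employees", "Gryffindors"],
    "Hogwarts employees": ["Harry Potter characters"],
    "Gryffindors": ["Harry Potter characters"],
    "Harry Potter characters": ["Harry Potter", "Fictional characters"],
    "Harry Potter": ["Fictional universes"],
}

def ancestors_by_depth(article, parents):
    """Group every ancestor category by graph distance from the article.

    Returns (levels, roots): levels maps distance -> category names,
    roots lists the reached categories that have no parents.
    """
    depth = {article: 0}
    queue = deque([article])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in depth:  # first visit is the shortest distance
                depth[parent] = depth[node] + 1
                queue.append(parent)
    levels = {}
    for name, d in depth.items():
        if d > 0:                    # skip the article itself
            levels.setdefault(d, []).append(name)
    roots = sorted(n for n in depth if n != article and not parents.get(n))
    return levels, roots
```

With the toy data, the levels match the indented list above, and the roots are Fictional characters and Fictional universes.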
Hope this helped explain some things. grendel|khan 14:42, 2004 Jun 1 (UTC)
Paul McCartney can be linked directly to British musicians since he did work independently of The Beatles. (He's also got other categories he can be in, but that's irrelevant to the discussion.) -- Cyrius| 03:06, 1 Jun 2004 (UTC)
The point is that the individual members of The Beatles are already qualified as British musicians through the Beatles category. -Sean Curtin 12:22, 1 Jun 2004 (UTC)
I don't think it is important that an article links only to the one most specific category. Frequently, there are several specific categories. It might even be ok if it links to a category and its parent category. But if the parent category ends up with a couple hundred articles, those also linked deeper should be weeded out. Unless, of course, someone wants to implement an automatic category TOC that splits by alphabet--first letter? first two letters? --ssd 03:36, 1 Jun 2004 (UTC)
I hope that eventually the categories will work similarly to Special:Allpages, allowing categories to be used for the larger of the current "Lists of articles by category". -- User:Docu

Naming conventions

Descriptive, with no abbreviations unless really "desirable"

Category names should be descriptive; prefer World War II equipment over WW2 equipment.

No "tree" structure indications

Don't imply a strict tree structure; use Monarchs, not People - Monarchs.

Plurals

Category names use plural nouns.

"List of ..."

In instances where List of Quuxen is a simple enumeration with no other information on it (unlike List of Twilight Zone episodes, for instance), it should be replaced with Category:Quuxen.

It seems odd that the ones that are only lists are the very ones that are NOT to be called "List...", whereas the Twilight Zone example (with headings and nested subheadings) is much more than a single list. Robin Patterson 00:26, 2 Jun 2004 (UTC)

The guidelines at Wikipedia:Naming conventions should be followed in naming categories, with the qualified exception of "Prefer singular nouns". If the category is equivalent to a "List of Foos" page, then use a plural noun.

Category extraction

An advantage of categorization is that it allows extraction of large portions of Wikipedia. For instance, if years and dates were as below (leftmost items are regular articles, the rest are categories), extracting, say, a timeline for the 21st century would be trivial.

2004 -> Years in the 21st century -> Years -
                                            \
                                             --> Time periods
                                            /
30 March -----> Days in March ----> Days ---
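
As a rough sketch of how such an extraction might work when each page only declares its own category tags, the tags can first be inverted into a category-to-members index and then walked recursively. The TAGS and CATEGORIES tables are hypothetical and copied from the diagram above; the function names are illustrative.

```python
# Hypothetical page -> category tags, as each page would declare them
# (assumption based on the diagram above).
TAGS = {
    "2004": ["Years in the 21st century"],
    "Years in the 21st century": ["Years"],
    "Years": ["Time periods"],
    "30 March": ["Days in March"],
    "Days in March": ["Days"],
    "Days": ["Time periods"],
}
# Which of the names above are categories rather than articles.
CATEGORIES = {"Years in the 21st century", "Years",
              "Days in March", "Days", "Time periods"}

def build_members(tags):
    """Invert page -> categories into category -> member pages."""
    members = {}
    for page, cats in tags.items():
        for cat in cats:
            members.setdefault(cat, []).append(page)
    return members

def collect_articles(category, members, categories, visited=None):
    """Recursively gather articles under a category; visited guards cycles."""
    visited = set() if visited is None else visited
    if category in visited:
        return set()
    visited.add(category)
    found = set()
    for page in members.get(category, []):
        if page in categories:
            found |= collect_articles(page, members, categories, visited)
        else:
            found.add(page)
    return found
```

With this toy data, extracting Years in the 21st century yields only 2004, while extracting Time periods yields both 2004 and 30 March.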

Current projects

These are current projects. Volunteers for these projects are very welcome. Please list only the main page of the category and not subcategories (for example, list World War II, but not World War II people, etc.).

Why is this being described here instead of in the wikiprojects concerned? Aren't those the people who should be deciding on the appropriate way to organise the topic of their project? Jamesday 01:25, 1 Jun 2004 (UTC)

See Category talk:Fundamental.

Need?

Cimon (shall be doing it with my trusty IP:)
User:Docu

Sports