Jump to content

User:Dcoetzee/Wikicode/Specification

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Dcoetzee (talk | contribs) at 03:15, 6 May 2004 (=Indention= Oops, need the 2 spaces here). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

A while ago, on Talk: Binary tree, dze27 said:

Also, perhaps we need some sort of consistent pseudo-language for specifying pseudocode. Something like they do in Introduction to Algorithms, or smilar dze27

The pseudocode in Intro to Algorithms requires some rather complex typesetting (I've had to duplicate it before). This would be better for readers, but not conducive to editing by writers. I agree there needs to be consistency; I've written pseudocode on a number of pages, but I don't claim mine is ideal. I try to focus on a few aspects I consider important:
  • Clarity: the most important; it should be clear what the code does
  • Brevity: the code should not occupy much article space, and should consist mainly of algorithm content as opposed to unnecessary syntax
  • Ease of writing: the code should be easy to write and edit
  • Universality: the code should be clear to programmers familiar with nearly any common language, and not too similar to any one
To facilitate these I've adopted a few conventions, such as Wirth-style := for assignment, use of indention in place of blocks and end marks, and so on, but I appreciate any suggestions for improvements.
Derrick Coetzee 18:30, 4 May 2004 (UTC)

From that discussion I later decided to write this proposal for a standard pseudocode. Here's my (that is, DCoetzee's) somewhat incomplete proposal, intended to meet the above general goals. Please criticize, expand, and help wikify and finalize it:

The Wikicode Standard

This is an informal standard for a pseudocode which I call wikicode. The purpose of wikicode is to easily facilitate both writing and reading of clear and concise descriptions of algorithms and data structures. All wikicode will be typeset in a fixed-width font, for example by prefixing each line with spaces. Do not wrap the code in HTML <pre> tags; this disallows additional markup inside the code block. If wikicode occurs in the middle of encyclopedic text, this will be indicated by wrapping it in HTML <code> tags, which also use a fixed-width font.

Functions

A possibly recursive function is defined using the emboldened function keyword. It is given a list of arguments; types may or may not be specified for the arguments, but if they are they will be indicated by the argument name, a colon (:), and then the type name (see Types). Unabbreviated names should be preferred, unless the name is explained in the text or comments:

 function sort(list, comparisonPredicate)
     (indented body of function)
 function addModK(left : int, right : int, modulus : int)
     (indented body of function)

Results are returned from functions using the return keyword, as in return 5 or simply return if there is no result. Functions are called by simply specifying the name, followed by a parenthesized list of arguments:

 addModK(5, 3, 7)
 sort(myList, func(x,y) x < y)
 sin(5)

Here, sin determines the sine of a value.

As shown above, anonymous functions may be created with func. You can also generally assume all sorts of useful math, string, etc. functions are available, as long as you explain them, in comments or in the text (as also shown above). However, when manipulating the built-in lists, sets, and maps, please use their standard interface (see their sections).

Comments

Comments, surrounded by parentheses and typed in italics, occupy an entire line or a suffix of a line. Parentheses are used in analogy with English text, to suggest that comments are parenthetical:

  (Do something silly)
  i := i + 0       (add nothing to i)

If a comment is so long it requires multiple lines, it is preferable to embed it in the text somehow instead; however, if necessary they may be done simply by placing the entire comment inside parentheses, italicizing each line, and indenting appropriately:

    (This is a very long and unnecessary comment which is somewhat difficult
     to edit because it would require the editor to fix the line wrapping,
     which is bound to be annoying.)
    i := 5

I'm hoping for a better proposal for multi-line comments.

Variables

Variables are mutable, are referenced by name, and are assigned new values using the := operator. See Names for naming guidelines. Local variables need not be declared, but if you wish to you can use the var form for this:

 var w               // number of free munchkins
 var x := 2
 var y : int := 5
 var z : int
 var a := 2, b : int, c, d : int := 5

Variables need not be initialized when declared, but may not be used before being initialized.

Conditionals and Loops

Conditional tests can be done with the if-else statement, with if and else emboldened:

 if x = 5
     (x is 5 here)
 else
     (x is not 5 here)

There is no then or endif, and there need not be an else branch. The condition must have have bool type (see section Types). The equality comparison operator is simply =. For less-than or equal to (&le;) will be used, for greater-than or equal to (&ge;) will be used, and for not-equal (&ne;) will be used, as in mathematical texts.

There are only four types of loops, the while loop, and the do-while loop, the for-to loop, and the for each-in loop. The while loop iterates while the condition holds; the until keyword may also be used to iterate until the condition does not hold. No extra parentheses or keywords are needed:

 while x ≠ 0
      (body of while loop)

The do-while loop, unlike the while loop, always executes at least once before testing its condition:

 do
     (body)
 while done = false

The for-to loop steps an integer variables along a range of integer values in increasing order. For decreasing order, for-downto can be used.

 for x := 1 to 10
     (body, executed 10 times)
 for x := 10 downto 1
     (body, executed 10 times)

Finally, the for each-in loop is used with lists and sets (see their sections) and iterates over the elements of the list in order or set in some arbitrary order:

 for each n in numberList
     (do something with n)
 for each n in numberSet
     (do something with n)

The keyword end loop can be used to break out of any loop at any time.

Types

There are only a few built-in types available, as well as mechanisms for creating user-defined types. Types (like comments) are always typeset in italics; no ambiguity arises because types should not occur on a line by themselves or be preceded by //.

The simplest built-in type bool has one of two built-in values: true or false. Any bool value is a valid condition for a conditional or loop construct. The built-in type int is an infinite-precision integer type including positive and negative values and zero. The type character includes all conceivable writable characters and is written by surrounding the character with single quotes ('), and string is a string of such characters, surrounded by double-quotes ("). Double-quotes may be used inside a string, as long as it's clear from context where the string ends; the outer double-quotes may be emphasized with bold ("hel"lo") if this is not clear.

There is also a primitive list type, a primitive set type, and a primitive map type. See their respective sections for more details.

A new record type (called a struct in C) may be defined using the record keyword, may be recursive, and must include field names but not necessarily types:

 record employee
     name : string
     age : int
     mentor : employee
     stuff

Records, as well as tagged unions below, are only referred to by reference, and references may contain the special value null to indicate they refer to nothing. Record fields are accessed using ., as in e.name, and such values may be assigned to. Multiple fields may be listed on a line, if commas are used:

 record employee
     name : string, age : int
     mentor : employee, stuff

Types may also be constructed using the tagged union keyword, which defines a tagged union, much like ML's datatype. Tagged unions may be recursive, and assign a tag name to each branch:

 tagged union tree
     Leaf:
     Node:  left : tree, right : tree

Fields are specified after each tag; only one set of these fields is available at a time, depending on the tag. The tag of a tagged union object may be determined using the cases keyword, and within the cases, fields of the correct branch of the tagged union may be accessed:

 cases t
     Leaf: (it's a leaf)
           (do stuff)
     Node: (it's a node)
           (do stuff with t.left and t.right)

If a type is not specified for an entity, such as an argument, local variable, or record field, it can assume any value. Operations such as + may "fail" (make the pseudocode invalid) if they may be applied to a value they do not apply to.

List

A list stores a sequenced list of values. The primitive list operations are:

 list(1, 2, 3)             (the list containing 1, 2, 3 in that order)
 emptyList                 (the list containing no elements)
 insertFront(list, value)  (inserts new value at front of list; push)
 insertEnd(list, value)    (inserts new value at end of list; enqueue)
 removeFront(list)         (removes value from front of list; pop/dequeue)
 removeEnd(list)           (removes value from end of list; undo enqueue)
 isEmpty(list)             (returns true if and only if list is empty)
 getFront(list)            (gets the value at the front of the list)
 getEnd(list)              (gets the value at the end of the list)
 list[i]                   (gets ith element of the list, zero for front)
 size(list)                (gets number of elements in the list)
 listToSet(list)           (get a set containing all elements of the list)

You can also use for-in to iterate over a list from front to back.

Set

A set stores an unordered sequence of values with no duplicates. The primitive set operations are:

 set(1, 1, "a", 3)         (the set containing 1, "a", and 3)
 emptySet                  (the empty set; don't use the math symbol)
 insert(set, value)        (insert a new value into the set)
 remove(set, value)        (remove a value from the set)
 x := extractElement(set)  (removes a value from the set and returns it)
 x ∈ set                   (x is in the set; written &isin;)
 x ∉ set                   (x is not in the set; written &notin;)
 size(set)                 (gets number of elements in the set)
 setToList(set)            (get a list containing all elements of the set)
 

You can also use for-in to iterate over a set in some arbitrary order.

Map

Maps are associative arrays. They take keys and produce associated values. All keys which have not been assigned values map to the special value unassigned, but it may be clearer to readers if you use the unassigned operation below. The primitive operations are:

 map(1 -> "a", 'k' -> 7)   (the map mapping 1 to "a" and 'k' to 7)
 emptyMap                  (the empty map)
 map[x]                    (get value associated with x)
 map[x] := y               (make x map to y from now on)
 map[x] := unassigned      (unassign x)
 size(map)                 (gets number of assigned keys in the map)

You cannot iterate over a map.

Names

No two named entities, including functions, variables, and types, may share the same name in a single code listing (although duplicate names are possible in a single article). Valid names may include letters, numbers, greek letters, and mathematical markup, such as subscripts, superscripts, and primes (typed as apostrophes), but names may not start with a number. Names are case-sensitive, but names that differ only by case are strongly discouraged.

Multiword names will use mixed case, always beginning with lowercase. Underscores (_) or other word separators will not be used.

Line length

If possible, no line should exceed 78 characters, but there may be up to 100 characters on a line. Also, if lines can be made shorter without adding additional lines or sacrificing clarity, this should be done.

Indention

The entire code listing will be indented an initial 2 spaces. A function body or the body of a conditional or loop construct must always be indented exactly 4 additional spaces. For example:

 function foo(x, y)
     if x ≠ y
         while x > y
             x := x - y
         return x
     else
         foo(x + 1, y)

Any other indention performed is up to the writer, but should be done in a clear way that keeps line length under the maximum while retaining clarity and not falsely associating unrelated elements.

The following are some helpful suggestions about indentation that are not mandated. If a function call's arguments are large enough that they cannot fit on one line, a line break should either follow an operator, a comma, or an opening parenthesis. If it follows a comma, the next line is indented to the column following the initial opening parenthesis of the call. If it follows an operator or parenthesis, it is indented 2 additional spaces. For example (don't really use names like the following):

 function foo(x, y)
     if x > y
         otherFunctionWithGreatBigName(thisArgumentFillsUpTheRestOfTheLine,
                                       argument1ToThePlusOperator +
                                         argument2ToThePlusOperator,
                                       callYetAnotherFunctionWithBigName(
                                         argToYetAnotherFunction))

Naturally, since this is less clear, clarity of naming and line length should be traded off intelligently.

Links may be used anywhere they are useful in pseudocode, particularly in comments and around calls to functions not specified in the current listing. However, if used on an emboldened or italic word, it should be used in addition to the existing markup. It may also be desirable to link tagged union or record the first time they are used conspicuously in an article. Also, use common link sense: link only strongly relevent articles.

{{msg:wikicode}}

A page will be created describing wikicode for readers, and each page which uses wikicode must immediately precede its first conspicuous use with {{msg:wikicode}}, a message which will provide the reader a link to this page.


Please add your own comments here

Some thoughts

  • Instead of using indenting, it may be more clearer to use begin and end blocking words to make the function body more clearer, if the block is larger than one line.
    • This doesn't seem unreasonable; however, I don't really want to have end keywords for conditionals and loops, because of the useless space they occupy (without much added clarity). I'm following the style of Introduction to Algorithms in that. What do you think? Derrick Coetzee 23:31, 5 May 2004 (UTC)
  • (Style?) Instead of declaring type of variables after the name, perhaps they should be done before: integer left, integer right, integer modulus for example.
    • This is a rather difficult issue; placing types before the name works fine for single-word types, but can become rapidly more complicated as types do, until it's no longer clear where the type ends and the variable name begins; the italics might help there though.Derrick Coetzee 23:31, 5 May 2004 (UTC)
  • Comments should always have some kind of pre-delimiting thingy. Italicised comments solely on one line will be confusing (above, for instance, "Do something silly"). The italics don't matter so much.
    • My previous complaint about this was bogus. I don't yet have a good solution for multiline comments, and I've mandated parentheses around comments now as well as italics. Derrick Coetzee 03:12, 6 May 2004 (UTC)
  • I don't like this "var x" thing. It should not be the sole way of designating variables in a pseudocode form such as this. If one is demonstrating weak typing, sure, but otherwise, one should probably use a declaration form such as integer number, real your_bank_balance, boolean is_true, etc
    • Ah, you're misunderstanding here. A type specification is allowed, as the examples with  : int demonstrate, they're just not required. You may be looking for them before and not after (see second point). Derrick Coetzee 03:12, 6 May 2004 (UTC)
  • The control statements look good, though, perhaps for the foreach type code, one could actually write for each blah in...
  • Maps should be a little more general, not just mapping numbers to anything, but also mapping strings and other types to anything as well.
    • On this point, I didn't mean to imply they were so limited; I'll expand the example set.Derrick Coetzee 23:31, 5 May 2004 (UTC)
  • Indenting shouldn't be so rigidly described however. The way wikicode is structured won't make things too unreadable.
    • Done; is this more like what you had in mind? Derrick Coetzee 03:12, 6 May 2004 (UTC)

This is all just some random thoughts, so feel free to disregard what I say as you see fit ;) Dysprosia 23:09, 5 May 2004 (UTC)

Thanks a lot for your careful examination and comments, will make some related changes soon. Derrick Coetzee 23:31, 5 May 2004 (UTC)

I don't think brackets on a function call should be optional. Why have two ways to do the same thing? CGS 23:44, 5 May 2004 (UTC).

The idea is to reduce clutter in some cases; it's done this way in languages like ML, for example. It may in fact not be justified though. Change it if you like. Derrick Coetzee 00:06, 6 May 2004 (UTC)
Update: Done; I think you're right that the benefit of consistency outweighs any chance of added clarity here. Derrick Coetzee 03:13, 6 May 2004 (UTC)