C (programming language)

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by XJaM (talk | contribs) at 19:28, 5 September 2002 (+Cfront as a predecessor of C++). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

C is a programming language that was designed by Dennis Ritchie during the early 1970s to be used for operating system implementation and other low-level programming tasks.

The main features of C are:

  • Explicit structuring into blocks, using { to start blocks and } to end them
  • Lexical variable scoping
  • Use of a preprocessor for tasks such as defining macros and including header files (for interfacing with shared libraries or other program files)
  • Lack of built-in facilities for input-output. Functions from a standard library have to be used instead.
  • Extensive use of pointers to reference memory directly. There are no built-in facilities for dynamic allocation; this is also handled by the standard library.
  • Each of the language's operators performs a small, fixed amount of work (O(1)), which gives the programmer tight control over the efficiency of the code.

One of the main weaknesses of C is that it is up to the programmer to manage the contents of computer memory. This makes it easy to write bugs related to erroneous memory operations, such as buffer overflows. Tools such as lint have been created to help programmers avoid these errors.

Since the introduction of C, several languages based on it have been invented to address various perceived shortcomings. The Cyclone programming language adds features to prevent erroneous memory operations. C++ and Objective-C add constructs designed to aid in object-oriented programming. Java and C# add object-oriented programming constructs as well as a generally higher level of abstraction, such as automatic memory management.

History

The initial development of C occurred between 1969 and 1973 (according to Ritchie, the most creative period was during 1972). It was called "C" because many of its features derived from an earlier language named B, which had itself been named in commemoration of its parent, BCPL. BCPL was in turn descended from an earlier Algol-derived language, CPL.

By 1973, the C language had become powerful enough that most of the kernel of the Unix operating system was reimplemented in C. This was one of the first times that the kernel of an operating system had been implemented in a high-level language. In 1978, Ritchie and Brian Kernighan published The C Programming Language (a.k.a. "the white book", or K&R). For many years, this book served as the specification of the language; even today, it enjoys great popularity as a manual and learning tutorial.

C became immensely popular outside Bell Labs during the 1980s, and was for a time the dominant language in systems and microcomputer applications programming. It is still the most commonly-used language in systems programming, and is one of the most frequently used programming languages in computer science education.

In the early 1980s, Bjarne Stroustrup and others at Bell Labs worked to add object-oriented programming language constructs to C. The language they produced, first implemented by the Cfront translator, was called C++ (thus avoiding the issue of whether the successor to "B" and "C" should be "D" or "P"). C++ is now the language most commonly used for commercial applications on the Microsoft Windows operating system, though C remains more popular in the Unix world.

Versions of C

K&R C

C evolved continuously from its beginnings in Bell Labs. In 1978, the first edition of Kernighan and Ritchie's The C Programming Language was published. It introduced the following features to the existing versions of C:

  • struct data type
  • long int data type
  • unsigned int data type
  • The =+ operator was changed to +=, and so forth (=+ was confusing the C compiler's lexical analyzer).

For several years, the first edition of The C Programming Language was widely used as a de facto specification of the language. The version of C described in this book is commonly referred to as "K&R C." (The second edition covers the ANSI C standard, described below.)

K&R C is often considered the most basic part of the language that is necessary for a C compiler to support. Since not all of the currently used compilers have been updated to fully support ANSI C, and reasonably well-written K&R C code is also legal ANSI C, K&R C is considered the lowest common denominator that programmers should stick to when maximum portability is desired. For example, the bootstrapping version of the GCC compiler, xgcc, is written in K&R C. This is because many of the platforms supported by GCC did not have an ANSI C compiler when GCC was written, only one supporting K&R C.

However, ANSI C is now supported by almost all the widely used compilers. Most of the C code being written nowadays uses language features that go beyond the original K&R specification.

ANSI C and ISO C

In 1989, C was first officially standardized by ANSI in ANSI X3.159-1989 "Programming Language C". One of the aims of the ANSI C standard process was to produce a superset of K&R C. However, the standards committees also included several new features, more than is normal in programming language standardization.

Some of the new features had been "unofficially" added to the language after the publication of K&R, but before the beginning of the ANSI C process. These included:

  • void functions
  • functions returning struct or union types
  • void * data type
  • struct field names in a separate name space for each struct type
  • assignment for struct data types
  • The stdio library and some other standard library functions became available with most implementations (these already existed in at least one implementation at the time of K&R, but were not really standard, and thus not documented in the book).

Several features were added during the ANSI C standardization process itself, most notably function prototypes (borrowed from C++). The ANSI C standard also established a standard set of library functions.

The ANSI C standard, with a few minor modifications, was adopted as ISO standard number ISO 9899. The first ISO edition of this document was published in 1990 (ISO 9899:1990).

C99

After the ANSI standardization process, the C language specification remained relatively static for some time, whereas C++ continued to evolve. However, the standard underwent revision in the late 1990s, leading to ISO 9899:1999, which was published in 1999. This standard is commonly referred to as "C99."

The new features added in C99 include:

  • inline functions
  • relaxation of the restrictions on the location of variable declarations (in line with C++)
  • the addition of several new data types, including "long long int" (to ease the transition from 32-bit to 64-bit platforms that much old code faces), an explicit boolean data type, and a type representing complex numbers
  • variable-length arrays

Interest in supporting the new C99 features is mixed. Whilst GCC and several commercial compilers support most of the new features of C99, the compilers made by Microsoft and Borland do not, and these two companies do not seem to be interested in adding such support.


The following simple application prints out "Hello, World!" to the standard output file (which is usually the screen, but might be a file or some other hardware device). It appeared for the first time in K&R.

 #include <stdio.h>

 int main(void)
 {
     printf("Hello, World!\n");
     return 0;
 }
 

Anatomy of a C Program

A C program consists of functions and variables. C functions are like the subroutines and functions of Fortran or the procedures and functions of Pascal. The function main is special in that the program begins executing at the beginning of main. This means that every C program must have a main function.

The main function will usually call other functions to help perform its job, such as printf in the above example. The programmer may write some of these functions, and others may be called from libraries. In the above example, return 0 gives the return value of the main function. This indicates successful execution of the program to the calling environment, such as a shell.

A C function consists of a return type, a name, a list of parameters (or void in parentheses if there are none) and a function body. The syntax of the function body is equivalent to that of a compound statement.

Control structures

Selection statements

C has three types of selection statements: two kinds of if and the switch statement.

The two kinds of if statement are

  if (<expression>) 
     <statement>

and

  if (<expression>) 
     <statement>
  else 
     <statement>

In the if statement, if the expression in parentheses is nonzero or true, control passes to the statement following the if. If the else clause is present, control will pass to the statement following the else clause if the expression in parentheses is zero or false. The two forms are disambiguated by matching each else with the nearest preceding unmatched if at the same nesting level.

The switch statement causes control to be transferred to one of several statements depending on the value of an expression, which must have integral type. The substatement controlled by a switch is typically compound. Any statement within the substatement may be labeled with one or more case labels, which consist of the word "case" followed by a constant expression and then a colon (:). No two of the case constants associated with the same switch may have the same value. There may be at most one default label associated with a switch; control passes to the default label if none of the case labels are equal to the expression in the parentheses following switch. Switches may be nested; a case or default label is associated with the smallest switch that contains it.

  switch (<expression>) {
     case <label1> :
        statements;
        break;
     case <label2> :
        statements;
        break;
     default :
        statements;
  }

Iteration statements

C has three forms of iteration statement:

  do 
     <statement>
  while (<expression>);

  while (<expression>) 
     <statement>

  for (<expression> ; <expression> ; <expression>)
     <statement>

In the while and do statements, the substatement is executed repeatedly so long as the value of the expression remains nonzero or true. With while, the test, including all side effects from the expression, occurs before each execution of the statement; with do, the test follows each iteration.

If all three expressions are present in a for, the statement

  for (e1; e2; e3)
     s;

is equivalent (provided that s contains no continue statement) to

  e1;
  while (e2) {
     s;
     e3;
  }

Any of the three expressions in the for loop may be omitted. A missing second expression is treated as a nonzero test, so the loop runs forever unless it is left by other means, such as break or return.

Jump statements

Jump statements transfer control unconditionally. There are four types of jump statements in C: goto, continue, break, and return.

The goto statement looks like this:

  goto <identifier>;

The identifier must be a label located in the current function. Control transfers to the labeled statement.

A continue statement may appear only within an iteration statement and causes control to pass to the loop-continuation portion of the smallest enclosing such statement. More precisely, within each of the statements

  while (...) {
     ...
     contin: ;
  }

  do {
     ...
     contin: ;
  } while (...);

  for (...) {
     ...
     contin: ;
  }

a continue not contained within a smaller iteration statement is the same as goto contin.

The break statement is used to get out of a for loop, while loop, do loop, or switch statement. Control passes to the statement following the terminated statement.

A function returns to its caller by the return statement. When return is followed by an expression, the value of that expression is returned to the caller of the function. Flowing off the end of the function is equivalent to a return with no expression; in both of these cases, the returned value is undefined.

Operator precedence

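Operators are listed from highest precedence to lowest; operators in the same group have equal precedence.

     () [] -> .                  postfix operators
     ! ~ ++ -- + - (cast)
         * & sizeof              unary operators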
     * / %                       multiplicative operators
     + -                         additive operators
     << >>                       shift operators
     <  <=  >  >=                relational operators
     == !=                       equality operators
     &                           bitwise and
     ^                           bitwise exclusive or
     |                           bitwise inclusive or
     &&                          logical and
     ||                          logical or
     ?:                          conditional operator
     = += -= *= /= %= <<= >>=
         &= |= ^=                assignment operators 
     ,                           comma operator

Data declaration

Arrays

Examples:

   int myvector [100];
   float mymatrix [3] [2] = {{2.0, 10.0}, {20.0, 123.0}, {1.0, 1.0}};
   char lexicon  [10000] [300] ;  /* 10000 entries of up to 299 chars each, plus a terminating null. */

References

The Development of the C Language http://cm.bell-labs.com/cm/cs/who/dmr/chist.html

The C Programming Language, by Brian Kernighan and Dennis Ritchie. Also known as K&R. This is good for beginners.

  • 1st, Prentice-Hall 1978; ISBN 0-131-10163-3.
  • 2nd, Prentice-Hall 1988; ISBN 0-131-10362-8.

C: A Reference Manual, by Samuel P. Harbison and Guy L. Steele. This book is excellent as a definitive reference manual, and for those working on C compilers and processors. The book contains a BNF grammar for C.

  • 4th, Prentice-Hall 1994; ISBN 0-133-26224-3.
  • 5th, Prentice-Hall 2002; ISBN 0-130-89592-X.

This article (or an earlier version of it) contains material from FOLDOC, used with permission.