ASCII

ASCII (American Standard Code for Information Interchange) is a character set and character encoding based on the Roman alphabet as used in modern English. Computers and other communication equipment use it to represent text and to control devices that work with text. Like other codes (such as IBM's EBCDIC), it specifies a correspondence between integers that can be represented digitally and the symbols of a written language, allowing digital devices to communicate with each other and to process and store information. The ASCII character encoding (or a compatible extension; see below) is used on nearly all common computers, especially personal computers and workstations. The preferred MIME name for this encoding is "US-ASCII".

ASCII was first published as a standard by the ASA (the American Standards Association, which is now ANSI) in 1963; its present form is essentially ANSI X3.4-1967 / ECMA-6.

ASCII is a seven-bit code, meaning that it uses the integers representable with seven binary digits (a range of 0 to 127) to represent information. Even at the time that ASCII was introduced, most computers dealt with eight-bit bytes as the smallest unit of information; the eighth bit was commonly used for error checking on communication lines or other device-specific functions.
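As a rough illustration of how the eighth bit could serve for error checking, the sketch below attaches an even-parity bit to a seven-bit code. The choice of even parity and the function names are assumptions made for the example; the text above does not say which scheme any particular device used.

```python
def with_even_parity(code: int) -> int:
    """Return an 8-bit value: the 7-bit code plus an even-parity bit in bit 7."""
    if not 0 <= code <= 127:
        raise ValueError("ASCII codes fit in seven bits (0 to 127)")
    ones = bin(code).count("1")   # how many of the seven bits are set
    parity = ones % 2             # 1 when the count is odd, making the total even
    return (parity << 7) | code


def strip_parity(byte: int) -> int:
    """Discard the eighth bit and recover the 7-bit ASCII code."""
    return byte & 0x7F


if __name__ == "__main__":
    for ch in "AC":
        framed = with_even_parity(ord(ch))
        # Show the 7-bit code next to the 8-bit value that would be transmitted.
        print(ch, format(ord(ch), "07b"), format(framed, "08b"))
```

A receiver using the same convention would count the set bits in each byte and treat an odd total as a transmission error.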

The non-printable control characters:

Decimal	Hex	Value	Unicode printable representation	Interpretation
000	00	NUL	␀	Null character
001	01	SOH	␁	Start of Heading
002	02	STX	␂	Start of Text
003	03	ETX	␃	End of Text
004	04	EOT	␄	End of Transmission
005	05	ENQ	␅	Enquiry
006	06	ACK	␆	Acknowledgment
007	07	BEL	␇	Bell
008	08	BS	␈	Backspace
009	09	HT	␉	Horizontal Tab
010	0A	LF	␊	Line Feed
011	0B	VT	␋	Vertical Tab
012	0C	FF	␌	Form Feed
013	0D	CR	␍	Carriage Return
014	0E	SO	␎	Shift Out
015	0F	SI	␏	Shift In
016	10	DLE	␐	Data Link Escape
017	11	DC1	␑	Device Control 1 (XON)
018	12	DC2	␒	Device Control 2
019	13	DC3	␓	Device Control 3 (XOFF)
020	14	DC4	␔	Device Control 4
021	15	NAK	␕	Negative Acknowledgment
022	16	SYN	␖	Synchronous Idle
023	17	ETB	␗	End of Transmission Block
024	18	CAN	␘	Cancel
025	19	EM	␙	End of Medium
026	1A	SUB	␚	Substitute
027	1B	ESC	␛	Escape
028	1C	FS	␜	File Separator
029	1D	GS	␝	Group Separator
030	1E	RS	␞	Record Separator
031	1F	US	␟	Unit Separator

Printable characters:

 Decimal  Hex   Value    Decimal  Hex   Value    Decimal  Hex   Value
 -------  ----  -----    -------  ----  -----    -------  ----  -----
   032     20  (Space)     064     40     @        096     60     ` 
   033     21     !        065     41     A        097     61     a 
   034     22     "        066     42     B        098     62     b 
   035     23     #        067     43     C        099     63     c 
   036     24     $        068     44     D        100     64     d 
   037     25     %        069     45     E        101     65     e 
   038     26     &        070     46     F        102     66     f 
   039     27     '        071     47     G        103     67     g 
   040     28     (        072     48     H        104     68     h 
   041     29     )        073     49     I        105     69     i 
   042     2A     *        074     4A     J        106     6A     j 
   043     2B     +        075     4B     K        107     6B     k 
   044     2C     ,        076     4C     L        108     6C     l 
   045     2D     -        077     4D     M        109     6D     m 
   046     2E     .        078     4E     N        110     6E     n 
   047     2F     /        079     4F     O        111     6F     o 
   048     30     0        080     50     P        112     70     p 
   049     31     1        081     51     Q        113     71     q 
   050     32     2        082     52     R        114     72     r 
   051     33     3        083     53     S        115     73     s 
   052     34     4        084     54     T        116     74     t 
   053     35     5        085     55     U        117     75     u 
   054     36     6        086     56     V        118     76     v 
   055     37     7        087     57     W        119     77     w 
   056     38     8        088     58     X        120     78     x 
   057     39     9        089     59     Y        121     79     y 
   058     3A     :        090     5A     Z        122     7A     z 
   059     3B     ;        091     5B     [        123     7B     { 
   060     3C     <        092     5C     \        124     7C     | 
   061     3D     =        093     5D     ]        125     7D     } 
   062     3E     >        094     5E     ^        126     7E     ~ 
   063     3F     ?        095     5F     _        127     7F    DEL

The first thirty-two codes (numbers 0 to 31) in ASCII are reserved for control characters: codes that do not themselves represent printable information, but that are used to control devices (such as printers) that make use of ASCII. For example, character 10 represents the "line feed" function (which causes a printer to advance its paper), and character 27 represents "escape", the code sent by the Escape key found at the top left of common keyboards.
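Because control characters have no glyph of their own, the table of control characters shows each one with a Unicode "control picture". Those symbols sit at a fixed offset from the ASCII code, which the following sketch relies on. The function name is hypothetical; the offsets (U+2400 for NUL, U+2421 for DEL) come from Unicode's Control Pictures block.

```python
CONTROL_PICTURES_BASE = 0x2400  # U+2400 SYMBOL FOR NULL


def control_picture(code: int) -> str:
    """Return the visible Unicode symbol that stands for an ASCII control code."""
    if 0 <= code <= 31:
        # The pictures for codes 0 to 31 are laid out in the same order as ASCII.
        return chr(CONTROL_PICTURES_BASE + code)
    if code == 127:
        return "\u2421"  # SYMBOL FOR DELETE
    raise ValueError("not an ASCII control code")


if __name__ == "__main__":
    for code in (10, 27):
        print(code, control_picture(code))
```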

Different operating systems use different conventions to represent the end of a line. Unix uses character 10 (line feed), the classic Mac OS uses character 13 (carriage return), and DOS and Windows use carriage return followed by line feed (characters 13 and 10).
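Programs that exchange text between these systems often convert every convention to a single one. A minimal sketch, with hypothetical function names, that detects the convention and normalizes to the line feed form:

```python
def detect_convention(data: bytes) -> str:
    """Guess which end-of-line convention a byte string uses."""
    if b"\r\n" in data:
        return "CR LF"   # carriage return followed by line feed
    if b"\r" in data:
        return "CR"      # character 13 alone
    if b"\n" in data:
        return "LF"      # character 10 alone
    return "none"


def normalize_newlines(data: bytes) -> bytes:
    """Convert CR LF and lone CR line endings to a single line feed."""
    # Replace the two-character pair first so it does not become two line feeds.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


if __name__ == "__main__":
    sample = b"first\r\nsecond\r\n"
    print(detect_convention(sample), normalize_newlines(sample))
```

The detection is deliberately naive: a file that mixes conventions is simply reported by the first match.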

Code 32 is the "space" character, denoting the space between words, which is produced by the large space bar of a keyboard. Codes 33 to 126 are called the printable characters, which represent letters, digits, punctuation marks, and a few miscellaneous symbols (see the table of printable characters above). Note that an uppercase letter can be converted to lowercase by adding 32 to its ASCII value; in binary, this can be accomplished simply by setting the sixth least-significant bit to 1.
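The case relationship can be sketched with bit operations. The helper names below are illustrative only, and the range checks are there because the bit trick is meaningful only for the letters A to Z and a to z.

```python
CASE_BIT = 0x20  # decimal 32: the sixth least-significant bit


def to_lower(code: int) -> int:
    """Set the case bit of an uppercase ASCII letter; leave other codes alone."""
    if ord("A") <= code <= ord("Z"):
        return code | CASE_BIT
    return code


def to_upper(code: int) -> int:
    """Clear the case bit of a lowercase ASCII letter; leave other codes alone."""
    if ord("a") <= code <= ord("z"):
        return code & ~CASE_BIT
    return code


if __name__ == "__main__":
    # 'A' is 1000001 in binary and 'a' is 1100001: only the case bit differs.
    print(format(ord("A"), "07b"), format(to_lower(ord("A")), "07b"))
```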

Code 127 (all seven bits on) is another special character known as "delete" or "rubout". Though its function is similar to that of other control characters, it was placed at this position so that it could be used to erase a section of paper tape, a popular storage medium at the time, by punching out all its holes.

The international spread of computer technology led to many variations and extensions to the ASCII character set, since ASCII does not include accented letters and other symbols necessary to write most languages besides English that use Roman-based alphabets. International standard ISO 646 (1972) was the first attempt to remedy this problem, although it regrettably created compatibility problems as well. ISO 646 was still a seven-bit character set, and since no additional codes were available, some were re-assigned in language-specific variants. For example, the ASCII code 93 (the right square bracket, "]") is used in the German variant ISO 646-DE for the uppercase letter U with umlaut (Ü), and in the Danish variant ISO 646-DK for the uppercase letter A with ring (Å).
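The compatibility problem can be seen by decoding the same seven-bit data under different variants. The sketch below is hypothetical: its override table holds only the two reassignments of code 93 mentioned above, whereas the real national variants change several other codes as well.

```python
# Assumption: only the reassignments named in the text are modelled here.
VARIANT_OVERRIDES = {
    "US-ASCII": {},
    "ISO 646-DE": {93: "Ü"},
    "ISO 646-DK": {93: "Å"},
}


def decode_7bit(data: bytes, variant: str) -> str:
    """Decode seven-bit data, applying the overrides of one national variant."""
    overrides = VARIANT_OVERRIDES[variant]
    chars = []
    for byte in data:
        if byte > 127:
            raise ValueError("not a seven-bit code: %d" % byte)
        # Fall back to the plain ASCII character when the variant has no override.
        chars.append(overrides.get(byte, chr(byte)))
    return "".join(chars)


if __name__ == "__main__":
    data = b"x[0]"
    for name in VARIANT_OVERRIDES:
        print(name, decode_7bit(data, name))
```

The same bytes that read as an array index under US-ASCII pick up an accented letter under the German or Danish variant, which is why text moved between systems could be silently altered.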

Improved technology brought out-of-band means to represent the information formerly encoded in the eighth bit of each byte, freeing this bit to provide another 128 character codes for new assignments. Eight-bit standards such as ISO 8859 enabled a broader range of languages to be represented, but were still plagued with incompatibilities and limitations. Still, ISO 8859-1 and original 7-bit ASCII are the most common character encodings in use today, though Unicode (with a much larger code set) is quickly becoming standard in many places. These newer codes are backward-compatible: that is, the first 128 code points of each code are the same as ASCII, and the first 256 code points of Unicode are the same as ISO 8859-1.
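This backward compatibility can be checked with Python's built-in codecs. The use of UTF-8 as the Unicode encoding is an assumption of the sketch (the text above speaks only of Unicode code points), and the function names are illustrative.

```python
def shared_ascii_range() -> bool:
    """Check that codes 0 to 127 decode identically under all three encodings."""
    data = bytes(range(128))
    return (
        data.decode("ascii")
        == data.decode("iso-8859-1")
        == data.decode("utf-8")   # assumption: UTF-8 stands in for Unicode here
    )


def latin1_matches_unicode() -> bool:
    """Check that ISO 8859-1 byte values equal the Unicode code points 0 to 255."""
    data = bytes(range(256))
    text = data.decode("iso-8859-1")
    return [ord(ch) for ch in text] == list(range(256))


if __name__ == "__main__":
    print(shared_ascii_range(), latin1_matches_unicode())
```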

ASCII does not specify any way to represent information about the structure or appearance of a piece of text. That requires the use of a markup language.

The portmanteau word "ASCIIbetical" has evolved to describe the collation of data in ASCII code order rather than genuine alphabetical order (which requires some tricky computation, and varies with language). (See [1].)
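The difference shows up as soon as a list mixes upper and lower case. In the sketch below, Python's default string comparison gives ASCII code order for ASCII-only strings, while the case-insensitive key is only a rough stand-in for alphabetical order, since genuine collation depends on the language. The function names are hypothetical.

```python
def asciibetical(items):
    """Sort by character code; every uppercase letter precedes every lowercase one."""
    return sorted(items)


def case_insensitive(items):
    """Approximate alphabetical order by ignoring case (not language-aware)."""
    return sorted(items, key=str.casefold)


if __name__ == "__main__":
    words = ["banana", "Zebra", "apple", "Cherry"]
    print(asciibetical(words))      # ['Cherry', 'Zebra', 'apple', 'banana']
    print(case_insensitive(words))  # ['apple', 'banana', 'Cherry', 'Zebra']
```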