ASCII
ASCII (American Standard Code for Information Interchange) is a character set and a character encoding based on the Roman alphabet as used in modern English, used by computers and other communication equipment to represent text and to control devices that work with text. Like other codes (such as IBM's EBCDIC), it specifies a correspondence between integers that can be represented digitally and the symbols of a written language, allowing digital devices to communicate with each other and to process and store information. The ASCII character encoding (or a compatible extension; see below) is used on nearly all common computers (especially personal computers and workstations). The preferred MIME name for this encoding is "US-ASCII".
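The correspondence between integers and characters can be illustrated with a short Python sketch. The helper names `to_codes` and `from_codes` are hypothetical, chosen only for this illustration; the sketch relies on Python's built-in `ascii` codec.

```python
def to_codes(text: str) -> list:
    """Return the ASCII code of each character in text.

    Raises UnicodeEncodeError if text contains a non-ASCII character.
    """
    return list(text.encode("ascii"))


def from_codes(codes: list) -> str:
    """Turn a list of ASCII codes (0 to 127) back into text."""
    return bytes(codes).decode("ascii")


print(to_codes("Hi!"))                   # [72, 105, 33]
print(from_codes([65, 83, 67, 73, 73]))  # ASCII
```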
ASCII was first published as a standard by the ASA (the American Standards Association, which is now ANSI) in 1963; its present form is essentially ANSI X3.4-1967 / ECMA-6.
ASCII is a seven-bit code, meaning that it uses the integers representable with seven binary digits (a range of 0 to 127) to represent information. Even at the time that ASCII was introduced, most computers dealt with eight-bit bytes as the smallest unit of information; the eighth bit was commonly used for error checking on communication lines or other device-specific functions.
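As an illustration of how the eighth bit could serve for error checking, the following Python sketch assumes an even-parity scheme; the text above does not name a particular scheme, and the function names are hypothetical.

```python
def with_even_parity(code: int) -> int:
    """Return an 8-bit value: the 7-bit ASCII code plus an even-parity bit."""
    if not 0 <= code <= 127:
        raise ValueError("ASCII codes fit in seven bits (0 to 127)")
    # The parity bit is 1 when the code has an odd number of set bits,
    # so that the full byte always has an even number of set bits.
    parity = bin(code).count("1") % 2
    return code | (parity << 7)


def strip_parity(byte: int) -> int:
    """Discard the eighth bit and return the 7-bit ASCII code."""
    return byte & 0x7F


print(format(with_even_parity(ord("A")), "08b"))  # 01000001 (two set bits)
print(format(with_even_parity(ord("C")), "08b"))  # 11000011 (three set bits)
```

A receiver using this scheme would count the set bits of each incoming byte and treat an odd count as a transmission error.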
The non-printable control characters:
Decimal | Hex | Abbreviation | Unicode printable representation | Interpretation |
---|---|---|---|---|
000 | 00 | NUL | ␀ | Null character |
001 | 01 | SOH | ␁ | Start of Header |
002 | 02 | STX | ␂ | Start of Text |
003 | 03 | ETX | ␃ | End of Text |
004 | 04 | EOT | ␄ | End of Transmission |
005 | 05 | ENQ | ␅ | Enquiry |
006 | 06 | ACK | ␆ | Acknowledgment |
007 | 07 | BEL | ␇ | Bell |
008 | 08 | BS | ␈ | Backspace |
009 | 09 | HT | ␉ | Horizontal Tab |
010 | 0A | LF | ␊ | Line Feed |
011 | 0B | VT | ␋ | Vertical Tab |
012 | 0C | FF | ␌ | Form Feed |
013 | 0D | CR | ␍ | Carriage Return |
014 | 0E | SO | ␎ | Shift Out |
015 | 0F | SI | ␏ | Shift In |
016 | 10 | DLE | ␐ | Data Link Escape |
017 | 11 | DC1 | ␑ | Device Control 1 (XON) |
018 | 12 | DC2 | ␒ | Device Control 2 |
019 | 13 | DC3 | ␓ | Device Control 3 (XOFF) |
020 | 14 | DC4 | ␔ | Device Control 4 |
021 | 15 | NAK | ␕ | Negative Acknowledgement |
022 | 16 | SYN | ␖ | Synchronous Idle |
023 | 17 | ETB | ␗ | End of Transmission Block |
024 | 18 | CAN | ␘ | Cancel |
025 | 19 | EM | ␙ | End of Medium |
026 | 1A | SUB | ␚ | Substitute |
027 | 1B | ESC | ␛ | Escape |
028 | 1C | FS | ␜ | File Separator |
029 | 1D | GS | ␝ | Group Separator |
030 | 1E | RS | ␞ | Record Separator |
031 | 1F | US | ␟ | Unit Separator |
Printable characters:
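Decimal | Hex | Character | Decimal | Hex | Character | Decimal | Hex | Character |
---|---|---|---|---|---|---|---|---|
032 | 20 | (Space) | 064 | 40 | @ | 096 | 60 | ` |
033 | 21 | ! | 065 | 41 | A | 097 | 61 | a |
034 | 22 | " | 066 | 42 | B | 098 | 62 | b |
035 | 23 | # | 067 | 43 | C | 099 | 63 | c |
036 | 24 | $ | 068 | 44 | D | 100 | 64 | d |
037 | 25 | % | 069 | 45 | E | 101 | 65 | e |
038 | 26 | & | 070 | 46 | F | 102 | 66 | f |
039 | 27 | ' | 071 | 47 | G | 103 | 67 | g |
040 | 28 | ( | 072 | 48 | H | 104 | 68 | h |
041 | 29 | ) | 073 | 49 | I | 105 | 69 | i |
042 | 2A | * | 074 | 4A | J | 106 | 6A | j |
043 | 2B | + | 075 | 4B | K | 107 | 6B | k |
044 | 2C | , | 076 | 4C | L | 108 | 6C | l |
045 | 2D | - | 077 | 4D | M | 109 | 6D | m |
046 | 2E | . | 078 | 4E | N | 110 | 6E | n |
047 | 2F | / | 079 | 4F | O | 111 | 6F | o |
048 | 30 | 0 | 080 | 50 | P | 112 | 70 | p |
049 | 31 | 1 | 081 | 51 | Q | 113 | 71 | q |
050 | 32 | 2 | 082 | 52 | R | 114 | 72 | r |
051 | 33 | 3 | 083 | 53 | S | 115 | 73 | s |
052 | 34 | 4 | 084 | 54 | T | 116 | 74 | t |
053 | 35 | 5 | 085 | 55 | U | 117 | 75 | u |
054 | 36 | 6 | 086 | 56 | V | 118 | 76 | v |
055 | 37 | 7 | 087 | 57 | W | 119 | 77 | w |
056 | 38 | 8 | 088 | 58 | X | 120 | 78 | x |
057 | 39 | 9 | 089 | 59 | Y | 121 | 79 | y |
058 | 3A | : | 090 | 5A | Z | 122 | 7A | z |
059 | 3B | ; | 091 | 5B | [ | 123 | 7B | { |
060 | 3C | < | 092 | 5C | \ | 124 | 7C | \| |
061 | 3D | = | 093 | 5D | ] | 125 | 7D | } |
062 | 3E | > | 094 | 5E | ^ | 126 | 7E | ~ |
063 | 3F | ? | 095 | 5F | _ | 127 | 7F | DEL |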
The first thirty-two codes (numbers 0 to 31) in ASCII are reserved for control characters: codes that may not themselves represent information, but that are used to control devices (such as printers) that make use of ASCII. For example, character 10 represents the "line feed" function (which causes a printer to advance its paper), and character 27 represents the "escape" key found on the top left of common keyboards.
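Because control characters have no printed form, it can help to make them visible when inspecting text. The following Python sketch substitutes the Unicode printable representations listed in the control character table, which occupy code points U+2400 to U+241F in the same order as codes 0 to 31; U+2421 is the corresponding symbol for code 127. The function name `visible` is hypothetical.

```python
def visible(text: str) -> str:
    """Replace ASCII control characters with Unicode control pictures."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 32:
            # Control pictures sit at U+2400 plus the control code.
            out.append(chr(0x2400 + code))
        elif code == 127:
            out.append("\u2421")  # symbol for DEL
        else:
            out.append(ch)
    return "".join(out)


print(ord("\n"), ord("\x1b"))    # 10 27
print(visible("one\ntwo\x1b"))   # one␊two␛
```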
Different operating systems use different conventions to represent the end of a line. Unix and Unix-like systems (including Mac OS X and its successors) use character 10 (line feed), the classic Mac OS used character 13 (carriage return), and DOS and Windows use carriage return followed by line feed.
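Programs that exchange text between systems often convert all three conventions to a single one. A minimal Python sketch of such a conversion, using the hypothetical name `normalize_newlines` and choosing line feed as the target, might look like this:

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CR LF pairs and lone CR characters to a single LF."""
    # Replace the two-byte sequence first, so that its carriage return
    # is not converted separately and turned into an extra line break.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


mixed = b"dos line\r\nold mac line\runix line\n"
print(normalize_newlines(mixed))
# b'dos line\nold mac line\nunix line\n'
```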
Code 32 is the "space" character, denoting the space between words, which is produced by the large space bar of a keyboard. Codes 33 to 126 are called the printable characters, which represent letters, digits, punctuation marks, and a few miscellaneous symbols (see the table of printable characters above). Note how uppercase letters can be converted to lowercase by adding 32 to their ASCII value; in binary, this can be accomplished simply by setting the sixth least-significant bit to 1.
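The case-conversion trick can be sketched in Python as follows. The names `CASE_BIT`, `to_lower_ascii` and `to_upper_ascii` are hypothetical; the range checks are needed because the bit has a different meaning for characters that are not letters.

```python
CASE_BIT = 0x20  # decimal 32, the sixth least-significant bit


def to_lower_ascii(ch: str) -> str:
    """Lowercase a single ASCII letter by setting the case bit."""
    code = ord(ch)
    if ord("A") <= code <= ord("Z"):
        code |= CASE_BIT
    return chr(code)


def to_upper_ascii(ch: str) -> str:
    """Uppercase a single ASCII letter by clearing the case bit."""
    code = ord(ch)
    if ord("a") <= code <= ord("z"):
        code &= ~CASE_BIT
    return chr(code)


print(format(ord("A"), "07b"), format(ord("a"), "07b"))  # 1000001 1100001
print(to_lower_ascii("Q"), to_upper_ascii("q"))          # q Q
```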
Code 127 (all seven bits on) is another special character known as "delete" or "rubout". Though its function is similar to that of other control characters, it was placed at this position so that it could be used to erase a section of paper tape, a popular storage medium at the time, by punching out all its holes.
The international spread of computer technology led to many variations and extensions to the ASCII character set, since ASCII does not include accented letters and other symbols necessary to write most languages besides English that use Roman-based alphabets. International standard ISO 646 (1972) was the first attempt to remedy this problem, although it regrettably created compatibility problems as well. ISO 646 was still a seven-bit character set, and since no additional codes were available, some were re-assigned in language-specific variants. For example, the ASCII code 93 (the right square bracket, "]") is used in the German variant ISO 646-DE for the uppercase letter U with umlaut (Ü), and in the Danish variant ISO 646-DK for the uppercase letter A with ring (Å).
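The compatibility problem can be made concrete with a small Python sketch in which the same byte is read under different national variants. The override table below is hypothetical and deliberately incomplete: it contains only the reassignment of code 93 mentioned above, whereas the real variants reassign several other codes as well.

```python
# Hypothetical, incomplete override tables: only code 93 is listed.
ISO646_OVERRIDES = {
    "DE": {93: "\u00dc"},  # uppercase U with umlaut
    "DK": {93: "\u00c5"},  # uppercase A with ring
}


def decode_iso646(data: bytes, variant: str = "US") -> str:
    """Decode seven-bit data, applying the overrides of the given variant."""
    overrides = ISO646_OVERRIDES.get(variant, {})
    chars = []
    for code in data:
        if code > 127:
            raise ValueError("ISO 646 is a seven-bit code")
        chars.append(overrides.get(code, chr(code)))
    return "".join(chars)


raw = b"x]"
print(decode_iso646(raw))        # x]
print(decode_iso646(raw, "DE"))  # xÜ
print(decode_iso646(raw, "DK"))  # xÅ
```

Text written under one variant therefore displays incorrectly under another unless both sides agree on the variant in advance.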
Improved technology brought out-of-band means to represent the information formerly encoded in the eighth bit of each byte, freeing this bit to provide 128 additional character codes for new assignments. Eight-bit standards such as ISO 8859 enabled a broader range of languages to be represented, but were still plagued with incompatibilities and limitations. Still, ISO 8859-1 and original 7-bit ASCII are the most common character encodings in use today, though Unicode (with a much larger code set) is quickly becoming standard in many places. These newer codes are backward-compatible: that is, the first 128 code points of each (0 to 127) are the same as ASCII, and the first 256 code points of Unicode are the same as ISO 8859-1.
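This backward compatibility can be checked with Python's built-in codecs. The sketch below assumes the codec names `ascii`, `latin-1` (Python's name for ISO 8859-1) and `utf-8` (one common encoding of Unicode); the function name `decodes_identically` is hypothetical.

```python
def decodes_identically(data: bytes, encodings=("ascii", "latin-1", "utf-8")) -> bool:
    """Return True if data decodes to the same text under every encoding."""
    return len({data.decode(name) for name in encodings}) == 1


# Every seven-bit value means the same thing in all three encodings.
print(decodes_identically(bytes(range(128))))  # True

# Each ISO 8859-1 byte value equals the Unicode code point it represents.
all_bytes = bytes(range(256))
print([ord(ch) for ch in all_bytes.decode("latin-1")] == list(range(256)))  # True
```

The second check concerns code points only. The bytes that UTF-8 uses for code points above 127 differ from ISO 8859-1, so the two encodings agree byte for byte only on the ASCII range.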
ASCII does not specify any way to represent information about the structure or appearance of a piece of text. That requires the use of a markup language.
The portmanteau word "ASCIIbetical" has evolved to describe the collation of data in ASCII code order rather than genuine alphabetical order (which requires some tricky computation, and varies with language). (See [1].)
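The difference is easy to see in Python, whose default string comparison follows code point order and therefore ASCII order for ASCII text. In the sketch below the word list is invented for illustration, and the case-insensitive key is only a rough stand-in for alphabetical order, which in general needs language-specific collation rules.

```python
words = ["banana", "Apple", "cherry", "Zebra", "apple"]

# ASCIIbetical: every uppercase letter (65 to 90) sorts before
# every lowercase letter (97 to 122).
asciibetical = sorted(words)
print(asciibetical)      # ['Apple', 'Zebra', 'apple', 'banana', 'cherry']

# Rough approximation of alphabetical order: compare without regard to case.
case_insensitive = sorted(words, key=str.lower)
print(case_insensitive)  # ['Apple', 'apple', 'banana', 'cherry', 'Zebra']
```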
See also Extended ASCII, Unicode.