EBCDIC
EBCDIC (Extended Binary Coded Decimal Interchange Code) is an 8-bit character encoding used on IBM mainframes and AS/400s. It is descended from punched-card codes and the corresponding six-bit binary-coded decimal code used by most of IBM's computer peripherals of the late 1950s and early 1960s. Outside such IBM systems and compatible systems from other companies, ASCII and its descendants such as Unicode are normally used instead; EBCDIC is generally considered an anachronism.
A single-byte EBCDIC character takes up eight bits, which are divided into two four-bit halves. The first (high-order) four bits are called the zone and represent the category of the character, whereas the last (low-order) four bits are called the digit and identify the specific character. There are a number of different versions of EBCDIC, customised for different countries.
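The zone/digit split can be illustrated with a minimal Python sketch. The function name `split_zone_digit` is made up for this illustration, and the example relies on Python's built-in `cp500` codec to obtain an EBCDIC byte value.

```python
def split_zone_digit(byte: int) -> tuple[int, int]:
    """Split one EBCDIC byte into its (zone, digit) halves.

    The zone is the high-order four bits, the digit the low-order four bits.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte value (0..255)")
    return byte >> 4, byte & 0x0F


# Encode 'A' with Python's cp500 codec and inspect its two halves.
code = "A".encode("cp500")[0]
zone, digit = split_zone_digit(code)
print(f"{code:#04x} -> zone {zone:X}, digit {digit:X}")
```

For the letter A (byte 0xC1 in the table of CCSID 500), the zone is C and the digit is 1.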
Some East Asian countries use a double-byte extension of EBCDIC so that Chinese, Japanese and Korean scripts can be displayed on their mainframes. In the double-byte extension of EBCDIC, the shift codes 0x0E and 0x0F switch between the single-byte and double-byte modes.
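As a rough sketch of how a reader of such a mixed stream might track the mode, the following Python splits a byte string into single-byte and double-byte runs. It assumes that 0x0E (shift out) enters double-byte mode and 0x0F (shift in) returns to single-byte mode; the function name `split_mixed` and the run labels are hypothetical, and no attempt is made to decode the double-byte characters themselves.

```python
SO = 0x0E  # assumed: shift out, switches to double-byte mode
SI = 0x0F  # assumed: shift in, switches back to single-byte mode


def split_mixed(data: bytes) -> list[tuple[str, bytes]]:
    """Split a mixed stream into ("sbcs" | "dbcs", raw bytes) runs.

    The shift codes themselves are dropped from the output.
    """
    runs = []
    mode = "sbcs"
    start = 0  # index where the current run begins
    i = 0
    while i < len(data):
        if mode == "sbcs":
            if data[i] == SO:
                if i > start:
                    runs.append(("sbcs", data[start:i]))
                mode, start = "dbcs", i + 1
            i += 1
        else:
            if data[i] == SI:
                if i > start:
                    runs.append(("dbcs", data[start:i]))
                mode, start = "sbcs", i + 1
                i += 1
            else:
                i += 2  # step over one two-byte character
    if start < len(data):
        runs.append((mode, data[start:]))
    return runs


stream = b"\xc1\xc2" + bytes([SO]) + b"\x42\x43\x44\x45" + bytes([SI]) + b"\xc3"
for kind, chunk in split_mixed(stream):
    print(kind, chunk.hex(" "))
```

Stepping two bytes at a time while in double-byte mode keeps the second byte of a character from being mistaken for a shift code.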
IBM typically identifies each of its codepages by a number called a CCSID (Coded Character Set Identifier). Note that the same CCSID can place some characters at different positions on different systems. For example, the newline character can have a different byte value in OS/390 OpenEdition than in the other EBCDIC-based operating systems. This becomes an issue when transferring EBCDIC-based text data between machines.
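One way to cope with this when decoding is to normalise the line-end byte before applying a codec. The sketch below is only an illustration under stated assumptions: it assumes the two competing byte values are 0x15 (the EBCDIC "new line" control) and 0x25 (the EBCDIC "line feed" control), and that Python's `cp500` codec maps 0x25 to `"\n"`. The function name `decode_with_lf` is hypothetical.

```python
NL = 0x15  # assumed: EBCDIC "new line" control
LF = 0x25  # assumed: EBCDIC "line feed" control

# Translation table that rewrites every NL byte as LF.
_NL_TO_LF = bytes.maketrans(bytes([NL]), bytes([LF]))


def decode_with_lf(data: bytes, codec: str = "cp500") -> str:
    """Decode EBCDIC text so that either line-end byte becomes '\\n'."""
    return data.translate(_NL_TO_LF).decode(codec)


# The same two-line text, written with each line-end convention.
print(repr(decode_with_lf(b"\xc1\x15\xc2")))
print(repr(decode_with_lf(b"\xc1\x25\xc2")))
```

Whether this normalisation is appropriate depends on the systems involved; data that uses both controls with distinct meanings would lose that distinction.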
History
EBCDIC was devised by IBM in 1963 and 1964 and was announced with the release of the IBM System/360 line of mainframe computers, at the apex of IBM's mainframe monopoly. It was created to extend the binary-coded decimal (BCD) code that existed at the time. EBCDIC was developed at about the same time as ASCII, which was first published as a standard in 1963 and revised later in the decade. EBCDIC is an 8-bit encoding, whereas ASCII is a 7-bit encoding. Many extensions to ASCII had been devised before Unicode became a standard.
There is a convenient correspondence between hexadecimal character codes and punch card codes in EBCDIC, an important feature at the time. An IBM card punch could punch a 12-row card with up to two punches per column: the first punch somewhere in the first 3 rows (called the zone) and the second somewhere in the last 9 rows (called the number). The zone could thus be considered a value from 0 to 3 and the number a value from 0 to 9, where 0 means no punch and a non-zero value means the corresponding row was punched. The initial version of EBCDIC was just ((0xF - zone) << 4) + number, and it defined only the lower-left 10x4 part of the table shown below (the zone was apparently reversed so that the letters would at least be in alphabetical order).
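The formula can be tried out with a short Python sketch. The function name `punch_to_ebcdic` is hypothetical, and the zone and number values are simply the 0 to 3 and 0 to 9 values described above; how those values map onto physical card rows is not modelled here. Python's `cp500` codec is used only to show which character each resulting byte represents.

```python
def punch_to_ebcdic(zone: int, number: int) -> int:
    """Return the byte value ((0xF - zone) << 4) + number."""
    if zone not in range(4):
        raise ValueError("zone must be 0..3")
    if number not in range(10):
        raise ValueError("number must be 0..9")
    return ((0xF - zone) << 4) + number


# A few zone/number combinations and the characters they land on.
for zone, number in [(0, 5), (1, 2), (2, 1), (3, 1)]:
    code = punch_to_ebcdic(zone, number)
    char = bytes([code]).decode("cp500")
    print(f"zone={zone} number={number} -> {code:#04x} {char!r}")
```

With no zone punch the result falls in row F0 (the digits), while the three zone values select rows E0, D0 and C0, which hold the capital letters.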
Unfortunately, the letters of the Roman alphabet do not occupy contiguous code values (there are gaps between i and j and between r and s), which is a great annoyance in many of the more modern programming languages. It is more difficult to sort or select a range of characters in EBCDIC than in ASCII (for example, sorting the range of characters [a-z] when accented characters are involved).
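The gaps, and the pitfall they create for range tests, can be seen in a small Python sketch based on the built-in `cp500` codec. The helper names (`find_gaps`, `naive_is_lower`, `is_lower`) are made up for this illustration.

```python
import string

LOWER = string.ascii_lowercase
# EBCDIC (code page 500) byte value of each lowercase letter.
CODES = {ch: ch.encode("cp500")[0] for ch in LOWER}
_LOWER_BYTES = frozenset(CODES.values())


def find_gaps() -> list[tuple[str, str]]:
    """Return neighbouring letters whose byte values are not consecutive."""
    return [(a, b) for a, b in zip(LOWER, LOWER[1:]) if CODES[b] - CODES[a] != 1]


def naive_is_lower(byte: int) -> bool:
    """ASCII-style range test; wrong for EBCDIC because of the gaps."""
    return CODES["a"] <= byte <= CODES["z"]


def is_lower(byte: int) -> bool:
    """Membership test against the actual letter positions."""
    return byte in _LOWER_BYTES


print(find_gaps())
tilde = "~".encode("cp500")[0]
print(naive_is_lower(tilde), is_lower(tilde))
```

The tilde sits between r and s in code page 500, so a simple "a to z" range comparison wrongly accepts it, whereas the membership test does not.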
All IBM mainframe peripherals and operating systems used EBCDIC. They did, however, provide an ASCII mode for reading magnetic tapes, and some mainframe applications provide partial Unicode support.
There is an EBCDIC Unicode Transformation Format called UTF-EBCDIC, proposed by the Unicode Consortium, but it is not intended for use in open interchange environments.
Codepage layout
This is CCSID 500, a variant of EBCDIC. Characters 0x00–0x3F and 0xFF are controls, 0x40 is space, 0x41 is no-break space, 0xCA is soft hyphen.
   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F
40 | SP | NBSP | â | ä | à | á | ã | å | ç | ñ | [ | . | < | ( | + | !
50 | & | é | ê | ë | è | í | î | ï | ì | ß | ] | $ | * | ) | ; | ^
60 | - | / | Â | Ä | À | Á | Ã | Å | Ç | Ñ | ¦ | , | % | _ | > | ?
70 | ø | É | Ê | Ë | È | Í | Î | Ï | Ì | ` | : | # | @ | ' | = | "
80 | Ø | a | b | c | d | e | f | g | h | i | « | » | ð | ý | þ | ±
90 | ° | j | k | l | m | n | o | p | q | r | ª | º | æ | ¸ | Æ | ¤
A0 | µ | ~ | s | t | u | v | w | x | y | z | ¡ | ¿ | Ð | Ý | Þ | ®
B0 | ¢ | £ | ¥ | · | © | § | ¶ | ¼ | ½ | ¾ | ¬ | | | ¯ | ¨ | ´ | ×
C0 | { | A | B | C | D | E | F | G | H | I | SHY | ô | ö | ò | ó | õ
D0 | } | J | K | L | M | N | O | P | Q | R | ¹ | û | ü | ù | ú | ÿ
E0 | \ | ÷ | S | T | U | V | W | X | Y | Z | ² | Ô | Ö | Ò | Ó | Õ
F0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ³ | Û | Ü | Ù | Ú |
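The layout can be cross-checked against an existing implementation. The sketch below uses Python's built-in `cp500` codec, which is assumed here to correspond to the CCSID 500 table above; the helper name `table_row` is hypothetical.

```python
def table_row(high: int) -> list[str]:
    """Decode the 16 byte values of one table row (high nibble 0x0..0xF)."""
    if high not in range(16):
        raise ValueError("high nibble must be 0..15")
    return [bytes([(high << 4) | low]).decode("cp500") for low in range(16)]


# Row F0 holds the digits, row C0 the letters A to I.
print(" ".join(table_row(0xF)[:10]))
print(" ".join(table_row(0xC)[1:10]))

# Round trip of a short string through the codec.
encoded = "Hello".encode("cp500")
print(encoded.hex(" "))
print(encoded.decode("cp500"))
```

Reading the bytes of "Hello" back from the table (C8, 85, 93, 93, 96) gives the same values the codec produces.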
See also
- EBCDIC codepages with the Latin-1 character set
- codepage 037 (English, Portuguese)
- codepage 285 (Ireland, United Kingdom)
External links
- F.0 Appendix F. Code Pages from AS/400 International Application Development V4R2
- ICU Converter Explorer Contains more information about EBCDIC, including DBCS EBCDIC (Double Byte Character Set EBCDIC)
- ICU Charset Mapping Tables Contains Unicode mapping tables for EBCDIC and many other character sets
- LegacyJ- EBCDIC Table
- Computer Character Set Table
- Unicode Technical Report #16: UTF-EBCDIC
- http://home.arcor.de/wzwz.de/wiki/ebcdic/cc_en.htm // EBCDIC codepages with the Latin-1 character set (JavaScript)
- http://home.arcor.de/wzwz.de/wiki/ebcdic/aa70_all_pages.zip // ZIPped version