IA-64


This is an old revision of this page, as edited by Bad Byte (talk | contribs) at 23:16, 17 September 2004 (categorized). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In computing, IA-64 (Intel Architecture-64) is a 64-bit CPU architecture developed by Intel and Hewlett-Packard for processors such as the Itanium. Unlike previous Intel x86 processors, the Itanium is not geared toward high-performance execution of the IA-32 (x86) instruction set.

Architecture

EPIC

In a mainstream "out-of-order" design, a complex decoder system examines each instruction as it flows through the pipeline and determines which instructions can be dispatched to run in parallel across the available execution units. For example, a series of instructions that say A = B + C and D = F + G will not affect each other, so they can be fed into two different execution units and run in parallel. The ability to extract instruction-level parallelism (ILP) from the instruction stream is essential to good performance in a modern CPU.
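The independence test described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a model of any real decoder: the `Instr` type and the `independent` function are hypothetical names introduced here, and each instruction is reduced to one destination and a tuple of sources.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical three-address instruction: dest = op(srcs)."""
    dest: str
    srcs: Tuple[str, ...]


def independent(first: Instr, second: Instr) -> bool:
    """Return True if the two instructions may run in parallel."""
    reads_result = first.dest in second.srcs      # second needs first's output
    overwrites_input = second.dest in first.srcs  # second clobbers first's input
    same_target = first.dest == second.dest       # both write the same name
    return not (reads_result or overwrites_input or same_target)


a = Instr("A", ("B", "C"))   # A = B + C
d = Instr("D", ("F", "G"))   # D = F + G
e = Instr("E", ("A", "D"))   # E = A + D, which needs both earlier results

print(independent(a, d))  # the two additions from the text
print(independent(a, e))  # E must wait for A
```

An out-of-order CPU performs a check of this kind in hardware, on every cycle, over a window of many in-flight instructions, which is where much of the circuit complexity comes from.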

Predicting which code can and cannot be split up this way is a very complex task. In many cases the inputs to one line depend on the output of another, but only if some other condition is true. For instance, consider a slight modification of the earlier example: A = B + C; IF A==5 THEN D = F + G. In this case the two calculations remain independent of each other, but the second command requires the result of the first calculation in order to know whether it should be run at all.

In these cases the circuitry on the CPU typically "guesses" what the condition will be. In something like 90% of all cases, an IF will be taken, suggesting that in our example the second half of the command can be safely fed into another execution unit. However, getting the guess wrong can cause a significant performance hit: the speculative result has to be thrown out, and the CPU must wait for the results of the "right" command to be calculated. Much of the performance improvement in modern CPUs is due to better prediction logic, but lately the improvements have begun to slow.

IA-64 instead relies on the compiler for this task. Even before the program is fed into the CPU, the compiler examines the code and makes the same sorts of decisions that would otherwise happen at "run time" on the chip itself. Once it has decided what paths to take, it gathers up the instructions it knows can be run in parallel, bundles them into one larger instruction, and stores the program in that form, hence the name VLIW or "very long instruction word." Intel calls its variant of this approach EPIC (Explicitly Parallel Instruction Computing).
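A compiler's bundling step might look roughly like the greedy pass below. This is a deliberately simplified sketch, not Intel's algorithm: `Instr`, `conflicts` and `bundle` are hypothetical names, the `width` of three slots per bundle is an assumption, and a real compiler would also reorder instructions and respect which execution unit each slot can use.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical three-address instruction: dest = op(srcs)."""
    dest: str
    srcs: Tuple[str, ...]


def conflicts(earlier: Instr, later: Instr) -> bool:
    """True if the two instructions must not share a bundle."""
    return (earlier.dest in later.srcs
            or later.dest in earlier.srcs
            or earlier.dest == later.dest)


def bundle(program: List[Instr], width: int = 3) -> List[List[Instr]]:
    """Greedily pack instructions, in program order, into bundles."""
    bundles: List[List[Instr]] = []
    current: List[Instr] = []
    for ins in program:
        # Close the bundle when it is full or the next instruction
        # depends on something already inside it.
        if len(current) == width or any(conflicts(e, ins) for e in current):
            bundles.append(current)
            current = []
        current.append(ins)
    if current:
        bundles.append(current)
    return bundles


program = [
    Instr("A", ("B", "C")),  # A = B + C
    Instr("D", ("F", "G")),  # D = F + G
    Instr("E", ("A", "D")),  # E = A + D, depends on the first two
    Instr("H", ("B", "F")),  # H = B + F, independent of E
]

for group in bundle(program):
    print([ins.dest for ins in group])
```

Because the grouping is computed once and stored in the program, the processor only has to unpack each bundle and issue its slots; it does not need to rediscover the dependencies.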

Moving this task from the CPU to the compiler has several advantages. Firstly, the compiler can spend considerably more time examining the code, a luxury the chip itself does not have because it must decide as quickly as possible. The compiler's analysis can therefore be considerably more thorough than the equivalent analysis performed by the chip's circuitry. Secondly, the prediction circuitry is quite complex, and offloading prediction to the compiler reduces that complexity enormously. The CPU no longer has to examine anything; it simply breaks the long instruction apart again and feeds the pieces to the execution units. Thirdly, doing the prediction in the compiler is a one-off cost, rather than one incurred every time the program is run.

The downside is that a running program's behaviour is not always obvious from the code used to generate it, and may vary considerably depending on the actual data being processed. The out-of-order processing logic of a mainstream CPU can make decisions on the basis of actual run-time data, which the compiler can only guess at. It is therefore possible for the compiler to get its prediction wrong even more often than comparable logic placed on the CPU. The VLIW design thus relies heavily on the quality of its compilers: it trades reduced microprocessor hardware complexity for increased compiler software complexity.

Registers

The IA-64 architecture includes a very generous complement of registers: 128 each of 82-bit floating-point and 64-bit integer registers. Beyond the sheer number, IA-64 adds a register stack, managed by the Register Stack Engine, and a register rotation mechanism. Rather than relying on the typical spill/fill code or fixed register windows used in other processors, the Itanium can bring a fresh set of registers into view to accommodate new function parameters or temporaries. Register rotation combined with predication is also very effective in executing loops that the compiler has unrolled or software-pipelined automatically.
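Register rotation can be pictured as renaming: the register names used by the program stay fixed while a base offset shifts which physical register each name refers to. The Python sketch below is a toy model under stated assumptions, not a description of the actual hardware. The class `RotatingRegisters`, the region size of 8 and the starting name 32 are illustrative choices.

```python
class RotatingRegisters:
    """Toy model of a rotating register region (illustrative only)."""

    def __init__(self, size=8, first=32):
        self.size = size              # number of registers in the region
        self.first = first            # lowest logical register number
        self.base = 0                 # rotation offset, changed by rotate()
        self.physical = [0] * size    # backing storage

    def _slot(self, logical):
        # Map a logical register number to a physical slot.
        return (logical - self.first + self.base) % self.size

    def read(self, logical):
        return self.physical[self._slot(logical)]

    def write(self, logical, value):
        self.physical[self._slot(logical)] = value

    def rotate(self):
        # After rotating, the value written as register N is seen as N + 1.
        self.base = (self.base - 1) % self.size


regs = RotatingRegisters()
regs.write(32, 111)    # iteration 1 produces a value in r32
regs.rotate()
regs.write(32, 222)    # iteration 2 reuses the same name, r32
regs.rotate()
print(regs.read(33), regs.read(34))  # values from iterations 2 and 1
```

Each loop iteration can use the same register names without overwriting values that earlier iterations still have in flight, which is what lets overlapping iterations share one copy of the loop body.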

Instruction set

The architecture also provides a CISC-like complement of instructions, including explicit instructions for multimedia operations and for floating-point operations.

Despite its great capabilities, the IA-64 instruction set is notoriously difficult to program directly. Intel has strongly recommended against assembly programming on Itanium in general, advising developers instead to use its C++ compiler, which contains platform-specific heuristics.

A raw Itanium, when first booted, is actually missing some of its instruction functionality. A boot-ROM-like program called an EFI program is loaded; it loads additional code into on-chip memory to define these instructions and performs other boot-time configuration, such as choosing the execution mode of the processor (64-bit versus 32-bit). This design allows an Itanium system to be deployed with different capabilities depending on the contents of the EFI program.

IA-32 support

In order to support IA-32, the Itanium can switch into 32-bit mode with special jump escape instructions. The IA-32 instructions have been mapped to the Itanium's functional units. However, since the Itanium is built primarily for speed of its EPIC-style instructions, and because it has no out-of-order execution capabilities, IA-32 code executes at a severe performance penalty compared to either IA-64 mode or Intel's Pentium line of processors. For example, the Itanium functional units do not automatically generate integer flags as a side effect of ordinary ALU computation, and do not intrinsically support multiple outstanding unaligned memory loads. There have been reports that Intel is seeking a software-emulation-based solution (much like Transmeta's) for executing IA-32 code in future versions of Itanium, to replace the current hardware solution.
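The integer-flags point can be illustrated by computing the x86 status flags for a 32-bit addition explicitly, which is roughly the extra work that falls to hardware or software when the ALU does not produce them as a side effect. The Python sketch below is only an illustration: `add32_with_flags` is a hypothetical helper, and it covers just four flags (carry, zero, sign and overflow) for the ADD case.

```python
MASK32 = 0xFFFFFFFF


def add32_with_flags(a, b):
    """Add two 32-bit values and derive four x86-style status flags."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    # Signed overflow: the operands share a sign that the result lacks.
    overflow = ((~(a ^ b)) & (a ^ result)) >> 31
    flags = {
        "CF": full > MASK32,        # carry out of bit 31
        "ZF": result == 0,          # result is zero
        "SF": bool(result >> 31),   # sign bit of the result
        "OF": bool(overflow & 1),   # signed overflow
    }
    return result, flags


print(add32_with_flags(0x7FFFFFFF, 1))
print(add32_with_flags(0xFFFFFFFF, 1))
```

On an x86 processor all of this happens implicitly with every ADD; the sketch shows how several additional operations are needed per arithmetic instruction when the flags have to be derived separately.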

Competitors

Although other 64-bit architectures have existed for a long time, most (MIPS, Alpha, PA-RISC) have faded from the marketplace. Itanium's remaining competition for the 64-bit server and workstation market appears to be the newcomer AMD with its AMD64 architecture, and the entrenched rivals: IBM's POWER architecture and Sun's Sparc64 architecture. Apple may also challenge Intel with its Xserve product line based on the IBM PowerPC architecture.

Intel has so far marketed the Itanium only in the high-end workstation, server and supercomputer space.

In response to favourable industry reaction to AMD64, Intel plans for future versions of the Xeon to support its EM64T extensions to IA-32, which are largely instruction-set compatible with AMD64.
