Hard disk drive

Typical hard drives of the mid-1990s.

A hard disk (or "hard disc") is a computer storage device using rigid rotating platters. It stores and retrieves digital data from a planar magnetic surface. Information is written to the disk by transmitting an electromagnetic flux through an antenna or write head that is very close to a magnetic material that changes its polarization due to the flux. The information can be read back in a reverse manner, as the magnetic fields cause electrical change in the coil or read head that passes over it.

A typical hard disk drive design consists of a central axis or spindle upon which the platters spin at a constant speed. Moving along and between the platters on a common armature are the read-write heads, with one head for each platter face. The armature moves the heads radially across the platters as they spin, allowing each head access to the entirety of the platter.

The associated electronics control the movement of the read-write armature and the rotation of the disk, and perform reads and writes on demand from the disk controller. Modern drive electronics are capable of scheduling reads and writes efficiently across the disk, and of remapping sectors of the disk which have failed.

The (mostly) sealed enclosure protects the drive internals from dust, condensation, and other sources of contamination. The hard disk's read-write heads fly on an air bearing (a cushion of air) only nanometers above the disk surface. The disk surface and the drive's internal environment must therefore be kept immaculately clean, as fingerprints, hair, dust, and even smoke particles have mountain-sized dimensions when compared to the submicroscopic gap that the heads maintain.

Some people believe a disk drive contains a vacuum — this is incorrect, as the system relies on air pressure inside the drive to support the heads at their proper flying height while the disk is in motion. Another common misconception is that a hard drive is totally sealed. A hard disk drive requires a certain range of air pressures in order to operate properly. If the air pressure is too low, the air will not exert enough force on the flying head, the head will not be at the proper height, and there is a risk of head crashes and data loss. (Specially manufactured sealed and pressurized drives are needed for reliable high-altitude operation, above about 10,000 feet.) Some modern drives include flying height sensors to detect if the pressure is too low, and temperature sensors to alert the system to overheating problems.

Hard disk drives are not airtight. They have a permeable filter (a breather filter) between the top cover and inside of the drive, to allow the pressure inside and outside the drive to equalize while keeping out dust and dirt. The filter also allows moisture in the air to enter the drive. Very high humidity year-round will cause accelerated wear of the drive's heads (by increasing stiction, or the tendency for the heads to stick to the disk surface, which causes physical damage to the disk and spindle motor). You can see these breather holes on all drives -- they usually have a warning sticker next to them, informing the user not to cover the holes. The air inside the operating drive is constantly moving too, being swept in motion by friction with the spinning disk platters. This air passes through an internal filter to remove any leftover contaminants from manufacture, any particles that may have somehow entered the drive, and any particles generated by head crash.

Due to the extremely close spacing of the heads and disk surface, any contamination of the read-write heads or disk platters can lead to a head crash: a failure of the disk in which the head scrapes across the platter surface, often grinding away the thin magnetic film. For GMR heads in particular, a minor head crash from contamination (one that does not remove the magnetic surface of the disk) will still cause the head to overheat temporarily, due to friction with the disk surface, and will render the disk unreadable until the head temperature stabilizes. Head crashes can be caused by electronic failure, a sudden power failure, physical shock, wear and tear, or poorly manufactured disks. Normally, when powering down, a hard disk moves its heads to a safe area of the disk where no data is ever kept (the landing zone). However, especially in old models, sudden power interruptions or a power supply failure can result in the drive shutting down with the heads in the data zone, which increases the risk of data loss. Newer drives are designed so that the rotational inertia of the platters is used to park the heads safely in the case of unexpected power loss. In recent years, IBM pioneered drives with "head unloading" technology, where the heads are lifted off the platters onto "ramps" instead of resting on the platters. Other manufacturers have begun using this technology as well.

Spring tension from the head mounting constantly pushes the heads towards the disk. While the disk is spinning, the heads are supported by an air bearing, and experience no physical contact wear. The sliders (the part of the head that is closest to the disk and contains the pickup coil itself) are designed to reliably survive a number of landings and takeoffs from the disk surface, though wear and tear on these microscopic components eventually takes its toll. Most manufacturers design the sliders to survive 50,000 contact cycles before the chance of damage on startup rises above 50%. However, the decay rate is not linear — when a drive is younger and has fewer start/stop cycles, it has a better chance of surviving the next startup than an older, higher-mileage drive (literally, as the head drags along the drive surface until the air bearing is established). For the Maxtor DiamondMax series of drives, for instance, the drive typically has a 0.02% chance of failing after 4,500 cycles, a 0.05% chance after 7,500 cycles, with the chance of failure rising geometrically to 50% after 50,000 cycles, and increasing ever after.

Using rigid platters and sealing the unit allows much tighter tolerances than in a floppy disk. Consequently, hard disks can store much more data than floppy disks, and access and transmit it faster. In 2004, a typical workstation hard disk might store between 80 GB and 250 GB of data, rotate at 5400 to 10,000 rpm, and have an average transfer rate of over 30 MB/s. The fastest workstation hard drives spin at 15,000 rpm. Notebook hard drives are smaller and slower than their desktop counterparts. Most spin at only 4200 rpm or 5400 rpm, though the newest top models spin at 7200 rpm.

Performance

There are three primary factors that determine hard drive performance: seek time, latency and internal data transfer rate:

Seek time is a measure of the speed with which the drive can position its read/write heads over any particular data track. Because neither the starting position of the head nor the distance from there to the desired track is fixed, seek time varies greatly, and it is almost always measured as an average seek time, though full-track (the longest possible) and track-to-track (the shortest possible) seeks are also quoted sometimes. The standard way to measure seek time is to time a large number of disk accesses to random locations, subtract the latency (see below) and take the mean. Note, however, that two different drives with identical average seek times can display quite different performance characteristics. Seek time is always measured in milliseconds (ms), and often regarded as the single most important determinant of drive performance, though this claim is debated. (More on seek time.)

All drives have rotational latency: the time that elapses between the moment when the read/write head settles over the desired data track and the moment when the first byte of the required data appears under the head. For any individual read or write operation, latency is random between zero (if the first data sector happens to be directly under the head at the exact moment that the head is ready to begin reading or writing) and the full rotational period of the drive (for a typical 7200 rpm drive, just under 8.4 ms). However, on average, latency is always equal to one half of the rotational period. Thus, all 5400 rpm drives of any make or model have 5.56 ms latency; all 7200 rpm drives, 4.17 ms; all 10,000 rpm drives, 3.0 ms; and all 15,000 rpm drives have 2.0 ms latency. Like seek time, latency is a critical performance factor and is always measured in milliseconds. (More on latency.)

The internal data rate is the speed with which the drive's internal read channel can transfer data from the magnetic media. (Or, less commonly, in the reverse direction.) Previously a very important factor in drive performance, it remains significant but less so than in prior years, as all modern drives have very high internal data rates. Internal data rates are normally measured in Megabits per second (Mbit/s).
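The latency figures quoted above follow directly from the spindle speed, since average rotational latency is half of one revolution. A minimal sketch of that arithmetic (the function names are illustrative, not taken from any drive specification):

```python
def rotation_period_ms(rpm: float) -> float:
    """Time for one full platter revolution, in milliseconds."""
    return 60_000.0 / rpm


def average_latency_ms(rpm: float) -> float:
    """Average rotational latency: half of one revolution."""
    return rotation_period_ms(rpm) / 2.0


if __name__ == "__main__":
    for rpm in (5400, 7200, 10_000, 15_000):
        print(f"{rpm:>6} rpm: period {rotation_period_ms(rpm):.2f} ms, "
              f"average latency {average_latency_ms(rpm):.2f} ms")
```

For the four speeds listed, this gives 5.56 ms, 4.17 ms, 3.00 ms and 2.00 ms respectively.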

Subsidiary performance factors include:

Access time is simply the sum of the seek time and the latency. It is important not to mistake seek time figures for access time figures!

The external data rate is the speed with which the drive can transfer data from its buffer to the host computer system. Although in theory this is vital, in practice it is usually a non-issue. It is a relatively trivial matter to design an electronic interface capable of outpacing any possible mechanical read/write mechanism, and it is routine for computer makers to include a hard drive controller interface that is significantly faster than the drive it will be attached to. As a general rule, modern ATA and SCSI interfaces are capable of dealing with at least twice as much data as any single drive can deliver; they are, after all, designed to handle two or more drives per bus even though a desktop computer usually mounts only one. For a single-drive computer, the difference between ATA-100 and ATA-133, for example, is largely one of marketing rather than performance. No drive yet manufactured can utilise the full bandwidth of an ATA-100 interface, and few are able to send more data than an ATA-66 interface can accept. The external data rate is usually measured in Megabytes per second. (MB/s — note the upper-case "B".)

Command overhead is the time it takes the drive electronics to interpret instructions from the host computer and issue commands to the read/write mechanism. In modern drives it is negligible.
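The relationship between seek time, latency and access time, and the measurement procedure described for seek time (time many random accesses, subtract the latency, take the mean), can be sketched as follows. The function names are hypothetical and the sample timings are invented purely for illustration; they are not measurements from a real drive.

```python
from statistics import mean


def average_latency_ms(rpm: float) -> float:
    """Average rotational latency: half of one revolution."""
    return 60_000.0 / rpm / 2.0


def access_time_ms(seek_ms: float, rpm: float) -> float:
    """Access time is the sum of seek time and rotational latency."""
    return seek_ms + average_latency_ms(rpm)


def average_seek_ms(access_times_ms, rpm: float) -> float:
    """Estimate average seek time from timed random accesses.

    Subtracting the average latency from every sample and then taking
    the mean is the same as subtracting it once from the mean.
    """
    return mean(access_times_ms) - average_latency_ms(rpm)


if __name__ == "__main__":
    # Invented sample timings (ms) for random accesses on a 7200 rpm drive.
    samples = [12.1, 13.4, 11.9, 14.2, 12.9]
    seek = average_seek_ms(samples, 7200)
    print(f"estimated average seek: {seek:.2f} ms")
    print(f"access time: {access_time_ms(seek, 7200):.2f} ms")
```

This makes the warning about confusing the two figures concrete: on a 7200 rpm drive the access time is always about 4.17 ms larger than the seek time.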

Access and interfaces

A hard disk is generally accessed over one of a number of bus types, including ATA (IDE, EIDE), SCSI, FireWire/IEEE 1394, and Fibre Channel. In late 2002 Serial ATA was introduced.

Back in the days of the ST-506 interface, the data encoding scheme was also important. The first ST-506 disks used Modified Frequency Modulation (MFM) encoding, which was originally developed for floppy drives (and is still used on the common "1.44 MB" (1.4 MiB) 3.5-inch floppy), and ran at a data rate of 5 megabits per second. Later on, controllers using 2,7 RLL (or just "RLL") encoding increased this by half, to 7.5 megabits per second; it also increased drive capacity by half.

Many ST-506 interface drives were only certified by the manufacturer to run at the lower MFM data rate, while other models (usually more expensive versions of the same basic drive) were certified to run at the higher RLL data rate. In some cases, the drive was overengineered just enough to allow the MFM-certified model to run at the faster data rate; however, this was often unreliable and was not recommended. (An RLL-certified drive could run on an MFM controller, but with 1/3 less data capacity and speed.)

ESDI also supported multiple data rates (ESDI drives always used 2,7 RLL, but at 10, 15 or 20 megabits per second), but this was usually negotiated automatically by the drive and controller; most of the time, however, 15 or 20 megabit ESDI drives weren't downward compatible (i.e. a 15 or 20 megabit drive wouldn't run on a 10 megabit controller). ESDI drives typically also had jumpers to set the number of sectors per track and (in some cases) sector size.

SCSI originally had just one speed, 5 MHz (for a maximum data rate of 5 megabytes per second), but this was increased dramatically later. The SCSI bus speed had no bearing on the drive's internal speed because of buffering between the SCSI bus and the drive's internal data bus; however, many early drives had very small buffers, and thus had to be reformatted to a different interleave (just like ST-506 drives) when used on slow computers, such as early IBM PC compatibles and Apple Macintoshes.

ATA drives have typically had no problems with interleave or data rate, due to their controller design, but many early models were incompatible with each other and couldn't run in a master/slave setup (two drives on the same cable). This was mostly remedied by the mid-1990s, when ATA's specification was standardised and the details began to be cleaned up, but it still causes problems occasionally (especially with CD-ROM and DVD-ROM drives, and when mixing Ultra DMA and non-UDMA devices). Serial ATA does away with master/slave setups entirely, placing each drive on its own channel (with its own set of I/O ports) instead.

Addressing modes

There are two modes of addressing the data blocks on hard disks. The older is CHS (Cylinder-Head-Sector) addressing, used on old ST-506 and ATA drives and internally by the PC BIOS; the more recent is LBA (Logical Block Addressing), used by SCSI drives and newer ATA drives (ATA drives power up in CHS mode for historical reasons).

CHS describes the disk space in terms of its physical dimensions, data-wise; this is the traditional way of accessing a disk on IBM PC compatible hardware, and while it works well for floppies (for which it was originally designed) and small hard disks, it caused problems when disks started to exceed the design limits of the PC's CHS implementation. The traditional CHS limit was 1024 cylinders, 16 heads and 63 sectors; on a drive with 512-byte sectors, this comes to 504 MiB (528 megabytes). The origin of the CHS limit lies in a combination of the limitations of IBM's BIOS interface (which allowed 1024 cylinders, 256 heads and 64 sectors; sectors were counted from 1, reducing that number to 63, giving an addressing limit of 8064 MiB or just under 8 GiB), and a hardware limitation of the AT's hard disk controller (which allowed up to 65536 cylinders and 256 sectors, but only 16 heads, putting its addressing limit at 2^28 sectors, or 128 GiB).
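The capacity limits in this paragraph are simply the product of the cylinder, head and sector counts with the sector size. A short sketch that reproduces them, assuming 512-byte sectors throughout (the function and constant names are illustrative):

```python
SECTOR_BYTES = 512
MIB = 2**20
GIB = 2**30


def chs_capacity_bytes(cylinders: int, heads: int, sectors_per_track: int,
                       sector_bytes: int = SECTOR_BYTES) -> int:
    """Total addressable bytes for a given CHS geometry."""
    return cylinders * heads * sectors_per_track * sector_bytes


# Combined limit: BIOS cylinder and sector counts, AT controller head count.
combined = chs_capacity_bytes(1024, 16, 63)
# BIOS interface alone: 1024 cylinders, 256 heads, 63 usable sectors.
bios = chs_capacity_bytes(1024, 256, 63)
# AT controller alone: 65536 cylinders, 16 heads, 256 sectors (2**28 sectors).
controller = chs_capacity_bytes(65536, 16, 256)

if __name__ == "__main__":
    print(f"combined:   {combined} bytes = {combined // MIB} MiB")
    print(f"BIOS:       {bios // MIB} MiB")
    print(f"controller: {controller // GIB} GiB")
```

The combined limit is the smaller of each pair of fields, which is why it is so much lower than either interface's own limit.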

When drives larger than 504 MiB began to appear in the mid-1990s, many system BIOSes had problems communicating with them, requiring LBA BIOS upgrades or special driver software to work correctly. Even after the introduction of LBA, similar limitations reappeared several times over the following years: at 2.1 GB, 4.2 GB, 8.4 GB (the 8064 MiB BIOS limit), 32 GB, and 128 GiB. The 2.1, 4.2 and 32 GB limits are hard limits: fitting a drive larger than the limit results in a PC that refuses to boot, unless the drive includes special jumpers to make it appear as a smaller capacity. The 8.4 GB and 128 GiB limits are soft limits: the PC simply ignores the extra capacity and reports a drive of the maximum size it is able to communicate with.

SCSI drives, however, have always used LBA addressing, which describes the disk as a linear, sequentially-numbered set of blocks. SCSI mode page commands can be used to get the physical specifications of the disk, but this is not used to read or write data; this is an artifact of the early days of SCSI, circa 1986, when a disk attached to a SCSI bus could just as well be an ST-506 or ESDI drive attached through a bridge (and therefore having a CHS configuration that was subject to change) as it could a native SCSI device. Because PCs use CHS addressing internally, the BIOS code on PC SCSI host adapters does CHS-to-LBA translation, and provides a set of CHS drive parameters that tries to match the total number of LBA blocks as closely as possible.
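The translation between a CHS triple and a linear block number is conventionally done with the formula below. This is a sketch of that standard arithmetic, not the code of any particular BIOS or host adapter; `heads` and `sectors_per_track` stand for whatever logical geometry is being presented, and the 16-head, 63-sector geometry in the example is only an assumption for illustration.

```python
def chs_to_lba(cylinder: int, head: int, sector: int,
               heads: int, sectors_per_track: int) -> int:
    """Convert a CHS address to a linear block number.

    Cylinders and heads count from 0; sectors count from 1.
    """
    if cylinder < 0 or not 0 <= head < heads or not 1 <= sector <= sectors_per_track:
        raise ValueError("CHS address outside the given geometry")
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba: int, heads: int, sectors_per_track: int):
    """Convert a linear block number back to (cylinder, head, sector)."""
    if lba < 0:
        raise ValueError("LBA must be non-negative")
    cylinder, remainder = divmod(lba, heads * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1


if __name__ == "__main__":
    # Last addressable sector of a 1024-cylinder, 16-head, 63-sector geometry.
    last = chs_to_lba(1023, 15, 63, heads=16, sectors_per_track=63)
    print(last, lba_to_chs(last, heads=16, sectors_per_track=63))
```

Because sectors are numbered from 1, block 0 corresponds to cylinder 0, head 0, sector 1.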

ATA drives can either use their native CHS parameters (only on very early drives; hard drives made since the early 1990s use multiple-zone recording, and thus don't have a set number of sectors per track), use a "translated" CHS profile (similar to what SCSI host adapters provide), or run in ATA LBA mode, as specified by ATA-2. To maintain some degree of compatibility with older computers, LBA mode generally has to be requested explicitly by the host computer. ATA drives larger than 8 GiB are always accessed by LBA, due to the 8 GiB limit described above.

Manufacturers

Most of the world's hard disks are now manufactured by just a handful of large firms: Seagate, Maxtor, Western Digital, Samsung, and the former drive manufacturing division of IBM, now sold to Hitachi. Fujitsu continues to make specialist notebook and SCSI drives but exited the mass market in 2001. Toshiba is a major manufacturer of 2.5-inch notebook drives.

Dozens of former hard drive manufacturers have gone out of business, merged, or closed their hard drive divisions, notably Conner Peripherals (merged with Seagate in 1996), Quantum (now a tape drive specialist with the hard drive division sold to Maxtor), Micropolis (sold in 1996 to Singapore Technologies, who closed it in 1997), JTS (went bankrupt in early 1999), and MiniScribe (who went bankrupt in 1990 after questionable accounting practices; they were eventually purchased by Maxtor).

Hard drive manufacturers usually state capacity using the decimal definitions of megabyte and gigabyte (10^6 and 10^9 bytes). Operating systems, however, commonly report capacity in binary units (2^20 and 2^30 bytes), so after a drive is installed it appears that a few megabytes or gigabytes have disappeared. A decimal gigabyte is about 6.9% smaller than a binary gigabyte (GiB). The term "1.44 MB", often used to describe 1440 KB floppies (actually 1.47 MB or 1.4 MiB), introduced an anomalous definition of "megabyte" as 1 x 10^3 x 2^10 bytes (1 KKiB).
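The apparent loss of capacity can be checked with a few lines of arithmetic. The sketch below converts an advertised decimal capacity to binary units and works out the "1.44 MB" floppy case; the names are illustrative only.

```python
GB = 10**9    # decimal gigabyte
GIB = 2**30   # binary gigabyte (gibibyte)
MB = 10**6    # decimal megabyte
MIB = 2**20   # binary megabyte (mebibyte)


def advertised_gb_to_gib(gigabytes: float) -> float:
    """Capacity in GiB of a drive advertised in decimal gigabytes."""
    return gigabytes * GB / GIB


def decimal_shortfall_percent() -> float:
    """How much smaller a decimal gigabyte is than a binary one, in percent."""
    return (1 - GB / GIB) * 100


# The "1.44 MB" floppy holds 1440 KiB.
floppy_bytes = 1440 * 1024

if __name__ == "__main__":
    for size in (80, 250):
        print(f"{size} GB drive = {advertised_gb_to_gib(size):.1f} GiB")
    print(f"shortfall: {decimal_shortfall_percent():.1f}%")
    print(f"floppy: {floppy_bytes / MB:.2f} MB = {floppy_bytes / MIB:.2f} MiB")
```

An 80 GB drive therefore shows up as roughly 74.5 GiB, and a 250 GB drive as roughly 232.8 GiB.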

Hard disk usage

Hard drives were originally used singly, one per computer. Techniques for guarding against hard disk failure, such as the redundant array of independent disks (RAID), were developed later. Hard disks are also found in network attached storage devices, but for large volumes of data they are most efficiently used in a Storage Area Network.
