Computer virus

In computer security terminology, a virus is a self-replicating program that spreads by inserting copies of itself into other executable code or documents (for a complete definition, see below). Thus, a computer virus behaves in a way similar to a biological virus, which spreads by inserting itself into living cells. Extending the analogy, the insertion of the virus into a program is termed infection, and the infected file (or executable code that is not part of a file) is called a host. Viruses are one of several types of malware, or malicious software. In common parlance, the term virus is often extended to refer to computer worms and other sorts of malware.

While viruses can be intentionally destructive (for example, by destroying data), many other viruses are fairly benign or merely annoying. Some viruses have a delayed payload, which is sometimes called a bomb. For example, a virus might display a message on a specific day or wait until it has infected a certain number of hosts. However, the predominant negative effect of viruses is their uncontrolled self-reproduction, which wastes or overwhelms computer resources.

Today (2004), viruses are somewhat less common than network-borne worms, due to the popularity of the Internet. Anti-virus software, originally designed to protect computers from viruses, has in turn expanded to cover worms and other threats such as spyware.

Definition

A virus is a type of program that can replicate itself by making (possibly modified) copies of itself. The main criterion for classifying a piece of executable code as a virus is that it spreads itself by means of 'hosts'. A virus can only spread from one computer to another when its host is taken to the uninfected computer, for instance by a user sending it over a network or carrying it on a removable disk. Viruses are sometimes confused with worms. A worm, however, can spread itself to other computers without needing to be transferred as part of a host.

Viruses can infect different types of hosts. The most common targets are executable files that contain application software or parts of the operating system. Viruses have also infected the executable boot sectors of floppy disks, script files of application programs, and documents that can contain macro scripts. Additionally, viruses can infect files in ways other than simply inserting a copy of their code into the code of the host program. For example, a virus can overwrite its host with the virus code, or it can use a trick to ensure that the virus program is executed when the user wants to execute the (unmodified) host program. Viruses have existed for many different operating systems, including MS-DOS, AmigaDOS, and Mac OS; today, the majority of viruses run on Microsoft Windows.

A legitimate application program that can copy itself as a side-effect of its normal function (e.g. backup software) is not considered a virus. Some programs that were apparently intended as viruses cannot reliably self-replicate, because the infection routine contains bugs. For example, a buggy virus can insert copies of itself into host programs, but these copies never get executed and are thus unable to spread the virus. Self-replicating programs that have very limited spreading capabilities because of bugs are sometimes not considered viruses.

Use of the word "virus"

The term "virus" was first used in an academic publication by Fred Cohen in his 1984 paper Experiments with Computer Viruses, where he credits Len Adleman with coining it. However, David Gerrold's 1972 science fiction novel When H.A.R.L.I.E. was One includes a description of a fictional computer program called "VIRUS" that worked just like a virus (and was countered by a program called "ANTIBODY"), and John Brunner's 1975 novel The Shockwave Rider describes programs known as "tapeworms" which spread through a network for the purpose of deleting data. In 1973, the phrase "computer virus" was used in the movie Westworld to describe a malicious program that emerged in the computer system of the theme park. The term "computer virus" in its current usage also appears in the comic book Uncanny X-Men No. 158, published in 1982. Thus, although Cohen's use of "virus" may have been the first academic use, the term had been used earlier.

The term 'virus' is often used in common parlance to describe all kinds of malware (malicious software), including those that are more properly classified as worms or trojans. Most popular anti-virus software packages defend against all of these types of attack.

The plural of virus is viruses, not virii, which is sometimes used incorrectly, both knowingly and otherwise. See plural of virus.

History

A program called "Elk Cloner" is credited with being the first computer virus to appear "in the wild" -- that is, outside the single computer or lab where it was created. Written in 1982 by Rich Skrenta, it attached itself to the Apple DOS 3.3 operating system and spread by floppy disk.

Since the mid-1990s, viruses which infect operating systems or applications directly have been eclipsed by macro viruses. Written in the scripting languages for Microsoft programs such as Word and Outlook, these viruses spread in the Windows monoculture by infecting documents and sending infected e-mail.

Reasons for creating viruses

Unlike biological viruses, computer viruses do not simply evolve by themselves. They cannot come into existence spontaneously, nor can they be created by bugs in regular programs. They are deliberately created by programmers, or by people who use virus creation software.

Virus writers can have various reasons for creating and spreading malware. Viruses have been written as research projects, pranks, vandalism, to attack the products of specific companies, and to distribute political messages. Some people think that the majority of viruses are created with malicious intent. On the other hand, some virus writers consider their creations to be a work of art, and see virus writing as a creative hobby. Additionally, many virus writers oppose deliberately destructive payload routines. Some viruses were intended as "good viruses". They spread improvements to the programs they infect, or delete other viruses. These viruses are, however, quite rare, still consume system resources, and may accidentally damage systems they infect. Moreover, they normally operate without asking for permission of the owner of the computer. Since self-replicating code causes many complications, it is questionable if a well-intentioned virus can ever solve a problem in a way which is superior to a regular program that does not replicate itself.

Releasing computer viruses (as well as worms) is a crime in most jurisdictions.

Anatomy of viruses

Some viruses just consist of a finder and a replicator. The finder is responsible for finding new files to infect. For each new executable file the finder finds, it calls for the replicator to infect that file.

For simple viruses, the replicator's task is to:

Open the new file
Append the virus code to the executable file
Save the executable's starting point
Change the executable's starting point so that it points to the start location of the newly copied virus code
Store the old start location in the virus so that the virus branches to that location once it has finished executing
Save the changes to the executable file
Return to the finder so that it can find new files for the replicator to infect.

Replication strategies

A virus requires several features from its host software to successfully duplicate itself. It must be permitted to execute code and write to memory. For this reason, many viruses attach themselves to useful programs, in the hope that users will run those programs (and therefore the virus).

Before computer networks became widespread, most viruses spread on removable media, particularly floppy disks. In the early days of personal computers, many users regularly exchanged information and programs on floppies. Some viruses spread by infecting programs stored on these disks, while others installed themselves into the disk boot sector, ensuring that they would be run when the user booted the computer from the disk.

As bulletin board systems and online software exchange became popular in the late 1980s and early 1990s, more viruses were written to infect popularly traded software. Shareware and bootleg software were equally common vectors for viruses on BBSes. Within the "pirate scene" of hobbyists trading illicit copies of commercial software, traders in a hurry to obtain the latest applications and games were easy targets for viruses.

Many personal computers are now connected to the Internet and to local-area networks. Today's viruses take advantage of standard network protocols such as the World Wide Web, e-mail, and file sharing systems to spread, blurring the line between viruses and worms.

Methods to avoid detection

In order to avoid detection by users, some viruses employ different kinds of obfuscation. Some old viruses, especially on the MS-DOS platform, make sure that the "last modified" date of a host file stays the same when the file is infected by the virus. This approach does not fool anti-virus software, however.

Some viruses can infect files without increasing their sizes or damaging the files. They accomplish this by overwriting unused areas of executable files. These are called cavity viruses. For example, the CIH virus, or Chernobyl virus, infects Portable Executable files. Because those files have many empty gaps, the virus, which is 1 kilobyte in length, does not add to the size of the file.

As computers and operating systems grow larger and more complex, old hiding techniques need to be updated or replaced.

Stealth

Some viruses try to fool anti-virus software by intercepting its requests to the operating system. A virus can hide itself by ensuring that a request by anti-virus software to read an infected file is passed to the virus instead of to the operating system. The virus can then return an uninfected version of the file to the anti-virus software, so that the file appears to be clean. Modern anti-virus software employs various techniques to counter the stealth mechanisms of viruses. However, the only completely reliable method to avoid stealth is to boot from a medium that is known to be clean.

Self-modification

Most modern anti-virus programs try to find virus patterns inside ordinary programs by scanning them for so-called virus signatures. A signature is a characteristic byte pattern that is part of a certain virus or family of viruses. If a virus scanner finds such a pattern in a file, it notifies the user that the file is infected. The user can then delete or (in some cases) 'clean' the infected file. Some viruses try to make detection by means of signatures difficult or impossible by modifying their code on each infection. That is, each infected file contains a different variant of the virus.
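
The signature matching described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular product works: the signature names and byte patterns in SIGNATURES are invented for the example, and the function name scan_bytes is hypothetical.

```python
# Hypothetical signature database: the names and byte patterns are invented
# for illustration and do not correspond to any real virus.
SIGNATURES = {
    "Example.A": bytes.fromhex("deadbeef0102"),
    "Example.B": b"\x90\x90\xeb\xfe",
}


def scan_bytes(data: bytes, signatures: dict[str, bytes] = SIGNATURES) -> list[str]:
    """Return the sorted names of all signatures whose pattern occurs in data."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)


# A buffer without any known pattern, and one that embeds the "Example.A" pattern.
clean = b"ordinary program bytes"
infected = b"header" + bytes.fromhex("deadbeef0102") + b"trailer"

print(scan_bytes(clean))
print(scan_bytes(infected))
```

A scanner built this way only recognises byte patterns it already knows, which is why code that changes on each infection is a problem for it.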

Simple self-modifications

In the past, some viruses modified themselves only in fairly simple ways. For example, they regularly exchanged subroutines in their code. This poses no problem for a somewhat advanced virus scanner, however.

Encryption with a variable key

A more advanced method is the use of simple encryption to encode the virus. In this case, the virus consists of a small decrypting module and an encrypted copy of the virus code. If the virus is encrypted with a different key for each infected file, the only part of the virus that remains constant is the decrypting module. In this case, a virus scanner cannot directly detect the virus using signatures, but it can still detect the decrypting module, which still makes indirect detection of the virus possible.

The decryption techniques that these viruses employ are mostly fairly simple, often just XORing each byte with a randomized key that was saved by the parent virus. The use of XOR operations has the additional advantage that the encryption and decryption routines are the same (a xor b = c, c xor b = a).
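
The XOR identity can be checked on harmless sample data. The sketch below assumes a single-byte key, which is a simplification chosen for the example; the function name xor_bytes and the key values are hypothetical. It shows that the same routine both encodes and decodes, and that the same data encoded under two different keys yields different bytes, so a fixed signature for the encoded body cannot match both.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key (0-255)."""
    if not 0 <= key <= 255:
        raise ValueError("key must fit in one byte")
    return bytes(b ^ key for b in data)


message = b"harmless sample data"

# The same input encoded under two different keys gives different bytes.
encoded_a = xor_bytes(message, 0x5A)
encoded_b = xor_bytes(message, 0xC3)

# Applying the same routine again with the same key restores the original.
restored = xor_bytes(encoded_a, 0x5A)

print(restored == message)
print(encoded_a != encoded_b)
```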

Polymorphic code

Polymorphic code was the first technique that posed a serious threat to virus scanners. Like a simple encrypted virus, a polymorphic virus infects files with an encrypted copy of itself, which is decoded by a decryption module. In a polymorphic virus, however, the decryption module is also modified on each infection. A well-written polymorphic virus therefore has no parts that stay the same from one infection to the next, which makes it impossible to detect directly using signatures. Anti-virus software can still detect it by running the decryption module in an emulator and examining the decrypted body, or by statistical pattern analysis of the encrypted virus body. To generate polymorphic code, the virus needs a polymorphic engine (also called a mutating engine or mutation engine) somewhere in its encrypted body.

Metamorphic code

To avoid being detected by emulation, some viruses rewrite themselves completely each time they are to infect new executables. Viruses that use this technique are said to be metamorphic. To enable metamorphism, a metamorphic engine is needed. A metamorphic virus is usually very large and complex. For example, W32/Simile consisted of over 14,000 lines of assembly code, 90% of which was part of the metamorphic engine.

Viruses and legitimate software

The vulnerability of operating systems to viruses

Another analogy to biological viruses: just as genetic diversity in a population decreases the chance of a single disease wiping out a population, the diversity of software systems on a network similarly limits the destructive potential of viruses.

This became a particular concern in the 1990s, when Microsoft gained market dominance in desktop operating systems and office software. Users of Microsoft software (especially networking software such as Microsoft Outlook and Microsoft Internet Explorer) are particularly vulnerable to the spread of viruses, especially since such complicated software inevitably includes many errors. Integrated applications, applications with scripting languages that have access to the file system (e.g. Visual Basic Script, or VBS), and applications with networking features are also particularly vulnerable.

Although Windows is the most popular operating system for virus writers, some viruses also exist on other platforms. It is important to note that any operating system that allows third-party programs to run can theoretically run viruses. However, some operating systems are less secure than others. Unix-based OSes (and NTFS-aware applications on Windows NT based platforms) only allow their users to run executables within their protected space in their own directories.

A well-patched and well-maintained Unix system is very well secured against viruses. Windows has the same type of scripting ability as Unix-based systems, but does not natively block normal users from executing such scripts written by a third party, as Unix does for users who are not running as root. More recently, Microsoft's Outlook (but not Outlook Express) e-mail client has developed similar features when dealing with executable file types that Outlook may download as attachments. Ordinary users would do well to patch their operating systems and e-mail clients to prevent viruses and worms from reproducing through security "holes" that prudence (and most virus scanners) cannot guard against.

The role of software development

Because software is often designed with security features to prevent unauthorized use of system resources, many viruses must exploit software bugs in a system or application to spread. Software development strategies which produce large numbers of bugs will generally also produce potential exploits.

Closed-source software development as practiced by Microsoft and other proprietary software companies is also seen by some as a security weakness. Open source software such as GNU/Linux, for example, allows all users to look for and fix security problems without relying on a single vendor. Some advocate that proprietary software makers practice vulnerability disclosure to ameliorate this weakness.

Anti-virus software and other countermeasures

Many users install anti-virus software that can detect and eliminate known viruses after the computer downloads or mounts the executable. Such software works by examining the contents of the computer's memory (RAM), boot sectors, and the files stored on fixed or removable drives (hard drives, floppy drives), and comparing them against a database of known virus signatures. Some anti-virus programs can also scan opened files, as well as sent and received e-mails, 'on the fly' in a similar manner; this practice is known as "on-access scanning". Some virus scanners can also warn a user if a file is likely to contain a virus based on the file type, and some anti-virus vendors claim the effective use of other types of heuristic analysis. Some industry groups do not like this practice because it often increases the number of false positives the anti-virus software reports. Anti-virus software does not change the underlying capability of host software to transmit viruses. Users must therefore update their software regularly to patch security holes. Anti-virus software also needs to be updated in order to gain knowledge about the latest threats and hoaxes.
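
A warning based purely on file type might look like the following sketch. The extension list in RISKY_EXTENSIONS and the function name warn_by_file_type are assumptions made for illustration; actual products use their own criteria.

```python
from pathlib import PurePath

# Illustrative list only (an assumption for this sketch); real scanners
# maintain their own, much longer lists of executable file types.
RISKY_EXTENSIONS = {".exe", ".com", ".scr", ".pif", ".bat", ".vbs"}


def warn_by_file_type(filename: str) -> bool:
    """Return True if the file's final extension is on the risky list."""
    return PurePath(filename).suffix.lower() in RISKY_EXTENSIONS


for name in ["report.txt", "setup.exe", "photo.jpg.VBS", "README"]:
    print(name, warn_by_file_type(name))
```

Because the check looks only at the name, it flags every file of a listed type, including legitimate ones such as an ordinary installer. This is the kind of false positive that such heuristics tend to produce.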

References

Fred Cohen's 1984 paper
An editorial on beneficial viruses (con)
For a thorough, hypothetical pro discussion, see: "Are Good Viruses still a Bad idea?"
Malicious Code & Viruses - Articles, Links, and Whitepapers
For instructions on how to reject viruses at SMTP-time instead of spamming innocent people, see: Rejecting Viruses at SMTP-time
VX Heaven - Sources & Guides
Hackpalace Virii
The Wildlist List of viruses and worms 'in the wild' (i.e. regularly encountered by anti-virus companies)