Software testing

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 164.164.170.25 at 07:02, 11 October 2005 (edit to the Integration testing section). It is a permanent link to this revision, which may differ significantly from the current revision.

Software testing is a process used to help identify the correctness, completeness and quality of developed computer software. However, testing can never completely establish the correctness of computer software. Only formal verification can, in principle, prove that there are no defects. (Even then, because the proofs and proof engines are themselves complex systems built by fallible humans, formal methods do not justify complete confidence.)

There are many approaches to software testing, but effective testing of complex products is essentially a process of investigation, not merely a matter of creating and following rote procedures. One definition of testing is "the process of questioning a product in order to evaluate it", where the "questions" are things the tester tries to do with the product, and the product answers with its behavior in response to the tester's probing. Although most of the intellectual processes of testing are nearly identical to those of review or inspection, the word testing connotes dynamic analysis of the product: putting the product through its paces.

The quality of an application can, and normally does, vary widely from system to system, but common quality attributes include reliability, stability, portability, maintainability and usability. See the ISO 9126 standard for a more complete list of attributes and criteria.

Introduction

In general, software engineers distinguish between software faults and software failures. In a failure, the software does not do what the user expects. A fault is a programming error that may or may not actually manifest as a failure; it can also be described as an error in the semantics of a computer program. A fault becomes a failure only when specific computational conditions are met, one of which is that the faulty portion of the software actually executes on the CPU. A fault can also turn into a failure when the software is ported to a different hardware platform or compiler, or when the software is extended.

Software testing may be viewed as a sub-field of software quality assurance (SQA), but it typically exists independently, and some companies have no SQA function at all. In SQA, software process specialists and auditors take a broader view of software and its development. They examine and change the software engineering process itself, either to reduce the number of faults that end up in the code or to deliver faster.

Regardless of the methods used or the level of formality involved, the desired result of testing is enough confidence in the software that the developers believe it has an acceptable defect rate. What constitutes an acceptable defect rate depends on the nature of the software: an arcade video game that simulates flying an airplane would presumably tolerate far more defects than software used to control an actual airliner.

A problem with software testing is that the number of defects in a software product can be very large, and the number of configurations of the product larger still. Bugs that occur infrequently are difficult to find in testing. A rule of thumb is that a system expected to function without faults for a certain length of time must already have been tested for at least that length of time. This has severe consequences for projects that aim to produce long-lived, reliable software.
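To make the two difficulties above concrete, the sketch below computes how quickly the number of configurations grows when independent options combine, and the chance that a number of test runs triggers a defect that manifests only rarely. The option names and the per-run probability are hypothetical, and treating runs as independent is a simplifying assumption.

```python
from itertools import product

def configuration_count(options: dict[str, list]) -> int:
    """Number of distinct configurations: the product of each option's choices."""
    count = 1
    for choices in options.values():
        count *= len(choices)
    return count

def detection_probability(p_per_run: float, runs: int) -> float:
    """Chance that at least one of `runs` independent runs triggers a defect
    that manifests with probability `p_per_run` on each run (an assumption)."""
    return 1.0 - (1.0 - p_per_run) ** runs

# Hypothetical product options, for illustration only.
options = {
    "os": ["linux", "windows", "macos"],
    "locale": ["en", "de", "ja", "fr"],
    "database": ["sqlite", "postgres"],
    "cache_enabled": [True, False],
}

if __name__ == "__main__":
    print(configuration_count(options))         # 3 * 4 * 2 * 2 = 48
    # Enumerating every configuration is feasible here, but not for real products.
    print(len(list(product(*options.values()))))
    # A defect seen once in 10,000 runs is usually missed by 1,000 runs.
    print(round(detection_probability(1e-4, 1_000), 3))
```

Each added option multiplies the count, and the detection probability stays low until the number of runs approaches the defect's inverse frequency, which is one way to read the rule of thumb above.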

A common practice is for software testing to be performed by an independent group of testers after the software product is finished and before it is shipped to the customer. This practice often results in the testing phase being used as a project buffer to compensate for delays elsewhere. Another practice is to start testing when the project starts and to continue it as an ongoing process until the project finishes.

Another common practice is to develop test suites during technical support escalation procedures. Such tests are then maintained in regression test suites to help ensure that future updates to the software do not repeat any of the known mistakes.

It is commonly believed that the earlier a defect is found, the cheaper it is to fix.

In counterpoint, some emerging software disciplines, such as extreme programming and the agile software development movement, adhere to a "test-driven software development" model. In this process, unit tests are written first by the programmers (often with pair programming, in the extreme programming methodology). These tests initially fail, as expected. Then, as code is written, it passes incrementally larger portions of the test suites. The test suites are continuously updated as new failure conditions and corner cases are discovered, and they are integrated with any regression tests that are developed.
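A minimal sketch of the test-first cycle, using Python's standard unittest module. The function `is_leap_year` is a hypothetical example; in test-driven development the test class would be written first and would fail until the function below it is implemented.

```python
import unittest

# Step 1: the tests are written first. Run before is_leap_year exists
# (or while it is only a stub), they fail, as expected.
class LeapYearTest(unittest.TestCase):
    def test_ordinary_year_divisible_by_four(self):
        self.assertTrue(is_leap_year(2004))

    def test_century_is_not_leap(self):
        self.assertFalse(is_leap_year(1900))

    def test_every_fourth_century_is_leap(self):
        self.assertTrue(is_leap_year(2000))

    def test_year_not_divisible_by_four(self):
        self.assertFalse(is_leap_year(2005))

# Step 2: just enough code is written to make the tests pass.
def is_leap_year(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

if __name__ == "__main__":
    unittest.main()
```

When a new corner case is discovered, the same order applies: a new failing test is added first, and the code is changed until the whole suite passes again.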

Unit tests are maintained along with the rest of the software source code and generally integrated into the build process (with inherently interactive tests being relegated to a partially manual build acceptance process).

The software, tools, samples of data input and output, and configurations are all referred to collectively as a test harness.
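The sketch below shows a very small harness in that sense: code under test, a configuration, sample inputs with expected outputs, and a runner that compares them. The names `Sample`, `run_harness` and `parse_price`, and the sample data, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Sample:
    """One sample of input data together with its expected output."""
    given: Any
    expected: Any

def run_harness(func: Callable[..., Any], samples: list[Sample], config: dict) -> list[str]:
    """Run `func` on every sample with the given configuration and return a
    description of each mismatch (an empty list means every sample passed)."""
    failures = []
    for sample in samples:
        try:
            actual = func(sample.given, **config)
        except Exception as exc:  # a crash is recorded as a failure, not raised
            failures.append(f"{sample.given!r}: raised {exc!r}")
            continue
        if actual != sample.expected:
            failures.append(f"{sample.given!r}: expected {sample.expected!r}, got {actual!r}")
    return failures

# Hypothetical code under test: parse a price string into integer cents.
def parse_price(text: str, decimal_sep: str = ".") -> int:
    whole, _, frac = text.strip().partition(decimal_sep)
    return int(whole) * 100 + int((frac + "00")[:2])

samples = [Sample("3.50", 350), Sample(" 12 ", 1200), Sample("0.05", 5)]

if __name__ == "__main__":
    print(run_harness(parse_price, samples, {"decimal_sep": "."}) or "all samples passed")
```

Keeping the configuration separate from the samples lets the same data be replayed under different settings, for example `{"decimal_sep": ","}` for a locale that writes "3,50".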

Alpha testing

In software development, testing is usually required before release to the general public. Developers often test the software in-house in what is known as alpha testing, which is often performed under a debugger or with hardware-assisted debugging to catch bugs quickly.

The software can then be handed over to testing staff for further examination in an environment similar to the one in which it is intended to be used, typically using black box techniques. This is often known as the second stage of alpha testing.

Beta testing

Following that, limited public tests known as beta versions are often released to groups of people so that further testing can help ensure the product has few faults or bugs. Sometimes, beta versions are made available to the general public to gather feedback from as many future users as possible.

Gamma testing is a little-known informal phrase that refers derisively to the release of "buggy" (defect-ridden) products. It is not a term of art among testers, but rather an example of referential humor. Cynics have referred to all software releases as "gamma testing", since defects are eventually found in almost all commercial, commodity and publicly available software. Some classes of embedded and highly specialized process control software are tested far more thoroughly and subjected to other forms of rigorous software quality assurance, particularly software that controls "life-critical" equipment, where a failure can result in injury or death. For counterexamples, see Ivars Peterson's Fatal Defect.

White-box and black-box testing

In the terminology of testing professionals (software and some hardware) the phrases "white box", or "glass box", and "black box" testing refer to whether the test case developer has access to the source code of the software under test, and whether the testing is done through (simulated) user interfaces or through the application programming interfaces either exposed by (published) or internal to the target.

In white box testing the test developer has access to the source code and can write code that links into the libraries which are linked into the target software. This is typical of unit tests, which only test parts of a software system. They ensure that components used in the construction are functional and robust to some degree.
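Because a white box tester can read the source, tests can be chosen deliberately to exercise each branch. The function `shipping_cost` and its thresholds are hypothetical; the point of the sketch is that each test targets a branch visible in the code rather than a behavior described in a user manual.

```python
import unittest

def shipping_cost(weight_kg: float, express: bool) -> float:
    """Hypothetical internal component with several branches."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    base = 5.0 if weight_kg <= 2 else 5.0 + 1.5 * (weight_kg - 2)
    return base * 2 if express else base

class ShippingCostWhiteBoxTest(unittest.TestCase):
    # Each test is chosen by reading the code above: one per branch.
    def test_rejects_non_positive_weight(self):
        with self.assertRaises(ValueError):
            shipping_cost(0, express=False)

    def test_light_parcel_uses_flat_rate(self):
        self.assertEqual(shipping_cost(2, express=False), 5.0)

    def test_heavy_parcel_adds_per_kg_charge(self):
        self.assertEqual(shipping_cost(4, express=False), 8.0)

    def test_express_doubles_the_price(self):
        self.assertEqual(shipping_cost(4, express=True), 16.0)

if __name__ == "__main__":
    unittest.main()
```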

In black box testing the test engineer only accesses the software through the same interfaces that the customer or user would, or possibly through remotely controllable automation interfaces that connect another computer or another process to the target of the test. For example, a test harness might push virtual keystrokes and mouse or other pointer operations into a program through any inter-process communication mechanism, with the assurance that these events are routed through the same code paths as real keystrokes and mouse clicks.
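Injecting real keystrokes is platform-specific, so the sketch below uses a simpler stand-in for the same idea: the test drives the target as a separate process and communicates with it only through a public interface, here standard input and output over a pipe. `TARGET_PROGRAM` is a hypothetical placeholder for the real executable; the checking code never inspects its internals.

```python
import subprocess
import sys

# Stand-in for the product under test: reads lines from stdin and prints each
# one upper-cased. In practice this would be the shipped program; the tester
# only sends input and inspects output, as a user's terminal would.
TARGET_PROGRAM = "import sys\nfor line in sys.stdin:\n    print(line.rstrip('\\n').upper())\n"

def run_target(user_input: str) -> subprocess.CompletedProcess:
    """Drive the target as a separate process through its public interface."""
    return subprocess.run(
        [sys.executable, "-c", TARGET_PROGRAM],
        input=user_input,
        capture_output=True,
        text=True,
        timeout=10,
    )

def black_box_check(user_input: str, expected_stdout: str) -> bool:
    """Pass only if the process exits cleanly and prints exactly what is expected."""
    result = run_target(user_input)
    return result.returncode == 0 and result.stdout == expected_stdout

if __name__ == "__main__":
    print(black_box_check("hello\nworld\n", "HELLO\nWORLD\n"))
```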

In recent years the term grey box (or gray box, in the United States) testing has come into common usage. The typical grey box tester is permitted to set up or manipulate the testing environment, for example by seeding a database, and can view the state of the product after their actions, for example by running a SQL query against the database to check the values of columns. The term is used almost exclusively for client-server testers or others who use a database as a repository of information, but it can also apply to a tester who has to manipulate XML files (a DTD or an actual XML file) or configuration files directly. It can also be used for testers who know the internal workings or algorithm of the software under test and can write tests specifically for the anticipated results.
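The seed-then-query pattern described above can be sketched with Python's built-in sqlite3 module and an in-memory database. The `orders` table and the `ship_order` function are hypothetical stand-ins for the product's schema and code.

```python
import sqlite3

# Hypothetical product code under test: marks an order as shipped.
def ship_order(conn: sqlite3.Connection, order_id: int) -> None:
    conn.execute("UPDATE orders SET status = 'shipped' WHERE id = ?", (order_id,))
    conn.commit()

def grey_box_test() -> dict[int, str]:
    conn = sqlite3.connect(":memory:")
    # 1. Set up the environment directly: create and seed the database.
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
    conn.executemany("INSERT INTO orders (id, status) VALUES (?, ?)",
                     [(1, "pending"), (2, "pending")])
    conn.commit()

    # 2. Exercise the product through its normal interface.
    ship_order(conn, 1)

    # 3. Inspect the resulting state with SQL instead of relying on the UI.
    rows = dict(conn.execute("SELECT id, status FROM orders").fetchall())
    conn.close()
    # Only order 1 should change; order 2 must be left untouched.
    assert rows == {1: "shipped", 2: "pending"}, rows
    return rows

if __name__ == "__main__":
    grey_box_test()
    print("grey box test passed")
```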

Whereas "alpha" and "beta" refer to stages of the software before release (and also, implicitly, to the size of the testing community and the constraints on the testing methods), white box, black box and grey box refer to the ways in which the tester accesses the target.

Testing during the beta phase (informally called beta testing) is generally constrained to black box techniques (though a core of test engineers are likely to continue with white box testing in parallel to the beta tests). Thus the term "beta test" can refer to the stage of the software (closer to release than being "in alpha") or it can refer to the particular group and process being done at that stage. So a tester might be continuing to work in white box testing while the software is "in beta" (a stage) but he or she would then not be part of "the beta test" (group/activity).

System testing

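Most software produced today is modular. System testing is a phase of software testing in which developers check for communication flaws between modules, such as information not being passed at all or incorrect information being passed. (Testing the interfaces between modules is more commonly called integration testing; system testing usually refers to testing the complete, integrated system.)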
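The sketch below shows a test of the communication between two modules. Both classes, `Inventory` and `Reorderer`, are hypothetical: one module reports low stock, the other turns that report into purchase orders, and the test feeds the real output of the first into the second so that a mismatch in what is passed between them would show up.

```python
import unittest

# Hypothetical module A: reports stock levels.
class Inventory:
    def __init__(self, stock: dict[str, int]):
        self._stock = dict(stock)

    def low_stock(self, threshold: int) -> dict[str, int]:
        """Items whose quantity is below `threshold`, mapped to that quantity."""
        return {sku: qty for sku, qty in self._stock.items() if qty < threshold}

# Hypothetical module B: builds purchase orders from module A's report.
class Reorderer:
    def __init__(self, target_level: int):
        self.target_level = target_level

    def orders_for(self, low_items: dict[str, int]) -> dict[str, int]:
        """Quantity to order for each item to bring it up to the target level."""
        return {sku: self.target_level - qty for sku, qty in low_items.items()}

class InventoryReorderIntegrationTest(unittest.TestCase):
    def test_report_flows_into_orders(self):
        inventory = Inventory({"bolt": 3, "nut": 50, "washer": 0})
        reorderer = Reorderer(target_level=20)
        # The real output of A is passed to B (no hand-made stub in between).
        orders = reorderer.orders_for(inventory.low_stock(threshold=10))
        self.assertEqual(orders, {"bolt": 17, "washer": 20})

if __name__ == "__main__":
    unittest.main()
```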

Regression testing

When changes are made to software, a regression test checks that the changes do not break functionality that previously worked. Regression testing can be performed either by hand or by software that automates the process. For more information see regression testing.
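A minimal sketch of an automated regression suite: one test pins behavior that already worked, and another encodes a previously reported defect, such as one that arrived through a support escalation, so that any later change which reintroduces it fails the suite. The function `average` and the defect it describes are hypothetical.

```python
import unittest

def average(values: list[float]) -> float:
    """Hypothetical function under test. Suppose an earlier version divided by
    len(values) without a guard and crashed on an empty list."""
    if not values:
        return 0.0
    return sum(values) / len(values)

class AverageRegressionTest(unittest.TestCase):
    def test_existing_behaviour_is_unchanged(self):
        # Behaviour that worked before must keep working after every change.
        self.assertEqual(average([2.0, 4.0, 6.0]), 4.0)

    def test_reported_defect_stays_fixed(self):
        # Pins the fix for a hypothetical defect report: an empty list must
        # return 0.0 instead of raising ZeroDivisionError.
        self.assertEqual(average([]), 0.0)

if __name__ == "__main__":
    unittest.main()
```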

Test Cases, Suites, Scripts, and Scenarios

Black box testers usually write test cases for the majority of their testing activities. A test case is usually a single step and its expected result, along with various additional pieces of information. It can occasionally be a series of steps with one expected result or outcome. Optional fields include a test case ID, a test step or order-of-execution number, related requirement(s), depth, test category, author, and check boxes for whether the test is automatable and whether it has been automated. Larger test cases may also contain prerequisite states or steps, and descriptions. A test case should also contain a place for the actual result. Test cases can be stored in a word processor document, spreadsheet, database or other common repository. In a database system, you may also be able to see past test results, who generated them, and the system configuration used to generate them. These past results would usually be stored in a separate table.
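As an illustration, the fields listed above can be modeled as a simple record. The class `TestCaseRecord`, its field names and the sample values are hypothetical; real teams and test management tools use their own templates.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TestCaseRecord:
    """A test case with the fields described above (names are illustrative)."""
    case_id: str
    step: str
    expected_result: str
    actual_result: Optional[str] = None   # filled in when the test is run
    related_requirements: list[str] = field(default_factory=list)
    category: str = ""
    author: str = ""
    prerequisites: list[str] = field(default_factory=list)
    automatable: bool = False
    automated: bool = False

    def status(self) -> str:
        """'not run' until an actual result is recorded, then 'pass' or 'fail'."""
        if self.actual_result is None:
            return "not run"
        return "pass" if self.actual_result == self.expected_result else "fail"

# Hypothetical example record.
login_case = TestCaseRecord(
    case_id="TC-001",
    step="Enter a valid user name and password, then press Log in",
    expected_result="Home page is displayed",
    related_requirements=["REQ-AUTH-1"],
    category="functional",
    prerequisites=["User account exists"],
    automatable=True,
)
```

Storing each run's actual result, tester and configuration as separate rows, rather than overwriting `actual_result`, is what allows the history of past results mentioned above.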

The most common term for a collection of test cases is a test suite. The test suite often also contains more detailed instructions or goals for each collection of test cases, and it usually contains a section where the tester identifies the system configuration used during testing. A group of test cases may also contain prerequisite states or steps, and descriptions of the tests that follow.

Collections of test cases are sometimes incorrectly termed a test plan. They may also be called a test script, or even a test scenario.

Most white box testers write and use test scripts in unit, system, and regression testing. Test scripts should be written for the modules with the highest risk of failure and the highest impact if that failure occurs. Most companies that use automated testing call the code used for automation their test scripts.

A scenario test is a test based on a hypothetical story used to help a person think through a complex problem or system. A scenario can be as simple as a diagram of a testing environment or as elaborate as a description written in prose. The ideal scenario test has five key characteristics: it is (a) a story that is (b) motivating, (c) credible, (d) complex, and (e) easy to evaluate. Scenarios usually differ from test cases in that test cases are single steps, whereas scenarios cover a number of steps. Test suites and scenarios can be used in concert for complete system tests. See An Introduction to Scenario Testing.

Scenario testing is similar to, but not the same as, session-based testing, which is more closely related to exploratory testing; the two concepts can, however, be used in conjunction. See Adventures in Session-Based Testing and Session-Based Test Management.

A Sample Testing Cycle

Although testing varies between organizations, a typical testing cycle looks like this:

  1. Requirements Analysis: Testing should begin in the requirements phase of the software development life cycle (SDLC).
  2. Design Analysis: During the design phase, testers work with developers to determine which aspects of a design are testable and under what parameters those tests work.
  3. Test Planning: Test Strategy, Test Plan(s), Test Bed creation.
  4. Test Development: Test Procedures, Test Scenarios, Test Cases, and Test Scripts to use in testing software.
  5. Test Execution: Testers execute the software based on the plans and tests and report any errors found to the development team.
  6. Test Reporting: Once testing is completed, testers generate metrics and make final reports on their test effort and whether or not the software tested is ready for release.
  7. Retesting the Defects: Defects that have been fixed are tested again to confirm the fixes.

Not all errors or defects reported must be fixed by a software development team. Some may be caused by errors in configuring the test software to match the development or production environment. Some defects can be handled by a workaround in the production environment. Others might be deferred to future releases of the software, or the deficiency might be accepted by the business user.

Code Coverage

Code coverage is inherently a white box testing activity. The target software is built with special options or libraries, or run under a special environment, so that every function exercised (executed) in the program(s) is mapped back to the function points in the source code. This lets developers and quality assurance personnel look for parts of a system that are rarely or never accessed under normal conditions (error handling and the like), and helps reassure test engineers that the most important conditions (function points) have been tested.
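The "special environment" can be illustrated in Python with the standard `sys.settrace` hook, which reports each line as it executes. This is a minimal line-coverage sketch, not how production tools such as coverage.py are built in detail; the function `classify` is hypothetical.

```python
import sys
from types import FrameType
from typing import Any, Callable, Optional

def lines_executed(func: Callable[..., Any], *args: Any) -> set[int]:
    """Run func(*args) and return the line numbers, relative to func's `def`
    line, that executed inside func itself."""
    code = func.__code__
    executed: set[int] = set()

    def tracer(frame: FrameType, event: str, arg: Any) -> Optional[Callable]:
        if frame.f_code is code:
            if event == "line":
                executed.add(frame.f_lineno - code.co_firstlineno)
            return tracer      # keep receiving 'line' events for this frame
        return None            # do not trace any other function

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)     # always remove the hook, even if func raises
    return executed

# Hypothetical function under test with an error-handling style branch.
def classify(n: int) -> str:
    if n < 0:
        return "negative"
    return "non-negative"

if __name__ == "__main__":
    only_positive = lines_executed(classify, 5)
    both = only_positive | lines_executed(classify, -1)
    print(sorted(only_positive), sorted(both))
```

Running `classify(5)` alone never reaches the `return "negative"` line; the difference between the two sets points at the input that still needs a test, which is the use of coverage results described in the next paragraph.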

Test engineers can look at code coverage test results to help them devise test cases and input or configuration sets that will increase the code coverage over vital functions.

Code coverage tools and libraries generally impose a performance, memory or other resource cost that is unacceptable in normal operation of the software, so they are used only in the lab. As one might expect, there are classes of software that cannot feasibly be subjected to these coverage tests, though a degree of coverage mapping can be approximated through analysis rather than direct testing.

Such tools can also affect some kinds of defects. In particular, some race conditions and similar timing-sensitive operations can be impossible to detect when run under code coverage environments; conversely, some of these defects are triggered only by the additional overhead of the testing code.

Controversy

There is considerable controversy among testing writers and consultants about what constitutes responsible software testing. The self-declared members of the Context-Driven School of testing (http://www.context-driven-testing.com) believe that there are no "best practices" of testing, but rather that testing is a set of skills that allow the tester to select or invent testing practices to suit each unique situation. This belief directly contradicts standards such as the IEEE 829 test documentation standard, and organizations such as the FDA that promote them.

Some of the major controversies include:

Agile vs. Traditional

Starting around 1990, a new style of writing about testing began to challenge what had come before. The seminal work in this regard is widely considered to be Testing Computer Software, by Cem Kaner. Instead of assuming that testers have full access to source code and complete specifications, these writers, who included James Bach and Cem Kaner, argued that testers must learn to work under conditions of uncertainty and constant change. Meanwhile, an opposing trend toward process "maturity" also gained ground, in the form of the Capability Maturity Model (CMM). The agile testing movement (which includes, but is not limited to, forms of testing practiced on agile development projects) is popular mainly in commercial circles, whereas the CMM was embraced by government and military software providers.

Exploratory vs. Scripted

Exploratory testing means simultaneous learning, test design, and test execution. Scripted testing means that learning and test design happen before test execution. Exploratory testing is very common, but most writing and training about testing barely mentions it and generally misunderstands it. Many writers consider it a dangerous practice; others consider it a primary and essential practice.

Manual vs. Automated

Some writers believe that test automation is so expensive relative to its value that it should be used sparingly. Others, such as advocates of agile development, recommend automating all tests. One challenge is that automated testing requires automated test oracles (an oracle is a mechanism or principle by which a problem in the software can be recognized). Automation tools have value in load testing software (for example, by signing on to an application with hundreds or thousands of simultaneous instances) and in checking for intermittent errors. The success of automated software testing depends heavily on thorough test planning.
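One common way to get an automated oracle is a trusted reference implementation. In the sketch below, Python's built-in `sorted()` serves as the oracle for a hypothetical `insertion_sort`, which is run against many randomly generated inputs. This only works when such a trusted reference exists, which is part of why oracles are a challenge for automation.

```python
import random

def insertion_sort(items: list[int]) -> list[int]:
    """Hypothetical code under test."""
    result = list(items)
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key
    return result

def oracle(inputs: list[int], output: list[int]) -> bool:
    """Automated oracle: the built-in sorted() is trusted as the reference."""
    return output == sorted(inputs)

def random_test(trials: int = 1000, seed: int = 0) -> list[list[int]]:
    """Generate random inputs, run the code under test, and let the oracle
    judge each result. Returns the inputs that failed (empty if none)."""
    rng = random.Random(seed)  # fixed seed so any failure can be reproduced
    failures = []
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        if not oracle(data, insertion_sort(data)):
            failures.append(data)
    return failures

if __name__ == "__main__":
    print(random_test() or "no failures found")
```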

Certification

Many certification programs exist to support the professional aspirations of software testers. These include the CSQE program offered by the American Society for Quality, the CSTE program offered by QAI, and the ISEB certification, offered by the British Computer Society. No certification currently offered actually requires the applicant to demonstrate the ability to test software. No certification is based on a widely accepted body of knowledge. This has led some to declare that the testing field is not ready for certification.

Quis Custodiet Ipsos Custodes

One principle in software testing is best summed up by the classical Latin question posed by Juvenal, Quis Custodiet Ipsos Custodes? (Who watches the watchmen?), or, informally, by the "Heisenbug" concept. By analogy with the observer effect often associated with Heisenberg's uncertainty principle, any form of observation is also an interaction: the act of testing can affect that which is being tested.

In practical terms, the test engineer is testing software (and sometimes hardware or firmware) with other software (and hardware and firmware). The tools can have their own defects, and the process can fail in ways that are not caused by defects in the target but arise as artifacts of the harness.

Metrics are being developed to measure the effectiveness of testing. One method, though a highly controversial one, is to analyze code coverage: at the least, everyone can agree which areas are not covered at all and try to improve coverage there.

Finally, there is the analysis of historical find rates. By measuring how many bugs are found and comparing that number to predictions (based on past experience with similar projects), certain assumptions about the effectiveness of testing can be made. This is not an absolute measurement of quality, but if a project is halfway complete and no defects have been found, the procedures QA is employing may need to change.

See also

Software testing activities

Quotes

  • "An effective way to test code is to exercise it at its natural boundaries" -- Brian Kernighan
  • "Testing is the process of comparing the invisible to the ambiguous, so as to avoid the unthinkable happening to the anonymous." -- James Bach
  • "Program testing can be used to show the presence of bugs, but never to show their absence!" -- Edsger W. Dijkstra
  • "Beware of bugs in the above code; I have only proved it correct, not tried it." -- Donald Knuth

References

  • Cem Kaner, Jack Falk, Hung Quoc Nguyen: Testing Computer Software. Second Edition, John Wiley and Sons, 1993, ISBN 0-471-35846-0
  • Cem Kaner, James Bach, Bret Pettichord: Lessons Learned in Software Testing: A Context-Driven Approach. John Wiley and Sons, 2001, ISBN 0-471-08112-4
  • Glenford J. Myers: The Art of Software Testing. John Wiley and Sons, 1979, ISBN 0-471-04328-1
  • Hung Nguyen, Robert Johnson, Michael Hackett: Testing Applications on the Web: Test Planning for Mobile and Internet-Based Systems. Second Edition, ISBN 0-471-20100-6


External links