Testing cannot prove that software works - only that it doesn't
Requirements Reviews. Analysis Reviews. Design Reviews. Code Reviews. Unit Tests. System Tests. Integration Tests. Stress Tests. User Acceptance Tests. Regression Tests. Defect Tracking. Request Tracking. Documentation. Documentation. Documentation.
Formal development methodologies, such as the Capability Maturity Model from the Software Engineering Institute, require extensive documentation for everything, particularly testing. The testing procedures need to cover every possible path through the software, with every type of value for the various parameters. Although these methods can almost guarantee the correctness of the software, they are extremely burdensome to implement, especially in organizations that are unable to use professional configuration management personnel. Organizations that are just starting to implement formalized methodologies also have a difficult time due to a feeling of being "overwhelmed." The skills and techniques necessary to fully implement the formal testing requirements are generally not found on the average development team, since they differ from the skills and techniques the team uses for development itself.
Completely formal testing methods are extremely time-consuming and documentation-intensive. Informal testing methods, however, have serious flaws when it comes to the completeness of the tests. What is being proposed here is the "Semi-Formal Testing Method."
A package is a general-purpose mechanism for organizing elements into groups. Packages may be nested within other packages. A system may be thought of as a single high-level package, with everything else in the system contained in it. For testing purposes, a package can be an individual function or procedure, a single class, a single code module, or any combination of these.
As with (completely) formalized testing methods, it is the responsibility of the developer to identify the various test cases that are required to validate the package.
At a minimum, test cases must be identified to satisfy the following conditions: Requirements, Concern, Structure Basis, Data-Flow, Min-Max, Bad Data, and Good Data. Each of these conditions is described in detail below.
Once the test cases are documented, the package and test cases are passed into a package review session. In this session the developer, along with at least two other developers, examines the package for correctness and the test cases for completeness. Comments and problems are documented, and the package moves on to testing if the developers in the review session agree.
It is entirely possible (and probable in most packages) that multiple test cases will test for the same type of defect, since a single test case can check several conditions at once. If this happens, one of the test cases is superfluous and should be removed.
It is also possible that certain types of conditions cannot apply. A package that takes two large integers and does something with them mathematically may not care what the numbers are at all. In that case, a test that passes a string would not be appropriate.
All test cases need to have their expected values included with them.
If the review developers accept the unit test cases as complete, or there is an agreement that certain test cases are not required (this must be documented), then the actual unit testing can proceed.
The header information is implemented using the PVCS keyword $Header$ (other version control software will have equivalent keywords). Item 2.4 is implemented using the PVCS keyword $Log$. The definition of "non-trivial" in items 7 and 8 is left to the responsible developer and the developers involved in the code review.
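As an illustration of where these keywords might live, the fragment below shows a minimal Delphi-style unit header; the unit name and comment layout are assumptions made for this sketch, and PVCS expands the keywords when the file is checked in.

```pascal
unit CalendarUtils;  { hypothetical unit name, used only for illustration }

{ $Header$   - PVCS expands this into the file name, revision, author, and date }
{ $Log$      - PVCS appends the accumulated check-in history inside this comment }

interface

implementation

end.
```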
In addition to the comments embedded in the source code, it is necessary to fill out the Module Description form.
Everything about packages will be documented. Defects found, by whom, where (module name, version, line #, class, phase, etc.), notes, and impact will be documented. Test cases, and their use (or lack of it), will be documented. This will require intelligent use of version control and careful maintenance of the defect documents. This level of tracking allows for the accumulation of statistical information that can pinpoint modules and classes that are particularly error prone due to the complexity of their code or to coding problems. With this statistical information, it will be possible to improve the software development process, both for upcoming code modules and for future projects.
Package Reviews do not need to be 100% re-run every time. If the only change is the creation of additional test cases, then the additional tests are the only items that need to be checked. If comments were added to the code, then only the level of comments needs to be checked.
There are seven conditions that the group of test cases must satisfy. The first one, Requirements Testing, is possibly one of the easiest. Requirements Testing is sometimes also known as 'Black Box Testing.' Its premise is rather straightforward: given input X, output Y must be generated. The internal workings of the code are not investigated (hence the nomenclature 'Black Box'). Usually, the requirements are well defined and minimal: there are only a limited number of inputs and outputs expected.
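As a minimal sketch (the DaysInMonth routine and its stated requirements are hypothetical, not taken from this document), a requirements test simply feeds the routine the documented inputs and compares the results against the documented outputs:

```pascal
program RequirementsTest;
{ Black-box test of a hypothetical DaysInMonth routine: feed it the
  documented inputs and compare the results against the documented outputs,
  without looking at how the routine works internally. }

function DaysInMonth(Month: Integer): Integer;
const
  Days: array[1..12] of Integer =
    (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
begin
  DaysInMonth := Days[Month];
end;

procedure Check(Month, Expected: Integer);
begin
  if DaysInMonth(Month) = Expected then
    WriteLn('PASS: month ', Month)
  else
    WriteLn('FAIL: month ', Month, ' expected ', Expected,
            ' got ', DaysInMonth(Month));
end;

begin
  Check(1, 31);    { requirement: January has 31 days  }
  Check(2, 28);    { requirement: February has 28 days }
  Check(4, 30);    { requirement: April has 30 days    }
end.
```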
Concern Testing, the second Unit Test condition that must be met, is rather subjective. As the code is developed, certain 'concerns' may arise. These concerns can arise during the identification of the requirements, the analysis or design of the code, the coding itself, or even from code reviews. Although specific tests for these items will probably be covered in other areas (particularly Data-Flow and Boundary tests), they are placed in this category for easy identification and examination. The specific concern will indicate the specific type of test.
Structure Basis Testing, the third unit test condition, is used to test every possible line of code. Appendix II gives the instructions on how to create these test cases. It must be noted that this method can give a large number of test cases in non-trivial routines, and close examination of the specific test cases will be required to identify duplicate or equivalent tests.
Data-Flow Testing, also known as White Box Testing, is used to test every possible logical flow through the code. Appendix III has details on how to identify logical paths through the code. This testing method, more than any of the others, has the potential to create the greatest number of tests.
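As a brief hedged sketch of why this is so (CondA, CondB, A, and B are invented names, and the fragment is not from this document): Structure Basis testing of the code below is satisfied by three cases, but Data-Flow testing must exercise every definition/use pairing of X, which requires all four combinations of the two conditions.

```pascal
program DataFlowSketch;
{ Hypothetical fragment used only to contrast Structure Basis testing with
  Data-Flow testing; CondA, CondB, A, and B are invented names. }
var
  CondA, CondB: Boolean;
  A, B, X, Y: Integer;
begin
  CondA := True;  CondB := False;   { one of the four combinations }
  A := 10;  B := 20;

  if CondA then
    X := A          { definition 1 of X }
  else
    X := B;         { definition 2 of X }

  if CondB then
    Y := X + 1      { use 1 of X }
  else
    Y := X - 1;     { use 2 of X }

  WriteLn(Y);
  { Structure Basis testing needs only 3 cases here (1 for the routine plus
    1 for each if), but covering every definition/use pair of X needs all 4
    combinations of (CondA, CondB): true/true, true/false, false/true,
    false/false. }
end.
```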
Min-Max Testing, also known as Boundary Testing, is often done but rarely done to completion. Many developers perform tests with the maximum and minimum values of the system, but few take it beyond that. Min-Max Testing also tests beyond the expected boundaries, using minimum - 1 and maximum + 1 values. In addition to the normal boundaries, compound boundaries must be checked. These are boundaries that occur because some sort of arithmetic operation (such as multiplication, for the upper and lower boundaries) is performed on numbers that are individually well within bounds but whose combination is too big. Care must be taken in the software during data type conversion, particularly when converting strings to other formats (especially numbers). Appendix IV further defines this process.
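As a small hedged sketch of a compound boundary (Quantity and UnitPrice are invented names; the 16-bit limit matches Delphi 1.0's Integer type): each value is comfortably within range on its own, but the product is not, so the multiplication must be done, and tested, in a wider type.

```pascal
program CompoundBoundary;
{ Two values that are each well inside the 16-bit Integer range
  (-32768..32767) but whose product is not.  The multiplication is done in
  a wider type so the condition can be detected instead of silently
  overflowing. }
const
  MaxInt16 = 32767;
var
  Quantity, UnitPrice: Integer;
  Total: LongInt;
begin
  Quantity  := 300;    { valid on its own }
  UnitPrice := 200;    { valid on its own }

  Total := LongInt(Quantity) * UnitPrice;    { 60000, which exceeds 32767 }

  if Total > MaxInt16 then
    WriteLn('Compound boundary exceeded: ', Total, ' does not fit in a 16-bit Integer')
  else
    WriteLn('Total fits in a 16-bit Integer: ', Total);
end.
```

The same idea applies to addition near the ends of the range and to date arithmetic near month and year boundaries.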
Another type of testing that is done often but rarely done to completion is Bad Data Testing. There is a variety of bad data that code must cope with, including too little (or no) data, too much data (rarely tested), the wrong kind of data, the wrong size of data, and uninitialized data (especially with pointers).
The easiest type of test to define is the Good Data Test. These tests contain such data as nominal cases (middle-of-the-road, expected values), normal minimums (a list with 1 item), normal maximums (a list with 1000 items), and compatibility with old data.
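As a combined hedged sketch of the Bad Data and Good Data cases (the Average routine, its parameters, and its defined behaviour for bad input are all assumptions made for this illustration):

```pascal
program DataSketch;
{ Hypothetical Average routine taking a pointer to an array and a count.
  The bad-data cases pass it a nil pointer and no data; the good-data cases
  pass a normal minimum (one item) and a nominal list. }
type
  TIntArray = array[1..100] of Integer;
  PIntArray = ^TIntArray;

function Average(Data: PIntArray; Count: Integer): Integer;
var
  I, Sum: Integer;
begin
  if (Data = nil) or (Count <= 0) then
  begin
    Average := 0;          { bad data: the behaviour is defined, not accidental }
    Exit;
  end;
  Sum := 0;
  for I := 1 to Count do
    Sum := Sum + Data^[I];
  Average := Sum div Count;
end;

var
  Values: TIntArray;
begin
  Values[1] := 10;  Values[2] := 20;  Values[3] := 30;  Values[4] := 40;

  WriteLn(Average(nil, 4));        { bad data: uninitialized/nil pointer }
  WriteLn(Average(@Values, 0));    { bad data: too little (no) data      }
  WriteLn(Average(@Values, 1));    { good data: normal minimum           }
  WriteLn(Average(@Values, 4));    { good data: nominal case             }
end.
```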
Even though creating all of these tests is difficult (not impossible, just difficult and expensive), they can help identify software defects at the point where they are most easily dealt with: before the modules are integrated with the rest of the system.
There will be situations when a test case should be run, but because of time, complexity, or cost it is determined to be impractical. The test case should be created anyway and marked inactive, with the reason documented. The developers involved in the code review will have to concur.
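As a worked illustration of Structure Basis counting, the counts and test cases below appear to refer to a routine shaped roughly like the following sketch (a hypothetical reconstruction: the routine name SetCalendarCell and the corrective assignments are assumptions, while ARow, DayNum, and DaysThisMonth are taken from the test cases that follow).

```pascal
program StructureBasisSketch;
{ Hypothetical routine whose shape matches the test data below; the name
  and the corrective assignments are assumptions made for this sketch. }

procedure SetCalendarCell(var ARow, DayNum: Integer; DaysThisMonth: Integer);
begin
  { start the count at "1" for the routine itself }
  if ARow <= 0 then                                  { add 1 for this if: "2" }
    ARow := 1;
  if (DayNum < 1) or (DayNum > DaysThisMonth) then   { add 1 for this if: "3" }
    DayNum := 1;                                     { and 1 for the or:  "4" }
end;

var
  Row, Day: Integer;
begin
  Row := 3;  Day := 15;
  SetCalendarCell(Row, Day, 31);   { exercise the routine once }
  WriteLn(Row, ' ', Day);
end.
```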
"1" for the routine itself | |
"2" for the if | |
"3" for the if, "4" for the or | |
Case | Test Description | Test Data |
---|---|---|
1 | Nominal case. | All boolean conditions are true. |
2 | The if is false. | ARow > 0. |
3 | The first and second parts of the if are false. | 1 <= DayNum <= DaysThisMonth |
4 | Either the first or the second part of the if is true. | DayNum < 1. |
1 | |
2 | |
3 | |
4 | |
5 | |
6 | |
7 | |
8 | |
9 |
Test cases for items 3 and 4 may not be necessary if the minimum and maximum values are defined by the class of data. It is not possible to pass a value that is larger than MAXINT as an integer to a routine. It is also not possible to have a string with a length of -1. It IS possible, though, for a user to type in a value larger than MAXINT (32768 is 5 characters, but MAXINT is defined as 32767 in Delphi 1.0). It is also possible to have a string whose length is 0.
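A hedged sketch of that situation using the standard Pascal Val procedure (the TryConvert helper and the test values are illustrative; on Delphi 1.0's 16-bit Integer the '32768' case fails the conversion, while compilers with a 32-bit Integer accept it): the routine cannot receive an out-of-range Integer, but it can receive the user's raw text, so the conversion is where these boundary tests belong.

```pascal
program ConversionBoundaries;
{ Converting user-typed text to an Integer with the standard Val procedure.
  Val sets the error code to 0 on success, or to the position of the
  offending character on failure, so zero-length strings and out-of-range
  values can be tested without the program halting. }

procedure TryConvert(const UserText: string);
var
  Value, ErrPos: Integer;
begin
  Val(UserText, Value, ErrPos);
  if ErrPos = 0 then
    WriteLn('"', UserText, '" -> ', Value)
  else
    WriteLn('"', UserText, '" rejected at position ', ErrPos);
end;

begin
  TryConvert('32767');   { MAXINT itself                                    }
  TryConvert('32768');   { MAXINT + 1 where Integer is 16 bits (Delphi 1.0) }
  TryConvert('');        { a string whose length is 0                       }
  TryConvert('12a');     { the wrong kind of data                           }
end.
```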
Package Logical Name | |
Creator | |
Owner | |
Date Created | |
Description: | |
Requirements: | |
Modules: | |
Classes: |
Package Logical Name | The logical name of the package. |
Creator | The creator of the package. |
Owner | The current person responsible for the package. |
Date Created | The date that this package was created. |
Description | The reason for this package to exist. |
Requirements | The requirements for this package (useful for creating test cases). |
Modules | A list of modules that are included in this package. |
Classes | Name, description, and usage of the public classes that are used in this package. |
Module Logical Name | |
Physical Name | |
VCS Location | |
Creator | |
Owner | |
Date Created | |
Description: | |
Requirements: | |
Classes: |
Module Logical Name | The logical name of the module. |
Physical Name | The physical name of the module. |
VCS Location | The location of the code (in the version control system). |
Creator | The creator of the module. |
Owner | The current person responsible for the module. |
Date Created | The date that this module was created. |
Description | The reason for this module to exist. It is not necessary to maintain information that is expanded by the version control system in the log area. |
Requirements | The requirements for this module (useful for creating test cases). |
Classes | Name, description, and usage of the public classes that are defined in this module. |
Module | |||
Test ID | Active? Y N | ||
Creator | |||
Date Created | |||
Test Type: | ___ Requirement | ___ Concern | ___ Structure Basis |
___ Data Flow | ___ Min-Max | ___ Bad Data | ___ Other ______ |
Comments | |||
Description | |||
Process |
Module | The name of the module for which this test applies. | |
Test ID | Unique identifier | |
Active? | Is this test currently run for unit testing? | |
Creator | Who created this test (it may come from another developer, especially as a result of a code review). |
Date Created | The date that this test was created. |
Test Type | The conditions that this test satisfies (mark all that apply). |
Requirements | Mark if this test satisfies the Requirements condition. | |
Concern | Mark if this test addresses a concern. | |
Structure Basis | Mark if this test is part of the Structure Basis specific tests. |
Data Flow | Mark if this test is part of the Data-Flow specific tests. | |
Min-Max | Mark if this test works with Min-Max conditions. | |
Bad Data | Mark if this test passes bad data (too little, too much, the wrong kind, the wrong size, or uninitialized data) to the code. |
Other | Mark if this test is another type. Describe and explain. |
Comments | Any comments, such as the origin of the test (e.g., a code review, and why), go here. |
Description | Description of what the test will do. | |
Process | The process of the test. |
1 | Y N | Does each requirement that applies to the routine have its own test case? |
2 | Y N | Has each line of code been tested with at least one test case? |
3 | Y N | Have all data-flow paths been tested with at least one test case? |
4 | Y N | Has a list of common errors been used to write test cases to detect errors that have occurred frequently in the past? |
5 | Y N | Have boundary conditions been tested? |
6 | Y N | Have compound boundaries been tested? |
7 | Y N | Do test cases exist for the wrong kind of data? |
8 | Y N | Are representative, middle-of-the-road values tested? |
9 | Y N | Is the minimum normal configuration tested? |
10 | Y N | Is the maximum normal configuration tested? |
11 | Y N | Is compatibility with old data tested? |
12 | Y N | Do the test cases have expected results? |
Module | Class | ||
Procedure | Line | ||
Date Discovered | Method | ||
Found by | Verified? | Y N | |
Description | |||
Test Case | |||
Analysis |
Module | The name of the module where the defect was discovered. |
Class | The class name (if appropriate) of the defect. |
Procedure | The name of the procedure or function that contains the defect. |
Line | The line number of the defect (if appropriate). |
Date Discovered | The date of the defect discovery. |
Method | The method of the discovery (review, unit test, etc.) |
Found By | The name of the discoverer. |
Verified? | Has the existence of this defect been verified? |
Description | The description of the (potential) problem. |
Test Case | Description of a test case to re-create the problem. |
Analysis | Post mortem analysis to determine a method to keep this type of defect from recurring. |
Package | Date | ||
Originator | |||
Developers | |||
Developers | |||
Developers | |||
Developers | |||
Passed | Y N | ||
Results | |||
Concerns |
Package | The name of the package that is being examined. |
Date | The date of the review. |
Originator | The responsible developer for the package. |
Developers | The names of the developers performing the review (it is not recommended that more than 4 developers be used). A minimum of two developers must be used. |
Passed | Indicates if the module should go on to Unit level testing. |
Results | Comments and results. |
Concerns | Concerns about the code. These will usually become test cases. |