
Test Case Design and Maintenance

xUnit Patterns[2], by Gerard Meszaros, presents several high-level objectives for automated tests, along with individual goals for achieving those objectives.

In this section, we will discuss how the facilities in Boost.Test can support these goals; see xUnit Patterns for the description and rationale of these goals.

Goal: Tests as Specifications

Tests can serve as specifications for the system under test if we write the test for the system first, using test-driven development. We imagine the perfect system that does exactly what we need and write a test as if that system existed. In order to achieve a failing test, we write just enough of the implementation to make the test compile.

A good way to ensure that the initial implementation causes a test failure is for each called method or function to throw an exception indicating that it is not yet implemented:

#include <stdexcept>

void free_function()
{
    throw std::runtime_error("free_function not implemented.");
}

We get a failing test right away, while writing just enough code to satisfy the compiler.
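For example, a test written against this stub fails immediately because the stub throws. This is a minimal sketch; the module name, test name, and header variant here are illustrative rather than taken from the tutorial:

#define BOOST_TEST_MODULE free_function_spec
#include <boost/test/included/unit_test.hpp>
#include <stdexcept>

// The stub from above; in a real project it would live in the library under test.
void free_function()
{
    throw std::runtime_error("free_function not implemented.");
}

// Fails until free_function is actually implemented.
BOOST_AUTO_TEST_CASE(free_function_succeeds)
{
    BOOST_REQUIRE_NO_THROW(free_function());
}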

Goal: Bug Repellant

It is well known that the longer a bug remains undetected in a software system, the more expensive it is to fix.[3] If we are practicing test-driven development, then we find bugs in our implementation as soon as we code them. The time to find a bug in this manner can be as little as a few seconds, as demonstrated in the tutorial "Hello, Test!".

The easiest way to achieve this rapid feedback is to run the tests as part of the build. The details vary depending on your build system.

In Visual Studio 2012, this is most readily achieved by configuring a post-build event, as shown below. Because $(TargetPath) may contain spaces, the variable is surrounded by double quotes so that the entire path to the executable is used. The description is written generically using $(TargetName) so that the build event can be copied to any test project's properties and used without modification. Take care to define the build event for all configurations and all platforms used by your build system so that the unit tests always run.

[Figure: Visual Studio 2012 post-build event that runs the unit test executable after each build]
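The post-build event fields might look like the following sketch; the exact property page layout depends on your Visual Studio version:

Command Line:  "$(TargetPath)"
Description:   Running unit tests in $(TargetName)...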

If you are using a Makefile to compile your code, add a target that depends on your unit test executable and whose rule runs the test, updating the target file only if the test passes. Because the target is updated only when the tests pass, the unit tests will continue to run on every build for as long as they fail.
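A minimal sketch of this pattern follows; the target and executable names are illustrative. The stamp file is touched only when the test run succeeds, so make keeps re-running the tests until they pass:

# test_hello is the unit test executable built elsewhere in the Makefile.
# (Recipe lines must be indented with a tab.)
check: test_hello.passed

test_hello.passed: test_hello
	./test_hello && touch $@

.PHONY: check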

If you are using Boost.Build and Jamfiles to compile your code, you can use the rules in the testing module to incorporate your unit tests into your build. The unit-test rule will build an executable and run it, failing the build when the executable returns a non-zero exit status. Other rules in the testing module may also be useful.
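For example, a Jamfile fragment along these lines builds the test executable and runs it as part of the build. This is a sketch; the target and source names are illustrative, and hello is assumed to be a library target defined elsewhere in the Jamfile:

# Make the rules from the testing module available in this Jamfile.
import testing ;

# Build test_hello.cpp against the hello library and run the resulting executable.
unit-test test_hello : test_hello.cpp hello ;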

Goal: Defect Localization

If we keep our test cases focused on only a single behavior in our system under test, then there is only a single cause for any particular unit test to fail. We saw this in "Testing With Exceptions" when we added a new test case to hello_world for a bad stream. By choosing intention-revealing names for our test cases, we can identify the source of the failure simply by reading the name of the failed test case.

Since each test case exercises only one scenario for the system under test, we must write at least one test case for each possible scenario. We can use the cyclomatic complexity as a rough proxy for the number of test cases needed for any particular method or function to ensure that we have sufficient coverage of the system under test.
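For example, a function with a single if/else has a cyclomatic complexity of two, suggesting at least two test cases, one per path. This is a sketch; the function and test names are illustrative:

#define BOOST_TEST_MODULE defect_localization
#include <boost/test/included/unit_test.hpp>

// System under test: one if/else, cyclomatic complexity of 2.
int clamp_to_zero(int value)
{
    if (value < 0)
    {
        return 0;
    }
    return value;
}

// One test case per path through the function.
BOOST_AUTO_TEST_CASE(negative_value_clamped_to_zero)
{
    BOOST_CHECK_EQUAL(clamp_to_zero(-5), 0);
}

BOOST_AUTO_TEST_CASE(non_negative_value_unchanged)
{
    BOOST_CHECK_EQUAL(clamp_to_zero(5), 5);
}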

Goal: Tests as Documentation

When confronted with a new software component or a new code base, how do we understand its behavior? We can read the documentation, but ultimately the behavior is defined by the code that executes. Comments and documentation can lag behind the actual implementation. Sometimes the only answer to our questions about a software component comes from the implementation. If the component is only available to us in binary form, we don't even have the luxury of consulting the source code to answer our questions.

Unit tests can serve as a form of executable documentation for software components. They tell us how the system responds to the scenarios orchestrated by the test cases.

When faced with a confusing aspect of a software component, we can answer questions about the behavior of the component by writing a unit test that describes our hypothesis about the component's behavior. If the unit test passes, we have verified our hypothesis. If the unit test fails, our hypothesis was incorrect; either way, we have learned something about the component. We can use unit tests to document subtle and unexpected behaviors of components. Unit tests are also the perfect documentation for a bug report on a component maintained by others.
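For example, a hypothesis about the behavior of std::string::substr can be captured directly as a test. This is a minimal sketch; the module and test names are illustrative:

#define BOOST_TEST_MODULE learning_tests
#include <boost/test/included/unit_test.hpp>
#include <stdexcept>
#include <string>

// Hypothesis: substr throws std::out_of_range when the position is past the end.
BOOST_AUTO_TEST_CASE(substr_throws_when_position_past_end)
{
    const std::string text("abc");
    BOOST_CHECK_THROW(text.substr(4), std::out_of_range);
}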

Goal: Tests as Safety Net

When we start building a software system, we are able to keep all the details of the system in our mind because the system is small. As the size of the system increases, it becomes harder and harder to keep all the details in mind as we make modifications. A comprehensive suite of unit tests over the system acts as an automated regression test, giving us the confidence that our changes are not introducing problems elsewhere in the code.

Goal: Do No Harm

Automated tests should only reduce risk, not introduce risk into the system. To achieve this, we want to keep all test code separated from production code. The easiest way to do this is to put the system under test into a library (static or shared) and link the test executable against the library. We saw this in the tutorial "Hello, Test!", when we separated the system under test into the hello library and the test code into test_hello.cpp.

Goal: Fully Automated Test

Boost.Test supports fully automated tests by allowing us to supply the inputs to the system under test in each test case in order to drive the system into the scenario of interest. Test cases should never rely on user input, or they will not be fully automated tests that can run unattended.
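For example, instead of reading from std::cin, the system under test can accept a std::istream and the test can supply the input itself. This is a sketch; the function and test names are illustrative:

#define BOOST_TEST_MODULE automated_input
#include <boost/test/included/unit_test.hpp>
#include <istream>
#include <sstream>
#include <string>

// System under test reads from any istream, not directly from std::cin.
std::string read_greeting(std::istream& input)
{
    std::string name;
    input >> name;
    return "Hello, " + name + "!";
}

// The test supplies the input; no user interaction is required.
BOOST_AUTO_TEST_CASE(greeting_uses_supplied_name)
{
    std::istringstream input("World");
    BOOST_CHECK_EQUAL(read_greeting(input), "Hello, World!");
}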

Goal: Self-Checking Test

Boost.Test supports self-checking tests through its rich set of assertions. Each test case supplies the inputs to the system under test and validates the behavior of the system using assertions. Self-checking tests report only bad news; good news results in no notifications. The default output from the test runner reports only failing tests and a summary of all tests executed. The test runner also returns a non-zero status code when a test fails, making it easy to fail continuous integration builds.

Goal: Repeatable Test

We should get the same results from automated tests every time we run them, provided the implementation of the system under test has not changed between runs. In the context of unit testing, this implies that a test case must control all the collaborators that can influence the system.

The most troublesome collaborators are among the following:

* current date and time
* device input
* the file system
* system services (Windows registry, networking, etc.)
* C-style APIs

In the book "Working Effectively with Legacy Code", Michael Feathers described a number of techniques for decoupling the system under test from such troublesome collaborators that can cause unit tests to spuriously fail. All the techniques are variations on a theme: introduce a level of indirection to decouple the system under test from a collaborator. C++ offers static polymorphism via templates as well as the usual dynamic polymorphism via interfaces to decouple a system under test from a collaborator.

Goal: Simple Tests

We keep tests simple by exercising only one scenario for each test case. Simple test cases read linearly through the phases of setup, exercise and verify. Trying to exercise too much functionality in a single test case can introduce unnecessary complexity into the test.

Duplication between test cases can make tests hard to read by distracting us from the steady rhythm of setup, exercise, and verify. You may find it useful to apply the rule of three when writing test cases to decide when to extract duplication into a fixture.
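For example, setup duplicated across test cases can be moved into a fixture and used with BOOST_FIXTURE_TEST_CASE, leaving each test to read linearly through exercise and verify. This is a sketch; the fixture and test names are illustrative:

#define BOOST_TEST_MODULE fixture_example
#include <boost/test/included/unit_test.hpp>
#include <string>
#include <vector>

// Fixture: shared setup extracted from several test cases.
struct name_list_fixture
{
    name_list_fixture()
    {
        names.push_back("Alice");
        names.push_back("Bob");
    }

    std::vector<std::string> names;
};

// Setup happens in the fixture constructor; exercise and verify stay in the test.
BOOST_FIXTURE_TEST_CASE(adding_a_name_grows_the_list, name_list_fixture)
{
    names.push_back("Carol");            // exercise
    BOOST_CHECK_EQUAL(names.size(), 3u); // verify
}

BOOST_FIXTURE_TEST_CASE(list_starts_with_two_names, name_list_fixture)
{
    BOOST_CHECK_EQUAL(names.size(), 2u); // verify
}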

Goal: Expressive Tests

Sometimes low-level setup details get in the way of a test reading clearly. This could be the result of a complicated data structure needed in order to exercise the system down a particular code path. Similarly, the verification of a result produced by the system under test may involve a series of assertions that make the test hard to follow. Boost.Test provides fixtures as a way to localize these distracting details by extracting setup and assertion methods into the fixture.

Over time we build up a series of methods in the fixture that allow us to express domain concepts succinctly and clearly in the tests, making them more expressive of the scenario in the domain. Fixtures can be combined through aggregation or inheritance in order to express combinations and hierarchies of domain concepts.
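A sketch of combining fixtures through inheritance follows; the domain, fixture, and test names are illustrative. A fixture for a more specific domain concept extends a basic fixture and adds helper methods that express assertions in domain terms:

#define BOOST_TEST_MODULE expressive_fixtures
#include <boost/test/included/unit_test.hpp>

// Basic fixture: an order that accumulates item prices.
struct order_fixture
{
    order_fixture() : total(0) {}

    void add_item(int price) { total += price; }

    int total;
};

// Derived fixture: adds helpers that express a discounted-order scenario.
struct discounted_order_fixture : order_fixture
{
    discounted_order_fixture()
    {
        add_item(100);
        add_item(50);
    }

    int discounted_total() const { return total - discount; }

    static const int discount = 20;
};

BOOST_FIXTURE_TEST_CASE(discount_is_applied_to_the_total, discounted_order_fixture)
{
    BOOST_CHECK_EQUAL(discounted_total(), 130);
}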

Goal: Separation of Concerns

We keep test code separated from production code with the packaging mechanisms provided by C++. Production code is supplied to the test executable as a library, either static or dynamic. The test code resides in separate source files from the production code. The production code consumed by the tests is compiled with exactly the same preprocessor settings as in the production build, ensuring that the tests do not influence the production code.

We keep concerns separated in our tests by testing each concern in its own test case. Each test case exercises a single scenario and our tests exercise the responsibilities or classes individually. If we are practicing test-driven development, we keep the single responsibility principle in mind as we are creating the system to satisfy the evolving tests. When our tests start involving more than one concern, it can be a sign that our system under test is covering more than one responsibility.

Goal: Robust Test

Test cases are robust when small changes to the system under test result in only a small number of test case failures. If a small change to the system causes many test cases to fail, then our tests are not robust. If our test cases do not sufficiently isolate the system under test from its collaborators, then a change to one part of the system can cause tests on seemingly unrelated parts of the system to fail. If we repeat the same assertions in many test cases for the same system under test, then many test cases can fail when that single assertion fails. Each of these situations is a case of overlap between test cases: in the first, the overlap is in the parts of the production code, unrelated to the system under test, that the tests exercise; in the second, the overlap is between test cases on the same system under test.



[2] The term xUnit is a generic term for unit testing frameworks, such as jUnit. The advice in xUnit Patterns applies equally well to Boost.Test.

