Unit Testing

What is a unit?

Some standards, notably ANSI/IEEE Std 1008-1987 (IEEE Standard for Software Unit Testing), use a lax definition of software unit. According to IEEE Std 1008, a software unit "...may occur at any level of the design hierarchy from a single module to a complete program". I think this is because at the time the standard was written most testing was manual, in which case there isn't much difference between planning and managing the test process for a single class and for a complete program. From an economic point of view, though, there is a big difference: testing each individual component was often considered too expensive, and this step was frequently skipped. Some approaches like Cleanroom were (partly) born as a reaction to the excessive cost of proper unit testing, particularly in the context of incremental development life cycles. The introduction of automatic regression tests at the unit level and of standard test harnesses has changed this balance.

More modern definitions (like the comp.software.testing FAQ) and accepted industry best practice define unit tests in a much more restrictive way:

Unit. The smallest compilable component. A unit typically is the work of one programmer (At least in principle). As defined, it does not include any called sub-components (for procedural languages) or communicating components in general.

Unit Testing. In unit testing called components (or communicating components) are replaced with stubs, simulators, or trusted components. Calling components are replaced with drivers or trusted super-components. The unit is tested in isolation.

For object-oriented programs this means that the unit is usually a class. Some classes (like a simple Stack) might be self-contained, but most call other classes and are in turn called by yet other classes. To reduce confusion when things go wrong, you should try to test each class in isolation. If you don't test in isolation, you are implicitly trusting all the classes used by the class you are testing. You are effectively saying: I think all the other classes already work, and if they don't, I'm prepared to sort out the mess myself. That's what "trusted" means in the above definition. If you don't think the other classes work, you should test in isolation. This is normally more work, as writing stubs and drivers is a pain.
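
To make "testing in isolation" concrete, here is a minimal sketch in C++. All the names (MClock, TStubClock, CGreeter) are invented for this example and are not part of the standard; the point is that the collaborator is replaced by a stub, so a failing assertion can only be blamed on the class under test.

    // Hypothetical example of stub-based isolation.
    #include <cassert>
    #include <string>

    // Interface the class under test depends on.
    class MClock
    {
    public:
        virtual ~MClock() {}
        virtual int CurrentHour() const = 0;
    };

    // Stub standing in for the real (untrusted) clock component.
    class TStubClock : public MClock
    {
    public:
        explicit TStubClock(int aHour) : iHour(aHour) {}
        int CurrentHour() const { return iHour; }
    private:
        int iHour;
    };

    // The unit under test: only its own logic is exercised.
    class CGreeter
    {
    public:
        explicit CGreeter(const MClock& aClock) : iClock(aClock) {}
        std::string Greeting() const
        {
            return iClock.CurrentHour() < 12 ? "Good morning" : "Good afternoon";
        }
    private:
        const MClock& iClock;
    };

    int main()
    {
        TStubClock morning(9);
        TStubClock afternoon(15);
        assert(CGreeter(morning).Greeting() == "Good morning");
        assert(CGreeter(afternoon).Greeting() == "Good afternoon");
        return 0;
    }

If either assertion fails, the defect is in CGreeter, not in whatever real clock implementation the shipping code uses.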

When to test?

As the scope of unit testing narrows down from complete programs to individual classes, so does the meaning of integration testing. Any time you test two or more already unit-tested classes together instead of using stubs, you are doing a little bit of integration testing.

For some teams integration testing is a big issue, because they wait until coding is finished before they start unit testing. This is a big mistake: delaying unit testing means you will be doing it under schedule pressure, making it all too easy to drop the tests and just finish the code. Developers should expect to spend between 25% and 50% of their time writing unit tests. If they leave testing until they have finished, they can expect to spend as much time testing a module as they spent writing it in the first place, which is going to be extremely painful for them. The idea is to spread the cost of unit testing over the whole implementation phase. This is sometimes called "incremental glass-box testing" (see Marc Rettig's article).

If you wait until you've finished coding before you start unit testing, you'll also have to choose an integration strategy. Are you going to start with the low-level classes and work your way up until you reach the classes that expose functionality through a public API, start from the top and write stubs for the lower-level classes, or just test them all in one go?

Code Coverage

The greatest doubt I had when writing the standard was not only how much coverage to mandate but whether to mandate any coverage at all. It is easy enough to come up with a figure: 85% seems to be pretty standard. But I have to agree with Brian Marick [BM] that there is no evidence supporting this number. In my opinion 100% is the reasonable target, as anything less means you haven't tested the remaining statements at all. Of course it is difficult to have automatic unit tests for certain parts of the code; hardware interaction and UI code are typical examples, as are panics. If you have acceptance tests that cover these parts of the code or review them thoroughly, and if you make a sincere effort to minimise their size and complexity, you can normally get away with not unit testing them. But I'd rather include any known exceptions to the coverage rule in the standard itself than arbitrarily lower the bar for all the rest of the code.

Three pitfalls to consider if you mandate coverage:

  1. Don't consider tests that don't increase coverage redundant. This is a big mistake: they might not add coverage, but they might find bugs (see the sketch below). Coverage isn't everything. Brian Marick expressed this nicely: “Coverage tools don't give commands (make that evaluate true), they give clues (you made some mistakes somewhere around there)”. Use coverage metrics to improve your test design skills.
  2. If you mandate 85% coverage, what's the chance of anybody actually achieving significantly more coverage than that? Nil?
  3. If developers get fixated on achieving coverage, they might try to reach 100% from the start and keep it there. This might be difficult to achieve and can hurt productivity.

The standard addresses the first point by including guidelines on test case design as an appendix. It addresses the second point by mandating 100% code coverage; the risk here is that people will tend to shift code into those areas (external interactions, UI) where coverage isn't mandated, and it is the responsibility of everybody reviewing test code to watch out for this. It addresses the third point by making coverage measures available from the start but only requiring compliance with them at major milestones of the project. Designing black-box test cases also improves the chances of the test cases remaining adequate after an implementation change.
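
As a small illustration of the first pitfall, here is a hypothetical C++ fragment (the Clamp function and its tests are made up for this example): the first two assertions already execute every statement, so the third adds no statement coverage at all, yet it is the only one that exposes the boundary bug.

    #include <cassert>

    int Clamp(int value, int max)
    {
        if (value > max + 1)   // BUG: should be (value > max)
            return max;
        return value;
    }

    int main()
    {
        assert(Clamp(5, 10) == 5);    // covers the "return value" statement
        assert(Clamp(20, 10) == 10);  // covers the "return max" statement: 100%
        assert(Clamp(11, 10) == 10);  // adds no coverage, but fails and finds the bug
        return 0;
    }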

Picking a bigger unit

Testing each individual class can be a pain. In OO systems some classes are very closely related, and testing each of them in isolation might mean much more effort in terms of writing stubs and so on. Taking this to its logical conclusion means testing only the public API of the executable. The main argument against this is that it is extremely difficult to get good coverage that way, even if you settle for a relatively easy goal like statement coverage. Also, testing individual classes means you only need to worry about stubs for external systems in the tests for those classes that actually interact with them; for the classes in your component that only interact with other classes in the same component, you might not need stubs at all.

Automatic and Self-documenting

This standard mandates much less unit test documentation than is required by older standards such as ANSI/IEEE Std 829-1983. The main reason is that those standards more or less assume manual execution of tests, and a lot of documentation is needed to make that execution repeatable by somebody other than the developer. None of this is needed here, because the standard proposes the creation of self-documenting automatic unit tests: the unit tests created by following this standard are self-documenting (they all use the same command-line arguments) and don't require any manual procedures. When manual procedures are wanted for more in-depth testing, the standard also specifies a location for that information (testharness -h).
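
As a minimal sketch of what such a self-documenting entry point could look like: only the -h argument is taken from the text above; the harness name, the output format and the RunAllTests function are illustrative assumptions, not mandated by the standard.

    #include <cstring>
    #include <iostream>

    // Runs every automatic test case and returns the number of failures.
    // A real harness would register many cases; one dummy case stands in here.
    int RunAllTests()
    {
        std::cout << "ExampleTest...OK" << std::endl;
        return 0;
    }

    int main(int argc, char* argv[])
    {
        if (argc > 1 && std::strcmp(argv[1], "-h") == 0)
        {
            std::cout << "Usage: testharness [-h]\n"
                      << "  (no arguments)  run all automatic test cases\n"
                      << "  -h              print this help, plus any manual test\n"
                      << "                  procedures for more in-depth testing\n";
            return 0;
        }
        return RunAllTests();  // exit code 0 means OK, anything else means Not OK
    }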

Test Results

Another area where standards frequently require more documentation is test results. Which test case failed? Are you sure you followed the test procedure? What were you doing when the test failed? All this is redundant. Tests are automatic and only give two answers for each test case: OK or Not OK. Sometimes standards require information like the following, and in each case it is redundant:

  Inputs: cannot be chosen; hard-coded.
  Expected results: for each test, "Test Name…OK".
  Actual results: for some test, "Test Name…[OK or Not OK]".
  Anomalies: report as a defect.
  Date and time: printed by the unit test.
  Procedure step: only one step: run them.
  Environment: no environment required for compliant tests.
  Attempts to repeat: tests are automated; one failure means report a defect.
  Testers: whoever reports the defect.
  Observers: whoever is watching the tester (not very fun with automatic tests…).

Summing it up: “[with other approaches] extensive test documentation is developed for each class. In contrast, our approach minimizes the need to document each test suite by standardizing the structure of all test suites. In other words, the test framework is documented once and reused for each test suite.” [CB]

Beyond unit tests

Apart from unit tests, other types of tests are needed. Our products are mainly reusable components (APIs), in contrast with companies that produce finished applications, and this will be even more so as the emphasis of producing GUIs for our components shifts over to techview. Because we're producing components, developer testing is easier: we can create programs that use our components in order to test them. Independent testing and validation is made more difficult for the same reason, as some types of testing would require the testing team to be composed of C++ programmers. This being unlikely, the approach to validation/acceptance testing usually is:

  1. Developers provide a reference UI (either the component developer's own or techview), so that the functionality can be tested interactively.
  2. Developers provide an "acceptance test application". This application could be driven by a simple text file, allowing testers to add their own test cases based on their interpretation of the requirements (a sketch follows at the end of this section).

These approaches are still valid. The unit testing approach proposed in this standard affects them:

  1. By lowering defect rates in acceptance testing.
  2. By providing build verification tests.
  3. By making sure testers aren't the first ones to run the unit tests.
  4. By providing clear guidance on which unit test failures should be raised as defects: any failure means a defect.

A reasonable cross-team integration approach might still be necessary; it is beyond the scope of this standard as it heavily depends on project-specific issues. The only recommendation I would make in that respect is to set up a cross-team automatic build (with BVT as outlined in the standard).
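
To illustrate what a text-file-driven acceptance test application might look like, here is a hypothetical sketch. The command names, the script format and the stack component are invented for the example and are not prescribed by the standard; the idea is only that testers can add cases by editing a plain text file.

    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>

    int main(int argc, char* argv[])
    {
        if (argc < 2)
        {
            std::cerr << "Usage: acceptancetest <scriptfile>" << std::endl;
            return 1;
        }
        std::ifstream script(argv[1]);
        std::vector<int> stack;   // stands in for the component under test
        std::string line;
        int failures = 0;
        // Each script line is a command, e.g. "push 3" or "expect-top 3".
        while (std::getline(script, line))
        {
            std::istringstream cmd(line);
            std::string op;
            int value = 0;
            cmd >> op;
            if (op == "push" && (cmd >> value))
            {
                stack.push_back(value);
            }
            else if (op == "expect-top" && (cmd >> value))
            {
                bool ok = !stack.empty() && stack.back() == value;
                std::cout << line << (ok ? "...OK" : "...Not OK") << std::endl;
                if (!ok)
                    ++failures;
            }
        }
        return failures;  // any failure means a defect
    }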

Further Reading

If this article was too theoretical for you, you might want to take a look at how to unit-test C++ classes.

[CB] Daniel Hoffman, Paul Strooper. ClassBench: A Methodology and Framework for Automated Class Testing (1996).

Marc Rettig. Practical programmer: testing made palatable. Communications of the ACM. Volume 34, Issue 5 (May 1991).

[BM] Brian Marick. How to Misuse Code Coverage.