Testing: A Developer's Perspective

I've frequently found that people talk to each other using terms like "black-box testing" and "unit testing" and everybody seems to think they're talking about the same thing. Well, frequently they aren't. It also seems to be somewhat of a religious issue, with testers, developers and managers each trying to impose their differing views on each other. I'm a developer and, though biased, have tried to stay as objective as possible (of course, I would say that, wouldn't I?). Most of the terminology is taken from the comp.software.testing FAQ and is based on definitions from noted testing guru Boris Beizer.

Code coverage

There are different levels of code coverage, and it's important to know the difference between them, in particular when somebody tells you that their tests achieve 70% coverage without specifying the type. Ed Millner's article "INCOMPLETE TESTS ARE WORSE THAN NONE AT ALL!" (his all-caps) has a short summary of coverage types.

If somebody tells you a percentage without specifying the type, there is a good chance he is talking about S0, which is the most basic type.

A very complete overview of code coverage has been written by Steve Cornett of Bullseye (the authors of C-Cover).

My favorite article on code coverage is Brian Marick's How to Misuse Code Coverage. This should be required reading for anybody worried about their level of code coverage.

Black-box vs. White-box Testing

The difference between both types of testing is in how you design your tests: in which criteria you use to select your test cases.

In black-box testing you use some kind of specification to drive the design. You are making sure that the thing you're testing behaves as advertised.

In white-box testing you have full information on how the thing you're testing is implemented and you can use that knowledge to drive the testing. You can "see inside" the thing you're testing, hence the name white-box testing. Some people prefer to call it glass-box or clear-box testing instead. The purpose of white-box testing is typically to increase the code-coverage achieved by your black-box test cases.

The FAQ points out that lately the terms "behavioral" and "structural" are more popular: black-box testing only relies on externally "published" behaviour, while white-box testing uses knowledge of the internal structure of the code. The pros and cons of each approach are easily derived from that characterisation: black-box test cases remain valid if the internals of the code change, but it might be hard to achieve very high code coverage; white-box test cases are easily invalidated by changes in implementation details, but you can achieve very high levels of code coverage. As a strategy, the best thing is probably to start off with black-box test cases and start adding white-box test cases only if the code coverage stabilises below your expectations. And even then, it might be better to consider redesigning the code under test, as low coverage may indicate code that's too complex to test properly.

There's another type of testing called "gray-box". This means you have some details about the implementation (typically the algorithm that was used), but haven't got the full picture.

Unit Testing

Some standards—particularly ANSI/IEEE Std 1008-1987 (IEEE Standard for Software Unit Testing)—use a lax definition of software unit. According to IEEE-Std-1008 a software unit "may occur at any level of the design hierarchy from a single module to a complete program". I think this is because at the time the standard was written most testing was manual, in which case there isn't much difference between planning/managing the test process for a single class and for a complete program. From an economic point of view, though, there is a big difference: testing each individual component was often considered too expensive, and this step was frequently skipped.

Some approaches like CleanRoom were (partly) born as a reaction to the excessive cost of proper unit testing in particular in the context of incremental development life cycles. Code reviews were considered a more cost-effective way of ensuring code quality. The introduction of automatic regression tests at the unit level and standard test harnesses has changed this balance.

More modern definitions (like the comp.software.testing FAQ and accepted industry best practice) define unit tests in a much more restrictive way:

Testing in Isolation

You just developed a new version of your company-internal Stack class. It's used by your Account class and it uses an Array class in its implementation. You don't have any unit tests for the Stack class, so you decide to test the Stack class indirectly by running the tests for the Account class. One of the tests fails. Where is the bug? Is it the new Stack class, is it the Account class (Jimmy just made some changes to it) or is it some weird bug in the underlying Array class?

You can see that life might be easier if you had some tests that verify the functioning of the Stack in isolation. That's what unit testing attempts to do: remove noise so that you can be sure where the bugs are coming from.

Removing calling classes (Account in our case) is straightforward: you need some kind of "driver", a main program that calls the Stack class directly and checks the results for correctness:

#include <assert.h>
#include "Stack.h"   // hypothetical header for the class under test

int main() {
    Stack s;
    s.push(42);
    assert(s.pop() == 42);   // the driver checks the results itself
    return 0;
}

But even if these tests fail, you're still not sure it's the Stack: it could be the underlying Array class. You could develop the Stack using a simple-but-robust version of the Array class and switch to the ultra-optimized-but-potentially-buggy version once you're sure the Stack works properly. The simple array is acting as a "stub": a limited version that does just enough to be able to test the calling class.


If maintaining all those stubs sounds like too much trouble, you might be right. It's perfectly okay to decide that you trust your Array class and that you won't bother replacing it with a stub. It's just that you need to be aware that you're making that choice.

Stubs normally imply playing around with makefiles or version-control tools to get the right version of the class, depending on whether you're running the test or the production code. Lately another technique is quite popular: MockObjects.

Integration Testing

Any time you test two or more already-tested components together instead of using stubs, you're doing integration testing. For some companies this is a big issue, because they wait until they finish coding before they start testing. If that's your case, you'll have to choose an integration strategy. Are you going to start with the low-level classes and work your way up until you reach the top-level application classes, or will you start from the top and write stubs for the lower-level classes? Or will you just big-bang test them all in one go?

Incremental Testing

Testing after finishing all the code is a big mistake, as delaying unit testing means you'll be doing it under schedule pressure, making it all-too-easy to drop the tests and "just finish the code". Developers should expect to spend between 25% and 50% of their time writing unit tests. If they leave testing until they've finished, they can expect to spend the same amount testing as they spent writing the module in the first place. This is going to be extremely painful for them. The idea is to spread the cost of unit testing over the whole implementation phase. This is sometimes called "incremental glass-box testing" (Marc Rettig. Practical programmer: testing made palatable. Communications of the ACM. Volume 34, Issue 5 (May 1991)).

Automated Testing

Incremental testing only works if your tests are automated. It follows that you should ruthlessly remove all obstacles to test automation. Always ask yourself: how can I make this run automatically?

Problem: The database is too slow.
Solution: Use a stub class that saves stuff in memory instead. If you're testing the persistence part, use a stub that works with a flat file.
Problem: My application uses data from live sensors.
Solution: Use a stub class that returns hard-coded values, random values or values with a particular statistical distribution. Or save some values to a file and replay them with a special stub class.
Problem: It's a GUI!
Solution: This is a tough one. The easiest solution is to keep as much code as possible in helper classes that you can test independently (i.e. using the Model-View-Controller paradigm) and then test the GUI with whatever means you've got. Something like Win32::Guitest might come in handy.
Problem: I'm writing some serial comms code to control a GPS, but they are expensive: we only have one for the whole team.
Solution: Only the SerialPort class needs to know about the port. All other classes use SerialPort and never realise that SerialPort is just a stub that reads data from the clipboard. For the real SerialPort class: keep it simple, get it code-reviewed and test it with whatever means you've got!

Acceptance Tests

The aim of unit or component testing is usually to boost the developer's confidence in the correctness of their code. You normally also need some tests that convince your customer that the whole system is functional: acceptance tests. If your company is in the business of producing class libraries or reusable components, your component tests might be used as acceptance tests.

The approach to validation/acceptance testing usually is:

  1. Provide a written document that an independent group can use to test the program. This can be something like the user manual or derived from the requirements specification of the program.
  2. Developers provide a test UI, so that the functionality can be tested interactively.
  3. Developers provide an acceptance test application. This application could be driven by a simple text file, allowing testers to add their own test cases based on their interpretation of the requirements.

Option 3 is best, because of the potential for automation. But it might be difficult to convince your clients of its benefits.