In essentially every embedded system there is some sort of product testing. Typically there is a list of product-level requirements (what the product does) and a set of tests designed to make sure the product works correctly. For many products there is also a set of tests dealing with fault conditions (e.g., making sure that an overloaded power supply will correctly shed load). Many companies think this is enough, but I’ve found that such tests often fall short.
The problem is that there are features built into the software that are difficult or near-impossible to test in traditional product-level testing. Take the watchdog timer, for example. I have heard of more than one case where a product shipped (at least one version of it) with the watchdog timer accidentally turned off. In case you’re not familiar with the term, here is how Wikipedia describes it: a watchdog timer is an electronic timer that is used to detect and recover from computer malfunctions. During normal operation, the computer regularly restarts the watchdog timer to prevent it from elapsing, or “timing out.”
How could this happen? Easy: a field problem is reported and the developer turns off the watchdog to do single-step debugging. The developer finds and fixes the bug, but forgets to turn the watchdog back on. The product test doesn’t have a way to intentionally crash the software (to see if the watchdog is working), so the new software version ships with the watchdog timer still turned off, and the device doesn’t recover from a crash without human intervention. That’s a problem if you’re building, let’s say, a Mars rover.
And, well, here we are, needing a Software Test Plan in addition to a Product Test Plan. Maybe the software tests are done by the same testers who do product test, but that’s not the point. The point is you are likely to need some strategy for testing things that are there not because the end product user manual lists them as functions, but rather because the software requirements say they are needed to provide reliability, security, or other properties that aren’t typically thought of as product functions. (“Recovers from software crashes quickly” is typically not something you boast about in the user manual.) For similar reasons, the normal product testers might not even think to test such things, because they are product experts and not software experts.
So to get this right the software folks and product testers are going to have to work together to create a software-specific test plan with the software requirements that need to be tested, even if they have little directly to do with normal product functions. You can put it in product test or not, but I’d suggest making it a separate test plan, because some tests probably need to be done by testers who have particular skill and knowledge in software internals beyond ordinary product testers. Some products have a “diagnostic mode” that, for example, sends test messages on a network interface. Putting the software tests here makes a lot of sense.
But for products that don’t have such a diagnostic mode, you might have to do some ad hoc testing before you build the final system by, for example, manually putting infinite loops into each task to make sure the watchdog picks them up. (I’d probably use conditional compilation to do that, but have a final product test make sure the conditional compilation flags are off for the final product!)
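As a rough illustration of the conditional compilation idea, here is a minimal host-side sketch in C. Everything named in it is an assumption made up for the example: the `TEST_INJECT_HANG_TASK` and `RELEASE_BUILD` flags, the simulated watchdog, and the two-task loop. A real target would use its own watchdog hardware and scheduler. The `#error` guard is one possible way to keep the injection from slipping into a shipping build, in addition to a final product test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical build flags (not from any real toolchain or SDK):
 *   TEST_INJECT_HANG_TASK=<id>  hang the task with that ID (0 = none)
 *   RELEASE_BUILD               defined only for shipping builds      */
#ifndef TEST_INJECT_HANG_TASK
#define TEST_INJECT_HANG_TASK 0
#endif

#if defined(RELEASE_BUILD) && (TEST_INJECT_HANG_TASK != 0)
#error "Fault injection must be disabled in release builds"
#endif

#define WDT_TIMEOUT_TICKS 3u  /* assumed timeout, in timer ticks */
#define NUM_TASKS 2           /* assumed task IDs are 1..NUM_TASKS */

/* Host-side stand-in for a hardware watchdog. */
typedef struct {
    uint32_t ticks_since_kick;
    bool     tripped;
} sim_watchdog_t;

void wdt_kick(sim_watchdog_t *w) { w->ticks_since_kick = 0; }

void wdt_tick(sim_watchdog_t *w)
{
    if (++w->ticks_since_kick >= WDT_TIMEOUT_TICKS) {
        w->tripped = true;  /* real hardware would reset the CPU here */
    }
}

/* Models a stuck task: the CPU spins while the timer keeps counting.
 * On a real target this would be `for (;;) {}` ended by a reset. */
void hang_until_watchdog_trips(sim_watchdog_t *w)
{
    while (!w->tripped) {
        wdt_tick(w);
    }
}

/* Called at the top of each task body; does nothing unless this
 * task was selected for fault injection at compile time. */
void maybe_inject_hang(sim_watchdog_t *w, int task_id)
{
    assert(task_id > 0);
    if (task_id == TEST_INJECT_HANG_TASK) {
        hang_until_watchdog_trips(w);
    }
}

/* Runs `cycles` passes over all tasks, one timer tick per task run.
 * Returns the ID of the task that was running when the watchdog
 * tripped, or 0 if it never tripped. */
int run_cycles(sim_watchdog_t *w, int cycles)
{
    for (int c = 0; c < cycles; ++c) {
        for (int id = 1; id <= NUM_TASKS; ++id) {
            maybe_inject_hang(w, id);
            if (w->tripped) {
                return id;
            }
            wdt_tick(w);  /* time passes while the task runs */
        }
        wdt_kick(w);      /* all tasks finished this cycle */
    }
    return 0;
}
```

Built with no flags, `run_cycles` should return 0 and the simulated watchdog should never trip. Rebuilding with, say, `-DTEST_INJECT_HANG_TASK=2` should make it report task 2, and repeating the build once per task ID covers each task in turn.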
Here are some examples of areas you might want to put in your software test plan:
- Watchdog timer is turned on and stays turned on; product reboots as desired when it trips.
- Watchdog timer detects timing faults with each and every task, with appropriate recovery (need a way to kill or delay individual tasks to test this).
- Tasks and interrupts are meeting deadlines (watchdog might not be sensitive enough to detect minor deadline misses, but deadline misses usually are a symptom of a deeper problem).
- CPU load is as expected (even if the load is below 100%, a measured number that differs from your prediction means you have a problem with your scheduling estimates).
- Maximum stack depth is as expected.
- Correct versions of all code have been included in the build.
- Code included in the build compiles “clean” (no warnings).
- Run-time error logs are clean at the end of normal product testing.
- Fault injection has been done for systems that are safety critical to test whether single points of failure turn up (of course it can’t be exhaustive, but if you find a problem you know something is wrong).
- Exception handlers have all been exercised to make sure they work properly. (For example, if your code hits the “this can never happen” default in a switch statement, does the system do something reasonable, even if that means a system reset?)
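For the maximum stack depth item in the list above, one common approach is to "paint" the stack with a known pattern before the task runs and later count how much of the pattern survived. The sketch below runs on a host and treats a plain byte array as the stack. The fill value, the function names, and the descending-stack layout are all assumptions for illustration; many RTOSes provide their own high-water-mark facility that would be used instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5u  /* assumed paint pattern */

/* Fill the whole stack region with the pattern before the task starts. */
void stack_paint(uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    for (size_t i = 0; i < size; ++i) {
        stack[i] = STACK_FILL;
    }
}

/* Assumes a descending stack: usage starts at the highest address and
 * grows toward stack[0]. Counts the still-painted bytes at the low end
 * and returns the deepest usage seen so far, in bytes. */
size_t stack_max_depth(const uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    size_t untouched = 0;
    while (untouched < size && stack[untouched] == STACK_FILL) {
        ++untouched;
    }
    return size - untouched;
}

/* Test-plan style check: nonzero if the measured depth is within the
 * depth that was predicted (budgeted) for this task. */
int stack_depth_within_budget(const uint8_t *stack, size_t size,
                              size_t predicted_max)
{
    return stack_max_depth(stack, size) <= predicted_max;
}
```

Note that this can under-report slightly if the deepest byte the task wrote happens to equal the fill pattern, so the measured number is best compared against the prediction with a little margin.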
Note that some of these are, strictly speaking, not really “tests.” For example, making sure the code compiles free of compiler or static analysis warnings isn’t done by running the code. But it is properly part of a software test plan if you think of the plan as ensuring that the software you’re shipping meets quality and functionality expectations beyond those that are explicit product functions.
And while we’re at it, if any of the above areas aren’t in your software requirements, they should be. Typically you’re going to miss tests if there is nothing in the requirements saying that your product should have these capabilities.