Testing Computer Software Second Edition
Common Software Errors - Control Flow Errors
This is the appendix from the best-selling book Testing Computer Software, 2nd ed.
Copyright © 1988 by Cem Kaner. Copyright © 1993 by Cem Kaner, Jack Falk, Hung Quoc Nguyen.
This is part 7 of 13.
CONTROL FLOW ERRORS
The control flow of a program describes what it will do next, under what circumstances. A control flow error occurs when the program does the wrong thing next. Extreme control flow errors stop the program or cause it to run amok. Many simple errors lead to spectacular misbehavior.
PROGRAM RUNS AMOK
The program displays garbage onscreen, saves garbage to disk, starts printing forever, or goes to some otherwise totally inappropriate routine. Eventually, it may stop dead. Whatever the exact behavior, the program's actions are out of your control. These are the most spectacular bugs, and are usually the easiest to find and fix.
From the outside, these bugs can look the same. They all make the program go out of control. The descriptions below are examples of the causes of programs running amok. You would not test specifically for one of these errors unless you knew something about the programming language, the programmer's style, or the internal design.
GOTO somewhere
GOTO transfers control to another part of the program. The program jumps to the specified routine, but this is obviously the wrong place. The program may lock, the screen display may be inappropriate, etc.
The GOTO command is unfashionable. The structured programming movement is centered on a belief that GOTO encourages sloppy thinking and coding (Yourdon, 1976).
Errors involving GOTO are especially likely when:
- The program branches backward, going somewhere it's been before. For example, the GOTO may jump to a point just past validity checking or initialization of data or devices.
- The GOTO is indirect, going to an address stored in a variable. When the variable's value changes, the GOTO takes the program somewhere else. It's hard to tell, when reading the code, whether the variable has the right value at the right time.
Come-from logic errors
A routine uses come-from logic if it changes what it does based on what routine called or jumped to it. Errors arise when the routine fails to correctly identify what called it or does the wrong thing after correctly identifying the caller. Calling routines often set flags or other variables to identify themselves, but a few different routines may use the same flag to mean different things, some may reset a flag when they're done with it while others don't, and some may give it values that the called routine doesn't expect. Come-from logic is fragile, and particularly prone to failure during maintenance programming.
Problems in table-driven programs
A table-driven program uses a table (array) of addresses. Depending on the value of some variable(s), the program selects a table entry and jumps to the memory address stored there. The table may be a data file read from disk that can be changed without recompiling the program. Table-driven programming can make code easier to maintain, but it has risks:
- The numbers in the table might be wrong, especially if they were entered by hand. These incorrect addresses could send the program anywhere.
- If the table is long, it is easy to supply the wrong entry for a given case, and easy to miss this when desk-checking the code.
- Suppose the table has five entries, and the program selects one of the five based on the value of a state variable. What if the variable can take on six values? Where does the program go in the sixth case?
- It's easy to forget to update a jump table when modifying the code.
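The five-entry table and six-valued state variable described above can be sketched in Python, used here only for illustration. A list of functions stands in for a table of addresses, and the handler names and the two `dispatch_*` helpers are hypothetical, not taken from any real program.

```python
# Hypothetical handlers standing in for the routines a jump table points to.
def handle_idle():
    return "idle"

def handle_read():
    return "read"

def handle_write():
    return "write"

def handle_seek():
    return "seek"

def handle_close():
    return "close"

# The jump table: five entries, indexed by a state variable.
JUMP_TABLE = [handle_idle, handle_read, handle_write, handle_seek, handle_close]

def dispatch_unchecked(state):
    # No range check. Python raises IndexError for state 5, but silently
    # accepts state -1 and runs the last entry, which is the wrong routine.
    return JUMP_TABLE[state]()

def dispatch_checked(state):
    # Validate the state before indexing, so a sixth value is reported
    # instead of sending the program somewhere unexpected.
    if not 0 <= state < len(JUMP_TABLE):
        raise ValueError(f"no table entry for state {state}")
    return JUMP_TABLE[state]()
```

In a language without bounds checking, the unchecked version would jump to whatever address happens to follow the table, so the explicit range check matters even more there.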
Executing data
You can't tell from a byte's contents whether it holds a character, part of a number, part of a memory location, or a program instruction. The program keeps these different types of information in different places in memory to keep straight which byte holds what type of data. If the program interprets data as instructions, it will try to execute them and will probably lock. It may print odd things on the screen first. Some computers detect execution of "impossible" commands and stop the program with an error message (usually a hexadecimal message flagging program termination or a reference to an illegal machine code).
The program will treat data as if they were instructions under two conditions:
- (a) Data are copied into a memory area reserved for code. The code is overwritten. Examples of how to do this:
- Pointers are variables which store memory addresses. A pointer might hold the starting address of an array; the programmer could put a value in the fourth element of the array by saying "store it in the fourth location after the address stored in this pointer." If the address in the pointer is wrong, the data go to the wrong place. If the address is in the code space, the new data overwrite the program.
- Some languages don't check array limits. Suppose you have an array MYARRAY, with three elements, MYARRAY[1], MYARRAY[2], and MYARRAY[3]. What happens if the program tries to store a value in MYARRAY[2044]? If the language doesn't catch this error, the data will be stored in the spot that would have been MYARRAY[2044] if that MYARRAY element existed. This memory location is a few thousand bytes past the end address of MYARRAY. It might be reserved for code, data, or hardware I/O, but not for MYARRAY.
- (b) The program jumps to an area of memory that is reserved for data, and treats it like an area containing code.
- A bad table entry in a table-driven program can lead the program to jump into a data area.
- Some computers divide memory into segments. The computer interprets anything in a code segment as instructions, and anything in a data segment as numbers or characters. If the program misstates a segment's starting address, what the computer interprets as a code segment will probably be a combination of code and data.
Jumping to a routine that isn't resident
To save room, computers may swap pieces of large programs in and out of memory. These pieces are called overlays: when one is in memory, the others aren't. When another is needed, the computer reads it from disk and stores it in the same area of memory used by the first overlay. When routines in the first overlay are again needed, they are again read into the shared area of memory. The routine in memory right now is resident in memory.
Before using a routine that is part of an overlay, the program must check that the right overlay is resident. Otherwise, when it jumps to what should be the starting address of the routine, it may be jumping into the middle of some other routine.
Overlays can also cause performance problems. The programmer might ensure that a routine is resident by always loading it from disk before jumping to it. This wastes a lot of computer time if he calls the routine many times. The program will also waste time if it alternates between routines that are part of two different overlays. This is called thrashing: the program loads the first overlay, executes the first routine, then overwrites it with the second overlay to execute the second routine, reloads the first overlay, etc. It spends most of its time loading overlays rather than getting work done.
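The cost of thrashing can be illustrated with a toy model in Python. This is a sketch under simplifying assumptions: the `OverlayManager` class and `count_loads` helper are hypothetical, a single shared memory area holds one overlay at a time, and a "load" is just a counter increment rather than a real disk read.

```python
class OverlayManager:
    """Toy model: one shared memory area holds a single overlay at a time."""

    def __init__(self):
        self.resident = None   # name of the overlay currently in memory
        self.loads = 0         # number of simulated disk loads

    def ensure_resident(self, overlay):
        # Load only when the requested overlay is not already in memory.
        if self.resident != overlay:
            self.resident = overlay
            self.loads += 1

    def call(self, overlay, routine):
        # Check residency before jumping to the routine.
        self.ensure_resident(overlay)
        return routine()


def count_loads(call_sequence):
    """Return how many loads a sequence of overlay names would cost."""
    manager = OverlayManager()
    for overlay in call_sequence:
        manager.call(overlay, lambda: None)
    return manager.loads


# Ten calls that stay in one overlay cost a single load.
same_overlay = count_loads(["A"] * 10)
# Ten calls that alternate between two overlays cost a load every time.
alternating = count_loads(["A", "B"] * 5)
```

Checking residency before loading avoids the always-reload waste, but it cannot help when the call pattern itself alternates between overlays; that has to be fixed by regrouping the routines.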
Re-entrance
A re-entrant program can be used concurrently by two or more processes. A re-entrant subroutine can call itself or be called by any other routine while it's executing. Some languages don't support re-entrant subroutine calls: if a routine tries to call itself, the program crashes. Even if the language allows re-entrance, a given program or routine might not be re-entrant. If a routine is serving two processes, how does it keep its data separate, so that what it does for one process doesn't corrupt what it does for the other?
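A small Python sketch shows how shared working data breaks re-entrance. Both functions are hypothetical examples: each is meant to add up the numbers in a nested list, but the first keeps its running total in a module-level variable that every invocation shares.

```python
_total = 0  # shared scratch variable: this is what breaks re-entrance

def sum_nested_shared(items):
    """Not re-entrant: every call resets and reuses the same global."""
    global _total
    _total = 0
    for item in items:
        if isinstance(item, list):
            sum_nested_shared(item)   # the inner call resets _total
        else:
            _total += item
    return _total

def sum_nested_local(items):
    """Re-entrant: each call keeps its running total in a local variable."""
    total = 0
    for item in items:
        if isinstance(item, list):
            total += sum_nested_local(item)
        else:
            total += item
    return total

data = [1, [2, 3], 4]
shared_result = sum_nested_shared(data)   # 9: the leading 1 was wiped out
local_result = sum_nested_local(data)     # 10
```

The same corruption can occur without recursion when two processes or an interrupt handler enter the routine while an earlier invocation is still using the shared variable.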
Variables contain embedded command names
Some language dialects ignore spaces. A phrase like PRINTMYNAME would be interpreted by the language as PRINT MYNAME. The program would attempt to print the value of variable MYNAME. This is an error if the user was trying to define a variable named PRINTMYNAME. This type of error is usually caught by the programmer, but occasional ones do survive.
Wrong returning state assumed
Imagine a subroutine that's supposed to set a device's baud rate. The program calls this routine and assumes that the routine did its job successfully. It starts transmitting through the device as soon as possible. This time, the routine failed. The transmission fails and the program hangs waiting for acknowledgment of the data. As another example, suppose a routine usually scales the data passed to it, passing back numbers that lie between 1 and 10. Under exceptional circumstances, the routine will scale from 0 to 10 instead. Because the calling program assumes that it will never receive a 0, it crashes on a divide by 0 error.
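The scaling example can be sketched in Python. The `scale` function and its `allow_zero` switch are hypothetical stand-ins for the routine and its "exceptional circumstances"; the point is only that the caller's assumption about the returned range is never checked.

```python
def scale(values, allow_zero=False):
    """Hypothetical scaler: maps values onto 1..10, or 0..10 when allow_zero is set."""
    low, high = min(values), max(values)
    floor = 0 if allow_zero else 1
    span = (high - low) or 1          # avoid dividing by 0 when all values match
    return [floor + (v - low) * (10 - floor) / span for v in values]

def reciprocals_unchecked(scaled):
    # Assumes no element is 0; raises ZeroDivisionError otherwise.
    return [1 / s for s in scaled]

def reciprocals_checked(scaled):
    # Verify the assumption instead of trusting the called routine.
    if any(s == 0 for s in scaled):
        raise ValueError("scaled data contains 0; cannot take reciprocals")
    return [1 / s for s in scaled]

usual = scale([2, 4, 6])                    # [1.0, 5.5, 10.0]
rare = scale([2, 4, 6], allow_zero=True)    # [0.0, 5.0, 10.0]
```

Passing `rare` to `reciprocals_unchecked` fails with a divide-by-zero error far from the routine that produced the unexpected value, which is what makes this class of bug hard to trace.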
Exception-handling based exits
Suppose that a routine designed to calculate square roots sets an error flag but does no computations when asked to take the square root of a negative number. The idea behind error flags is that the calling program can decide how to deal with the problem. One might print an error message, another display a help screen, and a third might send the number to a slower routine built for complex numbers. Subroutines that flag and reject exceptional conditions can be used under more conditions. However, each time one is called, the caller must check that it did what the programmer expected it to do. If the exit-producing conditions are rare, he may miss them. During testing, they may show up as "irreproducible" bugs.
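The square-root routine and two of its possible callers might look like this in Python. This is a minimal sketch: `flagged_sqrt` returns its error flag alongside the result, which is one of several ways a flag could be reported, and all three function names are hypothetical.

```python
import cmath
import math

def flagged_sqrt(x):
    """Return (result, error_flag). On a negative input, set the flag and compute nothing."""
    if x < 0:
        return None, True
    return math.sqrt(x), False

def root_or_complex(x):
    """One possible caller policy: fall back to a complex-number routine."""
    result, failed = flagged_sqrt(x)
    if failed:
        return cmath.sqrt(x)
    return result

def careless_caller(x):
    """Ignores the flag, so the rare negative input fails far from its cause."""
    result, _ = flagged_sqrt(x)
    return result * 2   # TypeError when result is None
```

`careless_caller` works for every non-negative input, so ordinary testing may never exercise the failing path.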
Return to wrong place
The key difference between a subroutine and a GOTO is that when the subroutine ends, it returns to the part of the program that called it, whereas GOTO never returns. Occasionally, a subroutine can return to the wrong place in a program. The next few sections are examples.
Corrupted stack
When a subroutine finishes, program control returns to the command following the call to the subroutine. The address of that command is stored in a data structure called a stack. The top of the stack holds the address most recently pushed onto it. The subroutine returns to the address stored at the top of the stack. If the stack is only used to hold return addresses, it is called a Call/Return Stack. Most stacks are also used as a temporary spot to stash data.
If a subroutine puts data on the stack and doesn't remove them before finishing, the computer will treat the number(s) at the top of the stack as a return address. The subroutine might "return" anywhere in memory.
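A toy simulation in Python makes the mechanism concrete. The addresses (100, 101) and the scratch value are arbitrary assumptions, and a Python list stands in for the machine's stack.

```python
def run(subroutine_cleans_up):
    """Toy machine: one stack holds both return addresses and scratch data."""
    stack = []

    # The caller sits at address 100; the instruction after the call is at 101.
    return_address = 101
    stack.append(return_address)      # CALL pushes the return address

    # Inside the subroutine: stash a temporary value on the same stack.
    scratch_value = 7
    stack.append(scratch_value)
    if subroutine_cleans_up:
        stack.pop()                   # remove the scratch value before returning

    # RETURN pops whatever is on top and jumps there.
    return stack.pop()

good_return = run(subroutine_cleans_up=True)    # 101, the intended address
bad_return = run(subroutine_cleans_up=False)    # 7, a data value used as an address
```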
Stack under/overflow
The stack might only be able to hold 16, 32, 64, or 128 addresses. Imagine a stack that can only hold 2 return addresses. When the program calls Subroutine 1, it stores a return address on the stack. When Subroutine 1 calls Subroutine 2, another return address goes on the stack. When Subroutine 2 ends, control goes back to Subroutine 1, and when Subroutine 1 ends, control returns to the main body of the program.
What if Subroutine 2 calls Subroutine 3? The stack is storing 2 return addresses already, so it cannot also hold the return address for Subroutine 3. This is a stack overflow condition. Programs (or central processing chips) often compound a stack overflow problem by replacing the oldest stored return address with the new one. The program will now return to Subroutine 2 when Subroutine 3 is done. From Subroutine 2 it returns to Subroutine 1. From Subroutine 1 it returns to...???...there is no return address for Subroutine 1. This is a stack underflow.
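The two-slot stack can be simulated in a few lines of Python. This is an illustrative sketch: the `simulate` helper is hypothetical, and `collections.deque` with `maxlen` is used because it silently discards the oldest entry when full, which mirrors the overwrite behavior described above.

```python
from collections import deque

def simulate(depth, capacity=2):
    """Nest `depth` calls on a stack of `capacity` slots, then unwind.

    Returns the list of places control returns to; "???" marks a return
    attempted with an empty stack (underflow).
    """
    stack = deque(maxlen=capacity)    # a full deque drops its oldest entry
    callers = ["main"] + [f"sub{i}" for i in range(1, depth)]
    for caller in callers:
        stack.append(caller)          # each call pushes where to return to

    returns = []
    for _ in range(depth):
        returns.append(stack.pop() if stack else "???")
    return returns

two_deep = simulate(2)     # ['sub1', 'main']
three_deep = simulate(3)   # ['sub2', 'sub1', '???']
```

With three nested calls the return to the main program has been overwritten, so the last return has nowhere to go.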
GOTO rather than RETURN from a subroutine
Subroutine 1 calls Subroutine 2. Subroutine 2 GOTOs back to Subroutine 1, rather than returning normally. The return address from Subroutine 2 to Subroutine 1 is still on the stack. When Subroutine 1 finishes, the program will return to the address stored on the stack, which takes it back into Subroutine 1. This is rarely intentional.
To avoid this error, Subroutine 2 might POP (remove) its return address from the stack when it does its GOTO back to Subroutine 1. Used incorrectly, this can cause stack underflows, returns to the wrong calling routine, and attempts to return to data values stored on the stack with the return addresses.
Interrupts
An interrupt is a special signal that causes the computer to stop the program in progress and branch to an interrupt handling routine. Later, the program restarts from where it was interrupted. Input/output events, including signals from the clock that a specified interval of time has passed, are typical causes of interrupts.
Wrong interrupt vector
When an interrupt signal is generated, the computer has to find the interrupt handling routine, then branch to it. The address of the interrupt handler is stored in a dedicated location in memory. The computer jumps to the address stored in that location. If the computer can distinguish between several different types of interrupts, it finds a given interrupt's handler in a list of addresses stored in a dedicated section of memory. This list is called the interrupt vector.
If wrong addresses are stored in the interrupt vector, any error might be possible in response to an interrupt-generating event. If the addresses are merely out of order, the program is less likely to run amok but it might try to echo characters onscreen in response to a clock signal, or treat keyboard inputs as if they flagged time-outs.
Failure to restore or update interrupt vector
A program can change the interrupt vector by writing new addresses into the appropriate memory locations. If a module temporarily changes the interrupt vector, it might not restore the old address list on exit. Another might fail to make a permanent (or temporary) change to the vector. In either case, the computer will branch to the wrong place after the next interrupt.
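One defensive pattern for temporary changes is to save the old entry and restore it on every exit path. The Python sketch below models the vector as a dictionary of handler functions; the handler names and `run_with_temporary_handler` are hypothetical, and a real interrupt vector is a block of memory addresses rather than a dictionary.

```python
def clock_handler():
    return "tick"

def keyboard_handler():
    return "key"

# Hypothetical interrupt vector: interrupt name -> handler.
vector = {"clock": clock_handler, "keyboard": keyboard_handler}

def run_with_temporary_handler(name, temp_handler, body):
    """Install temp_handler for `name`, run body, and always restore the old one."""
    saved = vector[name]
    vector[name] = temp_handler
    try:
        return body()
    finally:
        vector[name] = saved      # restored even if body raises
```

Without the `finally` clause, an error inside `body` would leave the temporary handler installed, and the next interrupt of that type would branch to the wrong place.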
Failure to block or unblock interrupts
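Programs can block most interrupts, instructing the computer to ignore blockable interrupts. For example, it's traditional to block interrupts just before starting to write data to a disk and to unblock immediately after output to the disk is complete. This prevents many data transmission errors. Errors arise when the program fails to block interrupts before such an operation, or fails to unblock them afterward, so that later interrupts are ignored.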
Invalid restart after an interrupt
The program is interrupted, then restarted. In some systems, at restart time, the program gets a message or other indication that it was interrupted. The message usually identifies the type of interrupting event (keyboard I/O, time-out, modem I/O, etc.). This is useful. For example, if a program knows it was interrupted, it can repaint the screen with information it was showing before the interrupt. The programmer might easily specify the wrong action or a branch to the wrong location in response to a signal that a certain type of interrupt has been executed. Programmers are as unlikely to catch these errors as error-handling errors.
PROGRAM STOPS
Some languages will stop a program when certain types of errors are detected. Some programs aren't designed to stop, nor are the languages designed to stop them, but they do anyway.
Not all halts are control flow errors. If the program code says "If this happens, halt," the program is supposed to stop. It is a user interface error, but not a control flow error, if this program stops unexpectedly, without a message.
Dead crash
In a dead crash, the computer stops responding to keyboard input, stops printing, and leaves lights on or off (but doesn't change them). It usually locks without issuing any warnings that it's about to crash. The only way to regain control is to turn off the machine or press the reset key.
Dead crashes are usually due to infinite loops. One common loop keeps looking for acknowledgment or data from another device (printer, another computer, disk, etc.). If the program misses the acknowledgment, or never gets one, it may stay in this wait loop forever.
Syntax errors reported at run-time
An interpreted language may not check syntax until run-time. When the language finds a command that it can't interpret, it prints an error message and stops the program. Any line of code that the programmer didn't test may have a syntax error.
Waits for impossible condition, or combination of conditions
The program stops (usually a dead crash) waiting for an event that cannot occur. Common examples:
- I/O failure: The computer sends data to a broken output device, then waits forever for the device to acknowledge receipt. A similar problem arises between processes in a multi-processing system. One process sends a request or data to another, then waits forever for a response that never arrives.
- Deadly embrace: This is a classic multi-processing problem. Two programs run simultaneously. Each needs the same two resources (say, a printer and extra memory for a printer buffer). Each grabs one resource, then waits forever for the other program to finish with the other resource.
- Simple logic errors: For example, a program is supposed to wait for a number between 1 and 5, discarding all other input. However, the code testing the input reads: IF INPUT > 5 AND INPUT < 1. No number can meet this condition so the program waits forever.
Similarly, in multi-processing systems, one process may wait forever for another to send it an impossible value.
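The simple logic error above can be checked mechanically. In this Python sketch, `condition_as_written` transcribes the faulty test and `condition_intended` is an assumption about what the programmer presumably meant; the range of candidates is arbitrary.

```python
def condition_as_written(n):
    # Transcription of IF INPUT > 5 AND INPUT < 1: no number satisfies both.
    return n > 5 and n < 1

def condition_intended(n):
    # Presumed intent: accept a number between 1 and 5.
    return 1 <= n <= 5

candidates = range(-100, 101)
hits_as_written = [n for n in candidates if condition_as_written(n)]
hits_intended = [n for n in candidates if condition_intended(n)]
```

`hits_as_written` is empty, so a loop that waits for this condition can never exit, while `hits_intended` holds the five values the program was supposed to accept.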
Wrong user or process priority
A computer that runs many programs at once switches between them. It runs one program for a while, then switches to a second, to a third, eventually returning to the first. Multiprocessing systems run smoothly because a scheduling program switches back to programs when events like keyboard input happen or when a program has been suspended for too long.
If two programs have been waiting equally long to run, or if the same type of event happens to trigger each, the scheduler must decide which program to run first. It uses a priority system: priorities might be assigned to users or programs. The program being run by a higher priority user will run first.
Some programs run at such low priorities, they may be suspended for hours before being restarted. This may be appropriate. In other cases, priorities were incorrectly assigned or interpreted. Less extreme priority errors are more common but harder to detect unless they trigger race conditions.
LOOPS
There are many ways to code a loop, but they all have some things in common. Here's one example:
1 SET LOOP_CONTROL = 1
2 REPEAT
3 SET VAR = 5
4 PRINT VAR * LOOP_CONTROL
5 SET LOOP_CONTROL = LOOP_CONTROL + 1
6 UNTIL LOOP_CONTROL > 5
7 PRINT VAR
The program sets LOOP_CONTROL to 1, sets VAR to 5, prints the product of VAR and LOOP_CONTROL, adds 1 to LOOP_CONTROL then checks whether LOOP_CONTROL is greater than 5. Since LOOP_CONTROL is only 2, it repeats the code inside the loop (lines 3, 4, and 5). The loop keeps repeating until LOOP_CONTROL reaches 6. Then the program executes the next command after the loop, printing the value of VAR.
LOOP_CONTROL is called the loop control variable. Its value determines how many times the loop is executed. If the expression written after the UNTIL is complex, involving many different variables, it is a loop control expression, rather than a loop control variable. The same types of errors arise in both cases.
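For readers more comfortable with a modern language, the seven-line example can be rendered in Python as follows. This is a sketch, not part of the original text: Python has no REPEAT ... UNTIL, so a `while True` loop with a test at the bottom stands in for it, and the function collects the values in a list rather than printing them.

```python
def run_loop():
    """Python rendering of the seven-line example; returns what it would print."""
    printed = []
    loop_control = 1                        # line 1
    while True:                             # line 2: REPEAT
        var = 5                             # line 3
        printed.append(var * loop_control)  # line 4
        loop_control += 1                   # line 5
        if loop_control > 5:                # line 6: UNTIL
            break
    printed.append(var)                     # line 7
    return printed

output = run_loop()   # [5, 10, 15, 20, 25, 5]
```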
Infinite loop
If the condition that terminates the loop is never met, the program will loop forever. Modify the example so that it loops until LOOP_CONTROL is less than 0 (which never happens), and it will loop forever.
Wrong starting value for the loop control variable
Suppose that, later in the program, there is a GOTO to the start of the loop at line 2. LOOP_CONTROL could have any value. It probably isn't 1. If the programmer expects this loop to repeat five times (as it would if the GOTO was to line 1), he is in for a surprise.
Accidental change of the loop control variable
In the example, the value of LOOP_CONTROL changed inside the loop. A bigger loop might change LOOP_CONTROL in more than one place (especially if it calls a subroutine that uses LOOP_CONTROL), and the program might repeat the loop more or less often than the programmer expects.
Wrong criterion for ending the loop
Perhaps the loop should end when LOOP_CONTROL > 5 rather than when LOOP_CONTROL >= 5. This is a common mistake. And, if the ending criterion is more complex, it is more prone to error.
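The effect of the two criteria on the example loop can be counted directly. In this Python sketch, `count_passes` is a hypothetical helper that runs the loop skeleton with a caller-supplied exit test and returns the number of passes.

```python
import operator

def count_passes(done):
    """Count passes of a REPEAT ... UNTIL loop whose exit test is `done`."""
    loop_control = 1
    passes = 0
    while True:
        passes += 1
        loop_control += 1
        if done(loop_control, 5):
            return passes

passes_gt = count_passes(operator.gt)   # UNTIL LOOP_CONTROL > 5
passes_ge = count_passes(operator.ge)   # UNTIL LOOP_CONTROL >= 5
```

The strict test gives five passes and the non-strict test gives four, a difference of one pass that is easy to miss when reading the code.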
Commands that do or don't belong inside the loop
In the example, SET VAR = 5 is inside the loop. The value of VAR doesn't change inside the loop, so VAR is still 5 the second, third, fourth, and fifth times the loop executes. Resetting VAR to 5 each time is wasteful. Some loops repeat thousands of times: unnecessary repetition within them is significant.
Alternatively, suppose VAR did change inside the loop. If the programmer wants VAR to start at 5 each time the loop repeats, he has to say SET VAR = 5 at the head of the loop.
Improper loop nesting
One loop can be nested (completely included) inside another. It is not possible (without error) for one loop to start inside another but to end outside of it.
IF, THEN, ELSE, OR MAYBE NOT
An IF statement has the form:
IF This_Condition IS TRUE
THEN DO Something
ELSE DO Something_Else
For example:
IF VAR > 5
THEN SET VAR_2 = 20
ELSE SET VAR_2 = 10
The THEN clause (SET VAR_2 = 20) is only executed if the condition (VAR > 5) is met. If the condition is not met, the ELSE clause (SET VAR_2 = 10) is executed. Some IF statements only specify what to do if the condition is met. They don't include an ELSE clause. If the condition is not met (VAR <= 5) the program skips the THEN clause and moves on to the next line of code.
Wrong inequalities (e.g., > instead of >=)
The tested condition (VAR > 5) might be incorrect, or incorrectly stated. Programmers often forget to consider the case in which the two values being compared are equal.
Comparison sometimes yields wrong result
The condition tested by the IF is usually the right one, but not always. Suppose the programmer wants to test whether three variables are the same. He might write IF (VAR + VAR_2 + VAR_3) / 3 = VAR.
If VAR, VAR_2, and VAR_3 are the same, the average of them will have the same value as any one of them. Further, for almost all values, if VAR, VAR_2, and VAR_3 are not the same, their average will not equal VAR. But suppose that VAR is 2, VAR_2 is 1, and VAR_3 is 3. (VAR + VAR_2 + VAR_3) / 3 = VAR, but VAR, VAR_2, and VAR_3 aren't equal. Shortcuts like this that try to combine a few comparisons into one regularly go awry.
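The counterexample is easy to reproduce. In this Python sketch both function names are hypothetical; the first transcribes the shortcut and the second compares the values directly.

```python
def all_equal_shortcut(a, b, c):
    # The shortcut from the text: compare the average with the first value.
    return (a + b + c) / 3 == a

def all_equal(a, b, c):
    # Compare the values directly.
    return a == b == c

shortcut_says = all_equal_shortcut(2, 1, 3)   # True, although the values differ
direct_says = all_equal(2, 1, 3)              # False
```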
Not equal versus equal when there are three cases
The three-case problem often comes up during maintenance programming. The initial code may have restricted VAR's values to 0 and 1, but later changes allow it to be 2 as well. In the original program, an IF statement for VAR = 0 was fine. The THEN clause covered VAR = 0, and, since VAR could only be 0 or 1, the ELSE clause said what to do when VAR = 1. Now that VAR can also be 2, the ELSE clause is probably wrong.
It's risky to compare a variable to only one value (like VAR = 0), leaving all the others to the same ELSE clause. There are so many other possible values: some may arise as originally unanticipated special cases.
Testing floating point values for equality
Floating point calculations are subject to truncation and round off errors. For example, rather than being exactly zero, a variable's value might be 0.000000008 because of small computational errors. This is close, but it wouldn't pass a test of equality (IF VAR = 0).
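A short Python sketch shows both the problem and one common remedy, comparing against a tolerance instead of testing for exact equality. The tolerance of 1e-6 is an arbitrary assumption; an appropriate value depends on the calculation.

```python
import math

def is_zero_exact(x):
    return x == 0

def is_zero_close(x, tolerance=1e-6):
    # abs_tol is needed because a purely relative tolerance never matches 0.
    return math.isclose(x, 0.0, abs_tol=tolerance)

residue = 0.1 + 0.2 - 0.3   # mathematically zero, but not in binary floating point

exact_test = is_zero_exact(residue)   # False
close_test = is_zero_close(residue)   # True
```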
Confusing inclusive and exclusive OR
Many IF statements test whether one of a group of conditions is true (IF A OR B is true, THEN ...). Unfortunately, "or" is ambiguous:
- inclusive or: satisfied if A is true, B is true, or both A and B are true
- exclusive or: satisfied if A is true or B is true, but not if A and B are both true
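The two readings differ only when both conditions are true, as a truth table shows. In this Python sketch the function names are hypothetical; Python's own `or` is inclusive, and inequality of two booleans serves as exclusive or.

```python
def inclusive_or(a, b):
    return a or b

def exclusive_or(a, b):
    return a != b      # for booleans, "not equal" is exclusive or

# Each row: (A, B, inclusive result, exclusive result)
truth_table = [
    (a, b, inclusive_or(a, b), exclusive_or(a, b))
    for a in (False, True)
    for b in (False, True)
]
```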
Incorrectly negating a logical expression
IF statements sometimes take the form, IF A is NOT true, THEN. Programmers often carry out the negation incorrectly or don't think through what the negation means. For example, IF NOT (A or B) THEN ... means IF A is false AND B is false, THEN.... The THEN clause will not be taken if A or B is true, even if the other is false.
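The negation can be verified by enumerating all four combinations of A and B. In this Python sketch, `faulty_negation` is a hypothetical example of a common slip, negating each term but forgetting to change OR to AND.

```python
from itertools import product

def correct_negation(a, b):
    return (not a) and (not b)      # De Morgan: equivalent to NOT (A OR B)

def faulty_negation(a, b):
    return (not a) or (not b)       # common slip: this is NOT (A AND B)

# Cases where the faulty form disagrees with NOT (A OR B).
mismatches = [
    (a, b)
    for a, b in product((False, True), repeat=2)
    if faulty_negation(a, b) != (not (a or b))
]
```

The faulty form disagrees exactly when one of A and B is true and the other is false, the cases a casual test is most likely to skip.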
Assignment-equal instead of test-equal
In the C language, if (VAR = 5) means SET VAR = 5, then test whether it's nonzero. Programmers often write this instead of if (VAR == 5), which means what some people think if (VAR = 5) should mean.
Commands belong inside the THEN or ELSE clause
Here's a simple example of this type of error:
IF VAR = VAR_2
THEN SET VAR_2 = 10
SET VAR_2 = 20
Clearly, SET VAR_2 = 20 belongs inside an ELSE clause. As it is now, VAR_2 is always set to 20. Setting VAR_2 to 10 first, when VAR = VAR_2, has no effect.
Commands that don't belong inside either clause
Sometimes the programmer will include a command inside a THEN or an ELSE clause that should always be executed (i.e., in both cases). If he repeats the command inside both clauses, he wastes code space, but this usually doesn't matter. If he includes it only inside one clause, it will be missed whenever the other clause (ELSE or THEN) is executed.
Failure to test a flag
For example, the program calls a subroutine, which is supposed to assign a value to a variable. The subroutine fails, sets its error flag, and leaves the variable alone. The program doesn't check the error flag. Instead, it does its usual IF test on the variable. Whatever code the program executes from here is wrong, or is right only by luck. The value stored in the variable is junk. That's what the subroutine means when it sets the flag.
Failure to clear a flag
A subroutine set its error flag the last time it was called. The flag is still set. This time the subroutine does its task and leaves the error flag alone. The error flag is still set. The program will believe this error flag, ignore the subroutine's output, and do error recovery instead.
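The stale-flag problem can be sketched in Python. The `Parser` class, its `clear_on_entry` switch, and `parse_all` are hypothetical; the switch exists only so the same code can show the behavior with and without the bug.

```python
class Parser:
    """Hypothetical routine that reports failure through a persistent flag."""

    def __init__(self, clear_on_entry):
        self.error = False
        self.clear_on_entry = clear_on_entry

    def parse(self, text):
        if self.clear_on_entry:
            self.error = False        # reset the flag at the start of each call
        if not text.isdigit():
            self.error = True
            return None
        return int(text)


def parse_all(parser, inputs):
    """Caller that trusts the flag: a set flag means 'ignore the output'."""
    results = []
    for text in inputs:
        value = parser.parse(text)
        results.append("error" if parser.error else value)
    return results

stale = parse_all(Parser(clear_on_entry=False), ["12", "x", "34"])
fresh = parse_all(Parser(clear_on_entry=True), ["12", "x", "34"])
```

With the flag never cleared, the valid input "34" is treated as an error because of the earlier failure; clearing the flag on entry gives the expected result.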
MULTIPLE CASES
An IF statement considers only two cases: an expression is either true or false. Commands like CASE, SWITCH, SELECT, and computed GOTO are used when a variable might have many different values and the programmer wants to do one of many different things depending on the value.
The typical command of this type is equivalent to this:
IF VAR is 1 do TASK-1
IF VAR is 2 do TASK-2
IF VAR is 3 do TASK-3
IF VAR is anything else, do DEFAULT-TASK
If there is no default case, the program falls through to the commands following this multiple choice block.
Missing default
A programmer who thinks VAR can only take on the values listed may not write a default case. Because of a bug or later modifications to the code, VAR can take on other values. A default case could catch these, and print any unexpected value of VAR.
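The difference a default case makes can be sketched in Python. Both functions are hypothetical: the first silently does nothing for an unlisted value, and the second reports it. Raising an exception is one possible default action; printing the value, as suggested above, is another.

```python
def task_for_no_default(var):
    # No default: an unanticipated value silently yields None.
    if var == 1:
        return "TASK-1"
    if var == 2:
        return "TASK-2"
    if var == 3:
        return "TASK-3"

def task_for(var):
    tasks = {1: "TASK-1", 2: "TASK-2", 3: "TASK-3"}
    if var in tasks:
        return tasks[var]
    # Default case: report the unexpected value rather than fall through.
    raise ValueError(f"unexpected value of VAR: {var!r}")
```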
Wrong default
Suppose the programmer expects VAR to have only four possible values. He explicitly deals with the first three possibilities, and buries the other one as the "default." Will this default be correct for VAR's unanticipated fifth and sixth values?
Missing cases
VAR can take on five possible values but the programmer forgot to write a CASE statement covering the fifth case.
Case should be subdivided
Some cases cover too much: perhaps one case covers all values of VAR below 30, but the program should do one thing if VAR is below 15 and something else for larger values. The most common example of this problem is the default case. The programmer doesn't think it matters what happens if VAR has certain values, so he covers them all with the default.
Overlapping cases
The CASE statements are equivalent to this:
IF VAR > 5 then do TASK_1
IF VAR > 7 then do TASK_2
etc.
The first and second cases overlap. If VAR is 9, it fits in both cases. Which should be executed? The first task is the usual choice. Sometimes both are. Sometimes the second one is the correct choice.
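The three possible outcomes can be compared in Python. This is a sketch with hypothetical function names: each function returns the list of tasks that would run for a given value of VAR under one interpretation of the overlapping cases.

```python
def first_match(var):
    # if/elif: only the first matching case runs.
    if var > 5:
        return ["TASK_1"]
    elif var > 7:
        return ["TASK_2"]      # unreachable: anything > 7 is also > 5
    return []

def every_match(var):
    # Independent ifs: every matching case runs.
    ran = []
    if var > 5:
        ran.append("TASK_1")
    if var > 7:
        ran.append("TASK_2")
    return ran

def most_specific_first(var):
    # Testing the narrower case first makes TASK_2 reachable.
    if var > 7:
        return ["TASK_2"]
    elif var > 5:
        return ["TASK_1"]
    return []
```

For VAR = 9 the three versions run TASK_1 only, both tasks, and TASK_2 only, so the same overlapping specification can produce three different behaviors depending on how it is coded.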
Invalid or impossible cases
The program executes TASK_16 only if VAR < 6 AND VAR > 18. TASK_16 can never run because VAR can't meet this condition. Similarly, the program might specify a value that VAR can't reach in practice, even though it's not an impossible number. You won't see this type of problem unless you look at the code, but it wastes code space and may reflect fuzzy thinking.