Common Software Errors - Race Conditions


This is the appendix from the best-selling book
Testing Computer Software, 2nd ed.

Copyright © 1988 by Cem Kaner
Copyright © 1993 by Cem Kaner, Jack Falk, Hung Quoc Nguyen

This is part 9 of 13.


RACE CONDITIONS

In the classic race, there are two possible events, call them EVENT_A and EVENT_B. Both events will happen. The issue is which comes first. EVENT_A almost always precedes EVENT_B. There are logical grounds for expecting EVENT_A to precede EVENT_B. However, under rare and restricted conditions, EVENT_B can "win the race," and occur just before EVENT_A. We have a race condition whenever EVENT_B precedes EVENT_A. We have a race condition bug if the program fails when this happens. Usually the program fails because the programmer didn't anticipate the possibility of EVENT_B preceding EVENT_A, so he didn't write any code to deal with it.

Few testers look for race conditions. If they find an "irreproducible" bug, few think about timing issues (races) when trying to reproduce it. Many people find timing issues hard to conceptualize or hard to understand. We provide more than our usual amount of detail in the examples below, hoping that this will make the overall concept easier to understand.

RACES IN UPDATING DATA

Imagine that one routine reads a credit card balance from the disk, adds the amount of the card holder's latest purchase, and writes the new balance back to the disk. A second routine reads the same balance, subtracts the latest payment, and saves the result to disk. A third routine adds foreign currency transactions. Each of these routines can run concurrently. Each runs quickly, and there are many different card holders, so the following scenario is most unlikely:

A credit card has a balance of $1000. The card holder has just made a purchase for $100 and a payment of $500. The correct balance is thus $600. However, the first routine reads the $1000 balance from the disk. While it adds the $100 purchase, the second routine reads the same card holder's balance (still $1000). Then the first routine stores the new balance ($1100) to disk. The second routine subtracts the $500 payment amount from the $1000 balance that it read from the disk. It saves the new balance ($500) to disk. The $100 addition made by the first routine has been completely lost because the second routine read the balance before the first routine had finished updating it.

This is a race condition: it should almost never happen that the second routine will read the balance after the first routine has started changing it but before the first routine finishes. However, it can happen and occasionally it will.
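The lost update can be replayed step by step. The sketch below is only an illustration: the `Account` class is a hypothetical in-memory stand-in for the balance on disk, and the lock is one common remedy, not something the text prescribes. The first function replays the unlucky ordering described above; the second makes each read-modify-write a single unit.

```python
import threading


class Account:
    """Hypothetical stand-in for the balance record stored on disk."""

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()


def replay_lost_update():
    """Replay the interleaving from the text, one step at a time."""
    account = Account(1000)
    purchase_read = account.balance         # routine 1 reads $1000
    payment_read = account.balance          # routine 2 reads $1000 as well
    account.balance = purchase_read + 100   # routine 1 writes $1100
    account.balance = payment_read - 500    # routine 2 writes $500; purchase lost
    return account.balance


def apply_change(account, delta):
    """Read, modify, and write while holding the lock."""
    with account.lock:
        account.balance = account.balance + delta


def run_locked():
    """Run both updates in threads; the lock keeps them from overlapping."""
    account = Account(1000)
    threads = [
        threading.Thread(target=apply_change, args=(account, delta))
        for delta in (100, -500)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance


print(replay_lost_update())  # 500
print(run_locked())          # 600
```

With the lock, the order in which the two routines run no longer matters: either order ends at $600.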

ASSUMPTION THAT ONE EVENT OR TASK HAS FINISHED BEFORE ANOTHER BEGINS

The previous and the next sections provide examples of this type of problem.

ASSUMPTION THAT INPUT WON'T OCCUR DURING A BRIEF PROCESSING INTERVAL

You type a character. The editing program you're testing receives it, moves other displayed characters around on the screen so it can display this one at the cursor location, echoes the received character, then looks for your next input. Naturally, since the computer is faster than the finger, the program should get everything done and be ready for the next input long before you're ready to type it. Accordingly, the program doesn't allow for the possibility that other characters will arrive before it's done with this one. However, a fast typist might enter two, three, or more characters before the editor is ready for them. The editor catches the last one typed and misses the others, which were typed while it was in the middle of dealing with the first one.
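A small simulation makes the difference between holding one keystroke and queuing them concrete. Both classes below are hypothetical; the text does not say how the editor stores its input, so the single-slot design is an assumption chosen to reproduce the symptom.

```python
from collections import deque


class SingleSlotInput:
    """Keeps only the most recent keystroke, like the editor in the text."""

    def __init__(self):
        self.pending = None

    def key_pressed(self, ch):
        self.pending = ch  # overwrites any keystroke not yet read

    def take_all(self):
        keys = [] if self.pending is None else [self.pending]
        self.pending = None
        return keys


class QueuedInput:
    """Buffers every keystroke until the editor is ready for it."""

    def __init__(self):
        self.pending = deque()

    def key_pressed(self, ch):
        self.pending.append(ch)

    def take_all(self):
        keys = list(self.pending)
        self.pending.clear()
        return keys


def type_while_busy(source, typed):
    """Simulate a fast typist: every key arrives before the editor reads any."""
    for ch in typed:
        source.key_pressed(ch)
    return "".join(source.take_all())


print(type_while_busy(SingleSlotInput(), "cat"))  # t
print(type_while_busy(QueuedInput(), "cat"))      # cat
```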

ASSUMPTION THAT INTERRUPTS WON'T OCCUR DURING A BRIEF INTERVAL

The program is doing time-critical operations, such as:

  • writing bits to the right place on a spinning disk or a moving cassette tape
  • getting a pen to draw at the right place on a moving sheet of paper
  • responding to a message or acknowledging input within a short time period

The programmer realizes that these operations take very little time. Since it's so unlikely for an interrupt-triggering event to happen in this brief interval, why take the time to block interrupts during it? Usually all goes well, but every now and again the program will be interrupted.

Failure to block interrupts was raised earlier ("Program runs amok: Interrupts"). There the focus was on the problems of interrupts. Here the point is one of timing. Even if part of a program is brief, if it lasts long enough that an interrupt-triggering event can happen during this interval, then some day an interrupt-triggering event will happen during the interval.

RESOURCE RACES: THE RESOURCE HAS JUST BECOME UNAVAILABLE

Two processes both need the same printer. The one that takes control of the printer first gets it. The other has to wait. Even though there's a "race" here, this is not a race condition. Programs are, or should be, written to expect the printer (or other sharable resources) to be temporarily unavailable.

Suppose, though, that one process checks whether a printer is available. If the printer is busy, the program does something else. If the printer is available, the program starts to use it. Since the program has just confirmed that the printer is available, it doesn't consider the possibility that the printer will be unavailable by the time it tries to use it.

Unfortunately, there is a short window of vulnerability between the time that a process checks whether the printer is available and the time it takes the printer over. It takes a little time to examine the variable that says that the printer is free, call the right routine when the printer is available, find the data it's supposed to print, etc. During this short period, a second routine might take over the printer and start printing.

Some programmers would argue that this is a rare event. They're right. The window of vulnerability is so small that it is hard to set up a situation in which the second process can snatch away the printer just before the first one gets back to it. However, these processes may be run by customers thousands or millions of times. Even unlikely race conditions will occur in use. If the consequences of a race condition bug are severe enough, it must be fixed even if it will only happen once per million times that the program is used.
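The window between checking and taking can be shown without real hardware. In this sketch the `Printer` class and its method names are hypothetical. The first replay separates the check from the claim, as in the text. The second uses a non-blocking lock acquisition so that checking and claiming are one step, which is one possible way to close the window.

```python
import threading


class Printer:
    """Hypothetical shared printer with an ownership record."""

    def __init__(self):
        self._lock = threading.Lock()
        self.owner = None

    def is_free(self):
        return self.owner is None

    def take_unchecked(self, who):
        self.owner = who  # trusts an earlier is_free() answer

    def try_take(self, who):
        """Check and claim in one atomic step; return False if busy."""
        if not self._lock.acquire(blocking=False):
            return False
        self.owner = who
        return True

    def release(self):
        self.owner = None
        self._lock.release()


def replay_check_then_use():
    """Both processes check first, then both act on the stale answer."""
    printer = Printer()
    a_saw_free = printer.is_free()
    b_saw_free = printer.is_free()
    users = []
    if b_saw_free:
        printer.take_unchecked("B")
        users.append("B")
    if a_saw_free:  # stale: B took the printer after A checked
        printer.take_unchecked("A")
        users.append("A")
    return users


def replay_atomic_claim():
    """Same ordering, but each process claims the printer atomically."""
    printer = Printer()
    return [who for who in ("B", "A") if printer.try_take(who)]


print(replay_check_then_use())  # ['B', 'A']: both believe they own the printer
print(replay_atomic_claim())    # ['B']: A is told the printer is busy
```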

ASSUMPTION THAT A PERSON, DEVICE, OR PROCESS WILL RESPOND QUICKLY

For example, the program puts a message on the screen and waits for a response for a few seconds. If you don't respond during this time-out interval, the program decides that you aren't there and halts. Similarly, another program trying to initialize a printer will wait only so long. The program will report that the printer is unavailable if it doesn't respond by the end of the time-out interval. Programs also impose time-outs while waiting for messages from other processes.

Very short time-out intervals cause races. If you have to press a key within a few tenths of a second after a program displays its message, you will often lose the race, the program will decide that you aren't there, and stop. If it gives you a few seconds, you will usually win the race, but sometimes the time-out interval will end just as you notice and respond to the message. If the program gives you minutes to respond and you still haven't, it is probably safe for it to assume that you are not there or are not going to respond. The intervals are different for device or process responses, but the principle is the same. Some intervals are too short, some are just a little too short, and some are plenty long enough.

If the interval is too short, the programmer has probably anticipated that the program will time out before the person, device or process has had a chance to complete its response. Since this isn't an unusual case, he probably has good recovery code to deal with this. This isn't a classical race condition.

If the interval is just a little too short, the risks are higher. The programmer might believe that if the program doesn't receive a response within the specified period, it will never receive a response. What happens if the response arrives milliseconds after the time-out interval has ended? The program might interpret this as a response to some other message, or it might just crash. This is a classic race because it is unlikely, but not impossible, for the response to occur after the time-out period is over.
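One way to keep a late response from being mistaken for the answer to a later message is to tag every request with an identifier and ignore responses whose request is no longer pending. The sketch below assumes such a scheme; the `Requester` class and the message contents are invented for illustration.

```python
class Requester:
    """Hypothetical sender that matches responses to pending requests."""

    def __init__(self):
        self._next_id = 0
        self._waiting = {}   # request id -> description of the request
        self.handled = []    # (description, payload) pairs accepted
        self.discarded = []  # (request id, payload) pairs that arrived too late

    def send(self, description):
        self._next_id += 1
        self._waiting[self._next_id] = description
        return self._next_id

    def time_out(self, request_id):
        """Give up on a request; a later response to it will be discarded."""
        self._waiting.pop(request_id, None)

    def receive(self, request_id, payload):
        description = self._waiting.pop(request_id, None)
        if description is None:
            self.discarded.append((request_id, payload))
            return False
        self.handled.append((description, payload))
        return True


r = Requester()
first = r.send("initialize printer")
r.time_out(first)                    # the time-out interval ends
second = r.send("query status")
late_ok = r.receive(first, "ready")  # arrives just after the time-out
next_ok = r.receive(second, "idle")
print(late_ok, next_ok)              # False True
print(r.handled)                     # [('query status', 'idle')]
```

Without the identifier, the late "ready" would have looked like the reply to "query status".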

OPTIONS OUT OF SYNCH DURING A DISPLAY CHANGE

The computer displays a menu and waits for your response. Triggered by a time-out or by another event (a message or a device input), the program switches to another menu. You press a key just as the program is writing the new menu. Here are two possible errors:

  • Even though it's displaying the new options, the program will interpret your keypress as selecting a choice from the old menu if it hasn't yet updated its list of choices associated with keystrokes.
  • Even though it's displaying the old options, the program will interpret your keypress as selecting a choice from the new menu, because it updated its key-to-option list before displaying the new values onscreen.

This is a real-user problem. Experienced users of a program know when the menu will change. Many make their responses as soon as possible, so they will frequently press a key just as the screen is being repainted.
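Both errors above come from updating the displayed options and the key-to-option list at different moments. A minimal sketch of one remedy, under the assumption that the program can read both from a single object: bundle the labels and the key bindings together and switch menus by replacing one reference. The `Menu` and `Screen` names are hypothetical, and a real program would also have to tie the repaint itself to the same switch.

```python
from typing import NamedTuple


class Menu(NamedTuple):
    """Labels shown on screen and the actions bound to keys, kept together."""
    labels: tuple
    actions: dict


class Screen:
    def __init__(self, menu):
        self._current = menu

    def switch_to(self, menu):
        self._current = menu  # one assignment swaps labels and bindings together

    def press(self, key):
        menu = self._current  # take one snapshot, then use only that snapshot
        return menu.labels, menu.actions.get(key)


old_menu = Menu(("1 Save", "2 Quit"), {"1": "save", "2": "quit"})
new_menu = Menu(("1 Print", "2 Back"), {"1": "print", "2": "back"})

screen = Screen(old_menu)
before = screen.press("1")
screen.switch_to(new_menu)
after = screen.press("1")
print(before)  # (('1 Save', '2 Quit'), 'save')
print(after)   # (('1 Print', '2 Back'), 'print')
```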

TASK STARTS BEFORE ITS PREREQUISITES ARE MET

The program starts sending data to the printer just before the printer is ready, starts trying to fill memory with data just before it's assigned a memory area to work with, etc. Perhaps the program is supposed to wait until it receives a specific message from another process before starting the task. But based on other information (such as other messages), the programmer knows that the trigger message will come soon. He starts this task early, to improve performance. The prerequisite tasks are usually completed in time, but occasionally the program is just barely ahead of them.
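Rather than guessing that the prerequisite will be met in time, the task can wait for an explicit signal. The sketch below uses Python's `threading.Event` for that signal. The printer scenario and the function names are assumptions made for illustration only.

```python
import threading


def run_with_prerequisite():
    """Start the dependent task first and show that it still waits its turn."""
    printer_ready = threading.Event()
    log = []

    def send_job():
        printer_ready.wait()         # block until the prerequisite is signalled
        log.append("data sent")

    def warm_up_printer():
        log.append("printer ready")  # prerequisite work happens here
        printer_ready.set()          # only now may the sender proceed

    sender = threading.Thread(target=send_job)
    warmer = threading.Thread(target=warm_up_printer)
    sender.start()                   # deliberately started before the printer is ready
    warmer.start()
    sender.join()
    warmer.join()
    return log


print(run_with_prerequisite())  # ['printer ready', 'data sent']
```

The cost is the performance the programmer hoped to gain by starting early. The benefit is that the order no longer depends on timing.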

MESSAGES CROSS OR DON'T ARRIVE IN THE ORDER SENT

Suppose you have $1000 in a bank account, and you try to do three things, in order:

(a) withdraw $1000
(b) deposit $500
(c) withdraw $100

The first withdrawal goes through. The deposit is accepted, but when you try to withdraw the $100, you're told that your account's balance is zero, not $500. For some reason, your deposit has taken longer to process than your request for a withdrawal.

Problems of this class are common in message-passing systems: some messages are transmitted along circuitous routes, or their contents have to be verified, or for some other reason they don't arrive at their target process or aren't read by it before another message that was sent later. As a result, until the system catches up, you aren't (are) able to do something that you should (shouldn't) be able to do.

The most annoying version of this involves contradictory messages that cross each other's path. One process requests an action of another. The second process sends a message indicating that it can do that task (e.g., gives a receipt for the $500), but then sends a message saying that it can't (your balance is zero). The verification message that you deposited $500 reaches you and the central database at the same time, and just after your request for $100 reaches the database. To the database, it seemed that you asked for $100, then deposited $500, but because you received early verification, it seems to you that the database should have known about the $500 when it rejected the withdrawal.
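The three-transaction example can be replayed with the deposit arriving last. The sketch below assumes that each message carries a sequence number assigned by the sender, which the text does not mention. Under that assumption the receiver can hold back a message that arrives early until the ones sent before it have been applied. Both function names are hypothetical.

```python
def apply_on_arrival(balance, arrivals):
    """Process each (sequence number, amount) message as soon as it arrives."""
    rejected = []
    for seq, amount in arrivals:
        if balance + amount < 0:
            rejected.append(seq)      # not enough money at this moment
        else:
            balance += amount
    return balance, rejected


def apply_in_order(balance, arrivals):
    """Hold early messages until every lower sequence number has been applied."""
    held = {}
    next_seq = 1                      # assumes senders number messages from 1
    rejected = []
    for seq, amount in arrivals:
        held[seq] = amount
        while next_seq in held:
            pending = held.pop(next_seq)
            if balance + pending < 0:
                rejected.append(next_seq)
            else:
                balance += pending
            next_seq += 1
    return balance, rejected


# Sent as (a) withdraw $1000, (b) deposit $500, (c) withdraw $100,
# but the deposit arrives after the second withdrawal.
arrivals = [(1, -1000), (3, -100), (2, 500)]
print(apply_on_arrival(1000, arrivals))  # (500, [3])
print(apply_in_order(1000, arrivals))    # (400, [])
```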

