In computer science, the test-and-set instruction is an instruction used to write (set) a flag value to a memory location and return its old value as a single atomic (i.e., non-Interrupt) operation. The caller can then "test" the result to see if the state was changed by the call. If multiple processes may access the same memory location, and if a process is currently performing a test-and-set, no other process may begin another test-and-set until the first process's test-and-set is finished. A central processing unit (CPU) may use a test-and-set instruction offered by another electronic component, such as DPRAM; a CPU itself may also offer a test-and-set instruction.
A lock can be built using an atomic test-and-set instruction as follows:
This code assumes that the memory location was initialized to 0 at some point prior to the first test-and-set. The calling process obtains the lock if the old value was 0, otherwise the while-loop spins waiting to acquire the lock. This is called a spinlock. At any point, the holder of the lock can simply set the memory location back to 0 to release the lock for acquisition by another--this does not require any special handling as the holder "owns" this memory location. "Test and test-and-set" is another example.
Maurice Herlihy (1991) proved that test-and-set (1-bit comparand) has a finite consensus number and can solve the wait-free consensus problem for at-most two concurrent processes. In contrast, compare-and-swap (32-bit comparand) offers a more general solution to this problem, and in some implementations wider compare-and-swap (64- or 128-bit comparand) is also available for extended utility.
Whether or not CPU 2 was trying to access the memory location, the DPRAM performs the test given by CPU 1. If the test succeeds, the DPRAM sets the memory location to the value given by CPU 1. Then the DPRAM wipes out its "internal note" that CPU 1 was writing there. At this point, CPU 2 could issue a test-and-set, which would succeed.
Whether or not CPU 2 was trying to access the memory location, the DPRAM now performs CPU 1's test. If the test succeeds, the DPRAM sets memory location A to the value specified by CPU 1. If the test fails, the DPRAM copies the value back from the special register to memory location A. Either operation wipes out the special flag value. If CPU 2 now issues a test-and-set, it will succeed.
The test and set instruction, when used with Boolean values, uses logic like that shown in the following function, except that the function must execute atomically. That is, no other process must be able to interrupt the function mid-execution, thereby seeing a state that only exists while the function executes. That requires hardware support; it cannot be implemented as shown. Nevertheless, the code shown helps to explain the behaviour of test-and-set. NOTE: In this example, 'lock' is assumed to be passed by reference (or by name) but the assignment to 'initial' creates a new value (not just copying a reference).
'''function''' TestAndSet(boolean_ref lock) {
boolean initial = lock;
lock = true;
'''return''' initial;
}
Not only is the code shown not atomic, in the sense of the test-and-set instruction, it also differs from the descriptions of DPRAM hardware test-and-set above. Here, the value being set and the test are fixed and invariant, and the value is updated regardless of the outcome of the test, whereas for the DPRAM test-and-set, the memory is set only when the test succeeds, and the value to set and the test condition are specified by the CPU. Here, the value to set can only be 1, but if 0 and 1 are considered the only valid values for the memory location, and "value is nonzero" is the only allowed test, then this equates to the case described for DPRAM hardware (or, more specifically, the DPRAM case reduces to this under these constraints). From that viewpoint, this can, correctly, be called "test-and-set" in the full, conventional sense of that term. The essential point to note is the general intent and principle of test-and-set: a value is both tested and set in one atomic operation such that no other program thread or process can change the target memory location after it is tested but before it is set. (This is because the location must only be set if it currently has a certain value, not if it had that value sometime earlier.)
In the C programming language, the implementation would be like:
int test_and_set(int* lock_ptr) {
int old_value;
// -- Start of atomic segment --
// This should be interpreted as pseudocode for illustrative purposes only.
// Traditional compilation of this code will not guarantee atomicity, the
// use of shared memory (i.e., non-cached values), protection from compiler
// optimizations, or other required properties.
old_value = *lock_ptr;
*lock_ptr = LOCKED;
// -- End of atomic segment --
return old_value;
}
The code also shows that there are really two operations: an atomic read-modify-write and a test. Only the read-modify-write needs to be atomic. (This is true because delaying the value comparison by any amount of time will not change the result of the test once the value to test has been obtained. Once the code writes the initial value, the result of the test has been established, even if it has not been computed yet — e.g., by the == operator.)
void critical() {
// Spin lock: loop forever until we get the lock.
// We know the lock was successfully obtained after exiting
// this while loop because the test_and_set() function locks
// the lock but returns the _previous_ lock value.
// If (and only if) the previous lock value was 1 then the lock was
// **already** locked by another thread or process, so we stay in the
// loop and retry.
// When the previous lock value was 0, that indicates the
// lock was **not** locked before we locked it. It **is** locked now
// because _we_ locked it, so we own the lock and can exit the spinloop.
while (test_and_set(&lock) == 1);
critical section // only one process can be in this section at a time
// Release the lock when finished with the critical section.
// It was 1 (because we locked it).
// Any other process that "changed" it since then
// was not in the critical section, so the
// other processes also set it to 1, but
// did not obtain the lock or enter the
// critical section, and (either gave up or)
// are still waiting in their own spinloops.
lock = 0;
}
This spin lock function can be called by multiple processes, but it is guaranteed that only one process will be in the critical section at a time. The rest of the processes will keep spinning until they get the lock. It is possible that a process is never granted the lock. In such a case it will loop endlessly. This is a drawback of a spin lock implementation as it doesn't ensure fairness. These issues are further elaborated in the performance section.
tsl reg, flag ; Test and Set Lock; flag is the
; shared variable; it is copied
; into the register reg and flag
; then atomically set to 1.
cmp reg, #0 ; Was flag zero on entry_region?
jnz enter_region ; Jump to enter_region if
; reg is non-zero; i.e.,
; flag was non-zero on entry.
ret ; Exit; i.e., flag was zero on
; entry. If we get here, tsl
; will have set it non-zero; thus,
; we have claimed the resource
; associated with flag.
leave_region:
move flag, #0 ; store 0 in flag
ret ; return to caller
Here tsl is an atomic instruction and flag is the lock variable. The process doesn't return unless it acquires the lock.
Test-and-set scores low on two of them, namely, high bus traffic and unfairness.
When processor P1 has obtained a lock and processor P2 is also waiting for the lock, P2 will keep incurring bus transactions in attempts to acquire the lock. When a processor has obtained a lock, all other processors which also wish to obtain the same lock keep trying to obtain the lock by initiating bus transactions repeatedly until they get hold of the lock. This increases the bus traffic requirement of test-and-set significantly. This slows down all other traffic from cache and Cache coherence misses. It slows down the overall section, since the traffic is saturated by failed lock acquisition attempts. Test-and-test-and-set is an improvement over TSL since it does not initiate lock acquisition requests continuously.
When we consider fairness, we consider if a processor gets a fair chance of acquiring the lock when it is set free. In an extreme situation the processor might starve i.e. it might not be able to acquire the lock for an extended period of time even though it has become free during that time.
Storage overhead for TSL is next to nothing since only one lock is required. Uncontended latency is also low since only one atomic instruction and branch are needed.
|
|