Find liveness bugs effectively
Now that you learned how to write liveness specifications in Coyote, let’s see how to effectively test for liveness bugs.
The presence of monitors that have Hot
and Cold
states implicitly specifies two assertions.
Monitor states that are marked neither Hot
nor Cold
are called warm states. First, any
terminated execution of the program must not have a monitor in a hot state. Second, the program
should not have infinite executions that remain in hot (or warm) states infinitely without
transitioning to a cold state.
While the former is a safety property and easily checked, the latter requires generation of infinite executions, which is not really possible in practice and we must resort to heuristics. Coyote maintains a temperature for each monitor. The temperature goes up by a unit if the monitor transitions to a hot state, it goes to zero on a transition to a cold state and stays the same on transition to a warm state. The Coyote tester looks for executions where the temperature of a monitor exceeds a particular large threshold because it indicates a long suffix stuck in hot/warm states without transitioning to a cold state. The definition of what is large is where you can help the tester.
The tester accepts a flag --max-steps N
. Using this flag, you can say that the program is expected
to execute around N steps per test iteration. Executions substantially longer than N are treated as
potential infinite executions. But what is a step and how does one estimate N? This can be done
using a few iterations of the tester. For example, consider the failover coffee machine using
actors sample program. Assuming you have
built the samples you can test it with
the coyote tool as follows, setting N steps as 200.
From the samples directory:
coyote test ./Samples/bin/net8.0/CoffeeMachineActors.dll -i 10 -ms 200 -s portfolio
The coyote test
tool will produce output, ending with something like the following:
... Testing statistics:
..... Found 0 bugs.
... Exploration statistics:
..... Explored 40 schedules: 40 fair and 0 unfair.
..... Number of scheduling decisions in fair terminating schedules: 153 (min), 457 (avg), 1066 (max).
..... Exceeded the max-steps bound of '200' in 95.00% of the fair schedules.
... Elapsed 1.4882005 sec.
. Done
Note the line that reads Exceeded the max-steps bound of '200' in 95.00% of the fair schedules
. It
means that the program execution exceeded 200 steps several times (95% of the times) to reach
termination. The line above it indicates that execution lengths ranged from 153 steps to 1066 steps,
averaging 457 steps. Going by this output, let’s decide to increase the bound to 1000 and re-run
coyote test
.
coyote test ./Samples/bin/net8.0/CoffeeMachineActors.dll -i 10 -ms 1000 -s portfolio
This time the output will be something like:
... Testing statistics:
..... Found 0 bugs.
... Exploration statistics:
..... Explored 40 schedules: 40 fair and 0 unfair.
..... Number of scheduling decisions in fair terminating schedules: 88 (min), 657 (avg), 2411 (max).
..... Exceeded the max-steps bound of '1000' in 27.50% of the fair schedules.
... Elapsed 3.3885497 sec.
. Done
The testing is a little bit slower: taking a bit more than 3 seconds for the same number of iterations. But the tester hit the bound much fewer times, making the testing much more effective as it more often covers the entire length of program execution. In general, it is not necessary to make this percentage go to zero. Often times, programs can exceed their expected length of execution, either because of bugs, or because of corner-case scheduling that delays important events more than usual. You could pick a max-steps bound that is very large. However, this will slow down the tester (when using unfair schedulers; see the next section for details). Thus, it is recommended that you do a few iterations with the tester before settling down on the desired max-steps bound.
To understand the details behind fair and unfair scheduling that is mentioned in the output above, let’s move on to next section targeted towards more advanced usage of Coyote.
Fair and unfair scheduling
See the following technical paper that explains the concept behind fair and unfair scheduling:
Fair stateless model checking. Madan Musuvathi and Shaz Qadeer. PLDI 2008.
Consider a program with two actors A and B. The actor A continuously sends an event to itself until it receives a message from B. The actor B is ready to send the message to A immediately upon creation. (Contrast this example to Figure 3 of the paper.) This program has an infinite execution: where A is continuously scheduled without giving B a chance. Such an infinite execution is called unfair because B is starved over an infinitely long period of time, which is unrealistic in modern systems.
The coyote test
tool works by taking over the scheduling of the Coyote program. It uses one of
several schedulers: algorithms that decide which actor to schedule next. A scheduler is called
fair if it is not expected to generate unfair executions. For example, the random scheduler, which
makes decisions on the next actor to schedule randomly, is fair. In the program described above, it
is very likely that B will be given a chance to execute. Some schedulers don’t have this property
and are called unfair schedulers. Unfair schedulers have a role to play in finding violations of
safety properties, but not in finding violations of liveness properties. The prioritization
scheduler of Coyote (enabled with the --strategy prioritization
option) is unfair.
Because of their nature, unfair schedulers are expected to generate longer than usual executions.
The unfairness in scheduling can lead to starvation of certain actors, which may stall progress. The
expected length of a program’s execution is best determined by looking at lengths of “fair
terminating executions”, i.e., executions that terminate under a fair scheduler. For this reason, we
provide the fair-prioritization
scheduler (enabled with the --strategy fair-prioritization
option) which uses the prioritization
scheduler for a prefix of each execution and then switches
to the default fair random
scheduler for the remaining of the execution.
When a user supplies the flag --max-steps N
, executions under an unfair scheduler are forced to
stop after N steps. Whereas, an execution under a fair scheduler can go to up 10N steps. Further, if
the execution stays in a hot state for more than 5N steps, a liveness bug is flagged. Alternatively,
you can supply the flag --max-fair-steps M
to limit fair schedulers to explore only up to M steps
(instead of 10N). If you decide to use the --max-fair-steps M
flag, you can optionally set the
unfair steps using the --max-unfair-steps N
flag.
There are other smarter heuristics available in the tester as well that do away with the need for such bounds by looking for lasso shaped executions. If interested, read more about it in the following paper from Microsoft Research.
Lasso Detection using Partial-State Caching. FMCAD 2017.
To avoid having to think which scheduler works best for which situation, we recommend running
coyote test
using the portfolio scheduler (enabled with the --strategy portfolio
option) which
consists of a carefully tuned selection of fair schedulers (including random
, prioritization
and
fair-prioritization
).