Raft consensus protocol with mocks for testing
In the previous example you created an Azure application that uses Coyote and performs messaging using Azure Service Bus. This is a great way to build a reliable application or service. But there is overhead in using an enterprise scale service bus, which limits our ability to fully test the state machine.
Clearly a fault-tolerant server consensus protocol needs to be thoroughly tested, which is what you
will do in this tutorial. First you will
mock the Azure Service Bus which allows the Coyote
tester to perform thousands of tests per second and thereby find
bugs in the application code more efficiently.
Then you will use the
coyote test tool to explore the code using different test strategies until
you achieve a high level of confidence that the code is rock solid.
What you will need
You will also need to:
- Install Visual Studio 2019.
- Install the .NET 6.0 version of the coyote tool.
- Be familiar with the
coyotetool. See using Coyote.
- Clone the Coyote git repo.
Build the sample
You can build the sample by following the instructions here.
Run the Raft.Mocking application
Now you can run
coyote test tool on the Raft.Mocking application:
coyote test ./Samples/bin/net6.0/Raft.Mocking.dll -i 1000 -ms 200 --coverage activity
You should see the test succeed with output like this, including a coverage report and graph:
. Testing ./Samples/bin/net6.0/Raft.Mocking.dll Starting TestingProcessScheduler in process 34068 ... Created '1' testing task. ... Task 0 is using 'random' strategy (seed:1388735316). ..... Iteration #1 ..... Iteration #2 ..... Iteration #3 ..... ..... Iteration #900 ..... Iteration #1000 ... Emitting coverage reports: ..... Writing .\Samples\bin\net6.0\Output\Raft.Mocking.dll\CoyoteOutput\Raft.Mocking.dgml ..... Writing .\Samples\bin\net6.0\Output\Raft.Mocking.dll\CoyoteOutput\Raft.Mocking.coverage.txt ..... Writing .\Samples\bin\net6.0\Output\Raft.Mocking.dll\CoyoteOutput\Raft.Mocking.sci ... Testing statistics: ..... Found 0 bugs. ... Scheduling statistics: ..... Explored 1000 schedules: 0 fair and 1000 unfair. ..... Hit the max-steps bound of '200' in 100.00% of the unfair schedules. ... Elapsed 61.3283634 sec. . Done
Now you are seeing a longer more realistic test run. But if you create a
--verbose log you will
see that in these 61 seconds the test actually tested over 2.4 million async operations!!
In this case you should see
Total event coverage: 100.0% which is a great sign, this means every
possible event has been sent and received by every state of every state machine that you tested here.
This is not the same thing as 100% code coverage, but it is a higher level of abstraction on
coverage. Both are important.
The coverage report has a separate section for each
StateMachine that was covered by
the test. It then lists the overall event coverage as a percentage, then lists the details of each
State. For actors there will be only one state listed, matching the name of the
For example, the
Server section reports the
Candidate state like this:
State: Candidate State event coverage: 100.0% Events received: Microsoft.Coyote.Actors.Timers.TimerElapsedEvent, AppendLogEntriesRequestEvent, AppendLogEntriesResponseEvent, VoteRequestEvent, VoteResponseEvent Events sent: AppendLogEntriesResponseEvent, VoteRequestEvent, VoteResponseEvent Previous states: Candidate, Follower Next states: Candidate, Follower, Leader
This shows 100% coverage for the state, meaning all expected events have been received, it lists
Events received, and it also lists all
Events sent while in this state. Lastly it
shows all recorded state transitions in and out of this state.
Previous states are the states
that transitions to the
Candidate state and
Next states are the states that the
state went to using
RaiseGotoStateEVent transitions. The key thing to look for here is that
during this test run some of the
Server state machines did make it to the
Notice because of the mocking of Azure API’s this application is now able to run 200 steps per iteration and a thousand iterations pretty quickly, much faster than if all those messages were going to Azure and back. This means the test can quickly explore every kind of asynchronous timing of events to find all the bugs. Not only is it faster but it is also systematic in how it explores every possible interleaving of asynchronous operations. This systematic approach ensures the test doesn’t just test the same happy paths over and over (like a stress test does) but instead it is more likely to find one bad path where a bug is hiding.
--coverage report also generates a DGML diagram of all the
messages sent during the test. You can browse these graphs using Visual Studio. The file name in
this case is
Raft.Mocking.dgml and it will look something like this:
Here you see all the
Actor objects in green, and
Monitor in blue and
ExternalCode objects in gray. You can see that all the states were explored in the
Leader state. The
MockStateMachineTimer is a helper
Actor provided as a mock
There are many different
coyote test command line options you can play with to test different things
and really increase your confidence level in the code you are testing. For example there are 4 different
test scheduling options you can play with:
||Choose the random scheduling strategy (this is the default)|
||Choose the priority-based scheduling strategy with
||Choose the fair priority-based scheduling strategy with
||Choose the portfolio scheduling strategy|
These options change how
coyote test explores the large state space of possible schedules for your
async operations. The last option is interesting because it allows you to test many different
scheduling strategies at once:
coyote test ./Samples/bin/net6.0/Raft.Mocking.dll -i 1000 -ms 200 --coverage activity -s portfolio
When you use this the test will print the chosen strategies at the top of the test output:
... Task 3 is using 'fair-prioritization' strategy (seed:3922897588). ... Task 4 is using 'probabilistic' strategy (seed:3469760925). ... Task 2 is using 'probabilistic' strategy (seed:1642014516). ... Task 1 is using 'fair-prioritization' strategy (seed:1466235705). ... Task 0 is using 'random' strategy (seed:3931672516).
You can also increase the number of iterations to 10,000,000 if you want to, then come back tomorrow and see how it did. Clearly, you should now see that Coyote makes it possible to thoroughly test your code automatically, without you having to write a million individual boring unit tests that test every little possibility in terms of sending and receiving messages.
Manual unit testing has a place in software engineering, but what Coyote does is more powerful. With Coyote you can explore a very large number of combinations and find lots of bugs which greatly increases confidence in your code.
One interesting thing about Coyote is that every time you run a test you might explore a slightly
different space and so continuous integration testing where you run tests on every checkin, or
longer test suites every night, has a higher chance of finding that one ridiculously embarrassing
bug before you go live with your service. You might wonder, if there is randomness in the testing,
how will you reproduce bugs that are found? The answer lies in the random seed printed above
(seed:3469760925). This is all you need to re-run an identical test, so interesting random seeds
can also become your regression test suite. Imagine, an entire interesting regression test is
simply one integer. There is one restriction to this, if the product code you are testing changes
the set of non-deterministic choices, then prior saved random seeds are no longer usable, but you
can quickly build a new set by simply running more tests. Comparing this with the cost of
maintaining large static regression test suite code bases, this is quite an improvement.
Instead of building lots of little manual unit tests, you will find that when using Coyote your time
is better spent designing interesting mocks that accurately model all the kinds of weird things that
can happen when interacting with external systems. You also should write lots of interesting
specifications throughout these mocks that use
Assert to ensure everything is running properly
during the test. You also should use the
Monitor pattern to ensure all the global invariants you
care about in your system remain true no matter what a specific test run is doing.
The following diagram illustrates how the
MockClient actor sends
ClientRequestEvents, and how
MockClusterManager subclasses from
ClusterManager. There is also a
IServerManager interface and a
RaftTestScenario class which sets everything up.
Notice that the
Server code you are testing here is the exact same production ready code you used
in the Raft actor service (on Azure) tutorial. You should now see that this is very
cool. You have switched the
Server from running on Azure to running locally with a bunch of
mocks and didn’t have to change one line of
For this test we also inject a
SafetyMonitor into the process by simply registering it on the
runtime like this:
This enables the monitor so that when the
MockServerHost sends the
it can keep track and make sure there is only one leader per term.
MockClusterManager implementation is very simple, since at test time all
are in the same process, a broadcast operation is simply a for-loop over those servers, sending the
broadcast event to each one using
MockServerHost is a bit tricky, since all the
Server actors have to be created before you
start them using the
NotifyJoinedServiceEvent otherwise the
MockClusterManager might be too
quick and start sending events to a
Server instance before it is ready. Now you can see why the
IServerManager interface was designed this way in the original Raft.Azure example.
RaftTestScenario then creates all the actors needed to run the test including all the mock
actors and the real
Server instances. This is all relatively simple compared to the original
Raft.Azure example that was also dealing with setting up the Azure Service Bus.
This mock test setup is able to fully test the
Server implementation and get good coverage. The
IServerManager interface and the abstract
ClusterManager state machine where originally designed
with testability in mind and that’s what makes this easy mocking of the external pieces possible.
The test also includes a coyote
SafetyMonitor which provides a global invariant
check, namely checking there is never more than one
Server that is elected to be the
the same time. The
Monitor class in Coyote shows how to inject additional work that you want to do
at test time only, and have almost no overhead in the production code. Hopefully you agree it is
pretty easy to create a monitor and that monitoring like this is a powerful concept that adds
a lot of value to your Coyote testing process.
In this tutorial you learned:
How to mock external systems like Azure Service Bus to make a Coyote test run fast.
How to use the
coyote testcommand line to explore different test strategies.
How to read a Coyote coverage report and view the coverage graph.
How to inject test logic like the
SafetyMonitorin to the
coyote testscenario that monitors the overall correctness of your system in a way that has minimal overhead in your production code.
How to think about model based testing using random seeds and integrate that into your continuous integration testing process.
You can also explore the
Raft.Nondeterminism.dll version of this sample that injects a bug in the
system by randomly sending duplicate
VoteRequestEvents. Then you can see how the
tool is able to spot the resulting bugs because of the