TLA+ model of CCF’s Raft modifications#

CCF implements some modifications to Raft as it was originally proposed by Ongaro and Ousterhout. Specifically, CCF constrains that only appended entries that were signed by the primary can be committed; any entry that has not been signed is rolled back. Additionally, the CCF implementation introduces a variant of reconfiguration that differs from the one proposed in the original Raft paper. In CCF CFT, reconfigurations are done via a single transaction (as described here).
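To make the signature rule concrete, the following is a hypothetical example of a primary's log (the entry shape matches the records used later in this specification; the constant TypeEntry for ordinary client entries is an assumption for illustration). Only the prefix up to the last TypeSignature entry can become committed; the trailing unsigned entry would be rolled back on a change of primary.

\* Hypothetical log: two client entries, a signature over them, and one unsigned entry
exampleLog == << [term |-> 1, value |-> 1, contentType |-> TypeEntry],
                 [term |-> 1, value |-> 2, contentType |-> TypeEntry],
                 [term |-> 1, value |-> 2, contentType |-> TypeSignature],
                 [term |-> 1, value |-> 3, contentType |-> TypeEntry] >>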

The TLA+ specification models the intended behavior of Raft as it is modified for CCF. Below, we explain several core parts of the specification in more detail.

You can find the full specification in the tla/ folder and more information on TLA+ here. Several good resources exist online; one good example is this guide.

Building blocks of the TLA+ spec#

The core model is maintained in the tla/raft_spec/ccfraft.tla file. However, the constants defined in this file are controlled through the model checking file tla/raft_spec/MCraft.tla.

This file controls the constants as seen below. In addition to basic settings such as how many nodes are to be model checked and their initial configuration, the model allows additional limitations to be placed on the state space of the program.

\* Limit the terms that can be reached. Needs to be set to at least 3 to
\* evaluate all relevant states. If set to only 2, the candidate_quorum
\* constraint below is too restrictive.
TermLimit_mc == 4

\* Limit number of requests (new entries) that can be made
RequestLimit_mc == 2

\* Limit on number of request votes that can be sent to each other node
RequestVoteLimit_mc == 1

\* Limit number of reconfigurations
ReconfigurationLimit_mc == 1

\* Limit number of duplicate messages sent to the same server
MessagesLimit_mc == 1

\* Limit number of times a RetiredLeader server sends commit notifications
CommitNotificationLimit_mc == 1

\* We made several restrictions to the state space of Raft. Because of these
\* restrictions, deadlocks can occur at places that Raft would handle
\* gracefully in real-world deployments.
\* One example of this is if a quorum of nodes becomes Candidate but cannot
\* time out anymore since we constrained the terms. Then, an artificial
\* deadlock is reached. We avoid this with the limit below. If TermLimit is
\* set to any number >2, this is not an issue since breadth-first search makes
\* sure that a similar situation is already simulated at term==1, which results
\* in a term increase to 2.
MaxSimultaneousCandidates_mc == 1
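These *_mc operators are typically wired to the specification's constants through the model checker's configuration file. Below is a minimal sketch of what such a TLC configuration entry could look like; the file name and the exact set of constants are assumptions for illustration, not quoted from the repository.

\* Hypothetical excerpt of a TLC configuration file (e.g. MCraft.cfg)
CONSTANTS
    TermLimit            <- TermLimit_mc
    RequestLimit         <- RequestLimit_mc
    ReconfigurationLimit <- ReconfigurationLimit_mc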

Possible state transitions in the model#

During model checking, the model checker will exhaustively search through all possible state transitions. The snippet below defines the possible state transitions at each step. Each of these transitions has additional constraints that must be fulfilled for it to be an allowed step. For example, BecomeLeader is only a possible step if the selected node has gathered enough votes to do so.

\* Defines how the variables may transition.
Next == \/ \E i \in PossibleServer : Timeout(i)
        \/ \E i, j \in PossibleServer : RequestVote(i, j)
        \/ \E i \in PossibleServer : BecomeLeader(i)
        \/ \E i \in PossibleServer : ClientRequest(i)
        \/ \E i \in PossibleServer : SignCommittableMessages(i)
        \/ \E i \in PossibleServer : \E c \in SUBSET(PossibleServer) : ChangeConfiguration(i, c)
        \/ \E i,j \in PossibleServer : NotifyCommit(i,j)
        \/ \E i \in PossibleServer : AdvanceCommitIndex(i)
        \/ \E i,j \in PossibleServer : AppendEntries(i, j)
        \/ \E m \in messages : Receive(m)
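In TLA+, such a next-state relation is typically combined with the initial-state predicate (Init, shown in the next section) into the overall temporal specification. The line below is a sketch of that standard pattern; the name vars for the tuple of all variables is an assumption here, not quoted from the spec.

Spec == Init /\ [][Next]_vars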

Variables and their initial state#

The model uses multiple variables that are initialized as seen below. Most variables are TLA+ functions, which behave similarly to maps as known from Python or other programming languages. These variables then map each node to a given value, for example the state variable, which maps each node to either Follower, Leader, Retired, or Pending. In the initial state shown below, all node states are set according to the InitialConfig that is defined in MCraft.tla.

\* Define initial values for all variables
InitReconfigurationVars == /\ ReconfigurationCount = 0
                           /\ Configurations = [i \in PossibleServer |-> << << 0, InitialServer >> >> ]
InitMessagesVars == /\ messages = {}
                    /\ messagesSent = [i \in PossibleServer |-> [j \in PossibleServer |-> << >>] ]
                    /\ commitsNotified = [i \in PossibleServer |-> <<0,0>>] \* i.e., <<index, times of notification>>
InitServerVars == /\ currentTerm = [i \in PossibleServer |-> 1]
                  /\ state       = [i \in PossibleServer |-> InitialConfig[i]]
                  /\ votedFor    = [i \in PossibleServer |-> Nil]
InitCandidateVars == /\ votesSent = [i \in PossibleServer |-> FALSE ]
                     /\ votesGranted   = [i \in PossibleServer |-> {}]
                     /\ votesRequested = [i \in PossibleServer |-> [j \in PossibleServer |-> 0]]
\* The values nextIndex[i][i] and matchIndex[i][i] are never read, since the
\* leader does not send itself messages. It's still easier to include these
\* in the functions.
InitLeaderVars == /\ nextIndex  = [i \in PossibleServer |-> [j \in PossibleServer |-> 1]]
                  /\ matchIndex = [i \in PossibleServer |-> [j \in PossibleServer |-> 0]]
InitLogVars == /\ log          = [i \in PossibleServer |-> << >>]
               /\ commitIndex  = [i \in PossibleServer |-> 0]
               /\ clientRequests = 1
               /\ committedLog = << >>
               /\ committedLogDecrease = FALSE
Init == /\ InitReconfigurationVars
        /\ InitMessagesVars
        /\ InitServerVars
        /\ InitCandidateVars
        /\ InitLeaderVars
        /\ InitLogVars
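As a quick, generic illustration of how such function-valued variables are read and updated (this is a TLA+ sketch, not an excerpt from the specification): a function maps every element of its domain to a value, and EXCEPT yields a copy of the function with one mapping changed.

\* A function mapping each server to a term, analogous to a map/dictionary
exampleTerms == [i \in PossibleServer |-> 1]
\* Reading the value for a server s:   exampleTerms[s]
\* A copy with the value for server s incremented:
\*   [exampleTerms EXCEPT ![s] = exampleTerms[s] + 1]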

Basic functions and log changes#

Below, we briefly describe some basic functionality of the Raft model. Note that this is only a selection of the model; please refer to the full specification for a complete picture.

Timeout#

Since TLA+ does not model time, any node can time out at any moment as a next step. Because this may lead to an infinite state space, we limit the maximum term any node can reach. While this would be overly constraining in any actual program, the model checker will still explore all states that are reachable within these limits. Since interesting traces can already be generated with one or, better, two term changes, this approach is sufficient to model reconfigurations and check persistence.

\* Server i times out and starts a new election.
Timeout(i) == \* Limit the term of each server to reduce state space
              /\ currentTerm[i] < TermLimit
              \* Limit number of candidates in our relevant server set 
              \* (i.e., simulate that not more than a given limit of servers in each configuration times out)
              /\ Cardinality({ s \in GetServerSetForIndex(i, commitIndex[i]) : state[s] = Candidate}) < MaxSimultaneousCandidates
              \* Only servers that are not already leaders can become candidates  
              /\ state[i] \in {Follower, Candidate}
              /\ state' = [state EXCEPT ![i] = Candidate]
              /\ currentTerm' = [currentTerm EXCEPT ![i] = currentTerm[i] + 1]
              \* Most implementations would probably just set the local vote
              \* atomically, but messaging localhost for it is weaker.
              \*   CCF change: We do this atomically to reduce state space
              /\ votedFor' = [votedFor EXCEPT ![i] = i]
              /\ votesRequested' = [votesRequested EXCEPT ![i] = [j \in PossibleServer |-> 0]]
              /\ votesSent' = [votesSent EXCEPT ![i] = TRUE ]
              /\ votesGranted'   = [votesGranted EXCEPT ![i] = {i}]
              /\ UNCHANGED <<reconfigurationVars, messageVars, leaderVars, logVars>>

Signing of log entries#

In CCF, the leader periodically signs the latest log prefix. Only these signatures are committable in CCF. We model this via special TypeSignature log entries and ensure that the commitIndex can only be moved to these special entries.

\* CCF extension: Signed commits
\* Leader i signs the previous messages in its log to make them committable
\* This is done as a separate entry in the log that has a different
\* message contentType than messages entered by the client.
SignCommittableMessages(i) ==
    /\ LET 
        log_len == Len(log[i])
       IN
        \* Only applicable to Leaders with a log that contains at least one message
        /\ state[i] = Leader
        /\ log_len > 0
        \* Make sure the leader does not create two signatures in a row
        /\ log[i][log_len].contentType /= TypeSignature
        /\ LET
            \* Create a new entry in the log that has the contentType Signature and append it
            entry == [term  |-> currentTerm[i],
                      value |-> clientRequests-1,
               contentType  |-> TypeSignature]
            newLog == Append(log[i], entry)
            IN log' = [log EXCEPT ![i] = newLog]
        /\ UNCHANGED <<reconfigurationVars, messageVars, serverVars, candidateVars, clientRequests,
                    leaderVars, commitIndex, committedLog, committedLogDecrease>>
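To make the effect of these signature entries on commit concrete, here is a simplified, hypothetical guard showing how advancing the commit index can be restricted to signature entries. This is a sketch of the idea only; the actual AdvanceCommitIndex action in ccfraft.tla is more involved and is not reproduced here.

\* Hypothetical sketch: an index is only committable if its log entry is a signature
IsCommittableIndex(i, idx) ==
    /\ idx \in 1..Len(log[i])
    /\ log[i][idx].contentType = TypeSignature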

Reconfiguration steps#

The one-transaction reconfiguration is already described here. In the TLA+ model, a reconfiguration is initiated by the Leader, which appends an arbitrary new configuration to its own log. This also triggers a change in the Configurations variable, which keeps track of all running configurations.

This Configurations variable is subsequently used to calculate quorums and to determine which nodes messages should be sent to or accepted from.

\* CCF extension: Reconfiguration of servers
\* Leader can propose a change in the current configuration.
\* This will switch the current set of servers to the proposed set, ONCE BOTH
\* sets of servers have committed this message (in the adjusted configuration
\* this means waiting for the signature to be committed)
ChangeConfiguration(i, newConfiguration) == 
    \* Limit reconfigurations
    /\ ReconfigurationCount < ReconfigurationLimit
    \* Only leader can propose changes
    /\ state[i] = Leader
    \* Configuration is non empty
    /\ newConfiguration /= {}
    \* Configuration is a subset of the possible servers
    /\ newConfiguration \subseteq PossibleServer
    \* Configuration is not equal to current configuration
    /\ newConfiguration /= Configurations[i][1][2]
    \* Keep track of running reconfigurations to limit state space
    /\ ReconfigurationCount' = ReconfigurationCount + 1
    /\ LET
           entry == [term |-> currentTerm[i],
                    value |-> newConfiguration,
                    contentType |-> TypeReconfiguration]
           newLog == Append(log[i], entry)
           \* Note: The new configuration gets the index of its own entry, i.e.,
           \* configurations are valid immediately at their own index
           newConf == Append(Configurations[i], << Len(log[i]) + 1, newConfiguration >>)
           IN 
           /\ log' = [log EXCEPT ![i] = newLog]
           /\ Configurations' = [Configurations EXCEPT ![i] = newConf]
    /\ UNCHANGED <<messageVars, serverVars, candidateVars, clientRequests,
                    leaderVars, commitIndex, committedLog, committedLogDecrease>>
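As noted above, the Configurations variable drives quorum calculations. The following is a minimal, hypothetical sketch of how a majority quorum over the most recent configuration tracked for node i could be expressed; the operator names are illustrative and the actual quorum logic of the specification is not reproduced here.

\* Hypothetical sketch: the newest configuration tracked for node i ...
CurrentConfigurationOf(i) == Configurations[i][Len(Configurations[i])][2]
\* ... and a majority quorum of votes within that configuration
IsQuorumOf(i, votes) ==
    Cardinality(votes \cap CurrentConfigurationOf(i)) * 2 > Cardinality(CurrentConfigurationOf(i))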