Data Persistence
Durability
Persistence to disk is handled by host-side code outside the enclave. This code is not attested and, in the CCF threat model, may be controlled by an attacker.
As a result, CCF cannot make a formal guarantee about data persistence. Durability relies on the operator maintaining enough healthy nodes and making regular backups of the ledger. To minimise the risk of data loss on node failure, the CCF host component issues an fflush() call on every transaction as soon as it becomes committable (i.e. once it is followed by a signature).
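The flushing rule can be modelled with a short Python sketch. This is an illustrative model, not CCF's host code: the Entry and LedgerWriter classes are hypothetical, and the model simply flushes buffered writes whenever a signature entry is appended, because that is the point at which every preceding entry becomes committable.

```python
import io
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical ledger entry: a sequence number and a serialised payload."""
    seqno: int
    payload: bytes
    is_signature: bool = False


class LedgerWriter:
    """Illustrative model of a host-side ledger writer (not CCF's code)."""

    def __init__(self, f: io.BufferedIOBase) -> None:
        self._f = f
        self.flushed_up_to = 0  # highest seqno handed to the OS so far

    def append(self, entry: Entry) -> None:
        # Ordinary entries stay in the user-space write buffer.
        self._f.write(entry.payload)
        if entry.is_signature:
            # A signature makes every entry before it committable, so flush
            # the buffer (the Python analogue of fflush()).
            self._f.flush()
            self.flushed_up_to = entry.seqno


if __name__ == "__main__":
    writer = LedgerWriter(io.BytesIO())
    writer.append(Entry(1, b"tx1"))
    writer.append(Entry(2, b"tx2"))
    writer.append(Entry(3, b"sig", is_signature=True))
    print(writer.flushed_up_to)
```

Note that fflush() (like Python's flush()) only hands buffered data to the operating system; forcing it onto the physical disk would additionally require a call such as os.fsync().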
The operator can further minimise the risk of data loss by running their CCF-based service on a larger number of nodes.
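To see why more nodes help, assume (as in CCF's crash-fault-tolerant consensus) that a transaction commits only once a majority of nodes have it. A service of n nodes then keeps a majority, and so can keep committing, as long as no more than (n - 1) // 2 nodes fail. The helper name below is illustrative.

```python
def tolerated_failures(n_nodes: int) -> int:
    """Nodes that may fail while a strict majority of n_nodes survives."""
    if n_nodes < 1:
        raise ValueError("a service needs at least one node")
    return (n_nodes - 1) // 2


for n in (1, 3, 5, 7):
    print(f"{n} nodes: tolerates {tolerated_failures(n)} failure(s)")
```

Under this assumption, adding a fourth node to a three-node service does not raise the number of tolerated failures; going to five nodes does.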
Directories
When a new node joins the network, or when a failed node needs to be recovered, the latest committed snapshot file can be copied to the node before it is started. The node will then automatically resume from that snapshot (see Join or Recover From Snapshot).
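Picking the latest committed snapshot and copying it into a node's snapshot directory can be scripted. The sketch below assumes committed snapshot files are named snapshot_<seqno>_<evidence_seqno>.committed; check the file names your CCF version produces before relying on this pattern. The function names and directories are hypothetical.

```python
import re
import shutil
from pathlib import Path
from typing import Optional

# Assumed naming scheme: snapshot_<seqno>_<evidence_seqno>.committed
SNAPSHOT_RE = re.compile(r"^snapshot_(\d+)_(\d+)\.committed$")


def latest_committed_snapshot(directory: Path) -> Optional[Path]:
    """Return the committed snapshot with the highest seqno, or None."""
    best: Optional[Path] = None
    best_seqno = -1
    for path in directory.iterdir():
        match = SNAPSHOT_RE.match(path.name)
        # Uncommitted snapshots (no .committed suffix) never match.
        if match and int(match.group(1)) > best_seqno:
            best, best_seqno = path, int(match.group(1))
    return best


def seed_snapshot(source_dir: Path, node_snapshot_dir: Path) -> Optional[Path]:
    """Copy the latest committed snapshot into a (not yet started) node's
    snapshot directory; return the copied path, or None if none was found."""
    snapshot = latest_committed_snapshot(source_dir)
    if snapshot is None:
        return None
    node_snapshot_dir.mkdir(parents=True, exist_ok=True)
    target = node_snapshot_dir / snapshot.name
    shutil.copy2(snapshot, target)
    return target
```

Sequence numbers are compared as integers because a plain string sort would rank snapshot_900_... above snapshot_2000_....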
In some cases, the new or recovered node may also need access to all .committed ledger files, for example if it needs to serve historical queries. It is therefore advisable to back up all the .committed ledger and snapshot files, which are complete and no longer modified, and so are safe to copy. It is recommended to have two separate directories on each node: a read-write directory where all the ledger and snapshot files reside, and a shared read-only directory where only the .committed ledger and snapshot files reside.
The read-only directory could be a shared mounted directory that is accessible to all the nodes in the network. The shared read-only ledger and snapshot directories can be specified via the ledger.read_only_directories and snapshots.read_only_directory configuration options respectively.
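As a sketch, the node configuration below sets both the read-write and the shared read-only directories. Only the option names ledger.directory, ledger.read_only_directories, snapshots.directory and snapshots.read_only_directory come from this page; the paths are hypothetical, the surrounding JSON layout is assumed, and the list-versus-string types are inferred from the plural and singular option names. It is built as a Python dict and printed as JSON.

```python
import json

# Hypothetical paths. Option names are from this page; the surrounding
# layout of the JSON configuration file is an assumption.
node_config = {
    "ledger": {
        "directory": "/var/ccf/node0/ledger",             # read-write
        "read_only_directories": ["/mnt/shared/ledger"],  # shared, .committed only
    },
    "snapshots": {
        "directory": "/var/ccf/node0/snapshots",          # read-write
        "read_only_directory": "/mnt/shared/snapshots",   # shared, .committed only
    },
}

print(json.dumps(node_config, indent=2))
```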
It is recommended to keep the most up-to-date copies of the .committed ledger and snapshot files (see Best Practices) in the read-only directory. Operators must take care to avoid race conditions in the copy process, for example a node reading a file that has only been partially copied.
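One way to prevent readers from seeing a partially copied file is to write each file under a temporary name in the destination directory and then rename it into place. The sketch below (publish_committed_files is a hypothetical helper) does this for .committed files only. It relies on os.replace() being atomic within a single POSIX file system, and it assumes nodes ignore the short-lived .tmp-* files that appear in the shared directory during a copy.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import List


def publish_committed_files(src_dir: Path, shared_dir: Path) -> List[str]:
    """Copy .committed files from a node's read-write directory into a shared
    read-only directory so that readers never observe a partial file."""
    shared_dir.mkdir(parents=True, exist_ok=True)
    published: List[str] = []
    for path in sorted(src_dir.iterdir()):
        target = shared_dir / path.name
        if not path.is_file() or ".committed" not in path.name or target.exists():
            continue  # skip directories, uncommitted files, already-copied files
        # Stage the copy under a temporary name in the destination directory,
        # so the final rename stays on one file system.
        fd, tmp_name = tempfile.mkstemp(dir=shared_dir, prefix=".tmp-")
        try:
            with os.fdopen(fd, "wb") as out, open(path, "rb") as src:
                shutil.copyfileobj(src, out)
                out.flush()
                os.fsync(out.fileno())  # data on disk before it becomes visible
            shutil.copymode(path, tmp_name)  # mkstemp creates files as 0600
            os.replace(tmp_name, target)  # atomic rename on POSIX file systems
        except BaseException:
            if os.path.exists(tmp_name):
                os.unlink(tmp_name)
            raise
        published.append(path.name)
    return published
```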
Best Practices
It is recommended that operators back up the ledger and snapshot files as soon as they become committed (i.e. when .committed is included in the file name). While a majority of nodes will eventually hold an identical copy of the ledger, the ledger files on the current primary node should be the most up to date. Snapshot files are only generated by the current primary node. As such, monitoring the directories specified by ledger.directory and snapshots.directory on the current primary node allows operators to retrieve the latest ledger and snapshot files.
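A periodic backup job can copy only the committed files it has not copied before. In this sketch, backup_new_committed is a hypothetical helper, the source directories would be the primary's ledger.directory and snapshots.directory, and files are matched by their .committed suffix. The backup destination is assumed not to be read by running nodes, so a plain copy is used.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def backup_new_committed(source_dirs: Iterable[Path], backup_dir: Path) -> List[str]:
    """Copy committed files that are not yet present in backup_dir."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied: List[str] = []
    for directory in source_dirs:
        # Only committed files are selected; in-progress files are skipped.
        for path in sorted(directory.glob("*.committed")):
            target = backup_dir / path.name
            if path.is_file() and not target.exists():
                shutil.copy2(path, target)  # also preserves timestamps
                copied.append(path.name)
    return copied
```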
Note
It is the responsibility of the operator to move/copy these files safely to avoid “ledger holes”, i.e. historical ledger files not being available to a new node that started from a recent snapshot.
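Before starting a node from a snapshot, an operator can check that the available committed ledger files cover every sequence number up to that snapshot. The sketch assumes committed ledger files are named ledger_<first>-<last>.committed with sequence numbers starting at 1; find_ledger_holes is hypothetical and returns each missing range.

```python
import re
from typing import Iterable, List, Tuple

# Assumed naming scheme for committed ledger files: ledger_<first>-<last>.committed
LEDGER_RE = re.compile(r"^ledger_(\d+)-(\d+)\.committed$")


def find_ledger_holes(file_names: Iterable[str], up_to: int) -> List[Tuple[int, int]]:
    """Return the (first, last) seqno ranges in [1, up_to] that are not
    covered by any committed ledger file."""
    ranges = sorted(
        (int(m.group(1)), int(m.group(2)))
        for m in map(LEDGER_RE.match, file_names)
        if m  # ignore uncommitted ledger files and snapshots
    )
    holes: List[Tuple[int, int]] = []
    next_expected = 1
    for first, last in ranges:
        if first > next_expected:
            holes.append((next_expected, min(first - 1, up_to)))
        next_expected = max(next_expected, last + 1)
        if next_expected > up_to:
            break
    if next_expected <= up_to:
        holes.append((next_expected, up_to))
    return holes


names = ["ledger_1-10.committed", "ledger_11-25.committed", "ledger_30-40.committed"]
print(find_ledger_holes(names, up_to=40))  # [(26, 29)]
```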
A low value for ledger.chunk_size means that smaller ledger files are generated, which can thus be backed up by operators more regularly, at the cost of having to manage a larger number of ledger files.
Similarly, a low value for snapshots.tx_count means that snapshots are generated more often and that join/recovery time will be shorter, at the cost of additional workload on the primary node for snapshot generation.
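Both trade-offs can be estimated with back-of-the-envelope arithmetic. The helpers and the workload below are hypothetical, and the results are approximations: real ledger files can be somewhat larger than chunk_size if, as assumed here, a file is only closed after the size threshold has been crossed.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def approx_ledger_files(ledger_bytes: int, chunk_size_bytes: int) -> int:
    """Approximate number of ledger files for a ledger of ledger_bytes."""
    return -(-ledger_bytes // chunk_size_bytes)  # ceiling division on ints


def approx_snapshots(total_txs: int, tx_count: int) -> int:
    """Approximate number of snapshots generated over total_txs transactions."""
    return total_txs // tx_count


# Hypothetical workload: a 10 GiB ledger holding 1,000,000 transactions.
for chunk_mib in (5, 100):
    files = approx_ledger_files(10 * GIB, chunk_mib * MIB)
    print(f"chunk_size={chunk_mib}MiB -> ~{files} ledger files")
for tx_count in (10_000, 100_000):
    print(f"tx_count={tx_count} -> ~{approx_snapshots(1_000_000, tx_count)} snapshots")
```

A smaller tx_count also means the latest snapshot lags the end of the ledger by roughly at most tx_count transactions, so a node starting from it has fewer transactions to catch up on.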
Note
Uncommitted ledger files (which are likely to contain committed transactions) should also be used on recovery, as long as they are copied into the node's ledger.directory.
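A sketch of staging uncommitted ledger files before recovery: every ledger file without the .committed suffix is copied into the node's ledger.directory. The function name is hypothetical, ledger files are assumed to start with a ledger_ prefix, and committed files are skipped on the assumption that they are already available, for example via ledger.read_only_directories.

```python
import shutil
from pathlib import Path
from typing import List


def stage_uncommitted_for_recovery(backup_dir: Path, ledger_dir: Path) -> List[str]:
    """Copy ledger files that lack the .committed suffix into ledger_dir
    (the recovering node's read-write ledger.directory)."""
    ledger_dir.mkdir(parents=True, exist_ok=True)
    staged: List[str] = []
    for path in sorted(backup_dir.glob("ledger_*")):
        if path.is_file() and not path.name.endswith(".committed"):
            shutil.copy2(path, ledger_dir / path.name)
            staged.append(path.name)
    return staged
```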