When a new node joins the network or a failed node needs to be recovered, the latest committed snapshot file can be copied to the node before it is started. The node will then automatically resume from that snapshot file (see Join or Recover From Snapshot).
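As a rough illustration of picking the snapshot to copy, the sketch below selects the committed snapshot with the highest sequence number in a directory. It assumes snapshot files are named `snapshot_<seqno>_<evidence_seqno>.committed`; that pattern and the function name `latest_committed_snapshot` are assumptions made for this example and should be checked against the files your nodes actually produce.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming pattern: snapshot_<seqno>_<evidence_seqno>.committed
SNAPSHOT_RE = re.compile(r"^snapshot_(\d+)_(\d+)\.committed$")


def latest_committed_snapshot(directory: Path) -> Optional[Path]:
    """Return the committed snapshot with the highest seqno, or None."""
    best: Optional[Path] = None
    best_seqno = -1
    for path in directory.iterdir():
        match = SNAPSHOT_RE.match(path.name)
        if match is None:
            continue  # uncommitted snapshot or unrelated file
        seqno = int(match.group(1))
        if seqno > best_seqno:
            best_seqno = seqno
            best = path
    return best
```

The sequence numbers are compared as integers rather than as strings, so that `snapshot_100_…` sorts after `snapshot_20_…`.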
The new/recovered node may also need access to all .committed ledger files in some cases, for example if the node needs to serve historical queries. It is therefore safe to back up all the .committed ledger and snapshot files. It is recommended to have two separate directories on each node: a read-write directory where all the ledger and snapshot files reside, and a shared read-only directory where only the .committed ledger and snapshot files reside.
The read-only directory could be a shared mounted directory which is accessible to all the nodes in the network. The shared read-only ledger and snapshot directories can be specified via the
snapshot.read_only_directory configuration options respectively.
It is recommended to keep the most up-to-date copies of the .committed ledger and snapshot files (see Best Practices) in the read-only directory. Operators must take care to avoid any race conditions in the copy process.
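One way to reduce the risk of a reader observing a half-copied file is to copy each file under a temporary name and rename it into place only once the copy is complete. The sketch below is a minimal example of that pattern, not a prescribed procedure: the function name `publish_committed_files` and the `.tmp` suffix are hypothetical, and it assumes that committed files are no longer modified and that the rename happens within a single filesystem.

```python
import os
import shutil
from pathlib import Path
from typing import List


def publish_committed_files(src_dir: Path, dst_dir: Path) -> List[str]:
    """Copy new .committed files from src_dir to dst_dir, one rename each."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied: List[str] = []
    for src in sorted(src_dir.iterdir()):
        if not src.is_file() or not src.name.endswith(".committed"):
            continue  # skip uncommitted files and sub-directories
        dst = dst_dir / src.name
        if dst.exists():
            continue  # already published (assumes committed files never change)
        # Copy under a name that does not end in ".committed", so readers
        # filtering on that suffix never pick up a partial file.
        tmp = dst_dir / (src.name + ".tmp")
        shutil.copy2(src, tmp)
        # Rename into place; atomic on POSIX when tmp and dst share a filesystem.
        os.replace(tmp, dst)
        copied.append(src.name)
    return copied
```

Calling the function repeatedly (for example from a periodic job) only copies files that are not yet present in the destination directory.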
It is recommended that operators back up the ledger and snapshot files as soon as they become committed (i.e. once .committed is included in the file name). While a majority of nodes will eventually have an identical copy of the ledger, the ledger files should be most up-to-date on the current primary node. Snapshot files are only generated by the current primary node. As such, monitoring the directories specified by
snapshots.directory for the current primary node allows operators to retrieve the latest ledger and snapshot files.
It is the responsibility of the operator to move/copy these files safely to avoid “ledger holes”, i.e. historical ledger files not being available to a new node that started from a recent snapshot.
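A backup can be checked for such holes before it is relied upon. The sketch below assumes committed ledger files are named `ledger_<first_seqno>-<last_seqno>.committed`; that pattern and the function name `find_ledger_holes` are assumptions made for illustration. It reports every range of sequence numbers that falls between two backed-up files without being covered by any file.

```python
import re
from typing import Iterable, List, Tuple

# Assumed naming pattern: ledger_<first_seqno>-<last_seqno>.committed
LEDGER_RE = re.compile(r"^ledger_(\d+)-(\d+)\.committed$")


def find_ledger_holes(file_names: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (first_missing, last_missing) seqno ranges between ledger files."""
    ranges = sorted(
        (int(m.group(1)), int(m.group(2)))
        for m in map(LEDGER_RE.match, file_names)
        if m is not None
    )
    holes: List[Tuple[int, int]] = []
    expected = None  # next seqno that should be covered
    for first, last in ranges:
        if expected is not None and first > expected:
            holes.append((expected, first - 1))
        expected = last + 1 if expected is None else max(expected, last + 1)
    return holes
```

For example, a directory holding `ledger_1-10.committed`, `ledger_11-15.committed` and `ledger_21-30.committed` yields `[(16, 20)]`. The check only covers gaps between files that are present; it does not know where the ledger is supposed to start or end.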
A low value for
ledger.chunk_size means that smaller ledger files are generated and can thus be backed up by operators more regularly, at the cost of having to manage a large number of ledger files.
Similarly, a low value for
snapshots.tx_count means that snapshots are generated often and that join/recovery time will be short, at the cost of additional workload on the primary node for snapshot generation.
Uncommitted ledger files (which are likely to contain committed transactions) should also be used on recovery, as long as they are copied to the node’s