
Garnet Cluster Replication

Garnet cluster supports asynchronous replication following a simple leader-follower (i.e. primary-replica) model. This allows replicas to be exact copies of the primary. Replication can be leveraged to scale reads and to mitigate the effects of primary failures by promoting replicas to primary. In this page, we present an overview of the replication protocol and discuss the different persistence/consistency options available to the cluster operator.

Garnet Replication Properties

Garnet cluster implements replication through log shipping. Primaries utilize the Append-Only-File (AOF) to record insert/update operations. A dedicated background task iterates through the log pages and transmits them in bulk to the corresponding replicas. Replicas iterate through the received log pages, replay all insert/update operations in order, and generate their own AOF.

Primaries are free to take arbitrary checkpoints, and the cluster operator can issue a checkpoint to any primary. The act of taking a checkpoint is propagated through the AOF log to the corresponding replicas. As a side effect, a given replica will also take and maintain its own checkpoints. For this reason, an exact version of the data is identified by the following triplet:

[replication-id, checkpoint-version, replication-offset]

Allowing replicas to take their own checkpoints speeds up the recovery process because in the common case replicas can recover from their own checkpoint and replay their AOF before connecting to their primary and resuming synchronization. On rare occasions, the replica must perform a full synchronization by retrieving the latest checkpoint from the primary. Partial synchronization involves recovering from a local checkpoint (if any) and replaying only the primary's AOF log after the corresponding Checkpoint Covered Replication Offset (CCRO). Full synchronization involves retrieving the latest primary checkpoint and recovering from it before starting to accept any new AOF log records.

The replica decides which type of synchronization to perform as follows (see the sketch after this list):

  1. If the primary replication-id differs from the replica replication-id, a full synchronization must occur.
  2. If the primary checkpoint version differs from the replica checkpoint version, a full synchronization must occur.
  3. If neither of the above conditions holds, the replica can recover from its local checkpoint and then replay only the AOF records it is missing.
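The decision can be sketched in a few lines of code. The snippet below is purely illustrative; the names (SyncState, choose_sync, ccro) are hypothetical and mirror the triplet above rather than any Garnet API.

from dataclasses import dataclass

# Hypothetical model of the triplet [replication-id, checkpoint-version, replication-offset].
@dataclass
class SyncState:
    replication_id: str      # identifies the replication history of a primary
    checkpoint_version: int  # version of the latest checkpoint
    ccro: int                # Checkpoint Covered Replication Offset

def choose_sync(primary: SyncState, replica: SyncState) -> str:
    # 1. Different replication histories: the replica's local data cannot be reused.
    if primary.replication_id != replica.replication_id:
        return "full"
    # 2. Checkpoint versions diverged: the local checkpoint cannot be reused.
    if primary.checkpoint_version != replica.checkpoint_version:
        return "full"
    # 3. Otherwise recover from the local checkpoint and replay only the
    #    AOF records after the replica's CCRO.
    return "partial"

print(choose_sync(SyncState("id-1", 3, 1024), SyncState("id-1", 3, 512)))  # partial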

Replication Performance vs Durability Options

Under normal workload the AOF log keeps growing with every insert/update operation until a checkpoint is taken. Newly added AOF log records initially reside in memory and are then flushed to disk as new records are added. For write-heavy workloads, this could severely impact query performance at the primary. In addition, newly configured replicas added to the cluster could face longer synchronization times, since they have to replay the AOF log records starting from the corresponding CCRO.

The cluster operator can choose between various replication options to achieve a trade-off between performance and durability. A summary of these options is shown below:

  • Main Memory Replication (MMR) This option forces the primary to aggressively truncate the AOF so it does not spill to disk. It can be used in combination with the aof-memory option, which determines the maximum AOF memory buffer size. When a replica attaches to a primary with MMR turned on, the portion of the AOF the replica needs may have already been truncated, which can result in lost writes. To overcome this issue, MMR should be used with ODC.
  • On Demand Checkpoint (ODC) This option forces the primary to take a checkpoint if no checkpoint is available when a replica tries to attach and recover. If a checkpoint is or becomes available and the CCRO has not been truncated, the primary will lock it to prevent truncation while the replica is recovering. In this case, the AOF log could spill to disk as the in-memory AOF buffer fills up. Under normal operation, the AOF is truncated and no spilling to disk occurs.

Using the aforementioned options, we guarantee that no writes are lost while a replica is attaching, and replication remains highly performant. However, due to the asynchronous nature of replication, if a primary goes down unexpectedly, the replicas might not have received the latest AOF log records, resulting in lost writes.

Creating a Garnet Cluster with Replication

The process of setting up a cluster with replication is similar to setting up a regular cluster, with the addition of enabling the aof option. This should be done for every instance in the cluster.

GarnetServer.exe --cluster --checkpointdir='data/' --aof

The AOF log file is stored within the checkpointdir folder, which by default is the working folder. By default, every Garnet instance will take the role of an empty primary. The cluster operator has to initially assign the slots to a subset of those primaries. The rest of the instances can be converted to replicas by using the CLUSTER REPLICATE command. An example of setting up a cluster with a single primary and 2 replicas is shown below:

PS C:\Dev> redis-cli -h 192.168.1.26 -p 7000 cluster myid
"15373cf386c5090a5f8a25f819c96b04a756381c"
PS C:\Dev> redis-cli -h 192.168.1.26 -p 7000 cluster addslotsrange 0 16383
OK
PS C:\Dev> redis-cli -h 192.168.1.26 -p 7000 cluster set-config-epoch 1
OK
PS C:\Dev> redis-cli -h 192.168.1.26 -p 7001 cluster set-config-epoch 2
OK
PS C:\Dev> redis-cli -h 192.168.1.26 -p 7002 cluster set-config-epoch 3
OK
PS C:\Dev> redis-cli -h 192.168.1.26 -p 7000 cluster meet 192.168.1.26 7001
OK
PS C:\Dev> redis-cli -h 192.168.1.26 -p 7000 cluster meet 192.168.1.26 7002
OK
PS C:\Dev> redis-cli -h 192.168.1.26 -p 7001 cluster replicate 15373cf386c5090a5f8a25f819c96b04a756381c
OK
PS C:\Dev> redis-cli -h 192.168.1.26 -p 7002 cluster replicate 15373cf386c5090a5f8a25f819c96b04a756381c
OK
PS C:\Dev> redis-cli -h 192.168.1.26 -p 7000 cluster nodes
15373cf386c5090a5f8a25f819c96b04a756381c 192.168.1.26:7000@17000,DESKTOP-BT8KHK1 myself,master - 0 0 1 connected 0-16383
5868649e4205bf6ea97e4b7a12f101c3aed96b71 192.168.1.26:7001@17001,DESKTOP-BT8KHK1 slave 15373cf386c5090a5f8a25f819c96b04a756381c 0 0 4 connected
947692ec7fb4a919ae7d0d7e314e9dcbcbd99774 192.168.1.26:7002@17002,DESKTOP-BT8KHK1 slave 15373cf386c5090a5f8a25f819c96b04a756381c 0 0 5 connected
PS C:\Dev> redis-cli -h 192.168.1.26 -p 7001 cluster nodes
5868649e4205bf6ea97e4b7a12f101c3aed96b71 192.168.1.26:7001@17001,DESKTOP-BT8KHK1 myself,slave 15373cf386c5090a5f8a25f819c96b04a756381c 0 0 4 connected
15373cf386c5090a5f8a25f819c96b04a756381c 192.168.1.26:7000@17000,DESKTOP-BT8KHK1 master - 0 0 1 connected 0-16383
947692ec7fb4a919ae7d0d7e314e9dcbcbd99774 192.168.1.26:7002@17002,DESKTOP-BT8KHK1 slave 15373cf386c5090a5f8a25f819c96b04a756381c 0 0 5 connected
PS C:\Dev> redis-cli -h 192.168.1.26 -p 7002 cluster nodes
947692ec7fb4a919ae7d0d7e314e9dcbcbd99774 192.168.1.26:7002@17002,DESKTOP-BT8KHK1 myself,slave 15373cf386c5090a5f8a25f819c96b04a756381c 0 0 5 connected
15373cf386c5090a5f8a25f819c96b04a756381c 192.168.1.26:7000@17000,DESKTOP-BT8KHK1 master - 0 0 1 connected 0-16383
5868649e4205bf6ea97e4b7a12f101c3aed96b71 192.168.1.26:7001@17001,DESKTOP-BT8KHK1 slave 15373cf386c5090a5f8a25f819c96b04a756381c 0 0 4 connected
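The same setup can also be scripted. The sketch below uses redis-py and sends the raw CLUSTER commands from the session above via execute_command; it assumes the same addresses and ports, and since Garnet is RESP-compatible a standard Redis client is expected to work, but verify against your deployment.

import time
import redis

# One connection per instance, using the same addresses as the session above.
nodes = {p: redis.Redis(host="192.168.1.26", port=p) for p in (7000, 7001, 7002)}
primary = nodes[7000]

primary_id = primary.execute_command("CLUSTER", "MYID").decode()
primary.execute_command("CLUSTER", "ADDSLOTSRANGE", 0, 16383)

# Give every instance a distinct config epoch.
for epoch, port in enumerate((7000, 7001, 7002), start=1):
    nodes[port].execute_command("CLUSTER", "SET-CONFIG-EPOCH", epoch)

# Introduce the instances to each other, then attach the replicas to the primary.
for port in (7001, 7002):
    primary.execute_command("CLUSTER", "MEET", "192.168.1.26", port)

time.sleep(2)  # allow the gossip protocol to propagate the new nodes

for port in (7001, 7002):
    nodes[port].execute_command("CLUSTER", "REPLICATE", primary_id)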

There is no limit on the number of replicas attached to a given primary. Currently, we do not support chained replication.

Querying a Replica

By default, replicas serve only read queries, but they can also be configured to process write requests. This is done by issuing the READWRITE command once in a client session, before executing any write operations. Issuing READONLY toggles the session back to serving only read queries.

If a replica is set to process only read queries, it will respond with -MOVED to any write request, redirecting it to the primary it is replicating.

PS C:\Dev> redis-cli -h 192.168.1.26 -p 7001 -c
192.168.1.26:7001> set x 1234
-> Redirected to slot [16287] located at 192.168.1.26:7000
OK
192.168.1.26:7000> set x 1234
OK
192.168.1.26:7000> get x
"1234"
192.168.1.26:7000> exit
PS C:\Dev> redis-cli -h 192.168.1.26 -p 7001 -c
192.168.1.26:7001> get x
"1234"
192.168.1.26:7001> exit
PS C:\Dev> redis-cli -h 192.168.1.26 -p 7002
192.168.1.26:7002> get x
"1234"

Checkpointing & Recovery

As noted above, primaries can take arbitrary checkpoints and propagate that information to the replicas. Taking a checkpoint at the primary is as easy as calling the SAVE or BGSAVE command. When the checkpoint operation completes, it is recorded into the AOF log. Replicas identify that record in their AOF and initiate checkpoints of their own.

An example of checkpoint propagation is shown below:

PS C:\Dev> redis-cli -h 192.168.1.26 -p 7000
192.168.1.26:7000> lastsave
(integer) 0
192.168.1.26:7000> bgsave
Background saving started
192.168.1.26:7000> lastsave
(integer) 1706555027
192.168.1.26:7000> exit
PS C:\Dev> redis-cli -h 192.168.1.26 -p 7001
192.168.1.26:7001> lastsave
(integer) 1706555029
192.168.1.26:7001> exit
PS C:\Dev> redis-cli -h 192.168.1.26 -p 7002
192.168.1.26:7002> lastsave
(integer) 1706555028
192.168.1.26:7002>
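The propagation can also be checked programmatically. The following redis-py sketch triggers a checkpoint on the primary with BGSAVE and polls the replicas' LASTSAVE timestamps until they advance; hosts and ports match the example above.

import time
import redis

primary = redis.Redis(host="192.168.1.26", port=7000)
replicas = [redis.Redis(host="192.168.1.26", port=p) for p in (7001, 7002)]

before = primary.lastsave()            # timestamp of the last checkpoint (epoch 0 if none)
primary.execute_command("BGSAVE")      # checkpoint completion is recorded in the AOF

# Wait for each replica to replay the checkpoint record and checkpoint itself.
for port, replica in zip((7001, 7002), replicas):
    while replica.lastsave() <= before:
        time.sleep(1)
    print(port, replica.lastsave())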

Replication Information

The system admin can track the progress of replication by retrieving the replication information section using the INFO REPLICATION command. A summary of the returned fields is shown below:

  • role: role of the instance being queried.
  • connected_slaves: number of replicas connected to the instance.
  • master_failover_state: the state of an ongoing failover, if any.
  • master_replid: replication id of this Garnet server instance.
  • master_replid2: secondary replication id (currently not being used).
  • master_repl_offset: current primary AOF offset.
  • store_current_safe_aof_address: AOF address covered by the latest checkpoint.
  • store_recovered_safe_aof_address: AOF address covered by the recovered checkpoint.
  • object_store_current_safe_aof_address: AOF address covered by the latest checkpoint of the object store.
  • object_store_recovered_safe_aof_address: AOF address covered by the recovered checkpoint of the object store.

192.168.1.26:7000> info replication
# Replication
role:master
connected_slaves:2
master_failover_state:no-failover
master_replid:dd194cb04b18b31373157e22d0ab36df9cc6a04a
master_replid2:
master_repl_offset:136
second_repl_offset:136
store_current_safe_aof_address:0
store_recovered_safe_aof_address:0
object_store_current_safe_aof_address:0
object_store_recovered_safe_aof_address:0
slave0:ip=192.168.1.26,port=7001,state=online,offset=136,lag=0
slave1:ip=192.168.1.26,port=7002,state=online,offset=136,lag=0
192.168.1.26:7000> exit
PS C:\Dev> redis-cli -h 192.168.1.26 -p 7001
192.168.1.26:7001> info replication
# Replication
role:slave
connected_slaves:0
master_failover_state:no-failover
master_replid:dd194cb04b18b31373157e22d0ab36df9cc6a04a
master_replid2:
master_repl_offset:136
second_repl_offset:136
store_current_safe_aof_address:0
store_recovered_safe_aof_address:0
object_store_current_safe_aof_address:0
object_store_recovered_safe_aof_address:0
master_host:192.168.1.26
master_port:7000
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:False
slave_read_repl_offset:136
slave_priority:100
slave_read_only:1
replica_announced:1
192.168.1.26:7001>
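The same fields can be consumed programmatically, for example to monitor how far each replica trails the primary's AOF offset. The sketch below queries the primary from the example above with redis-py; recent redis-py versions typically parse each slaveN line into a dict of its comma-separated fields, but verify against your client version.

import redis

primary = redis.Redis(host="192.168.1.26", port=7000)
repl = primary.info("replication")

primary_offset = repl["master_repl_offset"]
for key, value in repl.items():
    # Each slaveN entry carries the ip, port, state, offset, and lag fields shown above.
    if key.startswith("slave") and isinstance(value, dict):
        behind = primary_offset - value["offset"]
        print(f"{value['ip']}:{value['port']} offset={value['offset']} behind_by={behind}")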