Skip to content

Replication Internals

Contributor-facing documentation for FrogDB’s primary-replica replication system: WAL streaming internals, PSYNC protocol details, and the replication state machine.

For operator-facing setup and failover procedures, see Operations: Replication.


Primary: Write -> In-Memory -> WAL -> Replication Stream -> Replicas
|
WAL Archive (disk)
TypeTriggerMechanism
FULLRESYNCNew replica, WAL gap too largeRocksDB checkpoint transfer
PSYNCReconnection within backlogResume from last sequence number

Each dataset history has a unique identifier:

  • 40-character hex string (like Redis)
  • Changes on failover: New primary generates new ID
  • Secondary ID: Promoted replicas remember old primary’s ID
Primary A (repl_id: abc123...)
|
+-- Replica B (tracking abc123...)
|
[Primary A fails, B promoted]
|
Primary B (repl_id: def456..., secondary_id: abc123...)

This allows replicas of A to connect to B and continue incrementally.

fn generate_replication_id() -> String {
let mut bytes = [0u8; 20];
getrandom::getrandom(&mut bytes).expect("random bytes");
hex::encode(bytes) // 20 random bytes -> 40 hex characters
}

When a replica is promoted:

fn on_promotion(&mut self) {
self.secondary_repl_id = Some(self.repl_id.clone());
self.secondary_repl_offset = self.repl_offset;
self.repl_id = generate_replication_id();
}

Other replicas can PSYNC using either the new primary’s ID or the secondary ID.


Primary Replica
| |
|<---- PSYNC ? -1 ------------------| "I have no data"
| |
| Create RocksDB checkpoint |
| |
|----- FULLRESYNC <id> <seq> ------->|
| |
|----- [checkpoint files] ---------->| Transfer checkpoint
| |
| | Load checkpoint
| |
|<---- PSYNC <id> <seq> ------------| Ready for incremental
| |
|----- [WAL stream] --------------->| Continue streaming

If multiple replicas request FULLRESYNC simultaneously, FrogDB creates a single RocksDB checkpoint and streams it to all requesting replicas, amortizing the cost.

1. FULLRESYNC response: +FULLRESYNC <repl_id> <sequence>\r\n
2. Checkpoint header: $<total_size>\r\n
3. File entries (repeated):
*3\r\n
$<name_len>\r\n<filename>\r\n
:<file_size>\r\n
$<file_size>\r\n<file_data>\r\n
4. End marker: *1\r\n $3\r\nEOF\r\n
5. Checksum footer: $40\r\n<sha256_hex_of_all_files>\r\n

Checkpoint Integrity: Primary computes SHA256 of all file contents. Replica verifies after receiving all files. Mismatch triggers retry.


Replica Primary
| |
|----- AUTH [user] <password> ------>| (if primary_auth configured)
|<---- +OK -------------------------|
| |
|----- REPLCONF listening-port 6379->| (optional, for INFO output)
|<---- +OK -------------------------|
| |
|----- REPLCONF capa eof psync2 --->| (announce capabilities)
|<---- +OK -------------------------|
| |
|----- PSYNC <repl_id> <seq> ------>| (initiate sync)
|<---- +FULLRESYNC or +CONTINUE ----|

After initial sync, the primary continuously streams WAL entries to replicas. FrogDB uses RocksDB’s GetUpdatesSince() for incremental sync.

FrogDB uses TCP backpressure (no buffer limits) to avoid the “full sync loop” problem that affects Redis:

  • If a replica is slow, TCP send buffer fills naturally
  • Primary blocks on write (does not buffer unboundedly)
  • No risk of exhausting memory with replication buffers
  • Slow replicas cause primary to slow down rather than trigger full resync

This is a deliberate departure from Redis, which uses bounded output buffers and disconnects slow replicas, sometimes causing repeated full resyncs.


When min_replicas_to_write >= 1, writes wait for replica acknowledgment:

async fn execute_with_sync_replication(
cmd: &ParsedCommand,
handler: &dyn Command,
ctx: &mut CommandContext<'_>,
) -> Response {
let result = handler.execute(ctx, &cmd.args);
let seq = ctx.wal.append(&result.operation)?;
if ctx.config.min_replicas_to_write == 0 {
return result.response; // Async mode
}
let ack_future = ctx.replication_tracker.wait_for_acks(
seq,
ctx.config.min_replicas_to_write,
);
match tokio::time::timeout(ctx.config.replica_ack_timeout, ack_future).await {
Ok(Ok(_)) => result.response,
Ok(Err(_)) | Err(_) => {
// Write ALREADY committed on primary
Response::Error(Bytes::from_static(b"NOREPL Not enough replicas"))
}
}
}

When -NOREPL is returned, the write has already succeeded on the primary and is in the WAL. The error indicates replication durability was not confirmed, not that the write failed.


pub enum NodeRole {
Primary,
Replica { primary_addr: SocketAddr },
Standalone,
}
Command TypeReplica Behavior
Read (READONLY flag)Execute locally, may return stale data
Write (WRITE flag)Return -READONLY error
Replication commandsExecute (PSYNC, REPLCONF, etc.)
Admin commandsDepends on command
MetricDescription
last_write_seqSequence number of last write
pending_sync_writesWrites waiting for replica ACK
norepl_errorsWrites that failed due to replica timeout
replica_ack_wait_timeTime spent waiting for replica ACKs