VLL (Very Lightweight Locking)
Design rationale and internals of FrogDB’s VLL transaction coordination mechanism.
Overview
Section titled “Overview”VLL provides atomic multi-shard operations without traditional mutex-based locking. Key principles:
- Intent-based locking: Operations declare keys they’ll access
- Out-of-order execution: Non-conflicting operations bypass the queue
- Transaction ID ordering: Global monotonic counter prevents deadlocks
- Shard locks for conflicts: Only operations with key conflicts wait
Design Goals
Section titled “Design Goals”- Allow concurrent execution of non-conflicting operations
- Provide atomic semantics for multi-key operations
- Prevent deadlocks via deterministic ordering
- Minimize latency for single-shard operations
VLL Use Cases
Section titled “VLL Use Cases”VLL coordinates atomicity for ALL operations touching multiple internal shards:
| Operation Type | Example Commands | VLL Behavior |
|---|---|---|
| Multi-key commands | MGET, MSET, DEL (multi), RENAME, COPY | Lock all touched shards |
| Transactions | MULTI/EXEC | Lock shards on EXEC |
| Lua scripts | EVAL, EVALSHA | Lock shards before execution |
| Blocking commands | BLPOP (multi-key), BRPOP (multi-key) | Lock during wake evaluation |
| Set operations | SINTER, SUNION, SDIFF, SMOVE | Lock all source shards |
| Sorted set ops | ZUNIONSTORE, ZINTERSTORE, ZDIFF | Lock source and dest shards |
| List operations | LMOVE, BLMOVE, LMPOP | Lock source and dest shards |
Single-node vs Cluster behavior:
- In standalone mode with
allow-cross-slot-standalone = true: VLL coordinates cross-shard atomicity - In cluster mode: Cross-slot operations return
-CROSSSLOTerror (VLL only applies within a node)
How VLL Works
Section titled “How VLL Works”Intent Table
Section titled “Intent Table”Each shard maintains an intent table tracking which keys have pending operations:
pub struct IntentTable { /// key -> list of pending intents intents: HashMap<Bytes, Vec<Intent>>,}
pub struct Intent { txid: u64, mode: LockMode, ready_tx: oneshot::Sender<ShardReadyResult>,}
pub enum LockMode { Shared, // Read-only access (multiple concurrent OK) Exclusive, // Write access (exclusive)}Transaction Flow
Section titled “Transaction Flow”- Acquire txid from global atomic counter
- Register intents on all participating shards (sorted order to prevent deadlocks)
- Wait for all shards to signal ready
- Execute on all shards
- Release intents on all shards
Deadlock Prevention
Section titled “Deadlock Prevention”Lock acquisition in sorted shard order prevents circular waits. The global txid counter ensures a total ordering — operations with lower txid execute before higher ones.
Out-of-Order Execution
Section titled “Out-of-Order Execution”Non-conflicting operations can bypass the queue. If operation A holds a Shared lock on key X, and operation B wants a Shared lock on key X, B can proceed immediately. Only Exclusive locks block.
Atomicity Semantics
Section titled “Atomicity Semantics”VLL provides execution atomicity (isolation) through ordered multi-shard locking:
- Lock Phase: Acquire write locks on all target shards (sorted order prevents deadlock)
- Execute Phase: Execute writes on each shard while holding all locks
- Release Phase: Release all locks
| Scenario | Behavior | Data State |
|---|---|---|
| Single-shard operation | True atomicity (all-or-nothing) | Consistent |
| Multi-shard success | All shards execute with isolation | Consistent, globally ordered |
| Lock acquisition timeout | Client receives -TIMEOUT | No changes |
| Failure during execution | Error returned; partial writes persist | Partial state (no rollback) |
| Coordinator crash mid-op | Locks timeout and release | Partial state possible |
Key Insight: VLL provides execution atomicity (no interleaving) but not failure atomicity (no rollback). This matches Redis and DragonflyDB behavior.
Why VLL vs Traditional Approaches
Section titled “Why VLL vs Traditional Approaches”Traditional approaches and their drawbacks:
- Global Mutexes: Contention at scale, not suitable for shared-nothing
- 2PC with separate coordinator: Expensive overhead, blocking protocol
- Optimistic CC: High abort rate under contention
VLL advantages:
- Ordered locking with deadlock prevention (no 2PC overhead)
- Minimal lock contention (each shard is single-threaded internally)
- Simple failure model (no complex rollback logic)
- Deterministic ordering via txid counter
VLL Queue Configuration
Section titled “VLL Queue Configuration”| Setting | Default | Description |
|---|---|---|
scatter-gather-timeout-ms | 5000 | Timeout for entire multi-shard operation (VLL + scatter-gather) |
vll-max-queue-depth | 10000 | Max pending operations per shard before rejecting new ones |
Queue Overflow: When vll-max-queue-depth is reached, new operations receive -ERR shard queue full, try again later. Metric: frogdb_vll_queue_rejections_total.
Why a Single Timeout? VLL is guaranteed deadlock-free by design (ordered transaction IDs). A separate “VLL queue timeout” would add complexity without benefit. A slow shard is a node health issue, not a locking issue.
Continuation Locks (Lua/MULTI)
Section titled “Continuation Locks (Lua/MULTI)”For operations that need full shard access (Lua scripts, MULTI/EXEC with unknown key access patterns), VLL supports continuation locks that provide exclusive access to an entire shard:
ShardMessage::VllContinuationLock { txid: u64, ready_tx: oneshot::Sender<ShardReadyResult>, execute_rx: oneshot::Receiver<ExecuteSignal>,}This blocks all other operations on the shard until the continuation completes, but allows the script/transaction to access any key without declaring intents upfront.