Storage Layer
Design of the in-memory storage architecture, key metadata, and memory management.
Terminology Note
Section titled “Terminology Note”Throughout this document, “shard” refers to internal shards (thread-local partitions within a single FrogDB node). Each shard has its own HashMap, memory budget, and expiry index. Keys are assigned via: internal_shard = hash(key) % num_shards.
This is distinct from hash slots used in cluster mode (0-16383) for distributing data across nodes. See glossary.md for full definitions.
Store Trait
Section titled “Store Trait”Each shard owns a Store implementation for key-value storage. The Store trait provides methods for get, set, delete, contains, key_type, len, memory_used, and scan operations.
Ownership Semantics
Section titled “Ownership Semantics”Store::get() returns Option<Value> (owned), not Option<&Value> (borrowed).
Rationale:
- Async compatibility: Returning references would require complex lifetime annotations with async command execution. Owned values are simpler to work with across
.awaitpoints. - Simplicity over optimization: The slight overhead of cloning values is acceptable. Optimization can be added later if profiling shows it’s a bottleneck.
- Command execution pattern: Most commands need to inspect values, compute responses, and potentially modify them. Owned values fit this pattern naturally.
Future optimization path: Add a get_ref() method returning Option<&Value> for read-only operations where the caller can guarantee no .await across the borrow.
Value Types
Section titled “Value Types”FrogDB stores values as variants of a Value enum, with a corresponding KeyType enum for type identification (used by the TYPE command and WRONGTYPE errors).
Supported Data Types
Section titled “Supported Data Types”| Type | Description | Implementation Notes |
|---|---|---|
| String | Binary-safe string with integer encoding | Bytes |
| List | Ordered collection | VecDeque<Bytes> |
| Set | Unordered unique members | HashSet<Bytes> |
| Hash | Field-value map | HashMap<Bytes, Bytes> |
| Sorted Set | Scored members with rank queries | HashMap + BTreeMap/SkipList (configurable) |
| Stream | Append-only log with consumer groups | Radix tree + listpack |
| Bitmap | Bit operations on strings | Operations on String value |
| Geospatial | Coordinate-indexed members | Sorted Set + geohash encoding |
| JSON | JSON document with JSONPath queries | serde_json::Value |
| HyperLogLog | Probabilistic cardinality estimation | 12KB fixed structure |
| Bloom Filter | Probabilistic membership test | Bit array + hash functions |
| Cuckoo Filter | Probabilistic membership with deletion | Bucket-based filter |
| Count-Min Sketch | Probabilistic frequency estimation | Matrix of counters |
| Top-K | Approximate heavy hitters | HeavyKeeper algorithm |
| T-Digest | Approximate quantile estimation | Centroid-based digest |
| Time Series | Timestamped samples with aggregation | Sorted by timestamp |
| Vector Set | Similarity search vectors | HNSW index via usearch |
Default Implementation
Section titled “Default Implementation”The default store uses a HashMap with per-key metadata, tracking each key’s value alongside its KeyMetadata (expiry, LRU timestamp, LFU counter, memory size).
Key Metadata
Section titled “Key Metadata”Each key tracks metadata for expiry and eviction:
- expires_at: Expiration time (None = no expiry)
- last_access: Last access time for LRU eviction. NOT persisted — reset to recovery time on startup.
- lfu_counter: Access frequency counter for LFU eviction. Uses logarithmic counter (Redis-compatible). Persisted with value.
- memory_size: Approximate memory size of this entry.
Persistence Note: The last_access field is NOT persisted to disk. After recovery, all keys have fresh access timestamps (idle time = 0). This matches Redis behavior and eviction accuracy self-corrects within minutes. See persistence.md for details.
Clock Sources
Section titled “Clock Sources”| Use Case | Clock Type | Implementation | Rationale |
|---|---|---|---|
| TTL checking | Monotonic | std::time::Instant | Immune to clock adjustments |
| last_access | Monotonic | std::time::Instant | LRU needs relative time only |
| Persistence format | Wall clock | Unix timestamp ms | Portable across restarts |
| Latency metrics | Monotonic | std::time::Instant | Accurate measurement |
| EXPIREAT command | Wall clock | Unix timestamp | User specifies absolute time |
Conversion on Recovery: Persisted Unix timestamps are converted to monotonic Instant values by computing the remaining duration. Already-expired keys are discarded during recovery.
LFU Counter
Section titled “LFU Counter”The lfu_counter uses a probabilistic logarithmic counter (Redis-compatible):
fn lfu_log_incr(counter: u8) -> u8 { if counter == 255 { return 255; } let r: f64 = random(); let base_probability = 1.0 / (counter as f64 * LFU_LOG_FACTOR + 1.0); if r < base_probability { counter + 1 } else { counter }}- New keys start at 5 (not immediately evicted)
- Higher counter = lower increment probability
lfu-log-factordefault: 10
Decay: Applied lazily on access. lfu-decay-time default: 1 (decay by 1 per minute of inactivity).
Memory Accounting
Section titled “Memory Accounting”Memory is tracked per-key and aggregated per-shard. Each entry’s size includes key length, value memory, metadata overhead, and HashMap entry overhead.
HashMap Implementation Choice
Section titled “HashMap Implementation Choice”FrogDB uses griddle::HashMap instead of std::collections::HashMap to avoid memory and latency spikes during resizing.
The Resize Spike Problem: Standard hash tables must resize when the load factor exceeds a threshold, requiring ~2x memory during resize and causing latency spikes of 30-40ms on a single insert.
| Aspect | std::HashMap | griddle::HashMap |
|---|---|---|
| Max insert latency | ~38ms | ~1.8ms |
| Mean insert latency | ~94ns | ~126ns |
| Memory during resize | 2x (allocated upfront) | 2x (but amortized) |
| API compatibility | - | Drop-in replacement |
Griddle spreads resize work across inserts, ensuring no single operation pays the full cost. The trade-off is slightly slower reads during active resizing (must check both old and new tables).
Total Memory (Including COW Buffers)
Section titled “Total Memory (Including COW Buffers)”FrogDB uses forkless snapshots with explicit Copy-on-Write (COW) buffering. Unlike Redis’s fork-based approach, COW buffer memory is explicitly tracked and included in maxmemory enforcement. Total memory usage includes both the store (keys + values) and the snapshot COW buffer (old values pending serialization).
| Aspect | Redis | FrogDB |
|---|---|---|
| Snapshot method | Fork-based COW | Forkless epoch-based COW |
| COW memory tracking | OS-level (not explicit) | Explicit in total_memory_used() |
| maxmemory includes COW | No (operators provision headroom) | Yes (prevents OOM) |
| Memory predictability | Can spike to 2x | Bounded by cow-buffer-max-bytes |
Expiry Index
Section titled “Expiry Index”Each shard maintains an expiry index for efficient TTL handling using a BTreeMap<(Instant, Bytes), ()> for O(log n) insertion/removal with lower memory overhead than a HashSet per expiry time.
Expiry Strategy
Section titled “Expiry Strategy”FrogDB uses hybrid expiry (lazy + active):
- Lazy expiry: Check TTL on every read; delete if expired
- Active expiry: Background task samples keys periodically
Expiry Atomicity
Section titled “Expiry Atomicity”Key expiration is checked at command entry, before execution begins. Once command execution begins, the key will not be expired mid-operation. Background scanner skips keys with active operations (shard-level coordination).
Hash Algorithms
Section titled “Hash Algorithms”FrogDB uses two different hash algorithms for different purposes:
| Purpose | Algorithm | Range | Description |
|---|---|---|---|
| Cluster slot | CRC16 | 0-16383 | Redis-compatible, determines which node owns a key |
| Internal shard | xxhash64 | 0 to num_shards | Determines which thread within a node processes a key |
Both algorithms use the same extract_hash_tag function. Keys with the same hash tag will be colocated on both the same cluster slot AND the same internal shard.
CROSSSLOT Validation
Section titled “CROSSSLOT Validation”Multi-key operations must have all keys in the same hash slot. FrogDB validates this before execution. If any key’s slot differs from the first key’s slot, a CROSSSLOT error is returned before execution begins.
Persistence Integration
Section titled “Persistence Integration”The store provides hooks for persistence via RocksDB. A PersistentStore trait extends Store with methods for persisting writes and loading data on startup.
See persistence.md for RocksDB integration, WAL, and snapshots.