Documentation Index
Fetch the complete documentation index at: https://mintlify.com/nubskr/walrus/llms.txt
Use this file to discover all available pages before exploring further.
Overview
The `Storage` component provides a thin, thread-safe wrapper around the Walrus storage engine. Its primary responsibility is enforcing write fencing through lease management: ensuring that only the designated leader for a segment can write to it. This prevents split-brain scenarios and maintains consistency across leadership changes.
Location: distributed-walrus/src/bucket.rs
Architecture
Structure
Location: `bucket.rs:15`
Fields
| Field | Type | Purpose |
|---|---|---|
| `engine` | `Arc<Walrus>` | Underlying Walrus storage engine for durable I/O |
| `active_leases` | `RwLock<HashSet<String>>` | Set of WAL keys this node currently has permission to write to |
| `write_locks` | `RwLock<HashMap<String, Arc<Mutex<()>>>>` | Per-key mutexes to serialize concurrent writers |
Constants
Initialization
new
Creates a new storage instance rooted at the specified directory.
Location: bucket.rs:23
- Creates parent directories if needed
- Creates the storage directory
- Optionally disables io_uring if the `WALRUS_DISABLE_IO_URING` env var is set
- Sets the `WALRUS_DATA_DIR` environment variable
- Initializes the Walrus engine with `DATA_NAMESPACE`
- Returns a new `Storage` instance with empty leases and locks

Environment variables:
- `WALRUS_DISABLE_IO_URING` - Set to use the mmap backend instead of io_uring (useful in containers)
- `WALRUS_DATA_DIR` - Automatically set to `storage_path` for the Walrus engine
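As a rough sketch of these steps, assuming field names from the table above (the stub `Walrus` type and the simplified error handling here are illustrative assumptions, not the real implementation):

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, Mutex, RwLock};

// Stub standing in for the real Walrus engine (illustration only).
pub struct Walrus;

pub struct Storage {
    engine: Arc<Walrus>,
    active_leases: RwLock<HashSet<String>>,
    write_locks: RwLock<HashMap<String, Arc<Mutex<()>>>>,
}

impl Storage {
    // Mirrors the documented steps: create directories, honor
    // WALRUS_DISABLE_IO_URING, export WALRUS_DATA_DIR, open the engine.
    pub fn new(storage_path: &str) -> std::io::Result<Storage> {
        std::fs::create_dir_all(storage_path)?; // parent dirs + storage dir
        if std::env::var_os("WALRUS_DISABLE_IO_URING").is_some() {
            // The real code would select the mmap backend here.
        }
        // `set_var` requires `unsafe` as of the Rust 2024 edition.
        unsafe { std::env::set_var("WALRUS_DATA_DIR", storage_path) };
        Ok(Storage {
            engine: Arc::new(Walrus), // real code: open with DATA_NAMESPACE
            active_leases: RwLock::new(HashSet::new()),
            write_locks: RwLock::new(HashMap::new()),
        })
    }
}
```

The key invariant is that a fresh instance starts with no leases: every write is rejected until the controller grants one.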
io_uring support may be unavailable in containerized environments. Set `WALRUS_DISABLE_IO_URING=1` to fall back to mmap-based I/O.
Write Operations
append_by_key
Appends data to a specific WAL key with lease verification and write locking.
Location: bucket.rs:44
- Acquires a `BucketGuard`, which:
  - Verifies this node holds a lease for the WAL key via `ensure_lease`
  - Acquires the per-key write mutex to serialize concurrent appends
- Spawns a blocking task to perform `engine.batch_append_for_topic`
- Returns success or an error

Guarantees:
- Lease Fencing: Only succeeds if the node holds an active lease
- Serialization: Multiple concurrent appends to the same key are ordered
- Durability: The Walrus engine handles fsync and crash consistency

Errors:
- Returns `NotLeaderForPartition` if the lease check fails
- Returns a Walrus error if the underlying append fails
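Putting the pieces together, the control flow might look like the following sketch. The in-memory `engine` map and the error string are stand-ins for the real Walrus engine and the typed `NotLeaderForPartition` error, and the real code wraps the engine call in a blocking task; treat every name here as an assumption:

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, Mutex, RwLock};

#[derive(Default)]
pub struct Storage {
    engine: Mutex<HashMap<String, Vec<Vec<u8>>>>, // toy stand-in for Walrus
    active_leases: RwLock<HashSet<String>>,
    write_locks: RwLock<HashMap<String, Arc<Mutex<()>>>>,
}

impl Storage {
    pub fn append_by_key(&self, wal_key: &str, batch: Vec<Vec<u8>>) -> Result<(), String> {
        // 1. Lease fencing: refuse the write unless we currently hold the lease.
        if !self.active_leases.read().unwrap().contains(wal_key) {
            return Err(format!("NotLeaderForPartition: {wal_key}"));
        }
        // 2. Serialize concurrent appends to the same key.
        let lock = self.lock_for_key(wal_key);
        let _write = lock.lock().unwrap();
        // 3. The real code spawns a blocking task for engine.batch_append_for_topic.
        self.engine
            .lock()
            .unwrap()
            .entry(wal_key.to_string())
            .or_default()
            .extend(batch);
        Ok(())
    }

    // Simplified lookup (the real version uses a double-checked fast path).
    fn lock_for_key(&self, wal_key: &str) -> Arc<Mutex<()>> {
        let mut locks = self.write_locks.write().unwrap();
        locks
            .entry(wal_key.to_string())
            .or_insert_with(|| Arc::new(Mutex::new(())))
            .clone()
    }
}
```

Note that the lease check happens before the write lock is taken, so a fenced-off node fails fast without ever contending on the mutex.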
Write Fencing with BucketGuard
The `BucketGuard` RAII type enforces the write fencing protocol.
Location: bucket.rs:93
BucketGuard::lock
Location: bucket.rs:99
- Lease Check: Calls `ensure_lease()` to verify write permission
- Lock Acquisition: Gets the per-key mutex via `lock_for_key()`
- Returns: A guard that holds both the lease verification and the write lock
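The RAII shape could be sketched like this: constructing the guard proves the lease check passed, and dropping it releases the write lock. The field layout and signature are assumptions for illustration, not the real `bucket.rs` definition:

```rust
use std::collections::HashSet;
use std::sync::{Mutex, MutexGuard, RwLock};

// Holding a BucketGuard means: the lease was verified, and the per-key
// write mutex is held until the guard is dropped.
pub struct BucketGuard<'a> {
    _write: MutexGuard<'a, ()>,
}

impl<'a> BucketGuard<'a> {
    pub fn lock(
        leases: &RwLock<HashSet<String>>,
        key_mutex: &'a Mutex<()>,
        wal_key: &str,
    ) -> Result<BucketGuard<'a>, String> {
        // 1. Lease check first: fail fast before touching the write lock.
        if !leases.read().unwrap().contains(wal_key) {
            return Err(format!("no lease for {wal_key}"));
        }
        // 2. Acquire the per-key mutex; held for the guard's lifetime.
        Ok(BucketGuard { _write: key_mutex.lock().unwrap() })
    }
}
```

Because release is tied to drop, the write lock cannot leak even if the append path returns early with an error.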
ensure_lease (Internal)
Verifies that the current node holds a lease for a WAL key.
Location: bucket.rs:111
- Acquires a read lock on `active_leases`
- Checks if `wal_key` is in the set
- Returns `Ok(())` if present
- Returns an error (with the current lease set logged) if absent
Read Operations
read_one
Reads the next available entry from a WAL key.
Location: bucket.rs:53
- Spawns a blocking task to call `engine.read_next()`
- Returns `Some(data)` if an entry is available
- Returns `None` if no entry is available
- Returns an error on I/O failure
- No Lease Check: Reads can be served by any node (useful for historical segments)
- Non-blocking: Uses async task spawning to avoid blocking executor
- Cursor Management: Walrus maintains internal read cursors per topic
Use cases:
- Reading from the current segment (leader serves)
- Reading from sealed segments (any node can serve)
- Consumer group implementations
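The cursor behavior can be illustrated with a toy engine: each key carries a read position that advances on every successful read, much like the cursors Walrus maintains per topic. Everything below (the `ToyEngine` type and its methods) is an assumed stand-in, not the Walrus API; the real `read_one` also wraps the call in a blocking task:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Toy engine: per-key entries plus a read cursor.
#[derive(Default)]
pub struct ToyEngine {
    entries: Mutex<HashMap<String, (Vec<Vec<u8>>, usize)>>,
}

impl ToyEngine {
    // Note: no lease check on reads; any node may serve them.
    pub fn read_next(&self, wal_key: &str) -> Option<Vec<u8>> {
        let mut map = self.entries.lock().unwrap();
        let (log, cursor) = map.get_mut(wal_key)?;
        let entry = log.get(*cursor)?.clone();
        *cursor += 1; // advance the internal cursor
        Some(entry)
    }

    pub fn append(&self, wal_key: &str, data: Vec<u8>) {
        self.entries
            .lock()
            .unwrap()
            .entry(wal_key.to_string())
            .or_default()
            .0
            .push(data);
    }
}
```

Repeated reads drain the key in order and then return `None`, which is the signal consumers use to wait for new data.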
Lease Management
update_leases
Updates the set of active leases, adding new ones and removing stale ones.
Location: bucket.rs:60
- Fast Path: Read-lock check if leases already match (the common case)
- Slow Path: If a mismatch is detected:
  - Acquires a write lock on `active_leases`
  - Removes keys not in the `expected` set
  - Adds keys from the `expected` set
  - Releases the write lock

Called from:
- `NodeController::update_leases()` every 100ms
- After metadata changes
- During retry operations
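The fast-path/slow-path shape could be sketched as follows (the free-function signature is an assumption; the real method lives on `Storage`):

```rust
use std::collections::HashSet;
use std::sync::RwLock;

pub fn update_leases(active: &RwLock<HashSet<String>>, expected: &HashSet<String>) {
    // Fast path: a read lock suffices when nothing changed (the common case,
    // since this runs every 100ms but leases rarely move).
    if *active.read().unwrap() == *expected {
        return;
    }
    // Slow path: take the write lock and reconcile toward `expected`.
    let mut leases = active.write().unwrap();
    leases.retain(|k| expected.contains(k)); // drop stale leases
    for k in expected {
        leases.insert(k.clone()); // add newly granted leases
    }
}
```

The fast path matters because this runs on a tight interval: equal sets cost only a shared read lock, so steady-state clusters never serialize on the write lock.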
lock_for_key (Internal)
Retrieves or creates a per-key write mutex.
Location: bucket.rs:76
- Fast Path: Read-lock check for an existing mutex
- Slow Path: If not found:
  - Acquires a write lock on `write_locks`
  - Creates a new mutex if still absent (double-check pattern)
- Returns a cloned `Arc` to the mutex
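The double-check pattern could look like this sketch (free-function signature assumed for illustration):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};

pub fn lock_for_key(
    write_locks: &RwLock<HashMap<String, Arc<Mutex<()>>>>,
    wal_key: &str,
) -> Arc<Mutex<()>> {
    // Fast path: the mutex usually already exists; a read lock is enough.
    if let Some(m) = write_locks.read().unwrap().get(wal_key) {
        return m.clone();
    }
    // Slow path: re-check under the write lock, because another writer may
    // have inserted the mutex between our two lock acquisitions.
    let mut locks = write_locks.write().unwrap();
    locks
        .entry(wal_key.to_string())
        .or_insert_with(|| Arc::new(Mutex::new(())))
        .clone()
}
```

Returning a cloned `Arc` lets the caller hold the mutex without keeping any lock on the `write_locks` map itself.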
Additional Operations
get_topic_size_blocking
Returns the size in bytes of a specific WAL key’s data.
Location: bucket.rs:88
- Blocking: Runs synchronously (use from blocking context)
- No I/O: Fast metadata-only operation
- Use Cases: Monitoring, debugging, capacity planning
This method is currently unused but available for future monitoring implementations.
Write Fencing Guarantees
The lease-based fencing protocol prevents data corruption during leadership changes.
Scenario: Leadership Transfer
Key Properties
- Mutual Exclusion: Only one node can hold a lease for a key at a time (enforced by metadata)
- Fail-Fast: Writes fail immediately if the lease is not held, preventing corruption
- Lease Revocation: Stale leaders lose write permission within one lease update cycle (100ms)
- No Split-Brain: Even with network partitions, only the metadata-acknowledged leader can write
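A toy illustration of the leadership-transfer scenario, assuming a minimal node with only a lease set (names and the error value are illustrative, not the real types):

```rust
use std::collections::HashSet;
use std::sync::RwLock;

struct Node {
    active_leases: RwLock<HashSet<String>>,
}

impl Node {
    // A write is accepted only while the lease is held.
    fn try_write(&self, wal_key: &str) -> Result<(), &'static str> {
        if self.active_leases.read().unwrap().contains(wal_key) {
            Ok(())
        } else {
            Err("NotLeaderForPartition")
        }
    }
}
```

Once metadata moves the lease elsewhere, the old leader's next lease update removes the key from its set, and every subsequent write it attempts is rejected locally, without needing to reach the new leader.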
Concurrency Control
Per-Key Mutexes
The `write_locks` map provides fine-grained concurrency.
Benefits:
- Parallelism: Writes to different keys don’t block each other
- Ordering: Writes to same key are serialized in arrival order
- No Deadlocks: Single mutex per operation (no lock ordering issues)
Lock Acquisition Order
All write operations follow the same lock order to prevent deadlocks:
1. Read lock on `active_leases` (lease check)
2. Read lock on `write_locks` (mutex lookup)
3. Mutex for the specific WAL key (write serialization)
Performance Characteristics
Append Latency
Dominated by Walrus I/O:
- Fast Path: ~100-200μs (SSD with io_uring)
- Lease Check: ~1-5μs (in-memory hash set lookup)
- Lock Contention: Minimal unless concurrent writes to same key
Read Latency
- Cache Hit: ~50-100μs
- Cache Miss: ~200-500μs (depends on storage)
- No Lease Overhead: Reads don’t check leases
Lease Update
- Fast Path (no changes): ~1-10μs (read lock only)
- Slow Path (changes): ~10-50μs (write lock + hash set ops)
Memory Usage
- Per-Key Overhead: ~80 bytes (`String` + `Arc<Mutex<()>>`)
- Lease Set: ~50 bytes per active lease
- Typical: 1-10 KB for small clusters
Error Handling
NotLeaderForPartition
- Cause: Attempted write without holding a lease for the WAL key.
- Resolution: The controller should re-check metadata and forward to the correct leader.
- Log Example: `write rejected for t_events_s_1 (leases: {:?})`

Walrus I/O Errors
- Causes: Disk full, file corruption, permission issues
- Propagation: Returned directly from `append_by_key` or `read_one`
- Recovery: Depends on the error type (may require operator intervention)

Lock Poisoning
- Cause: Panic while holding a lock (rare)
- Handling: RwLock poisoning is caught and converted to `None` in the controller
- Impact: Affects a single operation, not the entire system
Integration Points
With NodeController
The controller manages the lease lifecycle.
With Walrus Engine
Storage delegates I/O to Walrus:
- Namespacing: All keys scoped to `DATA_NAMESPACE`
- Batching: Uses `batch_append_for_topic` for efficiency
- Cursors: Walrus maintains read cursors automatically
- Durability: Walrus handles fsync and crash recovery
Monitoring
Recommended Metrics
- Number of WAL keys this node currently holds leases for. Should match the metadata’s `owned_topics` count.
- Time spent in `update_leases()`. Spikes indicate contention or large lease set changes.
- End-to-end append latency, including the lease check and Walrus I/O.
- Number of write attempts rejected due to a missing lease. High values indicate routing issues.
Debug Logging
Enable with `RUST_LOG=walrus::bucket=debug`.
Best Practices
Lease Update Frequency
The default 100ms interval balances responsiveness and overhead. Increase it if CPU-bound; decrease it if faster failover is needed.
Environment Variables
Always set `WALRUS_DISABLE_IO_URING=1` in Docker/Kubernetes; io_uring is often unavailable in containers.
Storage Directory
Use a dedicated volume with sufficient IOPS. Walrus performance directly impacts throughput.
Lease Debugging
Log lease rejections at WARN level. They indicate metadata/lease desync (usually transient during leadership changes).
Testing Hooks
The `TestControl` RPC provides lease manipulation for testing.
Related Components
- NodeController - Determines lease assignments based on metadata
- Metadata - Source of truth for topic ownership