Topics in Walrus are logical streams that are automatically partitioned into segments for load distribution across the cluster. This segment-based architecture enables automatic load balancing and fault tolerance.
Topic Organization
What is a Topic?
A topic is a named stream of messages (similar to Kafka topics or Kinesis streams). Clients append entries to topics and read them back in order.
// Creating a topic explicitly (topics are also created implicitly on first write)
> REGISTER logs
// Writing to a topic
> PUT logs "application started"
> PUT logs "request processed"
// Reading from a topic
> GET logs
OK application started
> GET logs
OK request processed
Topic State
Each topic maintains metadata that is replicated across all nodes via Raft:
TopicState {
    current_segment: u64,                             // Active segment for new writes
    leader_node: NodeId,                              // Node handling writes to current segment
    sealed_segments: HashMap<SegmentId, EntryCount>,  // Immutable historical segments
    segment_leaders: HashMap<SegmentId, NodeId>,      // Which node owns each segment
}
Example:
{
    "current_segment": 3,
    "leader_node": 2,
    "sealed_segments": {
        "1": 1000000,
        "2": 950000
    },
    "segment_leaders": {
        "1": 3,
        "2": 1,
        "3": 2
    }
}
This shows topic "logs" with:
Segment 1: sealed with 1,000,000 entries, originally led by node 3
Segment 2: sealed with 950,000 entries, originally led by node 1
Segment 3: currently active, led by node 2
Segments
What is a Segment?
A segment is a contiguous sequence of entries within a topic. Each segment:
Has a unique ID (1, 2, 3, …)
Is owned by a single leader node that handles writes
Contains up to WALRUS_MAX_SEGMENT_ENTRIES entries (default: 1,000,000)
Becomes sealed when full, with leadership rotating to the next node
Segment Lifecycle
┌────────────────────────────────────────────────────────────
│ Segment 1 Lifecycle (topic: "logs")
├────────────────────────────────────────────────────────────
│
│ [1] CREATED
│     └─► Topic "logs" created via Raft
│         leader_node: 1
│         current_segment: 1
│         segment_leaders: {1 → 1}
│
│ [2] ACTIVE (receiving writes)
│     └─► Node 1 holds lease for "logs:1"
│         Entries appended: 0 → 500,000 → 1,000,000
│         Clients write to node 1
│         Reads can be served from node 1
│
│ [3] ROLLOVER TRIGGERED
│     └─► Monitor detects: 1,000,000 >= threshold
│         Propose RolloverTopic via Raft
│         Select next leader: node 2 (round-robin)
│
│ [4] SEALED (immutable)
│     └─► Metadata updated:
│           sealed_segments: {1 → 1,000,000}
│           current_segment: 2 (new active segment)
│           leader_node: 2 (for new writes)
│           segment_leaders: {1 → 1, 2 → 2}
│
│     Node 1: Loses write lease for "logs:1"
│             But retains read access!
│     Node 2: Gains write lease for "logs:2"
│
│ [5] HISTORICAL READS
│     └─► Sealed segment data remains on node 1
│         Readers with cursor in segment 1 → routed to node 1
│         No data migration needed
│
└────────────────────────────────────────────────────────────
Segment Naming (WAL Keys)
Internally, segments are identified by WAL keys in the format <topic>:<segment_id>:
// Topic "logs", segment 1
wal_key = "logs:1"
// Topic "metrics", segment 5
wal_key = "metrics:5"
These keys are used for:
Lease management: leases.contains("logs:1")
Storage operations: walrus.batch_append_for_topic("logs:1", data)
Offset tracking: offsets["logs:1"] = 1_000_050
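To make the key format concrete, here is a minimal sketch of building and parsing WAL keys; the helper names (wal_key, parse_wal_key) are illustrative, not part of Walrus's API:

// Build the WAL key for a segment of a topic: "<topic>:<segment_id>".
fn wal_key(topic: &str, segment_id: u64) -> String {
    format!("{}:{}", topic, segment_id)
}

// Split a WAL key back into (topic, segment_id). rsplit_once keeps this
// correct even if a topic name itself contains ':' (an assumption about
// naming rules, not a documented guarantee).
fn parse_wal_key(key: &str) -> Option<(&str, u64)> {
    let (topic, seg) = key.rsplit_once(':')?;
    Some((topic, seg.parse().ok()?))
}

fn main() {
    assert_eq!(wal_key("logs", 1), "logs:1");
    assert_eq!(parse_wal_key("metrics:5"), Some(("metrics", 5)));
}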
Rollover Mechanics
Triggering Rollover
The monitor loop on each node checks its owned segments every 10 seconds:
// Pseudocode from monitor.rs
fn check_rollovers() {
    for (topic, segment_id) in metadata.owned_topics(self.node_id) {
        let wal_key = format!("{}:{}", topic, segment_id);
        let entry_count = controller.tracked_entry_count(&wal_key);
        if entry_count >= MAX_SEGMENT_ENTRIES {
            // Select next leader (round-robin over the Raft voter set)
            let voters = raft.voters();
            let my_index = voters
                .iter()
                .position(|&n| n == self.node_id)
                .expect("this node must be a voter");
            let next_index = (my_index + 1) % voters.len();
            let next_leader = voters[next_index];
            // Propose rollover via Raft
            propose_metadata(RolloverTopic {
                name: topic,
                new_leader: next_leader,
                sealed_segment_entry_count: entry_count,
            });
        }
    }
}
Rollover Process
Step 1: Monitor Detection
Node 2 monitors "logs:1"
Entry count: 1,000,050 >= 1,000,000
→ TRIGGER ROLLOVER
Step 2: Select Next Leader
Raft voters: [1, 2, 3]
Current leader: node 2 (index 1)
Next leader: (1 + 1) % 3 = 2 → voters[2] = node 3
Step 3: Raft Proposal
propose_metadata(RolloverTopic {
    name: "logs",
    new_leader: 3,
    sealed_segment_entry_count: 1_000_050
})
Step 4: Metadata Update (applied to all nodes)
// Before rollover
{
    current_segment: 1,
    leader_node: 2,
    sealed_segments: {},
    segment_leaders: {1 → 2}
}

// After rollover
{
    current_segment: 2,        // New active segment
    leader_node: 3,            // New leader for writes
    sealed_segments: {
        1 → 1_000_050          // Sealed with final count
    },
    segment_leaders: {
        1 → 2,                 // Original leader preserved for reads
        2 → 3                  // New segment leader
    }
}
Step 5: Lease Transfer (within 100ms)
Node 2 (old leader):
    └─► update_leases() removes "logs:1" from lease set
        Writes to "logs:1" → NotLeaderError
        Reads from "logs:1" → Still allowed (sealed data)

Node 3 (new leader):
    └─► update_leases() grants "logs:2" lease
        Writes to "logs:2" → Accepted
        Starts accepting new entries
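To make Step 5 concrete, here is a minimal sketch of lease reconciliation under the metadata model above. update_leases appears in the flow described here, but this body and the Topic type are assumptions, not Walrus's actual implementation:

use std::collections::HashSet;

// Simplified view of per-topic metadata (mirrors the TopicState fields
// used in this example; the real type carries more state).
struct Topic {
    name: String,
    current_segment: u64,
    leader_node: u64,
}

// A node holds write leases for exactly the active segments it leads.
// Re-deriving the set after a rollover drops "logs:1" on the old leader
// and grants "logs:2" on the new one.
fn update_leases(node_id: u64, topics: &[Topic]) -> HashSet<String> {
    topics
        .iter()
        .filter(|t| t.leader_node == node_id)
        .map(|t| format!("{}:{}", t.name, t.current_segment))
        .collect()
}

fn main() {
    let topics = vec![Topic { name: "logs".into(), current_segment: 2, leader_node: 3 }];
    assert!(update_leases(3, &topics).contains("logs:2")); // new leader gains lease
    assert!(update_leases(2, &topics).is_empty());         // old leader loses it
}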
Rollover Timing
Why round-robin leadership?
Round-robin selection ensures even load distribution across the cluster. Each node takes turns leading segments, preventing any single node from becoming a bottleneck. Example with 3 nodes:
Segment 1: Node 1
Segment 2: Node 2
Segment 3: Node 3
Segment 4: Node 1 (cycle repeats)
What happens to in-flight writes during rollover?
Writes that arrive during the 100ms lease sync window:
To old segment: Rejected with NotLeaderError once the lease is removed
To new segment: Accepted once the new leader gains the lease
Clients should retry on NotLeaderError (automatic in most clients)
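A sketch of that retry loop follows; the client trait, error variants, and refresh_metadata call are assumptions standing in for whatever client library you actually use:

use std::{thread, time::Duration};

// Hypothetical error and client types; Walrus's real client API may differ.
#[derive(Debug)]
enum PutError {
    NotLeader,
    Other(String),
}

trait WalrusClient {
    fn put(&mut self, topic: &str, payload: &[u8]) -> Result<(), PutError>;
    fn refresh_metadata(&mut self, topic: &str);
}

// Retry writes that race the ~100ms lease sync window: on NotLeader,
// refresh routing metadata, back off briefly, then try again.
fn put_with_retry<C: WalrusClient>(
    client: &mut C,
    topic: &str,
    payload: &[u8],
) -> Result<(), PutError> {
    for attempt in 0..5u32 {
        match client.put(topic, payload) {
            Err(PutError::NotLeader) => {
                client.refresh_metadata(topic);
                thread::sleep(Duration::from_millis(50 << attempt));
            }
            result => return result,
        }
    }
    Err(PutError::Other("retries exhausted".into()))
}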
Load Distribution
Automatic Balancing
Segment-based leadership rotation provides automatic load balancing:
Timeline: Topic "logs" with 3 nodes

T0: Create topic
    └─► Segment 1 → Node 1 (initial leader via hash)

T1: Segment 1 fills (1M entries)
    └─► Rollover: Segment 2 → Node 2

T2: Segment 2 fills (1M entries)
    └─► Rollover: Segment 3 → Node 3

T3: Segment 3 fills (1M entries)
    └─► Rollover: Segment 4 → Node 1 (cycle repeats)

Write Load Distribution:
    Node 1: Segments 1, 4, 7, 10, ... (~33%)
    Node 2: Segments 2, 5, 8, 11, ... (~33%)
    Node 3: Segments 3, 6, 9, 12, ... (~33%)
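Because rotation is strictly round-robin after the initial assignment, the segment-to-leader mapping is periodic. A sketch, assuming node IDs 1..=n and segment 1 starting on node 1 as in the timeline above (in reality the first leader comes from a consistent hash):

// Periodic segment → leader mapping under pure round-robin rotation.
fn segment_leader(segment_id: u64, n_nodes: u64) -> u64 {
    (segment_id - 1) % n_nodes + 1
}

fn main() {
    assert_eq!(segment_leader(1, 3), 1);
    assert_eq!(segment_leader(4, 3), 1); // cycle repeats
    assert_eq!(segment_leader(5, 3), 2);
}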
Read Distribution
Reads are distributed based on cursor position:
Topic "logs" with 3 segments:

    Segment 1: 1,000,000 entries (sealed, node 1)
    Segment 2: 950,000 entries (sealed, node 2)
    Segment 3: 500,000 entries (active, node 3)

Reader with cursor at segment 1, offset 5000:
    └─► Routes to node 1 (original leader of segment 1)

Reader with cursor at segment 2, offset 100:
    └─► Routes to node 2 (original leader of segment 2)

Reader with cursor at segment 3, offset 0:
    └─► Routes to node 3 (current leader)
Sealed segments remain on their original leader node for reads. This eliminates the need for data migration during rollover, improving performance and simplifying the architecture.
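The routing rule itself reduces to a lookup in the segment_leaders map from TopicState; a small sketch (route_read is an illustrative name, not a Walrus function):

use std::collections::HashMap;

// Pick the node that serves a read, based on the cursor's segment. Sealed
// segments resolve to their original leader; the active segment resolves
// to the current leader.
fn route_read(cursor_segment: u64, segment_leaders: &HashMap<u64, u64>) -> Option<u64> {
    segment_leaders.get(&cursor_segment).copied()
}

fn main() {
    // Matches the example above: segments 1..3 led by nodes 1..3.
    let leaders = HashMap::from([(1u64, 1u64), (2, 2), (3, 3)]);
    assert_eq!(route_read(1, &leaders), Some(1)); // sealed → node 1
    assert_eq!(route_read(3, &leaders), Some(3)); // active → node 3
}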
Read Cursors and Segment Advancement
Cursor Structure
ReadCursor {
    segment: u64,               // Current segment being read
    delivered_in_segment: u64   // Entries consumed in this segment
}
Cursor Advancement
When a reader consumes all entries in a sealed segment, the cursor automatically advances:
// Initial cursor (delivered is shorthand for delivered_in_segment)
cursor = { segment: 1, delivered: 0 }

// After reading 1,000,000 entries from segment 1
cursor = { segment: 1, delivered: 1_000_000 }

// Next read() call:
if cursor.delivered >= sealed_segments[1] {
    // Segment 1 exhausted, advance to segment 2
    cursor.segment = 2;
    cursor.delivered = 0;
}

// Now reading from segment 2
cursor = { segment: 2, delivered: 0 }
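The same rule as a runnable sketch (types and function name are illustrative); the loop also covers the case where several sealed segments have already been fully consumed:

use std::collections::HashMap;

struct ReadCursor {
    segment: u64,
    delivered_in_segment: u64,
}

// Advance past every sealed segment the reader has exhausted; stop at the
// first segment with entries remaining (or at the active segment, which
// has no entry in `sealed`).
fn advance_if_exhausted(cursor: &mut ReadCursor, sealed: &HashMap<u64, u64>) {
    while let Some(&count) = sealed.get(&cursor.segment) {
        if cursor.delivered_in_segment < count {
            break;
        }
        cursor.segment += 1;
        cursor.delivered_in_segment = 0;
    }
}

fn main() {
    let sealed = HashMap::from([(1u64, 1_000_000u64)]);
    let mut cursor = ReadCursor { segment: 1, delivered_in_segment: 1_000_000 };
    advance_if_exhausted(&mut cursor, &sealed);
    assert_eq!(cursor.segment, 2); // segment 1 exhausted → now at segment 2
}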
Read Flow Across Segments
┌────────────────────────────────────────────────────────────
│ Reader consuming topic "logs"
├────────────────────────────────────────────────────────────
│
│ Cursor: {segment: 1, delivered: 999_994}
│    │
│    ├─► read() → Entry 999,995 (from node 1)
│    ├─► read() → Entry 999,996 (from node 1)
│    ├─► read() → Entry 999,997 (from node 1)
│    ├─► read() → Entry 999,998 (from node 1)
│    ├─► read() → Entry 999,999 (from node 1)
│    ├─► read() → Entry 1,000,000 (from node 1)
│    │
│    ├─► Check: delivered (1,000,000) >= sealed_count(1)?
│    │       YES → Segment 1 exhausted
│    │
│    ├─► Advance cursor:
│    │       segment = 2
│    │       delivered = 0
│    │
│    ├─► read() → Entry 1 (from node 2, segment 2)
│    ├─► read() → Entry 2 (from node 2, segment 2)
│    └─► ...
│
└────────────────────────────────────────────────────────────
Configuration
Segment Size
Control when segments roll over:
# Default: 1,000,000 entries per segment
export WALRUS_MAX_SEGMENT_ENTRIES=1000000

# Smaller segments (more frequent rollovers)
export WALRUS_MAX_SEGMENT_ENTRIES=500000

# Larger segments (less frequent rollovers)
export WALRUS_MAX_SEGMENT_ENTRIES=5000000
Monitor Interval
Control how often rollovers are checked:
# Default: 10 seconds
export WALRUS_MONITOR_CHECK_MS=10000

# More frequent checks (faster rollover response)
export WALRUS_MONITOR_CHECK_MS=5000

# Less frequent checks (lower CPU overhead)
export WALRUS_MONITOR_CHECK_MS=30000
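If you consume these settings from Rust, a minimal parsing sketch with the documented defaults looks like this (only the variable names and defaults come from the docs; the helper functions are illustrative):

use std::env;

// WALRUS_MAX_SEGMENT_ENTRIES: entries per segment before rollover (default 1,000,000).
fn max_segment_entries() -> u64 {
    env::var("WALRUS_MAX_SEGMENT_ENTRIES")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(1_000_000)
}

// WALRUS_MONITOR_CHECK_MS: milliseconds between rollover checks (default 10,000).
fn monitor_check_ms() -> u64 {
    env::var("WALRUS_MONITOR_CHECK_MS")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(10_000)
}

fn main() {
    println!("segment cap = {}, monitor interval = {} ms", max_segment_entries(), monitor_check_ms());
}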
Best Practices
Segment Sizing
Choose segment size based on your workload:
High throughput: Larger segments (reduce rollover overhead)
Even load distribution: Smaller segments (more frequent rotation)
Default 1M entries ≈ 100 MB (at ~100-byte average payloads; scales with payload size)

Monitor Interval
Balance responsiveness against overhead:
10s default: Good balance for most workloads
5s: Faster rollover (slightly higher CPU)
30s: Lower overhead (acceptable for large segments)

Topic Creation
Topics are created automatically on first write:
Initial leader selected via consistent hash
Explicit creation: REGISTER <topic>
All nodes can create topics (routed via Raft)

Read Patterns
Optimize for your read patterns:
Sequential reads: Cursors automatically advance across segments
Historical reads: Access sealed segments from their original leader
No replication: Sealed data stays on its original node
Related pages:
Architecture Overview: Understand the overall system architecture and components
Raft Consensus: Learn how Raft coordinates segment metadata