METRICS Command

The METRICS command returns Raft consensus metrics for the node handling the request. Use this to monitor cluster health, leader election, and replication status.

Syntax

METRICS

No parameters required.

Wire Format

Request:

[4 bytes: 7] METRICS

Success Response:

[4 bytes: length] OK <json_payload>

Note: The response is OK followed by a space and the JSON payload, not the bare JSON.

Error Response:

[4 bytes: length] ERR <error message>
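A minimal framing sketch in Rust using only the standard library. The document does not state the byte order of the 4-byte length prefix; this sketch assumes a big-endian u32 that counts only the payload bytes. The helpers (`encode_frame`, `read_frame`, `parse_response`, `request_metrics`) are hypothetical and are not part of the distributed_walrus crate.

```rust
use std::io::{self, Read, Write};

/// Frame a command as [4-byte length][payload].
/// ASSUMPTION: big-endian u32 length counting payload bytes only.
fn encode_frame(command: &str) -> Vec<u8> {
    let payload = command.as_bytes();
    let mut frame = Vec::with_capacity(4 + payload.len());
    frame.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    frame.extend_from_slice(payload);
    frame
}

/// Read one length-prefixed frame and return its payload as UTF-8 text.
fn read_frame<R: Read>(reader: &mut R) -> io::Result<String> {
    let mut len_buf = [0u8; 4];
    reader.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    let mut body = vec![0u8; len];
    reader.read_exact(&mut body)?;
    String::from_utf8(body).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Split a response body into the JSON payload ("OK <json>") or the error text ("ERR <msg>").
fn parse_response(body: &str) -> Result<&str, String> {
    if let Some(json) = body.strip_prefix("OK ") {
        Ok(json)
    } else if let Some(msg) = body.strip_prefix("ERR ") {
        Err(msg.to_string())
    } else {
        Err(format!("unexpected response: {body}"))
    }
}

/// Send METRICS over any bidirectional stream (e.g. a TcpStream) and return the JSON text.
fn request_metrics<S: Read + Write>(stream: &mut S) -> io::Result<String> {
    stream.write_all(&encode_frame("METRICS"))?;
    let body = read_frame(stream)?;
    parse_response(&body)
        .map(str::to_string)
        .map_err(|msg| io::Error::new(io::ErrorKind::Other, msg))
}
```

Against a live node, `request_metrics` could be called on a `std::net::TcpStream` connected to the client port, provided the byte-order assumption above matches the server.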

Response Format

The JSON response contains Raft metrics from the Octopii consensus engine:

{
  "node_id": 1,
  "state": "Leader",
  "current_term": 5,
  "commit_index": 142,
  "last_applied": 142,
  "leader_id": 1,
  "voted_for": null,
  "log_length": 143,
  "cluster_size": 3,
  "peers": [
    {
      "node_id": 2,
      "match_index": 142,
      "next_index": 143
    },
    {
      "node_id": 3,
      "match_index": 142,
      "next_index": 143
    }
  ]
}

Response Fields

node_id

integer

The ID of the node that generated these metrics

state

string

Current Raft state: Leader, Follower, or Candidate

current_term

integer

Current election term number (increases each time an election starts, whether or not it produces a leader)

commit_index

integer

Index of the highest log entry known to be committed (replicated to a quorum)

last_applied

integer

Index of the highest log entry applied to the metadata state machine

leader_id

integer or null

Node ID of the current Raft leader, or null if unknown

voted_for

integer or null

Node ID this node voted for in the current term, or null if no vote cast

log_length

integer

Total number of entries in the Raft log

cluster_size

integer

Number of nodes in the Raft cluster

peers

array

Replication status for each peer node. Present only when the reporting node is the leader; each entry has the fields below.

peers[].node_id

integer

Peer node ID

peers[].match_index

integer

Index of the highest log entry known to be replicated on this peer

peers[].next_index

integer

Index of the next log entry to send to this peer
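To make the nullable and optional fields explicit, here is one way to model the payload as Rust types. The type and method names are hypothetical: `leader_id` and `voted_for` become `Option<u64>` because they may be null, and `peers` is an empty `Vec` when the field is absent on non-leaders. Parsing the JSON itself (for example with serde) is left out to keep the sketch dependency-free.

```rust
use std::str::FromStr;

/// Raft role reported in the `state` field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RaftState {
    Leader,
    Follower,
    Candidate,
}

impl FromStr for RaftState {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Leader" => Ok(RaftState::Leader),
            "Follower" => Ok(RaftState::Follower),
            "Candidate" => Ok(RaftState::Candidate),
            other => Err(format!("unknown Raft state: {other}")),
        }
    }
}

/// One entry of the `peers` array.
#[derive(Debug, Clone, PartialEq, Eq)]
struct PeerStatus {
    node_id: u64,
    match_index: u64,
    next_index: u64,
}

/// Typed view of the METRICS payload (hypothetical names).
#[derive(Debug, Clone, PartialEq, Eq)]
struct RaftMetrics {
    node_id: u64,
    state: RaftState,
    current_term: u64,
    commit_index: u64,
    last_applied: u64,
    leader_id: Option<u64>, // JSON null -> None
    voted_for: Option<u64>, // JSON null -> None
    log_length: u64,
    cluster_size: u64,
    peers: Vec<PeerStatus>, // empty when `peers` is absent (non-leaders)
}

impl RaftMetrics {
    /// True when this node reports itself as the leader.
    fn is_leader(&self) -> bool {
        self.state == RaftState::Leader && self.leader_id == Some(self.node_id)
    }

    /// `peers` should only be populated on the leader.
    fn peers_consistent_with_state(&self) -> bool {
        self.peers.is_empty() || self.state == RaftState::Leader
    }
}
```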

Examples

Interactive Shell

🦭 > METRICS
{
  "node_id": 1,
  "state": "Leader",
  "current_term": 5,
  "commit_index": 142,
  "last_applied": 142,
  "leader_id": 1,
  "voted_for": null,
  "log_length": 143,
  "cluster_size": 3,
  "peers": [
    {"node_id": 2, "match_index": 142, "next_index": 143},
    {"node_id": 3, "match_index": 142, "next_index": 143}
  ]
}

One-off Command

# Get metrics
cargo run --bin walrus-cli -- metrics

# Pretty-print with jq
cargo run --bin walrus-cli -- metrics | jq .

# Extract specific fields (-r prints strings without JSON quotes)
cargo run --bin walrus-cli -- metrics | jq -r '.state'
cargo run --bin walrus-cli -- metrics | jq '.leader_id'

Programmatic Usage (Rust)

use distributed_walrus::cli_client::CliClient;
use serde_json::Value;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = CliClient::new("127.0.0.1:9091");

    // Get metrics as a JSON string
    let metrics_json = client.metrics().await?;

    // Parse the JSON payload
    let metrics: Value = serde_json::from_str(&metrics_json)?;

    // as_str() prints the state without surrounding JSON quotes
    println!("Node state: {}", metrics["state"].as_str().unwrap_or("unknown"));
    println!("Leader ID: {}", metrics["leader_id"]);
    println!("Commit index: {}", metrics["commit_index"]);

    Ok(())
}

Use Cases

Check Cluster Health

# Verify all nodes see the same leader
for port in 9091 9092 9093; do
    echo "Node on port $port:"
    cargo run --bin walrus-cli -- --addr 127.0.0.1:$port metrics | jq '{node_id, state, leader_id}'
done
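The loop above relies on reading the jq output by eye. The same check can be written as a small function that takes each node's `(node_id, leader_id)` pair, however it was collected, and returns either the agreed leader or the first disagreement. `NodeView` and `agreed_leader` are hypothetical names.

```rust
/// One node's METRICS answer, reduced to what this check needs.
struct NodeView {
    node_id: u64,
    leader_id: Option<u64>,
}

/// Returns the leader every node agrees on, or a description of the first problem found.
fn agreed_leader(views: &[NodeView]) -> Result<u64, String> {
    let first = views.first().ok_or("no nodes were queried")?;
    let leader = first
        .leader_id
        .ok_or_else(|| format!("node {} reports no leader", first.node_id))?;
    for v in views {
        match v.leader_id {
            Some(id) if id == leader => {}
            Some(id) => {
                return Err(format!(
                    "node {} sees leader {}, but node {} sees leader {}",
                    v.node_id, id, first.node_id, leader
                ));
            }
            None => return Err(format!("node {} reports no leader", v.node_id)),
        }
    }
    Ok(leader)
}
```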

Monitor Replication Lag

# Check if followers are caught up (match_index should equal commit_index)
cargo run --bin walrus-cli -- metrics | jq '.peers[] | "Node \(.node_id): match=\(.match_index) next=\(.next_index)"'
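Replication lag for each peer is the distance between the leader's `commit_index` and that peer's `match_index`. A sketch of that arithmetic with hypothetical names; `saturating_sub` reports 0 when a peer's `match_index` is already past `commit_index`, which can happen when the peer holds entries that are replicated but not yet committed.

```rust
/// The per-peer fields this calculation needs.
struct Peer {
    node_id: u64,
    match_index: u64,
}

/// Returns (node_id, entries behind commit_index) for each peer.
fn replication_lag(commit_index: u64, peers: &[Peer]) -> Vec<(u64, u64)> {
    peers
        .iter()
        .map(|p| (p.node_id, commit_index.saturating_sub(p.match_index)))
        .collect()
}
```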

Detect Leader Elections

# Watch for term changes (indicates elections)
watch -n 1 'cargo run --bin walrus-cli -- metrics | jq .current_term'
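`watch` only shows the latest value. To count elections across a series of samples, track the last term seen: each increase of `current_term` means at least one new election was started. `TermWatcher` is a hypothetical helper, and the polling of METRICS itself is not shown.

```rust
/// Tracks `current_term` samples from one node.
struct TermWatcher {
    last_term: Option<u64>,
    /// Sum of term increases seen; each new term begins with an election attempt.
    term_increments: u64,
}

impl TermWatcher {
    fn new() -> Self {
        TermWatcher { last_term: None, term_increments: 0 }
    }

    /// Records a sample and returns true if the term advanced since the previous one.
    fn observe(&mut self, term: u64) -> bool {
        let advanced = match self.last_term {
            Some(prev) if term > prev => {
                self.term_increments += term - prev;
                true
            }
            _ => false,
        };
        self.last_term = Some(term);
        advanced
    }
}
```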

Verify Quorum

# Check cluster size (a majority must agree, so 3 nodes tolerate 1 failure and 5 tolerate 2)
cargo run --bin walrus-cli -- metrics | jq .cluster_size
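Why `cluster_size` matters: Raft needs a majority of the voting nodes to elect a leader or commit an entry. The helper names below are hypothetical; the arithmetic is the standard majority rule.

```rust
/// Nodes that must agree (votes or replicas) in a cluster of `n` voting nodes.
fn quorum_size(n: u64) -> u64 {
    n / 2 + 1
}

/// Node failures the cluster can survive while still reaching a quorum.
fn fault_tolerance(n: u64) -> u64 {
    n.saturating_sub(1) / 2
}
```

For example, 3 nodes need 2 for a quorum and survive 1 failure; 4 nodes need 3 and still survive only 1, which is why odd cluster sizes are the usual choice.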

Understanding Raft Metrics

Node States

Leader

At most one leader per term
Handles all metadata writes (topic creation, rollover)
Replicates log entries to followers
Has peers array with replication status

Follower

In a stable cluster, every node except the leader is a follower
Accept log entries replicated by the leader
Become a candidate if no heartbeat arrives from the leader before the election timeout
No peers array in metrics

Candidate

Temporary state during leader election
Node is requesting votes from peers
Quickly transitions to leader or follower
Rare to observe (election is fast)

Indexes Explained

commit_index

Highest entry replicated to a quorum
Safe to apply to state machine
Increases as leader replicates entries

last_applied

Highest entry actually applied to metadata
Should match or be slightly behind commit_index
A gap means committed entries are still queued for the apply loop

match_index (per peer)

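Highest log index the leader knows is replicated on that follower
The leader advances commit_index to the highest index stored on a majority of nodes (itself included)
A value well behind commit_index indicates a slow or disconnected follower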

next_index (per peer)

Next entry to send to follower
Usually match_index + 1
Decremented when a follower rejects AppendEntries because its log does not match
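These index relationships can be checked mechanically on a leader's snapshot. The sketch below flags combinations that standard Raft should not produce: `last_applied` ahead of `commit_index`, or a peer whose `next_index` is not past its `match_index`. Names are hypothetical. An implementation that pipelines AppendEntries may legitimately keep `next_index` well above `match_index + 1`, so that case is deliberately not flagged.

```rust
/// Per-peer indexes from the `peers` array.
struct PeerIndexes {
    node_id: u64,
    match_index: u64,
    next_index: u64,
}

/// Returns a description of every index relationship that looks wrong.
fn index_anomalies(commit_index: u64, last_applied: u64, peers: &[PeerIndexes]) -> Vec<String> {
    let mut found = Vec::new();
    // Entries can only be applied after they are committed.
    if last_applied > commit_index {
        found.push(format!(
            "last_applied {last_applied} is ahead of commit_index {commit_index}"
        ));
    }
    // The leader sends a follower only entries it does not already have.
    for p in peers {
        if p.next_index <= p.match_index {
            found.push(format!(
                "peer {}: next_index {} is not past match_index {}",
                p.node_id, p.next_index, p.match_index
            ));
        }
    }
    found
}

/// Committed entries still waiting for the apply loop.
fn apply_backlog(commit_index: u64, last_applied: u64) -> u64 {
    commit_index.saturating_sub(last_applied)
}
```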

Cluster Health Indicators

Healthy Cluster

{
  "state": "Leader",
  "commit_index": 100,
  "last_applied": 100,
  "peers": [
    {"node_id": 2, "match_index": 100, "next_index": 101},
    {"node_id": 3, "match_index": 100, "next_index": 101}
  ]
}

All peers caught up (match_index == commit_index)
last_applied == commit_index
Clear leader elected

Replication Lag

{
  "state": "Leader",
  "commit_index": 100,
  "peers": [
    {"node_id": 2, "match_index": 100, "next_index": 101},
    {"node_id": 3, "match_index": 85, "next_index": 86}
  ]
}

Node 3 is lagging (match_index 85 vs commit_index 100)
May indicate network issues or slow node
Leader will keep retrying replication

No Leader

{
  "state": "Follower",
  "leader_id": null,
  "current_term": 5
}

No leader is known, usually because an election is in progress
No metadata writes can be committed until a leader is elected
Check for network partitions
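The three examples above map onto a simple classification. This is a sketch with hypothetical names; it takes `peers` as `(node_id, match_index)` pairs and treats any peer below `commit_index` as lagging. Followers have no `peers` array, so a follower that knows its leader classifies as healthy from its own point of view only.

```rust
/// Coarse cluster health derived from one node's METRICS.
#[derive(Debug, PartialEq, Eq)]
enum Health {
    Healthy,
    /// Node IDs whose match_index is behind commit_index.
    Lagging(Vec<u64>),
    NoLeader,
}

fn classify(leader_id: Option<u64>, commit_index: u64, peers: &[(u64, u64)]) -> Health {
    // leader_id == null: an election is probably in progress.
    if leader_id.is_none() {
        return Health::NoLeader;
    }
    let lagging: Vec<u64> = peers
        .iter()
        .filter(|&&(_, match_index)| match_index < commit_index)
        .map(|&(node_id, _)| node_id)
        .collect();
    if lagging.is_empty() {
        Health::Healthy
    } else {
        Health::Lagging(lagging)
    }
}
```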

Split Brain (Should Not Happen)

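# Node 1 thinks it's leader
{"state": "Leader", "leader_id": 1, "current_term": 5}

# Node 2 also thinks it's leader in the same term
{"state": "Leader", "leader_id": 2, "current_term": 5}

Raft guarantees at most one leader per term
A partitioned former leader may briefly report Leader with an older current_term until it hears from the new leader; that is expected
Two nodes reporting Leader with the same current_term indicates a serious bug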
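Because Raft's invariant is one leader per term, a cross-node check should group Leader reports by `current_term`. A minimal sketch, assuming `(node_id, state, current_term)` has been collected from every node; `same_term_leaders` is a hypothetical name. Leaders in different terms are not reported, since a partitioned former leader can keep reporting Leader with an older term until it hears from the new one.

```rust
use std::collections::HashMap;

/// Returns (term, node_ids) for every term in which more than one node reports Leader.
fn same_term_leaders(reports: &[(u64, &str, u64)]) -> Vec<(u64, Vec<u64>)> {
    let mut leaders_by_term: HashMap<u64, Vec<u64>> = HashMap::new();
    for &(node_id, state, term) in reports {
        if state == "Leader" {
            leaders_by_term.entry(term).or_default().push(node_id);
        }
    }
    let mut conflicts: Vec<(u64, Vec<u64>)> = leaders_by_term
        .into_iter()
        .filter(|(_, ids)| ids.len() > 1)
        .collect();
    conflicts.sort(); // deterministic order by term
    conflicts
}
```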

Monitoring and Alerting

Critical Alerts

No leader for > 30 seconds
Cluster size mismatch across nodes
Replication lag > 1000 entries
Frequent term changes (election storm)

Warning Alerts

last_applied behind commit_index by > 100
Peer match_index lagging by > 500 entries
State is Candidate for > 5 seconds
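The thresholds listed above can be encoded directly. This sketch assumes a poller (hypothetical, not shown) has already turned successive METRICS samples into durations and gaps; `AlertInput`, `Severity`, and `evaluate` are hypothetical names. The election-storm rule is omitted because the list gives no threshold for it.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Severity {
    Warning,
    Critical,
}

/// Values a poller would derive from successive METRICS samples.
struct AlertInput {
    no_leader_for: Duration,   // how long leader_id has been null
    candidate_for: Duration,   // how long state has been Candidate
    max_peer_lag: u64,         // largest commit_index - match_index over peers
    apply_gap: u64,            // commit_index - last_applied
    cluster_sizes_agree: bool, // every node reports the same cluster_size
}

fn evaluate(i: &AlertInput) -> Vec<(Severity, &'static str)> {
    let mut alerts = Vec::new();
    if i.no_leader_for > Duration::from_secs(30) {
        alerts.push((Severity::Critical, "no leader for > 30 seconds"));
    }
    if !i.cluster_sizes_agree {
        alerts.push((Severity::Critical, "cluster size mismatch across nodes"));
    }
    if i.max_peer_lag > 1000 {
        alerts.push((Severity::Critical, "replication lag > 1000 entries"));
    } else if i.max_peer_lag > 500 {
        // Warning only when the critical lag threshold is not already exceeded
        alerts.push((Severity::Warning, "peer match_index lagging by > 500 entries"));
    }
    if i.apply_gap > 100 {
        alerts.push((Severity::Warning, "last_applied behind commit_index by > 100"));
    }
    if i.candidate_for > Duration::from_secs(5) {
        alerts.push((Severity::Warning, "state is Candidate for > 5 seconds"));
    }
    alerts
}
```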

Dashboards

Key metrics to graph:

current_term (leader elections)
commit_index rate (metadata write throughput)
match_index per peer (replication health)
State transitions (Leader/Follower/Candidate)
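Graphing commit_index as a rate gives metadata write throughput. A minimal sketch of the calculation between two samples; `commit_rate` is a hypothetical name. It returns `None` when no time has elapsed or when commit_index went down, which may happen if the samples come from different nodes or, depending on the implementation, after a node restart.

```rust
use std::time::Duration;

/// Entries committed per second between two samples taken `elapsed` apart.
fn commit_rate(prev_commit: u64, cur_commit: u64, elapsed: Duration) -> Option<f64> {
    let secs = elapsed.as_secs_f64();
    if secs <= 0.0 || cur_commit < prev_commit {
        return None;
    }
    Some((cur_commit - prev_commit) as f64 / secs)
}
```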

Metadata vs. Data

Important: METRICS shows metadata consensus only:

Topic registrations
Segment rollovers
Leader assignments
Node membership

It does not show:

Data write throughput
Entry counts per topic (use STATE)
Storage usage
Client connection counts

Raft is only used for metadata coordination, not data replication.

Related Commands

STATE - View topic-specific metadata and entry counts
REGISTER - Operations that go through Raft consensus
