etcd Architecture in KubeBlocks

This page describes how KubeBlocks deploys an etcd cluster on Kubernetes — covering the resource hierarchy, pod internals, Raft-based consensus, and traffic routing.

[Architecture diagram] Clients connect either through the optional {cluster}-etcd-client ClusterIP service (:2379, all pods, no roleSelector; disableAutoProvision: true, so not created by default) or, by default, through pod DNS via {cluster}-etcd-headless. Any pod is a valid entry point, since etcd forwards requests to the leader transparently. Three pods run across worker nodes: etcd-0 (leader) plus etcd-1 and etcd-2 (followers), each with a single etcd container serving :2379 (client + /metrics) and :2380 (peer), backed by its own 10Gi PVC (data-0, data-1, data-2). Raft consensus replicates the WAL from leader to followers, with quorum acknowledgment required to commit.

Resource Hierarchy

KubeBlocks models an etcd cluster as a hierarchy of Kubernetes custom resources:

Cluster  →  Component  →  InstanceSet  →  Pod × N
| Resource | Role |
|---|---|
| Cluster | User-facing declaration; specifies the number of etcd members, storage size, and resources |
| Component | Generated automatically; references a ComponentDefinition that describes container specs, lifecycle actions, and services |
| InstanceSet | KubeBlocks custom workload (replaces StatefulSet); manages pods with stable identities and role awareness |
| Pod | Actual running etcd member; each pod gets a unique ordinal, a stable DNS name, and its own PVC |

etcd requires an odd number of members (typically 3 or 5) to maintain a quorum. KubeBlocks enforces this constraint during cluster provisioning and scaling.
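The Cluster resource at the top of this hierarchy can be declared in a few lines of YAML. The sketch below is illustrative: the field names follow the KubeBlocks Cluster API, but the metadata name, resource sizes, and componentDef value are assumptions to adapt for your environment.

```yaml
# Illustrative Cluster declaration for a 3-member etcd cluster.
# Names and sizes are assumptions; field names follow the KubeBlocks Cluster API.
apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
  name: my-etcd                  # yields pods my-etcd-etcd-0/1/2
spec:
  terminationPolicy: Delete
  componentSpecs:
    - name: etcd
      componentDef: etcd         # references the etcd ComponentDefinition
      replicas: 3                # odd member count, enforced for quorum
      resources:
        requests: {cpu: 500m, memory: 512Mi}
        limits: {cpu: 500m, memory: 512Mi}
      volumeClaimTemplates:
        - name: data             # one PVC per member (data-0, data-1, data-2)
          spec:
            accessModes: [ReadWriteOnce]
            resources:
              requests:
                storage: 10Gi
```

KubeBlocks generates the Component, InstanceSet, and pods from this single declaration.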

Containers Inside Each Pod

Every etcd pod runs one main application container (plus an inject-bash init container that runs inject-bash.sh to make bash available in the container's tools path, required by lifecycle scripts):

| Container | Port | Purpose |
|---|---|---|
| etcd | 2379 (client), 2380 (peer) | etcd member serving client requests and participating in Raft consensus; exposes Prometheus metrics at /metrics on port 2379; roleProbe runs /scripts/roleprobe.sh inside this container |

Each pod mounts its own PVC for the etcd data directory (/var/run/etcd/default.etcd), ensuring member data survives pod restarts.
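The resulting pod template looks roughly like the fragment below. This is a sketch for orientation only; the exact fields are generated from the etcd ComponentDefinition and may differ by version.

```yaml
# Illustrative fragment of the generated pod spec for one etcd member.
containers:
  - name: etcd
    ports:
      - {name: client, containerPort: 2379}  # client API + Prometheus /metrics
      - {name: peer,   containerPort: 2380}  # Raft peer traffic
    volumeMounts:
      - name: data
        mountPath: /var/run/etcd             # data dir: /var/run/etcd/default.etcd
```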

High Availability via Raft Consensus

etcd achieves HA through the Raft consensus algorithm, which commits each write only after a majority of cluster members acknowledge it and, in etcd's default configuration, provides linearizable reads and writes:

| Raft Concept | Description |
|---|---|
| Leader | Receives all client write requests; replicates log entries to followers before acknowledging |
| Follower | Replicates the leader's log; forwards client reads (or returns the leader's address) |
| Candidate | Role a follower temporarily assumes during a leader election after a heartbeat timeout |
| Quorum | A majority (⌊N/2⌋ + 1) of members must acknowledge a log entry before it is committed |

A cluster of 3 members can tolerate 1 failure; a cluster of 5 members can tolerate 2 failures.
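The quorum arithmetic above is easy to verify directly; the helper names below are illustrative, not part of etcd:

```python
# Quorum and fault-tolerance arithmetic for a cluster of n members.
def quorum(n: int) -> int:
    """Majority needed to commit a log entry: floor(n/2) + 1."""
    return n // 2 + 1

def fault_tolerance(n: int) -> int:
    """How many members can fail while a quorum survives."""
    return n - quorum(n)

for n in (3, 4, 5):
    print(f"{n} members: quorum {quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Note that an even-sized cluster buys nothing: 4 members tolerate the same single failure as 3, while raising the quorum size. This is why KubeBlocks enforces odd member counts.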

Leader Election

When the leader pod fails or becomes unreachable:

  1. Followers detect the missing heartbeat after the election timeout (default 1000 ms)
  2. One follower increments its term and transitions to Candidate, requesting votes from peers
  3. The Candidate that collects a quorum of votes becomes the new leader
  4. The new leader immediately begins sending heartbeats and resumes write operations

Total election time is typically 1–3 seconds under normal network conditions.
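The election steps above can be sketched as a toy model. This is a simplification for intuition, not etcd's implementation: real Raft also compares log indexes before granting votes and randomizes timeouts to avoid split votes.

```python
# Toy model of one Raft election round (a sketch, not etcd's code).
from dataclasses import dataclass

@dataclass
class Member:
    name: str
    term: int = 1
    role: str = "follower"

def elect(candidate: Member, peers: list[Member]) -> bool:
    """Candidate increments its term and wins with a quorum of votes."""
    candidate.term += 1          # step 2: increment term, become candidate
    candidate.role = "candidate"
    votes = 1                    # votes for itself
    for p in peers:
        if p.term < candidate.term:  # peer has not voted in this term yet
            p.term = candidate.term
            votes += 1               # peer grants its vote
    cluster_size = len(peers) + 1
    if votes >= cluster_size // 2 + 1:   # step 3: quorum of votes
        candidate.role = "leader"        # step 4: start sending heartbeats
        return True
    candidate.role = "follower"
    return False

peers = [Member("etcd-1"), Member("etcd-2")]
won = elect(Member("etcd-0"), peers)
print("election won:", won)
```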

Traffic Routing

The ComponentDefinition declares two named ComponentServices — client and peer — both with disableAutoProvision: true, so neither is created automatically. The controller also provisions the default headless service for every component:

| Service | Type | Port | Notes |
|---|---|---|---|
| {cluster}-etcd-client | ClusterIP | 2379 (client) | All pods (no roleSelector); disableAutoProvision: true, so not created by default; must be explicitly enabled |
| {cluster}-etcd-headless | Headless | 2379, 2380 | All pods; always created by the workload controller as the default headless service |

The headless service ({cluster}-etcd-headless) is the always-present endpoint used for peer communication and operator probes. The optional {cluster}-etcd-client ClusterIP service aggregates port 2379 across all pods; it has no roleSelector because etcd members transparently forward writes to the current leader internally, so any member is a valid client entry point. Enable it explicitly when a stable virtual IP is required.
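One plausible way to enable the client service is through the Cluster's services list. The fragment below is a hypothetical sketch: the field names follow the KubeBlocks Cluster API as commonly documented, but the exact mechanism for enabling a declared ComponentService may differ by KubeBlocks version, so verify against your API reference.

```yaml
# Hypothetical sketch: enabling the optional client service on the Cluster.
spec:
  services:
    - name: client
      serviceName: client        # renders as {cluster}-etcd-client
      componentSelector: etcd
      spec:
        type: ClusterIP
        ports:
          - {name: client, port: 2379, targetPort: 2379}
```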

KubeBlocks runs an exec roleProbe (/scripts/roleprobe.sh) inside the etcd container to detect the leader and update the kubeblocks.io/role pod label, which drives KubeBlocks' own update ordering (leader updated last during rolling updates).
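The role-detection logic can be sketched as follows. The actual contents of /scripts/roleprobe.sh are not shown in this document; the sketch assumes the common approach of comparing a member's own ID against the leader ID reported by `etcdctl endpoint status -w json`.

```python
# Sketch of leader detection from `etcdctl endpoint status -w json` output.
# The real /scripts/roleprobe.sh may implement this differently.
import json

def role_from_status(status_json: str) -> str:
    """Return 'leader' if this member's ID matches the cluster's leader ID."""
    status = json.loads(status_json)[0]["Status"]
    member_id = status["header"]["member_id"]  # this member's ID
    leader_id = status["leader"]               # current leader's ID
    return "leader" if member_id == leader_id else "follower"
```

The returned string is what ends up in the kubeblocks.io/role pod label.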

Peer-to-peer Raft traffic (port 2380) flows over the headless service, where each member is addressed by its stable pod DNS name:

{pod-name}.{cluster}-etcd-headless.{namespace}.svc.cluster.local:2380

Automatic Failover

When an etcd member fails, the cluster responds without any manual intervention:

  1. Member becomes unreachable — remaining members detect the missing heartbeat
  2. Raft election — if the lost member was the leader, a new election completes in seconds
  3. Writes resume — the new leader processes client requests as long as quorum is maintained
  4. KubeBlocks detects role change — the exec roleProbe script returns the new leader role; pod labels are updated
  5. Pod label updated — the kubeblocks.io/role=leader label is applied to the new leader pod
  6. Member recovery — when the failed pod restarts, it rejoins the cluster and replays missed log entries from the leader
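The sequence above can be walked through with a toy model. This is a simplified illustration of the quorum and relabeling behavior, not etcd or KubeBlocks code; member names match the pods in this document.

```python
# Toy walk-through of the failover sequence (a model, not etcd/KubeBlocks code).
def has_quorum(alive: int, total: int) -> bool:
    return alive >= total // 2 + 1

members = {"etcd-0": "leader", "etcd-1": "follower", "etcd-2": "follower"}

# Steps 1-2: the leader pod fails; the survivors hold an election.
del members["etcd-0"]
assert has_quorum(len(members), 3)      # step 3: writes can resume
new_leader = sorted(members)[0]         # winner is arbitrary in this model
members[new_leader] = "leader"

# Steps 4-5: KubeBlocks relabels the new leader pod.
labels = {m: {"kubeblocks.io/role": r} for m, r in members.items()}

# Step 6: the failed pod rejoins as a follower and catches up from the leader.
members["etcd-0"] = "follower"
print(members)
```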

© 2026 KUBEBLOCKS INC