This page describes how KubeBlocks deploys an etcd cluster on Kubernetes — covering the resource hierarchy, pod internals, Raft-based consensus, and traffic routing.
KubeBlocks models an etcd cluster as a hierarchy of Kubernetes custom resources:
Cluster → Component → InstanceSet → Pod × N
| Resource | Role |
|---|---|
| Cluster | User-facing declaration — specifies the number of etcd members, storage size, and resources |
| Component | Generated automatically; references a ComponentDefinition that describes container specs, lifecycle actions, and services |
| InstanceSet | KubeBlocks custom workload (replaces StatefulSet); manages pods with stable identities and role awareness |
| Pod | Actual running etcd member; each pod gets a unique ordinal, a stable DNS name, and its own PVC |
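Putting the hierarchy together, a Cluster manifest along these lines would request a three-member etcd cluster. This is a sketch only: the cluster name is hypothetical, and exact field names (componentDef vs. componentDefRef, etc.) vary by KubeBlocks version.

```yaml
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: mycluster            # hypothetical name
  namespace: default
spec:
  terminationPolicy: Delete
  componentSpecs:
    - name: etcd
      componentDef: etcd     # references the etcd ComponentDefinition
      replicas: 3            # must be odd to preserve quorum
      resources:
        requests:
          cpu: 500m
          memory: 512Mi
      volumeClaimTemplates:
        - name: data         # one PVC per pod for the etcd data directory
          spec:
            accessModes: [ReadWriteOnce]
            resources:
              requests:
                storage: 10Gi
```

The controller expands this declaration into the Component, InstanceSet, and pods described above.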
etcd requires an odd number of members (typically 3 or 5) to maintain a quorum. KubeBlocks enforces this constraint during cluster provisioning and scaling.
Every etcd pod runs a single main application container, plus an inject-bash init container that runs inject-bash.sh to make bash available on the container's tools path, as required by the lifecycle scripts:
| Container | Port | Purpose |
|---|---|---|
| etcd | 2379 (client), 2380 (peer) | etcd member serving client requests and participating in Raft consensus; exposes Prometheus metrics at /metrics on port 2379; roleProbe runs /scripts/roleprobe.sh inside this container |
Each pod mounts its own PVC for the etcd data directory (/var/run/etcd/default.etcd), ensuring member data survives pod restarts.
etcd achieves HA through the Raft consensus algorithm, which guarantees linearizable reads and writes across a majority of cluster members:
| Raft Concept | Description |
|---|---|
| Leader | Receives all client write requests; replicates log entries to followers before acknowledging |
| Follower | Replicates the leader's log; forwards client reads (or returns the leader's address) |
| Candidate | Temporarily assumes this role during a leader election after a heartbeat timeout |
| Quorum | A majority ((N/2) + 1) of members must acknowledge a log entry before it is committed |
A cluster of 3 members can tolerate 1 failure; a cluster of 5 members can tolerate 2 failures.
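The arithmetic behind these tolerances is simple enough to sketch directly:

```shell
# Quorum size and fault tolerance for an N-member etcd cluster.
quorum()    { echo $(( $1 / 2 + 1 )); }       # majority needed to commit
tolerance() { echo $(( ($1 - 1) / 2 )); }     # failures survivable

quorum 3      # 2 members must acknowledge each write
tolerance 3   # 1 failure tolerated
quorum 5      # 3
tolerance 5   # 2
```

Note that growing from 3 to 4 members raises the quorum to 3 without improving tolerance, which is why even-sized clusters are avoided.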
When the leader pod fails or becomes unreachable:

1. Followers stop receiving heartbeats; after the election timeout expires, a follower transitions to candidate.
2. The candidate increments the Raft term and requests votes from the other members.
3. The first candidate to collect votes from a quorum becomes the new leader and resumes log replication.
4. The roleProbe detects the new leader and updates the pod role labels.

Total election time is typically 1–3 seconds under normal network conditions.
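To watch an election settle, every member's status can be queried through any reachable endpoint (the cluster name and namespace here are hypothetical; this requires a live cluster, so it is shown as a usage fragment only):

```shell
# Show per-member status for the whole cluster; the IS LEADER column moves
# to the newly elected member once the election completes.
etcdctl \
  --endpoints=http://mycluster-etcd-0.mycluster-etcd-headless.default.svc.cluster.local:2379 \
  endpoint status --cluster -w table
```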
The ComponentDefinition declares two named ComponentServices — client and peer — both with disableAutoProvision: true, so neither is created automatically. The controller also provisions the default headless service for every component:
| Service | Type | Port | Notes |
|---|---|---|---|
| {cluster}-etcd-client | ClusterIP | 2379 (client) | All pods (no roleSelector); disableAutoProvision: true — not created by default, must be explicitly enabled |
| {cluster}-etcd-headless | Headless | 2379, 2380 | All pods; always created by the workload controller as the default headless service |
The headless service ({cluster}-etcd-headless) is the always-present endpoint used for peer communication and operator probes. The optional {cluster}-etcd-client ClusterIP service aggregates port 2379 across all pods; it has no roleSelector because etcd members transparently forward writes to the current leader, so any member is a valid client entry point. Enable it explicitly when a stable virtual IP is required.
KubeBlocks runs an exec roleProbe (/scripts/roleprobe.sh) inside the etcd container to detect the leader and update the kubeblocks.io/role pod label, which drives KubeBlocks' own update ordering (leader updated last during rolling updates).
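The source does not show the contents of /scripts/roleprobe.sh, but the idea behind such a probe can be sketched: a member is the leader exactly when its own member_id equals the leader id that `etcdctl endpoint status -w json` reports. The helper below is a hedged illustration, not the actual script.

```shell
#!/bin/bash
# Sketch of a role probe: derive "leader"/"follower" from etcdctl's JSON status.
# (Illustrative only; the real /scripts/roleprobe.sh may differ.)
role_from_status() {
  local json=$1
  local member leader
  member=$(printf '%s' "$json" | sed -n 's/.*"member_id":\([0-9]*\).*/\1/p')
  leader=$(printf '%s' "$json" | sed -n 's/.*"leader":\([0-9]*\).*/\1/p')
  if [ -n "$member" ] && [ "$member" = "$leader" ]; then
    echo leader
  else
    echo follower
  fi
}

# In a real probe this would be fed live output, e.g.:
#   role_from_status "$(etcdctl --endpoints=http://127.0.0.1:2379 endpoint status -w json)"
```

The printed role is what an operator would write into the kubeblocks.io/role pod label.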
Peer-to-peer Raft traffic (port 2380) flows over the headless service, where each member is addressed by its stable pod DNS name:
{pod-name}.{cluster}-etcd-headless.{namespace}.svc.cluster.local:2380
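As a quick illustration of the pattern above, each member's peer URL can be derived purely from the cluster name, namespace, and pod ordinal (the pod naming scheme {cluster}-etcd-{ordinal} is assumed here):

```shell
# Build the stable peer URL for one etcd pod.
# Assumes the InstanceSet names pods {cluster}-etcd-{ordinal}.
peer_url() {
  local cluster=$1 ns=$2 ordinal=$3
  echo "${cluster}-etcd-${ordinal}.${cluster}-etcd-headless.${ns}.svc.cluster.local:2380"
}

peer_url mycluster default 0
# mycluster-etcd-0.mycluster-etcd-headless.default.svc.cluster.local:2380
```

Because these names are stable across pod restarts, a recreated member rejoins the Raft group without any reconfiguration.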
When an etcd member fails, the cluster responds without any manual intervention:

1. The InstanceSet controller recreates the failed pod, which reattaches its existing PVC and rejoins under the same stable DNS name.
2. If the failed member was the leader, the surviving members hold a Raft election and choose a new leader.
3. The roleProbe detects the change, and the kubeblocks.io/role=leader label is applied to the new leader pod.
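The label transition can be observed directly with kubectl (the cluster name and the app.kubernetes.io/instance selector label are assumptions for this example; this is a usage fragment that requires a live cluster):

```shell
# Print each pod's kubeblocks.io/role label as a column and watch it
# change during failover.
kubectl get pods -n default \
  -l app.kubernetes.io/instance=mycluster \
  -L kubeblocks.io/role --watch
```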