This page describes how KubeBlocks deploys an etcd cluster on Kubernetes — covering the resource hierarchy, pod internals, Raft-based consensus, and traffic routing.
KubeBlocks models an etcd cluster as a hierarchy of Kubernetes custom resources:
Cluster → Component → InstanceSet → Pod × N
| Resource | Role |
|---|---|
| Cluster | User-facing declaration — specifies the number of etcd members, storage size, and resources |
| Component | Generated automatically; references a ComponentDefinition that describes container specs, lifecycle actions, and services |
| InstanceSet | KubeBlocks custom workload (replaces StatefulSet); manages pods with stable identities and role awareness |
| Pod | Actual running etcd member; each pod gets a unique ordinal, a stable DNS name, and its own PVC |
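Putting the hierarchy together, a Cluster manifest along these lines would request a three-member etcd cluster. This is a sketch only: the cluster name is hypothetical, and exact field names (componentDef vs. componentDefRef, etc.) vary by KubeBlocks version.

```yaml
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: mycluster            # hypothetical name
  namespace: default
spec:
  terminationPolicy: Delete
  componentSpecs:
    - name: etcd
      componentDef: etcd     # references the etcd ComponentDefinition
      replicas: 3            # must be odd to preserve quorum
      resources:
        requests:
          cpu: 500m
          memory: 512Mi
      volumeClaimTemplates:
        - name: data         # one PVC per pod for the etcd data directory
          spec:
            accessModes: [ReadWriteOnce]
            resources:
              requests:
                storage: 10Gi
```

The controller expands this declaration into the Component, InstanceSet, and pods described above.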
etcd requires an odd number of members (typically 3 or 5) to maintain a quorum. KubeBlocks enforces this constraint during cluster provisioning and scaling.
Every etcd pod runs a single main application container, plus an inject-bash init container that runs inject-bash.sh to make bash available on the container's tools path, as required by the lifecycle scripts:
| Container | Port | Purpose |
|---|---|---|
| etcd | 2379 (client), 2380 (peer) | etcd member serving client requests and participating in Raft consensus; exposes Prometheus metrics at /metrics on port 2379; roleProbe runs /scripts/roleprobe.sh inside this container |
Each pod mounts its own PVC for the etcd data directory (/var/run/etcd/default.etcd), ensuring member data survives pod restarts.
etcd achieves HA through the Raft consensus algorithm, which guarantees linearizable reads and writes across a majority of cluster members:
| Raft Concept | Description |
|---|---|
| Leader | Receives all client write requests; replicates log entries to followers before acknowledging |
| Follower | Replicates the leader's log; forwards client reads (or returns the leader's address) |
| Candidate | Temporarily assumes this role during a leader election after a heartbeat timeout |
| Quorum | A majority ((N/2) + 1) of members must acknowledge a log entry before it is committed |
A cluster of 3 members can tolerate 1 failure; a cluster of 5 members can tolerate 2 failures.
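The arithmetic behind these tolerances is simple enough to sketch directly:

```shell
# Quorum size and fault tolerance for an N-member etcd cluster.
quorum()    { echo $(( $1 / 2 + 1 )); }       # majority needed to commit
tolerance() { echo $(( ($1 - 1) / 2 )); }     # failures survivable

quorum 3      # 2 members must acknowledge each write
tolerance 3   # 1 failure tolerated
quorum 5      # 3
tolerance 5   # 2
```

Note that growing from 3 to 4 members raises the quorum to 3 without improving tolerance, which is why even-sized clusters are avoided.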
When the leader pod fails or becomes unreachable:

1. Followers stop receiving heartbeats; after the election timeout expires, a follower transitions to candidate.
2. The candidate increments the Raft term and requests votes from the other members.
3. The first candidate to collect votes from a quorum becomes the new leader and resumes log replication.
4. The roleProbe detects the new leader and updates the pod role labels.

Total election time is typically 1–3 seconds under normal network conditions.
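To watch an election settle, every member's status can be queried through any reachable endpoint (the cluster name and namespace here are hypothetical; this requires a live cluster, so it is shown as a usage fragment only):

```shell
# Show per-member status for the whole cluster; the IS LEADER column moves
# to the newly elected member once the election completes.
etcdctl \
  --endpoints=http://mycluster-etcd-0.mycluster-etcd-headless.default.svc.cluster.local:2379 \
  endpoint status --cluster -w table
```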
The ComponentDefinition declares two named ComponentServices — client and peer — both with disableAutoProvision: true, so neither is created automatically. The controller also provisions the default headless service for every component:
| Service | Type | Port | Notes |
|---|---|---|---|
| {cluster}-etcd-client | ClusterIP | 2379 (client) | All pods (no roleSelector); disableAutoProvision: true — not created by default, must be explicitly enabled |
| {cluster}-etcd-headless | Headless | 2379, 2380 | All pods; always created by the workload controller as the default headless service |
The headless service ({cluster}-etcd-headless) is the always-present endpoint used for peer communication and operator probes. The optional {cluster}-etcd-client ClusterIP service aggregates port 2379 across all pods; it has no roleSelector because etcd members transparently forward writes to the current leader, so any member is a valid client entry point. Enable it explicitly when a stable virtual IP is required.
KubeBlocks runs an exec roleProbe (/scripts/roleprobe.sh) inside the etcd container to detect the leader and update the kubeblocks.io/role pod label, which drives KubeBlocks' own update ordering (leader updated last during rolling updates).
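The source does not show the contents of /scripts/roleprobe.sh, but the idea behind such a probe can be sketched: a member is the leader exactly when its own member_id equals the leader id that `etcdctl endpoint status -w json` reports. The helper below is a hedged illustration, not the actual script.

```shell
#!/bin/bash
# Sketch of a role probe: derive "leader"/"follower" from etcdctl's JSON status.
# (Illustrative only; the real /scripts/roleprobe.sh may differ.)
role_from_status() {
  local json=$1
  local member leader
  member=$(printf '%s' "$json" | sed -n 's/.*"member_id":\([0-9]*\).*/\1/p')
  leader=$(printf '%s' "$json" | sed -n 's/.*"leader":\([0-9]*\).*/\1/p')
  if [ -n "$member" ] && [ "$member" = "$leader" ]; then
    echo leader
  else
    echo follower
  fi
}

# In a real probe this would be fed live output, e.g.:
#   role_from_status "$(etcdctl --endpoints=http://127.0.0.1:2379 endpoint status -w json)"
```

The printed role is what an operator would write into the kubeblocks.io/role pod label.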
Peer-to-peer Raft traffic (port 2380) flows over the headless service, where each member is addressed by its stable pod DNS name:
{pod-name}.{cluster}-etcd-headless.{namespace}.svc.cluster.local:2380
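As a quick illustration of the pattern above, each member's peer URL can be derived purely from the cluster name, namespace, and pod ordinal (the pod naming scheme {cluster}-etcd-{ordinal} is assumed here):

```shell
# Build the stable peer URL for one etcd pod.
# Assumes the InstanceSet names pods {cluster}-etcd-{ordinal}.
peer_url() {
  local cluster=$1 ns=$2 ordinal=$3
  echo "${cluster}-etcd-${ordinal}.${cluster}-etcd-headless.${ns}.svc.cluster.local:2380"
}

peer_url mycluster default 0
# mycluster-etcd-0.mycluster-etcd-headless.default.svc.cluster.local:2380
```

Because these names are stable across pod restarts, a recreated member rejoins the Raft group without any reconfiguration.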
When an etcd member fails, the cluster responds without any manual intervention:

1. The InstanceSet controller recreates the failed pod, which reattaches its existing PVC and rejoins under the same stable DNS name.
2. If the failed member was the leader, the surviving members hold a Raft election and choose a new leader.
3. The roleProbe detects the change, and the kubeblocks.io/role=leader label is applied to the new leader pod.
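The label transition can be observed directly with kubectl (the cluster name and the app.kubernetes.io/instance selector label are assumptions for this example; this is a usage fragment that requires a live cluster):

```shell
# Print each pod's kubeblocks.io/role label as a column and watch it
# change during failover.
kubectl get pods -n default \
  -l app.kubernetes.io/instance=mycluster \
  -L kubeblocks.io/role --watch
```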