
  1. Replica Set Architecture
    1. Resource Hierarchy
    2. Containers Inside Each Pod
    3. High Availability
    4. Automatic Failover
    5. Traffic Routing
  2. Sharding Architecture
    1. Resource Hierarchy
    2. Component Details
    3. How Query Routing Works
    4. Traffic Routing
    5. Automatic Failover
  3. System Accounts

MongoDB Architecture in KubeBlocks

KubeBlocks supports two distinct MongoDB deployment architectures:

| Architecture | Topology | Use Case |
|---|---|---|
| Replica Set | Primary + secondaries, oplog replication | Single-dataset HA; datasets that fit on one node |
| Sharding | Mongos routers + config servers + data shards | Horizontal scaling; datasets too large for a single replica set; high write throughput |

Replica Set Architecture

A MongoDB replica set maintains multiple copies of the same dataset across pods. One pod acts as the primary (accepts all writes); the others are secondaries that replicate the primary's oplog and can serve reads.

[Architecture diagram] Applications send read/write traffic to mongo-cluster-mongodb-mongodb:27017 (roleSelector: primary) and read-only traffic to mongo-cluster-mongodb-mongodb-ro:27017 (roleSelector: secondary). The read/write service selects kubeblocks.io/role=primary, and its endpoints switch automatically when the primary changes; the read-only service selects kubeblocks.io/role=secondary and distributes reads across replicas. Behind the services run three pods (mongodb-0 PRIMARY; mongodb-1 and mongodb-2 SECONDARY), each with a mongodb container (mongod, :27017), a mongodb-exporter container (:9216 metrics), an init-syncer init container (copies syncerctl to /tools), and its own 20Gi PVC. The replica set replicates the oplog from primary to secondaries with w:majority write concern. A headless service provides stable pod DNS for internal use (replication, HA heartbeats, operator probes); it is not a client endpoint.

Resource Hierarchy

Cluster  →  Component  →  InstanceSet  →  Pod × N
| Resource | Role |
|---|---|
| Cluster | User-facing declaration — specifies topology, replica count, storage size, and resources |
| Component | Generated automatically; references a ComponentDefinition describing container specs, lifecycle actions, and services |
| InstanceSet | KubeBlocks custom workload (replaces StatefulSet); manages pods with stable identities and role awareness |
| Pod | Actual running MongoDB instance; each pod gets a unique ordinal and its own PVC |
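The whole hierarchy is driven from a single Cluster object. A minimal sketch of a replica set Cluster is shown below; the API group/version, the `clusterDef` value, and the `replicaset` topology name are assumptions based on recent KubeBlocks releases and the installed MongoDB addon, so check your addon's documentation before applying anything like it:

```yaml
# Illustrative sketch of a replica set Cluster declaration (field names assumed)
apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
  name: mongo-cluster
spec:
  clusterDef: mongodb          # provided by the MongoDB addon
  topology: replicaset         # addon-defined topology name (assumed)
  componentSpecs:
    - name: mongodb
      replicas: 3              # one primary + two secondaries
      resources:
        limits: { cpu: "1", memory: 1Gi }
      volumeClaimTemplates:
        - name: data           # becomes the per-pod PVCs data-0/1/2
          spec:
            accessModes: [ReadWriteOnce]
            resources:
              requests:
                storage: 20Gi
```

From this one declaration, KubeBlocks generates the Component, the InstanceSet, and the three role-labeled pods.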

Containers Inside Each Pod

Each replica set pod runs three containers (plus three init containers on startup: init-syncer copies /bin/syncer and /bin/syncerctl to /tools; init-kubectl copies the kubectl binary to dataMountPath/tmp/bin; init-pbm-agent copies pbm, pbm-agent, and pbm-agent-entrypoint to /tools):

| Container | Port | Purpose |
|---|---|---|
| mongodb | 27017, 3601 (HA replication) | MongoDB database engine; participates in replica set replication and election; roleProbe runs /tools/syncerctl getrole inside this container |
| mongodb-backup-agent | — | Percona Backup for MongoDB (PBM) agent; coordinates cluster-wide consistent backups |
| exporter | 9216 | Prometheus metrics exporter |

Each pod mounts its own PVC for the MongoDB data directory (default /data/mongodb, set by dataMountPath in chart values).

High Availability

MongoDB replica sets use oplog-based replication and a majority-vote (Raft-like) election protocol:

| Concept | Description |
|---|---|
| Primary | Receives all write operations; records changes to the oplog |
| Secondary | Replicates the primary's oplog; can serve reads when readPreference is configured |
| Election | When the primary fails, secondaries vote; the candidate with the most up-to-date oplog and a majority of votes wins |
| Write concern | w:majority ensures a write is durable on a quorum before acknowledging |

A 3-member replica set tolerates the failure of one member: the remaining two still form a voting majority, so a new primary can be elected and writes continue.

Automatic Failover

  1. Primary pod crashes or becomes unreachable — secondaries stop receiving heartbeat pings
  2. Election timeout — after approximately 10 seconds (electionTimeoutMillis), one secondary calls for an election
  3. Majority vote — the candidate with the most up-to-date oplog and a majority of votes wins and becomes the new primary
  4. KubeBlocks roleProbe detects the change — syncerctl getrole returns primary for the new pod → kubeblocks.io/role=primary label is applied
  5. Service endpoints switch — the {cluster}-mongodb-mongodb ClusterIP service automatically routes writes to the new primary

Failover typically completes within 10–30 seconds.
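Step 4 is visible directly on the pods: after failover, the operator rewrites the role label, and that label is what the services select on. A sketch of the relevant pod metadata after mongodb-1 wins an election (pod name and label value follow the conventions described above; this is illustrative, not something you edit by hand):

```yaml
# Illustrative pod metadata after failover
metadata:
  name: mongo-cluster-mongodb-1
  labels:
    kubeblocks.io/role: primary   # flipped from "secondary" by the roleProbe result
```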

Traffic Routing

| Service | Type | Port | Selector |
|---|---|---|---|
| {cluster}-mongodb-mongodb | ClusterIP | 27017 | kubeblocks.io/role=primary |
| {cluster}-mongodb-mongodb-ro | ClusterIP | 27017 | kubeblocks.io/role=secondary |
| {cluster}-mongodb | ClusterIP | 27017 | all pods (no roleSelector) |
| {cluster}-mongodb-headless | Headless | 27017 | all pods |
  • Write traffic: connect to {cluster}-mongodb-mongodb:27017 (roleSelector: primary — always routes to the current primary)
  • Read-only traffic: connect to {cluster}-mongodb-mongodb-ro:27017 (roleSelector: secondary)
  • {cluster}-mongodb (two-segment name) routes to all pods; it has no roleSelector and is not a write-only endpoint
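The role-based routing in the table is just an ordinary Kubernetes Service whose selector includes the role label. An illustrative sketch of what the operator generates for the read/write endpoint (the instance label is an assumption; the operator owns this object, so you never create it yourself):

```yaml
# Illustrative sketch of the operator-generated read/write service
apiVersion: v1
kind: Service
metadata:
  name: mongo-cluster-mongodb-mongodb
spec:
  type: ClusterIP
  ports:
    - port: 27017
      targetPort: 27017
  selector:
    app.kubernetes.io/instance: mongo-cluster   # assumed standard instance label
    kubeblocks.io/role: primary                 # endpoints follow the current primary
```

Because the selector matches on the role label rather than a pod name, the service's endpoints switch to the new primary as soon as the label moves, with no client-side reconfiguration.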

Sharding Architecture

MongoDB Sharding distributes data across multiple independent replica sets (shards) using a shard key. A layer of stateless mongos routers sits in front, and a config server replica set (CSRS) stores the chunk routing metadata.

[Architecture diagram] Applications connect with a standard MongoDB connection string to the mongos routers, either via per-pod services ({cluster}-mongos-mongos-0:27017, {cluster}-mongos-mongos-1:27017, …; one ClusterIP per pod, podService: true) or via {cluster}-mongos-headless for DNS-based discovery. The mongos pods (mongos-0/1/2) are stateless query routers with no PVC, each running mongos on :27017 and an exporter on :9216; they read the chunk routing metadata from the config servers and forward each query to the owning shard's primary. The config servers form a 3-member CSRS replica set (config-0 PRIMARY, config-1/2 SECONDARY; oplog replication with w:majority writes) that stores the chunk map, shard membership, and cluster metadata; each config pod runs mongodb on :27017, an exporter on :9216, and an init-syncer init container (syncerctl → /tools). The data shards (shard-0, shard-1, shard-2, each owning a chunk range, and each a KubeBlocks Sharding Component) are independent 3-member replica sets with the same container layout and a 20Gi PVC per pod; each shard replicates independently via its own oplog, and failover is independent per shard.

Resource Hierarchy

The sharding topology uses both Component (for mongos and config-server) and Sharding (for data shards):

Cluster  →  Component (mongos)         →  InstanceSet  →  Pod × N
         →  Component (config-server)  →  InstanceSet  →  Pod × 3
         →  Sharding  (shard)          →  Shard × N    →  InstanceSet  →  Pod × replicas
| Resource | Role |
|---|---|
| Cluster | Specifies topology sharding; declares mongos, config-server, and shard specs |
| Component (mongos) | Stateless query routers; requires config-server to be reachable before routing |
| Component (config-server) | 3-node replica set (CSRS) storing chunk map and shard membership |
| Sharding | KubeBlocks sharding spec; manages N identical shard Components |
| Shard | An independent replica set owning a range of chunks; each shard fails over independently |
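A sketch of a sharding Cluster declaration under these assumptions (v1 Cluster API, an addon-defined `sharding` topology name, and a `shardings` section with a per-shard template; exact field names vary across KubeBlocks and addon versions, so treat this as illustrative):

```yaml
# Illustrative sketch of a sharding Cluster declaration (field names assumed)
apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
  name: mongo-cluster
spec:
  clusterDef: mongodb
  topology: sharding               # addon-defined topology name (assumed)
  componentSpecs:
    - name: mongos                 # stateless routers, no volumeClaimTemplates
      replicas: 3
    - name: config-server          # 3-node CSRS
      replicas: 3
      volumeClaimTemplates:
        - name: data
          spec:
            accessModes: [ReadWriteOnce]
            resources: { requests: { storage: 20Gi } }
  shardings:
    - name: shard                  # one Sharding spec stamps out N shard Components
      shards: 3
      template:                    # replica set template applied to every shard
        replicas: 3
        volumeClaimTemplates:
          - name: data
            spec:
              accessModes: [ReadWriteOnce]
              resources: { requests: { storage: 20Gi } }
```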

Component Details

Mongos pods (stateless — no PVC; each pod also runs an init-kubectl init container that copies the kubectl binary into the container's tools path for use by lifecycle scripts):

| Container | Port | Purpose |
|---|---|---|
| mongos | 27017 | MongoDB query router — reads chunk map from CSRS and forwards queries to the correct shard |
| exporter | 9216 | Prometheus metrics exporter |

Config server pods (3-node CSRS replica set; same three init containers as replica set pods — init-syncer, init-kubectl, init-pbm-agent):

| Container | Port | Purpose |
|---|---|---|
| mongodb | 27017, 3601 (HA replication) | Config server mongod — stores chunk routing metadata; must use w:majority for all config writes; roleProbe runs /tools/syncerctl getrole |
| mongodb-backup-agent | — | Percona Backup for MongoDB (PBM) agent |
| exporter | 9216 | Prometheus metrics exporter |

Shard pods (each shard = independent replica set; same three init containers — init-syncer, init-kubectl, init-pbm-agent):

| Container | Port | Purpose |
|---|---|---|
| mongodb | 27017, 3601 (HA replication) | Data shard mongod — stores documents assigned to this shard's chunk range; roleProbe runs /tools/syncerctl getrole |
| mongodb-backup-agent | — | Percona Backup for MongoDB (PBM) agent |
| exporter | 9216 | Prometheus metrics exporter |

Each shard pod mounts its own PVC for its data directory. At least 3 shards are recommended for balanced distribution (not enforced by the addon).

How Query Routing Works

  1. Client connects to any mongos on port 27017 — no cluster-aware driver required
  2. Mongos reads the chunk map from the CSRS to determine which shard owns the target key range
  3. Mongos forwards the query to the primary of the target shard
  4. For scatter-gather queries (no shard key filter), mongos fans out to all shard primaries and merges results
  5. On chunk migrations or shard additions, mongos automatically discovers the updated routing table

Traffic Routing

| Service | Type | Port | Notes |
|---|---|---|---|
| {cluster}-mongos-mongos-<ordinal> | ClusterIP (per-pod) | 27017 | One service per mongos pod (podService: true); use all pod addresses as URI seed list |
| {cluster}-mongos-headless | Headless | 27017 | DNS-based discovery of all mongos pods |
| {cluster}-mongos-internal | ClusterIP | 27018 | Intra-cluster use only; not for application traffic |

Clients connect through mongos using a MongoDB URI seed list, e.g.:

mongodb://{cluster}-mongos-mongos-0:27017,{cluster}-mongos-mongos-1:27017/

Or use {cluster}-mongos-headless for DNS-based discovery. Direct shard or config server access is not intended for application traffic.
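The per-pod services in the table above come from a per-pod service declaration on the mongos component. An illustrative sketch of what that declaration might look like (the exact field layout and nesting depend on the addon and API version; only podService: true is taken from the source table):

```yaml
# Illustrative: per-pod service declaration on the mongos component (assumed layout)
services:
  - name: mongos
    podService: true        # one ClusterIP service per mongos pod
    spec:
      ports:
        - port: 27017
          targetPort: 27017
```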

Automatic Failover

Each component fails over independently:

  • Shard primary fails → that shard's replica set elects a new primary (≈10 s); mongos retries on the new primary automatically
  • Config server primary fails → CSRS elects a new primary; chunk map reads resume; mongos does not lose routing data (it caches the chunk map locally)
  • Mongos pod fails → clients reconnect to another mongos pod; mongos is stateless so no data is lost

System Accounts

KubeBlocks automatically manages the following MongoDB system accounts. Passwords are stored in Secrets named {cluster}-{component}-account-{name}.

| Account | Role | Purpose |
|---|---|---|
| root | Superuser | Default administrative account used for cluster initialization and management |
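Following the naming convention above, the root credentials for a cluster named mongo-cluster would live in a Secret shaped roughly like this (illustrative; the username/password key names follow the common KubeBlocks convention and are an assumption):

```yaml
# Illustrative: generated system-account Secret (key names assumed)
apiVersion: v1
kind: Secret
metadata:
  name: mongo-cluster-mongodb-account-root
type: Opaque
stringData:
  username: root
  password: "<generated>"   # random password generated by the operator
```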
