Rebuilding PostgreSQL Replicas in KubeBlocks

This guide demonstrates how to rebuild replicas using both in-place and non-in-place methods.

What is Replica Rebuilding?

Replica rebuilding is the process of recreating a PostgreSQL replica from scratch or from a backup while maintaining:

Data Consistency: Ensures the replica has an exact copy of primary data
High Availability: Minimizes downtime during the rebuild process

During this process:

The problematic replica is identified and isolated
A new base backup is taken from the primary
WAL (Write-Ahead Log) segments are streamed to catch up
The replica rejoins the replication cluster

When to Rebuild a PostgreSQL Instance?

Rebuilding becomes necessary in these common scenarios:

Replica falls too far behind primary (irrecoverable lag), or Replication slot corruption
WAL file gaps that can't be automatically resolved
Data Corruption: with storage-level corruption (disk/volume issues), inconsistent data between primary and replica, etc
Infrastructure Issues: Node failure, storage device failure or cross Zone/Region migration

Prerequisites

Before proceeding, ensure the following:

Environment Setup:
- A Kubernetes cluster is up and running.
- The kubectl CLI tool is configured to communicate with your cluster.
- KubeBlocks CLI and KubeBlocks Operator are installed. Follow the installation instructions here.
Namespace Preparation: To keep resources isolated, create a dedicated namespace for this tutorial:

kubectl create ns demo
namespace/demo created

Deploy a PostgreSQL Cluster

KubeBlocks uses a declarative approach for managing PostgreSQL clusters. Below is an example configuration for deploying a PostgreSQL cluster with 2 replicas (1 primary, 1 replicas).

Apply the following YAML configuration to deploy the cluster:

apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
  name: pg-cluster
  namespace: demo
spec:
  terminationPolicy: Delete
  clusterDef: postgresql
  topology: replication
  componentSpecs:
    - name: postgresql
      serviceVersion: 16.4.0
      labels:
        apps.kubeblocks.postgres.patroni/scope: pg-cluster-postgresql
      disableExporter: true
      replicas: 2
      resources:
        limits:
          cpu: "0.5"
          memory: "0.5Gi"
        requests:
          cpu: "0.5"
          memory: "0.5Gi"
      volumeClaimTemplates:
        - name: data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi

Verifying the Deployment

Monitor the cluster status until it transitions to the Running state:

kubectl get cluster pg-cluster -n demo -w

Expected Output:

NAME         CLUSTER-DEFINITION   TERMINATION-POLICY   STATUS     AGE
pg-cluster   postgresql           Delete               Creating   50s
pg-cluster   postgresql           Delete               Running    4m2s

Once the cluster status becomes Running, your PostgreSQL cluster is ready for use.

TIP

If you are creating the cluster for the very first time, it may take some time to pull images before running.

Connect to the Primary PostgreSQL Replcia and Write Mock Data

Check replica roles with command:

kubectl get pods -n demo -l app.kubernetes.io/instance=pg-cluster -L kubeblocks.io/role

Example Output:

NAME                      READY   STATUS    RESTARTS   AGE     ROLE
pg-cluster-postgresql-0   4/4     Running   0          13m     secondary
pg-cluster-postgresql-1   4/4     Running   0          12m     primary

Step 1: Connect to the Primary Instance

KubeBlocks automatically creates a Secret containing the PostgreSQL postgres credentials. Retrieve the PostgreSQL postgres credentials:

NAME=`kubectl get secrets -n demo pg-cluster-postgresql-account-postgres -o jsonpath='{.data.username}' | base64 -d`
PASSWD=`kubectl get secrets -n demo pg-cluster-postgresql-account-postgres -o jsonpath='{.data.password}' | base64 -d`

Connect to the primary replica through service pg-cluster-postgresql-postgresql which routes data to primary replica.

kubectl exec -ti -n demo pg-cluster-postgresql-0 -- env PGUSER=${NAME} PGPASSWORD=${PASSWD} psql -h pg-cluster-postgresql-postgresql

Step 2: Write Data to the Primary Instance

Connect to the primary instance and write a record to the database:

postgrel> CREATE DATABASE test;
postgrel> \c test;
postgrel> CREATE TABLE t1 (id INT PRIMARY KEY, name VARCHAR(255));
postgrel> INSERT INTO t1 VALUES (1, 'John Doe');

Step 3: Verify Data Replication

Connect to the replica instance (e.g. pg-cluster-postgresql-0) to verify that the data has been replicated:

kubectl exec -ti -n demo pg-cluster-postgresql-0 -- env PGUSER=${NAME} PGPASSWORD=${PASSWD} psql -h 127.0.0.1

NOTE

If the primary instance is 'pg-cluster-postgresql-0', you should connect to 'pg-cluster-postgresql-1' instead. Make sure to check the role of each instance before connecting.

postgrel> \c test;
postgrel> SELECT * FROM test.t1;

Example Output:

 id |   name
----+----------
  1 | John Doe
(1 row)

Rebuild the Replica

KubeBlocks provides two approaches for rebuilding replicas: in-place and non-in-place.

In-Place Rebuild

Workflow:

Original Pod (e.g. 'pg-cluster-postgresql-0') is terminated
New Pod is created with same name, New PVC is provisioned.
Data is synchronized from primary

Rebuild the replica in-place using the following configuration:

apiVersion: operations.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: pg-rebuild-replica-inplace
  namespace: demo
spec:
  clusterName: pg-cluster
  force: true
  preConditionDeadlineSeconds: 0
  rebuildFrom:
  - componentName: postgresql
    inPlace: true  # set inPlace to true
    instances:
    - name: pg-cluster-postgresql-0
  type: RebuildInstance

In this configuration, "pg-cluster-postgresql-0" refers to the instance name (Pod name) that will be repaired.

Monitor the rebuild operation:

kubectl get ops  pg-rebuild-replica-inplace -n demo -w

Example Output:

NAME                         TYPE              CLUSTER      STATUS    PROGRESS   AGE
pg-rebuild-replica-inplace   RebuildInstance   pg-cluster   Running   0/1        5s
pg-rebuild-replica-inplace   RebuildInstance   pg-cluster   Running   0/1        5s
pg-rebuild-replica-inplace   RebuildInstance   pg-cluster   Running   0/1        46s
pg-rebuild-replica-inplace   RebuildInstance   pg-cluster   Running   1/1        46s
pg-rebuild-replica-inplace   RebuildInstance   pg-cluster   Succeed   1/1        47s

Verify the Pods to confirm the replica ("pg-cluster-postgresql-0") , its PVC and PV have been recreated.

kubectl get po,pvc,pv -l app.kubernetes.io/instance=pg-cluster -ndemo

Example Output:

kubectl get po,pvc,pv -l app.kubernetes.io/instance=pg-cluster -ndemo
NAME                          READY   STATUS    RESTARTS   AGE
pod/pg-cluster-postgresql-0   4/4     Running   0          5m6s
pod/pg-cluster-postgresql-1   4/4     Running   0          14m

NAME                                                 STATUS   VOLUME    CAPACITY   ACCESS MODES  STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/data-pg-cluster-postgresql-0   Bound    pvc-xxx   20Gi       RWO            <SC>          <unset>                 5m6s
persistentvolumeclaim/data-pg-cluster-postgresql-1   Bound    pvc-yyy   20Gi       RWO            <SC>          <unset>                 14m

Connect to the replica and check if the data has been restored:

kubectl exec -ti -n demo pg-cluster-postgresql-0 -- env PGUSER=${NAME} PGPASSWORD=${PASSWD} psql -h 127.0.0.1

postgrel> \c test;
postgrel> select * from t1;
 id |   name
----+----------
  1 | John Doe
(1 row)

Non-In-Place Rebuild

Workflow:

New Pod (e.g. 'pg-cluster-postgresql-2') is created
Data is synchronized from primary
Original Pod is terminated after new replica is ready

Rebuild the replica by creating a new instance:

apiVersion: operations.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: pg-rebuild-replica-non-inplace
  namespace: demo
spec:
  clusterName: pg-cluster
  force: true
  preConditionDeadlineSeconds: 0
  rebuildFrom:
  - componentName: postgresql
    inPlace: false
    instances:
    - name: pg-cluster-postgresql-0
  type: RebuildInstance

In this configuration, "pg-cluster-postgresql-0" refers to the instance name (Pod name) that will be repaired.

Monitor the rebuild operation:

kubectl get ops  pg-rebuild-replica-inplace -n demo -w

Example Output:

NAME                             TYPE              CLUSTER      STATUS    PROGRESS   AGE
pg-rebuild-replica-non-inplace   RebuildInstance   pg-cluster   Running   0/1        5s
pg-rebuild-replica-non-inplace   RebuildInstance   pg-cluster   Running   0/1        5s
pg-rebuild-replica-non-inplace   RebuildInstance   pg-cluster   Running   0/1        46s
pg-rebuild-replica-non-inplace   RebuildInstance   pg-cluster   Running   1/1        46s
pg-rebuild-replica-non-inplace   RebuildInstance   pg-cluster   Succeed   1/1        47s

kubectl get pods  -l app.kubernetes.io/instance=pg-cluster -n demo -w
NAME                      READY   STATUS    RESTARTS   AGE
pg-cluster-postgresql-0   4/4     Running   0          53m
pg-cluster-postgresql-1   4/4     Running   0          2m52s
pg-cluster-postgresql-2   0/4     Pending   0          0s
pg-cluster-postgresql-2   0/4     Pending   0          4s
pg-cluster-postgresql-2   0/4     Init:0/4   0          4s
pg-cluster-postgresql-2   0/4     Init:1/4   0          5s
pg-cluster-postgresql-2   0/4     Init:2/4   0          6s
pg-cluster-postgresql-2   0/4     Init:3/4   0          7s
pg-cluster-postgresql-2   0/4     PodInitializing   0          8s
pg-cluster-postgresql-2   2/4     Running           0          9s
pg-cluster-postgresql-2   2/4     Running           0          12s
pg-cluster-postgresql-2   2/4     Running           0          14s
pg-cluster-postgresql-2   3/4     Running           0          14s
pg-cluster-postgresql-2   3/4     Running           0          16s
pg-cluster-postgresql-2   4/4     Running           0          3m30s
pg-cluster-postgresql-0   4/4     Terminating       0          4m3s
pg-cluster-postgresql-0   4/4     Terminating       0          4m3s
pg-cluster-postgresql-0   4/4     Terminating       0          4m3s

Connect to the new replica instance ('pg-cluster-postgresql-2') and verify the data:

kubectl exec -ti -n demo pg-cluster-postgresql-2 -- env PGUSER=${NAME} PGPASSWORD=${PASSWD} psql -h 127.0.0.1

postgrel> \c test;
postgrel> select * from t1;
 id |   name
----+----------
  1 | John Doe
(1 row)

Rebuild from Backups

This configuration below shows recovering a failed replica by restoring it from a known backup using backupName:

apiVersion: operations.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: pg-rebuild-from-backup
  namespace: demo
spec:
  clusterName: pg-cluster
  force: true
  rebuildFrom:
  - backupName: <PG_BACKUP_NAME>
    componentName: postgresql
    inPlace: true
    instances:
    - name: pg-cluster-postgresql-1
  type: RebuildInstance

Rebuild to Specific Node

To rebuild the new replica on the specific node, you may use targetNodeName:

apiVersion: operations.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: pg-rebuild-from-backup
  namespace: demo
spec:
  clusterName: pg-cluster
  force: true
  rebuildFrom:
  - backupName: <PG_BACKUP_NAME>
    componentName: postgresql
    inPlace: true
    instances:
    - name: pg-cluster-postgresql-1
      targetNodeName: <NODE_NAME> # new pod will be scheduled to the specified nod
  type: RebuildInstance

Summary

Key takeaways:

In-Place Rebuild: Successfully rebuilt the replica and restored the deleted data.
Non-In-Place Rebuild: Created a new replica instance and successfully restored the data.

Both methods effectively recover the replica and ensure data consistency.