The MySQL Operator for Kubernetes supports the full lifecycle of a MySQL InnoDB Cluster inside a Kubernetes cluster. This ranges from simplifying the deployment of MySQL Server and MySQL Router instances, including the management of TLS certificates and the replication setup, over the ongoing management of those instances, to support for backups, be they one-off backups or backups following a schedule.
The MySQL Operator for Kubernetes is a controller, in Kubernetes terms, that manages MySQL InnoDBClusters (ICs) on Kubernetes. In this blog post two topics, “High Availability” (HA) and “Failover and Recovery” (FnR), will be discussed. As of MySQL Operator version 9.3.0-2.2.4 Enterprise Edition it is possible to create ClusterSets, which take HA and FnR to a new level.
“MySQL InnoDB ClusterSet provides disaster tolerance for InnoDB Cluster deployments by linking a primary InnoDB Cluster with one or more replicas of itself in alternate locations, such as different datacenters. InnoDB ClusterSet automatically manages replication from the primary cluster to the replica clusters using a dedicated ClusterSet replication channel. If the primary cluster becomes unavailable due to the loss of the data center or the loss of network connectivity to it, you can make a replica cluster active instead to restore the availability of the service” [1]
For an introduction to using the MySQL Operator Helm Charts please check our previous blog post.
High Availability
High availability (HA) is a system characteristic describing the ability of a system to remain operational and accessible with no downtime (hard to achieve) or at least minimal downtime. HA systems are designed to tolerate failures by eliminating single points of failure (SPoF) and implementing failover mechanisms that keep the system available. Hence, if one component fails, another takes over, keeping the service up and running.
We would like to add an additional level of HA to our Kubernetes MySQL setups by adding an additional IC or two, which will become replica clusters. We could use nodeSelector and affinity rules for the primary and the replicas to spread them over different availability zones/fault domains (AZs/FDs), if possible, in the region where the Kubernetes cluster is located. The datacenter locations of the different FDs/AZs are close to each other, which minimizes the network latency between the ICs. It is also possible to spread the ICs over different regions and different Kubernetes clusters by using solutions like Cilium or Submariner, but keep in mind that inter-region traffic has higher latency, which could be an issue in some cases. In this blog post, however, we will discuss how to build an HA setup in one Kubernetes cluster, which inherently means using a single region.
To achieve this we need the Enterprise Edition (EE) of the MySQL Operator for Kubernetes. This means that if you have so far used the CE charts, you need to switch to the Enterprise Helm chart and the EE container images. The EE images are available in the same registry/repository as the Community Edition (CE) images (docker pull container-registry.oracle.com/mysql); instead of community, the EE images have enterprise in their names. For example, the EE Operator image is container-registry.oracle.com/mysql/enterprise-operator:9.3.0-2.2.4. The Enterprise Helm chart is downloadable through My Oracle Support (MOS) and is not available from an online repository that could be added directly with helm repo add. Once downloaded, the chart is added from the archive to a local Helm repository.
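One way to do that is to unpack the archive, index the directory as a Helm repository, and serve it over HTTP so it can be registered with helm repo add. A minimal sketch, assuming hypothetical archive and directory names (the actual MOS download name will differ):

$ mkdir ee-charts && tar xzf mysql-operator-helm-charts-2.2.4.tgz -C ee-charts   # hypothetical archive name
$ helm repo index ee-charts                        # generate index.yaml for the charts
$ python3 -m http.server 8879 --directory ee-charts &
$ helm repo add mysql-operator-ee http://localhost:8879
$ helm repo update

Alternatively, helm install also accepts a local chart directory or a .tgz archive directly, without registering a repository.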
$ helm search repo mysql
NAME                                    CHART VERSION   APP VERSION     DESCRIPTION
mysql-operator-ee/mysql-innodbcluster   2.2.4           9.3.0           MySQL InnoDB Cluster Enterprise Helm Chart for ...
mysql-operator-ee/mysql-operator        2.2.4           9.3.0-2.2.4     MySQL Operator Helm Enterprise Chart for deploy...
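With the charts available, the EE Operator itself is installed first, if it is not already running; a sketch, where the namespace name is a free choice:

$ helm install mysql-operator mysql-operator-ee/mysql-operator \
    --namespace mysql-operator --create-namespace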
We will start a primary IC and then gradually add three replica ICs to it. This time we will use a shell script that automates the creation of the ICs for us:
#!/bin/bash

REPLICAS=3
ROUTERS=1

function ns_name() {
  echo "clusterns$1"
}

function ic_name() {
  echo "mycluster$1"
}

# Values file for the primary IC
cat > $(ic_name 1)_values.yaml <<EOF
edition: enterprise
credentials:
  root:
    user: root
    password: sakila
    host: "%"
tls:
  useSelfSigned: true
serverInstances: ${REPLICAS}
router:
  instances: ${ROUTERS}
baseServerId: 1000
EOF

# Values files for the three replica ICs
for i in `seq 2 4`; do
cat > $(ic_name $i)_values.yaml <<EOF
edition: enterprise
credentials:
  root:
    user: root
    password: sakila
    host: "%"
tls:
  useSelfSigned: true
serverInstances: ${REPLICAS}
router:
  instances: ${ROUTERS}
baseServerId: 20${i}0
initDB:
  clusterSet:
    targetUrl: $(ic_name 1)-0.$(ic_name 1)-instances.$(ns_name 1).svc.cluster.local
    secretKeyRef:
      name: clusterset-secret
EOF
done

for i in `seq 1 4`; do
  NS=$(ns_name $i)
  IC=$(ic_name $i)
  kubectl create namespace ${NS}
  kubectl -n ${NS} create secret generic clusterset-secret \
    --from-literal=rootUser=root \
    --from-literal=rootHost=% \
    --from-literal=rootPassword="sakila"
  helm upgrade --install $IC \
    --namespace $NS \
    --values ${IC}_values.yaml \
    mysql-operator-ee/mysql-innodbcluster

  # Wait until all server pods of the IC are ready (2/2 = server + sidecar)
  READY_STATUS="2/2"
  for pod_idx in `seq 0 $(($REPLICAS - 1))`; do
    POD_NAME="${IC}-$pod_idx"
    while [[ $(kubectl -n $NS get pods --field-selector=metadata.name=$POD_NAME 2> /dev/null | grep "$READY_STATUS" | wc -l) -eq 0 ]]; do
      echo -ne "\t"; kubectl -n $NS get pods --field-selector=metadata.name=$POD_NAME
      echo -e "\tWaiting for '$POD_NAME' to get to $READY_STATUS in CLUSTER $IC"
      sleep 25
    done
  done

  # Wait until the router deployment of the IC is ready
  READY_STATUS="$ROUTERS/$ROUTERS"
  NAME="${IC}-router"
  while [[ $(kubectl -n $NS get deploy --field-selector=metadata.name=$NAME 2> /dev/null | grep "$READY_STATUS" | wc -l) -eq 0 ]]; do
    echo -ne "\t"; kubectl -n $NS get deploy --field-selector=metadata.name=$NAME
    echo -e "\tWaiting for $NAME deployment to get to $READY_STATUS in CLUSTER $IC"
    sleep 20
  done
done

kubectl -n $(ns_name 1) exec -it $(ic_name 1)-0 -c sidecar -- mysqlsh -uroot -psakila --js -e "print(dba.getClusterSet().status())"
For every IC a separate values file, named mycluster{i}_values.yaml, will be created. The first part of the files is identical; the differences are in baseServerId and in the initDB section.
- The server IDs need to be unique in the ClusterSet, or rather in the whole replication topology of the MySQL servers, so no two servers may have the same ID. For the primary we use a baseServerId of 1000, which means we will have 1000, 1001, and 1002. For the replicas we use a baseServerId of 20{i}0. As an IC is limited to a maximum of 9 members, we can use the tens digit to separate the clusters from each other. The first replica IC will have the IDs 2020, 2021, and 2022; the second will have 2030, 2031, 2032; and so on. With serverInstances set to 9, the highest ID would be 20{i}8.
- The initDB section is also crucial, as it specifies how a new IC becomes part of an existing ClusterSet. For this we need the FQDN of the primary IC of that ClusterSet, as well as a secret, local to the namespace of the new IC, containing the root user name and password of the primary IC. When the replica sidecars start, they notice this initDB configuration and initialize themselves from the primary.
We will create each of the ICs in a separate Kubernetes namespace. This can be combined, for example, with nodeSelector rules for distributing the instances over different availability zones/fault domains. nodeSelector is a simple key-value matching mechanism that restricts pod scheduling to nodes with specific labels (e.g., region=US-ASHBURN-AD-1), which means the nodes must be labeled beforehand. nodeAffinity is a more flexible approach where the nodes don’t need to be pre-labeled. For more information check the official Kubernetes documentation; a small sketch follows below.
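As a sketch of how the spreading could look (the label and node names are illustrative, and the podSpec passthrough should be checked against your chart/CRD version), the nodes of one fault domain could be labeled and the values file of an IC extended with a matching nodeSelector:

# Label the nodes of the target fault domain beforehand (illustrative names):
$ kubectl label nodes worker-1 worker-2 worker-3 topology.kubernetes.io/zone=US-ASHBURN-AD-1

# Addition to mycluster1_values.yaml (podSpec passthrough to the pod template is assumed here):
podSpec:
  nodeSelector:
    topology.kubernetes.io/zone: US-ASHBURN-AD-1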
Here is the output of the kubectl command at the end of the script:
{
"clusters": {
"mycluster1": {
"clusterRole": "PRIMARY",
"globalStatus": "OK",
"primary": "mycluster1-0.mycluster1-instances.clusterns1.svc.cluster.local:3306",
"status": "OK",
"statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
"topology": {
"mycluster1-0.mycluster1-instances.clusterns1.svc.cluster.local:3306": {
"address": "mycluster1-0.mycluster1-instances.clusterns1.svc.cluster.local:3306",
"memberRole": "PRIMARY",
"mode": "R/W",
"readReplicas": {},
"role": "HA",
"status": "ONLINE",
"version": "9.3.0"
},
"mycluster1-1.mycluster1-instances.clusterns1.svc.cluster.local:3306": {
"address": "mycluster1-1.mycluster1-instances.clusterns1.svc.cluster.local:3306",
"memberRole": "SECONDARY",
"mode": "R/O",
"readReplicas": {},
"replicationLagFromImmediateSource": "",
"replicationLagFromOriginalSource": "",
"role": "HA",
"status": "ONLINE",
"version": "9.3.0"
},
"mycluster1-2.mycluster1-instances.clusterns1.svc.cluster.local:3306": {
"address": "mycluster1-2.mycluster1-instances.clusterns1.svc.cluster.local:3306",
"memberRole": "SECONDARY",
"mode": "R/O",
"readReplicas": {},
"replicationLagFromImmediateSource": "",
"replicationLagFromOriginalSource": "",
"role": "HA",
"status": "ONLINE",
"version": "9.3.0"
}
},
"transactionSet": "32a81b18-3d50-11f0-b33f-3ee5203bc8c7:1-4,4feaaee5-3d50-11f0-8f9d-3ee5203bc8c7:1-252"
},
"mycluster2": {
"clusterRole": "REPLICA",
"clusterSetReplication": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Waiting for an event from Coordinator",
"applierWorkerThreads": 4,
"receiver": "mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306",
"receiverStatus": "ON",
"receiverThreadState": "Waiting for source to send event",
"replicationSsl": "TLS_AES_128_GCM_SHA256 TLSv1.3",
"replicationSslMode": "VERIFY_IDENTITY",
"source": "mycluster1-0.mycluster1-instances.clusterns1.svc.cluster.local:3306"
},
"clusterSetReplicationStatus": "OK",
"globalStatus": "OK",
"status": "OK",
"statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
"topology": {
"mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306": {
"address": "mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306",
"memberRole": "PRIMARY",
"mode": "R/O",
"readReplicas": {},
"replicationLagFromImmediateSource": "",
"replicationLagFromOriginalSource": "",
"role": "HA",
"status": "ONLINE",
"version": "9.3.0"
},
"mycluster2-1.mycluster2-instances.clusterns2.svc.cluster.local:3306": {
"address": "mycluster2-1.mycluster2-instances.clusterns2.svc.cluster.local:3306",
"memberRole": "SECONDARY",
"mode": "R/O",
"readReplicas": {},
"replicationLagFromImmediateSource": "",
"replicationLagFromOriginalSource": "",
"role": "HA",
"status": "ONLINE",
"version": "9.3.0"
},
"mycluster2-2.mycluster2-instances.clusterns2.svc.cluster.local:3306": {
"address": "mycluster2-2.mycluster2-instances.clusterns2.svc.cluster.local:3306",
"memberRole": "SECONDARY",
"mode": "R/O",
"readReplicas": {},
"replicationLagFromImmediateSource": "",
"replicationLagFromOriginalSource": "",
"role": "HA",
"status": "ONLINE",
"version": "9.3.0"
}
},
"transactionSet": "32a81b18-3d50-11f0-b33f-3ee5203bc8c7:1-4,4feaaee5-3d50-11f0-8f9d-3ee5203bc8c7:1-252",
"transactionSetConsistencyStatus": "OK",
"transactionSetErrantGtidSet": "",
"transactionSetMissingGtidSet": ""
},
"mycluster3": {
"clusterRole": "REPLICA",
"clusterSetReplication": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Waiting for an event from Coordinator",
"applierWorkerThreads": 4,
"receiver": "mycluster3-0.mycluster3-instances.clusterns3.svc.cluster.local:3306",
"receiverStatus": "ON",
"receiverThreadState": "Waiting for source to send event",
"replicationSsl": "TLS_AES_128_GCM_SHA256 TLSv1.3",
"replicationSslMode": "VERIFY_IDENTITY",
"source": "mycluster1-0.mycluster1-instances.clusterns1.svc.cluster.local:3306"
},
"clusterSetReplicationStatus": "OK",
"globalStatus": "OK",
"status": "OK",
"statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
"topology": {
"mycluster3-0.mycluster3-instances.clusterns3.svc.cluster.local:3306": {
"address": "mycluster3-0.mycluster3-instances.clusterns3.svc.cluster.local:3306",
"memberRole": "PRIMARY",
"mode": "R/O",
"readReplicas": {},
"replicationLagFromImmediateSource": "",
"replicationLagFromOriginalSource": "",
"role": "HA",
"status": "ONLINE",
"version": "9.3.0"
},
"mycluster3-1.mycluster3-instances.clusterns3.svc.cluster.local:3306": {
"address": "mycluster3-1.mycluster3-instances.clusterns3.svc.cluster.local:3306",
"memberRole": "SECONDARY",
"mode": "R/O",
"readReplicas": {},
"replicationLagFromImmediateSource": "",
"replicationLagFromOriginalSource": "",
"role": "HA",
"status": "ONLINE",
"version": "9.3.0"
},
"mycluster3-2.mycluster3-instances.clusterns3.svc.cluster.local:3306": {
"address": "mycluster3-2.mycluster3-instances.clusterns3.svc.cluster.local:3306",
"memberRole": "SECONDARY",
"mode": "R/O",
"readReplicas": {},
"replicationLagFromImmediateSource": "",
"replicationLagFromOriginalSource": "",
"role": "HA",
"status": "ONLINE",
"version": "9.3.0"
}
},
"transactionSet": "32a81b18-3d50-11f0-b33f-3ee5203bc8c7:1-4,4feaaee5-3d50-11f0-8f9d-3ee5203bc8c7:1-252",
"transactionSetConsistencyStatus": "OK",
"transactionSetErrantGtidSet": "",
"transactionSetMissingGtidSet": ""
},
"mycluster4": {
"clusterRole": "REPLICA",
"clusterSetReplication": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Waiting for an event from Coordinator",
"applierWorkerThreads": 4,
"receiver": "mycluster4-0.mycluster4-instances.clusterns4.svc.cluster.local:3306",
"receiverStatus": "ON",
"receiverThreadState": "Waiting for source to send event",
"replicationSsl": "TLS_AES_128_GCM_SHA256 TLSv1.3",
"replicationSslMode": "VERIFY_IDENTITY",
"source": "mycluster1-0.mycluster1-instances.clusterns1.svc.cluster.local:3306"
},
"clusterSetReplicationStatus": "OK",
"globalStatus": "OK",
"status": "OK",
"statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
"topology": {
"mycluster4-0.mycluster4-instances.clusterns4.svc.cluster.local:3306": {
"address": "mycluster4-0.mycluster4-instances.clusterns4.svc.cluster.local:3306",
"memberRole": "PRIMARY",
"mode": "R/O",
"readReplicas": {},
"replicationLagFromImmediateSource": "",
"replicationLagFromOriginalSource": "",
"role": "HA",
"status": "ONLINE",
"version": "9.3.0"
},
"mycluster4-1.mycluster4-instances.clusterns4.svc.cluster.local:3306": {
"address": "mycluster4-1.mycluster4-instances.clusterns4.svc.cluster.local:3306",
"memberRole": "SECONDARY",
"mode": "R/O",
"readReplicas": {},
"replicationLagFromImmediateSource": "",
"replicationLagFromOriginalSource": "",
"role": "HA",
"status": "ONLINE",
"version": "9.3.0"
},
"mycluster4-2.mycluster4-instances.clusterns4.svc.cluster.local:3306": {
"address": "mycluster4-2.mycluster4-instances.clusterns4.svc.cluster.local:3306",
"memberRole": "SECONDARY",
"mode": "R/O",
"readReplicas": {},
"replicationLagFromImmediateSource": "",
"replicationLagFromOriginalSource": "",
"role": "HA",
"status": "ONLINE",
"version": "9.3.0"
}
},
"transactionSet": "32a81b18-3d50-11f0-b33f-3ee5203bc8c7:1-4,4feaaee5-3d50-11f0-8f9d-3ee5203bc8c7:1-252",
"transactionSetConsistencyStatus": "OK",
"transactionSetErrantGtidSet": "",
"transactionSetMissingGtidSet": ""
}
},
"domainName": "mycluster1",
"globalPrimaryInstance": "mycluster1-0.mycluster1-instances.clusterns1.svc.cluster.local:3306",
"metadataServer": "mycluster1-0.mycluster1-instances.clusterns1.svc.cluster.local:3306",
"primaryCluster": "mycluster1",
"status": "HEALTHY",
"statusText": "All Clusters available."
}
Switch-Over and Fail-Over
Switch-over and emergency fail-over between the primary IC and a replica IC in an InnoDB ClusterSet (ICS) deployment can be triggered by an administrator by creating a custom resource of type MySQLClusterSetFailover. Here is a quick example of a switch-over, that is, a change of the primary IC when there is no network or other disruption but we would like a different cluster to be the primary:
$ cat failover.yaml
apiVersion: mysql.oracle.com/v2
kind: MySQLClusterSetFailover
metadata:
  name: incident-x1-no-force
  namespace: clusterns2
spec:
  clusterName: mycluster2
  force: false
$ kubectl apply -f failover.yaml
mysqlclustersetfailover.mysql.oracle.com/incident-x1-no-force created
$ kubectl -n clusterns2 get pods -w
NAME READY STATUS RESTARTS AGE
incident-x1-no-force-hv9h8 0/1 Pending 0 0s
incident-x1-no-force-hv9h8 0/1 Pending 0 0s
incident-x1-no-force-hv9h8 0/1 ContainerCreating 0 0s
incident-x1-no-force-hv9h8 1/1 Running 0 1s
incident-x1-no-force-hv9h8 0/1 Completed 0 12s
incident-x1-no-force-hv9h8 0/1 Completed 0 13s
incident-x1-no-force-hv9h8 0/1 Completed 0 14s
$ kubectl -n clusterns2 logs -f incident-x1-no-force-hv9h8
2025-06-02T11:19:56 - [INFO] [FAILOVER] Trying to SWITCH OVER to mycluster2 . Options {}
2025-06-02T11:19:57 - [INFO] [FAILOVER] Environment provided cluster domain: cluster.local
2025-06-02T11:19:58 - [INFO] [FAILOVER] Before CSet={
"clusters": {
"mycluster1": {
"clusterRole": "PRIMARY",
"globalStatus": "OK",
"primary": "mycluster1-0.mycluster1-instances.clusterns1.svc.cluster.local:3306"
},
"mycluster2": {
"clusterRole": "REPLICA",
"clusterSetReplicationStatus": "OK",
"globalStatus": "OK"
},
"mycluster3": {
"clusterRole": "REPLICA",
"clusterSetReplicationStatus": "OK",
"globalStatus": "OK"
},
"mycluster4": {
"clusterRole": "REPLICA",
"clusterSetReplicationStatus": "OK",
"globalStatus": "OK"
}
},
"domainName": "mycluster1",
"globalPrimaryInstance": "mycluster1-0.mycluster1-instances.clusterns1.svc.cluster.local:3306",
"primaryCluster": "mycluster1",
"status": "HEALTHY",
"statusText": "All Clusters available."
}
Switching the primary cluster of the clusterset to 'mycluster2'
* Verifying clusterset status
** Checking cluster mycluster4
Cluster 'mycluster4' is available
** Checking cluster mycluster1
Cluster 'mycluster1' is available
** Checking cluster mycluster2
Cluster 'mycluster2' is available
** Checking cluster mycluster3
Cluster 'mycluster3' is available
** Waiting for the promoted cluster to apply pending received transactions...
* Refreshing replication account of demoted cluster
* Synchronizing transaction backlog at mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306
* Updating metadata
* Updating topology
** Changing replication source of mycluster1-2.mycluster1-instances.clusterns1.svc.cluster.local:3306 to mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306
** Changing replication source of mycluster1-1.mycluster1-instances.clusterns1.svc.cluster.local:3306 to mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306
** Changing replication source of mycluster1-0.mycluster1-instances.clusterns1.svc.cluster.local:3306 to mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306
* Acquiring locks in ClusterSet instances
** Pre-synchronizing SECONDARIES
** Acquiring global lock at PRIMARY
** Acquiring global lock at SECONDARIES
* Synchronizing remaining transactions at promoted primary
* Updating replica clusters
** Changing replication source of mycluster4-1.mycluster4-instances.clusterns4.svc.cluster.local:3306 to mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306
** Changing replication source of mycluster4-2.mycluster4-instances.clusterns4.svc.cluster.local:3306 to mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306
** Changing replication source of mycluster4-0.mycluster4-instances.clusterns4.svc.cluster.local:3306 to mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306
** Changing replication source of mycluster3-2.mycluster3-instances.clusterns3.svc.cluster.local:3306 to mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306
** Changing replication source of mycluster3-1.mycluster3-instances.clusterns3.svc.cluster.local:3306 to mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306
** Changing replication source of mycluster3-0.mycluster3-instances.clusterns3.svc.cluster.local:3306 to mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306
Cluster 'mycluster2' was promoted to PRIMARY of the clusterset. The PRIMARY instance is 'mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306'
2025-06-02T11:20:02 - [INFO] [FAILOVER] Probing cluster status after the change
2025-06-02T11:20:03 - [INFO] [FAILOVER] 9.3.0+ cluster, ClusterSet enabled
WARNING: You are connected to an instance in state 'Read Only'
Write operations on the InnoDB cluster will not be allowed.
WARNING: You are connected to an instance in state 'Read Only'
Write operations on the InnoDB cluster will not be allowed.
2025-06-02T11:20:05 - [INFO] [FAILOVER] Publishing cluster status
2025-06-02T11:20:05 - [INFO] [FAILOVER] cluster probe: status=ClusterDiagStatus.ONLINE online=[<MySQLPod mycluster2-0>, <MySQLPod mycluster2-1>, <MySQLPod mycluster2-2>]
$ kubectl -n clusterns1 exec -it mycluster1-0 -c sidecar -- mysqlsh -uroot -psakila --js -e "print(dba.getClusterSet().status())"
{
"clusters": {
"mycluster1": {
"clusterRole": "REPLICA",
"clusterSetReplicationStatus": "OK",
"globalStatus": "OK"
},
"mycluster2": {
"clusterRole": "PRIMARY",
"globalStatus": "OK",
"primary": "mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306"
},
"mycluster3": {
"clusterRole": "REPLICA",
"clusterSetReplicationStatus": "OK",
"globalStatus": "OK"
},
"mycluster4": {
"clusterRole": "REPLICA",
"clusterSetReplicationStatus": "OK",
"globalStatus": "OK"
}
},
"domainName": "mycluster1",
"globalPrimaryInstance": "mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306",
"primaryCluster": "mycluster2",
"status": "HEALTHY",
"statusText": "All Clusters available."
}
(The cluster pods are omitted from the pod listing above; only the failover job pod is shown.)
Let’s do another switch-over, this time to mycluster4 and then back to mycluster1:
$ cat failover_2.yaml
apiVersion: mysql.oracle.com/v2
kind: MySQLClusterSetFailover
metadata:
  name: incident-x4-no-force
  namespace: clusterns4
spec:
  clusterName: mycluster4
  force: false
$ kubectl apply -f failover_2.yaml
mysqlclustersetfailover.mysql.oracle.com/incident-x4-no-force created
$ kubectl -n clusterns4 get pods -w
NAME READY STATUS RESTARTS AGE
incident-x4-no-force-9gftw 0/1 Pending 0 0s
incident-x4-no-force-9gftw 0/1 Pending 0 0s
incident-x4-no-force-9gftw 0/1 ContainerCreating 0 0s
incident-x4-no-force-9gftw 1/1 Running 0 1s
incident-x4-no-force-9gftw 0/1 Completed 0 12s
incident-x4-no-force-9gftw 0/1 Completed 0 14s
incident-x4-no-force-9gftw 0/1 Completed 0 16s
$ kubectl -n clusterns1 exec -it mycluster1-0 -c sidecar -- mysqlsh -uroot -psakila --js -e "print(dba.getClusterSet().status())"
{
"clusters": {
"mycluster1": {
"clusterRole": "REPLICA",
"clusterSetReplicationStatus": "OK",
"globalStatus": "OK"
},
"mycluster2": {
"clusterRole": "REPLICA",
"clusterSetReplicationStatus": "OK",
"globalStatus": "OK"
},
"mycluster3": {
"clusterRole": "REPLICA",
"clusterSetReplicationStatus": "OK",
"globalStatus": "OK"
},
"mycluster4": {
"clusterRole": "PRIMARY",
"globalStatus": "OK",
"primary": "mycluster4-0.mycluster4-instances.clusterns4.svc.cluster.local:3306"
}
},
"domainName": "mycluster1",
"globalPrimaryInstance": "mycluster4-0.mycluster4-instances.clusterns4.svc.cluster.local:3306",
"primaryCluster": "mycluster4",
"status": "HEALTHY",
"statusText": "All Clusters available."
}
$ cat failover_back_to_first.yaml
apiVersion: mysql.oracle.com/v2
kind: MySQLClusterSetFailover
metadata:
  name: incident-x12-no-force
  namespace: clusterns1
spec:
  clusterName: mycluster1
  force: false
$ kubectl apply -f failover_back_to_first.yaml
mysqlclustersetfailover.mysql.oracle.com/incident-x12-no-force created
$ kubectl -n clusterns1 get pods -w
NAME READY STATUS RESTARTS AGE
incident-x12-no-force-kq598 0/1 Pending 0 0s
incident-x12-no-force-kq598 0/1 ContainerCreating 0 0s
incident-x12-no-force-kq598 1/1 Running 0 1s
incident-x12-no-force-kq598 0/1 Completed 0 12s
incident-x12-no-force-kq598 0/1 Completed 0 14s
incident-x12-no-force-kq598 0/1 Completed 0 15s
$ kubectl -n clusterns1 exec -it mycluster1-0 -c sidecar -- mysqlsh -uroot -psakila --js -e "print(dba.getClusterSet().status())"
{
"clusters": {
"mycluster1": {
"clusterRole": "PRIMARY",
"globalStatus": "OK",
"primary": "mycluster1-0.mycluster1-instances.clusterns1.svc.cluster.local:3306"
},
"mycluster2": {
"clusterRole": "REPLICA",
"clusterSetReplicationStatus": "OK",
"globalStatus": "OK"
},
"mycluster3": {
"clusterRole": "REPLICA",
"clusterSetReplicationStatus": "OK",
"globalStatus": "OK"
},
"mycluster4": {
"clusterRole": "REPLICA",
"clusterSetReplicationStatus": "OK",
"globalStatus": "OK"
}
},
"domainName": "mycluster1",
"globalPrimaryInstance": "mycluster1-0.mycluster1-instances.clusterns1.svc.cluster.local:3306",
"primaryCluster": "mycluster1",
"status": "HEALTHY",
"statusText": "All Clusters available."
}
With that, the ClusterSet is back in the state in which we originally created it.
A controlled switch-over from the primary cluster to a replica cluster, while the primary cluster is still available, is performed, for example, when a configuration change or maintenance is required on the primary cluster. MySQL Router automatically routes client applications to the right clusters in an InnoDB ClusterSet deployment.
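To see the routing in action, one can connect through the Service that the operator creates for each IC (named after the cluster) and check which server answers. A sketch, assuming the default port mapping (3306 = read-write via Router) and using a throwaway client pod; image name and ports should be verified against your setup:

$ kubectl -n clusterns3 run mysql-client --rm -it --restart=Never \
    --image=container-registry.oracle.com/mysql/enterprise-server:9.3.0 -- \
    mysql -h mycluster3.clusterns3.svc.cluster.local -P 3306 -uroot -psakila \
    -e "SELECT @@hostname"

A read-write session opened against the Router of a replica cluster should be routed to the current global primary of the ClusterSet.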
Disaster Recovery
For the next example we will perform a fail-over, that is, a switch-over while the primary cluster is not available. To simulate that the primary is unavailable, we will use a Kubernetes NetworkPolicy that isolates the primary from outside traffic but lets the intra-cluster Group Replication traffic continue to flow. If you use KinD or k3d, note that their bundled CNI implementations don’t implement NetworkPolicy. In that case you can switch off kindnet (for KinD) or flannel (for k3d) and use Cilium or Calico. Installing Calico with k3d is pretty straightforward:
$ k3d cluster create cluster42 \
--agents 1 \
--registry-config /home/user/registries.yaml \
--k3s-arg "--flannel-backend=none@server:*" \
--k3s-arg "--disable-network-policy@server:*" \
--k3s-arg "--disable=traefik,servicelb@server:*" \
--k3s-arg "--disable=kube-proxy@server:*" \
--k3s-arg "--cluster-init@server:0" \
--no-lb
# Add Taint to the control-plane node, no user load will be scheduled
$ kubectl taint nodes -l kubernetes.io/role=control-plane node-role.kubernetes.io/control-plane=:NoSchedule
$ kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.30.1/manifests/calico.yaml
$ kubectl wait --for=condition=Ready pod -l k8s-app=calico-node -n kube-system --timeout=120s
As a side note, in this case I am using the --registry-config k3d option with a file that contains:
mirrors:
  "docker.io":
    endpoint:
      - "https://192.168.20.198:5000"
  "192.168.20.198:5000":
    endpoint:
      - "https://192.168.20.198:5000"
  "192.168.20.199:5000":
    endpoint:
      - "https://192.168.20.199:5000"
I have two private registries, at 192.168.20.198:5000 and 192.168.20.199:5000, where the latter is an authenticated registry while the former is not. I have set up a mirroring rule for the host docker.io, so all requests to docker.io will be intercepted by k3d and served by my .198 container registry. This is an easy way to set up container image caching; I have all needed images prefetched.
Using --cluster-init@server:0 is not strictly obligatory, but if you want to add more control plane nodes to your k3d cluster later, then this option should be passed at cluster creation. Without it, the k3d cluster will never be able to accept additional control plane nodes, for example when the control plane is being upgraded to a newer version.
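With --cluster-init in place, an additional control plane (server) node can later be joined, for example:

$ k3d node create extra-server-0 --cluster cluster42 --role server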
Then we extend our script, which creates the ClusterSet with four ICs, with the following code:
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: isolate-mycluster1
  namespace: clusterns1
spec:
  podSelector:
    matchLabels:
      mysql.oracle.com/cluster: mycluster1
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              mysql.oracle.com/cluster: mycluster1
  egress:
    - to:
        - podSelector:
            matchLabels:
              mysql.oracle.com/cluster: mycluster1
EOF
# wait a few seconds for the NetworkPolicy to be applied
sleep 2
cat <<EOF | kubectl apply -f -
apiVersion: mysql.oracle.com/v2
kind: MySQLClusterSetFailover
metadata:
  name: incident-x1-force
  namespace: clusterns2
spec:
  clusterName: mycluster2
  force: true
EOF
# wait a few seconds for the failover pod to start
sleep 5
INCIDENT_POD_NAME=$(kubectl -n clusterns2 get pods | grep incident-x1-force | cut -d ' ' -f 1)
echo "Waiting for pod/$INCIDENT_POD_NAME to complete"
# pods have no "Complete" condition (jobs do), so wait on the pod phase instead
kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/$INCIDENT_POD_NAME -n clusterns2 --timeout=600s
echo "Log of pod/$INCIDENT_POD_NAME"
echo "==========================================================="
kubectl -n clusterns2 logs pod/$INCIDENT_POD_NAME
echo "==========================================================="
kubectl -n $(ns_name 2) exec -it $(ic_name 2)-0 -c sidecar -- mysqlsh -uroot -psakila --js -e "print(dba.getClusterSet().status())"
The slightly shortened result is:
networkpolicy.networking.k8s.io/isolate-mycluster1 created
mysqlclustersetfailover.mysql.oracle.com/incident-x1-force created
Waiting pod/incident-x1-force-qvxkw to complete
Log of pod/incident-x1-force-qvxkw
===========================================================
...
Failing-over primary cluster of the clusterset to 'mycluster2'
* Verifying primary cluster status
None of the instances of the PRIMARY cluster 'mycluster1' could be reached.
* Verifying clusterset status
** Checking cluster mycluster2
Cluster 'mycluster2' is available
** Checking cluster mycluster3
Cluster 'mycluster3' is available
** Checking cluster mycluster4
Cluster 'mycluster4' is available
** Waiting for instances to apply pending received transactions...
** Checking whether target cluster has the most recent GTID set
* Promoting cluster 'mycluster2'
* Updating metadata
** Changing replication source of mycluster3-1.mycluster3-instances.clusterns3.svc.cluster.local:3306 to mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306
** Changing replication source of mycluster3-2.mycluster3-instances.clusterns3.svc.cluster.local:3306 to mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306
** Changing replication source of mycluster3-0.mycluster3-instances.clusterns3.svc.cluster.local:3306 to mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306
** Changing replication source of mycluster4-1.mycluster4-instances.clusterns4.svc.cluster.local:3306 to mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306
** Changing replication source of mycluster4-2.mycluster4-instances.clusterns4.svc.cluster.local:3306 to mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306
** Changing replication source of mycluster4-0.mycluster4-instances.clusterns4.svc.cluster.local:3306 to mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306
PRIMARY cluster failed-over to 'mycluster2'. The PRIMARY instance is 'mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306'
Former PRIMARY cluster 'mycluster1' was INVALIDATED, transactions that were not yet replicated may be lost.
In case of network partitions and similar, the former PRIMARY Cluster 'mycluster1' may still be online and a split-brain can happen. To avoid it, use <Cluster>.fence_all_traffic() to fence the Cluster from all application traffic, or <Cluster>.fence_writes() to fence it from write traffic only.
2025-06-03T11:19:55 - [INFO] [FAILOVER] Probing cluster status after the change
WARNING: You are connected to an instance in state 'Read Only'
Write operations on the InnoDB cluster will not be allowed.
WARNING: You are connected to an instance in state 'Read Only'
Write operations on the InnoDB cluster will not be allowed.
2025-06-03T11:19:57 - [INFO] [FAILOVER] Publishing cluster status
2025-06-03T11:19:57 - [INFO] [FAILOVER] cluster probe: status=ClusterDiagStatus.ONLINE online=[<MySQLPod mycluster2-0>, <MySQLPod mycluster2-1>, <MySQLPod mycluster2-2>]
===========================================================
{
"clusters": {
"mycluster1": {
"clusterErrors": [
"ERROR: Could not connect to any ONLINE members but there are unreachable instances that could still be ONLINE.",
"WARNING: Cluster was invalidated and must be either removed from the ClusterSet or rejoined"
],
"clusterRole": "REPLICA",
"clusterSetReplicationStatus": "UNKNOWN",
"globalStatus": "INVALIDATED",
"status": "UNREACHABLE",
"statusText": "Could not connect to any ONLINE members"
},
"mycluster2": {
"clusterRole": "PRIMARY",
"globalStatus": "OK",
"primary": "mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306"
},
"mycluster3": {
"clusterRole": "REPLICA",
"clusterSetReplicationStatus": "OK",
"globalStatus": "OK"
},
"mycluster4": {
"clusterRole": "REPLICA",
"clusterSetReplicationStatus": "OK",
"globalStatus": "OK"
}
},
"domainName": "mycluster1",
"globalPrimaryInstance": "mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306",
"primaryCluster": "mycluster2",
"status": "AVAILABLE",
"statusText": "Primary Cluster available, there are issues with a Replica cluster."
}
The fail-over succeeded. Without the NetworkPolicy in place the fail-over would not have succeeded, as the still-working network connection to the primary cluster would have prevented it. In short, use a fail-over when there is a real failure, and a switch-over for rebalancing or other planned reasons.
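If the invalidated primary were still online behind a network partition, one way to guard against a split-brain, as the log above suggests, is to fence it from all traffic; a sketch using the JS spelling of the API mentioned in the log:

$ kubectl -n clusterns1 exec -it mycluster1-0 -c sidecar -- \
    mysqlsh -uroot -psakila --js -e "dba.getCluster().fenceAllTraffic()"

Note that a cluster fenced from all traffic is shut down for clients entirely and must be recovered before rejoining; in our example, fencing writes only, as shown further below, is sufficient.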
Now, let’s delete the NetworkPolicy that isolated mycluster1 in clusterns1, and check the ClusterSet status.
$ kubectl -n clusterns1 delete networkpolicies isolate-mycluster1
networkpolicy.networking.k8s.io "isolate-mycluster1" deleted
$ kubectl -n clusterns1 exec -it mycluster1-0 -c sidecar -- mysqlsh -uroot -psakila --js -e "print(dba.getClusterSet().status())"
{
"clusters": {
"mycluster1": {
"clusterErrors": [
"WARNING: Replication channel from the Primary Cluster is missing",
"WARNING: Cluster was invalidated and must be either removed from the ClusterSet or rejoined"
],
"clusterRole": "REPLICA",
"clusterSetReplication": {},
"clusterSetReplicationStatus": "MISSING",
"globalStatus": "INVALIDATED",
"status": "INVALIDATED",
"statusText": "Cluster was invalidated by the ClusterSet it belongs to."
},
"mycluster2": {
"clusterRole": "PRIMARY",
"globalStatus": "OK",
"primary": "mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306"
},
"mycluster3": {
"clusterRole": "REPLICA",
"clusterSetReplicationStatus": "OK",
"globalStatus": "OK"
},
"mycluster4": {
"clusterRole": "REPLICA",
"clusterSetReplicationStatus": "OK",
"globalStatus": "OK"
}
},
"domainName": "mycluster1",
"globalPrimaryInstance": "mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306",
"primaryCluster": "mycluster2",
"status": "AVAILABLE",
"statusText": "Primary Cluster available, there are issues with a Replica cluster."
}
We see that mycluster1 is reachable again, but it is still INVALIDATED and has to be rejoined to the ClusterSet.
NOTE: It could have been a different story if a client had committed a transaction to mycluster1 while the NetworkPolicy was being applied, so that this transaction was never replicated to the replica clusters. The fail-over would still succeed, but the subsequent cluster rejoin would fail due to the extra data. In such cases the cluster to be rejoined needs additional administrator attention and manual recovery steps before it can rejoin.
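To check for such errant transactions before attempting a rejoin, the extended ClusterSet status can be consulted; the transactionSetConsistencyStatus and transactionSetErrantGtidSet fields seen in the earlier output come from there:

$ kubectl -n clusterns2 exec -it mycluster2-0 -c sidecar -- \
    mysqlsh -uroot -psakila --js \
    -e "print(dba.getClusterSet().status({extended: 1}))"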
In our case, we first fence mycluster1 from writes, so that no split-brain can occur, and then rejoin it to the ClusterSet.
$ kubectl -n clusterns1 exec -it mycluster1-0 -c sidecar -- mysqlsh -uroot -psakila --js -e "dba.getCluster().fenceWrites()"
The Cluster 'mycluster1' will be fenced from write traffic
* Disabling automatic super_read_only management on the Cluster...
* Enabling super_read_only on 'mycluster1-0.mycluster1-instances.clusterns1.svc.cluster.local:3306'...
* Enabling super_read_only on 'mycluster1-1.mycluster1-instances.clusterns1.svc.cluster.local:3306'...
* Enabling super_read_only on 'mycluster1-2.mycluster1-instances.clusterns1.svc.cluster.local:3306'...
Cluster successfully fenced from write traffic
$ kubectl -n clusterns2 exec -it mycluster2-0 -c sidecar -- mysqlsh -uroot -psakila --js -e "dba.getClusterSet().rejoinCluster('mycluster1')"
WARNING: Using a password on the command line interface can be insecure.
Rejoining cluster 'mycluster1' to the clusterset
NOTE: Cluster 'mycluster1' is invalidated
* Updating metadata
* Rejoining cluster
** Changing replication source of mycluster1-1.mycluster1-instances.clusterns1.svc.cluster.local:3306 to mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306
** Changing replication source of mycluster1-2.mycluster1-instances.clusterns1.svc.cluster.local:3306 to mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306
** Changing replication source of mycluster1-0.mycluster1-instances.clusterns1.svc.cluster.local:3306 to mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306
Cluster 'mycluster1' was rejoined to the clusterset
$ kubectl -n clusterns2 exec -it mycluster2-0 -c sidecar -- mysqlsh -uroot -psakila --js -e "print(dba.getClusterSet().status())"
{
"clusters": {
"mycluster1": {
"clusterRole": "REPLICA",
"clusterSetReplicationStatus": "OK",
"globalStatus": "OK"
},
"mycluster2": {
"clusterRole": "PRIMARY",
"globalStatus": "OK",
"primary": "mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306"
},
"mycluster3": {
"clusterRole": "REPLICA",
"clusterSetReplicationStatus": "OK",
"globalStatus": "OK"
},
"mycluster4": {
"clusterRole": "REPLICA",
"clusterSetReplicationStatus": "OK",
"globalStatus": "OK"
}
},
"domainName": "mycluster1",
"globalPrimaryInstance": "mycluster2-0.mycluster2-instances.clusterns2.svc.cluster.local:3306",
"primaryCluster": "mycluster2",
"status": "HEALTHY",
"statusText": "All Clusters available."
}
HA and DR with multiple Kubernetes Clusters
When there are multiple Kubernetes clusters, the ClusterSet setup becomes more complicated, because the Kubernetes setup itself is more complicated. Separate Kubernetes clusters come with separate control planes, so notifications about resources in one cluster won’t be delivered to listeners in the others. A MySQL Operator installation is per Kubernetes cluster; thus, for every K8s cluster we need a separate MySQL Operator, and every instance will receive resource updates only from its own control plane. The traffic that crosses the K8s cluster boundary and flows into other K8s clusters is typically called east-west traffic, East and West being exemplary names of two K8s clusters. There are multiple ways to connect software across multiple K8s clusters, such as Istio, Cilium, or Submariner:
- Istio supports HA setups like primary-primary, with multiple Istio control planes, one in each K8s cluster. A non-HA setup is, on the contrary, primary-replica, where there is only one control plane. Istio is used for OSI Layer 7 traffic, with mTLS (mutual TLS), distributed tracing, telemetry, etc. For that Istio uses Envoy proxies. Envoy is typically used as a sidecar, but there is also a sidecarless mode, catching all traffic from the application containers and routing it to its destination while applying mTLS, if desired. East-west traffic is also tunneled through Envoy, which sits at the cluster boundary. Istio is currently not a suitable solution for ClusterSet.
- Cilium is a solution that works on a lower level, using eBPF, to route network traffic. eBPF is a Linux kernel technology for advanced packet processing. With Cilium, traffic sent from one K8s cluster can be routed to another one; however, all IPs should be on the same network, so multiple VCNs/VPCs need to be connected with peering to achieve that. While Cilium supports overlapping CIDRs, it is better not to rely on this. Cilium is built from a multitude of components, the most important being the Cilium Agent, the Cilium CNI plugin, the Cilium Operator, the Envoy proxy, and the DNS proxy; the Cilium CLI is used for administering installations. The agent, commonly known as Cilium or the Cilium daemon, is the central component. It manages key tasks in the K8s cluster, including installing the CNI plugin and loading eBPF programs. To make these operations possible, it runs as a DaemonSet on each node with elevated privileges. The agent is vital for maintaining the effective functioning of the cluster’s network and security.
- Submariner is like Cilium, although much simpler and with fewer features. There is no cluster mesh, just tunnels that connect multiple K8s clusters. The tunnels are encrypted and created on demand. The K8s Services are replicated by Submariner into the remote K8s clusters, so they are resolvable there, and Submariner takes care of routing the traffic to the destination in the remote cluster. Submariner doesn’t support overlapping CIDRs.
We will explore HA with the MySQL Operator for Kubernetes EE and multiple Kubernetes Clusters in a further blog post.
Conclusion
We saw how HA can be added to MySQL deployments on Kubernetes using the MySQL Operator for Kubernetes EE, and how disaster recovery can be performed. The recently added ClusterSet support is one more reason to upgrade to MySQL Enterprise Edition.
