In this post, I cover a great feature of Cluster API: the ability to do a rolling upgrade of your Kubernetes cluster. Cluster API makes the process simple and repeatable. I’ve upgraded a Kubernetes cluster by hand and it wasn’t too difficult, but why do it manually when you can automate it and get the safety of repeatability?

What is Cluster API?

If you’re not familiar with it, Cluster API is a Kubernetes subproject focused on providing declarative APIs and tooling to simplify the provisioning, upgrading, and operating of multiple Kubernetes clusters. As an analogy, think of Cluster API as a Java interface: it exposes Kubernetes-style APIs for managing the infrastructure that a Kubernetes cluster needs.

Back to our analogy: to use a Java interface, you need a class that implements it. Cluster API likewise uses an infrastructure provider model to extend support to multiple infrastructure providers, and almost every major infrastructure platform has one. You can find Oracle’s provider on GitHub and use it to build clusters in Oracle Cloud Infrastructure (OCI).

For a brief introduction to getting started with our provider, check out our blog on how to create and manage Kubernetes clusters on OCI with Cluster API, or read our documentation for more information on getting started. For more about the project itself, check out Kubernetes Cluster API.

Create a new Kubernetes image

To upgrade our nodes, we first need a new node image, which we build with Kubernetes Image Builder. For prerequisites and other setup, follow the more detailed steps in the Building Images section.

We then set the Kubernetes version parameters to a newer version than our cluster is currently running. Right now, the cluster is on 1.22.9, and we want to upgrade to 1.23.6. You can find the current release versions on the Kubernetes site. We edit images/capi/packer/config/kubernetes.json and change the following parameters:

  "kubernetes_deb_version": "1.23.6-00",
  "kubernetes_rpm_version": "1.23.6-0",
  "kubernetes_semver": "v1.23.6",
  "kubernetes_series": "v1.23"

After the config is updated, we use the Ubuntu 20.04 build target to create the new image with Packer:

$ cd <root_of_image_builder_repo>/images/capi
$ PACKER_VAR_FILES=oci.json make build-oci-ubuntu-2004

This command deploys an instance in OCI to build the image. When it finishes, the output includes the image’s OCID. You can also confirm that the image was built by visiting https://console.us-phoenix-1.oraclecloud.com/compute/images. Save this OCID; we use it later.
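
The oci.json file passed to PACKER_VAR_FILES above holds the OCI-specific build settings, such as the compartment and subnet to build in. The following is only a rough sketch; the key names are an assumption on my part, so check the image-builder documentation for Oracle Cloud Infrastructure for the exact variable set, and replace the placeholder values with your own:

$ # NOTE: key names below are assumptions; consult the image-builder OCI docs for the exact set
$ cat > oci.json <<'EOF'
{
  "compartment_ocid": "ocid1.compartment.oc1..<placeholder>",
  "subnet_ocid": "ocid1.subnet.oc1.phx.<placeholder>",
  "availability_domain": "<placeholder>:PHX-AD-1"
}
EOF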

Upgrade my cluster using Cluster API

One of the main goals of Cluster API is to manage the lifecycle (create, scale, upgrade, and destroy) of Kubernetes-conformant clusters using a declarative API. Automating the upgrade process is a big win: I don’t want to cordon and drain nodes by hand to do a rolling update when the tooling can do it for me.

I’m going to assume you already have a management and a workload cluster up and running. If not, follow the getting started guide to create the workload cluster. The following example shows how I created my workload cluster:

...
clusterctl generate cluster oci-cluster-phx --kubernetes-version v1.22.9 \
--target-namespace default \
--control-plane-machine-count=3 \
--from https://github.com/oracle/cluster-api-provider-oci/releases/download/v0.3.0/cluster-template.yaml | kubectl apply -f -
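
Before we upgrade anything, it’s worth a quick sanity check from the management cluster that the workload cluster and its machines are healthy. These are standard Cluster API resources, though the output columns vary slightly by release:

$ kubectl get cluster oci-cluster-phx
$ kubectl get machines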

Now that we have a workload cluster up and running and a new image, we can upgrade. The upgrade process includes the following steps:

  1. Upgrade the control plane.

  2. Upgrade the worker machines.

Before we start, let’s go ahead and check the version of our running workload cluster. To access our workload cluster, we need to export the Kubernetes config from our management cluster with the following command:

$ clusterctl get kubeconfig oci-cluster-phx -n default > oci-cluster-phx.kubeconfig

When we have the kubeconfig file, we can check the version of our workload cluster:

$ kubectl --kubeconfig=oci-cluster-phx.kubeconfig version

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:38:05Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.9", GitCommit:"6df4433e288edc9c40c2e344eb336f63fad45cd2", GitTreeState:"clean", BuildDate:"2022-04-13T19:52:02Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}

The server version is v1.22.9, so let’s change that.
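
The management cluster’s view is also worth a look before we start. Both the KubeadmControlPlane and the MachineDeployment resources report a version, so the following quick check (run against the management cluster; exact columns depend on your Cluster API release) shows what we’re about to change:

$ kubectl get kubeadmcontrolplane,machinedeployment -n default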

Upgrade the control plane

First, let’s make a copy of the machine template for the control plane.

$ kubectl get ocimachinetemplate oci-cluster-phx-control-plane -o yaml > control-plane-machine-template.yaml

We need to modify the following parameters (a sketch of the edited template follows the list):

  • spec.template.spec.imageId: Use the previously created custom image OCID

  • metadata.name: Use a new name. For example: oci-cluster-phx-control-plane-v1-23-6
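
For reference, the edited control-plane-machine-template.yaml ends up looking roughly like this sketch. Keep the apiVersion and kind exactly as they were exported and leave every other field untouched; the image OCID below is a placeholder:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1    # keep whatever apiVersion the export produced
kind: OCIMachineTemplate
metadata:
  name: oci-cluster-phx-control-plane-v1-23-6          # new name, so apply creates a new template
  namespace: default
spec:
  template:
    spec:
      imageId: ocid1.image.oc1.phx.<placeholder>       # OCID of the freshly built 1.23.6 image
      # ...all remaining fields copied unchanged from the exported template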

After the fields are modified, we can apply the template to the management cluster. This action only creates the new machine template; the next step triggers the update.

$ kubectl apply -f control-plane-machine-template.yaml

ocimachinetemplate.infrastructure.cluster.x-k8s.io/oci-cluster-phx-control-plane-v1-23-6 created

We now want to tell the KubeadmControlPlane resource about the new machine template and upgrade the version number. Save this patch file as “kubeadm-control-plane-update.yaml.”

spec:
  machineTemplate:
    infrastructureRef:
      name: oci-cluster-phx-control-plane-v1-23-6
  version: v1.23.6

Then apply the patch:

$ kubectl patch --type=merge KubeadmControlPlane oci-cluster-phx-control-plane --patch-file kubeadm-control-plane-update.yaml

This command triggers the rolling update of the control plane. We can watch the progress of the cluster through the clusterctl command.

$ clusterctl describe cluster oci-cluster-phx                                                                                                                   
NAME                                                                READY  SEVERITY  REASON                   SINCE  MESSAGE
Cluster/oci-cluster-phx                                             False  Warning   RollingUpdateInProgress  98s    Rolling 3 replicas with outdated spec (1 replicas up to date)
├─ClusterInfrastructure - OCICluster/oci-cluster-phx                True                                      4h50m
├─ControlPlane - KubeadmControlPlane/oci-cluster-phx-control-plane  False  Warning   RollingUpdateInProgress  98s    Rolling 3 replicas with outdated spec (1 replicas up to date)
│ └─4 Machines...                                                   True                                      9m17s  See oci-cluster-phx-control-plane-ptg4m, oci-cluster-phx-control-plane-sg67j, ...
└─Workers
  └─MachineDeployment/oci-cluster-phx-md-0                          True                                      10m
    └─3 Machines...                                                 True                                      4h44m  See oci-cluster-phx-md-0-8667c8d69-47nh9, oci-cluster-phx-md-0-8667c8d69-5r4zc, ...

We can also see the rolling update starting to happen with new instances being created:

$ kubectl --kubeconfig=oci-cluster-phx.kubeconfig get nodes -A

NAME                                  STATUS     ROLES                  AGE     VERSION
oci-cluster-phx-control-plane-464zs   Ready      control-plane,master   4h40m   v1.22.5
oci-cluster-phx-control-plane-7vdxp   NotReady   control-plane,master   27s     v1.23.6
oci-cluster-phx-control-plane-dhxml   Ready      control-plane,master   4h48m   v1.22.5
oci-cluster-phx-control-plane-dmk8j   Ready      control-plane,master   4h44m   v1.22.5
oci-cluster-phx-md-0-cnrbf            Ready      <none>                 4h44m   v1.22.5
oci-cluster-phx-md-0-hc6fj            Ready      <none>                 4h45m   v1.22.5
oci-cluster-phx-md-0-nc2g9            Ready      <none>                 4h44m   v1.22.5
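
Rather than rerunning the command, you can append -w to stream node changes as the rollout progresses:

$ kubectl --kubeconfig=oci-cluster-phx.kubeconfig get nodes -A -w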

Before an old control plane instance is deleted, it is cordoned and drained as expected:

oci-cluster-phx-control-plane-dmk8j   NotReady,SchedulingDisabled   control-plane,master   4h52m   v1.22.5

This process takes about 15 minutes. When all control plane nodes are upgraded, you can see the new version using kubectl version:

kubectl --kubeconfig=oci-cluster-phx.kubeconfig version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:38:05Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-04-14T08:43:11Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}
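
Back on the management cluster, the Machine objects tell the same story. Control plane Machines carry the standard cluster.x-k8s.io/control-plane label, so you can filter to just those (the exact output columns vary by Cluster API release):

$ kubectl get machines -l cluster.x-k8s.io/control-plane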

Upgrade the worker nodes

After upgrading the control plane nodes, we can now upgrade the worker nodes. First, we need to copy the machine template for the worker nodes:

$ kubectl get ocimachinetemplate oci-cluster-phx-md-0 -o yaml > worker-machine-template.yaml

Then modify the following parameters:

  • spec.template.spec.imageId: Use the previously created custom image OCID.

  • metadata.name: Add a new name. For example: oci-cluster-phx-md-0-v1-23-6

When the fields are modified, we apply the template to the management cluster. As before, this action only creates the new machine template; the next step starts the update.

$ kubectl apply -f worker-machine-template.yaml

ocimachinetemplate.infrastructure.cluster.x-k8s.io/oci-cluster-phx-md-0-v1-23-6 created

We now want to modify the MachineDeployment for the worker nodes with the new resource we just created. Save this patch file as “worker-machine-deployment-update.yaml.”

spec:
  template:
    spec:
      infrastructureRef:
        name: oci-cluster-phx-md-0-v1-23-6
      version: v1.23.6

Then apply the patch, which triggers the rolling update of the machine deployment:

$ kubectl patch --type=merge MachineDeployment oci-cluster-phx-md-0 --patch-file worker-machine-deployment-update.yaml

Again, we can watch the progress of the cluster with the clusterctl command. Unlike the control plane, though, the MachineDeployment handles updating the worker machines, so clusterctl describe cluster only shows the machine deployment being updated. If you want to watch the rolling update happen, with new instances being created, use the following command:

$ kubectl --kubeconfig=oci-cluster-phx.kubeconfig get nodes -A

...

oci-cluster-phx-md-0-z59t8                   Ready,SchedulingDisabled   <none>                 55m    v1.22.5

oci-cluster-phx-md-0-z59t8                   NotReady,SchedulingDisabled   <none>                 56m     v1.22.5
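
Under the hood, the MachineDeployment drives the rollout through MachineSets, so on the management cluster you can also watch the old MachineSet scale down as the new one scales up (filtered here with the standard cluster-name label):

$ kubectl get machinesets -l cluster.x-k8s.io/cluster-name=oci-cluster-phx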

If you have pods on the worker machines, you can see them getting migrated to the new machines:

$ kubectl --kubeconfig=oci-cluster-phx.kubeconfig get pods

NAME                          READY   STATUS       AGE     NODE
echoserver-55587b4c46-2q5hz   1/1     Terminating  89m    oci-cluster-phx-md-0-z59t8
echoserver-55587b4c46-4x72p   1/1     Running      5m24s  oci-cluster-phx-md-0-v1-23-6-bqs8l
echoserver-55587b4c46-tmj4b   1/1     Running      29s    oci-cluster-phx-md-0-v1-23-6-btjzs
echoserver-55587b4c46-vz7gm   1/1     Running      89m    oci-cluster-phx-md-0-z79bd
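
The echoserver pods above come from a small test deployment that was already running in the cluster. If you want something similar to watch during your own upgrade, a throwaway deployment along these lines does the job (the image and replica count are purely illustrative, not necessarily what produced the output above):

$ kubectl --kubeconfig=oci-cluster-phx.kubeconfig create deployment echoserver \
    --image=k8s.gcr.io/echoserver:1.4 --replicas=4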

In our example, the workers are updated after about 10 to 15 minutes; the more nodes you have, the longer the rolling update takes. You can check the version of all the nodes to confirm:

$ kubectl --kubeconfig=oci-cluster-phx.kubeconfig get nodes -A

NAME                                          STATUS                     ROLES                  AGE     VERSION
oci-cluster-phx-control-plane-v1-23-6-926gx   Ready                      control-plane,master   18m     v1.23.6
oci-cluster-phx-control-plane-v1-23-6-vfp5g   Ready                      control-plane,master   24m     v1.23.6
oci-cluster-phx-control-plane-v1-23-6-vprqc   Ready                      control-plane,master   30m     v1.23.6
oci-cluster-phx-md-0-v1-23-6-bqs8l            Ready                      <none>                 9m58s   v1.23.6
oci-cluster-phx-md-0-v1-23-6-btjzs            Ready                      <none>                 5m37s   v1.23.6
oci-cluster-phx-md-0-v1-23-6-z79bd            Ready                      <none>                 71s     v1.23.6

MachineDeployment strategies

Cluster API offers two MachineDeployment strategies: RollingUpdate and OnDelete. The example we followed uses RollingUpdate. With this strategy, you can modify maxSurge and maxUnavailable. Both the maxSurge and maxUnavailable values can be an absolute number (for example, 5) or a percentage of desired machines (for example, 10%).
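
Both knobs live under spec.strategy in the MachineDeployment. For example, to bring up one extra machine at a time while never dropping below the desired worker count, you could set something like this (the values are illustrative):

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0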

The other strategy is OnDelete. This method requires you to fully delete an old machine to drive the update; when the machine is fully deleted, a new one comes up in its place.
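
In practice, that means deleting old Machine objects on the management cluster one at a time; the machine name below is a placeholder for one of your outdated workers:

$ kubectl delete machine oci-cluster-phx-md-0-<old-machine-id>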

To better understand how MachineDeployments work in Cluster API, check out the documentation on MachineDeployments.

Conclusion

We created a new image and rolled out an upgrade to our cluster’s control plane and worker nodes, all by making a few modifications to our configuration. Whether a cluster is small or large, the upgrade process is the same. If that isn’t a selling point for Cluster API, I don’t know what is.

The Cluster API project is growing rapidly, with many new features on the way. The OCI Cluster API provider team is working hard to bring you all the great new features Cluster API has to offer, such as ClusterClass, MachinePools, and ManagedClusters.

For updates on the cluster-api-provider-oci, follow the GitHub repo. We’re excited to contribute to this open source project and hope that you are too.