Kubernetes has become the de facto standard for enterprises running containerized deployments. Oracle offers Oracle Container Engine for Kubernetes (OKE), a fully managed, scalable, and highly available service that you can use to deploy your containerized workloads in Oracle Cloud Infrastructure (OCI). While OKE makes it easy to run containerized workloads at scale, the business requirements that are essential for critical workloads, such as disaster recovery and backup, depend on each customer's needs.
A robust disaster recovery and backup solution must back up both the cluster metadata definitions and the data that persists in the Kubernetes cluster. While many technologies are available in the marketplace, this guide provides a solution for backing up OKE clusters using the open source tool Velero. You can extend the Velero-based solution to achieve disaster recovery and to migrate your containerized Kubernetes workloads from other providers to OCI. You can also use Kasten with OKE for backup and disaster recovery use cases, as explained in a separate blog post.
Velero uses Restic to back up persistent volumes. Restic is a lightweight cloud native backup program that the backup industry has widely adopted. Velero creates Kubernetes objects to enable backup and restore, including deployments, Restic DaemonSets, and custom resource definitions.
Set up access to the OKE cluster and install kubectl locally. Verify connectivity by listing the cluster nodes:
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
10.0.0.4 Ready node 2d21h v1.18.10 10.0.0.4 Oracle Linux Server 7.9 5.4.17-2036.100.6.1.el7uek.x86_64 docker://19.3.11
10.0.0.5 Ready node 2d21h v1.18.10 10.0.0.5 Oracle Linux Server 7.9 5.4.17-2036.100.6.1.el7uek.x86_64 docker://19.3.11
Depending on your client environment (Linux, Mac, or Windows), the installation steps vary. You can also install Velero from OCI Cloud Shell.
On Mac, you can install Velero with the following command:
brew install velero
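On Linux or in OCI Cloud Shell, a minimal sketch is to download a release tarball from the Velero GitHub releases page and place the binary on your PATH. The version shown here is an assumption; pick the release you need:
curl -LO https://github.com/vmware-tanzu/velero/releases/download/v1.5.3/velero-v1.5.3-linux-amd64.tar.gz
tar -xzf velero-v1.5.3-linux-amd64.tar.gz
sudo mv velero-v1.5.3-linux-amd64/velero /usr/local/bin/
velero version --client-only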
Now that the Velero CLI is installed locally, you can have it create the appropriate Kubernetes resources in the cluster with the following command:
velero install \
--provider aws \
--bucket velero \
--prefix oke \
--use-restic \
--secret-file /Users/xxxx/velero/velero/credentials-velero \
--backup-location-config s3Url=https://tenancyname.compat.objectstorage.region.oraclecloud.com,region=region,s3ForcePathStyle="true" \
--plugins velero/velero-plugin-for-aws:v1.1.0 \
--use-volume-snapshots=false
OCI Object Storage exposes an S3-compatible API, which is why the AWS provider and the AWS object storage plugin are used in the previous code block for backups. This compatibility also helps organizations migrate their EKS workloads to OCI. The --use-restic parameter enables Velero to use Restic to back up persistent volumes. On install, Velero creates a few Kubernetes resources, which are placed in the velero namespace by default.
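The --bucket value refers to an existing Object Storage bucket, and the Object Storage namespace (returned by oci os ns get) is the value used in the s3Url. If you still need to create the bucket, a sketch with the OCI CLI might look like the following; the compartment OCID is a placeholder:
oci os ns get
oci os bucket create --name velero --compartment-id <compartment-ocid>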
The secret file refers to the credentials that Velero uses to write backups to the OCI Object Storage bucket. As described in Managing User Credentials, you must generate these credentials as a customer secret key. The user that writes backups to Object Storage needs permission to manage the bucket into which the backups are written.
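One way to generate the customer secret key is with the OCI CLI (a sketch; the user OCID and display name are placeholders). The id and key fields in the response become the access key and secret key in the credentials file:
oci iam customer-secret-key create --user-id <user-ocid> --display-name velero-backup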
A sample credentials file looks like the following block:
aws_access_key_id=40xxxxxxxxxxxxxxxxxxxxxxxxxxxxxa32f8494a
aws_secret_access_key=YyuSZxxxxxxxxxxxxxxxxxxxxxxxxxxxxxRDYzNnv0c=
kubectl get pods --namespace velero -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
restic-2lp99 1/1 Running 2 23h 10.234.0.26 10.0.0.5
restic-6jz9k 1/1 Running 0 23h 10.234.0.137 10.0.0.4
velero-84f5449954-46hnk 1/1 Running 0 23h 10.234.0.136 10.0.0.4
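You can also confirm that Velero registered the backup storage location that points at the bucket (a sketch; the exact output varies by Velero version):
velero backup-location get
kubectl get backupstoragelocations -n velero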
Now that Velero is enabled, let’s create a simple Nginx pod that uses a persistent volume claim. The steps are as follows:
1. Create the storage class (cluster resource).
2. Create the persistent volume (cluster resource).
3. Create the namespace where the pod and persistent volume claim (PVC) reside.
4. Create the PVC (namespace scoped).
5. Create the pod.
To create the storage class, apply the following manifest:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: oci-fss1
provisioner: oracle.com/oci-fss
parameters:
  # Insert mount target from the FSS here
  mntTargetId: ocid1.mounttarget.oc1.us_ashburn_1.aaaaaa4np2sra5lqmjxw2llqojxwiotboaww25lxxxxxxxxxxxxxiljr
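Assuming you save the manifest as, for example, oci-fss-storageclass.yaml (the filename is arbitrary), apply it with kubectl. The same pattern applies to the persistent volume, PVC, and pod manifests that follow:
kubectl apply -f oci-fss-storageclass.yaml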
In the storage class, we present the OCI File Storage service mount target to Kubernetes. Read more about the File Storage service in the documentation.
To create the required namespace, run the following command:
kubectl create namespace testing
Create the persistent volume by applying the following manifest:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: oke-fsspv1
spec:
  storageClassName: oci-fss1
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  mountOptions:
    - nosuid
  nfs:
    # Replace this with the IP of your FSS file system in OCI
    server: 10.0.0.3
    # Replace this with the path of your FSS file system in OCI
    path: /testpv
    readOnly: false
To create the persistent volume claim, apply the following manifest:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: oke-fsspvc1
spec:
  storageClassName: oci-fss1
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Gi
  volumeName: oke-fsspv1
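The pod manifest used in this walkthrough isn't reproduced above. A minimal sketch of an Nginx pod that mounts the PVC at /usr/share/nginx/html (the path seen later in the pod) might look like the following; the container image and the pod volume name are assumptions, and the volume name must match the Velero annotation added later:
apiVersion: v1
kind: Pod
metadata:
  name: oke-fsspod3
  namespace: testing
spec:
  containers:
    - name: nginx
      image: nginx:latest   # assumed image
      ports:
        - containerPort: 80
      volumeMounts:
        - name: oke-fsspv1
          mountPath: /usr/share/nginx/html
  volumes:
    - name: oke-fsspv1    # assumed pod volume name; referenced by the Velero backup-volumes annotation
      persistentVolumeClaim:
        claimName: oke-fsspvc1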
You can verify that the pod is running with the following command:
kubectl get pods --namespace testing -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
oke-fsspod3 1/1 Running 0 22h 10.234.0.28 10.0.0.5
Let’s verify that the pod is using the PVC and that File Storage is available and mounted in the pod.
kubectl exec -it oke-fsspod3 -n testing -- bash
root@oke-fsspod3:/# mount |egrep -i nfs
10.0.0.3:/testpv on /usr/share/nginx/html type nfs
root@oke-fsspod3:/# cd /usr/share/nginx/html
root@oke-fsspod3:/usr/share/nginx/html# ls -lrt *.dmp|wc -l
65
root@oke-fsspod3:/usr/share/nginx/html# ls -lrt *.dmp|head -5
-rw-r--r--. 1 root root 75853 Feb 8 04:37 randomfile1.dmp
-rw-r--r--. 1 root root 77341 Feb 8 04:37 randomfile2.dmp
-rw-r--r--. 1 root root 76599 Feb 8 04:37 randomfile3.dmp
-rw-r--r--. 1 root root 75066 Feb 8 04:38 randomfile4.dmp
-rw-r--r--. 1 root root 75008 Feb 8 04:38 randomfile5.dmp
Velero expects the pod to be annotated with the name of each volume to back up. You can add the annotation with the following command:
kubectl -n testing annotate pod/oke-fsspod3 backup.velero.io/backup-volumes=oke-fsspv1
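To confirm that the annotation was applied, you can inspect the pod's annotations (a sketch using jsonpath):
kubectl -n testing get pod oke-fsspod3 -o jsonpath='{.metadata.annotations}'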
Back up the OKE cluster by issuing the following command:
./velero backup create backup-full-cluster-demo --default-volumes-to-restic=true
Backup request "backup-full-cluster-demo" submitted successfully.
Run `velero backup describe backup-full-cluster-demo` or `velero backup logs backup-full-cluster-demo` for more details.
./velero backup describe backup-full-cluster-demo
Name: backup-full-cluster-demo
Namespace: velero
Labels: velero.io/storage-location=default
Annotations: velero.io/source-cluster-k8s-gitversion=v1.18.10
velero.io/source-cluster-k8s-major-version=1
By default in the current release, Velero tries to restore persistent volumes with dynamic provisioning, so you want to back up the statically created persistent volume separately. You can accomplish this task with the following command:
./velero backup create backup-pv-only-demo --default-volumes-to-restic=true --include-resources pv
Backup request "backup-pv-only-demo" submitted successfully.
Run `velero backup describe backup-pv-only-demo` or `velero backup logs backup-pv-only-demo` for more details.
% ./velero backup describe backup-pv-only-demo --details
Name: backup-pv-only-demo
Namespace: velero
Labels: velero.io/storage-location=default
Annotations: velero.io/source-cluster-k8s-gitversion=v1.18.10
velero.io/source-cluster-k8s-major-version=1
velero.io/source-cluster-k8s-minor-version=18
Phase: Completed
To create a case for restoring, let’s delete the PVC, pod, namespace, and cluster-scoped persistent volume. Accidental deletion, operator error, or a disaster could cause this kind of loss. To verify that the data is restored, let’s also delete the random files that were created in the pod:
kubectl exec -it oke-fsspod3 -n testing -- bash
root@oke-fsspod3:/# cd /usr/share/nginx/html
root@oke-fsspod3:/usr/share/nginx/html#
root@oke-fsspod3:/usr/share/nginx/html# ls -lrt *.dmp |wc -l
65
root@oke-fsspod3:/usr/share/nginx/html# rm *.dmp
root@oke-fsspod3:/usr/share/nginx/html# ls -lrt *.dmp |wc -l
ls: cannot access '*.dmp': No such file or directory
0
To delete the pod and associated resources, run the following command:
~ % kubectl delete pod oke-fsspod3 -n testing
pod "oke-fsspod3" deleted
~ % kubectl delete pvc oke-fsspvc1 -n testing
persistentvolumeclaim "oke-fsspvc1" deleted
~ % kubectl delete namespace testing
namespace "testing" deleted
~ % kubectl delete pv oke-fsspv1
persistentvolume "oke-fsspv1" deleted
velero % kubectl get pv
No resources found
The pod and the persistent volume have been removed. Now, we can restore. Let’s check on the existing backups and issue the appropriate restore commands:
velero % ./velero backup get
NAME STATUS ERRORS WARNINGS CREATED EXPIRES STORAGE LOCATION SELECTOR
backup-full-cluster-demo Completed 0 0 2021-02-07 21:46:23 -0600 CST 28d default
backup-full-cluster-demo-1 Completed 0 0 2021-02-07 21:52:25 -0600 CST 28d default
backup-full-cluster-demo-2 Completed 0 0 2021-02-07 22:44:09 -0600 CST 29d default
backup-pv-demo-1 Completed 0 0 2021-02-07 21:55:08 -0600 CST 29d default
backup-pv-demo-2 Completed 0 0 2021-02-07 22:46:23 -0600 CST 29d default
Restore the persistent volumes with the following command:
velero % ./velero restore create --from-backup backup-pv-demo-2
Restore request "backup-pv-demo-2-20210208215719" submitted successfully.
Run `velero restore describe backup-pv-demo-2-20210208215719` or `velero restore logs backup-pv-demo-2-20210208215719` for more details.
velero % ./velero restore describe backup-pv-demo-2-20210208215719
Name: backup-pv-demo-2-20210208215719
Namespace: velero
Labels:
Annotations:
Phase: Completed
Started: 2021-02-08 21:57:22 -0600 CST
Completed: 2021-02-08 21:57:23 -0600 CST
Backup: backup-pv-demo-2
Namespaces:
Included: all namespaces found in the backup
Excluded:
Resources:
Included: *
Excluded: nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
Cluster-scoped: auto
Namespace mappings:
Label selector:
Restore PVs: auto
% kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
oke-fsspv1 100Gi RWX Retain Available testing/oke-fsspvc1 oci-fss1 31s
Let’s restore the cluster now. While we need to restore only one pod in this example, we issue a full cluster restore to demonstrate the cluster restore capability.
% ./velero restore create --from-backup backup-full-cluster-demo-2
Restore request "backup-full-cluster-demo-2-20210208220018" submitted successfully.
Run `velero restore describe backup-full-cluster-demo-2-20210208220018` or `velero restore logs backup-full-cluster-demo-2-20210208220018` for more details.
% ./velero restore describe backup-full-cluster-demo-2-20210208220018
Name: backup-full-cluster-demo-2-20210208220018
Namespace: velero
Labels:
Annotations:
Phase: Completed
Started: 2021-02-08 22:00:20 -0600 CST
Completed: 2021-02-08 22:01:04 -0600 CST
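If you only need the testing namespace back, Velero can also scope the restore to that namespace instead of restoring the whole cluster (a sketch using the same backup):
./velero restore create --from-backup backup-full-cluster-demo-2 --include-namespaces testing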
% kubectl get pods --namespace testing -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
oke-fsspod3 1/1 Running 0 12m 10.234.0.30 10.0.0.5
% kubectl exec -it oke-fsspod3 -n testing -- bash
root@oke-fsspod3:/# cd /usr/share/nginx/html
root@oke-fsspod3:/usr/share/nginx/html# ls -lrt *.dmp |wc -l
65
As verified in the previous block, the pod and the associated persistent volume are restored. We’ve tested this process with Kubernetes v1.18.10 and v1.17.13. For more details, refer to the documentation.
This blog post focuses on enabling basic backup and recovery of OKE clusters that use persistent volumes. As with most other backup tools, Velero lets you schedule periodic automated backups (see the sketch after the following list). You can also use Velero in the following use cases:
Store backups in an Object Storage bucket in a different region to enable regional disaster recovery for your OKE cluster. This approach must conform with your organization’s data residency requirements.
Migrate your Kubernetes deployments from other cloud providers to OKE to utilize the performance, security, and price benefits of OCI.
Migrate from your on-premises Kubernetes.
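For example, a sketch of a daily schedule that backs up the cluster at 03:00 and keeps backups for 30 days might look like the following; the schedule name, cron expression, and TTL are assumptions:
velero schedule create daily-oke-backup --schedule="0 3 * * *" --default-volumes-to-restic=true --ttl 720h0m0s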
Combining the robustness and scalability of Oracle Container Engine for Kubernetes with Velero’s disaster recovery capabilities helps organizations realize the production-ready nature of the Kubernetes platform.