Protect your OKE Workloads with DR-as-a-Service (DRaaS) for Oracle Cloud

April 25, 2022 | 9 minute read
Robert Ronan
Principal Product Manager

 

This is a guest blog written by Renuka Bharambe from the RackWare team.

RackWare SWIFT is a fully automated solution that enables backup and DR between your OKE clusters across regions. SWIFT uses DR policies to schedule backups of your OKE workloads. In the event of an extended outage, RackWare SWIFT allows you to fail over your workloads to a geographically remote cloud location and be up and running in minutes.

 

This blog walks you step-by-step through how to get started with SWIFT to protect your OKE workloads.

 

Step 1:

Log in to your Oracle Cloud Infrastructure account and, from Oracle Cloud Marketplace, launch a RackWare SWIFT instance by selecting the required version and compartment.

 


 

Provide all the required instance parameters and click the Create button to launch the instance.

 

 

Step 2:

Once the instance is in the Running state, connect to it (for additional steps, click here). Log in to the SWIFT server over SSH and run the following command to set a password for the 'admin' user.

 

sudo swiftcli user modify admin --password <password>

 

Alternatively, contact your Org-Admin to set a password for your user.

 

The SWIFT GUI can now be accessed at: https://<public_ip_of_instance>/swift/dashboard/
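As a quick sanity check before opening a browser, the dashboard URL can be probed from a terminal. The following is a minimal sketch and not part of the SWIFT tooling: the function names are made up for illustration, and the `-k` flag assumes the new instance may still be serving a self-signed certificate.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not SWIFT commands.

# Build the SWIFT dashboard URL for a given instance address.
swift_dashboard_url() {
  local host="$1"
  printf 'https://%s/swift/dashboard/\n' "$host"
}

# Probe the dashboard and print the HTTP status code returned by the server.
# -k skips certificate verification (assumption: self-signed certificate);
# drop it once a trusted certificate is installed.
check_swift_dashboard() {
  local url
  url="$(swift_dashboard_url "$1")"
  curl -k -s -o /dev/null -w '%{http_code}\n' --max-time 10 "$url"
}
```

Calling `check_swift_dashboard <public_ip_of_instance>` prints the status code the server returns, which is usually enough to tell whether the GUI is reachable from your network.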

 

 

Step 3:

Configuring TRAIPOD (Transient Rackware Agent Image POD):

TRAI is an image that ships with the SWIFT product. You register it in the private container registry used by your Kubernetes (K8S) clusters. During syncs, this image runs as a Pod and Service combination in your source and target K8S clusters.

The TRAIPOD image, i.e., TRAI, is built with Docker. It is currently a Docker-compatible image, though it should run with all common container runtimes used with K8S (containerd, Docker, etc.).

The TRAI image is deployed with SWIFT. You can find it on the SWIFT server at the following location:

/opt/SWIFT/traipod/rackware-trai-docker.tar.gz

Load this image and push it to the required registry. To allow the clusters to pull the TRAI image, create an image pull secret in both the source and target namespaces. The following command can be used to create the secret:

kubectl create secret docker-registry <secretname> --namespace=<namespacename> --docker-server=<registry-server-name> --docker-username=<username> --docker-password=<password> --docker-email=<email>
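Since the secret has to exist in both the source and the target namespace, it can be convenient to wrap the command above in a small helper and call it once per cluster. This is only a sketch: the function name and the `KUBECTL` override variable are assumptions for illustration, while `--context` is kubectl's standard flag for choosing a kubeconfig context.

```shell
#!/usr/bin/env bash
# Sketch only: the helper name and KUBECTL variable are hypothetical.
# Override KUBECTL (for example with "echo kubectl") to preview the command
# instead of running it.
KUBECTL="${KUBECTL:-kubectl}"

# Create the image pull secret in one namespace of one cluster.
# Arguments: kubeconfig context, namespace, secret name, registry server,
#            registry username, registry password, email.
create_trai_pull_secret() {
  local context="$1" namespace="$2" secret="$3"
  local server="$4" user="$5" password="$6" email="$7"
  $KUBECTL --context "$context" create secret docker-registry "$secret" \
    --namespace="$namespace" \
    --docker-server="$server" \
    --docker-username="$user" \
    --docker-password="$password" \
    --docker-email="$email"
}
```

Call the helper once with the kubeconfig context of the source cluster and once with that of the target cluster, passing the same secret name and registry details each time. Setting `KUBECTL="echo kubectl"` beforehand prints the commands without touching either cluster.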

You provide this TRAIPOD image name/tag and the secret name when setting up a sync, for both source and target.
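The image name/tag entered during sync is whatever name the image was pushed under. A minimal sketch of the load, tag, and push step described above might look like the following. It assumes the Docker CLI is available on the machine holding the archive and that you are already logged in to your registry; the function names and the target image reference are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: load the TRAI archive shipped with SWIFT, re-tag it for a
# private registry, and push it. DOCKER can be overridden for a dry run.
DOCKER="${DOCKER:-docker}"
TRAI_ARCHIVE="${TRAI_ARCHIVE:-/opt/SWIFT/traipod/rackware-trai-docker.tar.gz}"

# Read `docker load` output on stdin and print the loaded image reference.
# `docker load` reports "Loaded image: <name>:<tag>" for tagged images and
# "Loaded image ID: <id>" for untagged ones.
loaded_image_ref() {
  sed -n -e 's/^Loaded image: //p' -e 's/^Loaded image ID: //p' | head -n 1
}

# $1: full target reference in your registry (hypothetical example:
#     registry.example.com/myteam/rackware-trai:latest)
push_trai_image() {
  local target="$1" loaded
  loaded="$($DOCKER load -i "$TRAI_ARCHIVE" | loaded_image_ref)"
  if [ -z "$loaded" ]; then
    echo "could not determine the loaded image name" >&2
    return 1
  fi
  # Re-tag the loaded image for the private registry, then push it.
  $DOCKER tag "$loaded" "$target" && $DOCKER push "$target"
}
```

Parsing the output of `docker load` avoids hard-coding the name stored inside the archive, which this post does not state. The reference passed to `push_trai_image` is the value to enter as the TRAIPOD image name/tag.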

 

 

Step 4:

On the SWIFT GUI login page, download and read the Operations guide and the Prerequisites/Installation guide for detailed step-by-step guidance on getting started.

Log in using the user and password created in Step 2.


 

Step 5:

After logging in, the landing page is displayed.


 

Step 6:

On the Container Cluster Administration page (Container Clusters tab in the left side panel), click the 'Add' button and add the OKE cluster, selecting Oracle as the Cloud Type.


 

Fill in all the required Oracle Cloud Infrastructure account details and click 'Add'.

 

Similarly, add the target cluster to SWIFT.

 

 

Step 7:

Expand the 'Business Continuity & DR' tab in the left side panel and navigate to the DR Policies page.

 

Step 8:

In the 'New' DR policy pop-up window, set the required sync type, schedule and email alerts for the policy to be created.

 


 

Step 9:

Once the policy is created, select it and click the 'Apply' button. The policy can be applied to an existing replication or to a new one. Select Start Immediately to start the syncs right away as per the schedule, or Start Later to start DR at another time.

 

For a new replication, provide the source and target cluster details, including the control port and data port that SWIFT should use for each cluster.

 

 

Step 10:

The replication will start as per the policy schedule and can be seen on the Application Replications -> All Replications page. 

 


 

Step 11:

A successful replication shows the replication status as 'Completed'. With that, we have performed a successful replication.

 

 

Failover and Fallback:

 

The failover operation is the process of switching production to a backup facility (normally your recovery site). That is, when a DR event occurs and the source site is down, the DR site needs to take over. In such cases, you can execute the ‘failover’ operation.

Using SWIFT, you can also perform a drill failover to test whether failover will work when an actual disaster occurs.

A fallback operation is the process of returning production to its original location after a disaster or a scheduled maintenance period.

 

To perform a failover, navigate to the Business Continuity & DR -> DR Policies page, locate the policy associated with the source and click on the failover button:

 

Select the Drill failover checkbox to perform a test failover. A drill failover does not perform an actual failover sync.


 

The failover operation will do the following things:

  • For DR policy of type ‘stage1-2’, a stage2 sync will be performed if the last stage1 sync was successful. If there is no outstanding stage1 to be replicated to DR, then staged syncs will be skipped.
  • For DR policy of type ‘PTS’, PTS sync will be executed. If sync fails, the failover operation will also fail.

After failover is performed, the state of the DR policy will be changed to ‘Failed Over’ and the subsequent scheduling of syncs will be paused.

 

Once the failover is complete and the source has been rebuilt, you can restore the source application and its data using ‘fallback’. Click the fallback button to do so.

 


 

If the policy is applied to multiple syncs, fallback can be applied to selected syncs only.

Click on the FallBack button to start the fallback.

 

Verify that the policy state is shown as Fallen Back.

 


Backup syncs will continue to be scheduled as per the policy schedule. The policy state will change back to Active after a successful sync post fallback.

With these simple steps, protect your OKE workloads by making them disaster proof with SWIFT’s DRaaS solution.

Conclusion

Below are a few links to help you explore more about the RackWare and Oracle partnership.

 

 
