

Four Key Deployment Learnings of Oracle Cloud Guard at Oracle Data Cloud

Daniyal Nadeem
Security Operations Analyst
This is a syndicated post; view the original post here.

Oracle Cloud Guard is an OCI service that provides Cloud Security Posture Management (CSPM). This article describes how our team at Oracle Data Cloud (ODC) onboarded Cloud Guard, and the value and visibility it has given us across our OCI footprint.

Oracle Data Cloud and Cloud Guard

At ODC, we rely on OCI, Oracle's next-generation cloud infrastructure, to run our workloads. OCI provides a broad range of services at a very competitive cost, which delivers real value to our business. However, deploying cloud services presents some unique security challenges, and to tackle some of them we looked at adopting Cloud Guard across our OCI estate.

Cloud Guard has given us much greater visibility into our OCI environment. It lets us detect issues as they happen, improving our detection and response times. One challenge we had was identifying misconfigurations within our environment. As the teams within ODC work hard to bring new applications and systems into our OCI environment, human or system error can lead to misconfigurations. If these are not identified and dealt with, they can result in a security incident. Cloud Guard has also allowed us to monitor our IAM service to help ensure that key and password management aligns with our policies and standards.

We have several compartments in our tenancy, each containing a set of resources and serving a different purpose: some contain highly elastic compute instances, while others focus on network resources.

How It Works in a Nutshell

Cloud Guard works based on predefined rules that are part of Detector Recipes, which come in two types: OCI Activity and OCI Configuration.

OCI Activity Detectors identify potential problems based on activity. These range from something as simple as a change in an instance's status to more complex detections, such as connections from suspicious IPs.

OCI Configuration Detectors identify configurations that an attacker could exploit to gain unauthorised access to systems. These range from IAM configurations to publicly exposed instances.

These detector recipes are applied to a Target, which defines the scope in which the recipes operate. A target can be any compartment within your tenancy, and you can create one from the Targets page in the console.

A target's scope includes all of its child compartments. Therefore, defining “root” as a target means that every compartment under the root is included, unless another target is defined lower in the hierarchy, in which case that target applies to its own subtree instead. The IAM detectors can only trigger if they are applied to the “root” compartment.
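The scoping rule above can be illustrated with a short Python sketch. It assumes a hypothetical in-memory map from each compartment to its parent and a set of compartments designated as targets, then walks up the hierarchy and returns the nearest compartment that is itself a target. The compartment names are made up, and this only models the scoping behaviour described here, not how Cloud Guard implements it.

```python
from typing import Dict, Optional, Set


def effective_target(compartment: str,
                     parents: Dict[str, Optional[str]],
                     targets: Set[str]) -> Optional[str]:
    """Return the nearest compartment, starting from `compartment` itself and
    walking up towards the root, that is designated as a target."""
    current: Optional[str] = compartment
    while current is not None:
        if current in targets:
            return current
        # Move one level up; the root (or an unknown compartment) has no parent.
        current = parents.get(current)
    return None  # not covered by any target


if __name__ == "__main__":
    # Hypothetical tenancy: root -> compartment-x, root -> compartment-y -> app-db
    parents = {
        "root": None,
        "compartment-x": "root",
        "compartment-y": "root",
        "app-db": "compartment-y",
    }
    targets = {"root", "compartment-y"}
    for name in ("compartment-x", "compartment-y", "app-db"):
        print(name, "->", effective_target(name, parents, targets))
    # compartment-x -> root
    # compartment-y -> compartment-y
    # app-db -> compartment-y
```

Because a child target takes precedence over the root target, a recipe attached to compartment-y governs app-db, while everything else still falls back to the root target.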

When a detector rule in a recipe is triggered, it creates a problem (an alert). All problems, detectors, targets, and their configurations can be queried through the Cloud Guard APIs.

Responders address identified Problems for Targets based on the Detector – for example, by stopping an instance, suspending a user, or making a bucket private.

Challenges We Faced During Onboarding

When we first onboarded Cloud Guard, we enabled all the pre-built detector rules in the tool. This resulted in Cloud Guard detecting over 150,000 problems at the start! Surely our tenancy wasn’t that bad!

We realised that enabling all the default rules without any additional configuration may not be the best way to set up Cloud Guard. Many of the alerts were caused by how we had applied Cloud Guard without any context, and were not actual security incidents.

Our console after enabling all detection rules

The sheer variety of problems that Cloud Guard detected showed us how we could onboard the service better: by understanding the types of detections offered and customising them to our needs. With 150,000+ alerts, we could not review every one, so the best course of action was to clear (Resolve) all the alerts temporarily, remove all detector rules, and start again, but this time with a plan.
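Before clearing problems in bulk, it can help to see which rules and compartments generate most of the noise. The following is a minimal Python sketch, assuming problems have been exported as plain dictionaries with hypothetical `detector_rule_id` and `compartment_id` keys (adjust these to whatever your export actually contains); it counts problems per (rule, compartment) pair with `collections.Counter`.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def noisiest_sources(problems: Iterable[Mapping[str, str]],
                     top: int = 5) -> List[Tuple[Tuple[str, str], int]]:
    """Count problems per (detector rule, compartment) and return the `top` pairs."""
    counts = Counter(
        (p["detector_rule_id"], p["compartment_id"]) for p in problems
    )
    return counts.most_common(top)


if __name__ == "__main__":
    # Hypothetical export: one rule firing repeatedly in an elastic compartment.
    exported = (
        [{"detector_rule_id": "INSTANCE_TERMINATED", "compartment_id": "compartment-x"}] * 3
        + [{"detector_rule_id": "BUCKET_PUBLIC", "compartment_id": "compartment-y"}]
    )
    for (rule, compartment), count in noisiest_sources(exported):
        print(f"{count:>6}  {rule}  {compartment}")
```

The rule IDs and compartment names are illustrative only; the point is that grouping by rule and compartment together surfaces a rule that is noisy in one place but may be valuable elsewhere.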

How We Addressed These Challenges

We wrote a Python script that:

  • Ran a loop that pulled 20 problems at a time
  • Saved them to a JSON file
  • Formatted the data into the input the Cloud Guard API accepts
  • Sent the OCIDs of those 20 problems, with a status change to “RESOLVED”, to the Cloud Guard API
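The steps above can be sketched in Python as follows. This is not our original script: to keep it self-contained, the calls to Cloud Guard are replaced by two hypothetical callables, `fetch_open` (returns up to `limit` open problems as dictionaries with an `id` key) and `send` (submits one status-change request). The request body shape (`status`, `problemIds`, `comment`) is an assumption modelled on a bulk status-update request; check it against the Cloud Guard API reference before use.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

BATCH_SIZE = 20  # per-request limit we hit at the time of writing

Problem = Dict[str, str]


def build_resolve_payload(problems: List[Problem], comment: str) -> Dict[str, object]:
    """Format one batch as a status-change request (assumed body shape)."""
    return {
        "status": "RESOLVED",
        "problemIds": [p["id"] for p in problems],
        "comment": comment,
    }


def resolve_in_batches(fetch_open: Callable[[int], List[Problem]],
                       send: Callable[[Dict[str, object]], None],
                       out_dir: Path,
                       comment: str = "Bulk reset before staged re-onboarding",
                       max_batches: int = 100_000) -> int:
    """Resolve open problems BATCH_SIZE at a time; return how many were submitted."""
    out_dir.mkdir(parents=True, exist_ok=True)
    sent = 0
    for index in range(max_batches):  # upper bound guards against a stuck loop
        problems = fetch_open(BATCH_SIZE)           # 1. pull up to 20 open problems
        if not problems:
            break
        batch_file = out_dir / f"batch_{index:05d}.json"
        batch_file.write_text(json.dumps(problems, indent=2))  # 2. save to JSON
        payload = build_resolve_payload(problems, comment)     # 3. format for the API
        send(payload)                                          # 4. submit RESOLVED status
        sent += len(problems)
    return sent


if __name__ == "__main__":
    # Dry run against an in-memory stand-in for the tenancy's open problems.
    open_problems = [{"id": f"problem-{i}"} for i in range(45)]

    def fetch_open(limit: int) -> List[Problem]:
        return open_problems[:limit]

    def send(payload: Dict[str, object]) -> None:
        done = set(payload["problemIds"])
        open_problems[:] = [p for p in open_problems if p["id"] not in done]

    with tempfile.TemporaryDirectory() as tmp:
        print(resolve_in_batches(fetch_open, send, Path(tmp)))  # 45 (batches of 20, 20, 5)
```

Passing the Cloud Guard calls in as functions makes it easy to dry-run the loop against a local list, as in the `__main__` block, before pointing it at a real tenancy. The saved JSON files also double as a record of exactly which problems were resolved.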

We could only process 20 problems at a time because of a limitation, at the time of writing, on how many problems the console and API can bulk-resolve at once. The Cloud Guard engineering team is improving this API to allow larger bulk operations in the near future.

Resolved vs Dismissed Problems

I would like to point out a key difference here between “Resolved” and “Dismissed”.

When you “Resolve” a problem, you tell Cloud Guard that the problem has been fixed. If Cloud Guard detects the same problem again, it reopens as a problem.

On the other hand, if you accept the risk of a problem presented by Cloud Guard and would like to silence that problem on that resource, the “Dismiss” feature can be used. This effectively tells Cloud Guard to ignore this problem, even if it reappears on that resource. Cloud Guard will honour that dismissal for up to 90 days before it may reactivate the problem to be addressed.
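The difference between the two states can be summarised as a small state model. The sketch below is a simplification under stated assumptions: it treats the “up to 90 days” dismissal as a fixed 90-day window, after which a re-detected problem reopens, and the `ProblemState` class and `on_redetected` method are hypothetical names, not part of any Cloud Guard API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

# Simplification: the article says a dismissal is honoured "for up to 90 days";
# here it is modelled as a fixed 90-day window.
DISMISSAL_WINDOW = timedelta(days=90)


class Status(Enum):
    OPEN = "OPEN"
    RESOLVED = "RESOLVED"
    DISMISSED = "DISMISSED"


@dataclass
class ProblemState:
    status: Status = Status.OPEN
    dismissed_at: Optional[datetime] = None

    def resolve(self) -> None:
        """Claim the problem is fixed; a later detection will reopen it."""
        self.status = Status.RESOLVED
        self.dismissed_at = None

    def dismiss(self, now: datetime) -> None:
        """Accept the risk; later detections are ignored until the window ends."""
        self.status = Status.DISMISSED
        self.dismissed_at = now

    def on_redetected(self, now: datetime) -> Status:
        """Apply a new detection of the same problem on the same resource."""
        if self.status is Status.RESOLVED:
            self.status = Status.OPEN
        elif (self.status is Status.DISMISSED
              and self.dismissed_at is not None
              and now - self.dismissed_at >= DISMISSAL_WINDOW):
            self.status = Status.OPEN
            self.dismissed_at = None
        return self.status


if __name__ == "__main__":
    t0 = datetime(2021, 1, 1)
    fixed = ProblemState()
    fixed.resolve()
    print(fixed.on_redetected(t0 + timedelta(days=1)))      # Status.OPEN
    accepted = ProblemState()
    accepted.dismiss(t0)
    print(accepted.on_redetected(t0 + timedelta(days=30)))  # Status.DISMISSED
```

This is also why we used Resolve, not Dismiss, for our bulk clean-up: resolved problems come back if the underlying issue still exists, so nothing real is hidden.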

Always be certain before dismissing a problem. Peer review, an approved process, and an audit trail will help you manage dismissed problems effectively.

Our console after resolving all the alerts

Detections Through Cloud Guard

Let’s look at a couple of examples where rolling out detection rules selectively brought us more value than rolling them out across our entire estate all at once.

One of the detections Cloud Guard provides flags a compute instance being terminated. One compartment in our tenancy (Compartment X) has hundreds of instances spun up and terminated during the day, depending on the traffic and load on those instances. With that detector rule enabled by default on that compartment, Cloud Guard raised a large number of noisy problems. One of the steps we took was to remove this rule from Compartment X.

However, this detector rule brings us value in another compartment (Compartment Y) where we have our mission-critical servers. Any of those servers being terminated without proper change control will most likely be something very unusual and worth investigating. This is one of the customisable features of Cloud Guard, where different targets can contain different sets of detection rules. We can also customise some of the detection rules to trigger based on certain attributes. This feature can also be used to limit the scope of your detector recipes.
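Per-target rule selection can be pictured as a lookup of which rules are enabled for each target, optionally with a condition on resource attributes. The Python sketch below assumes hypothetical rule IDs (`INSTANCE_TERMINATED`, `BUCKET_PUBLIC`) and a hypothetical `tier` tag; in Cloud Guard itself this is configured in the detector recipe attached to each target, not in code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Mapping, Set

Resource = Mapping[str, str]


@dataclass
class TargetRules:
    """Hypothetical per-target rule settings: disabled rules plus optional conditions."""
    disabled: Set[str] = field(default_factory=set)
    conditions: Dict[str, Callable[[Resource], bool]] = field(default_factory=dict)

    def should_raise(self, rule_id: str, resource: Resource) -> bool:
        if rule_id in self.disabled:
            return False  # rule removed for this target
        condition = self.conditions.get(rule_id)
        # No condition means the rule fires for every matching resource.
        return condition(resource) if condition is not None else True


if __name__ == "__main__":
    rules_by_target = {
        # Elastic autoscaling compartment: terminations are routine noise.
        "compartment-x": TargetRules(disabled={"INSTANCE_TERMINATED"}),
        # Mission-critical compartment: only alert on instances tagged as critical.
        "compartment-y": TargetRules(
            conditions={"INSTANCE_TERMINATED": lambda r: r.get("tier") == "mission-critical"}
        ),
    }
    event = ("INSTANCE_TERMINATED", {"tier": "mission-critical"})
    for target, rules in rules_by_target.items():
        print(target, rules.should_raise(*event))
    # compartment-x False
    # compartment-y True
```

The same terminated-instance event is silenced in Compartment X and raised in Compartment Y, which mirrors the split described above.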

Several other detections have also brought us value. We had a few buckets that were public due to misconfigurations; Cloud Guard instantly alerted us to the problems, and we worked with the relevant bucket owners to remediate the issue. We have also detected connections from Tor nodes, where a user accessed our OCI estate through the Tor Browser and triggered the suspicious IP alert. Again, we contacted the user and were able to resolve the alert.

Cloud Guard has given us the ability to detect and respond to problems much faster. The information in a Cloud Guard problem can be passed directly to the relevant team and gives them useful context about the alert. As a result, our team spends less time manually reviewing each alert, because we can rely on the detection information provided by Cloud Guard to remediate the issues.

Integration

Another useful feature of Cloud Guard is the ability to integrate its alerts with the OCI Events service. We used this to build an integration that routes critical alerts to PagerDuty.

This allows our analysts to respond to those critical Cloud Guard problems as soon as they are detected.
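As an illustration of the kind of glue such an integration needs, here is a Python sketch that turns a Cloud Guard event into a PagerDuty Events API v2 request and drops anything that is not critical. Several things are assumptions: the article does not say how our integration is built (running this in an OCI Function subscribed to an Events rule is one option), and the event fields read here (`data.resourceName`, `data.additionalDetails.riskLevel`, `data.additionalDetails.problemName`) should be checked against a real event from your tenancy. The routing key is a placeholder.

```python
import json
import urllib.request
from typing import Any, Dict, Mapping, Optional

PAGERDUTY_ENQUEUE_URL = "https://events.pagerduty.com/v2/enqueue"


def to_pagerduty(event: Mapping[str, Any], routing_key: str) -> Optional[Dict[str, Any]]:
    """Build a PagerDuty Events API v2 trigger for critical problems, else None."""
    data = event.get("data", {})
    details = data.get("additionalDetails", {})  # assumed field names
    if details.get("riskLevel") != "CRITICAL":
        return None  # only page on critical problems
    problem = details.get("problemName", "Cloud Guard problem")
    resource = data.get("resourceName", "unknown resource")
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": f"Cloud Guard: {problem} on {resource}",
            "source": "oci-cloud-guard",
            "severity": "critical",
            "custom_details": dict(details),
        },
    }


def send_to_pagerduty(body: Dict[str, Any], timeout: float = 10.0) -> int:
    """POST the event to PagerDuty and return the HTTP status code."""
    request = urllib.request.Request(
        PAGERDUTY_ENQUEUE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    sample = {"data": {"resourceName": "bucket-a",
                       "additionalDetails": {"riskLevel": "CRITICAL",
                                             "problemName": "Bucket is public"}}}
    print(json.dumps(to_pagerduty(sample, "YOUR_ROUTING_KEY"), indent=2))
```

Keeping the filtering and formatting in `to_pagerduty`, separate from the network call, means the mapping can be tested without sending anything to PagerDuty.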

There are more custom integrations that can be implemented. A guide for such integrations can be found here.

Future Plans

As Cloud Guard is a fairly new service in OCI, we are excited to have onboarded it at this stage. As with any other service, Cloud Guard alone will not address all of your security concerns in OCI. Other detection mechanisms, such as a SIEM, will further enhance your detective capabilities in the cloud.

However, running Cloud Guard in your tenancy does provide great value, as demonstrated in this article. The Cloud Guard team is continually improving the tool, adding new types of detections and other features. To find out more about the different components of Cloud Guard, please visit the official documentation here.

I hope this article was useful in understanding the value of Cloud Guard and our journey of onboarding the service.

In parting, here are some useful best practices that will provide a smoother onboarding experience for Cloud Guard.

Best Practices

  • Know your tenancy
    • Spend some time understanding the role of different compartments within your tenancy and what detection rules provide the maximum value for each of those compartments.
  • Develop process documentation and an onboarding plan
    • This helps you avoid issues like the ones described in this article.
  • Develop runbooks as you enable new Detector rules
    • These runbooks can be integrated with your current Incident Response process.
  • Do not roll out all the detections together
    • Gradually roll out new detections and assess the impact of each rollout as you go along.

Visit our website to learn more about Oracle Cloud Guard.
