Behind the Scenes: Modernized IAM architecture with OCI Resource Principals

Cloud is reshaping the digital landscape. Cloud Zero estimates that, as of 2023, the global cloud computing market had surpassed $500 billion and is on track to double by 2028. This growth highlights the cloud’s vast potential, much of which remains untapped. Working at a cloud service provider (CSP), I’ve seen firsthand how the cloud enables scalability, simplifies maintenance, and lets organizations focus on core goals. Yet one question remains: why isn’t every organization moving to the cloud?

This year at Cloud World, Oracle Cloud Infrastructure (OCI) showcased just how far cloud technology can push boundaries, announcing its plan to offer zettascale superclusters with up to 131,000 nodes. The sheer scale of this offering highlights the unique challenges and opportunities present in the cloud space, chief among them ensuring that systems can securely and seamlessly work together. Interoperability isn’t just a technical concern; it’s the cornerstone of a truly global and connected cloud ecosystem.

While interoperability is a major driving force behind cloud adoption, security remains a pivotal consideration for businesses contemplating the move. In a world where data flows across platforms and geographies, protecting sensitive information is critical. Cloud computing offers incredible agility and scalability, but these benefits come with the responsibility to safeguard data against ever-evolving threats. Major cloud providers, including OCI, are continuously advancing their security protocols to keep pace with this dynamic landscape so that organizations can operate confidently in the cloud. The challenge, however, lies in striking a balance: security measures must be robust yet simple enough for businesses to implement without creating friction. If security measures are too rigid or complex, they risk becoming vulnerabilities themselves, as when overcomplicated password policies push users toward unsafe practices.

Over the course of this blog post, I will give you a glimpse of how OCI engineers have worked to create a right-sized security solution for customers. This solution lets customers fine-tune their security measures and control who can access what. This approach leverages OCI’s patented Resource Principals (RPs), which help address identity challenges by giving each cloud resource its own identity. This enables better tracking and auditing of resource interactions, including authentication and authorization calls made by the resource. Finally, we will take a trip through the portfolio of features that OCI has cultivated to make RPs possible.

The rise of resource principals (RP)

Until 2018, OCI customers wrote policies granting services permission to access resources, such as ‘allow service database to access my key.’ While simple, this raised concerns about ensuring keys were accessible only to a specific database, not the broader service. The same applied to backups: only the customer’s database, not the entire database service, should write to their storage bucket. This highlights a key cloud challenge: enabling systems to interact without compromising security.

As applications increasingly communicate with various services, such as databases backing up to object storage, the challenge becomes: who is talking to whom? Historically, when systems communicated with each other, they relied on sharing credentials or creating artificial ones, like usernames and passwords for machines. But this approach, where a machine pretends to be a person, doesn’t hold up in today’s cloud ecosystem. We needed a more sensible solution to secure these interactions.

This necessity led to the introduction of resource principals. RPs provide distinct, secure identities for individual resources like virtual machines, databases, functions, or OKE containers, allowing customers to define fine-grained policies such as ‘Allow my database with a specific ID in my compartment to access my key.’ The goal was not just to improve security boundaries but to make them easier to manage. With RPs, each resource has only the permissions it needs. This is where the Resource Principal Session Token (RPST) comes into play: a short-lived token that asserts the identity of a specific resource, such as ‘I am a resource of type container with ID ocid1.container.111111111111111.’ With an RPST, resources can securely make API calls under their own identity, creating a much more logical and secure system of authentication for modern cloud interactions.
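
To make the difference between a service-level grant and a resource-level grant concrete, here is a minimal Python sketch. It is not OCI’s implementation: the SessionToken and Policy classes, the is_allowed function, and the example OCIDs are all hypothetical names invented for illustration, and a real RPST is a signed token rather than a plain object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionToken:
    """Hypothetical stand-in for an RPST: who the resource is, and until when."""
    resource_type: str
    resource_id: str
    compartment: str
    expires_at: float  # Unix timestamp; a real token is signed, this one is not


@dataclass(frozen=True)
class Policy:
    """Models 'Allow <type> with <id> in <compartment> to <action> <target>'."""
    resource_type: str
    resource_id: str
    compartment: str
    action: str
    target: str


def is_allowed(token, action, target, policies, now):
    """Grant access only if the token is unexpired and a policy names this exact resource."""
    if now >= token.expires_at:
        return False
    return any(
        (p.resource_type, p.resource_id, p.compartment, p.action, p.target)
        == (token.resource_type, token.resource_id, token.compartment, action, target)
        for p in policies
    )


# Illustrative data only: the OCIDs and names below are made up.
policies = [Policy("database", "ocid1.database.mine", "my-compartment", "use", "my-key")]
my_db = SessionToken("database", "ocid1.database.mine", "my-compartment", expires_at=1000.0)
other_db = SessionToken("database", "ocid1.database.other", "my-compartment", expires_at=1000.0)

print(is_allowed(my_db, "use", "my-key", policies, now=500.0))     # my database, valid token
print(is_allowed(other_db, "use", "my-key", policies, now=500.0))  # same service, different database
print(is_allowed(my_db, "use", "my-key", policies, now=1500.0))    # same database, expired token
```

Only the first call is allowed. The second shows why a resource-level grant is narrower than ‘allow service database’: another database of the same type gets nothing. The third shows why a short-lived token limits the value of a stolen credential.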

RPs introduce a new kind of identity that is neither a user (the cloud customer) nor a cloud service, but rather the cloud resource that customers use. RPs allow cloud resources to have their own identities, eliminating the need to use the service account identity. They also support resource credential rotation, so if an attacker were to steal a resource credential, it would be useful only for a short time. Because resources now have their own identity, even fleeting resources can be managed in a very fine-tuned way. This allows for detailed auditing, reflecting what truly happens behind the scenes and who used the credentials to perform actions. Best of all, customers don’t have to worry about rotating credentials anymore. RPs allowed us to build guardrails directly into our services. By fine-tuning policies and granting permissions at the resource level rather than the service level, we avoided the confused deputy problem. This helps ensure that services cannot accidentally or maliciously use one customer’s permissions to benefit another. As engineers, we know mistakes happen. Having this built-in protection is a huge relief.

Figure 1: Resource principal interaction flow.

But before rolling out this concept across all OCI resources, we had to start somewhere, and compute instances were the perfect testing ground. This is where instance principals came in. Think of it as the first practical application of resource-based identity.

Instance Principals: OCI’s first Resource Principal

When I joined the identity team in 2017, I worked on instance principal identity, enabling compute instances to have their own credentials. This is crucial as it lets customers run processes with specific permissions, avoiding the need to share user credentials or create hard-to-manage service accounts. It also prevents unauthorized access, as credentials tied to machines can no longer impersonate users.

Even when other cloud providers used service accounts instead of regular operator credentials, they introduced other security concerns. How will these credentials be rotated? Will these service accounts be associated with a user account? How will you guarantee service continuity during service account rotation?

So, after just my first week in OCI, I found myself learning to write code in three languages that I had never used before: Java, Python, and Go, while working on two critical services: public key infrastructure (PKI) and identity. It was both exciting and challenging. Fortunately, the code was self-explanatory, and OCI empowers everyone with the tools, processes, and resources they need to reach their full potential and succeed together. This was my first feature, and I knew that a major bank in the USA, one of OCI’s first adopters, was eagerly waiting to use it in production. Delivering this feature was a significant milestone for me – not only did I successfully complete my first project at a new company in a short timeframe, but I also contributed to the initial version of RP and set off on a long journey ahead. I received positive feedback from the tech team at the bank, who said, “It just works without me doing anything.”

Initially, every compute instance in OCI was bootstrapped with its own instance credentials: a private key and a public key embedded in a signed certificate. These credentials were automatically rotated every two hours by an agent running on the host. The introduction of instance principals marked a significant step forward, giving each compute instance its own unique identity and allowing permissions to be granted with fine-tuned precision. While instance principals were a success, they only addressed the identity challenges for compute instances and didn’t extend to other cloud resources. This marked the beginning of our RP journey, but more was needed to solve identity management across the entire cloud ecosystem.

Iterating on success with Stacked Resource Principals

After successfully granting fine-grained identities to OCI compute instances, it was a natural step for Oracle to extend RPs to databases. Each Oracle database was given its own identity, enabling secure access to customer vault keys for encryption. This allowed specific permissions to be granted directly to databases, helping ensure secure and precise access.

Here’s where things got a little tricky: the database is hosted on a compute instance (which already has its own instance principal), but the compute instance is owned by the database service, not the customer tenant. So, we could not use the instance principal identity to enable database scenarios in the customer tenant. That’s where we decided to use the stacked identity concept to solve this problem.

We use the compute instance identity as the initial source of trust and “stack” the database’s identity on top of it. Think of it like checking into a hotel. You show your ID to claim your reservation, and the hotel gives you a room key – a separate identifier just for you. Now, the database can use its stacked identity to access the customer vault key and encrypt itself. The best part? The customer can control access to their key, allowing only their database to use it through a simple, fine-grained policy. This permission is specifically assigned to the database instance within their tenancy, rather than to the database service itself. This is how the stacked RP was born. Its innovation and simplicity enabled us to patent the Stacked RP architecture.

Of course, we hit some roadblocks along the way. We first tried to build a new service dedicated explicitly to the database instances. The service was supposed to be up all the time, and whenever a new database needed to access anything in the customer tenancy, the service would facilitate that. Think of it like hiring a hotel staff member just to escort guests to their rooms. It worked, but it was not scalable.

With stacked RPs, each database has its own identity based on the compute instance where the database is hosted. It uses this compute instance identity to stack another identity and access the customer resource that it needs.
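
The stacking step can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the patented design: the Identity class, the HOSTED_ON lookup table, and the stack_database_identity function are hypothetical names, and the real exchange relies on signed credentials rather than dictionary lookups.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identity:
    kind: str                           # "instance" or "database"
    ocid: str
    tenancy: str
    base: Optional["Identity"] = None   # the identity this one is stacked on, if any


# Hypothetical record kept by the database service: which database runs on which host.
HOSTED_ON = {"ocid1.database.example": "ocid1.instance.example"}


def stack_database_identity(instance, database_ocid, customer_tenancy):
    """Use the host's instance identity as the source of trust for a database identity."""
    if instance.kind != "instance":
        raise PermissionError("the base identity must be a compute instance")
    if HOSTED_ON.get(database_ocid) != instance.ocid:
        raise PermissionError("this database is not hosted on the calling instance")
    # The stacked identity lives in the customer tenancy, unlike the host it is built on.
    return Identity("database", database_ocid, customer_tenancy, base=instance)


host = Identity("instance", "ocid1.instance.example", tenancy="database-service")
db = stack_database_identity(host, "ocid1.database.example", "customer-tenancy")
print(db.tenancy, db.base.tenancy)
```

The point of the sketch is the tenancy split: the host belongs to the database service, while the stacked identity belongs to the customer tenancy, so the customer’s policy can name the database directly.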

The stacked RP had its limitations: it is a perfect fit for resources like databases that have a dedicated compute instance. However, not all cloud resources are permanent like databases, nor do they all have their own dedicated compute instances. To support serverless resources like OCI Functions, which get initiated, perform a task, and then vanish, it was time to introduce the ephemeral resource type.

We worked closely with the OCI Functions engineers as the first customer for the ephemeral RP. Based on the function’s resource information, each function is assigned a session token with a variable time limit ranging from a few minutes to a few hours. This session token is not expected to be renewed, nor does the resource have a mechanism to request a new token. The expectation is that the resource using this session token has a short lifetime that aligns with the lifetime of the token, as described in the Ephemeral RP Patent.

Figure 2: Stacked database resource stacking its database identity on top of compute identity to call OCI APIs.

Leadership recognized a future in which multiple cloud providers cater to the same customers and need to support secure communication across providers. Starting with SQL database support in AWS, this expanded to Exadata in Azure and GCP. The challenge was granting identity to resources outside OCI while ensuring secure credential refresh. Long-living RPs addressed these needs, enabling secure interactions and credential rotation.

Long-living RPs have two modes for refreshing session tokens for long-living resources: a push model, where the service is responsible for issuing new session tokens to the resource before the current ones expire (ideally about halfway through their lifetime), and a pull model, where the resource has its own initial credentials. In the pull model, the resource uses the initial credentials to call its own service and obtains an intermediate token in which the service vouches for the resource. This intermediate token contains all the resource information. The resource then uses the initial credentials and the intermediate token to request a session token from identity whenever it does not have a valid token and needs to call OCI APIs. Choosing between the push model and the pull model is left to the service and is based on the connectivity of the resources themselves. The Oracle agent is one of the first services to adopt this long-living RP model. The initial credentials must also be rotated so that any exposed resource credentials will eventually be invalidated, as described in the Long-Living RP Patent.
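
The two refresh modes described above can be sketched as follows. Everything here is a simplified assumption made for illustration: the class and function names are hypothetical, credentials are compared as plain strings where a real system would verify signatures, and the one-hour lifetime is an arbitrary example value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionToken:
    resource_id: str
    issued_at: float
    lifetime: float

    def is_valid(self, now):
        return self.issued_at <= now < self.issued_at + self.lifetime


def push_refresh_due(token, now):
    """Push model: the service re-issues once half of the token lifetime has elapsed."""
    return now >= token.issued_at + token.lifetime / 2


class OwningService:
    """Hypothetical service that owns the resource and vouches for it."""

    def __init__(self, initial_credentials):
        self._creds = initial_credentials  # resource_id -> initial credential

    def vouch(self, resource_id, credential):
        if self._creds.get(resource_id) != credential:
            raise PermissionError("unknown resource or wrong initial credential")
        # Intermediate token: the service's statement about the resource.
        return {"resource_id": resource_id, "vouched_by": "owning-service"}


class IdentityService:
    """Hypothetical identity service that issues session tokens."""

    def __init__(self, initial_credentials, lifetime):
        self._creds = initial_credentials
        self._lifetime = lifetime

    def session_token(self, credential, intermediate, now):
        resource_id = intermediate["resource_id"]
        if intermediate.get("vouched_by") != "owning-service":
            raise PermissionError("intermediate token was not vouched for")
        if self._creds.get(resource_id) != credential:
            raise PermissionError("wrong initial credential")
        return SessionToken(resource_id, issued_at=now, lifetime=self._lifetime)


def pull_token(current, resource_id, credential, service, identity, now):
    """Pull model: reuse a valid token, otherwise fetch a new one in two steps."""
    if current is not None and current.is_valid(now):
        return current
    intermediate = service.vouch(resource_id, credential)
    return identity.session_token(credential, intermediate, now)


creds = {"ocid1.agent.example": "initial-secret"}
service = OwningService(creds)
identity = IdentityService(creds, lifetime=3600.0)

first = pull_token(None, "ocid1.agent.example", "initial-secret", service, identity, now=0.0)
reused = pull_token(first, "ocid1.agent.example", "initial-secret", service, identity, now=100.0)
renewed = pull_token(first, "ocid1.agent.example", "initial-secret", service, identity, now=4000.0)
print(reused is first, renewed.issued_at, push_refresh_due(first, now=1800.0))
```

In this sketch the pull path only contacts the two services when the cached token is no longer valid, while the push check fires at the halfway mark so a replacement arrives well before expiry.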

OCI enables on-prem support with certificate-based resource principals

Many potential cloud customers are not fully on the cloud yet. If we cannot get them to onboard to the cloud completely, let’s help their cloud resources interact with their on-premises resources. Certificate-based RPs are the solution for these scenarios. The cloud customer uses their own identity to register a new cloud resource with OCI. During this registration process, the on-premises resources undergo an onboarding process and are assigned an OCI certificate. These certificates act as the credentials that the resources will use to request RP session tokens to call other OCI APIs. Certificate-based RPs introduced a built-in security layer to the resource principal. We use certificate expiry to track a resource’s lifetime, and we use certificate data to store resource metadata. The first to use this flow were the Oracle Management Agent and WLP agents. WLP agents were introduced to protect our own resources, but some customers requested the same level of protection for their on-premises hardware. RPs have evolved once again to meet our needs in an easy-to-manage and secure way.
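
As a rough illustration of how a certificate can carry both the resource lifetime and its metadata, consider the sketch below. The ResourceCertificate class and request_session_token function are hypothetical, the certificate is a plain object rather than real X.509, and capping the token expiry at the certificate expiry is my own assumption, not something stated above.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceCertificate:
    """Hypothetical stand-in for the OCI-issued certificate of an on-premises resource."""
    subject: str
    not_after: float                               # expiry doubles as the resource lifetime
    metadata: dict = field(default_factory=dict)   # resource metadata carried in the certificate


def request_session_token(cert, now, ttl):
    """Exchange a still-valid certificate for a short-lived session token."""
    if now >= cert.not_after:
        raise PermissionError("certificate expired")
    return {
        "sub": cert.subject,
        # Assumption: a token never outlives the certificate it was issued from.
        "exp": min(now + ttl, cert.not_after),
        "metadata": dict(cert.metadata),
    }


cert = ResourceCertificate(
    "ocid1.managementagent.example",
    not_after=10_000.0,
    metadata={"location": "on-premises"},
)
token = request_session_token(cert, now=9_500.0, ttl=1_200.0)
print(token["exp"], token["metadata"])
```

Once the certificate expires, the resource can no longer obtain tokens, which is how expiry ends up tracking the resource lifetime in this model.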

Emerging opportunities for nested resource principals with Fusion Apps and Machine Learning Pipelines

Since the introduction of the RP in OCI, more services have been onboarded to it. RPs have been used to give an identity to service data plane objects and resources. Traditionally, a resource has been the smallest unit of identity, obtaining its own RPST and using it to call other OCI APIs. Now, a new type of resource is being introduced, where the resource is composed of multiple resources – a “parent-resource composed of sub-resources”. The customer will only initiate the creation of the parent RP. The parent resource will then initiate the creation of the sub-resource principals during the bootstrapping of the parent RP. For example, a customer can create a Fusion Pod RP, which is internally composed of multiple RPs, such as database, bucket, and key RPs. Different Fusion Pods with varying feature offerings will be composed of different sets of RPs.

We have always made security and simplicity our highest priority. Having the customer manage all policies and permissions for the subcomponents of a large resource like a Fusion Pod would be a nightmare, requiring them to add a new policy whenever we introduce a new sub-resource. The nested RP addresses this in a straightforward and secure manner by allowing the identity of the parent resource to be shared with the sub-resources.

From the customer’s perspective, they enable their Fusion Pod to access its own key and write to its own bucket. Internally, we allow Oracle functions within the pod resource to inherit the pod identity and use it to read the key and encrypt the database. We also allow the database in the pod to inherit the pod identity and use it to upload database backups to the bucket – all in a neat, secure, and easy way.

As you can see in the following figure, the main resource contains multiple sub-resources. The customer interacts only with the parent resource and grants permissions only to the parent resource. The parent resource decides whether to allow the sub-resources to inherit these permissions.
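
The inheritance rule can be modeled in a few lines. This is an illustrative sketch only: ParentResource and its methods are hypothetical names, the “cache” sub-resource is invented to show the non-inheriting case, and real permissions are evaluated by OCI policy rather than by set membership.

```python
from dataclasses import dataclass, field


@dataclass
class ParentResource:
    ocid: str
    granted: set = field(default_factory=set)     # (action, target) pairs granted by the customer
    inheritors: set = field(default_factory=set)  # sub-resources allowed to use the parent identity

    def add_sub_resource(self, name, inherits):
        """Called while bootstrapping the parent; the parent decides who inherits."""
        if inherits:
            self.inheritors.add(name)

    def sub_resource_allowed(self, name, action, target):
        return name in self.inheritors and (action, target) in self.granted


pod = ParentResource("ocid1.fusionpod.example")
# The customer writes policy for the pod only.
pod.granted.add(("read", "pod-key"))
pod.granted.add(("write", "pod-bucket"))
# The pod bootstraps its sub-resources and chooses which ones inherit.
pod.add_sub_resource("function", inherits=True)
pod.add_sub_resource("database", inherits=True)
pod.add_sub_resource("cache", inherits=False)  # hypothetical sub-resource that does not inherit

print(pod.sub_resource_allowed("function", "read", "pod-key"))
print(pod.sub_resource_allowed("database", "write", "pod-bucket"))
print(pod.sub_resource_allowed("cache", "read", "pod-key"))
```

Adding a new inheriting sub-resource in this model requires no change to the customer’s grants, which is the property the nested RP is meant to provide.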

Figure 3: Nested RP bootstrapping process.

The introduction of RP Token-based mTLS connection

At its core, cloud technology is about autonomy: navigating both predictable and unpredictable scenarios while ensuring safety. Enhancing connectivity among the thousands of services in the cloud is a crucial part of that. While mTLS with certificates works well for most scenarios, where entities are recognized by their certificates, the introduction of RPs and the tokenization of cloud resources created both a need and an opportunity in our communication pattern. Our simple solution, the RP token-based mTLS connection, fulfills that need: a resource uses its own token to issue itself a self-signed certificate that can be used to establish an mTLS connection. Even though the certificate is self-signed, it can be chained to the token and the trusted token issuer to establish the needed trust.
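
One way to picture this chain of trust is the toy model below. It rests on several assumptions that are mine, not details from OCI: the token is assumed to bind the resource to a fingerprint of its public key, an HMAC with a shared demo key stands in for the issuer’s asymmetric signature, and dictionaries stand in for real X.509 certificates. All function names are hypothetical.

```python
import hashlib
import hmac
import json

ISSUER_KEY = b"demo-issuer-key"  # hypothetical; stands in for the token issuer's signing key


def sign(key, payload):
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def fingerprint(public_key):
    return hashlib.sha256(public_key).hexdigest()


def issue_token(resource_id, public_key):
    """Issuer side: bind the resource identity to a fingerprint of its public key."""
    claims = {"sub": resource_id, "key_fp": fingerprint(public_key)}
    body = json.dumps(claims, sort_keys=True).encode()
    return {"claims": claims, "sig": sign(ISSUER_KEY, body)}


def self_issue_certificate(resource_id, public_key, token):
    """Resource side: a self-issued certificate that carries the token with it."""
    return {"subject": resource_id, "public_key": public_key.hex(), "token": token}


def verify_chain(cert):
    """Peer side: certificate -> token -> trusted issuer."""
    token = cert["token"]
    body = json.dumps(token["claims"], sort_keys=True).encode()
    if not hmac.compare_digest(token["sig"], sign(ISSUER_KEY, body)):
        return False  # the token was not issued by the trusted issuer
    public_key = bytes.fromhex(cert["public_key"])
    return (
        token["claims"]["sub"] == cert["subject"]
        and token["claims"]["key_fp"] == fingerprint(public_key)
    )


resource_key = b"resource-public-key-bytes"   # placeholder bytes, not a real key
attacker_key = b"attacker-public-key-bytes"

token = issue_token("ocid1.container.example", resource_key)
good_cert = self_issue_certificate("ocid1.container.example", resource_key, token)
forged_cert = self_issue_certificate("ocid1.container.example", attacker_key, token)

print(verify_chain(good_cert), verify_chain(forged_cert))
```

The forged certificate fails because a copied token does not match the attacker’s key. In a real mTLS handshake the peer must also prove possession of the private key that matches the certificate, a step this sketch leaves out.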


Figure 4: RP token-based mTLS chain of trust.

Conclusion

Oracle’s Resource Principals journey began as a simple solution to compute instance identity. Over time, it evolved into a robust framework for managing diverse resource identities across the cloud. Oracle has pioneered a way to seamlessly balance security with functionality, even as cloud and on-premises landscapes grow increasingly interconnected.

The introduction of features such as stacked RPs, ephemeral identities, and long-living resource models reflects Oracle’s commitment to addressing the unique requirements of varied resource lifetimes and scenarios. These innovations underscore Oracle’s dedication to simplifying complex identity management challenges while empowering customers to control permissions for their resources with fine-grained precision. The RP architecture’s support for multi-cloud scenarios, on-premises integration, and diverse identity needs is a testament to Oracle’s forward-thinking approach to cloud security and interoperability.

Oracle’s RP journey reflects a legacy of continuous improvement and commitment to cloud security, one that will continue to shape the future of identity management across an increasingly interconnected digital landscape.

To learn more about using Resource Principals for a flexible, fine-tuned cloud security experience, see the resource below:

Use Resource Principal to Access Oracle Cloud Infrastructure Resource

Ayman Elmenshawy

Consulting Member of Technical Staff