Behind the Scenes: Simplifying digital certificate management with one click

March 8, 2024 | 10 minute read
Tony Long
Principal Software Engineer, OCI Security and Identity

Digital certificates are the backbone of internet security, and certificate expiry is among the top causes of outages in the tech industry.
But why? Customers need secure sockets layer/transport layer security (SSL/TLS) certificates to establish trust and identity for secured network communications, and these certificates come prebuilt with an expiry date to enforce security best practices and limit the blast radius if a private key is compromised. When your certificate expires, your customers can’t communicate with your TLS server, leading to disruptions and dissatisfaction.

Certificate expiry is a personal issue for me. Many years ago, I spent countless hours manually toiling over expiring certificates. I logged in to a portal for our internal certificate provider, passed a manually constructed certificate signing request (CSR) to our internal certificate authority (CA) user interface (UI), took the output, and pasted it into the UI for our load balancer. I then repeated this process for each microservice my team owned and in the dozens of regions our software was deployed. A small mistake or not rotating fast enough meant a customer-facing outage.

In a large organization with a multiregional deployment, even knowing where all your certificates are deployed can be a headache. Processes like manually constructing certificate signing requests are a massive drain on resources and a critical business risk, even if you don't account for the security implications of humans like Evil Tony (my favorite imaginary adversary) moving certificates and private keys around by hand. 

Armed with an understanding of these customer challenges, Oracle Cloud Infrastructure (OCI) engineers donned their digital armor and set out to transform certificate management in the cloud and ease the burden on our customers. The result is an automated, secure, and intuitive certificate management solution called OCI Certificates.

The goal was clear: to ensure that our customers never have to think about their certificates after they're set up. With OCI Certificates, customers can quickly request a certificate, deploy it on OCI resources such as load balancers, web application firewalls (WAFs), and API Gateway, renew it automatically, and manage the entire certificate lifecycle from one central place.

In this blog post, we discuss how the one-click integration of OCI Certificates with load balancers alleviates some of the challenges customers face in deploying and maintaining SSL/TLS certificates for their applications and services.

Seamless integration of OCI Certificates with OCI services

Have you ever used two products developed by the same company and struggled to make them work together seamlessly? This issue happens too often, especially in the cloud. Product integrations often feel as if different companies developed them, rather than engineers from the same company working toward a common goal. Generally, this problem occurs because the engineers who own the two products are organizationally distant from each other, and while we always put our mutual customer first, sometimes the priorities of the organizations differ. Personally, I've always been passionate about building a cohesive customer experience, and organizational structure is generally an afterthought. Multiple project owners might not always collaborate effectively, leading to further challenges in delivering a unified customer experience.

We think about these things a bit differently in OCI. We believe in one cloud platform. Regardless of where an engineer is placed organizationally, we're all working to build the best cloud we can. Our developers regularly commit code across team and organizational boundaries. With this type of organizational culture, it was only natural that we did things a bit differently to avoid these failure modes. I was designated as the primary engineering lead across both products and given access to collaborate directly with engineering leadership in both organizations. In other words, I was lucky enough to be in a unique position to view these problems through a different lens, as an owner of both products.

As stewards of our joint customer experience, we knew that, rather than focusing on individual products, enabling one-click deployment of certificates in the cloud was the best way to set our customers up for success. Acting as owners of both code bases, we tackled the intricate cross-product workflows needed for seamless deployment with ease. We even solved complex security challenges such as one-click authorization for one cloud resource to use another. Adopting a unified product and team mindset allowed two distinct engineering organizations to collaborate and build the most intuitive and cohesive cloud certificate management product in the industry.

Create listener window on Oracle Cloud Console for certificate deployment
Figure 1: One-click certificate deployment on a load balancer

Improved certificate usage visibility

With this cross-product integration, a new challenge emerged: monitoring certificate usage proved to be complex. For example, if a cloud customer wants to know which products are using a certificate, answering that question is a challenging feat with many cloud providers. The customer would have to inspect the configuration of every instance of every product that could possibly use that certificate to verify its usage.

OCI Certificates solved this common issue with a simple yet deceptively useful pattern: associations. Whenever a certificate is linked to another object, an association is created to record that a particular object is using a particular certificate. Associations simplify monitoring, providing a clear view of certificate usage without burdening customers with intricate monitoring setups. For visibility, solving the problem is as simple as exposing the associations to the customer. Beyond visibility, however, the association object proves useful in solving a wide variety of customer issues.
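To make the pattern concrete, here is a minimal in-memory sketch of an association index. The class names, fields, and `link`/`unlink`/`usages` methods are hypothetical illustrations, not the OCI Certificates API; the point is that "which resources use this certificate?" becomes a single lookup instead of a scan over every product's configuration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """Hypothetical record: one cloud resource is using one certificate."""
    certificate_id: str
    resource_id: str
    resource_type: str  # e.g. "load_balancer", "api_gateway"


class AssociationRegistry:
    """Minimal in-memory index of associations, keyed by certificate."""

    def __init__(self) -> None:
        self._by_cert: dict[str, set[Association]] = defaultdict(set)

    def link(self, certificate_id: str, resource_id: str, resource_type: str) -> Association:
        # Called whenever a resource is attached to a certificate.
        assoc = Association(certificate_id, resource_id, resource_type)
        self._by_cert[certificate_id].add(assoc)
        return assoc

    def unlink(self, certificate_id: str, resource_id: str) -> None:
        # Called when a resource stops using the certificate.
        self._by_cert[certificate_id] = {
            a for a in self._by_cert[certificate_id] if a.resource_id != resource_id
        }

    def usages(self, certificate_id: str) -> list[Association]:
        # Answering "who uses this certificate?" is one lookup, sorted for stable output.
        return sorted(self._by_cert.get(certificate_id, set()), key=lambda a: a.resource_id)


reg = AssociationRegistry()
reg.link("cert-1", "lb-prod", "load_balancer")
reg.link("cert-1", "gw-prod", "api_gateway")
print([a.resource_id for a in reg.usages("cert-1")])  # ['gw-prod', 'lb-prod']
```

Because the association is written at the moment of linking, the usage view stays current without any separate monitoring job.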

Associations resource tracking in the Oracle Cloud Console
Figure 2: Association resource tracking in the Oracle Cloud Console

Reducing operator error

Another class of issue that keeps business owners up at night is human error, specifically humans unintentionally deleting cloud resources. I have personally fallen victim to this. I was operating on what I believed to be a testing resource. It's okay to delete my test resource, right? The pager rings. Oops, I had clicked on the wrong object in the console.

OCI Certificates was engineered with these types of problems in mind, and when we thought through this pain point, we couldn't help but remember the association implementation that helped our customers identify which cloud resources were using their certificate. Could these associations also protect our customers from operator errors? As it turns out, they can! To protect our customers from mistakenly deleting a production certificate, we don't allow deletion of certificate resources that are associated with other cloud resources.
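The guardrail can be sketched as a delete operation that consults the associations first. This is a hypothetical illustration; `CertificateStore` and `CertificateInUseError` are names invented for the example, not OCI classes.

```python
class CertificateInUseError(Exception):
    """Raised when deleting a certificate that some resource still uses."""


class CertificateStore:
    """Hypothetical store illustrating an association-based deletion guard."""

    def __init__(self) -> None:
        self.certificates: set[str] = set()
        # certificate_id -> ids of resources currently associated with it
        self.associations: dict[str, set[str]] = {}

    def add(self, certificate_id: str) -> None:
        self.certificates.add(certificate_id)
        self.associations.setdefault(certificate_id, set())

    def associate(self, certificate_id: str, resource_id: str) -> None:
        self.associations[certificate_id].add(resource_id)

    def delete(self, certificate_id: str) -> None:
        in_use = self.associations.get(certificate_id, set())
        if in_use:
            # Refuse the delete and tell the operator who still depends on it.
            raise CertificateInUseError(
                f"{certificate_id} is associated with: {', '.join(sorted(in_use))}"
            )
        self.certificates.discard(certificate_id)
        self.associations.pop(certificate_id, None)


store = CertificateStore()
store.add("cert-prod")
store.associate("cert-prod", "lb-prod")
try:
    store.delete("cert-prod")
except CertificateInUseError as err:
    print(err)  # cert-prod is associated with: lb-prod
```

To delete the certificate, the operator must first detach it from every resource, which turns an accidental click into a deliberate, multi-step action.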

Confirmation window to delete a resource in the Oracle Cloud Console
Figure 3: Associations operational guardrails with a confirmation window for deleting an OCI resource

Simplified and secured authorization

When building OCI Certificates, we believed that if our product wasn't easy to use, we weren't accomplishing our goal of making security easy for our customers, and a core pillar of that goal was simple certificate deployment. In the industry, it's commonplace to require customers to write an identity and access management (IAM) policy before one of their cloud resources can integrate with another. To accomplish our goal, we had to make the two products feel as if they were developed by the same company (which they were). On the other hand, having invested years of my career in building security domain expertise, I understand the appeal of requiring a policy for two cloud resources to interact.

This issue was by far the most difficult to tackle because of the security implications and established industry precedent. However, at OCI, we believe that with the right conviction and a bit of creative engineering, we don't have to choose between ease of use and world-class security, even in the cloud.

The industry standard for this type of access is to require customers to communicate their intent to allow their load balancer to use their certificate by writing an access policy. In this traditional model of authorization, customer intent is written down explicitly as a policy document. Not wanting to be confined to thinking "inside the box," one of our architects challenged the team on why this was the standard and whether we could do better for our customers. We began to wonder if we could securely capture customer intent implicitly, without requiring customers to write the document explicitly. For example, couldn't our customers communicate their intention to allow their load balancer to read their certificate with the same click they use to connect the load balancer to the certificate?

When thinking about implementation strategy, we once again recalled the association object, which already told us which load balancers were using which certificate. The product behavior we settled on was to allow a load balancer to read a certificate only if the load balancer has an active association with that certificate.
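As a minimal sketch of that rule, assuming a simple list of association records (the `Association` fields and the `can_read_certificate` function are hypothetical), the authorization decision reduces to a membership check. No separate policy document is consulted; the association itself is the grant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    certificate_id: str
    resource_id: str
    active: bool = True


def can_read_certificate(caller_resource_id: str, certificate_id: str,
                         associations: list[Association]) -> bool:
    """Hypothetical check: a resource may read a certificate only while it
    holds an active association with that certificate."""
    return any(
        a.active
        and a.certificate_id == certificate_id
        and a.resource_id == caller_resource_id
        for a in associations
    )


assocs = [Association("cert-1", "lb-a"), Association("cert-2", "lb-b", active=False)]
print(can_read_certificate("lb-a", "cert-1", assocs))  # True
print(can_read_certificate("lb-b", "cert-2", assocs))  # False: association inactive
print(can_read_certificate("lb-a", "cert-2", assocs))  # False: no association
```

The customer's single click that creates the association is therefore also the act that expresses intent, which is why the next question becomes who is allowed to create associations.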

Because the association grants authority, locking down association creation was critical. To accomplish this goal, the team drew on an existing pattern: the on-behalf-of model. At a high level, this model requires that, for a load balancer to create an association with a certificate, it must securely prove that it is performing the action on behalf of an authorized caller (that is, the customer initiated the request). This means that even if Evil Tony could act as the load balancer service, he still couldn't read an arbitrary certificate.
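The on-behalf-of idea can be sketched with a token that an identity service issues for one specific customer request, which the load balancer service must present alongside its own identity. This is an illustration under stated assumptions, not OCI's protocol: the shared HMAC key, the token format, and the function names are all hypothetical, and a real system would use asymmetric signatures and expiring credentials.

```python
import hashlib
import hmac
import json

# Hypothetical key held by an identity service. HMAC keeps the sketch
# self-contained; it stands in for a real signature scheme.
IDENTITY_SERVICE_KEY = b"demo-key-not-for-production"


def _sign(claims: dict) -> str:
    payload = json.dumps(claims, sort_keys=True).encode()
    return hmac.new(IDENTITY_SERVICE_KEY, payload, hashlib.sha256).hexdigest()


def issue_obo_token(customer: str, certificate_id: str, resource_id: str) -> dict:
    """Identity service: attest that `customer` initiated this exact request."""
    claims = {"customer": customer, "certificate_id": certificate_id,
              "resource_id": resource_id}
    return {"claims": claims, "sig": _sign(claims)}


def create_association(service_identity: str, token: dict, certificate_id: str,
                       resource_id: str, customer_can_use: set[tuple[str, str]]) -> bool:
    """Certificates service: accept only service calls made on behalf of an authorized customer."""
    if service_identity != "load-balancer-service":
        return False
    if not hmac.compare_digest(_sign(token["claims"]), token["sig"]):
        return False  # forged or tampered token
    claims = token["claims"]
    if (claims["certificate_id"], claims["resource_id"]) != (certificate_id, resource_id):
        return False  # token was issued for a different request
    # The customer, not the service, must be allowed to use this certificate.
    return (claims["customer"], certificate_id) in customer_can_use


perms = {("alice", "cert-1")}
tok = issue_obo_token("alice", "cert-1", "lb-a")
print(create_association("load-balancer-service", tok, "cert-1", "lb-a", perms))  # True
# Acting as the service is not enough: the token does not cover cert-2.
print(create_association("load-balancer-service", tok, "cert-2", "lb-a", perms))  # False
```

Compromising the service identity alone yields nothing, because every association also needs proof of a customer request that the customer was authorized to make.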

Architectural diagram for deployment of OCI Certificates service
Figure 4: OCI Certificates Service high level architecture

Putting it all together: Automatic renewal

With a deep understanding of the problems that businesses face with respect to their certificates, we viewed a fully automated rotation experience as a core requirement for the initial launch of this integrated product offering. Imagine effortlessly staying ahead of certificate expiration worries. That's the promise we deliver with our fully automated rotation experience. 

With our intuitive certificate system, customers specify a renewal configuration when creating their certificate and can then forget about expiry concerns. Associated OCI resources, such as load balancers, also rotate to the renewed certificate without any manual intervention. Each time a renewal occurs, the next renewal is scheduled. An agent running alongside the load balancer data plane regularly polls for the latest version of the certificate so that it picks up new versions quickly. After the agent installs the new version, customer endpoints serve the newest certificate.
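The renew-then-reschedule loop and the agent's version check can be sketched as follows. The `RenewalConfig` fields (a renewal interval and a validity period) are assumptions made for illustration and do not reflect the actual OCI renewal configuration schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RenewalConfig:
    # Hypothetical fields. Keeping renewal_interval shorter than validity means
    # a new version exists well before the current one expires.
    renewal_interval: timedelta
    validity: timedelta


@dataclass
class ManagedCertificate:
    config: RenewalConfig
    version: int
    not_after: datetime
    next_renewal: datetime


def issue(config: RenewalConfig, now: datetime) -> ManagedCertificate:
    """Create version 1 and schedule its first renewal."""
    return ManagedCertificate(config, 1, now + config.validity, now + config.renewal_interval)


def renew_if_due(cert: ManagedCertificate, now: datetime) -> bool:
    """Renew when the schedule says so, then schedule the next renewal."""
    if now < cert.next_renewal:
        return False
    cert.version += 1
    cert.not_after = now + cert.config.validity
    cert.next_renewal = now + cert.config.renewal_interval
    return True


def agent_poll(installed_version: int, cert: ManagedCertificate) -> int:
    """Load balancer agent: install any newer version it finds while polling."""
    return max(installed_version, cert.version)


cfg = RenewalConfig(renewal_interval=timedelta(days=60), validity=timedelta(days=90))
t0 = datetime(2024, 1, 1)
cert = issue(cfg, t0)
print(renew_if_due(cert, t0 + timedelta(days=30)))  # False: not due yet
print(renew_if_due(cert, t0 + timedelta(days=60)))  # True: version 2 issued
print(agent_poll(1, cert))  # 2
```

With these example values, each renewal leaves a 30-day window in which the agent can pick up the new version before the old one expires.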

To operate at scale, the load balancer data plane checks for certificate renewals by calling an internal API that covers all of the load balancers and certificates, across multiple tenants, that a node is responsible for, in a single API call. This API is served by a horizontally scalable, eventually consistent data plane application under the Certificates product. When a new version of a particular certificate is detected, the agent on the load balancer data plane fetches the latest certificate version through the publicly available API and installs it on the load balancer.
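A sketch of the batched check, assuming the internal API takes a list of certificate IDs and returns their latest version numbers (that signature is hypothetical and is passed in as a callable here). One round trip covers every load balancer on the node, regardless of tenant, and only the stale certificates are fetched afterward.

```python
from typing import Callable


def check_renewals(
    installed: dict[str, dict[str, int]],
    latest_versions: Callable[[list[str]], dict[str, int]],
) -> dict[str, list[str]]:
    """Return, per load balancer, the certificates that have a newer version.

    `installed` maps load_balancer_id -> {certificate_id: installed_version}
    for every load balancer on this node, across tenants. `latest_versions`
    stands in for a hypothetical batch API: one call for all certificate ids.
    """
    all_cert_ids = sorted({c for certs in installed.values() for c in certs})
    latest = latest_versions(all_cert_ids)  # single round trip
    stale: dict[str, list[str]] = {}
    for lb_id, certs in installed.items():
        changed = [c for c, v in sorted(certs.items()) if latest.get(c, v) > v]
        if changed:
            stale[lb_id] = changed
    return stale


calls = []


def fake_batch_api(cert_ids: list[str]) -> dict[str, int]:
    calls.append(cert_ids)
    return {"cert-a": 3, "cert-b": 1, "cert-c": 5}


node = {
    "lb-tenant1": {"cert-a": 2, "cert-b": 1},
    "lb-tenant2": {"cert-c": 5},
    "lb-tenant3": {"cert-a": 3},
}
print(check_renewals(node, fake_batch_api))  # {'lb-tenant1': ['cert-a']}
print(len(calls))  # 1
```

For each stale entry, the agent would then fetch that certificate's new version through the public API and install it, so the expensive material reads happen only when something actually changed.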

The API calls are securely authorized through the OCI authorization service based on the presence of a relevant association. For example, the load balancer as a service (LBaaS) can poll for updates to any certificates associated with a load balancer, the API Gateway service can poll for updates to any certificates associated with an API gateway, and so on. However, when reading actual certificate and private key material (as opposed to metadata), the authorization scope narrows. For these APIs, the caller must be securely identified as the exact resource referenced in the association. In other words, for a particular load balancer to read a certificate and its private key, that specific load balancer must have an active association with the certificate. Even the load balancer service itself can't read arbitrary customer certificates.
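The two authorization scopes can be sketched side by side. The association table, the `Caller` type, and both check functions are hypothetical illustrations of the rule described above, not the OCI authorization service.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical association table: (certificate_id, resource_id, owning_service)
ASSOCIATIONS = {
    ("cert-1", "lb-a", "load-balancer-service"),
    ("cert-2", "gw-x", "api-gateway-service"),
}


@dataclass(frozen=True)
class Caller:
    service: str                 # which OCI service is calling
    resource_id: Optional[str]   # set only when the caller is a specific resource


def may_read_metadata(caller: Caller, certificate_id: str) -> bool:
    """Broad scope: some resource of the calling service is associated."""
    return any(c == certificate_id and svc == caller.service
               for c, _, svc in ASSOCIATIONS)


def may_read_material(caller: Caller, certificate_id: str) -> bool:
    """Narrow scope: the caller must be the exact associated resource."""
    return caller.resource_id is not None and (
        certificate_id, caller.resource_id, caller.service) in ASSOCIATIONS


lb_service = Caller("load-balancer-service", None)
lb_a = Caller("load-balancer-service", "lb-a")
lb_b = Caller("load-balancer-service", "lb-b")
print(may_read_metadata(lb_service, "cert-1"))  # True: may poll for new versions
print(may_read_material(lb_service, "cert-1"))  # False: the service can't read keys
print(may_read_material(lb_a, "cert-1"))        # True: exact associated resource
print(may_read_material(lb_b, "cert-1"))        # False: no association for lb-b
```

Splitting the scopes this way lets the polling path stay cheap and service-wide while private key material remains bound to individual resources.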

In this way, OCI offers a deceptively simple deployment model that scales and is loaded with security and operational guardrails. We take on the hard problems so that our customers don't even have to know they exist.

Customers might notice a similar pattern when using certificates with API Gateway. As OCI develops its next generation of services, customers might find some things that look familiar.

Workflow for certificate renewal in OCI Certificates Service
Figure 5: Certificate renewal flow in OCI Certificates Service

Conclusion

When OCI was young and had fewer customers, a service owner who came to the operations meeting to report an outage caused by an expired certificate would bring donuts, because it was such a common problem. OCI has grown a lot over the years, to the point where expired certificate outages are no longer a primary concern, internally or for our customers. With focus and dedicated creative engineering, it's possible to build a cohesive customer experience even in the face of tricky confounding variables like microservice architecture in the cloud or important security considerations. Armed with insights into the challenges of certificate management in the cloud, OCI engineered an intuitive solution by seamlessly integrating OCI Certificates with other OCI resources.

Our journey includes these key takeaways:

  • Seamless integration of OCI Certificates with various OCI services, alleviating customer challenges with deploying and maintaining SSL/TLS certificates
  • Improved certificate usage visibility through associations, simplifying monitoring and providing clearer insights into certificate usage
  • Operator error reduction, protecting customers from unintentional deletion of critical resources
  • Simplified and secured authorization, achieved through simplified deployment and intuitive policy enforcement mechanisms
  • Automatic renewal, ensuring customers effortlessly stay ahead of certificate expiration worries with touchless rotation on a predictable schedule

This blog series highlights the new projects, challenges, and problem-solving that OCI engineers encounter on the journey to deliver superior cloud products. You can find similar OCI engineering deep dives in the Behind the Scenes with OCI Engineering series, featuring talented engineers working across Oracle Cloud Infrastructure.


Tony Long

Principal Software Engineer, OCI Security and Identity

Tony Long is a software engineer who has worked on various security products during his tenure at Oracle Cloud Infrastructure. Tony is currently working on the next generation of data replication at Oracle. He has a B.S. in statistical science from the University of California, Santa Barbara (UCSB). His background and interests lie in distributed systems and security problem sets.
