Behind the Scenes: Building OCI’s next-generation UX with atomic design

December 18, 2023 | 13 minute read
Abishek Murali Mohan
Consulting Member of Technical Staff, OCI Console Team
Vadim Komar
Principal Member of Technical Staff

If you use Oracle Cloud Infrastructure (OCI), you’ve likely navigated its myriad services in the Oracle Cloud Console. The Oracle Cloud Console is the gateway to all OCI services, and the importance of a good customer experience isn’t lost on us. Since its inception in 2015, OCI has grown by leaps and bounds from a handful of core services to hundreds of services. We went from single-digit regions to more than 50, including commercial, government, and dedicated global regions. The Console also grew from a single core team to hundreds of federated service teams coming together to build it. In our previous blog post, Scaling UI for hundred of services, we described how we transformed the Console using the microfrontend architecture, which allowed our service teams to scale the Console and build thousands of UI pages, each with its own flair.

However, scaling is rarely easy. While we had an architectural solution to scaling, our engineers still grappled with deployment management, our product managers and designers juggled evolving design systems, and our developers found themselves rebuilding core components in silos, sometimes with nuanced variances in functionality. Our development velocity needed some upgrades. To this end, we established core guiding principles for the Console, enabling the following features: 

  • Instantaneous navigation across different OCI services within a unified experience

  • A dynamic user interface that updates in real time

We then set up constraints that anchored our solutions around the following pillars:

  • Consistency: A user should feel at home on any page in the Console. We strive for a cohesive and accessible experience, even as OCI expands. Every page in the Console should look, feel, and behave consistently. We’re investing in harmonious designs, intuitive workflows, and user-centric features.

  • Performance: UX performance is integral to user satisfaction and productivity. We use the latest in web development technologies and practices to make and keep the Console fast.

  • Reliability: Reliability is the resilience of the Console to errors and allows it to behave predictably in various conditions and edge cases. We strive for zero downtime with a unique serverless architecture and ensure high quality with rigorous QA and proactive error mitigation.

  • Agility: We optimize for low operational overhead and simplicity of development. Processes serve to improve customer experience, instead of bottlenecking innovation. Service teams can operate independently at a cadence suitable for their service, while still building a unified Console.

This blog post showcases how the first two pillars, consistency and performance, helped us achieve our goals for the Console, while future posts focus on reliability and agility. Whether you’re building, scaling, or simply keen on understanding complex web apps, we hope that our learnings offer you insights into the path to frontend scalability.

Consistency

Our goal with the Console was to give customers access to all of OCI without having to jump between discrete tools. This setup only works if all parts of the Console share a similar design philosophy. Our microfrontend architecture gave us the mechanism to achieve this goal by modeling the Console as a single-page web application. Service teams then built plugins for their services that would slot into the Console seamlessly.

In the early stages of federating the Console, we provided service teams with basic UI components like textboxes, sliders, and radio buttons, while leaving it up to the service teams to figure out how to integrate them into a cohesive app. It didn’t take long to realize that this minimal toolkit was insufficient. Our service teams were caught in a cycle of reconstructing components and reinventing interaction patterns, with variations in functionality. To ensure that we met our goal of a unified Console experience, we had to rethink our original hands-off approach.

Building an ecosystem

It was quickly apparent to us that to unify the customer experience, we had to unify our developer experience. If we wanted our developers to build a cohesive Console, we had to make the experience of building it consistent, fast, and error-free. We evaluated several competing tools, frameworks, and libraries. For a while, the Console was even written entirely in ClojureScript. Eventually, we gravitated to TypeScript, Webpack, and React as the tools powering the Console and its component libraries.

TypeScript’s type system provided the kind of compile-time safety we needed and access to IntelliSense. React’s declarative UI paradigm and virtual document object model (DOM) offered intuitive design and high performance. Features like JSX, state management through Redux, and React hooks fit our development style perfectly. Because React forced us to think in terms of declarative UI states, we eliminated a large source of bugs common with imperative frameworks. Combined with Webpack and its numerous plugins (Can you tell we like plugins?), we had a read-evaluate-print loop (REPL) that provided a live coding experience. Developers could see and debug their changes in a browser in real time.

The maturity of these tools meant that we had a huge talent pool of passionate engineers to hire from, as well as a rich ecosystem of documentation and libraries to use to get started quickly. Developers can also be productive across codebases because they use similar technologies.

Atomic design

With the ecosystem in place, we addressed the fragmentation in Console. We introduced a new design system: the Mercury Design System (HgDS). Hg is the chemical symbol for Mercury and HgDS, just like the chemical element, is meant to be fluid and reactive. HgDS now powers all of Console and ensures its uniformity. Drawing inspiration from the principles of atomic design, we reimagined our UI construction, conceptualizing it as a hierarchical structure ranging from the simplicity of atoms to the complexity of organisms as shown in figure 1.

  • Atoms: Atoms are the foundational building blocks of our interfaces, encompassing basic HTML elements like form labels, inputs, and buttons. Their functionality ceases if they’re further disassembled. In the HgDS context, atoms are context-agnostic, providing the versatility needed for various implementations.

  • Molecules: This level involves a combination of multiple UI elements that function collectively. Molecules are preconfigured compound components, such as table filters and tagging controls, that maintain visual and functional consistency across different plugins, relieving teams from the burdens of UI micromanagement.

  • Organisms: At this tier, we deal with intricate UI combinations composed of groups of molecules and atoms. Organisms are responsible for forming significant, reusable parts of the interface, establishing a standardized template throughout the Console, such as resource listing pages and detail pages.
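To make the three tiers concrete, here is a minimal sketch of how atoms can compose into a molecule and then into an organism. It is not HgDS code: every function name is hypothetical, and plain string templates stand in for React components so that the example stays self-contained.

```typescript
// Hypothetical names throughout; this is an illustration, not the HgDS API.
// Values are interpolated without HTML escaping to keep the sketch short.

// Atoms: the smallest units, with no knowledge of where they are used.
const label = (text: string): string => `<label>${text}</label>`;
const textInput = (name: string): string => `<input type="text" name="${name}">`;
const button = (text: string): string => `<button>${text}</button>`;

// Molecule: a preconfigured combination of atoms that works as one unit.
const tableFilter = (field: string): string =>
  `<form class="filter">${label(`Filter by ${field}`)}${textInput(field)}${button("Apply")}</form>`;

// Organism: a reusable page section built from molecules and atoms.
const resourceListPage = (title: string, rows: string[]): string => {
  const items = rows.map((row) => `<li>${row}</li>`).join("");
  return `<section><h1>${title}</h1>${tableFilter("name")}<ul>${items}</ul></section>`;
};

const page = resourceListPage("Instances", ["web-1", "web-2"]);
console.log(page);
```

A plugin team working at the organism level only supplies a title and rows; the filter layout and markup are decided once, lower in the hierarchy.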

This transition to HgDS eliminated redundant efforts in UI design and implementation. UI design is inherently multifaceted, encompassing concepts such as browser support, theming, styling, accessibility, localization, personalization, analytics and much more. By centralizing these aspects, service teams can now concentrate on their primary objectives within their business domain, without the distraction of these complex UX considerations.

Figure 1. HgDS showcasing its evolution from Atoms to Molecules and entire Organisms.

Integrating with OCI

As described in our previous blog, the Console only uses public REST APIs, no special private APIs. The Console has access to the same capabilities as our customers do by CLI or SDK. Those REST APIs are consistent across the surface of the platform and documented using Swagger. We used this consistency in OCI’s REST APIs to simplify our Console developer experience.

In our revamped approach, atoms offer foundational agility: the building blocks of a page. In contrast, molecules and organisms are designed with OCI’s intricacies in mind. They use React hooks to seamlessly integrate with OCI’s public REST APIs. We term these "Savant" components. Savant components are smart and reactive from the start. You never have to refresh a page to fetch the latest data from the backend. Instead, as new data arrives, the Console automatically updates the relevant portions of the page with the latest information.
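The reactive behavior described above can be sketched with a small subscription store. This is an assumption-laden illustration rather than the Savant implementation: the post says Savant components use React hooks, whereas this sketch swaps in a plain class, and `ReactiveResource`, `ingest`, and the `lifecycleState` field are hypothetical names.

```typescript
type Listener<T> = (value: T) => void;

// Hypothetical store: holds the latest API response and notifies subscribers.
class ReactiveResource<T> {
  private current: T | undefined;
  private readonly listeners = new Set<Listener<T>>();

  // Register a listener; replay the latest value if one is already cached.
  subscribe(listener: Listener<T>): () => void {
    this.listeners.add(listener);
    if (this.current !== undefined) listener(this.current);
    return () => {
      this.listeners.delete(listener);
    };
  }

  // Called whenever a poll (or any other transport) delivers fresh data.
  ingest(next: T): void {
    // Skip notification when nothing changed, so unchanged UI is not re-rendered.
    if (JSON.stringify(next) === JSON.stringify(this.current)) return;
    this.current = next;
    this.listeners.forEach((listener) => listener(next));
  }
}

const instanceState = new ReactiveResource<{ lifecycleState: string }>();
const rendered: string[] = [];
const unsubscribe = instanceState.subscribe((s) => rendered.push(s.lifecycleState));

instanceState.ingest({ lifecycleState: "PROVISIONING" });
instanceState.ingest({ lifecycleState: "PROVISIONING" }); // unchanged: no re-render
instanceState.ingest({ lifecycleState: "RUNNING" });
unsubscribe();
instanceState.ingest({ lifecycleState: "STOPPED" }); // listener already removed

console.log(rendered);
```

Keeping the store independent of the transport means the same component logic works whether data arrives from polling, a cache, or a user action.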

With Savant components, service teams can integrate with backend APIs with just a few lines of code. By supporting component-level API integration out of the box, we relieve service teams of the burden of having to think about the mechanics of API integration. HgDS Savant molecules and organisms take care of the following important questions in a consistent and transparent manner:

  • How does caching work?

  • How does polling work?

  • What should the retry policy be?

  • How should unexpected errors be handled?

  • How are API responses localized?

  • How is bulk data fetched?

  • Which URLs should be used? (This is especially relevant because of OCI’s footprint across over 50 regions.)

  • How are API metrics and analytics logged?
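As one example of a question from the list above, a retry policy is often expressed as capped exponential backoff. The sketch below is an assumption, not the policy Savant components actually use: the `RetryPolicy` shape, the delay values, and the choice of retryable status codes are all hypothetical.

```typescript
// Hypothetical policy shape; the real values used by the Console are not published.
interface RetryPolicy {
  maxRetries: number; // retries after the first attempt
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number; // upper bound for any single delay
}

// Delay before each retry: base * 2^attempt, capped at maxDelayMs.
function retryDelays(policy: RetryPolicy): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < policy.maxRetries; attempt++) {
    const uncapped = policy.baseDelayMs * Math.pow(2, attempt);
    delays.push(Math.min(uncapped, policy.maxDelayMs));
  }
  return delays;
}

// Assumption: only throttling (429) and server errors (5xx) are worth retrying.
function isRetryable(status: number): boolean {
  return status === 429 || status >= 500;
}

const delays = retryDelays({ maxRetries: 4, baseDelayMs: 250, maxDelayMs: 1000 });
console.log(delays, isRetryable(404), isRetryable(503));
```

Production policies commonly add random jitter to each delay so that many clients don't retry at the same instant; it is left out here to keep the output deterministic.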

Innovating together

For instances when the central HgDS repository lacks needed components, we’ve established an internal open-source model. Here, plugin teams collaborate with Console to develop new components. This collaborative environment—our “component labs”—serves as an incubator for innovation, allowing for the initial testing and refinement of components. Upon meeting maturity benchmarks, these components are promoted to the core UI library, making them accessible to all service teams. Moreover, we instituted a User Experience Review Board, a team of passionate folks charged with maintaining the high standard of new UI in Console. This board rigorously reviews each new component or page to maintain a cohesive environment across the Console.

Performance

In the context of cloud portals, speed is a significant factor. The response time of the Oracle Cloud Console plays a crucial role in shaping a user’s interaction with OCI. Slow or unresponsive interfaces can hinder productivity and collaboration. So, focusing on performance to ensure that you can effectively utilize OCI’s capabilities is critical.

Built for speed

To achieve a performant Console, we used an array of techniques that encompass the state of the art in web development. The following principles are deeply ingrained into Console’s microfrontend architecture and the design of our UI component library.

  • Deferred rendering: Prioritizing the current plugin’s needs to minimize initial load time

  • Progressive rendering: Introducing content incrementally as API data arrives

  • API parallelization: Expediting processes through concurrent API calls facilitated by HTTP/2

  • Asset caching: Utilizing a geographically dispersed content delivery network (CDN) to expedite asset delivery

  • Data caching: Storing API responses locally to streamline synchronization

Keeping pace

The quest for speed doesn’t end with a swift Console. Maintaining this velocity is another challenge altogether, and one that we’ve met with innovative solutions. We introduced tools that make it very simple for developers to do the right thing, while discouraging bad practices.

One crucial investment was bundle-size reduction. Smaller files translate to faster download and parsing times for browsers, which is why we embraced build optimizations like tree shaking and code splitting. To keep bundle sizes small over time, we also introduced build-time tools that scan our codebase for redundancies and prevent check-ins or dependencies from bloating the Console’s bundle size by more than 1 KB.
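A build-time size gate of this kind can be sketched as a comparison between a baseline and the sizes produced by the current build. This is a hypothetical illustration, not the Console's tooling: the function name, the bundle names, and the reading of "1 KB" as 1,024 bytes per bundle are assumptions.

```typescript
type BundleSizes = Record<string, number>; // bundle name -> size in bytes

interface BudgetViolation {
  bundle: string;
  growthBytes: number;
}

// Report every bundle that grew by more than the allowed budget.
// A bundle missing from the baseline counts entirely as growth.
function findBudgetViolations(
  baseline: BundleSizes,
  current: BundleSizes,
  budgetBytes: number = 1024,
): BudgetViolation[] {
  const violations: BudgetViolation[] = [];
  for (const bundle of Object.keys(current)) {
    const growthBytes = (current[bundle] ?? 0) - (baseline[bundle] ?? 0);
    if (growthBytes > budgetBytes) violations.push({ bundle, growthBytes });
  }
  return violations;
}

const violations = findBudgetViolations(
  { console: 500000, compute: 120000 },
  { console: 500800, compute: 122500 },
);
console.log(violations);
```

A continuous integration step could fail the check-in whenever the returned list is non-empty, which makes the size regression visible before it ships.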

Another area that we identified for improvement is dependency management. The Console is a complex application, with dozens of dependencies. Initialization can become a bottleneck when juggling these numerous dependencies in code. Our solution: Automate Console initialization to extract maximum parallelism and eliminate accidental slowdowns, as illustrated in figure 2. We allowed developers to describe initialization dependencies and their fine-grained relationships in a declarative manner, which is parsed and run by an orchestration engine.

We have also invested in a robust caching strategy. We cache the results of several API calls critical to Console initialization, such as feature toggles, region subscriptions, and user preferences, in browser storage using IndexedDB. By storing the results of these expensive but infrequently changing API calls, we not only improve the Console’s startup performance but also make the Console more resilient to backend failures.

Further, for some of our APIs, we buffer backend updates locally in the browser cache, so users can continue to use the Console and make updates even when a backend service goes down. When the service is restored, any changes that users have made are automatically synced with the backend, as seen in figure 3.

Figure 2: Console dependencies are orchestrated to improve performance.

Developers can declaratively define relationships between resources. The orchestration engine constructs an optimal directed acyclic graph (DAG) of dependencies, which guarantees correctness, efficiency, and predictability during Console initialization. In addition to simplifying dependency management, the DAG constructed by the orchestration engine can be statically inspected and debugged for performance bottlenecks.
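One way to picture the orchestration step is a function that turns declared dependencies into stages, where every task in a stage can run in parallel. This sketch is a generic level-by-level topological sort written for illustration; it is not the Console's orchestration engine, and the `auth` and `navigation` tasks and all dependency edges are hypothetical.

```typescript
type DependencyGraph = Record<string, string[]>; // task -> tasks it depends on

// Group tasks into stages; all tasks within one stage can run concurrently.
function planStages(graph: DependencyGraph): string[][] {
  const tasks = Object.keys(graph);
  for (const task of tasks) {
    for (const dep of graph[task] ?? []) {
      if (tasks.indexOf(dep) === -1) {
        throw new Error(`"${task}" depends on unknown task "${dep}"`);
      }
    }
  }

  const done = new Set<string>();
  const stages: string[][] = [];
  while (done.size < tasks.length) {
    // A task is ready once every one of its dependencies has completed.
    const ready = tasks.filter(
      (task) => !done.has(task) && (graph[task] ?? []).every((dep) => done.has(dep)),
    );
    if (ready.length === 0) throw new Error("Dependency cycle detected");
    ready.forEach((task) => done.add(task));
    stages.push(ready);
  }
  return stages;
}

// Hypothetical initialization graph, declared as data rather than code.
const initGraph: DependencyGraph = {
  auth: [],
  featureToggles: ["auth"],
  regionSubscriptions: ["auth"],
  userPreferences: ["auth"],
  navigation: ["featureToggles", "regionSubscriptions"],
};

const stages = planStages(initGraph);
console.log(stages);
```

Because the plan is plain data, it can be inspected or logged before anything runs, and a cycle is reported as an error instead of causing a hang during startup.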

Figure 3. Console caching mechanism

When the Console initializes, we first check the browser cache for data from previous sessions. If previous session data exists, the Console bootstraps with the cached data. If not, we wait for the data from the backend synchronously. In addition, we built a leaderless replication mechanism between the browser and the backend. This process allows the user to continue working in the Console even if a backend service goes down, by syncing updates between the browser and the backend. Updates are buffered in the cache and reconciled whenever new data arrives from the service. To enable this feature, we use ETags for concurrency control, a feature that many OCI APIs support out of the box.
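The buffering and reconciliation idea can be illustrated with a small in-memory simulation. This is a simplified sketch of optimistic concurrency with ETags, not the Console's replication mechanism: `FakeBackend`, `BufferedClient`, the version-string ETags, and the replay-on-conflict strategy are all assumptions made for the example.

```typescript
interface Versioned<T> {
  etag: string;
  value: T;
}

// Stand-in for a backend that supports conditional writes (If-Match semantics).
class FakeBackend<T> {
  available = true;
  private version = 1;
  private value: T;

  constructor(initial: T) {
    this.value = initial;
  }

  read(): Versioned<T> {
    if (!this.available) throw new Error("backend unavailable");
    return { etag: `v${this.version}`, value: this.value };
  }

  // Applies the write only if the caller saw the latest version.
  writeIfMatch(etag: string, value: T): Versioned<T> | "conflict" {
    if (!this.available) throw new Error("backend unavailable");
    if (etag !== `v${this.version}`) return "conflict";
    this.version += 1;
    this.value = value;
    return { etag: `v${this.version}`, value };
  }
}

type Update<T> = (current: T) => T;

class BufferedClient<T> {
  private pending: Update<T>[] = [];
  private baseline: Versioned<T>; // last state confirmed by the backend
  private readonly backend: FakeBackend<T>;

  constructor(backend: FakeBackend<T>, baseline: Versioned<T>) {
    this.backend = backend;
    this.baseline = baseline;
  }

  // What the user sees: confirmed state plus buffered local updates.
  get value(): T {
    return this.pending.reduce((v, change) => change(v), this.baseline.value);
  }

  update(change: Update<T>): void {
    this.pending.push(change);
  }

  // Push buffered updates; on an ETag conflict, re-read and replay them.
  sync(): boolean {
    if (this.pending.length === 0) return true;
    try {
      let base = this.baseline;
      for (;;) {
        const merged = this.pending.reduce((v, change) => change(v), base.value);
        const result = this.backend.writeIfMatch(base.etag, merged);
        if (result !== "conflict") {
          this.baseline = result;
          this.pending = [];
          return true;
        }
        base = this.backend.read(); // another writer got in first: rebase
      }
    } catch {
      return false; // backend still down; keep the buffer for the next attempt
    }
  }
}

const backend = new FakeBackend<string[]>(["prod"]);
const client = new BufferedClient(backend, backend.read());

backend.available = false; // simulated outage
client.update((tags) => tags.concat("team-a"));
const syncedWhileDown = client.sync(); // false: the update stays buffered

backend.available = true;
backend.writeIfMatch("v1", ["prod", "audited"]); // another session writes first
const syncedAfterRecovery = client.sync(); // conflict, then rebase and retry

console.log(syncedWhileDown, syncedAfterRecovery, backend.read().value);
```

Expressing each update as a function of the current value is what makes the replay safe here: after a conflict, the same change is applied on top of the fresh data instead of overwriting it.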

Evolving performance standards

Our most extensive investment has been in the automatic generation and collection of a vast trove of metrics and logs. These factors are essential for monitoring client-side performance and availability worldwide. Every page load, interaction, and backend API call is monitored and logged. We track bundle sizes, JavaScript errors, memory usage, API latency, retries, plugin availability, and more, all while ensuring that these processes don’t degrade the user experience, by batching metrics and using background threads for emitting telemetry.
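Batching is the part of this design that is easiest to show in isolation. The sketch below is a hypothetical illustration, not the Console's telemetry client: `MetricsBatcher`, the metric names, and the batch size are assumptions, and the `send` callback stands in for whatever transport actually emits the data.

```typescript
interface Metric {
  name: string;
  value: number;
}

// Buffers metrics and hands them to `send` in groups instead of one by one.
class MetricsBatcher {
  private buffer: Metric[] = [];
  private readonly maxBatchSize: number;
  private readonly send: (batch: Metric[]) => void;

  constructor(maxBatchSize: number, send: (batch: Metric[]) => void) {
    this.maxBatchSize = maxBatchSize;
    this.send = send;
  }

  record(metric: Metric): void {
    this.buffer.push(metric);
    if (this.buffer.length >= this.maxBatchSize) this.flush();
  }

  // Emit whatever is buffered, for example on a timer or before the page unloads.
  flush(): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.send(batch);
  }
}

const sentBatchSizes: number[] = [];
const batcher = new MetricsBatcher(3, (batch) => sentBatchSizes.push(batch.length));

batcher.record({ name: "pageLoadMs", value: 840 });
batcher.record({ name: "apiLatencyMs", value: 120 });
batcher.record({ name: "apiRetries", value: 1 }); // third metric triggers a send
batcher.record({ name: "jsErrors", value: 0 });
batcher.flush(); // emit the remainder

console.log(sentBatchSizes);
```

Four recorded metrics result in two network-level emissions instead of four, which is the kind of saving that keeps telemetry from competing with the page's own requests.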

To emit these metrics, the Console uses concepts of aspect-oriented programming, a design pattern that serves to abstract crosscutting functions from business logic. We use the performance web APIs in conjunction with the TypeScript decorators feature to achieve this goal.
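The aspect-oriented idea is that timing logic wraps a function without appearing inside it. The sketch below illustrates that separation with a higher-order function instead of a TypeScript decorator, and with an injectable clock that defaults to `Date.now` instead of the browser performance APIs; both substitutions are made only so that the example runs without compiler flags or a browser, and `timed` and `timings` are hypothetical names.

```typescript
interface Timing {
  name: string;
  durationMs: number;
}

const timings: Timing[] = [];

// Wrap any synchronous function so each call records how long it took.
function timed<Args extends unknown[], Result>(
  name: string,
  fn: (...args: Args) => Result,
  now: () => number = Date.now,
): (...args: Args) => Result {
  return (...args: Args): Result => {
    const start = now();
    try {
      return fn(...args);
    } finally {
      // Recorded even if fn throws, so failures are measured too.
      timings.push({ name, durationMs: now() - start });
    }
  };
}

// Business logic stays free of any measurement code.
const parseTags = (raw: string): string[] => raw.split(",");

// A fake clock that advances 5 ms per reading keeps the output deterministic.
let fakeTime = 0;
const fakeClock = (): number => (fakeTime += 5);

const timedParseTags = timed("parseTags", parseTags, fakeClock);
const tags = timedParseTags("prod,team-a");

console.log(tags, timings);
```

For a function that returns a promise, this wrapper would measure only the time until the promise is created, so an asynchronous variant would need to record the timing when the promise settles.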

To improve customer experience, we collect not only traditional performance metrics like First Meaningful Paint but also metrics such as Time to Interactive, as shown in figure 4. Time to Interactive marks the point at which a page becomes useful: a customer can meaningfully interact with the page, and it displays data.

Figure 4: Waterfall of a typical plugin loading in the Console and the metrics being captured.
 

Our global CDN, combined with asset caching and bundle size reduction, minimizes load time (A). Automatic dependency management and deferred rendering optimize initialization and reduce overhead (B). This consequently improves First Meaningful Paint (C). Parallelizing API calls and data caching strategies reduce the impact of blocking API calls (D). Finally, our reactive component library, which inherently supports progressive rendering, improves Time to Interactive (E), letting users access and manage their OCI resources quicker.

Performance insights

As we worked on improving the Console’s performance, we ran into many surprising behaviors that affected it, including the following examples:

  • Browser quirks with IndexedDB: In our performance tracking, we noticed that IndexedDB, where we store security tokens, runs approximately three times slower on Safari than on Firefox or Chrome. This is a problem because every API call from the Console to OCI’s services is signed using the stored tokens. To overcome this issue, we implemented local storage and in-memory caching for the data that’s accessed most often, ensuring that speed isn’t compromised.

  • Compression compatibility: Not all browsers are created equal when it comes to file compression. We noticed that some of our customers were downloading uncompressed JavaScript, CSS, and HTML files. The culprit? Their browsers didn’t support gzip. It’s a good reminder to account for varying browser capabilities in web development.

  • The ripple effect of global events: Large-scale events can have surprising side effects. The Qatar World Cup, for example, coincided with a noticeable dip in internet bandwidth across the Middle East. Web performance can sometimes be influenced by events well beyond the digital sphere.

These examples validate our investments in Console telemetry, without which we would never have been able to catch or debug these issues.

Conclusion

This post showcases the Oracle Cloud Console’s continuous evolution along its scaling journey. We transformed the Console from a monolith to the current microfrontend architecture supporting hundreds of services. Through this journey, we have found that architectural solutions are only one facet of scaling. By also acknowledging and focusing on the human element—the developer experience—we equip ourselves with a more holistic approach to creating and scaling user interfaces. Establishing constraints and simplifying the complexities that developers face is a strategic move that directly benefits the end-user experience. Teams have become more proactive in assisting each other, identifying limitations, and enhancing the collective toolset. The advantages also extend beyond collaborative gains. For example, implementing more effective caching mechanisms has improved not only performance but also reliability. Similarly, our commitment to simplicity has resulted in fewer bugs.

In this post, we focused on the pillars of consistency and performance. We’ve shown that it’s possible to foster a technical ecosystem where innovation and cohesive, fast user interfaces aren’t mutually exclusive. Stay tuned for our upcoming posts, where we delve into the remaining pillars of reliability and agility and continue to share our insights into frontend scalability.

This blog series highlights the new projects, challenges, and problem-solving OCI engineers are facing in the journey to deliver superior cloud products. You can find similar OCI engineering deep dives as part of the Behind the Scenes with OCI Engineering series, featuring talented engineers working across Oracle Cloud Infrastructure.


Abishek Murali Mohan

Consulting Member of Technical Staff, OCI Console Team

Abishek Murali Mohan is a Senior Principal Software Engineer at Oracle, with over 12 years of experience. He plays a key role in OCI Console, leading efforts in frontend, backend, and platform engineering. His current focus is on designing next-generation frontend architecture and frameworks that enable Console to scale massively. 

Vadim Komar

Principal Member of Technical Staff

Vadim Komar has worked on various system-level design projects at OCI since 2019. His primary goal has consistently been to enhance user experience, expedite the development process, and mitigate operational challenges without compromising quality. Vadim is currently exploring and incorporating new technologies for faster Data Center builds.

