OCI Kubernetes Engine (OKE) Now Scales to 5,000 Worker Nodes

As organizations adopt Kubernetes for larger and more complex workloads, Oracle Cloud Infrastructure (OCI) Kubernetes Engine (OKE) now supports up to 5,000 worker nodes in a single enhanced cluster, whether those nodes are managed or self-managed. The higher limit is available in commercial regions and is enabled by default for enhanced clusters that use the Flannel CNI plugin for pod networking. For clusters that use VCN-native pod networking, you can request it by contacting OCI. With more worker nodes in a single cluster, businesses can deploy larger workloads, use resources more efficiently, and reduce operational overhead. Consolidating into fewer, larger clusters also simplifies security, monitoring, upgrades, and overall management.

The higher node limit lets businesses scale applications within a single cluster while maintaining performance and reliability. Managing clusters of this size, however, requires careful planning and adherence to best practices to keep them stable and efficient. The following strategies, based on Oracle’s recommendations, help with managing large-scale OKE clusters:

1. Limit Burst Scaling to 10% Increments

Scaling operations, such as adding or removing nodes or pods, can generate significant API traffic in large clusters, which can trigger API rate limiting and degrade performance. To avoid overloading the API server, scale in increments of approximately 10% of the total cluster size. For example, in a 1,000-node cluster, add or remove about 100 nodes at a time. This keeps scaling smooth and minimizes disruption.
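One way to plan such a scale-up is sketched below in Python. The function name `plan_scale_up` and its `percent` parameter are hypothetical, and the sketch assumes that "10% of the total cluster size" means 10% of the cluster's size at the start of each step. It only computes intermediate node counts and does not call any OCI or Kubernetes API.

```python
def plan_scale_up(current: int, target: int, percent: int = 10) -> list[int]:
    """Return the intermediate node counts for a stepwise scale-up.

    Each step adds at most `percent` percent of the cluster's size at the
    start of that step (assumption), and always at least one node.
    """
    if current < 0 or target < current:
        raise ValueError("expected 0 <= current <= target")
    if percent <= 0:
        raise ValueError("percent must be positive")

    steps = []
    size = current
    while size < target:
        # Integer arithmetic avoids floating-point rounding surprises.
        increment = max(1, size * percent // 100)
        size = min(target, size + increment)
        steps.append(size)
    return steps


if __name__ == "__main__":
    print(plan_scale_up(1000, 1300))
```

For a 1,000-node cluster growing to 1,300 nodes, this yields the steps 1,100, 1,210, and 1,300. In practice you would wait for each batch of nodes to become ready before applying the next step.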

2. Configure FlowSchemas for Optimized Rate-Limiting

Under high load, non-essential requests can slow down critical operations. Kubernetes’ API Priority and Fairness (APF) feature lets you prioritize essential requests while rate-limiting less critical ones. A FlowSchema matches incoming requests (for example, by user, service account, verb, or resource) and assigns them to a priority level, so configuring FlowSchemas helps maintain cluster performance during peak usage.
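As an illustration only, the following Python sketch builds a FlowSchema manifest that routes list and watch requests for pods from one service account to Kubernetes' built-in `workload-low` priority level. The function name, the `metrics-scraper` service account, the `monitoring` namespace, and the precedence value are all hypothetical. The sketch also assumes the `flowcontrol.apiserver.k8s.io/v1` API, which requires Kubernetes 1.29 or later; older clusters use `v1beta3`.

```python
import json


def low_priority_flow_schema(name: str, service_account: str, namespace: str,
                             priority_level: str = "workload-low") -> dict:
    """Build a FlowSchema manifest as a plain dict (illustrative sketch)."""
    return {
        "apiVersion": "flowcontrol.apiserver.k8s.io/v1",  # assumes Kubernetes 1.29+
        "kind": "FlowSchema",
        "metadata": {"name": name},
        "spec": {
            # Priority level that matching requests are assigned to.
            "priorityLevelConfiguration": {"name": priority_level},
            # Lower numbers are evaluated first; 1000 is an arbitrary example value.
            "matchingPrecedence": 1000,
            "distinguisherMethod": {"type": "ByUser"},
            "rules": [{
                "subjects": [{
                    "kind": "ServiceAccount",
                    "serviceAccount": {
                        "name": service_account,  # hypothetical account
                        "namespace": namespace,   # hypothetical namespace
                    },
                }],
                "resourceRules": [{
                    "verbs": ["list", "watch"],
                    "apiGroups": [""],        # core API group
                    "resources": ["pods"],
                    "namespaces": ["*"],      # pods in every namespace
                }],
            }],
        },
    }


if __name__ == "__main__":
    manifest = low_priority_flow_schema(
        "low-priority-scraper", "metrics-scraper", "monitoring")
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be piped to `kubectl apply -f -`. To throttle a workload more strictly than `workload-low` does, you would point the FlowSchema at a dedicated PriorityLevelConfiguration instead; see Oracle's documentation for the settings it recommends.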

3. Tune Cluster Add-Ons for Scalability

Default configurations for cluster add-ons such as CoreDNS and network plugins may not suit large-scale environments. For instance, setting the number of CoreDNS replicas to one per eight nodes can improve cache efficiency and reduce resource consumption. Similarly, configuring network plugins to allow a percentage of nodes to be unavailable during updates helps rollouts proceed more smoothly.
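The one-replica-per-eight-nodes ratio can be worked out with a short calculation. The Python sketch below is a hypothetical helper, not an OKE API; the minimum of two replicas is an added assumption so that small clusters keep some redundancy.

```python
def coredns_replicas(node_count: int, nodes_per_replica: int = 8,
                     minimum: int = 2) -> int:
    """Return a CoreDNS replica count of one replica per `nodes_per_replica` nodes.

    `minimum` is an assumption of this sketch, not part of the recommendation.
    """
    if node_count < 0 or nodes_per_replica < 1:
        raise ValueError("node_count must be >= 0 and nodes_per_replica >= 1")
    # Ceiling division: a partially filled group of nodes still gets a replica.
    needed = -(-node_count // nodes_per_replica)
    return max(minimum, needed)


if __name__ == "__main__":
    for nodes in (8, 100, 5000):
        print(nodes, coredns_replicas(nodes))
```

At the 5,000-node limit this gives 625 CoreDNS replicas.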

These are just a few of the best practices for managing large-scale OKE clusters. For a comprehensive guide, including additional strategies and detailed configurations, explore Oracle’s official documentation on OKE best practices.

Learn More:

Large OKE Cluster Best Practices
Access the OKE Resource Center
Get Started with Oracle Cloud Infrastructure Today with Our Free Trial

Chip Hwang

Senior Principal Technical Marketing Engineer
