Network troubleshooting for OCI Container Engine for Kubernetes clusters

January 9, 2024 | 4 minute read
Greg Verstraeten
Senior Principal Product Manager
Ajay Chhabria
Principal Product Manager

Organizations adopting Kubernetes face many challenges in getting networking configured correctly. Troubleshooting a Kubernetes cluster’s network configuration involves analyzing the cluster components’ requirements, carefully walking through multiple route tables, and scrutinizing network security rules along the multihop network paths. Done manually, this process is error-prone.

Oracle Container Engine for Kubernetes (OKE) provides a Quick Create workflow, a Quick Start experience that automates the creation of a virtual network, a Kubernetes cluster, and nodes. When you use this workflow, the networking is configured according to best practices. However, administrators often create clusters in their existing corporate virtual network. The drawback of manually configuring a virtual network is that any missing security rule, routing rule, or gateway can cause the Kubernetes cluster to malfunction. Troubleshooting the network configuration of a Kubernetes cluster is labor-intensive and time-consuming.

OKE now provides network path analysis tests to simplify identifying network connectivity issues in Kubernetes clusters.

Identify network connection issues in a Kubernetes cluster

OKE’s network path analysis tests pinpoint connectivity issues and offer actionable insights on the missing configurations, simplifying the troubleshooting process. The network path analysis tests are powered by Oracle Cloud Network Path Analyzer, a powerful tool to troubleshoot endpoint reachability issues in your Oracle Cloud Infrastructure (OCI) networks. Oracle Cloud Network Path Analyzer takes both routing connectivity information and security rules configuration and provides a visualization of the network path. Network Path Analyzer uses Batfish to analyze reachability and identify configuration errors. Batfish is an open source network configuration analysis tool maintained by Intentionet.

For a Kubernetes cluster to work effectively, you must enable and validate many network flows so Kubernetes components can communicate, including the following examples:

  • The Kubernetes API server must communicate on TCP/10250 with the worker nodes.

  • Worker nodes and pods must communicate on TCP/6443 and TCP/12250 with the API server.

  • The Kubernetes API server must access OCI services on TCP/443.
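The example flows above can be written down as data and probed by hand. The following is a minimal Python sketch, not part of OKE: the `Flow` class, the `REQUIRED_FLOWS` list, and the `tcp_port_open` helper are hypothetical names introduced here for illustration. A TCP connect probe only shows whether a port answers from the machine where the script runs, so it must be run from the source side of each flow, and it does not explain why a path fails the way a configuration analysis does.

```python
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One required network flow between cluster components (illustrative)."""
    source: str
    destination: str
    protocol: str
    ports: tuple


# The three example flows listed in this article; the full list is longer.
REQUIRED_FLOWS = [
    Flow("Kubernetes API server", "worker nodes", "TCP", (10250,)),
    Flow("worker nodes and pods", "Kubernetes API server", "TCP", (6443, 12250)),
    Flow("Kubernetes API server", "OCI services", "TCP", (443,)),
]


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for flow in REQUIRED_FLOWS:
        ports = ", ".join(f"{flow.protocol}/{p}" for p in flow.ports)
        print(f"{flow.source} -> {flow.destination}: {ports}")
```

For example, running `tcp_port_open(api_endpoint_ip, 6443)` from a worker node, where `api_endpoint_ip` is a placeholder for your cluster's API endpoint address, would indicate whether that one flow is currently reachable.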

You can find the full list of the required network flows in Network Resource Configuration for Cluster Creation and Deployment, and the following diagram summarizes them.

OKE network diagram

OKE’s network path analysis tests turn the complicated and painful task of validating those network flows into an automated reasoning process. Validate each required network flow by selecting its corresponding test. A discovered path is displayed graphically, including the routing and security policy insights, with a bidirectional path analysis between components.

How to run path analysis tests

OKE has added a new resource, called path analyzer tests, to the OKE cluster detail page. The tests are organized by Kubernetes cluster components, such as cluster API, nodes, pods, and load balancers. You can filter the tests by the type of issues you want to test, such as control plane failures or “node not registering.”

Select a network flow you want to test, and select the Launch path analysis button. For example, launch the “Cluster API to Worker Node” test.

Path analysis tests

OKE automatically defines the protocol, port, and IP address of each component test.

“Cluster API to worker node” path analysis test

The discovered paths are displayed, including the routing status and the security rule status for both the forward and return paths.

Network path analysis discovered paths result

When the network is misconfigured and communication can’t be established, the Console displays an Unreachable status. Expand the diagram information to reveal a possible reason for the network failure.

Failure path example

In this case, the routing and security rules are missing. After fixing the virtual network configuration, you can rerun the test and finally get the Reachable status.
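To make the idea of a missing security rule concrete, here is a small Python sketch of the kind of check involved. It is a deliberately simplified model, not how Network Path Analyzer or Batfish works internally: the `IngressRule` class, the `ingress_allows` function, and the CIDR ranges are all hypothetical, and routing is left out entirely.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    """A simplified ingress rule: allow protocol/port range from a source CIDR."""
    source_cidr: str
    protocol: str
    port_min: int
    port_max: int


def ingress_allows(rules, source_ip: str, protocol: str, port: int) -> bool:
    """Return True if at least one rule admits the given traffic."""
    src = ipaddress.ip_address(source_ip)
    return any(
        rule.protocol == protocol
        and rule.port_min <= port <= rule.port_max
        and src in ipaddress.ip_network(rule.source_cidr)
        for rule in rules
    )


# Hypothetical addressing: API endpoint subnet 10.0.0.0/28,
# worker node subnet 10.0.1.0/24.
API_ENDPOINT_IP = "10.0.0.5"

# Before the fix: the worker subnet only admits SSH from the API endpoint subnet.
rules_before = [IngressRule("10.0.0.0/28", "TCP", 22, 22)]

# After the fix: the "Cluster API to Worker Node" flow on TCP/10250 is allowed.
rules_after = rules_before + [IngressRule("10.0.0.0/28", "TCP", 10250, 10250)]

if __name__ == "__main__":
    for label, rules in (("before", rules_before), ("after", rules_after)):
        ok = ingress_allows(rules, API_ENDPOINT_IP, "TCP", 10250)
        print(label, "Reachable" if ok else "Unreachable")
```

In this model, adding the single rule for TCP/10250 is what changes the result for the flow, which mirrors the fix-and-rerun loop described above.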

After testing all the required network flows, you have successfully validated that your cluster’s network is well configured, and you are ready to deploy applications in your cluster.

Learn more

You can learn more about OKE’s network path analysis tests with the following resources:
