This post was written by a guest contributor, Paul Jenkins, Senior Principal Product Manager at Oracle
There are various options available when it comes to monitoring Oracle Cloud Container Engine for Kubernetes, also known as OKE. There are various OSS tools such as Grafana and Kibana, and different ways to install the components: everything in the cluster itself, or a central dashboard. It is very easy to install something like Prometheus and Grafana into a cluster and expose the Grafana interface for that cluster; there is a guided example of how to do that on Oracle Cloud Native Labs.
There is also the need to understand what is happening with the underlying OCI resources used by an OKE cluster, and to enable this there is an OCI data source for Grafana. Again, guidance is available on Oracle Cloud Native Labs.
This is great for individual dev/test or small numbers of clusters, but in reality customers will either already have or will want a centralised dashboard to monitor both Kubernetes and OCI resources for multiple clusters, possibly in multiple OCI regions. This post covers the steps required to set up a central Grafana server to monitor OKE clusters running private worker nodes.
It is assumed that all necessary prerequisites for running OKE clusters have been completed, and that the audience is familiar with both OCI and OKE concepts and operations.
The main concept is to provide a central Grafana instance that can monitor multiple OKE clusters running private worker nodes. We will create a private OKE cluster using the OCI Console Quick Create feature. This will create the cluster and all required networking. Next we will create a Monitoring VCN with a regional subnet and a Local Peering Gateway, along with the required routing and security settings.
The next step is to add a Local Peering Gateway in the OKE VCN, connect it to the Monitoring gateway and set up the extra routing and security settings.
Then we will install Prometheus into this cluster and expose the kube-prometheus service so we can access the Prometheus data from a Grafana server running in the Monitoring compartment.
The final steps are to create a Grafana server, add the kube-prometheus service as a datasource and configure a Dashboard.
The following diagram shows the high-level solution:
Figure 1: Solution Overview
For simplicity, we will use an internet gateway in the worked example. The details of setting up VPN access depend on the specific equipment being used on premises. For information on using a VPN connection, please see the OCI documentation.
One of the key elements of this solution is the ability to securely peer Virtual Cloud Networks. This allows OKE worker nodes to run in a private subnet but still be accessible to a Grafana server running in a different, dedicated VCN and compartment. The following diagram highlights the key networking elements required in this solution.
Figure 2: Networking Elements
For an overview of local peering, refer to the OCI documentation.
Ensure that all OKE prerequisites are in place.
Create two compartments, one for the OKE cluster and one for the Grafana server. In this example I am using compartments named OKE and Monitoring.
Create OKE Cluster
Create a private OKE cluster in the OKE compartment using the console Quick Create feature. Select an appropriate instance shape and number of nodes as required/allowed within tenancy limits.
Figure 3: Cluster Create 1
This will set up all required networking and node pools for the cluster.
Figure 4: Cluster Create 2
This will take you to the cluster details screen and show when all resources are up and running.
Figure 5: Cluster Created
Make a note of the name of the VCN and the CIDR block (this defaults to 10.0.0.0/16 in quick create)
Create a VCN in the Monitoring Compartment
Figure 6: VCN
Click Create Virtual Cloud Network. Select Create Virtual Cloud Network Only and select a CIDR block of 18.104.22.168/16
Figure 7: Create VCN
Click Create Virtual Cloud Network.
When the network has been created, create a public regional subnet
Figure 8: Create Subnet
Name = Grafana, CIDR Block = 22.214.171.124/24 and select the default Security List
Figure 9: Subnet
Create an Internet Gateway to allow access to the Grafana dashboard.
Figure 10: Internet Gateway
Update the default route table to add a route rule with a destination of 0.0.0.0/0 that targets the internet gateway.
Figure 11: Route Table
Figure 12: Route Table
Create a Local Peering Gateway - this will be used to connect to the OKE VCN.
Figure 13: Create LPG
Figure 14: Local Peering Gateway
Add a route rule to direct 10.0.0.0/16 traffic to the OKE VCN via the to-oke LPG
Figure 15: Update Route Table
Create Egress Rules for the Monitoring VCN:
Figure 16: Egress Rules
Create ingress Rules for the Monitoring VCN:
Figure 17: Ingress Rules
Set-Up Additional Networking in OKE Compartment
Select Networking, Virtual Cloud Networks
Figure 18: Select VCN
Select the VCN created in the OKE compartment by the Quick Create.
Figure 19: Select VCN
Figure 20: OKE VCN
Create a Local Peering Gateway
Figure 21: Create LPG
Establish a peering connection to the Monitoring gateway
Figure 22: LPG Created
Figure 23: Establish LPG Connection
After a few moments the connection will be established.
Figure 24: LPG Connection Established
Select route tables for the OKE VCN
Figure 25: Route Tables
Add a route for the monitoring VCN via the local peering gateway to the route table used by the load balancer subnet. Destination 126.96.36.199/16, Target to-grafana LPG
Figure 26: Route Tables
Add an ingress security rule for the same subnet to accept traffic from the Grafana dashboard on port 9090
Figure 27: Ingress Rules
At this point all required networking to allow access from the monitoring VCN to the OKE VCN via private local peering gateways has been completed. The next steps are to install Prometheus in the OKE cluster and create a Grafana server in the Monitoring Compartment/VCN.
It is assumed that a current kubeconfig is available to allow kubectl commands to be run against the OKE cluster. There are many examples of how to install Prometheus and Grafana; this one is based on the Oracle Cloud Native Labs guide on Monitoring OKE, with a couple of changes.
Firstly, we must not install Grafana onto the OKE cluster (the lab installs it by default), and secondly, we need to change the kube-prometheus service to use a private load balancer.
Create a role binding to cluster-admin for your OCI user:
kubectl create clusterrolebinding admin --clusterrole=cluster-admin --user=ocid1.user.oc1..nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Then following the Lab instructions, add the repo for the prometheus operator:
helm repo add stable https://kubernetes-charts.storage.googleapis.com
helm repo update
Download a set of values used to install prometheus:
Before installing, edit the values.yaml file to not install Grafana on the OKE cluster by changing deployGrafana: True to False
Figure 28: Ingress Rules
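The same change can be made from the command line. The following is a minimal sketch, assuming the downloaded values.yaml contains the key written exactly as `deployGrafana: True`; the function name `disable_grafana` is only illustrative.

```shell
#!/usr/bin/env bash
# Sketch: switch deployGrafana from True to False in a Helm values file.
# Assumes the key is written as "deployGrafana: True", as described above.
disable_grafana() {
  local values_file="$1"
  # Keep a .bak copy so the original values can be restored if needed.
  sed -i.bak 's/\(deployGrafana:[[:space:]]*\)True/\1False/' "$values_file"
}
```

For example, running `disable_grafana values.yaml` and then `grep deployGrafana values.yaml` should show the value as False.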
Now, install Prometheus to the OKE cluster using the changed YAML file.
helm install kube-prometheus --namespace monitoring stable/prometheus-operator -f values.yaml
After a few moments you can see the newly started pods:
kubectl --namespace monitoring get pods -l "release=kube-prometheus"
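Rather than polling by hand, it may be convenient to block until the pods are ready. This is a small sketch using `kubectl wait` with the same namespace and label as the command above; the function name and the default timeout of 300s are assumptions, not part of the lab.

```shell
#!/usr/bin/env bash
# Sketch: wait until the Prometheus pods report Ready, or give up after a timeout.
wait_for_prometheus_pods() {
  local timeout="${1:-300s}"   # assumed default; pass another value if needed
  kubectl --namespace monitoring wait pods \
    -l "release=kube-prometheus" \
    --for=condition=Ready \
    --timeout="$timeout"
}
```

Calling `wait_for_prometheus_pods 120s` returns a non-zero exit code if the pods are not ready in time, which makes it easy to use in a script.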
The kube-prometheus service, which exposes the Prometheus data, is defined as ClusterIP by default. This means that the data is not available outside of the cluster, so the service must be changed to allow external access.
The simplest way to do this is to change the service to NodePort, which makes the data accessible on each worker node. However, this depends on each worker node's IP address, so if we want to withstand node pool upgrades and node failures we need a more consistent way to access the data. To do this we must edit the kube-prometheus service to use a private OCI load balancer.
kubectl edit svc kube-prometheus-prometheus-prometheus -n monitoring
Add a private LB annotation
and change the type to LoadBalancer.
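The two edits can also be applied non-interactively. The sketch below assumes the annotation in question is `service.beta.kubernetes.io/oci-load-balancer-internal`, which is the annotation OCI uses to request an internal load balancer; the annotation actually used in the original walkthrough is not shown, so treat this as one plausible way to do it. The function names and the patch file are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: write a patch that requests an internal (private) OCI load balancer
# and switches the Service type to LoadBalancer.
# Assumption: the internal-LB annotation below is the one intended above.
write_private_lb_patch() {
  cat > "$1" <<'EOF'
metadata:
  annotations:
    service.beta.kubernetes.io/oci-load-balancer-internal: "true"
spec:
  type: LoadBalancer
EOF
}

# Apply the patch to the Service named in the kubectl edit command above.
apply_private_lb_patch() {
  kubectl patch svc kube-prometheus-prometheus-prometheus \
    --namespace monitoring \
    --patch "$(cat "$1")"
}
```

For example: `write_private_lb_patch private-lb.yaml && apply_private_lb_patch private-lb.yaml`. Keeping the patch in a file also makes it easy to reapply the change to other clusters.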
After a few moments you can check the services by issuing the following:
kubectl get svc -n monitoring
You will see that the kube-prometheus service has a type of LoadBalancer with a private IP address (10.0.20.4) in the load balancer subnet.
Figure 29: Monitoring Services
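If the address is needed in a script, for example to configure the Grafana data source later, it can be read directly from the service status. This is a minimal sketch; the function name is hypothetical and the service name is the one used in the edit step above.

```shell
#!/usr/bin/env bash
# Sketch: print the private IP assigned to the Prometheus load balancer.
# The output is empty until the load balancer has been provisioned.
prometheus_lb_ip() {
  kubectl get svc kube-prometheus-prometheus-prometheus \
    --namespace monitoring \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}
```

In this walkthrough, `prometheus_lb_ip` would be expected to print the same private address shown in the service listing.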
We will install a Grafana server in the monitoring compartment/VCN and create a data source that points to the OKE cluster Prometheus service we exposed in the previous steps. There are guides available to do this. However, an OCI custom image was created after following these instructions to make it quicker and easier to get up and running. The following instructions make use of this custom image.
Select Compute in the monitoring compartment.
Figure 30: Compute
Select Custom Images.
Figure 31: Compute Images
Click Import Image and enter a name (Grafana), an OS of Linux, an Image Type of OCI and an object storage URL of https://objectstorage.uk-london-1.oraclecloud.com/p/yNfon08ieUoDnMQRjzArEDR538dF9D6jlkbVK0h3TIg/n/intpaulj/b/Images/o/grafana
NOTE: This pre-authenticated request will be live until 12/31/2020.
Figure 32: Compute Image
Click Import Image. After a few minutes the work request will complete.
Figure 33: Compute Image Created
Return to the compute dashboard and create a new instance in the monitoring compartment. Select an appropriate shape, assign a public IP address, choose the custom image imported earlier and supply a key file to allow SSH access to the instance.
Figure 34: Create Instance 1
Figure 35: Create Instance 2
Figure 36: Create Instance 3
Figure 37: Create Instance 4
Click Create. When the instance is up and running, make a note of its public IP address. At this point I would recommend running a yum update to make sure the image has the latest fixes.
Open a browser, navigate to http://your.public.ip.address:3000 and you will see the Grafana login screen.
Figure 38: Grafana Login
Login with User=admin and password=passw0rd. Please change this after logging in. You will see a portal with some existing dashboards. These are included as examples using the OCI data source.
Figure 39: Grafana Dashboard
Click on settings and Add Data source
Figure 40: Add Data Source
Add the details for the kube-prometheus service defined when we set up Prometheus on the OKE cluster (in this case 10.0.20.4:9090).
Figure 41: OKE Data Source
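Before saving the data source, it can help to confirm from the Grafana instance (over SSH) that the Prometheus endpoint is reachable across the peering. The sketch below uses Prometheus's `/-/healthy` endpoint; the function name and the five-second timeout are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: check that Prometheus answers on its health endpoint.
check_prometheus() {
  local endpoint="$1"   # host:port, for example 10.0.20.4:9090
  # -f makes curl fail on HTTP errors; --max-time avoids hanging on a blocked route.
  if curl -sf --max-time 5 "http://${endpoint}/-/healthy" > /dev/null; then
    echo "reachable"
  else
    echo "unreachable"
    return 1
  fi
}
```

If `check_prometheus 10.0.20.4:9090` reports unreachable, the route rules and the port 9090 ingress rule set up earlier are the first things to review.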
Click "Save & Test". Return to the home screen and select Home in the top Left.
Figure 42: New Dashboard
Figure 43: Import
Enter 10000 as the Grafana.com Dashboard ID and click Load. This will select a sample Kubernetes monitoring dashboard from Grafana.com. You can browse and select different pre-built solutions, but this is a very popular dashboard.
Figure 44: Select Source
Then click Import
Figure 45: Import
Add the data source we added above. (OKE Toronto)
Figure 46: Select Data
You will now see a Kubernetes dashboard displaying Prometheus data from your Private OKE cluster.
Figure 47: Prometheus Dashboard
Many installations will have multiple clusters (for example dev, test and QA), all of which need to be monitored. Multiple VCNs can be peered using Local Peering Gateways, but the relationship is one-to-one, so each VCN-to-VCN peering needs its own dedicated LPG pairing.
The following diagram shows an example of how this could be set up:
Figure 48: Multiple VCN Peering
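With several clusters, adding each Prometheus endpoint through the UI becomes repetitive. As a sketch, the data sources could instead be registered through Grafana's HTTP API (`POST /api/datasources`). `GRAFANA_URL` and `GRAFANA_AUTH` are hypothetical environment variables introduced for this example, and the function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: build the JSON body for a Prometheus data source.
datasource_json() {
  local name="$1" url="$2"
  printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy"}' "$name" "$url"
}

# POST the data source to Grafana.
# GRAFANA_URL (e.g. http://localhost:3000) and GRAFANA_AUTH (user:password)
# are assumed to be set by the caller; they are not part of the walkthrough.
add_datasource() {
  curl -sf -u "$GRAFANA_AUTH" \
    -H "Content-Type: application/json" \
    -X POST "${GRAFANA_URL}/api/datasources" \
    -d "$(datasource_json "$1" "$2")"
}
```

For the cluster in this walkthrough, the call would look like `add_datasource "OKE Toronto" http://10.0.20.4:9090`, repeated with the private load balancer address of each additional cluster.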
It is also possible to securely and privately peer VCNs across regions using remote VCN peering. This facility can extend the solution to have a single Grafana server monitoring multiple OKE clusters in multiple OCI regions. The concept is very similar to the solutions above and is shown in the following diagram.
Figure 49: Remote Peering
Please refer to the OCI documentation for more information on Remote Peering.
Where either performance or security requirements are high, it is also possible to further separate the Prometheus deployment by creating a separate node pool for it to run in. One way of doing this is to use label selectors. This keeps the overhead of running Prometheus away from the other workloads, but it increases the compute required and therefore the cost of running your cluster.
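As an illustration of the label selector approach, the sketch below labels the nodes of the dedicated pool and writes a values override that pins Prometheus to them. The label `workload=monitoring` is a hypothetical choice, and the override assumes the chart exposes `prometheus.prometheusSpec.nodeSelector`, as the stable/prometheus-operator chart does; the values file from the lab may be structured differently.

```shell
#!/usr/bin/env bash
# Sketch: label the nodes of the dedicated node pool.
# The label workload=monitoring is a hypothetical choice for this example.
label_monitoring_nodes() {
  local node
  for node in "$@"; do
    kubectl label node "$node" workload=monitoring --overwrite
  done
}

# Write a values override that schedules Prometheus only on labelled nodes.
# Assumes the chart supports prometheus.prometheusSpec.nodeSelector.
write_node_selector_values() {
  cat > "$1" <<'EOF'
prometheus:
  prometheusSpec:
    nodeSelector:
      workload: monitoring
EOF
}
```

The override file can then be passed to the helm install command as an additional `-f` argument after values.yaml.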
At this point Grafana, Prometheus and its UI should be installed, configured and reachable through the external IP address. As explained above the configuration can be adapted to multiple VCNs, multiple nodes in the same region or in different ones. For additional details, review Oracle Cloud Infrastructure (OCI) Container Engine for Kubernetes (OKE) to learn more about how you can seamlessly monitor your OKE clusters alongside the rest of your infrastructure. Want to experience OKE for yourself? Sign up for an OCI account in case you haven't done that yet.