November 26, 2019

Monitoring OKE with Prometheus and Grafana

Gilson Melo
Senior Principal Product Manager

This post was written by a guest contributor, Paul Jenkins, Senior Principal Product Manager at Oracle.

There are various options available when it comes to monitoring Oracle Cloud Infrastructure Container Engine for Kubernetes, also known as OKE. Several open-source tools are available, such as Grafana and Kibana, and there are different ways to install the components: either entirely within the cluster itself or with a central dashboard. It is very easy to install something like Prometheus and Grafana into a cluster and expose the Grafana interface for that cluster; there is a guided example of how to do that on Oracle Cloud Native Labs here.

There is also a need to understand what is happening with the underlying OCI resources used by an OKE cluster, and to enable this there is an OCI data source for Grafana. Again, guidance is available on Oracle Cloud Native Labs here.

This works well for individual dev/test clusters or small numbers of clusters, but in practice customers either already have, or will want, a centralised dashboard to monitor both Kubernetes and OCI resources for multiple clusters, possibly in multiple OCI regions. This post covers the steps required to set up a central Grafana server to monitor OKE clusters running private worker nodes.

Note:
It is assumed that all necessary prerequisites for running OKE clusters have been completed (see here), and that the audience is familiar with both OCI and OKE concepts and operations.

Overview

The main concept is to provide a central Grafana instance that can monitor multiple OKE clusters running private worker nodes. We will create a private OKE cluster using the OCI Console Quick Create feature, which creates the cluster and all the required networking. Next we will create a Monitoring VCN with a regional subnet and a Local Peering Gateway, along with the required routing and security settings.

The next step is to add a Local Peering Gateway to the OKE VCN, connect it to the Monitoring VCN's Local Peering Gateway, and set up the extra routing and security settings.

Then we will install Prometheus into this cluster and expose the kube-prometheus service so we can access the Prometheus data from a Grafana server running in the Monitoring compartment.

The final steps are to create a Grafana server, add the kube-prometheus service as a datasource and configure a Dashboard.

The following diagram shows the high-level solution:

Figure 1: Solution Overview

Note:
For simplicity, we will use an internet gateway in the worked example. The details of setting up VPN access depend on the specific equipment being used on premises. For information on using a VPN connection, please see the OCI documentation here.

Step by Step

A key element of this solution is the ability to securely peer Virtual Cloud Networks. This allows OKE worker nodes to run in a private subnet while still being accessible to a Grafana server running in a different, dedicated VCN and compartment. The following diagram highlights the key networking elements required in this solution.

Figure 2: Networking Elements

For an overview of local peering, refer to the OCI documentation here.

Pre-Reqs
Ensure that all OKE prerequisites are in place; they are listed here.

Compartments
Create two compartments, one for the OKE cluster and one for the Grafana server. In this example they are named OKE and Monitoring.

Create OKE Cluster
Create a private OKE cluster in the OKE compartment using the Console Quick Create feature (described here). Select an appropriate instance shape and number of nodes, as required and as allowed by your tenancy limits.

Figure 3: Cluster Create 1

This will set up all required networking and node pools for the cluster.

Figure 4: Cluster Create 2

This will take you to the cluster details screen and show when all resources are up and running.

Figure 5: Cluster Created

Make a note of the name of the VCN and its CIDR block (this defaults to 10.0.0.0/16 in Quick Create).

Create a VCN in the Monitoring Compartment

Figure 6: VCN

Click Create Virtual Cloud Network. Select Create Virtual Cloud Network Only and enter a CIDR block of 11.0.0.0/16.

Figure 7: Create VCN

Click Create Virtual Cloud Network.
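Local peering requires that the two VCNs' CIDR blocks do not overlap, which is why the Monitoring VCN uses a different range from the OKE VCN's 10.0.0.0/16. The bash sketch below checks this before anything is created; the function names are hypothetical, and it assumes IPv4 CIDRs written as a.b.c.d/n. (11.0.0.0/16 is not one of the RFC 1918 private ranges; OCI recommends RFC 1918 ranges for VCNs, so outside a demo a range such as 172.16.0.0/16 may be a safer choice.)

```shell
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address (e.g. 10.0.20.4) to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed (exit status 0) if two CIDR blocks overlap, fail otherwise.
# CIDR blocks are either nested or disjoint, so they overlap exactly when
# they agree on the bits covered by the shorter of the two prefixes.
cidrs_overlap() {
  local net1=${1%/*} len1=${1#*/}
  local net2=${2%/*} len2=${2#*/}
  local shorter=$(( len1 < len2 ? len1 : len2 ))
  # Bash arithmetic is 64-bit, so mask back down to 32 bits after the shift.
  local mask=$(( (0xFFFFFFFF << (32 - shorter)) & 0xFFFFFFFF ))
  local n1 n2
  n1=$(ip_to_int "$net1")
  n2=$(ip_to_int "$net2")
  (( (n1 & mask) == (n2 & mask) ))
}
```

For example, cidrs_overlap 10.0.0.0/16 11.0.0.0/16 fails because the ranges are disjoint, while cidrs_overlap 10.0.0.0/16 10.0.20.0/24 succeeds because the second range sits inside the first.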

When the network has been created, create a public regional subnet.

Figure 8: Create Subnet

Name = Grafana, CIDR Block = 11.0.10.0/24 and select the default Security List

Figure 9: Subnet

Create an Internet Gateway to allow access to the Grafana dashboard.

Figure 10: Internet Gateway
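If you prefer scripting to the Console, the VCN, subnet and internet gateway steps above can be approximated with the OCI CLI. This is a minimal sketch, assuming a configured OCI CLI; the function name and the display names monitoring-vcn and monitoring-igw are assumptions, while the CIDRs and the Grafana subnet name follow this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the Monitoring VCN, the public regional
# "Grafana" subnet and an internet gateway.
# Usage: create_monitoring_network <monitoring-compartment-ocid>
create_monitoring_network() {
  local compartment="$1" vcn_id
  # Create the VCN and capture its OCID from the JSON response.
  vcn_id=$(oci network vcn create \
    --compartment-id "$compartment" \
    --cidr-block 11.0.0.0/16 \
    --display-name monitoring-vcn \
    --query 'data.id' --raw-output) || return 1
  # No --availability-domain, so the subnet is regional; with no route table
  # or security list given, the VCN defaults are used.
  oci network subnet create \
    --compartment-id "$compartment" \
    --vcn-id "$vcn_id" \
    --cidr-block 11.0.10.0/24 \
    --display-name Grafana || return 1
  # Internet gateway used for browser access to Grafana.
  oci network internet-gateway create \
    --compartment-id "$compartment" \
    --vcn-id "$vcn_id" \
    --is-enabled true \
    --display-name monitoring-igw
}
```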

Update the default route table to add a rule that sends all traffic (destination 0.0.0.0/0) to the internet gateway.

Figure 11: Route Table

Figure 12: Route Table

Create a Local Peering Gateway - this will be used to connect to the OKE VCN.

Figure 13: Create LPG

Figure 14: Local Peering Gateway

Add a route rule to direct 10.0.0.0/16 traffic to the OKE VCN via the to-oke LPG.

Figure 15: Update Route Table
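The two route rules on the Monitoring VCN's default route table can also be expressed as the JSON that oci network route-table update accepts. A sketch with a hypothetical helper; the gateway OCIDs are placeholders you supply.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Monitoring VCN route rules as a JSON array.
# Usage: monitoring_route_rules <internet-gateway-ocid> <to-oke-lpg-ocid>
monitoring_route_rules() {
  local igw_id="$1" lpg_id="$2"
  # Rule 1: internet traffic via the internet gateway.
  # Rule 2: OKE VCN traffic (10.0.0.0/16) via the to-oke LPG.
  cat <<EOF
[
  {"destination": "0.0.0.0/0", "destinationType": "CIDR_BLOCK", "networkEntityId": "${igw_id}"},
  {"destination": "10.0.0.0/16", "destinationType": "CIDR_BLOCK", "networkEntityId": "${lpg_id}"}
]
EOF
}
```

Apply it with oci network route-table update --rt-id <default-route-table-ocid> --route-rules "$(monitoring_route_rules <igw-ocid> <to-oke-lpg-ocid>)" --force. Because update replaces the table's entire rule list, both rules are passed together; --force skips the confirmation prompt.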

Create Egress Rules for the Monitoring VCN:

  • 0.0.0.0/0 to route all internet traffic
  • 10.0.0.0/16 to route port 9090 (Prometheus) traffic to OKE

Figure 16: Egress Rules

Create ingress Rules for the Monitoring VCN:

  • Port 3000 on 0.0.0.0/0 to accept Grafana requests from the internet. (This would be changed to suit the VPN CIDR block in a VPN set-up.)

Figure 17: Ingress Rules
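The equivalent security list rules can be generated as JSON for the OCI CLI. This sketch assumes the default security list is being updated; the SSH rule on port 22 is included as an assumption, because the default list normally allows SSH and an update replaces every existing rule (add back the default ICMP rules too if you rely on them). The 0.0.0.0/0 egress rule already permits the port 9090 traffic, so the second egress rule mainly documents intent.

```shell
#!/usr/bin/env bash
# tcp_rule <source|destination> <cidr> <port>: print one stateful TCP rule.
# Protocol "6" is TCP in OCI security rules.
tcp_rule() {
  printf '{"%s": "%s", "protocol": "6", "tcpOptions": {"destinationPortRange": {"min": %d, "max": %d}}}' \
    "$1" "$2" "$3" "$3"
}

# Egress: all traffic to the internet, plus TCP 9090 (Prometheus) to the OKE VCN.
monitoring_egress_rules() {
  printf '[{"destination": "0.0.0.0/0", "protocol": "all"}, %s]\n' \
    "$(tcp_rule destination 10.0.0.0/16 9090)"
}

# Ingress: SSH (assumed, mirroring the default list) and Grafana on TCP 3000.
monitoring_ingress_rules() {
  printf '[%s, %s]\n' \
    "$(tcp_rule source 0.0.0.0/0 22)" \
    "$(tcp_rule source 0.0.0.0/0 3000)"
}
```

Apply both lists in one call: oci network security-list update --security-list-id <default-security-list-ocid> --egress-security-rules "$(monitoring_egress_rules)" --ingress-security-rules "$(monitoring_ingress_rules)" --force.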

Set Up Additional Networking in the OKE Compartment
Select Networking, Virtual Cloud Networks

Figure 18: Select VCN

Select the VCN created in the OKE compartment by the Quick Create.

Figure 19: Select VCN

Figure 20: OKE VCN

Create a Local Peering Gateway (named to-grafana in this example).

Figure 21: Create LPG

Establish a peering connection to the Monitoring VCN's LPG (to-oke).

Figure 22: LPG Created

Figure 23: Establish LPG Connection

After a few moments the connection will be established.

Figure 24: LPG Connection Established
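On the OKE side, creating the to-grafana LPG and connecting it to the Monitoring VCN's to-oke LPG can also be scripted. A sketch, assuming a configured OCI CLI and a user with the IAM permissions needed in both compartments; the function name and argument order are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the to-grafana LPG in the OKE VCN and connect it
# to the Monitoring VCN's to-oke LPG.
# Usage: peer_oke_vcn <oke-compartment-ocid> <oke-vcn-ocid> <to-oke-lpg-ocid>
peer_oke_vcn() {
  local compartment="$1" vcn_id="$2" peer_lpg_id="$3" lpg_id
  lpg_id=$(oci network local-peering-gateway create \
    --compartment-id "$compartment" \
    --vcn-id "$vcn_id" \
    --display-name to-grafana \
    --query 'data.id' --raw-output) || return 1
  # Only one side of the pair needs to initiate the connection.
  oci network local-peering-gateway connect \
    --local-peering-gateway-id "$lpg_id" \
    --peer-id "$peer_lpg_id" || return 1
  # Print the peering status; it should read PEERED after a few moments.
  oci network local-peering-gateway get \
    --local-peering-gateway-id "$lpg_id" \
    --query 'data."peering-status"' --raw-output
}
```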

Select route tables for the OKE VCN

Figure 25: Route Tables

In the route table of the load balancer subnet, add a route to the Monitoring VCN via the local peering gateway: destination 11.0.0.0/16, target the to-grafana LPG.

Figure 26: Route Tables

Add an ingress security rule for the same subnet to accept traffic from the Grafana server on port 9090.

Figure 27: Ingress Rules
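The route rule and ingress rule for the load balancer subnet can be generated the same way. In this sketch the ingress source 11.0.0.0/16 (the whole Monitoring VCN) is an assumption; narrow it to 11.0.10.0/24 to admit only the Grafana subnet. Because an update replaces the whole list, any rules the subnet already has, such as those created by Quick Create, are passed in as extra arguments (in the same camelCase JSON form) and kept.

```shell
#!/usr/bin/env bash
# oke_lb_route_rules <to-grafana-lpg-ocid> [existing-rule-json ...]
# Print existing rules plus a route for the Monitoring VCN via the LPG.
oke_lb_route_rules() {
  local lpg_id="$1"
  shift
  local rules=("$@" "{\"destination\": \"11.0.0.0/16\", \"destinationType\": \"CIDR_BLOCK\", \"networkEntityId\": \"${lpg_id}\"}")
  local IFS=,
  printf '[%s]\n' "${rules[*]}"
}

# oke_lb_ingress_rules [existing-rule-json ...]
# Print existing rules plus TCP 9090 (Prometheus) from the Monitoring VCN.
oke_lb_ingress_rules() {
  local rules=("$@" '{"source": "11.0.0.0/16", "protocol": "6", "tcpOptions": {"destinationPortRange": {"min": 9090, "max": 9090}}}')
  local IFS=,
  printf '[%s]\n' "${rules[*]}"
}
```

The output is passed to oci network route-table update --rt-id <lb-route-table-ocid> --route-rules ... --force and oci network security-list update --security-list-id <lb-security-list-ocid> --ingress-security-rules ... --force respectively.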

At this point all required networking to allow access from the monitoring VCN to the OKE VCN via private local peering gateways has been completed. The next steps are to install Prometheus in the OKE cluster and create a Grafana server in the Monitoring Compartment/VCN.

Installing Prometheus

It is assumed that a current kubeconfig is available (see here) so that kubectl commands can be run against the OKE cluster. There are many examples of how to install Prometheus and Grafana; this one is based on the Oracle Cloud Native Labs guide to monitoring OKE (here), with a couple of changes.

First, we do not want to install Grafana onto the OKE cluster, which the lab does by default; second, we need to change the kube-prometheus service to use a private load balancer.

Create a role binding to cluster-admin for your OCI user:

kubectl create clusterrolebinding admin --clusterrole=cluster-admin --user=ocid1.user.oc1..nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn

Then, following the lab instructions, add the Helm repository for the Prometheus Operator:

helm repo add coreos https://s3-eu-west-1.amazonaws.com/coreos-charts/stable/

You can skip the lab's steps for installing Tiller, because Tiller is enabled by default in Quick Create, and go straight to installing the operator into a separate namespace called monitoring:

helm install coreos/prometheus-operator --name prometheus-operator --namespace monitoring

Download the values file used to install Prometheus:

wget https://raw.githubusercontent.com/oracle/cloudnative/master/observability-and-analysis/telemetry/prometheus/values.yaml

Before installing, edit the values.yaml file so that Grafana is not installed on the OKE cluster, by changing deployGrafana: True to deployGrafana: False.

Figure 28: Ingress Rules
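The values.yaml edit can also be made non-interactively. A minimal sketch with sed, assuming the key appears on its own line as deployGrafana: true (with either capitalisation of the leading T); the function name is hypothetical, and a .bak copy of the original file is kept.

```shell
#!/usr/bin/env bash
# Usage: disable_grafana [values-file]   (defaults to ./values.yaml)
disable_grafana() {
  local file="${1:-values.yaml}"
  # Rewrite "deployGrafana: true" (or True) as "deployGrafana: false",
  # preserving indentation; -i.bak keeps the original as <file>.bak.
  sed -i.bak -E 's/^([[:space:]]*deployGrafana:[[:space:]]*)[Tt]rue/\1false/' "$file" || return 1
  # Succeed only if the file now contains the expected setting.
  grep -Eq '^[[:space:]]*deployGrafana:[[:space:]]*false' "$file"
}
```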

Now install Prometheus into the OKE cluster using the edited values.yaml file:

helm install coreos/kube-prometheus --name kube-prometheus --namespace monitoring --values values.yaml

After a few moments you can see the newly started pods:

kubectl get po --namespace monitoring
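Rather than re-running kubectl get po until everything is up, you can block until the pods are ready. A small sketch (the function name and the five-minute default timeout are arbitrary choices); pods from completed Jobs never report Ready, so if the chart creates any, fall back to checking kubectl get po.

```shell
#!/usr/bin/env bash
# Wait until every pod in the monitoring namespace is Ready, or time out.
# Usage: wait_for_monitoring_pods [timeout]   (e.g. 300s, 10m)
wait_for_monitoring_pods() {
  local timeout="${1:-300s}"
  kubectl wait --namespace monitoring --for=condition=Ready pod --all --timeout="$timeout"
}
```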

The kube-prometheus service, which exposes the Prometheus data, is defined as ClusterIP by default, which means the data is not available outside the cluster. This must be changed to allow external access.

The simplest way to do this is to change the service to NodePort, which makes the data accessible on each worker node. However, this depends on each worker node's IP address, so if we want to withstand node pool upgrades and node failures we need a more stable way to access the data. To achieve this, edit the kube-prometheus service to use a private OCI load balancer:

kubectl edit svc kube-prometheus -n monitoring

Add the OCI internal load balancer annotation under metadata:

apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/oci-load-balancer-internal: "true"

and, under spec, change the type from ClusterIP to LoadBalancer:

spec:
  type: LoadBalancer
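The same change can be applied without opening an editor, which is handy in scripts. This sketch uses a JSON merge patch with kubectl patch; the function name is hypothetical and the service and namespace defaults match this example.

```shell
#!/usr/bin/env bash
# Add the OCI internal load balancer annotation and switch the service type.
# Usage: expose_prometheus_internally [service] [namespace]
expose_prometheus_internally() {
  local svc="${1:-kube-prometheus}" ns="${2:-monitoring}"
  kubectl patch service "$svc" --namespace "$ns" --type merge --patch '{
    "metadata": {"annotations": {"service.beta.kubernetes.io/oci-load-balancer-internal": "true"}},
    "spec": {"type": "LoadBalancer"}
  }'
}
```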

After a few moments you can check the services by issuing the following:

kubectl get svc -n monitoring

You will see that kube-prometheus has a type of LoadBalancer with a private IP address (here 10.0.20.4) in the load balancer subnet.

Figure 29: Monitoring Services
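The load balancer IP is needed later for the Grafana data source, so it can be useful to read it directly from the service status. A sketch with a hypothetical function name that polls for up to about five minutes (30 attempts, 10 seconds apart, both arbitrary).

```shell
#!/usr/bin/env bash
# Print the private load balancer IP of kube-prometheus once OCI assigns it.
prometheus_lb_ip() {
  local ip tries=0
  until ip=$(kubectl get service kube-prometheus --namespace monitoring \
      --output jsonpath='{.status.loadBalancer.ingress[0].ip}') && [[ -n "$ip" ]]; do
    # Give up after 30 attempts.
    if (( ++tries >= 30 )); then return 1; fi
    sleep 10
  done
  echo "$ip"
}
```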

Creating a Grafana Server

We will install a Grafana server in the Monitoring compartment/VCN and add the OKE cluster's Prometheus service, exposed in the previous steps, as a data source. There are guides available for installing Grafana (here). However, to make it quicker and easier to get up and running, an OCI custom image was created after following those instructions, and the following steps use this custom image.

Select Compute in the monitoring compartment.

Figure 30: Compute

Select Custom Images.

Figure 31: Compute Images

Click Import Image and enter a name (Grafana), an operating system of Linux, an image type of OCI, and the following Object Storage URL: https://objectstorage.uk-london-1.oraclecloud.com/p/yNfon08ieUoDnMQRjzArEDR538dF9D6jlkbVK0h3TIg/n/intpaulj/b/Images/o/grafana

NOTE: This pre-authenticated request will be live until 12/31/2020.

Figure 32: Compute Image

Click Import Image. After a few minutes the work request will complete. 

Figure 33: Compute Image Created

Return to the Compute dashboard and create a new instance in the Monitoring compartment: select an appropriate shape, assign a public IP address, choose the custom image imported earlier, and provide an SSH key file to allow SSH access to the instance.

Figure 34: Create Instance 1

Figure 35: Create Instance 2

Figure 36: Create Instance 3

Figure 37: Create Instance 4

Click Create. When the instance is up and running, make a note of its public IP address. At this point I would recommend running a yum update to make sure the image has the latest fixes.

Open a browser and navigate to http://<your-public-ip-address>:3000; you will see the Grafana login screen.

Figure 38: Grafana Login
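If the login screen does not appear, you can confirm from your workstation whether the internet gateway, route table and port 3000 ingress rule are working by calling Grafana's health endpoint. A sketch, assuming curl is installed locally and a Grafana version that exposes /api/health; the function name is hypothetical and the IP address is the one you noted above.

```shell
#!/usr/bin/env bash
# Succeed if Grafana answers on port 3000 and reports a healthy database.
# Usage: grafana_reachable <grafana-public-ip>
grafana_reachable() {
  local host="$1"
  curl --silent --fail --max-time 10 "http://${host}:3000/api/health" \
    | grep -q '"database"[[:space:]]*:[[:space:]]*"ok"'
}
```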

Log in with user admin and password passw0rd, and change this password after logging in. You will see a portal with some existing dashboards; these are included as examples that use the OCI data source.

Figure 39: Grafana Dashboard
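The default password can also be changed from an SSH session on the instance with grafana-cli. This sketch assumes Grafana was installed from the RPM package, so its home path is /usr/share/grafana; the function name is hypothetical and the 12-character minimum is an arbitrary guard, not a Grafana requirement.

```shell
#!/usr/bin/env bash
# Usage: reset_grafana_admin_password '<new-password>'
reset_grafana_admin_password() {
  local new_password="$1"
  # Arbitrary local guard against trivially short passwords.
  if [[ ${#new_password} -lt 12 ]]; then
    echo "choose a password of at least 12 characters" >&2
    return 1
  fi
  # --homepath points grafana-cli at the assumed RPM install location.
  sudo grafana-cli --homepath /usr/share/grafana admin reset-admin-password "$new_password"
}
```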

Click Settings and then Add data source.

Figure 40: Add Data Source

Add the details for the kube-prometheus service exposed when we set up Prometheus on the OKE cluster (in this case http://10.0.20.4:9090).

Figure 41: OKE Data Source
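Two checks can be run from the Grafana instance to take the guesswork out of this step: query the Prometheus HTTP API across the peered VCNs, then, optionally, create the data source through Grafana's HTTP API instead of the form. A sketch, assuming curl is available on the image and that GRAFANA_AUTH holds admin:<password>; the function names are hypothetical and PROM_URL defaults to the address from this example.

```shell
#!/usr/bin/env bash
PROM_URL="${PROM_URL:-http://10.0.20.4:9090}"

# 1. Confirm Prometheus answers a simple query across the peered VCNs.
prometheus_reachable() {
  curl --silent --fail --max-time 10 "${PROM_URL}/api/v1/query?query=up" \
    | grep -q '"status":"success"'
}

# 2. Register Prometheus as a Grafana data source (default name as in this post).
add_prometheus_datasource() {
  local name="${1:-OKE Toronto}"
  curl --silent --fail --user "$GRAFANA_AUTH" \
    --header 'Content-Type: application/json' \
    --request POST http://localhost:3000/api/datasources \
    --data "{\"name\": \"${name}\", \"type\": \"prometheus\", \"url\": \"${PROM_URL}\", \"access\": \"proxy\"}"
}
```

If prometheus_reachable fails, re-check the LPG peering status, both route tables and the port 9090 security rules before looking at Grafana itself.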

Click "Save & Test". Return to the home screen and select Home in the top left.

Figure 42: New Dashboard

Figure 43: Import

Enter 10000 as the Grafana.com dashboard ID and click Load. This selects a sample Kubernetes monitoring dashboard from Grafana.com. You can browse and select other pre-built dashboards, but this is a very popular one.

Figure 44: Select Source

Then click Import

Figure 45: Import

Select the data source added above (OKE Toronto).

Figure 46: Select Data

You will now see a Kubernetes dashboard displaying Prometheus data from your Private OKE cluster.

Figure 47: Prometheus Dashboard

Multiple VCNs

Many installations will have multiple clusters (for example dev, test and QA), all of which need to be monitored. Multiple VCNs can be peered using Local Peering Gateways, but each LPG peering is a one-to-one relationship, so every VCN-to-VCN peering needs its own dedicated pair of LPGs.

The following diagram shows an example of how this could be set up:

Figure 48: Multiple VCN Peering
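With one LPG per OKE VCN, the Monitoring VCN's route table needs one rule per cluster CIDR, each targeting that cluster's own LPG. A sketch of a hypothetical helper that builds this rule list. The OKE VCNs must not overlap each other as well as the Monitoring VCN, so clusters that all kept the Quick Create default of 10.0.0.0/16 would need distinct CIDR blocks before they could be peered with the same Monitoring VCN.

```shell
#!/usr/bin/env bash
# Usage: hub_route_rules <igw-ocid> <oke-cidr>=<lpg-ocid> ...
hub_route_rules() {
  local igw_id="$1"
  shift
  # Internet traffic still leaves through the internet gateway.
  local rules=("{\"destination\": \"0.0.0.0/0\", \"destinationType\": \"CIDR_BLOCK\", \"networkEntityId\": \"${igw_id}\"}")
  local pair
  for pair in "$@"; do
    # Split "10.1.0.0/16=ocid1.localpeeringgateway..." at the first "=".
    rules+=("{\"destination\": \"${pair%%=*}\", \"destinationType\": \"CIDR_BLOCK\", \"networkEntityId\": \"${pair#*=}\"}")
  done
  local IFS=,
  printf '[%s]\n' "${rules[*]}"
}
```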

Multiple Regions

It is also possible to securely and privately peer VCNs across regions using remote peering (Remote Peering Connections on a Dynamic Routing Gateway). This extends the solution so that a single Grafana server can monitor multiple OKE clusters in multiple OCI regions. The concept is very similar to the solutions above, as shown in the following diagram.

Figure 49: Remote Peering

Please refer to the OCI documentation for more information on remote peering here.
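Remote peering works through a Remote Peering Connection (RPC) on each VCN's Dynamic Routing Gateway (DRG), with the DRG attached to its VCN and route rules pointing at it. A sketch with hypothetical helpers, assuming the DRGs and attachments already exist: create an RPC in each region with create_rpc, then run connect_rpc once from one side with the other RPC's OCID and region (for example ca-toronto-1).

```shell
#!/usr/bin/env bash
# create_rpc <compartment-ocid> <drg-ocid> <display-name> <region>
# Print the OCID of a new RPC on the given DRG.
create_rpc() {
  oci network remote-peering-connection create \
    --compartment-id "$1" \
    --drg-id "$2" \
    --display-name "$3" \
    --query 'data.id' --raw-output \
    --region "$4"
}

# connect_rpc <local-rpc-ocid> <peer-rpc-ocid> <peer-region> <local-region>
# Only one side of the pair needs to initiate the connection.
connect_rpc() {
  oci network remote-peering-connection connect \
    --remote-peering-connection-id "$1" \
    --peer-id "$2" \
    --peer-region-name "$3" \
    --region "$4"
}
```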

Multiple Node Pools

Where performance or security requirements are high, it is also possible to further isolate the Prometheus deployment by running it in a separate node pool. One way of doing this is to use label selectors. This keeps the overhead of running Prometheus away from application workloads, at the cost of additional compute and therefore higher running costs for your cluster.
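With the Prometheus Operator, one way to apply this is a nodeSelector on the Prometheus custom resource. A sketch, assuming the dedicated pool's node names are passed in and that the label workload=monitoring is free to use (both hypothetical, as is the function name); check the resource name with kubectl get prometheus -n monitoring. Labels set with kubectl are lost when nodes are replaced, so setting initial node labels on the node pool is a more durable option.

```shell
#!/usr/bin/env bash
# Usage: pin_prometheus_to_pool <prometheus-resource-name> <node> [node ...]
pin_prometheus_to_pool() {
  local prometheus_name="$1" node
  shift
  # Label each node of the dedicated pool.
  for node in "$@"; do
    kubectl label node "$node" workload=monitoring --overwrite
  done
  # The operator schedules Prometheus pods using the resource's nodeSelector.
  kubectl patch prometheus "$prometheus_name" --namespace monitoring --type merge \
    --patch '{"spec": {"nodeSelector": {"workload": "monitoring"}}}'
}
```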

Conclusion

At this point Grafana, Prometheus and its UI should be installed, configured and reachable through the external IP address. As explained above, the configuration can be adapted to multiple VCNs, in the same region or across regions, and to dedicated node pools. For additional details, review Oracle Cloud Infrastructure (OCI) Container Engine for Kubernetes (OKE) to learn more about how you can seamlessly monitor your OKE clusters alongside the rest of your infrastructure. Want to experience OKE for yourself? Sign up for an OCI account if you haven't done so yet.
