Cloud Native Monitoring and Visualization with Prometheus and Grafana
By Mickey Boxell
Application observability is critical throughout the lifecycle of a project and especially when you are about to enter production. In addition to logging and tracing, many organizations gain insight into their applications by means of time series metric collection. A time series is simply a series of data points indexed according to the time they took place. In this case, those data points are related to application state. We can use this data to determine whether or not software is operating as intended.
Agent-based metrics collection is the traditional standard for monitoring. In a Kubernetes environment, the dynamic nature of containers and schedulers can make this approach complicated and cumbersome. Some key requirements are to make sure no parts of the environment are overlooked, to avoid requiring additional effort as new containers are created, and to capture data from short-lived containers. This is where Prometheus comes in.
Prometheus is widely regarded as a leading open source solution for cloud native systems and service monitoring. At its core it is a time series database, but it also includes tools to scrape data from applications at regular intervals, a multidimensional data model, a flexible query language, and an accompanying alert manager. This is the toolkit that the Cloud Native Computing Foundation (CNCF) recommends for container-based infrastructure. It recently became the second project to graduate from the CNCF program.
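To make the multidimensional data model more concrete, here is a minimal Python sketch. It assumes only what is generally true of Prometheus: a time series is identified by a metric name plus a set of key/value labels, and holds timestamped samples. The SeriesKey and TinyStore names and the http_requests_total example metric are hypothetical, and the real storage engine is far more sophisticated than this toy.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SeriesKey:
    """Identity of one time series: a metric name plus its label set."""
    name: str
    labels: Tuple[Tuple[str, str], ...]  # sorted (label, value) pairs

    @staticmethod
    def of(name: str, **labels: str) -> "SeriesKey":
        # Sorting makes the key independent of keyword argument order.
        return SeriesKey(name, tuple(sorted(labels.items())))

    def __str__(self) -> str:
        inner = ",".join(f'{k}="{v}"' for k, v in self.labels)
        return f"{self.name}{{{inner}}}"


class TinyStore:
    """Toy in-memory store mapping each series key to (timestamp, value) samples."""

    def __init__(self) -> None:
        self._series: Dict[SeriesKey, List[Tuple[float, float]]] = {}

    def append(self, key: SeriesKey, timestamp: float, value: float) -> None:
        self._series.setdefault(key, []).append((timestamp, value))

    def select(self, name: str, **matchers: str):
        """Return every series with this name whose labels include all matchers."""
        wanted = set(matchers.items())
        return {
            key: samples
            for key, samples in self._series.items()
            if key.name == name and wanted <= set(key.labels)
        }
```

Selecting by name alone returns every label combination, while adding a matcher such as method="GET" narrows the result to a single dimension. That is the idea behind label-based querying.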
Prometheus is well suited to the metric collection challenges of a Kubernetes environment. The Prometheus server uses service discovery or static configuration files to find scrape targets in your environment, which addresses the need to capture data from containers that are added or removed as the environment scales up and down. Prometheus itself can be scaled through the use of a "global Prometheus" server that federates data from multiple Prometheus deployments.
Prometheus scrapes and stores application data using a pull model: it issues HTTP GET requests to each target. Client libraries are available to instrument applications that were not originally designed to expose time series data. Prometheus can capture short-lived data, such as data produced by batch jobs, through the use of a push gateway. It also uses exporters to gather data from third-party systems that cannot easily expose Prometheus metrics directly.
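As a rough illustration of the pull model from the application's side, the sketch below serves a /metrics endpoint in the Prometheus text exposition format using only the Python standard library. The metric name demo_http_requests_total, the REQUEST_COUNTS dictionary, and port 8000 are assumptions made for this example. In practice you would more likely use one of the official client libraries than format the output by hand.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters, keyed by (method, status) label values.
REQUEST_COUNTS = {("GET", "200"): 0, ("POST", "500"): 0}


def render_metrics(counts):
    """Render counters as Prometheus text exposition format."""
    lines = [
        "# HELP demo_http_requests_total Total HTTP requests handled.",
        "# TYPE demo_http_requests_total counter",
    ]
    for (method, status), value in sorted(counts.items()):
        lines.append(
            f'demo_http_requests_total{{method="{method}",status="{status}"}} {value}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers GET /metrics so that a Prometheus server can scrape it."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving metrics on the given port (8000 is an assumption)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() and pointing a Prometheus scrape target at the chosen port would let the server collect these counters on each scrape interval. The application never needs to know where Prometheus is running.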
Making Use of Your Metrics
How do you make use of your collected metrics? Prometheus offers two great built-in options:
Web UI is used to graph expressions of your collected data.
Alert manager is used to trigger an alert if some condition is found to be true. It routes alerts from the Prometheus server to many standard alerting tools, such as email and PagerDuty.
You can set alerts on node usage metrics collected by Prometheus, such as a high memory or CPU threshold, to assess when to scale your environment.
Both the Web UI and alerting rely on PromQL, the Prometheus expression language. The collected data can also be consumed by external systems using the HTTP API.
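For a sense of how an external system might consume that data, here is a small Python sketch against the HTTP API's instant query endpoint, /api/v1/query. The base URL http://localhost:9090 and the helper function names are assumptions for illustration. The response handling covers only the instant vector result type.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build the URL for an instant PromQL query against /api/v1/query."""
    params = urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def parse_instant_vector(payload):
    """Extract (labels, value) pairs from an instant-vector JSON response body."""
    doc = json.loads(payload)
    if doc.get("status") != "success":
        raise ValueError(doc.get("error", "query failed"))
    data = doc["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected a vector result, got " + data["resultType"])
    # Each sample has the form [unix_timestamp, "value-as-string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


# Example: the built-in "up" metric reports 1 for each target that was scraped
# successfully. The server address here is an assumption.
url = instant_query_url("http://localhost:9090", "up")
```

Fetching that URL, for example with urllib.request.urlopen, and passing the response body to parse_instant_vector would yield one entry per matching series. Grafana and other clients talk to Prometheus in broadly the same way.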
Grafana, an open source visualization tool, is a recommended addition to your monitoring solution. It is often used to augment Prometheus and other monitoring tools. Its purpose is to turn collected data into attractive and useful dashboards. Grafana queries Prometheus using PromQL, the same expression language used by the Web UI.
Using Grafana helps address the complexity of PromQL. Starting with Grafana's sample dashboards simplifies the data analysis process and lets business users create insightful dashboards of their own.
Below is a simplified architecture containing the key components present in a Prometheus and Grafana environment.
For help creating a Kubernetes cluster with Oracle Container Engine for Kubernetes, follow this guide. Prometheus can be easily installed on your Kubernetes cluster by means of Helm, a package manager for Kubernetes. Rather than having to install each component from scratch, you can use a Helm chart to quickly deploy the services on your cluster. Check out this article for more information about how to get Helm set up on your cluster.
A Helm chart containing both Prometheus and Grafana is available from CoreOS: "coreos/kube-prometheus". Running helm install followed by the chart name will install it in your environment.
Once the installation is complete you will be able to navigate to the Prometheus Web UI:
And also the Grafana dashboard:
Prometheus is a valuable monitoring solution for your Kubernetes environment. When augmented with Grafana it becomes an accessible tool to help developers and business users inspect applications and infrastructure throughout their lifecycles. You may want to take the additional step of deploying a sample application to test out Prometheus and Grafana. Here is a guide for deploying a sample application on OKE. For more details about the deployment process check out this solution. Our guide also includes information about how to create a basic query and chart to gain insight into your application.