Capturing logs in Kubernetes
Do you ever wonder how to capture logs for a container-native solution running on Kubernetes? Containers are frequently created and deleted, and they sometimes crash; pods fail; nodes die. All of this makes it a challenge to preserve log data for later analysis. Application and system logs are critical for diagnosing and addressing problems that affect the health of your cluster, but there is a good chance you will run into hairy problems caused by the dynamic nature of containers and schedulers.
The simplest option is to avoid logging altogether. However, this comes with an obvious cost: you will have little to no understanding of what is going on in your cluster. If you need that visibility, as almost everyone deploying cloud native solutions does, read on.
So how do we do this? When you use Kubernetes as your container orchestration platform, a couple of built-in options are available for grabbing logs: the docker logs command and Kubernetes' own logging (kubectl logs). These are good tools for small-scale environments such as development. However, they are difficult to scale, because you have to query each pod or container individually to view its logs. They also do not store logs anywhere independent of ephemeral nodes, pods, or containers.
Another option is to pay ($$$) for an enterprise tool. There are many paid services available, but most offer more features than typical users need, and the cost can outweigh the benefits.
What about a choice that solves the problem without breaking the bank, something that adds instrumentation on top of the built-in options? The option many people turn to is the EFK stack, a scalable, open source solution that captures and aggregates logs and then visualizes them to provide actionable business insights.
What is EFK?
The EFK stack is composed of Elasticsearch, FluentD, and Kibana. It is similar to the ELK stack, but it uses FluentD instead of Logstash as its log collector. I have chosen EFK rather than ELK because FluentD is a Cloud Native Computing Foundation project that is simple to configure and has a smaller memory footprint than Logstash.
Together, these tools form a centralized, scalable, flexible, and easy-to-use log collection system for IT environments. FluentD captures the logs from each microservice and forwards them to Elasticsearch, which preserves them after the containers and pods that produced them are gone. FluentD is deployed as a DaemonSet, so a FluentD pod is provisioned on every node in the cluster (including nodes added later). This makes the solution easy to deploy at scale. Kibana is then used to visualize the aggregated logs.
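To make the FluentD-to-Elasticsearch hand-off concrete, here is a minimal Python sketch of what a collector does with each log line. It makes a few assumptions. The node uses Docker's json-file logging driver, which writes one JSON object per line with log, stream, and time fields. The cluster runs Elasticsearch 7.x. The index names mimic the daily logstash-YYYY.MM.DD naming that FluentD's Elasticsearch output produces when logstash_format is enabled. The function to_bulk_payload and the pod_name field are hypothetical and not part of FluentD. The real pipeline attaches much richer Kubernetes metadata and handles buffering and retries.

```python
import json


def to_bulk_payload(raw_lines, pod_name, index_prefix="logstash"):
    """Convert Docker json-file log lines into an Elasticsearch _bulk request body.

    Hypothetical helper for illustration; FluentD does this (and more) internally.
    """
    parts = []
    for raw in raw_lines:
        entry = json.loads(raw)  # {"log": "...\n", "stream": "stdout", "time": "...Z"}
        # Daily index from the UTC date in the timestamp, e.g. logstash-2019.06.01
        day = entry["time"][:10].replace("-", ".")
        action = {"index": {"_index": f"{index_prefix}-{day}"}}
        document = {
            "@timestamp": entry["time"],
            "stream": entry["stream"],
            "message": entry["log"].rstrip("\n"),
            # Stand-in for the Kubernetes metadata a real collector attaches
            "kubernetes": {"pod_name": pod_name},
        }
        # The bulk API takes newline-delimited JSON: an action line, then the document
        parts.append(json.dumps(action))
        parts.append(json.dumps(document))
    # The request body must end with a newline; an empty batch yields an empty body
    return "\n".join(parts) + "\n" if parts else ""


sample = json.dumps(
    {"log": "GET /health 200\n", "stream": "stdout", "time": "2019-06-01T12:00:00.123456789Z"}
)
print(to_bulk_payload([sample], pod_name="web-7d9f"), end="")
```

In the real stack a body like this would be POSTed to Elasticsearch's _bulk endpoint with the application/x-ndjson content type. Because the logs now live in Elasticsearch rather than on the node, they outlive the pod that wrote them.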
As an example, consider a basic three-node Kubernetes cluster running EFK, with a single microservice operating on each node.
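The DaemonSet behavior in that example can be modeled in a few lines. The sketch below is a deliberately simplified, hypothetical version of the reconciliation a DaemonSet controller performs. For every node that has no pod matching the label selector app=fluentd, it schedules one. Real controllers also honor node selectors, taints and tolerations, and owner references, and those are omitted here. The Pod and Node classes are illustrative stand-ins, not Kubernetes client types.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    labels: dict


@dataclass
class Node:
    name: str
    pods: list = field(default_factory=list)


def matches(selector, labels):
    """True if every key/value pair in the selector appears in the pod's labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def reconcile_daemonset(nodes, selector, pod_prefix="fluentd"):
    """Add one collector pod to each node that lacks one; return the nodes changed."""
    created_on = []
    for node in nodes:
        if not any(matches(selector, pod.labels) for pod in node.pods):
            node.pods.append(Pod(name=f"{pod_prefix}-{node.name}", labels=dict(selector)))
            created_on.append(node.name)
    return created_on


# Three nodes, each already running one microservice pod
cluster = [Node(f"node-{i}", [Pod(f"service-{i}", {"app": "service"})]) for i in range(1, 4)]
print(reconcile_daemonset(cluster, {"app": "fluentd"}))  # ['node-1', 'node-2', 'node-3']
```

Running the same reconciliation again changes nothing, because every node already has a collector. If a fourth node joins and you reconcile again, a collector is scheduled only on that node. That property is why a DaemonSet scales with the cluster without any manual steps.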
For help creating a Kubernetes cluster with Oracle Container Engine for Kubernetes (OKE), follow this guide. After your cluster is up and running, spinning up the EFK stack on Kubernetes is simple thanks to Helm, a package manager for Kubernetes. Rather than installing each component from scratch, you can use a Helm package, known as a chart, to quickly deploy the services on your cluster. Check out this article for more information about how to set up Helm on your cluster.
There are stable Helm charts available for each of the three components, but our quickstart documentation links to a custom chart that further simplifies deployment by combining Elasticsearch, FluentD, and Kibana into one chart.
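With a combined chart, deployment comes down to a single helm install command. The Python sketch below only builds and runs that command. It assumes Helm 3 (3.2 or later for --create-namespace). The chart path ./efk-chart, the release name efk, and the kibana.replicas value key are all placeholders, because the actual chart location and its values come from the quickstart documentation.

```python
import shlex
import subprocess


def helm_install_args(release, chart, namespace="logging", values=None):
    """Build the argument list for a Helm 3 install into its own namespace."""
    args = ["helm", "install", release, chart, "--namespace", namespace, "--create-namespace"]
    # Each override becomes --set key=value; sorted for a stable command line
    for key, value in sorted((values or {}).items()):
        args += ["--set", f"{key}={value}"]
    return args


def helm_install(release, chart, **kwargs):
    """Run helm install and raise CalledProcessError if it fails."""
    subprocess.run(helm_install_args(release, chart, **kwargs), check=True)


# Placeholder chart path and value key; substitute the ones from the quickstart docs
args = helm_install_args("efk", "./efk-chart", values={"kibana.replicas": 1})
print(shlex.join(args))
```

With the three separate stable charts you would repeat this once per component. The combined chart installs everything as one release, which you can later remove with helm uninstall.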
After a few minutes, once the cluster has finished launching everything, you will be able to navigate to the Kibana interface.
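Rather than refreshing the browser while the pods start, you can poll Kibana until it responds. This sketch assumes Kibana is reachable at http://localhost:5601, for example through kubectl port-forward to the Kibana service (the service name depends on the chart). It also assumes that Kibana's /api/status endpoint answers with HTTP 200 once the server is up. The function name and the timing defaults are illustrative.

```python
import http.client
import time
import urllib.request


def wait_for_kibana(base_url="http://localhost:5601", timeout=300.0, interval=5.0):
    """Poll Kibana's /api/status endpoint until it returns HTTP 200 or time runs out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/api/status", timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeout, or an HTTP error such as 503 while Kibana starts
            pass
        time.sleep(interval)
    return False
```

Calling wait_for_kibana() returns True as soon as Kibana answers. It returns False after five minutes without a successful response, at which point checking the pod status with kubectl is the next step.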
You may want to take the additional step of deploying a sample application to test out Kibana. Here is a guide for deploying a sample application on OKE. Our guide also includes information about how to create a basic index pattern to gain insight into your application.
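An index pattern is a wildcard that tells Kibana which Elasticsearch indices to search. The sketch below approximates that matching with Python's fnmatch module. It assumes FluentD writes daily indices named like logstash-2019.06.01. fnmatch also interprets ? and [...] as wildcards, so it is only a faithful stand-in for patterns built from plain text and *.

```python
from fnmatch import fnmatchcase


def indices_matching(pattern, indices):
    """Return the indices a *-style index pattern would cover (case-sensitive)."""
    return [name for name in indices if fnmatchcase(name, pattern)]


indices = ["logstash-2019.06.01", "logstash-2019.06.02", "logstash-2019.07.01", ".kibana"]
print(indices_matching("logstash-*", indices))
# ['logstash-2019.06.01', 'logstash-2019.06.02', 'logstash-2019.07.01']
print(indices_matching("logstash-2019.06.*", indices))
# ['logstash-2019.06.01', 'logstash-2019.06.02']
```

A broad pattern such as logstash-* covers every day FluentD has written. A narrower one limits searches to a specific period. Neither pattern picks up Kibana's own .kibana index.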
You should now have a running deployment of Elasticsearch to store logs, FluentD to format and forward them along, and Kibana to visualize and make sense of them. We hope this means you are one step closer to debugging all of those nasty production bugs!
For more details about the deployment process, check out this solution.