It's time to close the loop on Getting Started with Microservices. Part 1 of this series discussed some of the main advantages of microservices, and touched on some areas to consider when working with microservices. Part 2 considered how containers fit into the microservices story. Part 3 looked at some basic patterns and best practices for implementing microservices.
In today’s post, we will examine the critical aspects of using DevOps principles and practices with containerized microservices.
One of the biggest adoption drivers for microservices architectures is release velocity. To achieve high-velocity releases, it is crucial to have efficient DevOps processes in place. A microservices architecture has many moving parts that need to be considered when implementing those processes. In this blog post we will cover some of the patterns and processes for containerized microservices. Much of what is described here comes from experience with customers and our own lessons learned when building out cloud services for Oracle Cloud.
There has been some confusion around the acronyms CI and CD, not least because CD can stand for two different practices, so before going too far into the processes themselves let’s clarify what these acronyms mean. Table 1 describes each of them.
|Acronym|Description|
|---|---|
|CI – Continuous Integration|Frequent code check-ins to a shared repository with automated builds. This practice is used to catch potential integration issues early in the cycle.|
|CD – Continuous Delivery|Ensure that code is always production ready. The readiness is usually ensured by continuous integration (CI) and advanced testing such as load and/or stress testing. Once declared “ready”, the process of deployment into production systems is done manually by a DevOps person.|
|CD – Continuous Deployment|The same as continuous delivery except that the deployment into production is done automatically by the CI/CD system.|
The DevOps flow really starts with the developer. As a developer, you have several choices when working with containers. We won’t go into much detail here, as the right choice depends heavily on your development preferences and company policies, but it is useful to list the most common options. Table 2 provides an overview of these.
|Approach|Description|
|---|---|
|Native Development|Build and debug your code locally using your preferred IDE. As part of the CI process the code is compiled, and a container image is created and uploaded to a container registry. Note: It is really important to version your images as part of the image creation. You’ll learn more about versioning further on in this blog post.|
|Development Container|The development container image contains tools and components needed for the application and application diagnostics/testing. The source directory on your development machine is mapped as a volume into the container, and the application is tested by running the container in local container environments such as Docker for Mac. The development container itself is normally only used during development and will not be pushed to production.|
|Application Container|The service is tested by running in a production-like container. This allows the developer to test her code under realistic conditions. Usually you separate development and runtime containers. This setup requires significant additional planning and coordination, for example maintaining a production-like container and making sure that each team member is using the same version.|
Most of the teams within our microservices platform development team at Oracle are using the native development approach as it gives every team the flexibility to continue using the build tools and system of their choice. Some teams use Maven, others Gradle. Even though they use different tools and build systems, every team produces a container image as part of our continuous integration. Another aspect is that every service development team has its own CI/CD pipeline in order to achieve independent deployments. Figure 1 outlines the DevOps flow at a high level that starts with the native development approach.
Figure 1: Microservices DevOps Flow Using Native Development
These are the high level steps of what is happening at each stage of the pipeline.
The last aspect of such a DevOps pipeline is collecting telemetry and diagnostics data once the services have been deployed to a target environment. The data collection should contain not only the “usual” telemetry data, such as resource consumption, errors and warnings, but also end-to-end traces of requests and, in some cases, even usage of functionality. Collecting all this data not only gives you a better understanding of the service’s runtime behavior but also provides valuable insights into how your service is being used from an end user perspective, so that you can make product decisions based on data rather than opinions.
This describes a simplified flow, but it should give you a pretty good idea about the process from service development all the way to making the containerized service available in an environment.
In most companies releases happen at most once per month, and each release usually requires a lot of planning and release meetings. In a very agile world where companies need to react quickly to bugs, customer feedback or new market requirements, moving slowly is a clear disadvantage. Automation of deployments is a key pillar of microservices DevOps practices, but with a fully automated DevOps pipeline you need to be able to trust that your service meets your quality bar. For that reason, thorough testing is very important in each stage of the release pipeline. Testing is driven by your specific application requirements, such as SLAs, and it is far beyond the scope of this article to describe all the different ways of testing, test automation and test methodologies for microservices applications. That said, it is worth pointing out the two main approaches. The first is using a staged environment and the second is testing in production.
The following demonstrates one example of how the different testing phases can be implemented in staged environments. In fact, the example below describes at a higher level how release and testing are handled for one of our internal services that was designed using a microservices architecture. Figure 2 shows testing stages during the release pipeline.
Figure 2: Release Pipeline Testing Stages
In our case we do not automatically promote a release candidate into production; we basically only do continuous delivery. The main reason, besides business decisions, is that we need to interpret the performance and longevity results, which we cannot automate at this point.
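The difference between the two meanings of CD comes down to who makes the final promotion decision. The following is a minimal sketch of that gate, not a description of our actual pipeline; the `ReleaseCandidate` class, its field names and the stage names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ReleaseCandidate:
    version: str
    # Results of automated stages, e.g. {"unit": True, "integration": True}
    automated_results: dict = field(default_factory=dict)
    # Set by a person after reviewing performance and longevity results
    manually_approved: bool = False


def can_promote_to_production(rc: ReleaseCandidate,
                              continuous_deployment: bool = False) -> bool:
    """Return True if the release candidate may go to production."""
    automated_ok = bool(rc.automated_results) and all(rc.automated_results.values())
    if not automated_ok:
        return False
    # Continuous deployment: passing automated checks is sufficient.
    # Continuous delivery: a human sign-off is required as well.
    return continuous_deployment or rc.manually_approved


rc = ReleaseCandidate("1.1.0", {"unit": True, "integration": True, "load": True})
print(can_promote_to_production(rc))   # False: still waiting for manual approval
rc.manually_approved = True
print(can_promote_to_production(rc))   # True
```

With `continuous_deployment=True` the same candidate would be promoted as soon as the automated stages pass.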
If you are interested in more details on how we handle production grade container CI/CD and testing, check out the “Show and Tell Session on Handling Production Grade Containers” that was delivered at Oracle Code.
For many enterprises, automatically deploying into production is a very scary thought. That said, there are some testing techniques that have proven very successful with companies who have mastered continuous deployment.
Rolling updates are a capability that is usually provided at the platform level in one form or another. At a high level, when you deploy a new version of a service to your environment, the platform gradually replaces the previous version of the service with the new one while executing health checks. If the health checks fail, the deployment is rolled back. The advantage is that users experience very little impact while a deployment is in progress. Implementations of rolling updates differ from platform to platform. Kubernetes, for example, keeps a deployment history in the system, which allows you to easily roll back to a previous (working) version.
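To make the idea concrete, here is a simplified sketch of a rolling update with rollback. It is an illustration only and not how any particular platform implements the feature; the `rolling_update` function and the `is_healthy` probe are hypothetical names.

```python
from typing import Callable, List


def rolling_update(instances: List[str], new_version: str,
                   is_healthy: Callable[[int, str], bool]) -> List[str]:
    """Update instances one at a time; restore the old state if a check fails.

    `instances` holds the version each instance currently runs.
    `is_healthy(index, version)` is a hypothetical probe supplied by the caller.
    """
    previous = list(instances)       # snapshot used for the rollback
    current = list(instances)
    for i in range(len(current)):
        current[i] = new_version     # update a single instance
        if not is_healthy(i, new_version):
            return previous          # roll back everything updated so far
    return current


fleet = ["v1.0", "v1.0", "v1.0"]
print(rolling_update(fleet, "v1.1", lambda i, v: True))    # ['v1.1', 'v1.1', 'v1.1']
print(rolling_update(fleet, "v1.1", lambda i, v: i != 2))  # ['v1.0', 'v1.0', 'v1.0']
```

Because only one instance changes at a time, the remaining instances keep serving requests during the update.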
Canary testing is a technique used to deploy a new version of your microservice to a small percentage of your user base to ensure the proper behavior of the new version. For example, you may need to confirm that the integration with other microservices works and that resource consumption factors like CPU, memory, and disk usage are within an expected range.
For example, let’s say there is a new version of the frontend microservice to canary test. The new version of the service gets deployed into the environment and the traffic is split as shown in Figure 3, with 99% of initial traffic going to the current release (v1.0), while 1% of traffic would go to the new release (v1.1). If the new release meets your requirements, then you can increase the percent of traffic incrementally until 100% of traffic is using the new release.
If at any point during the rollout the new release starts failing, you can easily toggle all traffic back to v1.0 without needing to do a rollback and redeployment.
Figure 3: Canary Deployment
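The traffic split described above can be sketched as a weighted routing decision plus a rule for adjusting the weight. This is a minimal illustration under assumed names (`choose_version`, `next_canary_percent`) and an assumed step size of 10 percentage points; in practice the split is usually configured in the platform or a service mesh rather than written by hand.

```python
import random


def choose_version(canary_percent: float, rng: random.Random) -> str:
    """Route one request: `canary_percent` of traffic goes to v1.1, the rest to v1.0."""
    return "v1.1" if rng.random() * 100 < canary_percent else "v1.0"


def next_canary_percent(current: float, canary_healthy: bool,
                        step: float = 10.0) -> float:
    """Grow the canary share while it is healthy; otherwise send all traffic to v1.0."""
    if not canary_healthy:
        return 0.0
    return min(100.0, current + step)


rng = random.Random(42)
sample = [choose_version(1.0, rng) for _ in range(10_000)]
print(sample.count("v1.1"))  # roughly 1% of the 10,000 simulated requests
```

Setting the share back to 0 is what makes the "toggle" cheap: v1.0 is still deployed, so no rollback or redeployment is involved.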
Blue-green deployment is conceptually very similar to canary deployment, except that this approach assumes that you have two production environments, with one of them serving 100% of the traffic. As you prepare a new release of your service, let’s say you do your final stage of testing in the blue environment. Once your tests and telemetry show that your new version is working in the blue environment, you have a router or gateway direct all the traffic to the blue environment, and the green one becomes idle. Blue-green deployments are a very powerful way to roll out a new version, but they have certain drawbacks. For example, they require additional infrastructure. Modern platforms such as Kubernetes allow you to implement blue-green deployments without the need for additional clusters, but you may also want to consider their out-of-the-box capabilities for testing and deploying to production, such as rolling updates and canary releases.
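Conceptually, the router or gateway only needs to remember which environment is live. Here is a minimal sketch of that switch; the `BlueGreenRouter` class and its method names are hypothetical and stand in for whatever router or gateway you use.

```python
class BlueGreenRouter:
    """Sends all traffic to exactly one of two environments."""

    def __init__(self, live: str = "green"):
        self.environments = {"blue": None, "green": None}  # env name -> version
        self.live = live

    @property
    def idle(self) -> str:
        return "blue" if self.live == "green" else "green"

    def deploy_to_idle(self, version: str) -> None:
        """Deploy the new release to the environment that serves no traffic."""
        self.environments[self.idle] = version

    def switch(self, idle_is_verified: bool) -> str:
        """Make the idle environment live, but only after it has been verified."""
        if idle_is_verified:
            self.live = self.idle
        return self.live


router = BlueGreenRouter(live="green")
router.environments["green"] = "v1.0"
router.deploy_to_idle("v1.1")                 # final testing happens in blue
print(router.switch(idle_is_verified=True))   # blue
print(router.idle)                            # green
```

Calling `switch` again after a problem is detected sends traffic back to the previous environment, which still runs the old version.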
While A/B testing can be combined with either canary or blue-green deployments, it is a very different thing. A/B testing really targets testing the usage behavior of a service or feature and is typically used to validate a hypothesis or to measure two versions of a service or feature and how they stack up against each other in terms of performance, discoverability and usability. A/B testing often leverages feature flags (feature toggles), which allow you to dynamically turn features on and off.
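A common way to combine feature flags with A/B testing is to assign each user to a variant deterministically, so the same user always sees the same version. The sketch below shows one possible approach based on hashing; the function names and the `new-checkout` experiment are made up for illustration.

```python
import hashlib


def ab_variant(user_id: str, experiment: str, percent_b: int = 50) -> str:
    """Deterministically assign a user to variant 'A' or 'B' for one experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100          # stable bucket in the range 0..99
    return "B" if bucket < percent_b else "A"


def new_checkout_enabled(user_id: str) -> bool:
    """Feature flag: the hypothetical 'new-checkout' feature is on only for variant B."""
    return ab_variant(user_id, "new-checkout") == "B"


# The same user always lands in the same variant for a given experiment.
print(ab_variant("user-123", "new-checkout") == ab_variant("user-123", "new-checkout"))  # True
```

Including the experiment name in the hash keeps assignments independent across experiments, so a user in variant B of one test is not automatically in variant B of another.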
As mentioned before, the good news is that many container native platforms such as Kubernetes offer support for testing in production out of the box. If the out-of-the-box functionality does not meet your needs, you can also consider using a service mesh such as Istio or Linkerd. We’ll discuss service meshes in more detail in our upcoming blog post about a modern API/service centric approach for building microservices.
Lessons Learned and Recommendations
To wrap up this blog post we look at some recommendations resulting from lessons learned when building microservices internally as well as during interactions with customers.
Container versioning can be accomplished by adding tags to an image. By default, when you create a container image using just “docker build”, no versioning is applied and Docker appends the tag “:latest”. This can be very confusing because ‘latest’ does not mean the most recently built image; it simply refers to the most recent image created without an explicitly defined tag. For that and other versioning benefits you should always use proper container image versioning, e.g. fe-commerce:1.0.0-b21, by adding tags to your image. See the Docker documentation for more information.
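As an illustration, a CI job could compose the tag from the release version and the build number. The following sketch assumes the `name:MAJOR.MINOR.PATCH-bBUILD` scheme suggested by the example above; the helper names are hypothetical, and the only Docker feature relied on is the `-t` option of `docker build`.

```python
import re
from typing import List

SEMVER = re.compile(r"^\d+\.\d+\.\d+$")


def image_reference(name: str, version: str, build_number: int) -> str:
    """Compose a reference such as 'fe-commerce:1.0.0-b21'."""
    if not SEMVER.match(version):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    return f"{name}:{version}-b{build_number}"


def docker_build_command(reference: str, context: str = ".") -> List[str]:
    """Argument list for 'docker build' with an explicit tag instead of ':latest'."""
    return ["docker", "build", "-t", reference, context]


ref = image_reference("fe-commerce", "1.0.0", 21)
print(ref)                                  # fe-commerce:1.0.0-b21
print(" ".join(docker_build_command(ref)))  # docker build -t fe-commerce:1.0.0-b21 .
```

Rejecting malformed versions early keeps accidental tags such as `latest` or `test` out of the registry.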
A common assumption is that containers are small, but that is not necessarily true. There are large container images out there reaching almost the size of small VMs. Image size matters in many scenarios; a very obvious one is a scale-out scenario. Assume your cluster runs at full capacity and you need to add another machine to the cluster (scale out) to serve the load service A is experiencing; in this case, the image needs to be downloaded from the registry before the orchestrator can launch a container for service A. Depending on the network latency and the size of the image, that may take a while. Hosting your container registry very close to the cluster that runs the services partially mitigates this, but as a rule you should also consider using small base images such as Oracle Linux 7.1 – slim.
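A back-of-the-envelope calculation shows why image size matters when scaling out. The sketch below is a rough estimate only; the image sizes and the 100 Mbit/s bandwidth are made-up numbers, and the function ignores layer caching, parallel layer downloads and decompression.

```python
def estimated_pull_seconds(image_mb: float, bandwidth_mbit_per_s: float,
                           latency_s: float = 0.0) -> float:
    """Rough estimate: fixed latency plus transfer time for the whole image."""
    if bandwidth_mbit_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return latency_s + (image_mb * 8) / bandwidth_mbit_per_s  # MB -> Mbit


# Hypothetical images pulled over an assumed 100 Mbit/s link
print(estimated_pull_seconds(1200, 100))  # 96.0
print(estimated_pull_seconds(120, 100))   # 9.6
```

In this example a tenfold smaller image shortens the wait before the new node can start the container from more than a minute and a half to under ten seconds.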
In many cases the orchestrator reports the status of a container as running while the actual service inside the container is still starting up. If timing at that level is important to you, you can configure a health check URL for the service so that you can definitively conclude that the container and the process inside it are up and running. It’s good practice for health checks to go beyond checking that the service is active and returning 200 OK. Health checks should be designed in such a way that they report the actual status of the service from a functionality point of view. If the service is not in a healthy state, the orchestrator can take the appropriate action, such as restarting the container.
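Here is a minimal sketch of a health check that reports functional status rather than simply returning 200 OK. The dependency names and the `health_report` function are hypothetical; in a real service the checks would, for instance, run a trivial database query or call a downstream service, and the result would be served from the health check URL.

```python
import json
from typing import Callable, Dict, Tuple


def health_report(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Run each dependency check and return an HTTP status code plus a JSON body."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False    # a crashing check counts as unhealthy
    healthy = all(results.values())
    body = json.dumps({"status": "UP" if healthy else "DOWN", "checks": results})
    return (200 if healthy else 503), body


status, body = health_report({
    "database": lambda: True,
    "inventory-service": lambda: False,   # simulated failing dependency
})
print(status)  # 503
print(body)
```

Returning 503 when a required dependency is unavailable gives the orchestrator a clear signal to act on, and the per-check results in the body help with diagnosis.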
Below are some lessons learned from testing.
In part 4 of this blog series, we looked at the fundamentals of DevOps with containerized microservices. This concludes the “Introduction to Microservices” blog series. I hope it is now clear that there is a lot developers need to know when jumping into a containerized microservices world. But there’s more: in an upcoming blog post we will share some observations on how we think developers should build not only containerized microservices, but also functions and events on the same platform in an API/service-first approach. We call it the grand unified theory of cloud-native development, so stay tuned!
As always, please provide feedback and comments as we are always seeking to improve.