Author: Lars Schubert, Cloud Enterprise Architecture
6 key takeaways about Infrastructure as Code (IaC):
Regardless of whether you are working in a multi-cloud, hybrid cloud, or build-your-own-private-cloud scenario, being able to provide solution stacks in a repeatable, reliable, and manageable way is key.
The promise of cloud capabilities such as flexibility, scalability, and speed can only be realized if organizations can adopt those capabilities for a business-critical solution stack as a whole (not as independent services). A solution stack may consist of edge routers, load balancers, container platforms, storage services, and databases, combined with their respective configurations and security policies.
Consoles and dashboards are nice for simple, generic tasks, but do not provide the capabilities needed to operate in such complex environments. This is why automation frameworks have been introduced to abstract from the vendor, product, and solution-specific aspects through extension points where the community or vendors can provide their respective adapter or plugin code.
In typical projects, the DevOps team needs to provide multiple instances of the same platform configuration for evaluation, development, testing, and production purposes.
Manual setup and configuration often lead to inconsistent configurations, naming conventions, and so on, and the results are hard to validate and maintain over the lifecycle. The infrastructure itself might also change over the lifecycle (e.g. evaluation in Public Cloud; development, test, and production in Cloud at Customer).
When the infrastructure changes, the way of provisioning usually changes as well, since cloud providers have different means of providing their resources. The platform setup might be performed repeatedly to evaluate different versions or configurations until a fit-for-purpose setup is found. In general, we see many situations where we want the same solution instantiated once more, but with a change to one part or component, or to the version of A, which in turn requires the adoption of B, and so on.
This is a situation where we usually consider an automation-first approach. Instead of performing manual tasks to achieve the desired state of a solution stack, we write code that describes either the expected end state or the sequence of tasks that need to be performed. This code is executed against the well-known APIs of resource providers. Whenever a specific setup requires different resources (e.g. compute shapes, storage attachments, platform configurations, infrastructure providers), this allows us to re-configure, or to destroy and re-create, the whole setup at the push of a button.
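To illustrate the "expected end state" style of description, the following is a minimal sketch in Python, not how Terraform or any specific tool is implemented. The `Resource` class, its `shape` field, and the resource names are hypothetical and chosen only for illustration. The `plan` function compares a desired state with an actual state and derives the actions needed to converge.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource description; not any real provider's schema."""
    name: str
    shape: str


def plan(desired: dict[str, Resource], actual: dict[str, Resource]) -> list[tuple[str, str]]:
    """Return the (action, resource name) pairs needed to move actual to desired."""
    actions = []
    for name, resource in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != resource:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("destroy", name))
    return actions


# Assumed example states: the database needs a larger shape, the load
# balancer is missing, and the cache is no longer wanted.
desired = {"lb": Resource("lb", "small"), "db": Resource("db", "large")}
actual = {"db": Resource("db", "medium"), "cache": Resource("cache", "small")}

print(plan(desired, actual))
# [('update', 'db'), ('create', 'lb'), ('destroy', 'cache')]
```

Because only the difference between the two states is acted on, changing a single attribute in the description leads to a single targeted action rather than a full manual rebuild.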
This approach is also referred to as Infrastructure as Code (IaC). It describes the target infrastructure (and, in some cases, the platform) through code fragments managed in well-known version control tools and platforms (such as Git, Bitbucket, and GitLab). That means that the resources and their respective configurations follow the same development, test, and release strategies as the business application code running on top of those resources.
For one of our projects, we have chosen a combination of Terraform, Ansible, and Python to automate the overall provisioning pipelines for infrastructure, platform solutions, and custom integrations (Figure 1).
This selection has proven to be well suited to the given scenario. Other scenarios might call for a different toolset, especially if know-how for it is already available and it meets the needs of the new initiative.
Terraform allows abstraction from underlying infrastructure providers through provider plugins.
Ansible is a well-known automation tool that manages remote machines (e.g. virtual machines) to install and configure additional software across a large number of machines of different types.
Python is used to control the orchestration between infrastructure (Terraform) and platform (Ansible) tasks. Furthermore, the organization was already quite familiar with using Python and Ansible from other engagements.
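As a rough sketch of what such an orchestration layer could look like, the Python snippet below chains the Terraform and Ansible command-line tools in a fixed order. It is an assumption about the general shape, not the project's actual code: the directory `infra/`, the inventory `inventory.ini`, and the playbook `platform.yml` are hypothetical names. The example runs in dry-run mode and only prints the commands; passing no `runner` would execute them through `subprocess.run`.

```python
import subprocess
from typing import Callable, List, Optional, Tuple

# One pipeline step: (command, working directory or None).
Step = Tuple[List[str], Optional[str]]


def build_pipeline(tf_dir: str, inventory: str, playbook: str) -> List[Step]:
    """Infrastructure first (Terraform), then platform configuration (Ansible)."""
    return [
        (["terraform", "init", "-input=false"], tf_dir),
        (["terraform", "apply", "-input=false", "-auto-approve"], tf_dir),
        (["ansible-playbook", "-i", inventory, playbook], None),
    ]


def run_pipeline(steps: List[Step], runner: Callable = subprocess.run) -> None:
    """Run the steps in order; with subprocess.run, check=True stops on the first failure."""
    for command, cwd in steps:
        runner(command, cwd=cwd, check=True)


def dry_run(command, cwd=None, check=True):
    """Stand-in runner that prints each command instead of executing it."""
    print(f"[dry-run] ({cwd or '.'}) {' '.join(command)}")


# Hypothetical paths, used for illustration only.
steps = build_pipeline("infra/", "inventory.ini", "platform.yml")
run_pipeline(steps, runner=dry_run)
```

Keeping the command list separate from the code that executes it makes the pipeline easy to review and to test without touching real infrastructure.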
Figure 1: Example Automation for Kubernetes Platform on Cloud at Customer
A key element for lifecycle tasks such as restarting services, resizing resources, or reconfiguring components is that automation tasks are designed and implemented to be idempotent. That means that, regardless of when and how often an automation task is executed, the resulting state is always the same.
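A minimal Python sketch of an idempotent task is shown below. The function `ensure_line`, the file name `service.conf`, and the setting `max_connections=200` are hypothetical and serve only as an illustration. The task checks the current state first and changes something only if the desired state has not been reached yet, so running it twice has the same effect as running it once.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file; return True only if a change was made."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already in the desired state, nothing to do
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "service.conf"
    first = ensure_line(config, "max_connections=200")   # creates the entry
    second = ensure_line(config, "max_connections=200")  # detects it, changes nothing
    content = config.read_text()

print(first, second)  # True False
print(repr(content))  # 'max_connections=200\n'
```

A non-idempotent variant that simply appended the line on every run would leave duplicate entries behind after a retry, which is the kind of drift idempotent tasks are meant to avoid.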
Following the principles of automation first and Infrastructure as Code, we have been able to evolve the customized platform solutions incrementally and to spin up new versions of a platform solution in hours rather than months. Innovations can be evaluated in parallel and do not affect others until they are in a well-known and approved state.
From an organizational perspective, we learned that peer code reviews in the DevOps team and continuous re-validation of the automation code are required to keep the quality and benefits of automation in place. They also ensure that the knowledge around the applied principles and their implementation is spread across multiple team members, so that no individual becomes the single source of knowledge.
Stay tuned for more Cloud Tech Insights from us!