Problems old and new
Traditionally, setting up and managing development environments for teams has been an overlooked cost, in part because the effort is absorbed into the overall cost of managing infrastructure. The amount of physical infrastructure available always capped the cost. This limitation has often led to teams, once they finally have the infrastructure capacity, never releasing it even if the developers don’t need it all the time.
The allocation of virtual machines (VMs) and subnets was in the hands of the system administrators, who could easily limit resource use. The controls on the networks and virtualization meant that developers couldn’t accidentally disrupt each other, and if necessary, administrators could restrict things further by only allowing VMs with approved configurations.
Cloud has changed that tradition. While a good thing, it hasn’t happened without bringing the following new challenges:
- Runaway costs, giving organizations and individuals shock bills
- The old habits of resource hoarding have potentially no boundaries. But the inverse thinking also occurs: If left running, each server only costs cents, euros, yen, and so on. Keep on top of those small expenditures, and the bigger financial position remains healthy.
- Infrastructure-as-code means that we can mess up infrastructure in seconds and cause IP collisions, changing network rules and creating problems for colleagues. Before, infrastructure teams handled these activities. Unpicking an infrastructure issue isn’t the same as hooking your integrated development environment’s debugger to your running code.
As a consultancy, builder of business solutions, and Oracle partner, Capgemini has a vested interest in maximizing the empowerment of the development team, while assuring our customers that we won’t accidentally use all their credits as we develop their solution. We don’t want to lose the opportunity that cloud offers by putting more of the infrastructure setup back in the hands of system admin teams just to get the basic controls that avoid the mentioned risks. We also don’t want to burden experts with spending their days clicking through the cloud UI, setting up and applying policies in response to request forms that developers fill in. This process is boring for them, slows things down for developers, and ultimately isn’t cost-efficient for the business.
Fortunately, Oracle Cloud Infrastructure (OCI) offers a broad choice of options and capabilities to avoid the pain. This blog shows how the UK Capgemini team is minimizing the work and maximizing individual empowerment and productivity with OCI. OCI has a rich API and observes good API practices by supplying software development kits (SDKs) that help accelerate consumer adoption and support multiple languages.
OCI and SDKs make things easy
Having the SDK makes developing logic that uses the APIs quick and easy. It takes care of tasks such as establishing connections, and it lets you use your development environment’s code quality checks, code completion, documentation, and reflection to consume the API successfully and quickly.
The UK Capgemini team has a standing principle of using Python for scripting and automation. While we could be polyglot, it increases the burden on the team, particularly graduate staff, to master multiple languages. Better to master one language and let others come more quickly after. The SDK supports Python, as well as Go, Java, TypeScript, JavaScript, .NET, and Ruby.
Our utility has been developed as several Python objects with various functions, revolving around create, read, update, and delete (CRUD) actions for compartments, Identity and Access Management (IAM) users, groups, quotas, budgets, and relationships between these entities. We add a layer of opinion on top of the SDK. But this layer is necessary because quotas and budgets are linked to compartments, users need to belong to groups, and we grant access controls to groups.
Oracle has provided plenty of documentation using the commonly adopted mechanisms for each language. With Python, you can see the SDK documentation in a consumable format using readthedocs.io, which all ties back to GitHub. The examples favor more basic scenarios. For instance, configuring a list of values for a predefined tag wasn’t demonstrated with an example at the time of writing. I occasionally found it helpful to look at the API documentation, which is written differently and can fill any gaps in understanding.
Use case
To illustrate the ease and elegance of a creation operation, let’s look at some of the code.
import oci

def create_group(groupname):
    global logger, config_props
    # Look the group up first so that we never create a duplicate.
    group_ocid = find(groupname, QRY_CONST.GROUP, "pre create group check")
    if group_ocid is None:
        try:
            # The details object carries every value that describes the new group.
            request = oci.identity.models.CreateGroupDetails()
            request.compartment_id = config_props[CONFIG_CONST.TENANCY]
            request.name = groupname
            request.description = app_description
            # The identity client handles the connection using the config dictionary.
            identity = oci.identity.IdentityClient(config_props)
            group = identity.create_group(request)
            group_ocid = group.data.id
            logger.info("Group Id: %s", group_ocid)
        except oci.exceptions.ServiceError as se:
            # Pass the exception as a logging argument so that it is actually formatted.
            logger.error("ERROR - Create Group: %s", se)
    return group_ocid
First, I call another of my own methods, called find, which builds on a powerful and labor-saving feature. If the group doesn’t exist, we create a details object (request) that takes all the group’s associated values. The details objects embody the data model for the entities.
Next, we create a client object. The SDK has several different client objects, one for each domain or major family of objects, such as identity, quotas, and budget. But identity covers a significant chunk of operations that we’re interested in. The client handles the connection. We only need to provide a dictionary, or hash map, of the specified properties, and the rest is done for us. With the client ready, we can call the appropriate method passing our request object.
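To make the configuration dictionary more concrete, here is a minimal sketch of what it might contain for API-key authentication. The key names are the ones the SDK expects in its configuration; every value is a placeholder, and `missing_keys` is a hypothetical helper rather than part of the SDK or of our utility.

```python
# Keys the SDK expects for API-key authentication.
REQUIRED_KEYS = ("user", "fingerprint", "key_file", "tenancy", "region")

# Placeholder values only; substitute your own OCIDs, fingerprint, and key path.
config_props = {
    "user": "ocid1.user.oc1..example",
    "fingerprint": "aa:bb:cc:dd",
    "key_file": "~/.oci/oci_api_key.pem",
    "tenancy": "ocid1.tenancy.oc1..example",
    "region": "uk-london-1",
}


def missing_keys(config):
    """Return the required keys that are absent or empty, in a stable order."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]
```

A check like this is only a convenience for failing early with a readable message. The SDK can also load and validate the same settings from a configuration file, which is usually the simpler route.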
Keep one behavior in mind: Oracle throttles our API calls if we make too many in rapid succession. The response includes an HTTP status code, which tells you whether the service accepted and queued the request (status 202) or rejected it because of throttling (status 429).
Throttling leads us to the next library feature. Some tasks can take OCI a bit of time, and we want to avoid being throttled or having the app unexpectedly time out a connection while it waits. Within the library, we can use several ways to make our utility wait until something is completed.
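One common way to cope with throttling is to retry with exponential backoff. The sketch below is a generic illustration, not how our utility or the SDK does it: `ThrottledError` is a hypothetical stand-in for an SDK error that exposes the HTTP status, and 429 (Too Many Requests) is the standard throttling status.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for an SDK error that carries an HTTP status."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def call_with_backoff(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation(), retrying with exponential backoff while it is throttled."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError as err:
            # Retry only throttling responses; re-raise anything else,
            # and give up once the attempts are exhausted.
            if err.status != 429 or attempt == max_attempts - 1:
                raise
            # Wait about 1s, 2s, 4s, ... plus a little jitter so that
            # parallel callers do not all retry at the same instant.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

The `sleep` parameter exists only so that the delay can be replaced in tests. The SDK also ships its own retry support in the `oci.retry` module, which is generally preferable to a hand-rolled loop.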
oci.wait_until(identity, identity.get_compartment(compartment_id), 'lifecycle_state', 'ACTIVE')
This call holds the application until the request completes, which in this case means until the compartment reaches a lifecycle state of ACTIVE. You can configure different rules for how the SDK waits or abandons waiting. Now, we can ensure that any processes that depend on a preceding one can work because we can wait on the preceding activity.
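Conceptually, a waiter of this kind is a polling loop with a time budget. The following sketch shows the idea under stated assumptions: `wait_for_state` and its parameters are hypothetical names chosen for illustration, and this is not the SDK's implementation.

```python
import time


def wait_for_state(fetch, target_state, max_wait_seconds=60, interval_seconds=2,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll fetch() until it returns target_state or the time budget runs out."""
    deadline = clock() + max_wait_seconds
    while True:
        state = fetch()
        if state == target_state:
            return state
        if clock() >= deadline:
            # Abandon waiting rather than block the utility forever.
            raise TimeoutError(f"still {state!r} after {max_wait_seconds}s")
        sleep(interval_seconds)
```

Here `fetch` would be any callable that returns the current lifecycle state of the resource. The interval and the overall budget are the two rules that decide how patiently the loop waits and when it gives up.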
Going back to the find function: while the client object can pull back lists of objects, the means to filter are limited, and you need to actively handle result pagination, behaviors that you’d expect for a web UI. But we often need to go from an object name, such as the name of a compartment or user, to the all-critical OCID, and in those cases we can usually provide enough information to target the specific object precisely. For that, OCI offers a far better approach than using the model objects: a structured query language that looks and feels like SQL, such as the following example:
query policy resources where displayname = 'myQuotaPolicy'
If you haven’t created more than one policy with the same name, you get the correct OCID, which allows you to interact with that specific entity through the SDK. The query expressions are consistent across the different services, although we’ve seen a couple of minor quirks, such as how naming of nested compartments works. Because of that consistency, we’ve taken the process to the next level by holding the queries in a data structure, so that only the necessary values need to be added, such as the following example:
query_dictionary = {
    QRY_CONST.USER: {QRY_CONST.ALL: "query user resources where inactiveStatus = 0",
                     QRY_CONST.NAMED: "query user resources where displayname = "},
    QRY_CONST.COMPARTMENT: {QRY_CONST.ALL: "query compartment resources where inactiveStatus = 0",
                            QRY_CONST.NAMED: "query compartment resources where displayname = "},
    QRY_CONST.GROUP: {QRY_CONST.ALL: "query group resources where inactiveStatus = 0",
                      QRY_CONST.NAMED: "query group resources where displayname = "},
}
As a result, the following code allows us to perform a search for any OCID:
query = query_dictionary[query_type][QRY_CONST.ALL]
if name is not None:
    # Narrow the search to one named resource; the name is wrapped in single quotes.
    query = query_dictionary[query_type][QRY_CONST.NAMED] + "'" + name + "'"
# print(query)
search_client = oci.resource_search.ResourceSearchClient(config_props)
structured_search = oci.resource_search.models.StructuredSearchDetails(
    query=query,
    type=QRY_CONST.QRY_TYPE,
    matching_context_type=oci.resource_search.models.SearchDetails.MATCHING_CONTEXT_TYPE_NONE)
found_resources = search_client.search_resources(structured_search)
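The step from search results to a single OCID deserves some care, because a name can match nothing or more than one resource. The sketch below is an illustration under assumptions: `NAMED_QUERIES`, `build_named_query`, and `single_ocid` are hypothetical names, the templates mirror the dictionary shown earlier, and the result items are assumed to expose the OCID as an `identifier` attribute, as the search result summaries do.

```python
from types import SimpleNamespace

# Hypothetical query templates, modeled on the dictionary shown earlier.
NAMED_QUERIES = {
    "user": "query user resources where displayname = ",
    "group": "query group resources where displayname = ",
}


def build_named_query(query_type, name):
    """Return a structured search query for one named resource."""
    # Assumption: this sketch rejects names containing a single quote
    # rather than guessing how the query language escapes them.
    if "'" in name:
        raise ValueError("names containing a single quote are not supported")
    return f"{NAMED_QUERIES[query_type]}'{name}'"


def single_ocid(items):
    """Return the OCID of the only match, None if no match; fail on ambiguity."""
    if not items:
        return None
    if len(items) > 1:
        raise LookupError(f"{len(items)} resources share that name")
    return items[0].identifier


# Stand-in for the items of a search response, used here for illustration only.
example_items = [SimpleNamespace(identifier="ocid1.group.oc1..example")]
example_ocid = single_ocid(example_items)
```

Raising on ambiguity, instead of silently taking the first match, avoids applying a change to the wrong resource when two share a display name.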
The query language can fully exploit the tagging mechanism. As a result, you can easily implement solutions, such as setting tags indicating which services can be started and stopped and when. When the tags are set, we simply run a query to get a list of services to start or stop.
Quotas
This topic brings me back to applying quotas with Python. The biggest investment of effort for our tool is gathering all the quota values for the services available, deciding which quotas to use, and deciding how to apply them to individuals or groups. Pulling all the quota options together requires the UI or the documentation, and the documentation is the easier source for retrieving the details. We haven’t yet found a means to retrieve the metadata in an automated manner. Having retrieved the information, we structure it into a JSON file, such as the following fragment:
"quotas": [
  {
    "description" : "Big Data",
    "comment": "",
    "deployment_grouping": "individual",
    "documentation_url" : "https://docs.oracle.com/en-us/iaas/big-data/doc/get-started-oracle-big-data-service.html#GUID-4C882FA2-F9FB-41B3-872E-3E7E411F63BB__SERVICE-QUOTAS",
    "family_name" : "big-data",
    "quota": [{"quota_name" : "vm-standard-2-1-ocpu-count", "value" : 1, "apply":true},
              {"quota_name" : "vm-standard-2-2-ocpu-count", "value" : 2, "apply":true},
              {"quota_name" : "vm-standard-2-4-ocpu-count", "value" : 0, "apply":true},
              {"quota_name" : "vm-standard-2-8-ocpu-count", "value" : 0, "apply":true},
              {"quota_name" : "vm-standard-2-16-ocpu-count", "value" : 0, "apply":true},
With all the quotas in a single file, we can quickly run through them and set the values. Each entry has a Boolean apply attribute that controls whether the quota is applied, so instead of deleting parts of the JSON and later needing to bring back the deleted records, we can switch an entry off and on again. Our app reads the JSON file, iterates through the structure to construct the quota statements, and applies them accordingly. The JSON file is fully detailed in the GitHub documentation.
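The read-iterate-construct step can be sketched as follows. This is a minimal illustration, not the code from our repository: `quota_statements` is a hypothetical function, the JSON is a cut-down but complete document in the shape of the fragment above, and `dev-sandbox` is a made-up compartment name. The statements follow the quota policy form `set <family> quota <name> to <value> in compartment <compartment>`.

```python
import json

# A cut-down, complete document in the shape of the fragment above.
QUOTA_JSON = """
{
  "quotas": [
    {
      "description": "Big Data",
      "family_name": "big-data",
      "quota": [
        {"quota_name": "vm-standard-2-1-ocpu-count", "value": 1, "apply": true},
        {"quota_name": "vm-standard-2-2-ocpu-count", "value": 2, "apply": true},
        {"quota_name": "vm-standard-2-4-ocpu-count", "value": 0, "apply": false}
      ]
    }
  ]
}
"""


def quota_statements(document, compartment_name):
    """Build one quota policy statement per entry whose apply flag is true."""
    statements = []
    for family in document["quotas"]:
        for quota in family["quota"]:
            if not quota.get("apply", False):
                continue  # flagged off: the entry stays in the file but is skipped
            statements.append(
                f"set {family['family_name']} quota {quota['quota_name']} "
                f"to {quota['value']} in compartment {compartment_name}"
            )
    return statements


# "dev-sandbox" is a hypothetical compartment name.
statements = quota_statements(json.loads(QUOTA_JSON), "dev-sandbox")
```

With the sample document, the third entry is skipped because its apply flag is false, so only two statements are produced. The resulting list is what would then be submitted as the statements of a quota policy.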
Can’t Oracle’s CLI do a lot of this? Why bother with development at all? The CLI is powerful and works on multiple platforms. But when you work directly with a CLI, you can run into challenges both immediate and subtle.
To analyze the results of one step to get to another, you parse stdout strings to re-extract the meaningful values from the message. Why invest that effort if the SDK can do it for you? The more subtle challenge is that the need to parse those responses and produce decision logic leads people to use the utilities available in their native shell. Shells differ across Windows, macOS, and the various flavors of Linux. As a result, your tool is naturally less portable. The Python approach is portable, and we believe you can run the script from an iOS device if you want.
Conclusion
If you’re doing configuration tasks that are laborious or need to be repeated many times, automate them. It can take a bit of effort to get going, but the automation pays off. With OCI, automating many of the basic tasks is a breeze when you have a feel for the SDK. Automation through the SDK is both more stable and more reliable than using robotic process automation (RPA), for example. The OCI UI continues to evolve and change, meaning that RPA scripts are at a greater risk of breaking. Using the CLI risks producing less portable solutions.
Python is pervasive, often deployed onto our laptops as part of a standard build and preinstalled in many Linux environments. In the Oracle Cloud Infrastructure shell, Git, Python, and the OCI SDKs are all present. So, running these tools only takes a pull from Git for the script and preconfigured settings. Plug in your credentials and go.
All the code is available on GitHub. Feel free to try it, extend it, and improve it. We only ask that you recognize the code origin and share any changes back.
