Private Clouds, the Evolutionary Approach
By mrbill on Sep 30, 2009
Continuing with the ramblings of my last entry, since I am up late with children and their dozens of excuses as to why they are not asleep...
Now that we have defined the "ilities" that we want from our private cloud efforts, we can examine each of them and look for obvious opportunities with high returns. Candidates include people costs, IT CAPEX, IT OPEX, energy costs, operational complexity, service levels, risk, and any other area that we can target and quantify. One major rule here is that when we pick a target and an approach, we must also have a "SMART" set of goals in place.
For the .21% of the readers who have never heard the SMART acronym before, it stands for Specific, Measurable, Attainable, Realistic, and Timely. In other words, for every action that we plan to take, or every improvement that we want to deploy, we must have a measurable set of criteria for success. It amazes me how many IT managers do not know the "average utilization of server systems in the data layer during peak shift". Yeah, that is pretty darn specific, but ask yourself, do you know what your company's utilization is during the prime workday cycles? Bingo. We need a baseline for whatever metrics we choose to measure success for each project and change to our IT operations.
Sidenote: If the answer to the previous question was "Yes", and the utilization is anywhere above 30% during workday peak shift hours, I am impressed.
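For anyone who wants to put a number on that baseline, here is a minimal sketch of the calculation. Everything in it is an assumption for illustration: the samples are made up, the peak shift is taken to be 09:00 to 17:00 on weekdays, and the function names are hypothetical. In practice the samples would come from whatever monitoring tool you already run.

```python
from datetime import datetime
from statistics import mean

# Assumed peak-shift window: 09:00-17:00, Monday through Friday. Use your own.
PEAK_START_HOUR = 9
PEAK_END_HOUR = 17

def is_peak_shift(ts):
    """True when the timestamp falls on a weekday inside the peak window."""
    return ts.weekday() < 5 and PEAK_START_HOUR <= ts.hour < PEAK_END_HOUR

def peak_shift_baseline(samples):
    """Average utilization (percent) over peak-shift samples, or None if there are none."""
    peak = [pct for ts, pct in samples if is_peak_shift(ts)]
    return mean(peak) if peak else None

# Hypothetical samples for one server: (timestamp, CPU utilization percent).
samples = [
    (datetime(2009, 9, 28, 3, 0), 4.0),    # Monday, off hours: ignored
    (datetime(2009, 9, 28, 10, 0), 22.0),  # Monday, peak shift
    (datetime(2009, 9, 28, 14, 0), 18.0),  # Monday, peak shift
    (datetime(2009, 9, 27, 11, 0), 2.0),   # Sunday: ignored
]

baseline = peak_shift_baseline(samples)
print(f"Peak-shift baseline: {baseline}%")
```

The point is less the arithmetic than the habit: once this number exists for each tier, every later project has something to be measured against.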
So where are the obvious targets? I have already hit on one of them: system utilization and idle processing cycles. Systems consume electricity and generate heat (those servers are actually very efficient space heaters), resulting in cooling and air circulation requirements, and odds are that a majority of the processing potential is not being used for processing.
Consolidation? Maybe. Capacity Planning? Definitely. Capacity Management? Absolutely! Consolidation is a valid target project, but is usually approached as a one-time, isolated event. Consolidation does not necessarily change the behavior that caused over-sizing to begin with, or help when workloads are seasonal or sporadic. These variable workloads most often result in systems that are sized for "peak load", with lots of idle cycles during off hours and off-days (and sometimes off-months).
The first step to a consolidation is capacity planning, including the key step of generating a baseline of capacity and consumption. If, instead of treating this as a one-time event, we start monitoring, reporting, and trending on capacity and consumption, we have now stepped into the realm of Capacity Management. We can watch business cycles, transactional trends, traffic patterns, and system loads, and project the processing needs in advance of growth and demands. What a concept.
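As a rough illustration of "trending and projecting", the sketch below fits a straight line to a series of monthly peak utilization figures and estimates how many months remain before the trend crosses a threshold. The data, the 80% threshold, and the function names are all assumptions, and a plain linear fit is the simplest possible stand-in for what a real capacity management tool does with seasonality and business cycles.

```python
def linear_trend(values):
    """Least-squares slope and intercept for values sampled at x = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def periods_until(values, threshold):
    """Periods after the last sample until the trend line reaches the threshold.

    Returns None when the trend is flat or falling, so it never gets there.
    """
    slope, intercept = linear_trend(values)
    if slope <= 0:
        return None
    x_cross = (threshold - intercept) / slope
    return max(0.0, x_cross - (len(values) - 1))

# Hypothetical monthly peak-shift utilization (percent) for one pool of servers.
monthly_peak_pct = [20.0, 22.0, 24.0, 26.0]

months_left = periods_until(monthly_peak_pct, threshold=80.0)
print(f"Months until 80% at the current trend: {months_left}")
```

Even a crude projection like this turns "we might need more capacity someday" into a date that can be planned and budgeted for.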
Now imagine a world where we could dynamically allocate CPU resources on-demand, juggle workloads between systems with little or no downtime, and use systems of differing capacity to service workloads with differing demands. Wow. That sounds like one of those "ilities" that we were promised with that "Cloud" concept. Dynamic resource allocation and resource sharing, possibly with multi-tenancy to maximize utilization of compute resources. Yep. Sure is. Ignoring the "Cloud" word, let's look at how we can implement this "Cloud-like capability" into our existing IT environment without bringing in a forklift to replace all of our systems and networks, and spending billions.
Breaking down those technology pieces necessary to execute against that plan, we need Capacity Management (TeamQuest, BMC, Tivoli, pick your tool that does capacity and service level management). The tool doesn't matter. The process, the knowledge generated, and the proactive view of the business matter. Caveat: Define your needs and goals *before* buying tools that you will never fully implement or utilize!
So now we know what our hour-by-hour, day-by-day needs are, and can recognize and trend consumption. We can even start to predict consumption and run some "what if" scenarios. The next step is dynamic capacity, which in this context includes Resource Sharing, Dynamic Allocation, Physical Abstraction (maybe), Automation (hopefully, to some degree), and Multi-Tenancy from the right-hand "Business Drivers" column in my last blahg entry. Sure, we can juggle and migrate these workloads and systems by hand, but the complexity and risk of moving those applications around is ridiculous. We need a layer of physical abstraction in order to move workloads around, and we need to stop thinking of "systems" as a box running an application.
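Once workloads are treated as movable units rather than boxes, deciding where they go starts to look like a packing problem. The sketch below is one simple way to think about it, not a description of any product: it uses a first-fit decreasing heuristic, the workload and host names are invented, capacity is in arbitrary units, and the 80% headroom limit is an assumption.

```python
def place_workloads(workloads, hosts, headroom=0.8):
    """First-fit decreasing: put each workload on the first host with room.

    workloads: dict of name -> peak demand (arbitrary capacity units)
    hosts: dict of name -> total capacity (same units)
    headroom: fraction of each host we allow ourselves to fill (assumed 80%)
    Returns (placement, unplaced): host -> list of workloads, and leftovers.
    """
    free = {host: capacity * headroom for host, capacity in hosts.items()}
    placement = {host: [] for host in hosts}
    unplaced = []
    # Place the largest demands first; small ones fill the remaining gaps.
    by_demand = sorted(workloads.items(), key=lambda kv: kv[1], reverse=True)
    for name, demand in by_demand:
        for host in free:
            if demand <= free[host]:
                free[host] -= demand
                placement[host].append(name)
                break
        else:
            unplaced.append(name)  # no host had room for this workload
    return placement, unplaced

# Hypothetical peak demands and host capacities.
workloads = {"payroll": 30, "web": 20, "reports": 10, "mail": 25}
hosts = {"big": 100, "small": 50}

placement, unplaced = place_workloads(workloads, hosts)
for host, names in placement.items():
    print(host, names)
print("unplaced:", unplaced)
```

Sizing by peak demand keeps the sketch conservative. Feeding it the hour-by-hour consumption data from the capacity management step is where the real gains from seasonal and sporadic workloads would show up.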
There are many ways to do this, so pick the solution and products that best fit your IT world. You can create "application containers", or standard operating environments for your applications, and juggle the "personalities" running in the physical machines. Not easy. Most apps will likely not move in easily. Still a good goal to reduce variance and complexity in your environment. In this case, not a quick hit, as you will end up touching and changing most of your applications.
The obvious answer (to me and 99.6% of the geeks reading this) is to employ virtualization to de-couple the application from the operating environment, and the operating environment from the physical hardware (and network, and storage). Solaris Containers, LDOMs, VMware, Xen, xVM software in OpenSolaris, Citrix, fast deployment and management tools: the options and combinations are all over the map. The deciding factors will be cost, capabilities, management tools (monitoring, reporting, and intelligence), and support of your operational and application needs. The right answer is very often a combination of several technology pieces, with a unifying strategy to accomplish the technical and business goals within the constraints of your business. There are many of us geeky types that can help to define the technology pieces to accomplish business goals. Defining those business goals, drivers, and constraints is the hard part, and must be done in IT, "the business", and across the corporate organization that will be impacted and serviced.
There, we have some significant pieces of the "private cloud" puzzle in place, and if the server systems were severely under-utilized, and we were able to move a significant number of them into our new "managed, dynamic capacity" environment, we should be able to realize power, cooling, and perhaps even license cost savings to balance the cost of implementation. One interesting note here, if I have "too many servers with too many idle cycles" in my datacenter, why should a vendor come in leading with a new rack full of new servers? Just wondering. Personally, I would prefer to invest in a strategy, develop a plan, identify my needs and the metrics that I would like improved, and then, maybe, invest in technology towards those goals.
Just the late night ramblings of an old IT guy.
Next entry will likely talk more about the metrics of "how much are we saving", and get back to those SMART goals.