The Fall and Rise of IT: Part 1

Here's a collection of charts, graphs, and images that provide insight into the abyss of the typical datacenter operation. It's scary out there when we apply the benchmarks used to measure utilization, efficiency, and contribution from other parts of the business.

But there is hope. For example, just this month Sun released a valuable and comprehensive (and free) BluePrint book called "Operations Management Capabilities Model". We've been working on this one for some time - so check it out. In addition, you can sign up (for free) with our SunTONE Program for self-assessment guides and self-remediation activities related to our ITIL-plus Certification program, which is based on ITIL but extends it. Thousands of companies are registered. We'll help if you'd like. Finally, the Service-Optimized DataCenter program will act as a Center of Excellence for putting these concepts into practice, along with innovative new technologies in virtualization, provisioning, automation, and optimization, and other best practices. As you read about the state of IT below, realize that there is an escape from the pit of mediocrity. Part 2 will explore the opportunity.

For now, for this post, I'll survey some of the problems that need fixing...

Let's assume that the prime directive for a datacenter is simply to deliver IT services that meet desired Service Level Objectives at a competitive cost point. There are all kinds of important functions that fall within those two large buckets [Service Level and Financial Mgmt], but that'll work for this discussion.

In my experience working with customers, there are two primary barriers that prevent a datacenter from being as successful as it might be in this mission. First, there is rampant unmanaged complexity. Second, most IT activities are reactive in nature... triggered by unanticipated events and often initiated by unsatisfied customer calls. The result: expensive services that can't meet expectations. Which is the exact opposite of what an IT shop should deliver!

Here are some related graphics (with comments following each graphic):

This illustrates the typical "silo" or "stovepipe" deployment strategy. A customer or business unit wants a new IT service developed and deployed. They might help pick their favorite piece parts, and IT builds and integrates a unique production environment for this application or service. There is often a related development and test stovepipe for this application, and maybe even a DR (disaster recovery) stovepipe at another site. That's up to four "n"-tier environments per app, with each app silo running different S/W stacks, different firmware, different patches, different middleware, and so on. Each is a science experiment and someone's pet project.
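The combinatorics are easy to sketch. The app names and counts below are hypothetical, but they show how per-app, per-environment stovepipes multiply into unique stacks that operations has to care for:

```python
# Hypothetical illustration: every application gets its own set of stovepipe
# environments, and each (app, environment) pair is a hand-built, one-off stack.
apps = ["erp", "crm", "billing"]                    # assumed application names
environments = ["production", "dev", "test", "dr"]  # up to four per app

# Each pair is a unique n-tier environment with its own S/W stack, firmware,
# patches, and middleware -- none shared, none identical.
stovepipes = [(app, env) for app in apps for env in environments]

print(len(stovepipes))  # → 12 (3 apps x 4 environments = 12 unique stacks)
```

The point of the sketch: the operational burden grows with the product of apps and environments, not with the number of services the business actually sees.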

Standish, Meta, Gartner, and others report that ~40% of all major IT initiatives that are funded and staffed are eventually canceled before they are ever delivered! And of those delivered, half never recover their costs. Overall, 80% of all major initiatives do not deliver to promise - they are canceled, late, over budget, or simply fail to meet expectations. Part of the reason (there are many) for this failure rate is the one-off stovepipe mentality. Other reasons include a lack of clear business alignment, requirements, and criteria for success.

This is an interesting quote from a systems vendor. While 200M IT workers seems absurd, it describes the impact of accelerating complexity and the obvious need to manage that process. We saw the way stovepipe deployment drives complexity. We're seeing increasing demand for services (meaning more stovepipes), each with increasing service level expectations (meaning more complex designs in each stovepipe), each with increasing rates of change (meaning lots of manual adjustments in each stovepipe), each with increasing numbers of (virtual) devices to manage, each built from an increasing selection of component choices. The net result is that each stovepipe looks nothing like the previous or next IT project. Every app lives in a one-off custom creation.

If all this complexity isn't bad enough, each of these silos averages less than 10% utilization. Think about that: say you commit $5 million to build out your own stovepipe for an ERP service. You will leave $4.5M on the floor running idle! That would be unacceptable in just about any other facet of your business. Taken together, high complexity (lots of people, unmet SLOs) and low utilization rates (more equipment, space, etc.) drive cost through the roof! If we could apply techniques to increase average utilization to even 40% (while providing fault and security isolation), we could potentially eliminate the need for 75% of the deployed equipment and related overhead (or at least delay further acquisitions, or find new ways to leverage the resources).
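The arithmetic above can be made concrete with a small sketch. The $5M figure and the 10% and 40% utilization levels are this post's illustrative numbers, not measured data:

```python
def idle_capital(capex, avg_utilization):
    """Capital effectively sitting idle at a given average utilization."""
    return capex * (1 - avg_utilization)

def equipment_freed(current_util, target_util):
    """Fraction of deployed equipment no longer needed if the same aggregate
    workload is consolidated from current_util up to target_util."""
    return 1 - current_util / target_util

print(idle_capital(5_000_000, 0.10))  # → 4500000.0 ($4.5M running idle)
print(equipment_freed(0.10, 0.40))    # → 0.75 (75% of the gear)
```

Note the simplifying assumption: consolidation preserves aggregate workload and utilization scales linearly with equipment count. Real consolidation has headroom and isolation constraints, but the order of magnitude holds.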

We've seen what complexity and low utilization do to cost... But the other IT mandate is to deliver reliable IT services. This graphic summarizes a few studies performed by IEEE, Oracle, and Sun on the root causes of service outages. In the past, ~60% of all outages were planned/scheduled, and 40% were the really bad kind - unplanned. Thankfully, new features like live OS upgrades, live patching, backups, and dynamic H/W reconfiguration are starting to dramatically reduce the need for scheduled outages. But we've still got to deal with the unplanned outages that always seem to happen at the worst times. Gartner explains that 80% of unplanned outages are due to unskilled and/or unmotivated people making mistakes or executing poorly documented and undisciplined processes. In theory, we can fix this with training and discipline. But since each stovepipe has its own set of unique operational requirements and processes, it is nearly impossible to implement consistent policies and procedures across operations.

So it isn't surprising, then, that Gartner has found that 84% of datacenters are operating in the basement in terms of Operational Maturity - either in Chaotic or Reactive mode.

Okay... enough. I know I didn't paint a very pretty picture. The good news is that most firms recognize these problems and are starting to work at simplifying and standardizing their operations. In Part 2, I'll provide some ideas on where to start and how to achieve high-return results.


Interesting graphs. Cost factors associated with highly customized datacenters seem very obvious to me, so I often wonder why I have such a tough time explaining all this to IT management (I'm a consultant myself). In fact I've done several projects at customer sites where I was asked to do symptomatic treatment of such complexity (this usually comes down to installing some monitoring tools like OpenView so IT slaves can get paged 24x7 when some component in the stovepipe has failed). IT management usually seems to stress "managing" the chaos, while in fact it should be eradicated and it's plain to see there's a lot of low hanging fruit here. But unfortunately IT management simply doesn't seem to "get it".

Posted by Kristof Van Damme on February 15, 2005 at 07:10 AM EST #

It is much easier to bolt on the next stovepipe and run around extinguishing fires than it is to transform the infrastructure into an integrated, coordinated, and efficient service delivery platform. It takes a lot of hard work and a long time to do it right. And in the meantime, you've still got the fires to deal with and a continuous stream of requests for services. But it is a worthy and necessary goal. The alternative is to accept high costs and unsatisfied customers. Ultimately this kind of operation will be outsourced to an outfit that can deliver to expectation at an aggressive cost point (see my blog on Coase's Law).

Posted by Dave Brillhart on February 17, 2005 at 01:15 AM EST #
