Wednesday Mar 18, 2009

A Busy Day for Clouds - DSC and Netbeans Too

The Data Center as we Know It is Dead. (Sorry George Gilder)

Officially -- welcome to cloud computing from Sun.

If you haven't seen the news, look at Sun's Cloud Announcement and check it out for yourself. Net-net -- Sun's Cloud business unit is not only building our own public cloud (at SuperNAP today) but also taking that technology to our service providers, other partners, and our direct customers (private clouds).

Also, my team, specifically Robert Holt, Mikael Lofstrand, and I, have been working on something called OpenSolaris Dynamic Service Containers (DSC) -- you can join the fun at This makes it easy to manage containers/zones and scale them up and down across physical nodes -- all via a simple registry.

Additionally, we've been working on some IDEs for the cloud -- look at John Kirby's page for some ideas here. This will eventually link to the DSC work above and our own cloud offering based around x86 virtualization (VMIs). Why is this important?

The deployment model is assembled in the IDE today, but the cloud needs to manage the workload. The "model" can't simply live in the IDE; it needs to live in the cloud as well.

Great work team!!

Wednesday Feb 18, 2009

Patterns Refactored for the Cloud

Over the last six years, we've discussed “patterns” with several of our large system partners. Everyone seems to come up with a hierarchy, usually at 3 or 4 levels. An example:





How does this apply today? This structure forms the model for how this complex state machine operates. At each level, different data is required to solve for the dependencies. At the high level, implementation is far removed. Today, IT folks manage “images” -- binary blobs that represent so many things that they are difficult to control and decompose. Advanced developers deploy using OSGi. This is the difference that needs to be explored.

Images are a deployment payload

Models are the structures defined by the system

APIs and “instruction sets” tell the system how to operate at run-time

Policies help define “what” happens when, and are in play even prior to run-time

Views are consistent – i.e. what a developer sees is different, but it is in the model such that an admin can see what's going on in the environment as necessary. An operator can “destroy”, which affects the developer “view”, should it be necessary.

Friday Feb 13, 2009

Cloud Serviceability and Architecture

Composite services and “clouds” are architectural in nature. We can no longer attempt to mediate system events at an element or “server” level. It must be broader and intelligently confer architectural context at nearly every level.

A Server Perspective

A trouble ticket is filed, possibly automated

Someone researches the problem and confers with the admins, developers, BU, etc.

The service limps along, or is down completely during this process

Eventually the troubled system is brought back on line

- services (applications) are reloaded, things are back up.


A Service Perspective

The event management system analyzes the outage

-- are other services functioning, and is the desired SLO (service level objective) still being met? If other services are functioning and the SLO is OK, then the server is placed in the “to replace” status pool within the POD

VM or other operational management (policy) system brings up additional workloads if required to replace capacity lost

Someone eventually replaces the system (quarterly?) when “replace” threshold limits are reached

-- new systems added to “spare”/unallocated pool
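The service-perspective steps above can be sketched as a small piece of pool bookkeeping. This is only an illustration under stated assumptions: the `Pod` class, its pool names, the `min_active` stand-in for "is the SLO still being met?", and the `replace_threshold` value are all hypothetical and do not come from any real management system.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Hypothetical bookkeeping for one POD; every name here is illustrative."""
    active: set = field(default_factory=set)      # servers carrying workload
    spare: set = field(default_factory=set)       # unallocated pool
    to_replace: set = field(default_factory=set)  # failed, awaiting hardware swap
    min_active: int = 2         # assumed proxy for "SLO still being met"
    replace_threshold: int = 3  # assumed limit that triggers a hardware visit

    def slo_met(self) -> bool:
        return len(self.active) >= self.min_active

    def handle_server_failure(self, server: str) -> list:
        """Process one failed server and return the actions taken, in order."""
        actions = []
        # The failed server is parked, not repaired on the spot.
        self.active.discard(server)
        self.to_replace.add(server)
        actions.append(f"{server} -> to_replace")

        # Policy step: pull from the spare pool only if capacity is needed.
        while not self.slo_met() and self.spare:
            replacement = min(self.spare)  # deterministic pick for the sketch
            self.spare.remove(replacement)
            self.active.add(replacement)
            actions.append(f"{replacement} -> active")

        # Only now does a human need to hear about it.
        if not self.slo_met():
            actions.append("escalate: SLO at risk and no spare capacity")
        if len(self.to_replace) >= self.replace_threshold:
            actions.append("schedule hardware replacement")
        return actions
```

Calling `handle_server_failure` on a pod that still meets its SLO produces nothing more than a pool move, which is the point: the single failure is recorded, and a person gets involved only when the threshold or the SLO says so.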

While simplified, this illustrates a next generation practice that most large scale system providers follow today – they don't care when a server goes down; in fact they (the humans) may not even know. It's trapped by a content delivery network (e.g. Akamai – therefore perhaps someone else's problem) or the “management” systems – a sort of architectural meta-cognition.

The difference here is like the classic “If a tree falls in the forest and no one is around to hear it, does it make a sound?” Is that tree architecturally significant? Maybe to a couple of squirrels, but it's very likely (??) that the forest survives and the animals find a new home (unless it was the last tree!!) Was the event architecturally significant? If not – wait until we hit that threshold.

The key to correctly applying architecture is abstraction. We will want to be able to specify a workload or process to run without specific implementation parameters so they can be consumed by the “cloud” that best meets the requirements. There are some key factors....

* Where's the data?

* Where does the data go once processed?

* What's the desired availability?

* What's the desired level of scale? Now? Tomorrow? One year from now?

* What are the security requirements?

* What are the key service run-time dependencies?

* Are the service components stateless or stateful at the node level?
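One way to picture an implementation-free workload description is as a plain record with one field per question above, matched against what each cloud advertises. The sketch below is hypothetical throughout: `WorkloadSpec`, `CloudOffer`, the field names, and the ordinal `security_level` scale are assumptions made for illustration, not an existing API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class WorkloadSpec:
    """What the workload needs, with no implementation parameters."""
    data_location: str       # where's the data?
    output_location: str     # where does the data go once processed?
    availability: float      # desired availability, e.g. 0.999
    max_nodes: int           # desired scale at the planning horizon
    security_level: int      # assumed ordinal scale, higher is stricter
    dependencies: frozenset  # key run-time dependencies
    stateful: bool           # stateful at the node level?


@dataclass(frozen=True)
class CloudOffer:
    """What a cloud advertises (hypothetical descriptor)."""
    name: str
    regions: frozenset
    availability: float
    max_nodes: int
    security_level: int
    provides: frozenset
    supports_stateful: bool


def satisfies(cloud: CloudOffer, spec: WorkloadSpec) -> bool:
    """True when the cloud covers every requirement in the spec."""
    return (
        spec.data_location in cloud.regions
        and spec.output_location in cloud.regions
        and cloud.availability >= spec.availability
        and cloud.max_nodes >= spec.max_nodes
        and cloud.security_level >= spec.security_level
        and spec.dependencies <= cloud.provides
        and (cloud.supports_stateful or not spec.stateful)
    )


def choose_cloud(spec: WorkloadSpec, clouds: Iterable[CloudOffer]) -> Optional[CloudOffer]:
    """Return the first cloud, in the caller's preference order, that fits."""
    return next((c for c in clouds if satisfies(c, spec)), None)
```

Because the spec never names a hypervisor, image format, or vendor, the same record can be offered to several clouds, and the caller's ordering of `clouds` carries whatever preference (cost, locality) applies.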

There are probably others, but they are increasingly irrelevant in an abstracted discussion. Any compute infrastructure that supports cloud computing should be able to deal with these issues today, but what makes them inter-operate, and how does a user decide which cloud to use? What if the user is the system of clouds?

Thursday Jan 08, 2009

Developing on Clouds

So it's been a while -- I hope everyone has had a restful holiday season and you're all ready to get back into the thick of things. :) By now you've seen our announcement on the Q-Layer acquisition. Great news, and it validates and hopefully accelerates some of the work we are doing on the OpenDI side -- now hosted at

I've been doing some thinking lately on the enterprise, private cloud space. I believe there is some level of tension between the cloud as it exists today and how enterprise IT exists and operates. I think this is a good thing -- perhaps it helps to move along the utility computing models (non-project based silos, etc.) that we've tried to recommend in the past. But I do see a serious change in the developer use case and how it plays out with an IT shop.

In the diagram, there are two processes illustrated. The top shows the developer as free to pick and choose deployment technologies and make design/technical decisions; the second is a more constrained enterprise view that helps illustrate my point.

IT shops need and want some level of control -- they have to -- they are often the folks who are goaled, fired, or replaced based on their ability to meet SLOs (service level objectives). The cloud needs to take this into account and eventually help guide the developer towards the path of getting their apps 1) deployed, 2) monitored and instrumented, 3) scaled appropriately, and 4) managed.

There's a handoff point here that's important. The role of the "operator" in a large existing shop is going to change slowly. The cloud needs to provide the views and the tools into this to ensure the business operates at the level they expect. How do we do this?

Take a look at the whitepaper at OpenDI for some more thoughts on the "model" driven approach.

Tuesday Nov 11, 2008

Complexity and Building Enterprise Clouds

I've been doing some thinking lately around the cloud model and how enterprises might adopt it. Enterprises are challenged with a conflict -- between giving their developers control and choices, and maintaining operational control. Case in point -- the ownership of SLAs is often with the operations/administration org -- not the developer. The developer in many cases is hoping that most of the "systemic qualities" will appear within the platform and not necessarily require lots of development time. An interesting example of improvement in this space is the Shoal project around GlassFish.

One of my employees is working on some modeling projects -- trying to model the data center "as is" vs deriving the model from a "perfect" state where choices are somewhat removed from the scenario. I mean the data center is architected in specific ways that allow or disallow some functionality -- you see this in very large sites, like Google and Yahoo. They have several major architecture patterns, and many or most services conform to those patterns. You want to deploy? You conform.

This battle is often uphill. The last 20% of a solution is the area that you spend the most time on, convincing others of the design or that "good enough" will trump perfect. But I think we need to get over that -- we can't afford not to.

Graffiti is a good example. Handwriting recognition was very hard; companies failed trying to figure this out. Did they constrain the problem (thus the solution) enough to progress to something that works without a whole bunch of "change?" Jeff got it right -- fix the few letters that cause the problem (i vs L) and constrain the problem. He found a solution. We've gotten a bit more flexible today, but it's still the core thinking in the industry.

What problems can we solve today if we limit the choices, give away a little control, and are able to take technology to the next level?

UPDATE: Forgot the Jason Corollary: "Impossible" exists only because we haven't stated or re-factored the problem so it is "possible."


Thoughts from Jason Carolan -- Distinguished Engineer @ Sun and Global Systems Engineering Director

