Sunday May 07, 2006

The True Value of Information Ownership

An article published on 5/5 seems to be a harbinger of something either interesting or sinister happening in the Internet/2 realm. Here we have Google, which makes the overwhelming majority (98+%) of their revenue through advertising, moving to take on a large role in information meta-indexing and even data management. And then we have Microsoft, who derives roughly 95% of their revenue from the sale of “old-economy” shrink-wrapped software, also moving to use data management & storage as a mechanism to build stickiness into their service economy (and, I might add, to execute a defensive strategy against Google and the Free/Open-Source Software movements).

If you look at the cost of operating a data center, the expenses tend to fall into a few categories:

  1. Facilities expenses: dominated by power and bandwidth
  2. Operational expense: dominated by labor
  3. Capital expense: increasingly dominated by data management systems but also by computers
  4. Software expense (sometimes capitalized): dominated, in typical order, by data management systems, vertical integration software, and infrastructure middleware

There are different strategies employed by different kinds of hosting, hardware, and infrastructure software vendors to minimize these costs, either within a category or by integrating capabilities and using systemic engineering approaches to minimize total cost. Google, for example, runs their own data centers, which allows them to maximize the interaction between compute/storage density and facility costs (they control all the variables in 1/2/3 above - the typical IT charges), as well as rolling their own OS image, middleware, and even file systems (GFS) to manage 2/3/4. Microsoft, on the other hand, with their core business in the hyper-integration of 4 to reduce costs, needs to understand the impact of taking on 1, 2, and 3 on their existing channel, and determine whether they can afford the focused, feature-reduced / reliability-enhanced engineering around their integrated software stack needed to achieve 1, 2, and 3.
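The within-category vs. systemic trade-off can be sketched numerically. Below is a toy model (all cost functions and figures are invented for illustration): rack density lowers capital cost per unit of compute but raises facility cost, so optimizing capital alone picks a different design point than optimizing the total.

```python
# Hypothetical illustration of per-category vs. systemic cost optimization.
# The cost functions and numbers are invented; the point is that coupled
# variables (density drives both capital and power/cooling costs) reward
# joint, systemic optimization over per-category optimization.

def facility_cost(density):
    # Denser racks need disproportionately more power and cooling.
    return 100 + 4 * density ** 2

def capital_cost(density):
    # Denser racks amortize chassis and floor space better.
    return 1000 / density

def total_cost(density):
    return facility_cost(density) + capital_cost(density)

densities = [d / 10 for d in range(5, 101)]  # candidate design points, 0.5..10.0

naive = min(densities, key=capital_cost)      # optimizes one category: max density
systemic = min(densities, key=total_cost)     # balances both categories
```

Under these made-up curves, the systemic optimum sits well below maximum density, and its total cost is markedly lower than the capital-only choice.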

What is interesting to me is the as-yet-undefined intersection of business data stores and online shared data sets (see Microsoft's Map the World in Real-Time initiative). This is to say, the information economy is built around the premise that those who have the best information (or who process information most accurately and efficiently) will command more value than other category players. So if we start having these data centers (note the term data centers is very accurate here), built through advertising $'s, at what point do the “owners” of this data, or potentially more importantly of the data's relationships, begin to control a macro-economy, able to exploit this ownership to an incredible, even monopolistic, degree?

I have to say, and it should come as no surprise, that the notion of utility data centers which aggregate demand against a shared physical plant does offer the possibility of democratized computing. Utility data centers effectively level the playing field between those who have capital and those who don't: by creating an open marketplace in which the efficient use of assets and the “value” of the services one develops enable a small player to expose their innovation and compete successfully against the large players. With the recent moves by Microsoft and Google, one has to wonder if this is yet another emerging market where people fail to recognize the potential for monopoly abuse until it's too late, until the critical data needed by other “information economy players” is owned by a select few, and access to this data becomes too costly to compete.

Monday May 09, 2005

Requirements for

At Sun, we have long predicted the impending doom of systems that are often termed Generation 1 (G1), or as I call them, “linear architectures”.... systems that are wholly built from a traditional 2/3-tier architectural approach, in which all elements of the presentation, business, and data management layers are constructed solely for a single application, not built for “sharing”.

For many reasons, G1 systems and approaches have shown continued longevity, so why won't they continue? After all, we have been using the same system paradigm since the mid-70's, and its continual drive towards commoditization and manufacturing efficiency has thus far survived major disruptions.

The sheer speed and power provided by the network of connected computing resources is driving a shift in the paradigm: from the network as a data transfer medium to the network as a fabric for the execution of complex distributed services. This is arguably one of the challenges facing our large-scale SMP product line (but we're not alone in this challenge - IBM's Mainframe line is really having some challenges too).

The problem that is becoming more apparent is the exponential complexity introduced by a complex Service Oriented Architecture (loosely coupled, coarse/medium-grain, just-in-time compositional systems) when architected using the mechanisms traditional to G1 systems.

An example of this fractal complexity:

Company A includes a service from company B in their SOA. Company B then compositionally includes services from company C to complete the request, perhaps without A's knowledge. What happens if C uses the same service from A...

At best this means that identity and privacy need to be further protected; at worst we may have a re-entrant/recursive execution problem (yeah, single-box threading can be hard, but what about network-threaded apps where you don't have control of the components or source code?), or “ilities” problems, since the G1 paradigm typically does not find the need to elaborate security, scalability, availability, and manageability models as part of the core design: each “tier” is typically fully characterized only for use by the preceding component - “the EJB's only get called through OUR Servlets” or “the DB is only accessed through OUR EJB's”.
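The re-entrancy hazard above can be made concrete. A common defense, sketched here in Python with illustrative service names, is to propagate a call-chain identifier (akin to a correlation or trace ID) with each request and reject a request whose chain already contains the current service:

```python
# Sketch of detecting re-entrant service composition (A -> B -> C -> A) by
# propagating a call-chain "header" with each invocation. The services and
# the in-process dispatch are stand-ins for real networked SOA calls.

class ReentrancyError(Exception):
    pass

class Service:
    def __init__(self, name, downstream=None):
        self.name = name
        self.downstream = downstream or []  # services this one composes

    def invoke(self, call_chain=()):
        if self.name in call_chain:
            # The request has looped back to a service already on the chain.
            raise ReentrancyError(
                "cycle: " + " -> ".join(call_chain + (self.name,)))
        chain = call_chain + (self.name,)
        for dep in self.downstream:
            dep.invoke(chain)
        return chain

a = Service("A")
b = Service("B")
c = Service("C", downstream=[a])   # C unknowingly calls back into A
a.downstream = [b]
b.downstream = [c]
```

Invoking `a.invoke()` raises `ReentrancyError` with the full cycle `A -> B -> C -> A`, which is exactly the situation none of the three parties could see from their local vantage point.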

With the movement toward component compositional models, SOA's, and Grid computing, we begin to realize that a new paradigm is afoot - with both its challenges and advantages:

What are its core challenges:

  • fallacies of networking
  • network “enforced” isolation through distribution
  • contiguous system memory vs. distributed memory
  • service operation model & recognition of partial failure
  • federation of core “identity services” including identity, context/role, entitlement

But it carries substantial advantages:

  • differentially granular and dynamically scaled systems
  • ability to take advantage of locality / proximity in execution
  • dynamic parallelization of workflow
  • ability to version/add functionality (carefully) in-vivo
  • declarative security levels and well known enforcement models
  • enforce isolation rules / best practices through interface “encapsulation”, resource management and controlled access
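One of the advantages above, dynamic parallelization of workflow, is easy to demonstrate: independent service calls fan out concurrently rather than executing as a linear G1-style pipeline. The sketch below uses stand-in functions (the service names are invented) in place of real network calls:

```python
# Sketch of dynamic parallelization of workflow: three independent service
# calls run concurrently, so wall-clock time approaches the slowest single
# call rather than the sum of all three. Services are simulated with sleeps.
from concurrent.futures import ThreadPoolExecutor
import time

def call_service(name, delay=0.1):
    time.sleep(delay)              # simulate network latency
    return f"{name}:ok"

independent_calls = ["pricing", "inventory", "entitlements"]

start = time.monotonic()
with ThreadPoolExecutor() as pool:
    results = list(pool.map(call_service, independent_calls))
elapsed = time.monotonic() - start
# Sequentially this would take ~0.3s; in parallel it takes ~0.1s.
```

In a real composition the executor would be replaced by the platform's orchestration layer, but the economics are the same: latency of the critical path, not of the total work.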

What are some of the major trends that I foresee in the construction of such systems:

  1. Majority of time spent in planning and assembly vs. iterative construction
  2. Improvements in model annotation to capture systemic qualities and referential patterns (micro-architectures) so that non-functional behaviors can be better understood
  3. Declarative models vs. code (though pseudo code could be used for declaration for rich syntax)
  4. Federation of core services (a core tenet of SOA that many forget!)
  5. Ability to deal with distributed & distributable data
  6. Append only / constant query data models (fail in place, recover from clone)
  7. Omniscient debugging, AOP, and debugging languages like “D”
  8. Systemic (compositional) SLA management ... what is possible & what is desired
  9. Micropayment environment to allow for “for fee” SOA services
  10. Fine grained identity & entitlements to allow for security level agreements w/ tooling
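Trend #6 above, the append-only / constant-query data model with “fail in place, recover from clone”, can be sketched in a few lines. The store's API and event shape here are illustrative assumptions, not any particular product:

```python
# Sketch of an append-only data model: writes only append events, reads fold
# the log into current state, and recovery means replaying a cloned log
# rather than repairing state in place.

class AppendOnlyStore:
    def __init__(self, log=None):
        self._log = list(log or [])    # immutable history; never updated in place

    def append(self, key, value):
        self._log.append((key, value))

    def current(self):
        """Fold the log into current state; later appends win."""
        state = {}
        for key, value in self._log:
            state[key] = value
        return state

    def clone(self):
        """Recover-from-clone: a new store built from a copy of the log."""
        return AppendOnlyStore(self._log)

store = AppendOnlyStore()
store.append("balance", 100)
store.append("balance", 80)
replica = store.clone()       # point-in-time recovery target
```

Because no event is ever mutated, queries are stable (“constant”) over any prefix of the log, and a failed node can simply be abandoned while a replica rebuilds the same state from the cloned history.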

More later!

Sunday May 01, 2005

Disruptions and Discontinuities

A large number of people have been telling me over the past weeks that they cannot see their workloads shifting outside the “protected” four walls of a corporate data center. To that I respond: are your four walls really that protected? Most corporate intranets are little more secure than the social engineering attacks that continually compromise them suggest; take the ChoicePoint incident, for example, which involved no hacking at all.

The question I really ask: are you so sure that you cannot better enforce corporate/state/local/federal policy through contracts with companies who make isolation and enforcement a priority because of their multi-tenant nature? Just take, for example, an apartment building, where the common areas are secured to protect the dwellers despite the fact that some dwellers may not lock their own doors. For tenants who do lock their doors, the exterior doors add an additional layer of protection, under specific contract with the condo association. - Just a thought!

I then moved on to Fareed Zakaria's review of Tom Friedman's new book “The World Is Flat”, titled “The Wealth of Yet More Nations”. In the review, Zakaria relates a section of the book that really typifies why I think that multi-tenant utilities like Sun Grid will inevitably have the loads to make them work:

Jerry Rao [an Indian entrepreneur] explained to [Tom] Friedman why his accounting firm in Bangalore was able to prepare tax returns for Americans. (In 2005, an estimated 400,000 American I.R.S. returns were prepared in India.) “Any activity where we can digitize and decompose the value chain, and move the work around, will get moved around.”

The point, specifically, is that where there is a need which cannot be met with existing resources (financial, staff, or other), innovation will naturally fill the demand. In these disruptions/discontinuities, where new processes so fundamentally shift the economics versus the old businesses, it becomes easier for people to recognize that changing, and in some cases standardizing, is worth it!

My copy is on order, can't wait.
