Wednesday Mar 22, 2006

Part I: SunGrid “The Beginning”

Albert Einstein said “The significant problems we have cannot be solved at the same level of thinking with which we created them.”

The Sun Grid project started in earnest more than 2 years ago under pretty different pretenses. Sun's sales organization was struggling with the high level of effort and residual expenses associated with letting Scott McNealy visit a customer. He couldn't help but tell them about this Ultra-Thin Client device which could save on power, operational expense, and even allow for hot-desking to reduce real-estate burdens - in fact Sun saved more than $27M in the first year of the program alone. And so Scott would go on to promise each CxO-level executive a SunRay that Sun would come and install for them as a trial.

The Technical Sales Organization (TSO) first set up a blueprint and demo kits, but we weren't as fast as Scott's plane. Someone (Brian F.) had the fantastic idea to put a Desktop "Service" on the Internet, so that we could just parachute a SunRay to each of these executives; they just plug it into their home broadband router, and they are running... Coined "CxOnet", the plant has been running for about 2.5 years (thanks Brian), grew up to become Project Alamo, and is now "Sun Grid".

I believe that this was a precipitating event in realizing Scott McNealy's long-term vision of the Big Friggin Webtone Switch, as well as Greg Papadopoulos' predictions around the demise of shrink-wrapped software and the emergence of next-generation utility data centers, and of course Jonathan Schwartz acted as a catalyst in moving towards subscription/utility software models. (Obviously a lot of credit has to be given to the networking infrastructure providers who, during the great build-out precipitated by the DotCom boom, laid a tremendous amount of bandwidth around the country and brought substantially more reliable/redundant networks to every business and most every home.)

This new value proposition, of not having to "run" your own data center, is obviously still in its adolescence, but I think that the very patterns for multi-tenant secured computing across a computing mesh - not unlike the power grid - offer scale, reliability, consistency, and affordability that will be difficult to match. Multi-tenant isolation has been the critical tenet of the security design, and many will herald this as a return to mainframe computing. Let's just remember that mainframes had to be multi-tenant because of the cost of the physical plant, so they had multi-tenancy engineered into the very operating system (as does OpenSolaris today, I might mention). In today's horizontal compute fabrics, the very networks that connect computers challenge this tenet, and the Sun Grid architecture is designed to enable a network operating system of similar control. Yet the economics of running it are rarely brought into question... The question being: what is the cost of low utilization (a lack of true multi-tenant virtualization) on your operating expense, and potentially on your ability to meet complex compliance legislation?

Sun Grid is not perfect, it never will be, but I believe that this is a monumental step towards more affordable computing for end-users and corporations alike.

I would like to take this time to thank the whole Sun Grid Team, which has worked around the clock for months as we have been clearing the final hurdles towards release, and the many experts throughout Sun who have provided invaluable insight and assistance towards this shared goal! Thank You All!

Sunday Mar 19, 2006

Running Jobs on Sun Grid that require “Service Containers”

Sun Grid's resource management semantics basically dictate that jobs be self-contained, and terminate all processes in order to exit. The problem with terminating processes in a grid context is that it's not quite as simple as doing a PID trap on a single host, instead, you need to use the qsub, qstat and qdel commands to better manage your distributed jobs.

The example pattern that I'd like to elaborate is one of a “server/framework” which needs to run in order to support a client. Whether a simple RMID, or a more complex instance of a web server, app server or JavaSpace, the pattern is very similar. The developer wants to:

  1. Start up one or more servers (in our case 2, the httpd and the GigaSpaces Enterprise Server)
  2. Make sure that the servers are running
  3. Submit the client and wait for the client to complete
  4. Shutdown the Servers so that the Sun Grid Job can terminate and stop the meter

First some basic syntax:

  • #$ = embedded directives for SGE which do things like populate environment variables (-V)
  • qsub = submit this task to the grid for scheduling... we use a couple of options:
  • "-sync n" = fire and forget... don't wait for the job to be scheduled
  • "-N <jobname>" = not required, but could be used for parsing qstat... unfortunately qdel requires a jobid instead of a job name (to keep you from shutting down similarly named jobs)
  • "-t 1" or "-t 1-4:1" = submit a task to one node, or an array of tasks across multiple nodes (the range syntax is start-end:step)
  • qstat = get the status of the SGE queue, which in the case of Sun Grid will only return the jobs that you own, for privacy purposes
  • "-s r" = only return the "running" jobs... jobs that are waiting (status="qw") are excluded
  • qdel = delete / stop the specified jobs
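As an aside, the job-id parsing used in the listing can be exercised standalone. The response strings below are hypothetical examples of what qsub prints (the exact wording varies by SGE version), not captured output:

```shell
#!/bin/bash
# Hypothetical qsub responses (exact wording varies by SGE version)
ARRAY_RESP='Your job-array 1234.1-4:1 ("gsee-gsc") has been submitted'
SIMPLE_RESP='Your job 5678 ("gsee-gsm") has been submitted'

# the 3rd whitespace-separated field carries the job id
MATCH='\(.*\) \(.*\) \([0-9]*\)\.\([0-9]*\)-\([0-9]*\):\([0-9]*\)'  # array job
MATCH2='\(.*\) \(.*\) \([0-9]*\) \(.*\)'                            # simple job

# capture into arrays so that any trailing, unmatched text is discarded
arr=( $(echo "$ARRAY_RESP" | sed -n -e "s/${MATCH}/\3/p") )
simple=( $(echo "$SIMPLE_RESP" | sed -n -e "s/${MATCH2}/\3/p") )
echo "array job id: ${arr[0]}"
echo "simple job id: ${simple[0]}"
```

The array capture matters for the multi-node form: sed replaces only the matched portion, so the first array element is the bare job id even when trailing text survives the substitution.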

Now onto the listing:

#! /bin/bash
#$ -V

# if we are running against an older version of SGE, the "#$ -V" directive
# will not exist, so be sure that we source the SGETOOLS (or at least try to)
if [[ ${SGETOOLS:-"unset"} = "unset" ]]; then
echo setting SGETOOLS
fi

echo "Starting the GigaSpaces Servers"
GSC=`qsub -sync n -N gsee-gsc -v GSEE_HOME=$GSEE_HOME -v GRID_HOME=$GRID_HOME -t 1-4:1 $GRID_HOME/bin/gsc`
GSM=`qsub -sync n -N gsee-gsm -v GSEE_HOME=$GSEE_HOME -v GRID_HOME=$GRID_HOME -t 1 $GRID_HOME/bin/gsm $GRID_HOME/config/overrides/gsm-override.xml`
echo ${GSC}
echo ${GSM}

# SGE job return syntax is XXXX.X-X:X, i.e. $JobID.$first_task-$last_task:$step,
# so trim out just the leading XXXX, which is the regex-matched 3rd field
MATCH="\(.*\) \(.*\) \([0-9]*\)\.\([0-9]*\)-\([0-9]*\):\([0-9]*\)" # match for a multi-node (array) job
MATCH2="\(.*\) \(.*\) \([0-9]*\) \(.*\)" # match for a simple 1-node job

GSCparsed=( `echo $GSC | sed -n -e "s/${MATCH}/\3/p"` )
if [[ ${GSCparsed:-"unset"} = "unset" ]]; then
GSCparsed=( `echo $GSC | sed -n -e "s/${MATCH2}/\3/p"` )
fi

GSMparsed=( `echo $GSM | sed -n -e "s/${MATCH}/\3/p"` )
if [[ ${GSMparsed:-"unset"} = "unset" ]]; then
GSMparsed=( `echo $GSM | sed -n -e "s/${MATCH2}/\3/p"` )
fi
echo "Jobs $GSCparsed and $GSMparsed submitted"

# wait for these jobs to show up in qstat
until [[ ${GSMstatus:-0} -gt 0 && ${GSCstatus:-0} -gt 0 ]]
do
#evaluate the qstat -s r response (running jobs) to make sure that the
#requisite jobs are running
GSCstatus=$(qstat -s r | nawk '/'${GSCparsed}'/{var1+=1} END {print var1}')
GSMstatus=$(qstat -s r | nawk '/'${GSMparsed}'/{var1+=1} END {print var1}')
echo "GSCstatus = $GSCstatus"
echo "GSMstatus = $GSMstatus"
echo Server status is $(qstat -s r)
sleep 10
done

#run our application - in this case, use multiple nodes to help us calculate prime factors
echo "crunching"
~/$1
echo "done"

#clean up: qdel the job ids parsed out of GSM and GSC
echo $(qdel $GSMparsed $GSCparsed)
#go ahead and print out the queue status on the way out to verify cleanup (optional)
sleep 10
echo "Leaving..."
echo $(qstat)

Hopefully, this example sheds some light on some of the mechanisms that a developer might enlist in order to launch more complex, server dependent applications against the Sun Grid. Please let me know if I need to elaborate further. I want to take this opportunity to recognize GigaSpaces, and specifically Dennis Reedy for his help in putting together a grid job which could flex a couple of nodes against their GigaSpaces Enterprise Server 5.0 environment. I'd also like to thank Bill Meine and Fay Salwen for their scripting assistance.


Wednesday Feb 15, 2006

Mash-Ups and Dynamically Provisioned Services

As I have been watching all of the discussions about mash-ups, I have been wondering if traditional integration mechanisms employed by the developer community are really well suited for this new environment in which services (yours and others) are embraced, combined and extended in order to deliver some new aggregate value proposition.

I really view mash-ups as a way to take someone's intellectual property and extend it to address some new use-case for which the original designer may or may not have designed. This extension creates numerous problems, including licensing (approved use) - something that I'll allow the attorneys in the audience to argue about - but more interesting to me is the context under which the component in question was designed to be used, and mechanisms to elaborate that context to help the mash-up developer understand the critical "ilities" - reliability, scalability, availability, serviceability - of using their component in a "production" environment.

As I looked for analogous problem domains, I happened upon the Integrated Circuit industry - an industry that has been moving from discrete semiconductors to integrated circuits to Systems-on-a-Chip (SoC). The proliferation of specialized "cores" (provided by IP-backed designs), and the recognition that customers are looking for single-chip solutions for cost, space and power reasons, has driven the evolution of a set of processes and tools to combine these cores into a single system - and in doing so has forced substantial changes in the Electronic Design Automation (EDA) field.

I hypothesize that a similar revolution will unfold for the “service oriented” world that is being espoused by just about every rag, and in most every IT shop. Taking cues from EDA, it might look something like this:


Some components will need to be developed - allowing the expression of business processes and critical IP in a programmatic language. However, it is anticipated that as components are added to an Internet-enabled / distributed registry (think about the catalogs that we used to receive from IC vendors), developers will become focused on assembly to enable business processes, and proven business process patterns will become building blocks at the next higher level.

Tooling/Paradigm: Specific business service assembly tools + Business Process Modelling to provide both component development and component assembly at the functional layer, with extensions to the domain model specific to the component abstraction that allow for systemic constraints to be suitably defined. The analog in chip design seems to be VHDL/Verilog & SystemC.

Once a workflow/business process has been composed, the developer needs to be able to verify that it behaves as intended (before worrying about the systemic constraints). The output of the verify step should be a constraint graph, based upon what we know about new and existing components, to allow the system to plan for the process deployment. Over time, as component sub-systems emerge, they will be pre-verified (eBay model).

Tooling/Paradigm: there needs to be a set of tools / processes that can be run to ensure that interfaces are appropriately wired, and test cases executed to ensure the appropriate functional result is achieved. The analog in chip design is Functional Verification.

Each of the components has systemic constraints; the system now needs to leverage rules/policies to determine the overarching constraints which best characterize the defined model. In this way the system can begin to understand how things like transaction performance (viewed as latency) and high availability (viewed as uptime) can be elaborated, and tradeoff decisions made with the help of the developer (cost / time).

Tooling/Paradigm: once the system is functionally defined, the constraints need to be organized to ensure performance, and discrepancies resolved. This results in a systemic design in which no constraints remain "at odds" with one another. The analog in chip design is Design Synthesis.

Now that the constraints are fully understood, we can begin to group components and map them against known capabilities of the infrastructure, selecting appropriate provisioning and operational policies/rules and bringing those component based plans together into a federated construct that can be used by the observability and management systems to deploy the system.

Once the plan has been developed, it can be delivered to an executor. There should be (at a minimum) 2 execution types: try it, do it. "Try it" should allow the interfaces to be exercised so that the plan can be validated/verified against a "production-like" environment, at which point the plan can be certified to run at scale. This 2-step process is critical, as it will help us maintain control of rogue applications (unintentional) that may not behave well. Furthermore, execution includes mandatory monitoring/auditability that can enable an operator to better re-plan over time for better Service Level performance at lower cost.

Thanks for reading, as I stated above, this is just elaborating an analogy, whether it proves valuable in SOA is yet to be seen, but I'd love your comments.

Friday Nov 18, 2005

The Sun Grid Environment & the value to the developer eco-system.

<cross post from>

Sun Grid looks like a traditional IT stack, exposing common interfaces to enable developers and ISV's to target different abstraction layers:

Business Opportunities

The core environment is made up of a "Resource Factory" (RF), a production plant that is optimized to produce power in appropriately consumable chunks (balancing the economics of operations/distribution against typically demanded performance units). In this model the resources are managed by a Distributed Resource Manager and similarly metered/monitored using commercial DRM's and OSS/J technologies through a Service Data Record (a superset of the traditional telco model of a CDR or Call Data Record). Over time, Sun Grid will allow for multiple/pluggable Distributed Resource Managers through the support of a super scheduler, which will allow a variety of resource management models to be deployed to support the various container/containment strategies for managed workloads.

Where the RF and its blueprints are probably interesting to data center operators/managers who are themselves embarking on virtualization and distributed resource control/management projects, developers will probably not have a lot of use for this abstraction. IMO, what developers are really looking for is a set of runtime containers that are as thick/thin, complete or pluggable as their applications demand. This means that we need a container model that allows for specific orchestration.

For example, in a traditional 3-tiered (I know, nothing is traditional anymore) architecture, the separation of concerns drives us to the isolation of the MVC across multiple tiers... this means that we need a content delivery/presentation container, and a business logic container which is suitably aligned with its data management containers/services. How do we describe these relationships in both functional and non-functional ways? Right now this is more art than science, which is to say that there are a number of different mechanisms, each with specific value propositions, and each with their own warts. Sun has the N1 Service Provisioning System (SPS); we also have DCML (an OASIS activity), WS-Management, and even techniques from the GGF which are trying to expose enterprise services using declarative markups.

This blog is getting a bit long, so I'll follow up in a second post, but let me not leave without talking about the REAL value proposition for developers (the higher layers). Here are some of the services that I'm thinking about:

  • a repository for open source components and libraries, allowing for both forward and regression testing, that developers can use without distribution licensing concerns (caveat emptor) to build, test, re-factor, and test new versions of their service / application
  • a repository, a vending machine really, for commercial components that can be used in a metered/rated way (see OSS/J note above) by other developers to shorten time to deployment, improve competitive comparison, open up new opportunities and markets, and allow components to compete in an open capital market/exchange (is my component really $xx better per use than yours... if it is then I'm incented to keep developing).
  • a service facade for public/private hosted services that build on the economics of the Sun Grid and the pay-as-you-use model, as well as providing improved integration through close proximity to other's services (proximity remains very important in reducing latency in distributed service based architectures).
  • common services, including things like federated identity and common entitlement policies that allow these ecosystems to emerge.
  • a piloting ground for commercial software... instead of shipping a disk, why not take a SaaS approach where a demonstration entitlement is granted and instantly provisioned... how much could we reduce the cost of software sales?
  • and finally, but not least, provide a mechanism, like the component/service mechanisms above, to allow business processes to be patterned, shared, extended and sold to perform regulatory or other tasks - things that haven't yet emerged in the open communities, but are certain to once suitable service-oriented substrates are available.

I hope that I've been able to elucidate some of the value that a common utility could provide, and facilitate through the Java.NET community as we move forward on this journey.

Please join us in the Sun Grid Community, sign up a new project, get some free grid time, and let's move this vision forward.

Tuesday Nov 15, 2005

So you have a new project & don't want to “buy” more computers?

Depending on who you talk to, computing is a competitive weapon or a cost of doing business; either way, it's an investment. One question that each and every new project faces as it moves from business plan to implementation is the often excessive cost and time required to put the infrastructure in place so that you can finally realize that killer application.

With the advent of utility computing, and before that outsourcing, businesses gained the ability to shift costs from capital to expenses, and in doing so have improved their capital portfolio and cash flow positions. Now you have a new initiative, and though you have models/projections with respect to the data transfer, loading, processing and storage rates, you are probably still not quite certain of how this should be architected. One potential BluePrint comes from Sun in its Sun Grid Rack System and Sun Grid Utility Services approach. This blueprint gives you the ability to take industry-standard x64 servers or the novel SPARC CoolThreads (Niagara T1) based systems and grow your computing plant just in time, in small incremental grains with incremental cost (the value engineering done by the core utility team).

Take advantage of package density, operational automation, systemic monitoring and metering, and workload scheduling advancements that are currently under investigation by the Sun Grid team as your solution matures. Furthermore, this puts you in initial control of your near-term deliverables (your hw, your sw, onsite) and aligns with options for fiscal improvement/flexibility in the longer term. If your initiative is radically successful for the business, you can look to "join" Sun Grid, allowing your work to be distributed onto the Sun Grid mesh of data centers in addition to your own; this gives you time to market and peak scale, as well as geographic diversity for high availability.

How do we get started?

  1. Let's talk about the blueprinting - will your application fit this design? Horizontal computing services differ from vertically integrated services in their treatment of memory (the largest issue); specifically, the unified memory architecture of SMP machines is a critical facilitator for some large-scale transactional systems.
  2. If we can go horizontal - which most if not all new (green field) applications can - then let's think about the data flow to look at the points of constriction: areas where we become hardware-limited because of disk, network and CPU speed/contention.
  3. With these "critical to scale" points in mind, let's determine a scalability strategy to help us parse this load coherently and with availability. Many of today's applications are well suited for pipelined approaches, as in the image here.
  4. Determine the core services that need to be shared, and how these core services can be federated, benefiting both your company and your partners - federation is about the sharing of responsibility and control.
  5. Go back to 2, and continue to refactor until scalability can be addressed with some fudge factor that lends the software developers some flexibility in approach.

We are ready when you are with a set of System Integrator Partners, ISV's, Client Solutions team, and a very active Open Community to help you take advantage of these emerging models, simplifying your Data Center, and changing the economics of corporate and research computing.


Tuesday Nov 01, 2005

MVAPICH for Solaris Released

We have seen numerous press releases on Message Passing Interfaces (MPI) lately, including those from Microsoft, who has been working with Argonne Labs (funding a Win32 port of MPICH2), and this most recent announcement of Ohio State University's port of MVAPICH to Solaris across Infiniband.

Sun has been collaborating with OSU for a long time, working with Linux and Solaris on both SPARC and x64 based platforms. The current announcement from OSU is a novel MPI-2 based design (at the ADI-3 level) providing uDAPL support on Solaris. So what is this acronym soup?

Infiniband: a high-performance switched fabric providing high bandwidth (in excess of 30Gbps) and low latency (which can be lower than 5µs) for serial, channel-based I/O between two host channel adapters (HCAs, which are available at costs < $70). This fabric utilizes a separate I/O/communications processor, distinct from the traditional node CPU, to allow the independent scaling of I/O and the offloading of I/O responsibilities, allowing performance & cost tuning of computing clusters. Typical per-port costs are in the $300 range (HCA & TCA) vs. >$1k for 10GbE adapters, so performance@cost is definitely in IB's favor for the highest of performance needs.

Message Passing Interface (MPI): established in 1994 to provide a standard set of message passing routines that focus on both performance and portability, recognizing that these goals are often at odds with one another. MPI-2 work, begun in 1995 and completed in 1997, was designed to address areas where the MPI Forum was initially unable to reach consensus, like one-sided communications & file I/O. Basically, MPI makes use of GETting or PUTting (or ACCUMULATEing) data from/to a remote window that reflects a shared memory space in non-blocking ways for parallelized performance (an older, but still relevant, tutorial is available from the University of Edinburgh).

User-Level Direct Access Transport APIs (uDAPL): there has been a need to standardize a set of user-level APIs across a variety of RDMA-capable transports such as InfiniBand (IB), VI and RDDP. The model is a familiar one to most infrastructure programmers: that of an interface provider (both local and remote) and an interface consumer that has visibility into the localness of the provider. uDAPL is designed to be transport-agnostic (ala IB) to unlock consumers (like MPI) from the intricacies of the underlying transport in a standardized way. Within this layer cake, it is expected that a uDAPL consumer will talk across a fabric to another uDAPL consumer; though this is not mandated, it is common practice.

MPICH & MVAPICH2 are implementations of MPI provided by a variety of entities (mostly government agencies/labs and universities) which frequently compete on features and performance. MVAPICH2 has been focused on IB, whereas MPICH2 supports other interconnects including Quadrics and Myrinet; either way, the goal is to create a high-performance consumer (programmer) interface that can sit on standard or customized interconnect stacks. Where MVAPICH2 tends to shine is with larger packets, providing higher bandwidth (though at a cost to small-packet latency). A reasonable comparison from OSU and Dr. Panda is here (though we have to remember Dr. Panda's sponsorship of MVAPICH).

So that was a short summary, but hopefully this just whets your appetite for looking at architectures like Infiniband for constructing highly performant Grids/Clusters, and at some of the techniques that you might request from Sun Grid to accelerate your parallel applications.

BTW: Sun Grid has MPICH 1.2.6 pre-installed including Java wrappers, here is a sample deployment script:
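A minimal sketch of such a wrapper follows; note that the MPICH_HOME path and the cpi example binary are illustrative assumptions, not guaranteed Sun Grid paths - substitute your own:

```shell
#!/bin/bash
#$ -V
#$ -N mpich-sample
# Hedged sketch: MPICH_HOME and APP below are assumptions --
# adjust them for your own Sun Grid account layout.
MPICH_HOME=${MPICH_HOME:-/opt/mpich-1.2.6}
NP=${NP:-4}                        # number of MPI processes to launch
APP=${APP:-$HOME/bin/cpi}          # your compiled MPI application (hypothetical path)
CMD="$MPICH_HOME/bin/mpirun -np $NP $APP"
echo "launching: $CMD"
if [ -x "$APP" ]; then
    $CMD
else
    echo "application $APP not found; resubmit with APP=<path>" >&2
fi
```

Submit it with the usual qsub invocation, overriding NP and APP as environment variables (-v) to match your job.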

Sunday Oct 23, 2005

jeri rocks

Jeri, or "Jini Extensible Remote Invocation", has been in the back of my mind lately. My problem has been to find a way to allow for a richer interaction between a server-side processing environment and a set of client-side executors. The challenge is that traditional approaches dictate a tight coupling of a client-side representation of a model and a server-side implementation... with constant "polling" or firewall-challenged notification.

As Sun Grid nears reality, the engineering team is focused on enriching the experience, enabling Sun Grid to be:

  • a stand alone computing environment
  • an integrated services platform (but still fairly discrete in functionality), or
  • an extension of your data center.

The last opportunity provides the holy grail: a compute model that can dynamically flex between running services in your data center and running services in the extranet (Internet). In this case one must be concerned with the security model, the proximity to critical data/services, audit & debug, service level agreements, and so on. One model pushed by many is using the WS-* "standards" to provide the framework for interoperability, but I think that many have already learned how the programming model needs to change as the business loosely couples components, and the resultant impact on compute efficiency (latency & throughput).

Java, for a long time, has been looking at models that range from JRMP, IIOP, JMS and other protocol/communications stacks, but many Java developers keep coming back to RMI because of its ability to coherently operate systems made up of local and remote objects.

Jeri has been around for a couple of years (since 2002, I believe), born when a couple of critical JSR's were voted down by the JCP. At that point, the Jini team realized that there was but one way to strengthen the security model behind Jini if it weren't going to happen in the core JRE, and that was to create a new implementation of RMI on top of a novel protocol stack.

Jeri itself is a layered implementation made up of a marshaling layer, an invocation layer, and a transport layer. This allows for separation of concern and the ability to replace individual layers so long as the contracts are maintained, thereby giving systems architects a tremendous amount of flexibility around specific implementation. A really good tutorial is available here, with a few examples that really help to make some of these key points.

Back to Sun Grid: we need to expose application interfaces that include a portal (for browser-based clients) with channels provided by Sun Grid and others; a WS environment including a registry, again for Sun Grid and 3rd-party interfaces; and probably a more coherent set of application interfaces that bridge the POJO, Swing Framework, and Jini domains, so that we can create the kinds of dynamic systems that are emerging based upon Plain Old Java Object (POJO) components running as services.

(It should go without saying that we also need monitoring, management, debugging, metering, entitlement, ... systemic interfaces.) Though they will likely be different, I think that it's well within the realm of possibility to suggest that the aforementioned interface technologies can play a large role in their realization.


Wednesday Oct 12, 2005

Musings on SAAS

We need to evolve our thinking about why and how people purchase computers and software, as today's model is terribly inefficient:

  • software "as a service" (SaaS) can be a very powerful tool for growing ecosystems in which online services are consumed rather than "downloaded, licensed components". Driving much of this is both the cost and complexity of distribution (for both buyer and seller), as well as the low utilization of the systems where the components are deployed.
  • today's software distribution model favors the largest of players, which is to say the players who can afford the high cost of sales and service under the buy-it, get-it, install-it, use-it model of distribution.
  • ecosystems will evolve which take the eBay or Amazon model of marketplace aggregation to services, but with software, unlike hard goods, the qualities of service become the critical differentiators
  • we do not yet have either an appropriate meta language or "Trader Service" (yes, from my CORBA days) which might allow services to be found and effectively bound in an on-demand fashion.
  • we also lack the repository which can manage these components and their entitlements in such a fashion as to enable true monetization on a "fair and equal" playing field... typically this is provided by an open marketplace using explicit symbology.

I haven't yet talked about next generation orchestration languages that could allow "processes" to also be purchased (things that System Integrators typically look at as their critical IP - a pre-proven HIPAA or SEC process that a company could use to improve their own process compliance), but it's certainly a possibility once the component marketplace is established. In all, a very exciting new world that public utilities enable.

Overall, it's very exciting to watch Google, MSN, Amazon and Yahoo battle in this next frontier; after all, I call for a trader service and symbology, which IMO are their strong suits. The question will be which of these companies has the right tools for the communities of developers, and can provide the "grease" to make their marketplace the most valuable and therefore attractive.

Sunday Aug 21, 2005

Change or Die

I was recently lamenting the challenges associated with Utility Computing: despite the tremendous benefits that are possible through a move toward a more virtualized development environment - and even executing against a public utility, where resources (and their costs) are amortized across an aggregate of shared use - we still run into the inertia to change. This interesting article from Fast Company certainly seemed to ring true: Change or Die.

Specifically, the statement from Dr. John Kotter, HBS:

"The central issue is never strategy, structure, culture, or systems. The core of the matter is always about changing the behavior of people."

The article goes on to detail that one shouldn't use the "fear of death" analog, but rather, the "joy of living."

So, live joyously, give yourself more free time, use Sun Grid to free yourself from procurement hell, developer execution-time hell, or even Hell with a Capital H (stop spending money on underutilized computing infrastructure).

Seriously, a team of developers has been working really hard all summer to launch our public grid instance, give it a try.

Monday Jun 06, 2005

Are you “locked in”?

A recent InformationWeek article by Darrell Dunn tells of a survey by AFCOM:

“Twenty-one percent of the respondents to a survey by InterUnity Group and AFCOM say they plan to implement utility computing next year, and 10.6% of those already using the model say they expect to increase its use next year. Vendors with utility-style offerings, such as VeriCenter, Hewlett-Packard, IBM, and Sun Microsystems, are seeing the results of such investments.”

This is all well and good: customers are seeing the utility in using someone else's capital to run their businesses. But so long as IBM's David Gelardi continues to say:

“You really can't look at capacity on demand in the same way as a utility like water or electricity because it's more sophisticated than that,” he says. “We're not there as an industry yet, and I know most clients aren't there yet.”

we have to ask ourselves: what's the holdup? Could it be IBM's “Customization” business, which sustains complexity within a customer's processes and systems to drive revenue? Where does that customization live? Power vs. x64, AIX vs. Linux (and which distribution?), and the ever-increasing complexity of the Globus Toolkit target?

BTW: is this transparent to you? Also, AFCOM has a really interesting article on the importance of workflow as a utility enabler.

Tuesday May 31, 2005

Too Risky, even for Lloyds

Gotta love the Register with its vulture logo. But I came across this headline, “Itanic sinks at Lloyds Register”, and just had to laugh...

As most of you know, we have been building out Sun Grid with a focus on Opteron-based processors. What is becoming increasingly clear is that power has a real cost, and power efficiency will be a critical element of any utility offering. We really notice this when we try to power-cycle quite a few racks (32 nodes × ~350W ≈ 11.2kW/rack) at the same time and recognize that instantaneous power-on spikes (booting every node at once) could blow circuits. In fact, where we used to pay “per circuit” for power, many of our co-lo partners are now moving to metered power, just like we pay at home. This makes sense because power in = heat out: though floor space is expensive, power density is a very large contributing factor to overall cost.
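The rack math above can be sketched in a few lines of Python. Only the node count and per-node wattage come from the text; the inrush multiplier and $/kWh rate are illustrative assumptions to show how metered billing changes the picture:

```python
# Back-of-envelope rack power math. Node count and per-node draw are
# from the post; INRUSH_FACTOR and RATE_PER_KWH are assumed for illustration.
NODES_PER_RACK = 32
WATTS_PER_NODE = 350        # approximate steady-state draw per node
INRUSH_FACTOR = 1.5         # hypothetical boot-time spike multiplier

steady_kw = NODES_PER_RACK * WATTS_PER_NODE / 1000
spike_kw = steady_kw * INRUSH_FACTOR
print(f"steady-state: {steady_kw:.1f} kW/rack")  # ~11.2 kW/rack
print(f"boot spike:   {spike_kw:.1f} kW/rack")   # booting every node at once

# Under metered billing (vs. flat per-circuit pricing), the draw
# translates directly into a monthly bill:
RATE_PER_KWH = 0.10         # assumed $/kWh
hours_per_month = 24 * 30
monthly_bill = steady_kw * RATE_PER_KWH * hours_per_month
print(f"monthly power bill per rack: ${monthly_bill:,.0f}")
```

Staggering node boot (rather than powering a whole rack at once) is the obvious mitigation for the spike case.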

This makes me doubly excited about our Niagara-based processors, which are expected to provide 8 cores at < 60W:


“Imagine the potential impact to IT operations: a single blade shelf designed to do the work of 32 of today's 4-way servers; eight rack units instead of 160; less than 3 kilowatts of power versus 38; one blade system to manage instead of 32 servers.”

and I haven't even started to talk about the relative balance in performance that Chip Multi-Threading (CMT) provides in addressing the growing gap between CPU clock speed and memory latency, a gap that leaves traditional processor designs frequently idle, waiting for the data they need to perform useful work.
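To see why latency hiding matters, here's a toy model of that idea: if a single thread only computes some fraction of the time (stalled on memory otherwise), interleaving several hardware threads keeps the pipeline busy. The stall fraction below is an invented illustration, not a Niagara spec:

```python
# Toy CMT model: a thread computes a fraction `compute_fraction` of the
# time and stalls on memory the rest. With `threads` hardware threads
# interleaved on one pipeline, utilization rises up to saturation.
def pipeline_utilization(compute_fraction: float, threads: int) -> float:
    """Idealized pipeline busy fraction; caps at 100%."""
    return min(1.0, compute_fraction * threads)

f = 0.25  # assume a thread is stalled on memory 75% of the time
for n in (1, 2, 4):
    print(f"{n} thread(s): {pipeline_utilization(f, n):.0%} busy")
```

The model ignores cache contention and scheduling overhead, but it captures why many slower threads can beat one fast, frequently-stalled one for memory-bound workloads.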

Sunday May 15, 2005

Q: SunGrid cheaper than Beowulf?

I recently came across this analysis from Thom Hickey, with whom I have absolutely no relationship, so his analysis is totally his own [but I have taken the liberty to restructure it].

So, let's take some of Thom's numbers:

Operating Cost (C) = $100,000/yr
Operating Cost/hr (Ch) = C / 8,760 hr/yr = $11.42/hr
CPUs (P) = 48
Hourly Operating Cost per CPU = Ch / P = $0.24/cpu-hr (which is cheap at this small scale)

Now this doesn't look that high (though I'd suggest these economics are akin to measuring the cost of producing energy with a home generator vs. an efficient commercial gas turbine), but when we really dig into how he is using his grid, we can begin to see the real benefit of the utility business model:

Utilization (U%) = 48 cpu-hr/day / 1,152 available cpu-hr/day = 4.2% of available resources
Annual consumption = 48 cpu-hr/day × 250 workdays/year = 12,000 cpu-hrs, or $12k in use for the compute elements.

which isn't an efficient use of capital by any measure.

As Thom states: “Even if you throw in the occasional run-away process that burns up cpu time for a weekend, out-of-pocket costs should be under $30,000/year.”

In fact, even at 4× the apparent consumption, to cover testing, error-prone jobs, and scaled-up runs, we're still well under 50% of the cost of maintaining your own cluster.
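Putting Thom's numbers and the $1/cpu-hr Sun Grid price side by side, a minimal Python sketch of the comparison (all figures come from the text above; the 4× factor mirrors the padding for testing and failed jobs):

```python
# Owning the 48-cpu Beowulf cluster (Thom's figures):
CLUSTER_COST_PER_YEAR = 100_000
HOURS_PER_YEAR = 8_760
CPUS = 48

cost_per_cpu_hour = CLUSTER_COST_PER_YEAR / HOURS_PER_YEAR / CPUS
print(f"owned cluster: ${cost_per_cpu_hour:.2f}/cpu-hr")   # ~$0.24

# But the cluster is barely used:
daily_utilization = 48 / (CPUS * 24)        # 48 cpu-hr/day of 1,152 available
used_cpu_hours = 48 * 250                   # 250 workdays/year
print(f"utilization: {daily_utilization:.1%}")             # ~4.2%

# Renting the same consumption from Sun Grid at the $1/cpu-hr list price:
grid_cost = used_cpu_hours * 1.0
padded = grid_cost * 4                      # 4x for testing, reruns, growth
print(f"Sun Grid: ${grid_cost:,.0f}/yr; even at 4x usage: "
      f"${padded:,.0f} ({padded / CLUSTER_COST_PER_YEAR:.0%} of owning)")
```

The crossover point is utilization: the closer a cluster runs to saturation, the better owning looks, which is exactly the multi-tenancy argument in the answer below.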

Furthermore, we still haven't looked at networking costs, job management/operational costs, and other burdens. Sun Grid, for example, has an Infiniband option which provides a 4x IB non-blocking fabric. For those of you who haven't been watching Infiniband, it is quite different from bussed fabrics like Ethernet, with performance that increases as nodes are added vs. the reverse for Ethernet. This provides a very interesting extra-chassis, backplane-like technology at roughly 1/2 the per-port cost of a 10GbE equivalent. Ask other competitors how their grids are architected; I'll bet that you'll see that not only is our business model unique, but so is our architecture.

A: it's not that SunGrid is necessarily cheaper, per se, but that multi-tenancy and shared expenses can make Sun's $1/cpu-hr a bargain, depending on your utilization. The value is only further bolstered by a substantial investment in high-performance infrastructure like the IB network.

Wednesday Apr 20, 2005

Trend toward Software as a Service

I found this really interesting article on Software as a Service (SaaS) and how enterprises are adopting the utility model (standardized software whose demand is aggregated by a service provider for economic efficiency):

A field guide to software as a service | InfoWorld | Analysis | 2005-04-18 | By Eric Knorr, Leon Erlanger, James R. Borck:

“But an urgent need to stop piling cost and complexity on IT is sowing the seeds of change. Although enterprises may not be replacing effective, large-scale systems with SaaS alternatives, the SaaS option suddenly becomes perfectly viable when it comes to adding new functionality. And, as discovered, SaaS can be particularly successful at replacing in-house or off-the-shelf software that has failed miserably.”

What is exceedingly interesting to me is that there is a continuum that I have been experiencing over the past couple of years:

Round 1/Year 1: “Time to cut costs”

Customer recognizes reduced budgets and begins process of reducing costs

  • Path 1: Outsource non-critical/ segmentable application to outsourcing provider
  • Path 2: Move toward consolidation (simple) of compute and storage resources within similar classes of applications starting with non-business critical applications

Outsourcing works for the first year (the true benefit year), and then change costs and delivery cycles, although contractually committed, become too stiff to allow business agility for certain business functions
Initial consolidation does reduce operational costs / acquisition expense by 15+%, and outsourcing benefits are realized for most customers

Round 2/Year 2: “Time to re-evaluate strategy”

  • New controls are negotiated into existing outsource contracts
  • Consolidation efforts recognize the need for improved Operational Readiness / ITIL / process and approach standardization, and move forward only with the support of business customers
  • In return for standards-based approaches, business customers begin to demand detailed metering to better understand the true cost of IT to their business
  • Business and IT collaborate on an approach to decrease variability and the huge costs associated with customization

Round 3/Year 3: “Increased appetite for standardization”

Businesses recognize that their highly customized business processes come at a real price (time-to-market, expense, and revenue), and regulatory pressures force upgrades: “is the time right to re-factor our business?”
Anecdotally: in a recent meeting among a handful of CIOs, one asked, “has anyone been able to implement XYZ (insert large enterprise package here) on time and on budget?” “Nope.” “Nope.” “Nope.” So what are we doing wrong?

  • SaaS seems to be taking off, judging by the growth of ASPs like Salesforce and Oracle On Demand
  • Enterprises elaborate a strategy for applications based upon a basic taxonomy: custom apps; custom applications on consolidated compute/network/storage platforms; standard apps on fully virtualized platforms; standard apps on utility models
  • They start talking to IBM, HP and Sun about their utility models and business-grid technical approaches...

One big question invariably arises: “How do we keep from doing this to ourselves again?” The answer of the day seems to be SOA, yet another acronym to solve all ills, but to me this addresses only part of the problem, like saying that “standards will enable a complete solution to interoperability.” I think there is actually an ecosystem change afoot which is very interesting. Similar to the well-recognized “getting Amazoned” evolving toward “getting Googled” (becoming inadvertently marginalized through the loss of a direct customer relationship, thanks to a competitor's ability to offer aggregation and mass-customization at scale), we are beginning to see defensive plays forced on businesses to enable the improved agility only afforded by economies of scale. I've got some ideas in this space, because explicit collaboration between developer/BU and IT becomes a key factor in success, and in the next couple of days I'll elaborate on the role that a properly architected utility ecosystem can play in bridging the gap.


Tuesday Apr 19, 2005

Sun Grid != Outsourcing

So this recent article on how outsourcing is changing got me thinking about how different utility computing is from outsourcing, though many try to place them in the same bucket.

Financial Differences:

  1. Transparent usage based pricing (come on Sam P.)
  2. $0 commit contracts - Incurred business/financial liabilities
  3. Shortened cash conversion cycles if expense can be directly tied to produced revenue in same period...
  4. No pre-provisioned assets, which, whether in-house or outsourced, drive some expense back to the business
  5. EA Sports launches a new game: typically it ramps up infrastructure 6 months ahead to ensure the game runs at scale, then incurs the expense no matter the revenue. With Sun Grid, they incur some cost to test at scale and develop performance policies, and then as load (= expense) ramps up, so does revenue.
  6. Economic advantages can be re-invested for application re-factoring/tuning for improved out term (based upon term of ROI) advantage.
  7. The same is true for outsourcing, but change fees and potential for penalties typically make this model substantially less attractive
  8. Fine-grained financial control (over time, down to the transaction level), which allows better accounting: is this business worth doing based upon cost vs. margin? Most IT departments and outsourced contracts are too coarse-grained to provide this information back to the BU.
  9. (scary) futures markets, hedging/speculation

Business Models Enabled:

  1. New business opportunities emerge around “intermediation” - disinterested 3rd parties that clear transactions = “market makers” in areas of healthcare txn clearing, intelligence sharing, transaction aggregators
  2. Subscription-based software (software as a service, SaaS): monetization of digital business processes, where the process owners may not own the data, but do own the processes (see ILM below). This, by the way, is estimated to be an $8B business by '07 per IDC.
  3. ILM: leverage shared global namespaces and meta-tagging to evolve an HSM strategy inclusive of archival partners (best in class), and potentially allow users to “own” their own data... e.g. I want control of my own financial and healthcare records, and don't trust my doctor/healthcare payor to manage them on my behalf. And what about “personal archival storage”: identities, licenses, photos, videos, music, critical digital documents?
  4. Make money from contributed IC... component ecosystem
  5. Make MORE money from invested capital (power production) or invested intellectual capital (harnessing produced power with know how) due to multi-tenancy = volume/scale

Operational Efficiencies:

  1. Multi-source, contingency/DR planned across multiple sites
  2. Serviceable within an existing data center: take our Sun Grid patterns, become certified, offer excess capacity to the community for profit (improved cost profile)
  3. Access to IC/best practices and core facilities around both the standard offers, and through CSO for custom implementations
  4. Customer has the ability to build a site/pod to our specification and “sell” unused cycles back to the grid once “certified”
  5. Increased ability to apply policies to workload management
  6. Customer can scale internal workloads to the grid (peaking capacity)
  7. Choice with control - can engage on grid at multiple levels depending on appetite for management and standards, can furthermore extend telemetry and control differentially to feed back management information to end customer for management purposes.
  8. Can be used by CFO/CIO as drivers to a consistent enterprise architecture w/in a company
  9. Fine grained metering can help BU's better control costs... if I know the cost of a txn, can I better judge whether this transaction is worth tuning or even doing?

Developmental Improvements:

  1. Perfect platform for open source development and support of developer communities (keep each developer's environment consistent) = lower complexity, increased productivity.
  2. Collaboration network to allow for shared code development (e.g. outsourced development) using a “dis-interested 3rd party” to host the environment
  3. Have platform for systemic test at scale (ability to leverage deployment plans vs. pulling cable to re-factor core infrastructure)
  4. Multi-tenant basis of Sun Grid is equivalent to isolation being looked for within Corporate BU's for next step in consolidation
  5. Secure platform for SOA provided by Sun Grid Federated Identity, and Entitlements systems - choice with control
  6. Access to tools such as SALSA for improved architectural control in development projects (SALSA is a pattern recognition and enforcement tool that helps to ensure consistent enterprise architecture pretenses are maintained, but it takes substantial resources to run) - yet another development service on Sun Grid.
  7. Component ecosystem - free/fee componentry from base services, to service aggregates - creates “market” for components and business processes that are orchestration models through a component “vending machine”.
  8. Tooling for business developers including “wiring” tools to orchestrate business processes across an Adaptive SOA
  9. Tooling for workload analysis to allow for “best” provisioned resources: splitting job flow across pods of different types (e.g. Niagara(tm), Opteron(tm), SPARC) based upon “best executor,” interconnected by a high-speed mesh (NUMA backplane). Check out Andy Ingram's work on workload analysis.

IMO, we just cannot get away from the fact that outsourcing contracts and providers will typically invest in the lowest cost of service that meets an SLA (whether the SLA is well written by the customer or not, because tradeoffs cannot be made accurately a priori; see JPMC and my response: “yeah right”). The benefits of outsourcing are typically a financial contribution in year 1, with downstream years less lucrative for the business; after all, someone does have to pay. The benefits of Sun Grid typically start in the out months/year due to start-up/Non-Recurring Engineering costs, but downstream they can range from substantial to small depending on the customer's choice to re-invest the dividends and the cost of their IT today. Outsourcing seems to be a CAPEX/OPEX driver, but Sun Grid can also drive revenue opportunities because of the open-market and community nature of the ecosystem we need to construct. Will there be a market for highly tuned financial instruments that an individual or company can use to offset development/support costs? Can you do that with outsourcing? Here's to cheap, with choice and control!

In short:

Outsourcing.isVeryDifferent(new SunGrid()) == true;

Monday Apr 18, 2005

Top Line Benefit for Utility Computing

In its most basic form, Utility Computing is yet another cost-cutting measure helping IT reduce its cost of service, both current and future, at today's scale and tomorrow's. But, quite honestly, there's a top-line benefit to go along with the bottom-line benefit... the top line is represented by a utility community... a co-op, in typical energy terms, where a variety of participants can “add value” to the collective, and each is enriched by the value of the collective whole.

I'm really excited about the opportunity for Sun Grid to establish a network where components representing Intellectual Capital can be established, to finally create a “capital market” for software.


Think about it: what makes a capital market?

  1. There are instruments that have inherent and stable value
  2. There is the enduring ability to exchange value for value across instruments
  3. There is a permanent marketplace / exchange that is a designated place where people can go to exchange value.
  4. There is sufficient fluidity in the market to allow the market to be established and maintained

(attrib: hal stern)

Recently (well, in March), we established our intent to make Sun Grid utility compute and storage cycles instruments for exchange with our partner Archipelago Holdings, and this is massively exciting because it breathes transparency, just like any other commodity market.

So back to Sun Grid: we were talking about Intellectual Capital, represented in components, stored in Sun Grid's asset manager, and properly meta-tagged for use by developers. When these components finally become commonplace, I'd expect the best components to float to the top and make a lot of $, and the worst to sit gathering dust in the asset repository until they are aged out of the system. I can even see competitive exchanges for components at development time (finally, compete on implementation vs. integration), and eventually we can move components upstream to include entire business micro-processes: things that vertical markets may need to comply with specific state/local/federal/international laws, but for which the market was too small for a “software company”; with a fully automated exchange for value, these processes can afford to be maintained. (I'll elaborate on this more later, because I think it's potentially the most exciting paradigm shift enabled by the Sun Grid.)




