Saturday Jul 22, 2006

The Long Tail of Web Applications

Listening to National Public Radio (NPR) the other night, I was intrigued by an author, Chris Anderson, who coined the term “the long tail” of sales in his book The Long Tail: Why the Future of Business is Selling Less of More. Chris used an online by-the-song music retailer as his example, and quoted something like 95% of the eTailer's inventory having sold at least one copy, and something like 85% of the revenue coming from the bottom 85% of the inventory... this means that the OLD 80:20 rule (80% of revenue coming from the top 20% of products) may now be proven wrong by the channel dis-intermediation offered by the Internet.

Translating this into something that I'm passionate about, Utility Computing, this may mean that the revenue that utility providers should be chasing might not come from the “blockbuster hits” (the applications that people are funding and building today), but rather from the 80% of applications not yet invented, because the communities that they would benefit are so small that today's assembly techniques prevent their development (from a revenue/margin justified business position). Today, because of Utility Computing providers (pay-as-you-drink), productivity enhancing languages like Ruby, and dynamic application assembly techniques (“mashup” seems to be the word en vogue), perhaps UC should be banking on some of the enabling technologies to broaden the field of available applications... something unique for everyone vs. one size fits all.

Tuesday Jun 13, 2006

Real Money Here!

As many have seen, Sun is offering up a $50k bounty for the coolest application.

I'm starting to get some really good questions, which leads me to believe that there are some mighty cool things being done. If you are writing a cool-app, and need some help, please do query the SunGrid Developer Community. I'll hide my email from the blog, but if you are interested in contacting me directly, it's

Have some fun, build something that simulates protein folding, airfoil shaping, analog circuit routing, 3d rendering ... there are lots of cool things that I think can leverage all this compute power. Good Luck!

Thursday May 25, 2006

The Sun Grid $50k Contest & the Doubter

While I appreciate Greg Nawrocki's perspective on the challenges associated with growing a community of users for Sun Grid, and more specifically the Cool Apps Challenge, and respect his viewpoint, I do believe that there are a couple of things that Greg needs to remember.

  1. Grid computing has been around for a very long time, mostly in the academic spaces, but also in enterprises.
  2. The Academic / Scientific endeavors tend to frequently push the bloody edge of technology, which means that they are constantly battling for more “resources” to solve their mathematical/relational problems.... Chips are “never fast enough”, “memory is never fast enough or big enough”, “storage...” you get the picture.
  3. Enterprises are dealing with substantially more data than they ever have, and the analysis of that data - turning data into an actionable thesis is seen even more strongly as a competitive imperative.... this is taking enterprises into the theoretical realms once reserved for science.
  4. At the most basic level, Grids are nothing more than a collection of resources that can be combined to produce a variety of systems. Though Grids today address the more data-crunching intensive workloads - some embarrassingly parallel - there is nothing preventing a grid of resources from being used to compose an enterprise system substrate. In fact, Sun's N1 initiative, Platform's Symphony, Cassatt Collage, DataSynapse, Paremus, GigaSpaces and others share this vision of composing systems from common components (should sound familiar to the SOA crowd).

All in all, Greg is right: having a competition is a mechanism to increase awareness of the Grid's capabilities and applicability, but I do have to ask “why not?” If the killer application for the grid emerges, great; if it doesn't, it has still increased the visibility of the value of utility computing and the impact of a mesh of consistent resources. Are developer contests right for the enterprise? Hey, who knows. Because Southwest Airlines emails out a “Cities Special”, do you refuse to fly? And what about Google's Summer of Code, or the Intel Game Demo Contest? Let's face it, competitions are another form of promotion, and an effective brand and community building technique aimed at the all important developer community - independent or enterprise (most of us do both anyways).

Most enterprises share a set of challenges. We have already seen the impact of platform consolidation on both acquisition costs and operations costs, and now we're just beginning to address the impact of virtualization on data center utilization (raises it) and operational factors (makes it more complex to manage). Virtualization impacts every layer in an enterprise architecture, and applications need to be better able to deal with horizontal scale/distributed workflows, and the impact of accumulated MTBF. I talk frequently about Deutsch's fallacies; distributed partial failure needs to be addressed explicitly by an architecture, and increasing awareness in this area is important for the entire IT industry. Sun is actively building a community of partners and customers who, in a like-minded way, want to solve for increased operational / capital efficiency, increased scale, and increased agility - difficult problems, but things that only a large community of constituents can solve.

So Greg, we had to start somewhere! If you know of a better way to get to our end goal of agile-distributed-component based-failure resistant-capacity managed-systems, I'd love to hear it!

Friday May 12, 2006

More on the “Google-Mart” proposition

A number of readers have contacted me to discuss my last blog on the emergence of Microsoft and Google not just as online/marketing entities and content service providers, but as mega-ASPs that are well funded/positioned to dis-intermediate a swath of independent service providers. My initial thinking in this area came from Robert X. Cringely's blog back in November, 2005. In this article, Cringely cites ample evidence of this strategy along the following axes:

  • Google is buying up lots of Dark Fiber (note that bandwidth is one of the critical factors in a successful application services platform)
  • Google's “shipping container” in Mountain View which is said to contain “5000 Opteron processors and 3.5 petabytes of disk storage that can be dropped-off overnight by a tractor-trailer rig”
  • Google is building data-centers in places where there is cheap power and excellent “rights of way”
  • The re-introduction of the Google Accelerator

So I'm not saying that we should support/react to Cringely's conspiracy theory, but we have to look seriously at “someone” doing this, because the opportunity is just too lucrative not to explore. All of a sudden, small self contained data centers start cropping up in macro-level packaged solutions, located at/near power sub-stations which are also frequently co-located with networking rights of way... and Google prepares to deliver its services very, very close to the consumers... and that proximity can lend tremendous contextual benefit to the service mix (demographically aligned) as well as doing it at prices that traditional competitors cannot touch.

To me, the exceedingly interesting Computer Science aspect of this work is the establishment of a new layer within the traditional IT infrastructure - the Network Operating System, which, like failed attempts at NUMA systems, may finally hold the reliability, serviceability and scalability benefits that could allow one to run an orchestrated service on a “grid” of computers, just as they used to run in “user space” on a single computer - the promise of Java (for homogeneity) and SOA (for dynamic orchestration and late binding), realized.

Sunday May 07, 2006

The True Value of Information Ownership

An article, published on 5/5 at seems to be a harbinger of something either interesting or sinister happening in the Internet/2 realm. Here we have Google, which makes the majority (like 98+%) of their revenue through advertising, moving to take on a large role in information meta-indexing and even data management. And then we have Microsoft, with 95% of their revenue coming from the sale of “old-economy” shrink wrapped software, also moving to use data management & storage as a mechanism to build stickiness into their service economy (and I might add to execute on a defensive strategy against Google and the Free/Open-Software movements).

If you look at the cost of operating a data-center, they tend to get organized into a couple of categories:

  1. Facilities expenses: dominated by power and bandwidth
  2. Operational expense: dominated by labor
  3. Capital expense: increasingly dominated by data management systems but also by computers
  4. Software expense (sometimes capitalized): dominated again by, in typical order, data management systems, vertical integration software, and infrastructure middleware

There are different strategies employed by different kinds of hosting, hardware, and infrastructure software vendors to minimize these costs, either within a category or by trying to integrate capabilities and therefore use systemic engineering approaches to minimize total cost. If one looks at Google, for example, they run their own data centers, which allows them to maximize the interaction between compute/storage density and the facility costs (they control all the variables in 1/2/3 above - the typical IT charges), as well as rolling their own OS image, middleware software, and even File Systems (GFS) to manage 2/3/4. Microsoft, on the other hand, with their core business in engineering hyper-integration of 4 to reduce costs, needs to understand the impact of doing 1,2,3 on their existing channel, and determine whether they can afford the focussed, feature reduced / reliability enhanced engineering around their integrated software stack needed to achieve 1,2,3.

What is interesting to me is the yet undefined intersection of business data stores, and the online shared data sets (see Microsoft's Map the world in real-time initiative). This is to say the information economy is built around the premise that those who can have the best information (or process information the most accurately and efficiently) will have more value than other category players. So if we start having these data centers (note the term data centers is very accurate here), built through advertising $'s, at what point do the “owners” of this data, or potentially more importantly the data's relationships, begin to control a macro-economy, able to exploit this ownership to incredible, even monopolistic possibility?

I have to say, and it should come as no surprise, that the notion of utility data centers which aggregate demand against a shared physical plant does offer the possibility of democratized computing. Utility Data Centers effectively level the playing field between those who have capital and those that don't. They create an open marketplace in which the efficient use of assets, and the “value” of the services that one develops, enable a small player to expose their innovation and compete successfully against the large players. With the recent moves by Microsoft and Google, one has to wonder if this is yet another emerging market where people fail to recognize the potential for monopoly abuse until it's too late, until the critical data needed by other “information economy players” becomes owned by a select few, and access to this data becomes too costly to compete.

Wednesday Mar 22, 2006

Part I: SunGrid “The Beginning”

Albert Einstein said “The significant problems we have cannot be solved at the same level of thinking with which we created them.”

The Sun Grid projects started in earnest more than 2 years ago under pretty different pretenses. Sun's Sales organization was struggling with the high levels of efforts and residual expenses associated with letting Scott McNealy visit with a customer. He couldn't help but tell them about this Ultra-Thin Client device which could save on power, operational expense, and even allow for hot-desking to reduce real-estate burdens - in fact Sun saved more than $27M in the first year of the program alone. And so Scott would go on to promise each CxO level executive a SunRay that Sun would come and install for them as a trial.

The Technical Sales Organization (TSO) first set up a blueprint and demo kits, but we weren't as fast as Scott's plane. Someone (Brian F.) had the fantastic idea to put a Desktop “Service” on the Internet, and now we could just parachute a SunRay to each of these executives, and they just plug it into their home broadband router, and they are running... coined “CxOnet”, the plant has been running for about 2.5 years (thanks Brian), and grew up to become Project Alamo, and now “Sun Grid”.

I believe that this was a precipitating event in realizing Scott McNealy's long term vision of the Big Friggin Webtone Switch as well as Greg Papadopoulos' predictions around the demise of shrink-wrapped software, and the emergence of next generation utility data centers, and of course to Jonathan Schwartz who acted as a catalyst in moving towards subscription/utility software models. (Obviously a lot of credit has to be given to the Networking Infrastructure Providers who, during the great build out precipitated by the DotCom boom laid a tremendous amount of Bandwidth around the country, and brought substantially more reliable/redundant networks to every business and most every home.)

This new value proposition, of not having to “run” your own data center, is obviously still in its adolescence, but I think that the very patterns for multi-tenant secured computing across a computing mesh - not unlike the power grid - offer scale, reliability, consistency, and affordability that will be difficult to match. Multi-tenant isolation has been the critical tenet of security design, and many will herald this as a return to mainframe computing. Let's just remember that mainframes had to be multi-tenant because of the cost of the physical plant; they had multi-tenancy engineered into the very Operating System (as does OpenSolaris today, I might mention). In today's horizontal compute fabrics, the very networks that connect computers lay challenge to this tenet, and the Sun Grid architecture is designed to enable a network operating system of similar control, and yet the economics of running it are rarely brought into question... The question being, what is the cost of low utilization (a lack of true multi-tenant virtualization) on your operating expense, and potentially on your needs for complex compliance legislation.

Sun Grid is not perfect, it never will be, but I believe that this is a monumental step towards more affordable computing for end-users and corporations alike.

I would like to take this time to thank the whole Sun Grid Team who has worked around the clock for months as we have been transiting the final hurdles towards release, and the many experts throughout Sun who have provided invaluable insight and assistance towards this shared goal! Thank You All!

Sunday Mar 19, 2006

Running Jobs on Sun Grid that require “Service Containers”

Sun Grid's resource management semantics basically dictate that jobs be self-contained, and terminate all processes in order to exit. The problem with terminating processes in a grid context is that it's not quite as simple as doing a PID trap on a single host; instead, you need to use the qsub, qstat and qdel commands to better manage your distributed jobs.

The example pattern that I'd like to elaborate is one of a “server/framework” which needs to run in order to support a client. Whether a simple RMID, or a more complex instance of a web server, app server or JavaSpace, the pattern is very similar. The developer wants to:

  1. Start up one or more servers (in our case 2, the httpd and the GigaSpaces Enterprise Server)
  2. Make sure that the servers are running
  3. Submit the client and wait for the client to complete
  4. Shutdown the Servers so that the Sun Grid Job can terminate and stop the meter

First some basic syntax:

  • #$ = new directives for SGE which do things like populate environment variables (-V)
  • qsub = submit this task to the grid for scheduling.. we use a couple of options:
  • “-sync n” fire and forget... don't wait for the job to complete
  • “-N <jobname>” not required but could be used for parsing qstat... unfortunately qdel requires a jobid instead of a job name (to keep you from shutting down similarly named jobs)
  • “-t 1” or “-t 1-4:1” submit a job to one or multiple nodes with a minimum
  • qstat = get the status of the SGE queue, which in the case of Sun Grid will only return the jobs that you own for privacy purposes
  • “-s r” only return the “running” jobs... jobs that are waiting (status=“qw”) are excluded
  • qdel = delete / stop the specified jobs

Now onto the listing:

#! /bin/bash
#$ -V

# if we are running against an older version of SGE, the "#$ -V" directive
# will not exist, so be sure that we source the SGETOOLS (or at least try to)
if [[ ${SGETOOLS:-"unset"} = "unset" ]]; then
    echo "setting SGETOOLS"
fi

echo "Starting the GigaSpaces Servers"
GSC=`qsub -sync n -N gsee-gsc -v GSEE_HOME=$GSEE_HOME -v GRID_HOME=$GRID_HOME -t 1-4:1 $GRID_HOME/bin/gsc`
GSM=`qsub -sync n -N gsee-gsm -v GSEE_HOME=$GSEE_HOME -v GRID_HOME=$GRID_HOME -t 1 $GRID_HOME/bin/gsm $GRID_HOME/config/overrides/gsm-override.xml`
echo "${GSC}"
echo "${GSM}"

# SGE job return syntax is XXXX.X-X:X where $JobID.$requested_min-$max:$Actual_min
# so trim out just the first XXXX which is a regex matched from the 3rd field
MATCH="\(.*\) \(.*\) \([0-9]*\)\.\([0-9]*\)-\([0-9]*\):\([0-9]*\)" #simple match for multi-node job
MATCH2="\(.*\) \(.*\) \([0-9]*\) \(.*\)" #simple match for simple 1 node job

# sed leaves the rest of the line after the job id, so capture the words into
# an array; $GSCparsed then expands to the first element, the job id
GSCparsed=( `echo "$GSC" | sed -n -e "s/${MATCH}/\3/p"` )
if [[ ${GSCparsed:-"unset"} = "unset" ]]; then
    GSCparsed=( `echo "$GSC" | sed -n -e "s/${MATCH2}/\3/p"` )
fi

GSMparsed=( `echo "$GSM" | sed -n -e "s/${MATCH}/\3/p"` )
if [[ ${GSMparsed:-"unset"} = "unset" ]]; then
    GSMparsed=( `echo "$GSM" | sed -n -e "s/${MATCH2}/\3/p"` )
fi
echo "Jobs $GSCparsed and $GSMparsed submitted"

# wait for these jobs to show up in qstat
until (( ${GSMstatus:-0} > 0 && ${GSCstatus:-0} > 0 )); do
    #evaluate the qstat -s r response (running jobs) to make sure that the
    #requisite jobs are running
    GSCstatus=$(qstat -s r | nawk '/'${GSCparsed}'/{var1+=1} END {print var1}')
    GSMstatus=$(qstat -s r | nawk '/'${GSMparsed}'/{var1+=1} END {print var1}')
    echo "GSCstatus = $GSCstatus"
    echo "GSMstatus = $GSMstatus"
    echo "Server status is $(qstat -s r)"
    sleep 10
done

#run our application - in this case, use multiple nodes to help us calculate prime factor
echo "crunching"
~/ $1
echo "done"
#clean up
#stop the server jobs using the jobid's parsed out of GSM and GSC
qdel $GSMparsed $GSCparsed
#go ahead and print out the queue status on the way out to verify cleanup (optional)
sleep 10
echo "Leaving..."
qstat

Hopefully, this example sheds some light on some of the mechanisms that a developer might enlist in order to launch more complex, server dependent applications against the Sun Grid. Please let me know if I need to elaborate further. I want to take this opportunity to recognize GigaSpaces, and specifically Dennis Reedy for his help in putting together a grid job which could flex a couple of nodes against their GigaSpaces Enterprise Server 5.0 environment. I'd also like to thank Bill Meine and Fay Salwen for their scripting assistance.
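As a supplement to the listing above, the job-id parsing can also be sketched with bash's built-in regex matching instead of sed. This is only a sketch: `parse_job_id` is a hypothetical helper, and it assumes qsub's confirmation lines look like the two sample messages in the comments, which is what the MATCH and MATCH2 patterns in the listing appear to target.

```shell
#!/bin/bash
# parse_job_id <line>: print the numeric job id from a qsub confirmation line.
# Assumed message shapes (an assumption, not taken from the SGE docs):
#   Your job 5678 ("gsee-gsm") has been submitted
#   Your job-array 1234.1-4:1 ("gsee-gsc") has been submitted
# Returns non-zero if the line does not look like either shape.
parse_job_id() {
    local line="$1"
    local re='^Your job(-array)? ([0-9]+)'
    if [[ $line =~ $re ]]; then
        # Group 1 is the optional "-array" suffix, group 2 is the job id.
        echo "${BASH_REMATCH[2]}"
    else
        return 1
    fi
}

parse_job_id 'Your job-array 1234.1-4:1 ("gsee-gsc") has been submitted'   # prints 1234
parse_job_id 'Your job 5678 ("gsee-gsm") has been submitted'               # prints 5678
```

Because one pattern covers both the array and the single-job form, there is no need for a second fallback match, and a non-zero return makes an unexpected qsub response easy to detect before calling qdel.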


Wednesday Feb 15, 2006

Mash-Ups and Dynamically Provisioned Services

As I have been watching all of the discussions about mash-ups, I have been wondering if traditional integration mechanisms employed by the developer community are really well suited for this new environment in which services (yours and others) are embraced, combined and extended in order to deliver some new aggregate value proposition.

I really view mash-ups as a way to take someone's intellectual property and extend it to address some new use-case for which the original designer may or may not have designed. This extension creates numerous problems that include license (approved use) - something that I'll allow the attorneys in the audience to argue about. What is interesting to me is the context under which the component in question was designed to be used, and mechanisms to elaborate that context to help the mash-up developer understand the critical “ilities” - reliascalavailaserviceability - of using their application in a “production” environment.

As I looked for analogous problem domains, I happened upon the Integrated Circuit industry - an industry that has been moving from discrete semiconductors to integrated circuits to Systems-on-a-Chip (SoC). The proliferation of specialized “cores” (provided by IP backed designs), and the recognition that customers are looking for single chip solutions for cost, space and power reasons, has evolved a set of processes and tools to combine these cores into a single system - and in doing so has forced substantial changes in the Electronic Design Automation (EDA) field.

I hypothesize that a similar revolution will unfold for the “service oriented” world that is being espoused by just about every rag, and in most every IT shop. Taking cues from EDA, it might look something like this:


Some components will need to be developed, allowing the expression of business processes and critical IC in a programmatic language. However, it is anticipated that as components are added to an Internet enabled / distributed registry (think about the catalogs that we used to receive from IC vendors), developers will become focussed on assembly to enable business processes, and proven business process patterns will become building blocks at the next higher level.

Tooling/Paradigm: Specific business service assembly tools + Business Process Modelling to provide both component development and component assembly at the functional layer, with extensions to the domain model specific to the component abstraction that allow for systemic constraints to be suitably defined. The analog in chip design seems to be VHDL/Verilog & SystemC.

Once a workflow/business process has been composed, the developer needs to be able to verify that it behaves as intended (before worrying about the systemic constraints). The output of the verify step should be a constraining graph based upon what we know about new and existing components to allow the system to plan for the process deployment. Over time, as component sub-systems emerge, they will be pre-verified (eBay model).

Tooling/Paradigm: there needs to be a set of tools / processes that can be run to ensure that interfaces are appropriately wired, and that test cases are executed to ensure the appropriate functional result is achieved. The analog in chip design is Functional Verification.

Each of the components has systemic constraints; the system now needs to leverage rules/policies to determine the overarching constraints which best characterize the defined model. In this way the system can begin to understand how things like transaction performance (viewed as latency) and high availability (viewed as uptime) can be elaborated, and tradeoff decisions made with the help of the developer (cost / time).

Tooling/Paradigm: once the system is functionally defined, the constraints need to be organized to ensure performance, and discrepancies resolved. This results in a systemic design in which no constraints remain “at odds” with one another. The analog in chip design is Design Synthesis.

Now that the constraints are fully understood, we can begin to group components and map them against known capabilities of the infrastructure, selecting appropriate provisioning and operational policies/rules and bringing those component based plans together into a federated construct that can be used by the observability and management systems to deploy the system.

Once the plan has been developed, it can be delivered to an executor. There should be (at a minimum) 2 execution types: try it, do it. Try it should allow the interfaces to be exercised so that the plan can be validated/verified against a “production like” environment, at which point the plan can be certified to run at scale. This 2 step process is critical as it will help us maintain control of rogue applications (unintentional) that may not behave well. Furthermore, execution includes mandatory monitoring/auditability that can enable an operator to better re-plan over time for better Service Level performance at lower cost.

Thanks for reading, as I stated above, this is just elaborating an analogy, whether it proves valuable in SOA is yet to be seen, but I'd love your comments.

Friday Nov 18, 2005

The Sun Grid Environment & the value to the developer eco-system.

<cross post from>

Sun Grid looks like a traditional IT stack, exposing common interfaces to enable developers and ISV's to target different abstraction layers:

Business Opportunities

The core environment is made up of a “Resource Factory” (RF), a production plant that is optimized to produce power at appropriately consumable chunks (balancing the economics of operations/distribution against typically demanded performance units). In this model the resources are managed by a Distributed Resource Manager and similarly metered/monitored using commercial DRM's and OSS/J technologies through a Service Data Record (a superset of the traditional telco model of a CDR or Call Data Record). Over time, Sun Grid will allow for multiple/pluggable Distributed Resource Managers through the support of a super scheduler which will allow a variety of resource management models to be deployed to support the various container/containment strategies for managed workloads.

Where the RF and its blueprints are probably interesting to Data Center operators/managers who themselves are embarking on virtualization and distributed resource control/management projects, the developers will probably not have a lot of use for this abstraction. IMO, what developers are really looking for is a set of runtime containers that are as thick/thin, complete or pluggable as their applications demand. This means that we need a container model that allows for specific orchestration.

For example, in a traditional 3-tiered (I know, nothing is traditional anymore) architecture, the separation of concern drives us to the isolation of the MVC across multiple tiers.... this means that we need a content delivery/presentation container, and a business logic container which is suitably aligned with its data management containers/services. How do we describe these relationships in both functional and non-functional ways? Right now this is more art than science, which is to say that there are a number of different mechanisms, each with specific value propositions, and each with their own warts. Sun has the N1 Service Provisioning System (SPS), we also have DCML (an OASIS activity), WS-Management, and even techniques from the GGF which are trying to expose enterprise services using declarative markups.

This blog is getting a bit long, so I'll follow up in a second, but let me not leave without talking about the REAL value proposition for developers (the higher layers), here are some of the services that I'm thinking about:

  • a repository for open source components and libraries allowing for both forward, and regression testing that developers can use, without distribution licensing concerns (caveat emptor) to build, test, re-factor, test new version... their service / application
  • a repository, a vending machine really, for commercial components that can be used in a metered/rated way (see OSS/J note above) by other developers to shorten time to deployment, improve competitive comparison, open up new opportunities and markets, and allow components to compete in an open capital market/exchange (is my component really $xx better per use than yours... if it is then I'm incented to keep developing).
  • a service facade for public/private hosted services that build on the economics of the Sun Grid and the pay-as-you-use model, as well as providing improved integration through close proximity to other's services (proximity remains very important in reducing latency in distributed service based architectures).
  • common services, including things like federated identity and common entitlement policies that allow these ecosystems to emerge.
  • a piloting ground for commercial software... instead of shipping a disk, why not take a SaaS approach where a demonstration entitlement is granted and instantly provisioned... how much could we reduce the cost of software sale.
  • and finally, but not least, produce a mechanism, like the component/service mechanisms above to allow for business processes to be patterned, shared, extended and sold to perform regulatory or other tasks. Things that haven't yet emerged in the open communities, but are certain to once suitable service oriented substrates are available.

I hope that I've been able to elucidate some of the value that a common utility could provide, and facilitate through the Java.NET community as we move forward on this journey.

Please join us in the Sun Grid Community, sign up a new project, get some free grid time, and let's move this vision forward.

Tuesday Nov 15, 2005

So you have a new project & don't want to “buy” more computers?

Depending on who you talk to, computing is a competitive weapon or a cost of doing business, either way it's an investment. One question that each and every new project has as it moves from the business plan to implementation is the often excessive cost and time to put into place the infrastructure so that you can finally realize that killer application.

With the advent of utility computing, and before that, outsourcing, businesses gained the ability to shift the costs from capital costs to expenses, and in doing so have improved their capital portfolio and cash flow positions. Now you have a new initiative, and though you have models/projections wrt. the data transfer, loading, processing and storage rates, you are probably still not quite certain of how this should be architected. One potential BluePrint comes from Sun in its Sun Grid Rack System and Sun Grid Utility Services approach. This blueprint gives you the ability to take industry standard x64 servers or the novel SPARC CoolThreads (Niagara T1) based systems to grow your computing plant just in time, in small incremental grains with incremental cost (the value engineering done by the core utility team).

Take advantage of package density, operational automation, systemic monitoring and metering, and workload scheduling advancements that are currently under investigation by the Sun Grid team as your solution matures. Furthermore, this puts you under initial control of your near term deliverables (your hw, your sw, onsite) and aligns with options for fiscal improvements/flexibility in the longer term. If your initiative is radically successful to the business, you look to “join” Sun Grid, allowing your work to be distributed onto the Sun Grid mesh of data centers in addition to your own; this gives time to market and peak scale, as well as geographic diversity for high availability.

How do we get started?

  1. Let's talk about the blueprinting: will your application fit this design? Horizontal computing services differ from vertically integrated services in their treatment of memory (the largest issue); specifically, the unified memory architecture of SMP machines is a critical facilitator for some large scale transactional systems.
  2. If we can go horizontal, which most if not all new (green field) applications can, then let's think about the data flow and look for the points of constriction: areas where we become hardware limited because of disk, network and CPU speed or contention.
  3. With these “critical to scale points” in mind, let's determine a scalability strategy to help us parse this load coherently and with availability. Many of today's applications are well suited to pipelined approaches, as in the image here.
  4. Determine the core services that need to be shared, and how these core services can be federated to benefit both your company and your partners; federation is about the sharing of responsibility and control.
  5. Go back to 2, and continue to refactor until scalability can be addressed with some fudge factor that lends the software developers some flexibility in approach.
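
The pipelined approach mentioned in step 3 can be sketched in a few lines. This is a minimal, single-machine illustration and not Sun Grid code: the names `stage` and `run_pipeline`, the queue size and the two example workers are all assumptions made up for the example. Each stage runs in its own thread and hands work to the next through a bounded queue, so a slow stage fills its inbox and shows up as a point of constriction.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the stream


def stage(worker, inbox, outbox):
    """Apply `worker` to every item from `inbox` and forward results to `outbox`."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(worker(item))


def run_pipeline(items, workers, queue_size=4):
    """Stream `items` through `workers` in order; results keep the input order."""
    # One bounded queue in front of each stage, plus one for the final output.
    queues = [queue.Queue(maxsize=queue_size) for _ in range(len(workers) + 1)]

    def feed():
        for item in items:
            queues[0].put(item)  # blocks while the first stage is behind
        queues[0].put(_DONE)

    threads = [threading.Thread(target=feed)]
    threads += [
        threading.Thread(target=stage, args=(w, queues[i], queues[i + 1]))
        for i, w in enumerate(workers)
    ]
    for t in threads:
        t.start()

    # Drain the last queue while the stages run, so nothing backs up forever.
    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Hypothetical two-stage flow: a "load" step followed by a "process" step.
    print(run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 10]))
```

Scaling a constricted stage then becomes a local decision (more threads, more hosts) that the neighbouring stages do not need to know about, which is the point of step 5's refactoring loop.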

We are ready when you are, with a set of System Integrator Partners, ISVs, a Client Solutions team, and a very active Open Community to help you take advantage of these emerging models, simplifying your Data Center and changing the economics of corporate and research computing.


Tuesday Nov 01, 2005

MVAPICH for Solaris Released

We have seen numerous press releases on Message Passing Interface (MPI) implementations lately, including those from Microsoft, which has been working with Argonne Labs on MPICH2 (funding a Win32 port), and this most recent announcement of Ohio State University's port of MVAPICH to Solaris across Infiniband.

Sun has been collaborating with OSU for a long time, working with Linux and Solaris on both SPARC and x64 based platforms. The current announcement from OSU is a novel MPI-2 based design (at the ADI-3 level) providing uDAPL-Solaris support. So what is this acronym soup?

Infiniband: a high performance switched fabric providing high bandwidth (in excess of 30Gbps) and low latency (can be lower than 5µs) for serial (channel based) I/O between two host channel adapters (HCAs, which are available at costs < $70). This fabric uses an I/O and communications processor that is separate from the traditional node CPU, which allows I/O to scale independently and I/O responsibilities to be offloaded, and so permits performance and cost tuning of computing clusters. Typical per port costs are in the $300 range (HCA & TCA) vs. >$1k for 10GbE adapters, so performance at cost is definitely in IB's favor for the highest of performance needs.

Message Passing Interface (MPI): established in 1994 to provide a standard set of message passing routines that focus on both performance and portability, recognizing that these goals are often at odds with one another. MPI-2, finalized in 1997, was designed to address areas where the MPI Forum was initially unable to reach consensus, like one-sided communications & file I/O. Basically, MPI-2's one-sided model makes use of GETting or PUTting (or ACCUMULATE-ing) data from/to a remote window that reflects a shared memory space, in non-blocking ways for parallelized performance (an older, but still relevant, tutorial is available from the University of Edinburgh).
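
To make the window idea concrete, here is a toy, single-process model of put/get/accumulate semantics. It is emphatically not MPI: the `Window` class and `demo` function are hypothetical, threads stand in for ranks, and a lock stands in for the synchronization a real MPI-2 program would get from calls such as `MPI_Win_fence`. In real code the corresponding operations are `MPI_Put`, `MPI_Get` and `MPI_Accumulate` on an `MPI_Win`.

```python
import operator
import threading


class Window:
    """Toy model of a memory window that one process exposes to its peers."""

    def __init__(self, size):
        self._cells = [0] * size
        self._lock = threading.Lock()  # keeps each operation atomic in this model

    def put(self, offset, values):
        """Overwrite cells starting at `offset` with `values`."""
        values = list(values)
        with self._lock:
            self._cells[offset:offset + len(values)] = values

    def get(self, offset, count):
        """Return a copy of `count` cells starting at `offset`."""
        with self._lock:
            return list(self._cells[offset:offset + count])

    def accumulate(self, offset, values, op=operator.add):
        """Combine `values` into the cells, element by element, using `op`."""
        with self._lock:
            for i, value in enumerate(values):
                self._cells[offset + i] = op(self._cells[offset + i], value)


def demo(ranks=4):
    """Each simulated rank adds [1, 2, 3] into the window; the owner posts no receive."""
    window = Window(3)
    workers = [
        threading.Thread(target=window.accumulate, args=(0, [1, 2, 3]))
        for _ in range(ranks)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()  # stands in for the synchronization that closes an access epoch
    return window.get(0, 3)


if __name__ == "__main__":
    print(demo())
```

The thing to notice is what is missing: the owner of the window never calls a matching receive. The origin side alone decides when data moves, which is what distinguishes one-sided communication from classic send/receive pairs.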

User Direct Access Programming Library (uDAPL): there has been a need to standardize a set of user-level APIs across a variety of RDMA capable transports such as InfiniBand (IB), VI and RDDP. The model is a familiar one to most infrastructure programmers: that of an interface provider (both local and remote) and an interface consumer that has visibility as to the localness of the provider. uDAPL is designed to be agnostic to the transport (à la IB), decoupling consumers (like MPI) from the intricacies of the underlying transport in a standardized way. Within this layer cake, it is expected that a uDAPL consumer will talk across a fabric to another uDAPL consumer; though this is not mandated, it is common practice.

MPICH & MVAPICH2: implementations of MPI provided by a variety of entities (mostly government agencies/labs and universities), which frequently compete on features and performance. MVAPICH2 has been focused on IB, whereas MPICH2 supports other interconnects including Quadrics and Myrinet; either way, the goal is to create a high performance consumer (programmer) interface that can sit on standard or customized interconnection stacks. Where MVAPICH2 tends to shine is in larger packets providing higher bandwidth (though at a cost to small packet latency). A reasonable comparison from OSU and Dr. Panda is here (though we have to remember Dr. Panda's sponsorship of MVAPICH).

So that was a short summary, but hopefully it whets your appetite for looking at architectures like Infiniband for constructing highly performant Grids/Clusters, and at some of the techniques that you might request from Sun Grid to accelerate your parallel applications.

BTW: Sun Grid has MPICH 1.2.6 pre-installed including Java wrappers, here is a sample deployment script:

Sunday Oct 23, 2005

jeri rocks

Jeri or “Jini Extensible Remote Invocation” has been in the back of my mind lately. My problem has been to find a way to allow for a richer interaction between a server side processing environment, and a set of client side executors. The challenge has been that traditional approaches dictate a tight coupling of a client side representation of a model and a server side implementation... with constant “polling” or firewall challenged notification.

As Sun Grid nears reality, the engineering team is focussed on enriching the experience, enabling Sun Grid to be:

  • a stand alone computing environment
  • an integrated services platform (but still fairly discrete in functionality), or
  • an extension of your data center.

The last opportunity provides the holy grail: a compute model that can dynamically flex between running services in your data center and running services in the extranet (Internet). In this case one must be concerned with the security model, the proximity to critical data/services, audit & debug, service level agreements, and so on. One model pushed by many is using the WS-* “standards” to provide the framework for interoperability, but I think that many have already learned how the programming model needs to change as the business loosely couples components, and the resultant impact on compute efficiency (latency & throughput).

Java, for a long time, has been looking at models that range across JRMP, IIOP, JMS and other protocol/communications stacks, but many Java developers keep coming back to RMI because of its ability to coherently operate systems made up of local and remote objects.

Jeri has been around for a couple of years (since 2002, I believe), dating from when a couple of critical JSRs were voted down by the JCP. At this point, the Jini team realized that there was but one way to strengthen the security model behind Jini if it weren't going to happen in the core JRE, and that was to create a new implementation of RMI on top of a novel protocol stack.

Jeri itself is a layered implementation made up of a marshaling layer, an invocation layer, and a transport layer. This allows for separation of concerns and the ability to replace individual layers so long as the contracts are maintained, thereby giving systems architects a tremendous amount of flexibility around specific implementation. A really good tutorial is available here, with a few examples that really help to make some of these key points.
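
The layering idea can be illustrated without any Jini at all. The sketch below is not the Jeri API (Jeri is Java, and every class name here is hypothetical); it only shows, under those assumptions, how a marshaling layer, an invocation layer and a transport layer can each be swapped as long as its small contract is kept.

```python
import json
import pickle


class JsonMarshaller:
    """Marshaling layer, variant 1: objects <-> UTF-8 JSON bytes."""

    def dumps(self, obj):
        return json.dumps(obj).encode("utf-8")

    def loads(self, data):
        return json.loads(data.decode("utf-8"))


class PickleMarshaller:
    """Marshaling layer, variant 2 (only safe between parties that trust each other)."""

    def dumps(self, obj):
        return pickle.dumps(obj)

    def loads(self, data):
        return pickle.loads(data)


class LoopbackTransport:
    """Transport layer: carries opaque bytes to a handler and returns the reply."""

    def __init__(self, handler):
        self._handler = handler

    def send(self, request):
        return self._handler(request)


class Dispatcher:
    """Server-side invocation layer: decodes a call and runs it on the target."""

    def __init__(self, target, marshaller):
        self._target = target
        self._m = marshaller

    def handle(self, request):
        call = self._m.loads(request)
        method = getattr(self._target, call["method"])
        return self._m.dumps(method(*call["args"]))


class Proxy:
    """Client-side invocation layer: turns attribute access into remote calls."""

    def __init__(self, transport, marshaller):
        self._transport = transport
        self._m = marshaller

    def __getattr__(self, name):
        def invoke(*args):
            request = self._m.dumps({"method": name, "args": list(args)})
            return self._m.loads(self._transport.send(request))
        return invoke


class Calculator:
    """Hypothetical service object used only for the demonstration."""

    def add(self, a, b):
        return a + b


def connect(target, marshaller):
    """Assemble the three layers around `target` and return a client proxy."""
    dispatcher = Dispatcher(target, marshaller)
    return Proxy(LoopbackTransport(dispatcher.handle), marshaller)


if __name__ == "__main__":
    for marshaller in (JsonMarshaller(), PickleMarshaller()):
        print(connect(Calculator(), marshaller).add(2, 3))
```

Replacing `LoopbackTransport` with a socket or HTTP based class would leave `Proxy`, `Dispatcher` and both marshallers untouched, which is the flexibility the paragraph above describes.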

Back to Sun Grid: we need to expose application interfaces that include a portal (for browser based clients) with channels provided by Sun Grid and others, and a WS environment including a registry, again for both Sun Grid and 3rd party interfaces. We will probably also need a more coherent set of application interfaces that bridge the POJO, Swing Framework and Jini domains, so that we can create the kinds of dynamic systems that are emerging based upon Plain Old Java Object (POJO) components running as services.

It should go without saying that we also need systemic interfaces (monitoring, management, debugging, metering, entitlement, ...), and though they will likely be different, I think that it's well within the realm of possibility to suggest that the aforementioned interface technologies can play a large role in their realization.


Wednesday Oct 12, 2005

Musings on SAAS

We need to evolve our thinking about why and how people purchase computers and software, as today's model is terribly inefficient:

  • software “as a service” (SaaS) can be a very powerful tool for growing ecosystems in which online services are consumed rather than “downloaded, licensed components”. Driving much of this is both the cost and complexity of distribution (for both buyer and seller), as well as the low utilization of the systems where the components are deployed.
  • today's software distribution model favors the largest of players, which is to say the players who can afford the high cost of sales and service under the buy it, get it, install it, use it model of distribution.
  • ecosystems will evolve which take the eBay or Amazon model of marketplace aggregation to services, but with software, unlike hard goods, the qualities of service become the critical differentiators.
  • we do not yet have either an appropriate meta language or a “Trader Service” (yes, from my CORBA days) which might allow services to be found and effectively bound in an on-demand fashion.
  • we also lack the repository which can manage these components and their entitlements in such a fashion as to enable true monetization on a “fair and equal” playing field... typically this is provided by an open marketplace using explicit symbology.
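
To give the “Trader Service” idea some shape, here is a minimal sketch of what on-demand find-and-bind might look like. It is an illustration under assumptions, not the CORBA trading specification or any existing marketplace API: the `Trader` class, the property names (`price`, `latency_ms`) and the example endpoints are all invented for the example.

```python
class Trader:
    """Toy trader: providers export offers, consumers query by type and qualities."""

    def __init__(self):
        self._offers = []

    def export(self, service_type, endpoint, **properties):
        """Advertise an offer together with its quality-of-service properties."""
        self._offers.append(
            {"type": service_type, "endpoint": endpoint, "properties": properties}
        )

    def query(self, service_type, constraint=None, preference=None):
        """Return endpoints of matching offers, best first.

        `constraint` is a predicate over an offer's properties; `preference`
        is a sort key over the same properties (smaller sorts first).
        """
        matches = [
            offer for offer in self._offers
            if offer["type"] == service_type
            and (constraint is None or constraint(offer["properties"]))
        ]
        if preference is not None:
            matches.sort(key=lambda offer: preference(offer["properties"]))
        return [offer["endpoint"] for offer in matches]


def build_demo_trader():
    """Populate a trader with three hypothetical rendering services."""
    trader = Trader()
    trader.export("render", "https://a.example/render", price=0.40, latency_ms=120)
    trader.export("render", "https://b.example/render", price=0.25, latency_ms=300)
    trader.export("render", "https://c.example/render", price=0.90, latency_ms=40)
    return trader


if __name__ == "__main__":
    # Cheapest-enough services first, ordered by responsiveness.
    print(build_demo_trader().query(
        "render",
        constraint=lambda p: p["price"] <= 0.5,
        preference=lambda p: p["latency_ms"],
    ))
```

Note that the consumer never names a vendor: it states a service type and the qualities of service it will accept, which is where those qualities become the critical differentiators.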

I haven't yet talked about next generation orchestration languages that could allow “processes” to be purchased as well (things that System Integrators typically look at as their critical IP, such as a pre-proven HIPAA or SEC process that a company could use to improve its own process compliance), but it's certainly a possibility once the component marketplace is established. In all, it's a very exciting new world that Public Utilities enable.

Overall, it's very exciting to watch Google, MSN, Amazon and Yahoo battle in this next frontier; after all, I call for a trader service and symbology, which IMO are their strong suits. The question will be which of these companies has the right tools for the communities of developers, and can provide the “grease” to make their marketplace the most valuable and therefore attractive.

Sunday Aug 21, 2005

Change or Die

I was recently lamenting the challenges associated with Utility Computing. Despite the tremendous benefits that are possible through a move toward a more virtualized development environment, and even through executing against a public utility where resources (and their costs) are amortized against an aggregate of shared use, we discussed the challenge of the inertia to change. This interesting article from Fast Company certainly seemed to ring true: Change or Die.

Specifically, the statement from Dr. John Kotter, HBS:

“The central issue is never strategy, structure, culture, or systems. The core of the matter is always about changing the behavior of people.”

The article goes on to detail that one shouldn't use the “fear of death” analogy, but rather the “joy of living”.

So, live joyously, give yourself more free time, use Sun Grid to free yourself from procurement hell, developer execution time hell, or even Hell with a Capital H (stop spending money on underutilized computing infrastructure).

Seriously, a team of developers has been working really hard all summer to launch our public grid instance; give it a try.

Tuesday Jun 14, 2005

OpenSolaris “More Open than Open”

Rob Gingell (a former Sun Fellow and Chief Engineer) once asked me to help put together a white paper on how Sun was “more open than open”, which is to say that there are so many companies out there touting Open Systems, Open Software and Open Standards whose motives weren't on the straight and narrow. I wish that I had had this entry.

“Today we have taken our crown jewels, our flagship piece of software, Solaris, and are releasing it to the Internet community.”

I'm most excited about “what's in there”. Specifically, today's consolidation is the core “OS/Networking”, what we're calling our “ON” gate release, which, as I have found from playing with it for a couple of weeks, delivers a very stable environment for most ISP-like functions and is the base for all of the other bundles/packages that “live on top”. And the ancillary developer tools: both Sun's Studio Tools and those from GNU.

Like Red Hat is to Fedora, OpenSolaris provides a community development pool, and Solaris becomes Sun's distribution. For the past months Sun has been seeding this community with pretty large tarballs (~ ) which could be manually built; today we're launching an interactive site and, as is typical of most development programs, the CVS tree will be a managed entity.

And since I know that there are going to be a lot of takers, sorry if the download is a little slow - opening day is always popular.

Have fun ;)



