Friday Jan 16, 2009

Cloud Computing: It's raining data... hallelujah!

The information technology industry has never seen the likes of the data tsunami, or more appropriately, the perpetual data hurricane that is raining down on us. Many of the cloud pundits talk about Infrastructure as a Service, Platform as a Service and Software as a Service, but very few discuss a critical aspect of the cloud: big data. Tim O'Reilly calls it the network effect in data, and Amazon recently gave its nod to big data by putting public sets of census, genome, economics and 3-D chemical data online. Google has been indexing public books ( Google Scholar ) for quite a while now.

So how big is this data, and how fast is it growing? I did some research - the results are astounding!

( For reference: a petabyte (PB) is 1000 terabytes (TB), where 1 TB = 1000 GB )

DATA SET                                        SIZE ( plus compound annual growth rate )

Wikipedia                                       10 GB ( 100% CAGR )
Merck Bio Research DB                           1.5 TB/quarter
Wal-Mart Transaction DB                         600 TB
UPMC Hospitals Imaging Data                     500 TB/year
Typical oil company data per oil field          350 TB
One day of Instant Messaging in 2002            750 GB
World Wide Web                                  1 PB
Internet Archive                                1 PB+
Terashake earthquake model of LA Basin          1 PB
MIT BabyTalk Speech Experiment                  1.5 PB
Estimated online RAM in Google                  8 PB+
Large Hadron Collider                           15 PB per run ( 300 Exabytes per year! )
Annual email traffic ( no spam )                300 PB
Personal digital photos                         1000 PB+ ( 100% CAGR )
Human Genomics                                  7000 PB ( 1 GB per person / 200 PB+ captured ) ( 200% CAGR )
Total digital data created in 2007 ( IDC )      281,000 PB ( 281 Exabytes ) with 10% CAGR

There are some interesting data points to dwell upon here:

a. The TOTAL size of the documents on the World Wide Web is only around 1 PB. Compare that to digital photography, which is 1,000 times the size of the WWW, or to human genomics, which at 7,000 PB is 7,000 times the size of the WWW.

b. Many of the large data sets are not created by the social web but by large institutions. High Performance Computing  involving audio/video analysis and simulations produces data sets that dwarf the size of others.

c. The IDC report quoted above estimates that by the year 2011 there will be 1,773 exabytes of digital data in the world! The report contains many jewels of information, one being that only 5% of this data is generated by the enterprise and only 35% emanates from workers overall ( from their workstations ). The rest of it is created by consumers themselves or by workers in enterprises capturing personal information for their customers. In fact, if you evenly divide the data by the world population, each person is assumed to have about 45 GB of data. I know I have probably created much more than that over the last year.
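The two IDC totals quoted in this post ( 281 exabytes created in 2007 and 1,773 exabytes by 2011 ) make for a quick back-of-the-envelope check. The sketch below is only that: the function names are made up for illustration, and the world population of 6.6 billion is my assumption, not a number from the report. It suggests the totals imply growth of roughly 58% per year, well above the 10% CAGR listed in the table, so one of those figures may be misquoted. The per-person share comes out close to the 45 GB mentioned above.

```python
# Figures quoted in the post (IDC): exabytes of digital data.
EB_2007 = 281.0
EB_2011 = 1773.0
YEARS = 2011 - 2007

# Assumption (not from the post): world population of about 6.6 billion in 2007.
WORLD_POPULATION_2007 = 6.6e9


def implied_cagr(start, end, years):
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1.0 / years) - 1.0


def per_capita_gb(total_eb, population):
    """Evenly divide `total_eb` exabytes across `population` people, in GB."""
    return total_eb * 1e9 / population  # 1 EB = 1e9 GB in decimal units


cagr = implied_cagr(EB_2007, EB_2011, YEARS)
share_gb = per_capita_gb(EB_2007, WORLD_POPULATION_2007)

print(f"Implied growth 2007-2011: {cagr:.1%} per year")
print(f"Per-person share of 2007 data: {share_gb:.1f} GB")
```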

This data points to some interesting trends -

1. The storage market will continue to grow by double digits over the next 5 years. Systems that are bigger, better, faster, more cost effective and easier to manage and operate will do better. ( Note: to store 280 exabytes of data, it will take 560,000 Sun Storage 7410 servers. )

2. The applications that can mine this data and expose meaningful information will become widely popular. The best data mining providers will want to offer services on top of these data sets. Data warehousing technologies and new distributed analytics models ( like Hadoop ) will thrive.

3. Security of this data will remain relevant. With tons of PII, privacy and security regulations ( think Sarbox, HIPAA, GLB etc ) will continue to force enterprises and guardians of this data to address security at all levels.
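The server count in point 1 is simple division, sketched below. The 500 TB per server is an assumption back-derived from the post's own numbers ( 280 exabytes across 560,000 machines ), not a quoted product specification, and `servers_needed` is a name invented for this example.

```python
import math

TB_PER_EB = 1_000_000  # decimal units: 1 EB = 1,000 PB = 1,000,000 TB

# Assumption: capacity per server implied by the post's figures, not a spec sheet value.
ASSUMED_CAPACITY_TB = 500


def servers_needed(total_eb, capacity_tb_per_server):
    """Number of servers needed to hold `total_eb` exabytes, rounded up."""
    total_tb = total_eb * TB_PER_EB
    return math.ceil(total_tb / capacity_tb_per_server)


count = servers_needed(280, ASSUMED_CAPACITY_TB)
print(f"Servers needed for 280 EB at {ASSUMED_CAPACITY_TB} TB each: {count:,}")
```

Halving the assumed capacity doubles the count, so the headline number is very sensitive to how each server is configured.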

I do see a bright future for companies that are in the data management, retention and safeguarding business, as well as for those companies that can corral this data hurricane and offer meaningful analysis and services on top of it.

Thoughts/ comments - please fire away!

Wednesday Dec 03, 2008

Cloud Computing: A $42B industry by 2012?

When industry pundits ( like IDC ) come out with a report that describes cloud computing as a $42B industry by 2012, you always take it with a grain of salt. As you know, some of these analysts ( not IDC though ) also predicted that the .com bubble would grow to a mega billion dollar industry. There are, however, some key takeaway points in this report that could make this number a realistic estimate.

Two interesting claims the report makes are:

  • The $42 billion number is a small percentage ( about 10% ) of the overall IT spend, which is at a whopping $490B. However, its year over year growth is 27%, as compared to 7% growth for IT spend.

  • As shown below, the Application Service provider market ( SaaS ) is more than half of this pie, at 57%. The rest is split between infrastructure and platform ( IaaS and PaaS ).
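The two claims above reduce to a couple of ratios. Here is a rough sketch using only the figures quoted from the report; the variable names are mine. Note that $42B over $490B works out closer to 8.6% than to a round 10%.

```python
# Figures as quoted from the IDC report in this post (billions of dollars).
CLOUD_MARKET_B = 42.0
IT_SPEND_B = 490.0
SAAS_FRACTION = 0.57  # SaaS share of the cloud pie

cloud_share_of_it = CLOUD_MARKET_B / IT_SPEND_B
saas_b = CLOUD_MARKET_B * SAAS_FRACTION
iaas_paas_b = CLOUD_MARKET_B - saas_b

print(f"Cloud as a share of IT spend: {cloud_share_of_it:.1%}")
print(f"SaaS slice: ${saas_b:.2f}B, IaaS + PaaS slice: ${iaas_paas_b:.2f}B")
```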

So, like any good scientist, I decided to validate these claims with some ground numbers and observations.

Given that there are 50 plus top "cloud" vendors out there ( it would take a long time to research revenue across all of them - you would hope that IDC did that ), I focused on the big boys to see if the numbers match up with the IDC assessment for 2008.

Amazon: Amazon classifies its AWS revenue under "Other" in its quarterly filings. Here is the total "Other" column from their Q3FY08 report ( all numbers in $M ).

[ Table: Amazon "Other" revenue by quarter. Columns: Q3 2007, Q4 2007, Q1 2008, Q2 2008, Q3 2008, Change Y/Y%. Values not preserved. ]

So with EC2 and S3 ( compute and storage ), this is about a half a billion dollar a year business for Amazon, and it is growing at a whopping 42% Y/Y.

Given that IDC claims that storage and compute is about 14% of a $16B market ( or about $2.24B ), and that there are numerous companies in this space ( from the list above ), Amazon is definitely in a good position by owning about a quarter of this market ( and looking to grow it ). As we all know from their recent announcements, they are looking to enter other parts of the pie.

Now let's look at the heavyweight gorilla in the SaaS space: salesforce.com reports earnings in the billion dollar range. For the quarter ending October 2008, here are the numbers:

[ Table: revenue for the three months and nine months ending Oct 2008 vs. Oct 2007. Rows: Subscription and Support; Prof Services and other. Values not preserved. ]

So salesforce is projected at approximately $1B in revenue with 43% y/y growth! IDC estimates the market to be around $7B in FY08, and with so many players in this space, it is quite likely that salesforce controls about 15% of the market. Google, the other 800lb gorilla, doesn't break out their numbers, so it is hard to get a sense, but you can imagine their enterprise offerings ( in mail, Google Apps and other platform offerings ) are in the multi million dollar range at least.
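The share estimates for Amazon and salesforce can be reproduced from the round numbers in this post. This is a sketch under those stated figures only ( roughly $0.5B a year for Amazon's compute and storage, roughly $1B for salesforce ), with an illustrative helper name, and is not a substitute for the actual filings.

```python
def market_share(vendor_revenue_b, market_size_b):
    """Vendor revenue as a fraction of the market, both in billions of dollars."""
    return vendor_revenue_b / market_size_b


# Infrastructure: IDC's 14% of a $16B market, as quoted in the post.
iaas_market_b = 0.14 * 16.0

# Round revenue figures used in the post, not exact filings.
AMAZON_REVENUE_B = 0.5
SALESFORCE_REVENUE_B = 1.0
SAAS_MARKET_B = 7.0

amazon_share = market_share(AMAZON_REVENUE_B, iaas_market_b)
salesforce_share = market_share(SALESFORCE_REVENUE_B, SAAS_MARKET_B)

print(f"Storage and compute market: ${iaas_market_b:.2f}B")
print(f"Amazon share: {amazon_share:.0%}")
print(f"salesforce share: {salesforce_share:.0%}")
```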

So does the IDC survey sound plausible? Yes, the numbers seem to validate it.

One thing the survey does not point out is that as this market evolves, the applications mentioned here will move into the more "custom space", targeting particular verticals: healthcare, education, finance, telecommunications, government, media and so on. With the initiative of, you can already see the path salesforce is going. Could vertical clouds emerge with specialized platforms and services? You betcha.

As with any rising tide, there will be many winners and losers, but this market is definitely looking big enough to catch the attention of all the big players. Delivering on Ray Ozzie's two-year-old memo taking Microsoft into this space, Microsoft has just started delivering with its large Azure push. And as history shows, when MS enters any market, the market is not only validated but the game face is on.

So what is Sun doing in this space? That is, quite literally, the billion dollar question - stay tuned for that sooner rather than later!



