Tuesday Nov 18, 2008

Save money with Open Storage

Open Storage helps you save time and money on Web-scale applications.

Want Proof?

Check out this excellent benchmark for Web 2.0 workloads, run on the Sun Storage 7410 (aka Amber Road) and CMT-based Sun Fire T5120 servers.

It's also evident from the benchmark data that you don't suffer a performance penalty for using NAS. There is a fairly common impression that performance could or would be slower than DAS; this benchmark shows that is just not true in this environment.

| System | Storage | Ch, Cr, Th | GHz | Processor Type | Users | Util | RU | Watts/User | Users/RU |
|---|---|---|---|---|---|---|---|---|---|
| Sun Fire T5120 | Sun Storage 7410 Unified Storage Array (NFS) | 1, 8, 64 | 1.4 | UltraSPARC T2 | 2,400 | 72% | 1 | 0.20 | 2,400 |
| Sun Fire T5120 | Local FS | 1, 8, 64 | 1.4 | UltraSPARC T2 | 2,400 | 60% | 1 | 0.20 | |


Monday Oct 27, 2008

OpenSolaris + MySQL + ZFS Success Story in Production at SmugMug

SmugMug, a photo and video publishing site, has gone into production on OpenSolaris + MySQL + ZFS. Check out this story on why a Linux geek decided to move his site from Linux to OpenSolaris.

Don MacAskill, chief geek and CEO of SmugMug, says: "ZFS is the most amazing filesystem I’ve ever come across. Integrated volume management. Copy-on-write. Transactional. End-to-end data integrity. On-the-fly corruption detection and repair. Robust checksums. No RAID-5 write hole. Snapshots. Clones (writable snapshots). Dynamic striping. Open source software." He is also excited about the CoolStack 5.1 stack available in OpenSolaris along with MySQL.

Full Story on SmugMug Powered by OpenSolaris and MySQL



Monday Oct 20, 2008

Scaling Wikipedia with LAMP: 7 billion page views per month

I recently attended an interesting talk by Brion Vibber, CTO of the Wikimedia Foundation, the non-profit organisation that runs the infrastructure for Wikipedia. He described how his team of 7 engineers manages the Wikipedia site, which receives an average of 7 billion page views per month and ranks among the top 10 sites on the web by traffic. The highlights from the talk, including the architecture that lets the site scale to that traffic, are listed below.

The site runs on the LAMP stack and you know what that is:

  • Linux
  • Apache
  • MySQL from Sun
  • Perl/PHP/Python/Pwhatever :-)

Wikimedia runs the site on about 400 x86 servers. Of those, about 250 are web servers and the rest run the MySQL database. Recently they acquired OpenSolaris-based Thumper machines from Sun, which they are exploring. The Sun Fire X4500, aka Thumper, is the world's first open source storage server, running OpenSolaris and ZFS. Currently they use the Thumpers, with the ZFS file system, to store media files, and they are simply loving it. They have also begun to use the DTrace feature of OpenSolaris and can't stop raving about it!

11/21/08 Update: Link to the recent Press Release WikiMedia selects Sun Microsystems to Enhance Multimedia Experience.

At its core, Wikipedia runs on a very simple system architecture, as shown below, and since it is a non-profit organisation, almost all of the software is open source and FREE.

Simple is nice but it can be SLOW :-) To speed things up, the first step is to add caching, at both the front end and the back end of the system. On the web front end, Wikipedia uses the Squid reverse proxy for caching, and at the back end it uses memcached, as shown below:

Squid is a proxy server and web cache daemon. It has a wide variety of uses, from speeding up a web server by caching repeated requests to caching web, DNS and other computer network lookups for a group of people sharing network resources. Squid works well for largely static dynamic sites like a wiki, where the content does not change often: the public face of a given wiki page rarely changes, so it can be cached at the HTTP level. Wikipedia also uses Squid for geographical load balancing, so they can use cheaper, faster local bandwidth.
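As a rough sketch, a Squid reverse-proxy ("accelerator") setup looks something like the squid.conf fragment below. The host names and addresses are purely illustrative, not Wikipedia's actual configuration:

```conf
# Listen on port 80 as a reverse proxy (accelerator) for the site
http_port 80 accel defaultsite=wiki.example.org

# Send cache misses to the Apache/PHP origin server behind Squid
cache_peer 10.0.0.1 parent 80 0 no-query originserver
```

Requests for unchanged pages are then served straight from Squid's cache, and only misses touch the Apache/PHP backend.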

Along with the Apache/PHP servers, Wikipedia also uses APC, the Alternative PHP Cache. Since PHP compiles scripts to bytecode and then throws the bytecode away after execution, recompiling on every request adds a lot of unnecessary overhead. Hence it is recommended to always use an opcode cache with PHP; this drastically reduces startup time for large apps.

Another speedup technique used by Wikipedia is memcached, a general-purpose distributed memory caching system often used to speed up dynamic database-driven websites by caching data and objects in memory, reducing the number of times the database must be read. memcached lets you share temporary data in memory across the network; even though a network round trip is needed to fetch the data, the latency is still far smaller than a disk-based database access. Wikipedia typically stores rendered pages in memcached.
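The store-rendered-pages-in-memcached idea is the classic cache-aside pattern. Here is a minimal sketch in Python; a plain dict stands in for a real memcached client (e.g. the `get`/`set` calls of a memcached library) so the example runs without a memcached server, and the render function is a hypothetical placeholder:

```python
cache = {}  # stand-in for the distributed memcached pool

def render_page_from_db(title):
    # placeholder for the expensive wiki-markup parse/render step
    return "<html>rendered: %s</html>" % title

def get_page(title):
    key = "page:" + title
    page = cache.get(key)                  # 1. try the cache first
    if page is None:
        page = render_page_from_db(title)  # 2. miss: do the expensive work
        cache[key] = page                  # 3. store the result for next time
    return page

get_page("Main_Page")  # first call: miss, renders and caches
get_page("Main_Page")  # second call: hit, served from memory
```

The expensive render runs only on a miss; every later request for the same page is served from memory.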

After adding all possible cache, the next thing to add is CASH! :-) That is, more servers to gain scalability.

Those 250 or so web servers come with plenty of memory. The underutilized memory can be given to memcached, and together it adds up to a large memcached pool.

Further, to get a speedup at the database level, Wikipedia uses simple sharding techniques. They split the data along logical partitions, such as subsites that don't interact closely.
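A minimal sketch of such sharding: hash a partition key (say, a per-wiki database name) to pick one of N database servers. The host names here are made up for illustration, and real deployments usually use a configured mapping or consistent hashing rather than a bare modulo:

```python
import hashlib

SHARDS = ["db1.example.org", "db2.example.org", "db3.example.org"]

def shard_for(key):
    # stable hash, so the same key always lands on the same shard
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

shard_for("enwiki")  # every lookup for "enwiki" returns the same host
```

Because the mapping is deterministic, any web server can compute which database holds a given subsite without a central lookup.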

They also do functional sharding and split the machines along functional boundaries for speedup.

The next popular technique they use to gain speed is replication. They have a master server for all writes and slave servers for most reads. The trick, they claim, in configuring the masters and slaves is to make sure the slave machines are at least as fast as the masters: a slave must apply every write replicated from the master while also serving reads, so it has to handle writes faster than the master does in order to keep up.
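The routing side of that setup can be sketched in a few lines: writes always go to the master, reads are spread round-robin across the slaves. Host names are illustrative, and the SELECT-prefix check is a deliberately crude way to classify queries:

```python
import itertools

MASTER = "db-master.example.org"
SLAVES = ["db-slave1.example.org", "db-slave2.example.org"]
_slave_cycle = itertools.cycle(SLAVES)  # round-robin over the read replicas

def route(sql):
    # crude classification: only SELECTs are treated as reads
    if sql.lstrip().upper().startswith("SELECT"):
        return next(_slave_cycle)  # reads: rotate through the slaves
    return MASTER                  # writes: always the master

route("SELECT * FROM page")       # routed to a slave
route("UPDATE page SET touched=1") # routed to the master
```

One caveat of this scheme is replication lag: a read issued right after a write may hit a slave that has not applied that write yet, which is why some reads still need to go to the master.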

As you can see, the beauty of the architecture is that it is SIMPLE, all open source, and it rocks!

Wednesday Oct 15, 2008

World Map of Social Networks

Interesting World Map of Social Networks Distribution.

Source: ValleyMag

Friday Oct 03, 2008

Web Scale and Cloud Computing Defined

Cloud Computing: There are multiple definitions of cloud computing. Here is one from Forrester Research: "A pool of highly scalable, abstracted infrastructure, capable of hosting end-customer applications, that is billed by consumption." Another, simpler one from Appistry: "Cloud computing consists of shared computing resources that are virtualized and accessed as a service, through an API on a pay-to-use basis, delivered by IP-based connectivity, providing highly scalable, reliable on-demand services with agile management capabilities."

Web Scale: I'm not sure when or how this term was coined, but it is quite popular in the Sun marketing community. I like to define it as the segment of applications that need to scale to millions of users on the web, the likes of YouTube and Facebook. Such apps are increasingly deployed in cloud computing environments, because many of them need to scale dynamically with unpredictable peak loads. A classic example is Animoto, which had to scale from 50 EC2 instances to 4500 instances within 3 days of launch due to unexpectedly high demand from end users. At the other extreme, some web applications may never take off, in which case the application provider has no long-term commitment to the cloud hosting provider for leasing infrastructure, so the upfront deployment costs can be avoided.



