Thursday Dec 17, 2009

Cache, cache, cache! (Part 2: Architectures)

On Tuesday I focused mainly on memcached and PHP, but today I'll take a wider look at caching architectures in general. The central question when designing a cache architecture is where to locate the caching component:


[1] Status quo, the three-tier architecture

In theory, the commonly accepted standard architecture of a software product is divided into three tiers: the presentation tier, the application tier and finally the data tier. In the context of web applications we rediscover these tiers in the trinity of web server, application server and database server.

In the diagram above we find these three tiers, with the user (or in technical terms: the browser) on top of the stack.

[2] Cache on top

One very obvious idea is to place the cache in front of the web server, between user and web server. This architecture is usually found in a so-called reverse proxy configuration. A reverse proxy is quite easy to set up and has a strong positive impact on web sites with mostly static content. But for highly dynamic web applications - like most of today's Web 2.0 applications - the caching benefit of a reverse proxy may be rather small.

In general: having a reverse proxy is better than no caching at all. A reverse proxy will always give you a performance benefit.
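As an illustration, a minimal caching reverse proxy with Apache httpd (mod_proxy plus mod_cache) might look like the sketch below. The hostname, backend address, cache path and expiry time are my assumptions, not part of the original setup:

```apache
# Sketch: Apache httpd as a caching reverse proxy in front of a
# backend web server on port 8080. Hostnames, paths and lifetimes
# are illustrative.
<VirtualHost *:80>
    ServerName www.example.com

    # Forward all incoming requests to the backend ...
    ProxyPass        / http://127.0.0.1:8080/
    ProxyPassReverse / http://127.0.0.1:8080/

    # ... but serve cacheable responses from a local disk cache.
    CacheEnable disk /
    CacheRoot /var/cache/apache
    CacheDefaultExpire 300
</VirtualHost>
```

Static assets and pages with proper cache headers are then served without ever touching the backend.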

[3] Cache between web server and application server

Let's move the cache one level down the stack, between web server and application server. At first sight this may look like a very good idea, because the cache now protects the application server. But at second sight you'll realize that this configuration is mostly the same as architecture 2, just without the benefit of also caching your web server's data.

For exotic scenarios there may be a good reason for this configuration (especially in combination with load balancing), but in general you should favor architecture 2 over this one.

[4] Cache between application server and database

And another level down the stack. The cache now sits between application server and database. Again this seems like a good idea - at first sight. But at second or third sight you may realize that nearly every database system has its own internal query cache, so our cache is only a cache for a cache. And caching a cache is almost never a good idea and can lead to unpredictable, bad consequences.

Another difficulty with this approach is that it's hard to decide when the cached data becomes stale (cache jargon for obsolete) and when it's time to invalidate the cache.
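One common way around this problem is explicit invalidation: whenever the application writes to the database, it also deletes the affected cache keys, so the next read repopulates them with fresh data. A PHP sketch - the function name, key scheme and `$db` handle are illustrative assumptions, not from the original post:

```php
<?php
// Sketch: explicit invalidation on write.
// Assumes a memcached daemon on localhost and a PDO handle in $db;
// all names here are illustrative.
$memcache = new Memcache();
$memcache->connect('localhost', 11211);

function update_user_name($db, $memcache, $userId, $newName)
{
    // 1. Write to the authoritative store first.
    $stmt = $db->prepare('UPDATE users SET name = ? WHERE id = ?');
    $stmt->execute(array($newName, $userId));

    // 2. Then drop the now-stale cache entry. The next read will
    //    miss, hit the database, and repopulate the cache.
    $memcache->delete('user:' . $userId);
}
```

This only works when the application itself sits in front of the cache - which is one more argument for architecture 5 below.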

[5] Cache inside of application

And now half a level up again: right into the application tier. This is the most challenging but also the most powerful place to implement caching strategies. Identify time-critical and frequently accessed data during the development process and implement dedicated, customized caching mechanisms for it. But don't try to build an abstract, unified, common cache for everything.

It's very important to find a specific and suitable solution for each kind of data you want to cache in your application. Otherwise you'll probably just end up with another row-based cache for your database (like architecture 4) or some kind of reverse proxy (like architecture 2).
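As a sketch of what such a dedicated, per-data-type cache might look like in PHP - the function names, key scheme, TTL and the `load_profile_from_db()` helper are my assumptions, not from the post:

```php
<?php
// Sketch of a dedicated application-level cache for one specific
// kind of data (user profiles). Key scheme, 5-minute lifetime and
// load_profile_from_db() are illustrative.
function get_user_profile($db, $memcache, $userId)
{
    $key = 'profile:' . $userId;

    $profile = $memcache->get($key);
    if ($profile !== false) {
        return $profile;          // cache hit: no database round trip
    }

    // Cache miss: fetch the expensive data from the database ...
    $profile = load_profile_from_db($db, $userId);

    // ... and keep it for 5 minutes. How long is acceptable depends
    // entirely on this particular kind of data.
    $memcache->set($key, $profile, 0, 300);
    return $profile;
}
```

The point is that the key scheme, the lifetime and the fallback logic are all tailored to one kind of data - a second kind of data would get its own function with its own rules.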


Architectures 2, 3 and 4 can easily be set up by system administration without involving development in any way. It's mostly a matter of clever configuration, which may also add some load balancing features. In general you'll definitely achieve better performance for your application, but there is always a limit set by the architecture and scaling qualities of your core application.

Architecture 5 is probably the best choice, but - to get the best results - it needs to be started at an early stage, and you should keep caching in mind throughout the whole design and development process of your web application: What data is most frequently accessed? What data is expensive (hard to retrieve)? What data depends on user sessions? How up to date does the data need to be?

If you are curious about these questions, please stay tuned for part 3.

Tuesday Dec 15, 2009

Cache, cache, cache! (Part 1: memcached)

Caching is probably the most important technique you should use in today's web sites and web applications. Sure, scaling your hardware is still the final answer to all your load problems, but with some kind of caching your application will scale far better than without.


Currently my favorite caching tool is memcached. It's a slim and ultra-fast distributed caching system. Memcached is basically a key-value store which keeps all data non-persistently in memory: if your server goes down, all the data is gone, because it's never stored on a hard disk.

Memcached is not meant to be a database, and you'll still need a database to store your data persistently.

Setting up memcached

I'm a very lazy guy and try to avoid boring duties like installing memcached. That's why I love using Sun's Web Stack, which already includes memcached and is so easy to use. If you're not a Web Stack user please take a look at the memcached FAQ to learn how to install memcached on your system.

To add memcached to my IPS-based Web Stack installation I simply call these two commands:

[oswald@localhost ~/demo]$ bin/pkg install sun-memcached
DOWNLOAD                                    PKGS       FILES     XFER (MB)
Completed                                    1/1         9/9     0.17/0.17 

PHASE                                        ACTIONS
Install Phase                                  30/30 
PHASE                                          ITEMS
Reading Existing Index                           7/7
Indexing Packages                                1/1
[oswald@localhost ~/demo]$ bin/setup-webstack

Now all I need to do is to start the daemon:

[oswald@localhost ~/demo]$ bin/sun-memcached start
Starting memcached

Memcached has no support for any access control at all, so you should use memcached only on private networks or secure your installation with a firewall (memcached listens on port 11211, by the way).

Using memcached with PHP

As I already mentioned memcached is a simple key-value store which is very easy to use for programmers. To show the basic idea I put this small PHP script together:

        $memcache = new Memcache();
        if (!$memcache->connect('localhost', 11211))
                die("Couldn't connect to memcached! Cruel world!");

        $key   = 'zaphod';
        $value = 'cool';

        $result = $memcache->get($key);
        if ($result !== false) {
                echo "$key is $result";
        } else {
                $memcache->set($key, $value);
                echo "Set $key to $value";
        }

There are three main functions you will need to understand in order to work with memcached:

connect() - connects to your memcached server. If you have multiple memcached servers running, you can use addServer() to add one or more servers to the connection pool.
get() - retrieves the value for the given key.
set() - stores the given value for the given key. set() also allows you to define an expiration time for the key-value pair.
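For example, the expiration parameter of set() keeps a value in the cache for a limited time only. The key, value and 60-second lifetime below are just illustrative:

```php
<?php
// Sketch: set() with an expiration time. The third parameter is a
// flags field (0 = no compression), the fourth the lifetime in
// seconds. After 60 seconds get('zaphod') will return false again.
$memcache = new Memcache();
$memcache->connect('localhost', 11211);

$memcache->set('zaphod', 'cool', 0, 60);
```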

On the first execution of this script the cache is empty and you'll get this output:

Set zaphod to cool

On the second execution, the value for zaphod is already set and you'll see:

zaphod is cool

That's all. That's the basic way to use memcached.

What's next...

The next step is to decide what information you want to cache and where you want to cache it. Both are very crucial decisions which determine the success or failure of your cache. So, stay tuned for part 2. ;)

Thursday Dec 10, 2009

Restoring normality...

»...just as soon as we are sure what is normal anyway. Thank you.« (HHGTTG)

The last few weeks were a little quiet here on this blog. I had to do some urgent programming for the next release of our Web Stack, and last week I had the great pleasure of talking about web application development at the Codebits conference in Lisbon.

Photography by Lenz Grimmer.

Kai 'Oswald' Seidler writes about his life as co-founder of Apache Friends, creator of XAMPP, and technology evangelist for web tier products at Sun Microsystems.
