Wednesday Jan 20, 2010

Good Idea: Python with FastCGI (mod_fcgid)

A couple of days ago, I stumbled over an installation in which CGI was used to run a Python-based web application. Of course the application ran terribly slowly, and as I mentioned earlier in »Save energy! Stop using CGI!«, it's (nowadays) always a bad idea to use CGI. Not only is it tediously slow and bad software design, it's also soooo 90's.

What's the difference between CGI and FastCGI?

Let me use a metaphor to start. Imagine a well...

python_cgi.jpg python_fastcgi.jpg
Here you see the old-fashioned way of CGI: For every request you have to let the bucket all the way down into the well (fork a new process), allowing water to enter the bucket (initialize and execute your application), pull the bucket up to the surface and empty it (send the data to the web server and free all allocated memory). And here is the modern FastCGI way: Install the faucet (start the FastCGI process) and every time you need water, turn it on (connect and send a request), get water (calculate and get the answer), and turn it off (close the connection). No need to fork, initialize your application, and free the allocated memory on every single request.

Okay, seriously, let me show you how this works in practice.

Installing Python

For this demo I use Sun's Web Stack. It's probably the easiest way to demonstrate the performance differences between CGI and FastCGI. XAMPP doesn't support FastCGI, because with mod_perl for Perl and mod_php for PHP there is no real need for a FastCGI interface.

First, let me add Python to my basic web stack installation:

[oswald@sol10u7 ~/webstack1.5]% bin/pkg install sun-python26
DOWNLOAD                                    PKGS       FILES     XFER (MB)
Completed                                    1/1   2784/2784   12.65/12.65 

PHASE                                        ACTIONS
Install Phase                              2861/2861 
PHASE                                          ITEMS
Reading Existing Index                           7/7
Indexing Packages                                1/1
[oswald@sol10u7 ~/webstack1.5]% bin/setup-webstack

If you're familiar with Sun's Web Stack, you'll have noticed that I'm using the IPS installation of Web Stack. That's my favorite way of installing it, because it allows me to place the Web Stack in any directory I want and to run the stack without root privileges.

Python with CGI

Setting up CGI is very, very easy, and that's probably exactly why so many people still use it.

Let me start with a simple "Hello World!" Python CGI script:

#!/home/oswald/webstack1.5/bin/python
print "Content-Type: text/html"
print ""
print "Hello World!"

I named this file hello.py and put it into the cgi-bin directory of my Apache installation. In the case of Web Stack it's var/apache2/2.2/cgi-bin. Add execute permissions:

[oswald@sol10u7 ~]% chmod a+x var/apache2/2.2/cgi-bin/hello.py

Now I log into another box on the same network and use my favorite command-line web browser Lynx to test the newly created Hello World CGI:

[oswald@debian50 ~]% lynx -source http://sol10u7/cgi-bin/hello.py
Hello World!

Looks good. Now let's benchmark this script:

[oswald@debian50 ~]% ab -n 1000 http://sol10u7/cgi-bin/hello.py
...
Time taken for tests:   31.083 seconds
...
Total transferred:      256000 bytes
HTML transferred:       13000 bytes
Requests per second:    32.17 [#/sec] (mean)
...

32 requests/second. That's nothing to be proud of!

Python with FastCGI

And now let's try FastCGI by adding Apache's mod_fcgid to the Web Stack installation:

[oswald@sol10u7 ~/webstack1.5]% bin/pkg install sun-apache22-fcgid 
DOWNLOAD                                    PKGS       FILES     XFER (MB)
Completed                                    1/1         6/6     0.09/0.09 

PHASE                                        ACTIONS
Install Phase                                  24/24 
PHASE                                          ITEMS
Reading Existing Index                           7/7
Indexing Packages                                1/1
[oswald@sol10u7 ~/webstack1.5]% bin/setup-webstack

Activate the default configuration:

[oswald@sol10u7 ~/webstack1.5]% cp etc/apache2/2.2/samples-conf.d/fcgid.conf etc/apache2/2.2/conf.d/

For those who can't or don't want to use Sun's Web Stack: the above fcgid.conf file basically contains the following directives:

LoadModule fcgid_module libexec/mod_fcgid.so
SharememPath /home/oswald/webstack1.5/var/run/apache2/2.2/fcgid_shm
SocketPath /home/oswald/webstack1.5/var/run/apache2/2.2/fcgid.sock
AddHandler fcgid-script .fcgi
<Location /fcgid>
    SetHandler fcgid-script
    Options ExecCGI
    allow from all
</Location>

As usual after changing Apache's configuration, we need to reload (i.e. gracefully restart) Apache to let the new configuration take effect:

[oswald@sol10u7 ~/webstack1.5]% apache2/2.2/bin/apachectl graceful

Now I create a new directory named fcgid directly inside of Apache's document root folder and change into that folder:

[oswald@sol10u7 ~/webstack1.5]% mkdir var/apache2/2.2/htdocs/fcgid
[oswald@sol10u7 ~/webstack1.5]% cd var/apache2/2.2/htdocs/fcgid

To let Python talk to Apache's mod_fcgid, I need to install a so-called Python FastCGI/WSGI gateway. There are several solutions available for Python, but I personally prefer Allan Saddi's fcgi.py:

[oswald@sol10u7 htdocs/fcgid]% wget -q http://svn.saddi.com/py-lib/trunk/fcgi.py

The "Hello World!" Python FastCGI script looks a little different this time:

#!/home/oswald/webstack1.5/bin/python
from fcgi import WSGIServer
def app(environ, start_response):
	start_response('200 OK', [('Content-Type', 'text/html')])
	return ['Hello world!\n']
WSGIServer(app).run()

This time it's not the output of the script that is sent back to the browser; it's the return value of the function app() that defines the data going to the user's browser. In this case it's simply the string "Hello world!\n".

Like in the CGI example above, the Python script needs to be executable:

[oswald@sol10u7 htdocs/fcgid]% chmod a+x hello.py

The content of my fcgid directory now looks like this:

[oswald@sol10u7 htdocs/fcgid]% ls -l
total 90
-rw-r--r--   1 oswald   other      44113 Jul 26  2006 fcgi.py
-rwxr-xr-x   1 oswald   other        223 Jan 19 12:48 hello.py

And - like in my CGI example above - I now test the script with Lynx:

[oswald@debian50 ~]% lynx -source http://sol10u7/fcgid/hello.py
Hello world!

And after everything looks fine, I start a little benchmark:

[oswald@debian50 ~]% ab -q -n 1000 http://sol10u7/fcgid/hello.py
...
Time taken for tests:   1.747 seconds
...
Total transferred:      235000 bytes
HTML transferred:       13000 bytes
Requests per second:    572.44 [#/sec] (mean)
...

Yes, gotcha. 572 requests per second: that sounds reasonable. Remember the 32 requests/second from CGI? Do you want the well or do you take the faucet? Sure, implementing a FastCGI program is far more challenging than coding a simple CGI solution, but 572 versus 32 requests per second? Do I need to say more?

Photos: On the right "Faucet" by Joe Shlabotnik, and on the left "Well" by echiner1. Both licensed under Creative Commons.

Wednesday Jan 13, 2010

Web Stack and the TLS Vulnerability

If you're a Web Stack user, please read Jyri's brief article about Web Stack and the TLS Vulnerability.

Thursday Jan 07, 2010

Cache, cache, cache! (Part 3: What to cache?)

Happy New Year everyone! Hope you could enjoy your holidays!!

Let's start this year with the third part of my little series of thoughts about caching. After my small memcached intro and thoughts about caching architectures, I now focus on the data you should consider caching in your web application.

cache-what.png

[1] Cache HTML

Obviously the biggest performance win you can achieve is by caching the whole output of your web application: a simple reverse proxy scenario. This works very well for mostly static pages, but for highly dynamic and user-specific content this is not an option: there is no advantage in caching a web page that becomes obsolete a moment later.

Probably the best way to solve this dilemma is to implement a so-called partial-page cache: let your application cache just those portions of the page that are worth caching and leave the rest, where caching makes no sense, dynamic.
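
To make this a little more concrete, here is a minimal sketch of such a partial-page cache using memcached; the fragment name, the expiration time, and the sidebar content are made up for illustration:

<?php
	// Hypothetical partial-page cache: only the (expensive) sidebar fragment
	// is cached, the rest of the page stays fully dynamic.
	$memcache = new Memcache();
	if(!$memcache->connect('localhost', 11211))
		die("Couldn't connect to memcached!");

	$sidebar = $memcache->get('fragment_sidebar');
	if($sidebar === false)
	{
		// Imagine an expensive database query or template rendering here.
		$sidebar = "<ul><li>Top article of the day</li></ul>";
		$memcache->set('fragment_sidebar', $sidebar, 0, 300); // keep for 5 minutes
	}

	echo "<h1>Welcome back!</h1>";   // dynamic part, generated on every request
	echo $sidebar;                   // cached fragment
?>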

It's very important that you implement this at the very top layer of your application, probably exactly the layer that software architects would call the presentation layer. Sure, this is likely to break your framework architecture, but to quote chapter 40 of the Tao Te Ching:

The movement of the Tao
By contraries proceeds;
And weakness marks the course
Of Tao's mighty deeds.

But seriously: if you have to stay within the boundaries of a framework, Ajax is a good way to bypass these restrictions and helps to implement such a cache in a restricted architecture. But be aware that this will raise the number of HTTP requests hitting your frontend web servers.

An effective caching strategy will always mess up your beautifully designed software architecture. Having just one (central) caching layer looks great in system diagrams, and it's better than no cache at all, but it's definitely not the end of the road.

[2] Cache complex data structures

If you don't want to break your framework architecture, or you don't like the idea of caching HTML at all (and I totally understand your point), you should consider caching other (lower-level, but still complex) data structures.

Some examples for suitable data structures:

  • user profiles
  • friends lists
  • current user lists
  • list of locations, branches, countries, languages, ...
  • top 10 (whatever) lists
  • public statistical data
  • ...

The main challenge lies in identifying the right data structures. This is no easy task and strongly depends on the kind of web application you run or plan to run. Avoid caching simple data sets, like row-level data from the database. Don't think row-level. That's the best advice you should keep in mind. (Note to myself: I need to put this on a t-shirt. I found this phrase in Memcached Internals, a wonderful article inspired by a talk by Brian Aker and Alan Kasindorf.)
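
To illustrate the idea, here is a small sketch that caches one of the structures from the list above (a top-10 list) as a whole; the table, columns, and credentials are invented for this example:

<?php
	// Hypothetical example: cache a complete top-10 list as one structure,
	// instead of caching the individual database rows behind it.
	$memcache = new Memcache();
	if(!$memcache->connect('localhost', 11211))
		die("Couldn't connect to memcached!");

	$top10 = $memcache->get('top10_articles');
	if($top10 === false)
	{
		// Expensive aggregation, done only on a cache miss.
		$mysqli = mysqli_connect('localhost','user','password','db');
		if (!$mysqli)
			die("Can't connect to MySQL: ".mysqli_connect_error());

		$result = $mysqli->query("SELECT id, title, views FROM articles ORDER BY views DESC LIMIT 10");
		$top10 = array();
		while($row = $result->fetch_assoc())
			$top10[] = $row;

		// The Memcache extension serializes the array automatically;
		// let the entry expire after 10 minutes.
		$memcache->set('top10_articles', $top10, 0, 600);
	}

	foreach($top10 as $article)
		echo $article['title']." (".$article['views']." views)\n";
?>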

At first glance Ajax may seem an obvious technology to combine with such a cache. But please be aware that moving application logic away from the server side to the client side is always dangerous and may easily compromise the security of your application.

Which allows me to end this post with another quote from Laozi (Tao Te Ching, chapter 63):

All difficult things in the world
are sure to arise from a previous state
in which they were easy.

Thursday Dec 17, 2009

Cache, cache, cache! (Part 2: Architectures)

On Tuesday I focused mainly on memcached and PHP, but today I'll take a wider look at caching architectures in general. The main question when defining a cache architecture is where to locate the caching component:

cache-architectures.png

[1] Status quo, the three-tier architecture

In theory, the commonly accepted standard architecture of a software product is divided into three tiers: the presentation tier, the application tier and finally the data tier. In the context of web applications we rediscover these tiers in the trinity of web server, application server and database server.

In the above diagram we find these three tiers with the user (or in technical terms: the browser) on top of this stack.

[2] Cache on top

One very obvious idea is to place the cache in front of the web server, between user and web server. Usually we find this architecture in a so-called reverse proxy configuration. A reverse proxy is quite easy to set up and has a positive impact on web sites with mostly static content. But for highly dynamic web applications - like most of today's Web 2.0 applications - the caching benefit of a reverse proxy may not be that big.

In general: having a reverse proxy is better than no caching at all. A reverse proxy will always give you a performance benefit.
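
One practical note: a reverse proxy typically caches only those responses the application declares cacheable. A minimal PHP sketch of sending such caching headers (the 60-second lifetime is just an example) could look like this:

<?php
	// Declare this response cacheable for 60 seconds, so a reverse proxy
	// (or the browser) may serve it again without hitting the application.
	header('Cache-Control: public, max-age=60');
	header('Expires: '.gmdate('D, d M Y H:i:s', time() + 60).' GMT');

	echo "Mostly static content goes here.";
?>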

[3] Cache between web server and application server

Let's move the cache one level down the stack, between web server and application server. At first sight this may look like a very good idea, because the cache now protects the application server. But at second sight you'll realize that this configuration is mostly the same as architecture 2, just without the benefit of also caching your web server's data.

For exotic scenarios there may be a good reason for this configuration (esp. in combination with load balancing functionality) but in general you should favor architecture 2 over this one.

[4] Cache between application server and database

And another level down the stack: the cache now sits between application server and database. Again this looks good and seems to be a good idea - at first sight. But at second or third sight you may realize that nearly every database system has its own internal query cache, so our cache is only a cache for a cache. And caching a cache is basically never a good idea and can lead to unpredictable, bad consequences.

Another difficulty with this approach is that it's hard to decide when the cache gets dirty (cache jargon for obsolete) and when it's time to clear the cache.

[5] Cache inside the application

And now half a level up again: right into the application tier. This is the most challenging but also the most powerful place to implement caching strategies. Identify time-critical and frequently accessed data during the development process and implement dedicated and customized caching mechanisms. But don't try to build an abstract, unified, common cache for everything.

It's very important to find a specific and suitable solution for each kind of data you want to cache in your application. Otherwise you'll probably just end up with another row-based cache for your database (like architecture 4) or some kind of reverse proxy (like architecture 2).
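
As an illustration of what such a dedicated mechanism might look like, here is a small PHP sketch of a purpose-built accessor backed by memcached; the function name, the table, and the expiration time are invented for this example:

<?php
	// Hypothetical example of a dedicated cache accessor: one small,
	// purpose-built function per kind of data instead of one generic cache layer.
	function get_user_profile($memcache, $mysqli, $user_id)
	{
		$profile = $memcache->get('profile_'.$user_id);
		if($profile !== false)
			return $profile;                  // cache hit

		// Cache miss: assemble the profile from the database.
		$stmt = $mysqli->prepare("SELECT name, email FROM users WHERE id=?");
		$stmt->bind_param("i", $user_id);
		$stmt->execute();
		$stmt->bind_result($name, $email);
		$stmt->fetch();
		$stmt->close();

		$profile = array('name' => $name, 'email' => $email);
		$memcache->set('profile_'.$user_id, $profile, 0, 600); // expire after 10 minutes
		return $profile;
	}
?>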

Conclusion

Architectures 2, 3 and 4 can easily be set up by system administrators without having to involve development in any way. It's mostly a matter of clever configuration, which may also add some load balancing features. In general you'll definitely achieve better application performance, but there is always a limit imposed by the architecture and scaling qualities of your core application.

Architecture 5 is probably the best choice, but - to get the best results - it needs to be started at an early stage, and you should have caching in mind during the whole development and design process of your web application. What data is most frequently accessed? What data is expensive (hard to retrieve)? What data depends on user sessions? How up to date does the data need to be?

If you are curious about these questions, please stay tuned for part 3.

Tuesday Dec 15, 2009

Cache, cache, cache! (Part 1: memcached)

Caching is probably the most important technique you should use in today's web sites and web applications. Sure, scaling your hardware is still the final answer to all your load problems, but with some kind of caching your application will scale far better than without it.

cachecachecache.png

Currently my favorite caching tool is memcached. It's a slim and ultra-fast distributed caching system. Memcached is basically a key-value store that keeps all data non-persistently in memory; if your server goes down, the data is gone, because it's never written to a hard disk.

Memcached is not meant to be a database, and you'll still need a database to store your data persistently.

Setting up memcached

I'm a very lazy guy and try to avoid boring duties like installing memcached. That's why I love using Sun's Web Stack, which already includes memcached and is so easy to use. If you're not a Web Stack user please take a look at the memcached FAQ to learn how to install memcached on your system.

To add memcached to my IPS-based Web Stack installation I simply call these two commands:

[oswald@localhost ~/demo]$ bin/pkg install sun-memcached
DOWNLOAD                                    PKGS       FILES     XFER (MB)
Completed                                    1/1         9/9     0.17/0.17 

PHASE                                        ACTIONS
Install Phase                                  30/30 
PHASE                                          ITEMS
Reading Existing Index                           7/7
Indexing Packages                                1/1
[oswald@localhost ~/demo]$ bin/setup-webstack

Now all I need to do is to start the daemon:

[oswald@localhost ~/demo]$ bin/sun-memcached start
Starting memcached

Memcached has no support for any access control at all, so you should use memcached only on private networks or secure your installation with a firewall (port 11211, by the way).

Using memcached with PHP

As I already mentioned, memcached is a simple key-value store that is very easy for programmers to use. To show the basic idea, I put together this small PHP script:

<?php
        $memcache = new Memcache();
        if(!$memcache->connect('localhost', 11211))
                die("Couldn't connect to memcached! Cruel world!");

        $key="zaphod";

        $result = $memcache->get($key);

        if($result)
        {
                echo "$key is $result";
        }
        else
        {
                $value="cool";
                echo "Set $key to $value";
                $memcache->set($key,$value);
        }
?>

There are three main functions you will need to understand in order to work with memcached:

connect(host,port)
Connects to your memcached server. If you have multiple memcached servers running, you can use addServer() to add one or more servers to the connection pool.
get(key)
Retrieves the value for the given key.
set(key,value)
Stores the given value for the given key. set() also allows you to define an expiration time for the key-value pair.

On the first execution of this script the cache is empty and you'll get this output:

Set zaphod to cool

On the second execution, the value for zaphod is already set and you'll see:

zaphod is cool

That's all. That's the basic way to use memcached.
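
One more detail worth showing: set() also accepts an expiration time, and addServer() lets you build a pool of several memcached nodes. Here is a tiny sketch (the host names are placeholders):

<?php
	$memcache = new Memcache();

	// Build a pool of memcached servers instead of a single connection.
	$memcache->addServer('cache1.example.com', 11211);
	$memcache->addServer('cache2.example.com', 11211);

	// Store the value with flags 0 and an expiration time of 300 seconds.
	$memcache->set('zaphod', 'cool', 0, 300);

	echo $memcache->get('zaphod');
?>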

What's next...

The next step is to decide what information you want to cache and where you want to cache it. Both are crucial decisions that determine the success or failure of your cache. So, stay tuned for part 2. ;)

Thursday Dec 10, 2009

Restoring normality...

»...just as soon as we are sure what is normal anyway. Thank you.« (HHGTTG)

The last few weeks were a little quiet here in this blog. I had to do some urgent programming for the next release of our Web Stack and last week I had the great pleasure to talk about web application development at the Codebits conference in Lisbon.

4166227540_48a7f716b6_o.jpg Photography by Lenz Grimmer.

Thursday Nov 19, 2009

PHP: session.gc_maxlifetime vs. session.cookie_lifetime

PHP and sessions: Very simple to use, but not as simple to understand as we might want to think.

session.gc_maxlifetime

This value (default 1440 seconds) defines how long an unused PHP session will be kept alive. For example: a user logs in and browses through your application or web site, for hours, for days. No problem, as long as the time between his clicks never exceeds 1440 seconds. It's a timeout value.

PHP's session garbage collector runs with a probability defined by session.gc_probability divided by session.gc_divisor. By default this is 1/100, which means that the above timeout value is checked with a probability of 1 in 100 whenever a session is started.

session.cookie_lifetime

This value (default 0, which means until the browser's next restart) defines how long (in seconds) a session cookie will live. It sounds similar to session.gc_maxlifetime, but it's a completely different approach. This value indirectly defines the "absolute" maximum lifetime of a session, whether the user is active or not. If this value is set to 3600, every session ends after an hour, no matter how active the user is.
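
If you want to inspect or tune both values from within PHP, a small sketch (the values are examples, not recommendations) could look like this:

<?php
	// Keep idle sessions alive for 30 minutes (timeout between two clicks)...
	ini_set('session.gc_maxlifetime', 1800);
	// ...but let the session cookie expire 24 hours after it was set.
	ini_set('session.cookie_lifetime', 86400);
	// Garbage collection probability: 1/100 per session start by default.
	ini_set('session.gc_probability', 1);
	ini_set('session.gc_divisor', 100);

	session_start();
	echo "gc_maxlifetime: ".ini_get('session.gc_maxlifetime')."\n";
	echo "cookie_lifetime: ".ini_get('session.cookie_lifetime')."\n";
?>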

Wednesday Nov 18, 2009

PHP's MySQLi extension: Storing and retrieving blobs

There are a lot of tutorials out there describing how to use PHP's classic MySQL extension to store and retrieve blobs. There are also many tutorials on how to use PHP's MySQLi extension with prepared statements to fight SQL injection in your web application. But there are no tutorials about using MySQLi with blob data at all.

Until today... ;)

Preparing the database

Okay, first I need a table to store my blobs. In this example I'll store images in my database because images usually look better in a tutorial than some random raw data.

mysql> CREATE TABLE images (
       id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
       image MEDIUMBLOB NOT NULL,
       PRIMARY KEY (id)
       );
Query OK, 0 rows affected (0.02 sec)

In general you don't want to store images in a relational database. But that's another discussion for another day.

Storing the blob

To make a long story short, here's the code to store a blob using MySQLi:

<?php
	$mysqli=mysqli_connect('localhost','user','password','db');

	if (!$mysqli)
		die("Can't connect to MySQL: ".mysqli_connect_error());

	$stmt = $mysqli->prepare("INSERT INTO images (image) VALUES(?)");
	$null = NULL;
	$stmt->bind_param("b", $null);

	$stmt->send_long_data(0, file_get_contents("osaka.jpg"));

	$stmt->execute();
?>

If you have already used MySQLi, most of the above should look familiar to you. There are two pieces of code which I think are worth looking at:

  1. The $null variable is needed because bind_param() always wants a variable passed by reference for a given parameter, in this case the "b" (as in blob) parameter. So $null is just a dummy to make the syntax work.

  2. In the next step I need to "fill" my blob parameter with the actual data. This is done by send_long_data(). The first parameter of this method indicates which parameter to associate the data with. Parameters are numbered beginning with 0. The second parameter of send_long_data() contains the actual data to be stored.

While using send_long_data(), please make sure that the blob isn't bigger than MySQL's max_allowed_packet:

mysql> SHOW VARIABLES LIKE 'max_allowed_packet';
+--------------------+----------+
| Variable_name      | Value    |
+--------------------+----------+
| max_allowed_packet | 16776192 | 
+--------------------+----------+
1 row in set (0.00 sec)

If your data exceeds max_allowed_packet, you probably don't get any errors returned from send_long_data() or execute(). The saved blob is just corrupt!

Simply raise max_allowed_packet to whatever value you need. If you're not able to change MySQL's configuration, you'll have to send the data in smaller chunks:

	$fp = fopen("osaka.jpg", "r");
	while (!feof($fp))
	{
		$stmt->send_long_data(0, fread($fp, 16776192));
	}
	fclose($fp);

Usually the default value of 16M should be a good start.
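
If you're not sure about the server's setting, you can check it from PHP before calling send_long_data(). Here is a rough sketch, reusing the osaka.jpg file from the example above:

<?php
	$mysqli=mysqli_connect('localhost','user','password','db');

	if (!$mysqli)
		die("Can't connect to MySQL: ".mysqli_connect_error());

	// Read the server's max_allowed_packet and compare it to the blob's size.
	$result = $mysqli->query("SHOW VARIABLES LIKE 'max_allowed_packet'");
	$row = $result->fetch_assoc();
	$max_allowed_packet = (int)$row['Value'];

	$blobsize = filesize("osaka.jpg");
	if ($blobsize >= $max_allowed_packet)
		die("Blob ($blobsize bytes) doesn't fit into max_allowed_packet ($max_allowed_packet bytes)!");
?>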

Retrieving the blob

Getting the blob data out of the database is quite simple and follows the usual way of MySQLi:

<?php
	$mysqli=mysqli_connect('localhost','user','password','db');

	if (!$mysqli)
		die("Can't connect to MySQL: ".mysqli_connect_error());

	$id=1;  
	$stmt = $mysqli->prepare("SELECT image FROM images WHERE id=?"); 
	$stmt->bind_param("i", $id);

	$stmt->execute();
	$stmt->store_result();

	$stmt->bind_result($image);
	$stmt->fetch();

	header("Content-Type: image/jpeg");
	echo $image; 
?>

Connect to the database, prepare the SQL statement, bind the parameter(s), execute the statement, bind the result to a variable, and fetch the actual data from the database. In this case there is no need to worry about max_allowed_packet. MySQLi will do all the work:

3925128491.jpg

By the way...

If you want to insert a blob from the command line using MySQL monitor, you can use LOAD_FILE() to fetch the data from a file:

mysql> INSERT INTO images (image) VALUES( LOAD_FILE("/home/oswald/osaka.jpg") );

Be aware that in this case, too, max_allowed_packet limits the amount of data you're able to send to the database:

mysql> SHOW VARIABLES LIKE 'max_allowed_packet';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| max_allowed_packet | 7168  | 
+--------------------+-------+
1 row in set (0.00 sec)

mysql> INSERT INTO images (image) VALUES( LOAD_FILE("/home/oswald/osaka.jpg") );
ERROR 1048 (23000): Column 'image' cannot be null
mysql> SET @@max_allowed_packet=16777216;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW VARIABLES LIKE 'max_allowed_packet';
+--------------------+----------+
| Variable_name      | Value    |
+--------------------+----------+
| max_allowed_packet | 16777216 | 
+--------------------+----------+
1 row in set (0.00 sec)

mysql> INSERT INTO images (image) VALUES( LOAD_FILE("/home/oswald/osaka.jpg") );
Query OK, 1 row affected (0.03 sec)

Friday Nov 13, 2009

Little-known PHP commands: scandir()

Have you always messed around with a combo of opendir(), readdir(), and closedir() when you wanted to read the contents of a directory? Since PHP 5 there's a new sheriff in town: scandir():

<?php   
	$files=scandir("/etc/php5");
	print_r($files);
?>

Outputs:

Array
(
    [0] => .
    [1] => ..
    [2] => apache2
    [3] => conf.d
)

Okay, you still need to traverse an array, but it's much easier to use than the traditional way.
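
And if you want to get rid of the . and .. entries right away, a quick sketch:

<?php
	// scandir() plus array_diff() to drop the dot entries.
	$files = array_diff(scandir("/etc/php5"), array('.', '..'));

	foreach ($files as $file)
		echo $file."\n";
?>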

Performance Tuning the Sun GlassFish Web Stack

My colleague Brian Overstreet wrote a must-read paper about tuning different components of the Sun GlassFish Web Stack focusing on Apache, MySQL, and PHP: Performance Tuning the Sun GlassFish Web Stack.

Thursday Nov 05, 2009

Importing a VDI in VirtualBox

If you used to be a VMware user and are trying to switch to the Open-Source side of the Force by using VirtualBox, you may run into difficulties when you try to import an existing VDI file into VirtualBox. Actually, it's quite easy, if you know how.

The main difference between VMware and VirtualBox is that VMware captures a whole virtual machine in an image, whereas VirtualBox only supports images of a hard disk. So in VirtualBox's world, you first need to create a new virtual machine, before using an existing VirtualBox image.

  1. First copy your VDI file into VirtualBox's virtual hard disks repository. On Mac OS X it's $HOME/Library/VirtualBox/HardDisks/.

  2. Start VirtualBox and create a new virtual machine (according to the OS you expect to live on the VirtualBox image):

    virtualbox1.jpg
  3. When you're asked for a hard disk image, select Use existing hard disk and click on the small icon on the right:

    virtualbox2.jpg
  4. This will bring you to the Virtual Media Manager. Click on Add and select the VDI file from step 1.

    virtualbox3.jpg
  5. After leaving the Virtual Media Manager, you'll be back in your virtual machine wizard. Now you can select your new VDI as existing hard disk and finalize the creation process.

    virtualbox4.jpg
  6. Back in the main window, you're now able to start your new virtual machine:

    virtualbox5.jpg

It's quite easy, if you know how.

Wednesday Nov 04, 2009

Store PHP sessions in memcached

Last week my blog was very much dominated by Apache load balancing. Well, I promised to leave this topic alone for a while, but there is one related topic that is worth spending a minute on.

The Theory

If your web application is distributed across multiple servers, you'll quickly run into session problems, because each backend server (aka worker) usually stores its session information locally. If subsequent HTTP requests are handled by different workers, a new session is created every time or, even worse, sessions get mixed up.

To overcome this problem there are two solutions:

  1. Use a session-aware load balancer that binds a user session to the same worker.
  2. Or keep all session data in a central storage.

Both solutions have a similar drawback: if a worker goes down, all session data on this worker is lost. If the central storage goes down, all sessions are lost. But consider the following: you'll probably have tons of workers, and since every computer is bound to fail after a certain period of time, the probability of a worker failure is much higher than that of a single storage server failing. It comes down to what you want: a system that runs all the time with small failures, or a system that fails completely from time to time?

And finally, losing session data sounds worse than it actually is: usually the users only have to log in again to restore their session data. That's sad, but it's not the end of the world. Okay, your system may get into trouble if thousands of users try to log in again at the same time, but that's another problem.

The Solution

My favorite solution is the second one: keep all session data in a central place. And in this scenario I'll use Apache/PHP as my "application server" and memcached as central storage for my session data. If you read and still remember the title of this post, you're probably not surprised.

phpmemcached.jpg

On the left: my load balancer, in the middle my worker farm, and on the right: my single and central memcached server. By the way: You can also have multiple memcached servers, but for this blog post I'll keep it simple.

The Requirements

First, let's check if PHP was built with memcached support:

serverA ~% php -m | egrep memcache
memcache

...on each worker node: serverA to serverD.

Second, I check if memcached is running on serverM:

serverM ~% ps -efa | egrep memcached
oswald  1543     1   0 15:21:17 ?         0:00 /home/oswald/webstack1.5/lib/memcached -d ...

Perfecto.

The Configuration

Now I need to change the PHP configuration on each worker node: Open php.ini on serverA to serverD and search for these lines:

[Session]
; Handler used to store/retrieve data.
session.save_handler = files

And change the configuration like this:

[Session]
; Handler used to store/retrieve data.
;session.save_handler = files
session.save_handler = memcache
session.save_path = "tcp://serverM:11211"

Make sure that the settings are the same on all your workers.

That's all. Yes, that's the basic configuration. PHP's sessions will now get stored on the memcached node serverM. No more magic needed.

The Proof

But as we say in Germany: "Prudence is the mother of the china cabinet." Before we can grab the beer, we should make sure everything works as we expect it to.

I put this code in a file named session.php in the document root directory of all my worker nodes:

<?php   
	session_start();
	if(isset($_SESSION['zaphod']))
	{       
		echo "Zaphod is ".$_SESSION['zaphod']."!\n";
	}       
	else    
	{       
		echo "Session ID: ".session_id()."\n";
		echo "Session Name: ".session_name()."\n";
		echo "Setting 'zaphod' to 'cool'\n";
		$_SESSION['zaphod']='cool';
	}       
?>

From the outside I use lynx to access this file:

% lynx -source 'http://serverA/session.php'
Session ID: df58bc9465f27aa20218c11caba6750f
Session Name: PHPSESSID
Setting 'zaphod' to 'cool'

A new session with the ID df58bc9465f27aa20218c11caba6750f was created and PHP uses the session name PHPSESSID to identify the session parameter. And the session variable zaphod was set to the value cool.

Now I add the session information PHPSESSID=df58bc9465f27aa20218c11caba6750f to my URL and rerun the lynx command:

% lynx -source 'http://serverA/session.php?PHPSESSID=df58bc9465f27aa20218c11caba6750f'
Zaphod is cool!

Yes, I got the expected output: Zaphod is cool! This proves that the session data is available on serverA. But that's not a big surprise; what about the other nodes? I replace serverA with serverB in my URL:

% lynx -source 'http://serverB/session.php?PHPSESSID=df58bc9465f27aa20218c11caba6750f'
Zaphod is cool!

Bingo, serverB also has the same session data as serverA.

And for serverC? It's also the same:

% lynx -source 'http://serverC/session.php?PHPSESSID=df58bc9465f27aa20218c11caba6750f'
Zaphod is cool!

And so on... for each worker node the session data will be the same.

A dream came true.

Thursday Oct 29, 2009

I'll give a talk at Codebits 2009

vb200x150.png

In December, I'll give a talk at SAPO Codebits 2009 in Lisbon, Portugal. SAPO Codebits is a hacking event held annually in Portugal, completely organized and sponsored by SAPO, a Portuguese ISP and subsidiary of the Portugal Telecom Group.

I will be speaking about web server architectures and web services in general, and discuss the pros and cons of different programming languages (like PHP, Java, Python, Ruby, Perl, JavaScript, ASP.NET) and database technologies in the field of web application development, deployment and hosting. In order to save time and keep development costs at a reasonable level, it's very important to identify system flaws and architectural weaknesses at an early stage of the development process.

The talk shows pitfalls and common mistakes developers make when building web-based applications and also provides useful hints on how to avoid them at an early stage. It ends with a quick introduction to horizontal and vertical scaling.

So if you happen to be there, drop me a line or simply come and say hello.

PS: I heard Lenz will give a talk there too. Great news!

Apache load balancer: If a worker doesn't show up....

Since the Apache load balancer seems to be my topic of the week, let's focus on another related question: What happens if a worker (backend server) doesn't show up for work?

Let's say server B needed to go down for maintenance and is no longer available for the cluster:

loadbalancer2bsc.jpg

For this example I simply shut server B's Apache daemon down. I made no other changes to my configuration. And voila:

# repeat 12 lynx -source http://loadbalancer
This is A.
This is C.
This is D.
This is A.
This is C.
This is D.
This is A.
This is C.
This is D.
This is A.
This is C.
This is D.

The load balancer automatically notices that server B isn't available any more and simply skips it while cycling through its list of workers.

After bringing server B up again, it takes 60 seconds (a configurable default value) until server B shows up again in my cluster:

# repeat 12 lynx -source http://loadbalancer
This is A.
This is B.
This is C.
This is D.
This is A.
This is B.
This is C.
This is D.
This is A.
This is B.
This is C.
This is D.

Nice.

Wednesday Oct 28, 2009

Apache load balancer: Redirections pwned

HTTP load balancers have one natural enemy: redirections. For example, a "trailing slash" redirect is issued when the server receives a request for a URL http://servername/dir where dir is a directory. In such a case the server redirects the browser to http://servername/dir/ (including the trailing slash):

# lynx -mime_header http://loadbalancer/dir | egrep Location:
Location: http://serverA/dir/
# lynx -mime_header http://loadbalancer/dir | egrep Location:
Location: http://serverB/dir/

Accessing http://loadbalancer/dir will result in a redirect to http://serverA/dir/ (if it's serverA's turn) instead of http://loadbalancer/dir/. This happens because serverA simply doesn't know about the load balancer at all.

The solution is to tell the load balancer to rewrite all serverX addresses to the load balancer's address:

	ProxyPassReverse / http://serverA/
	ProxyPassReverse / http://serverB/
	ProxyPassReverse / http://serverC/
	ProxyPassReverse / http://serverD/

Now all server-generated redirects will get rewritten to the load balancer's address:

# lynx -mime_header http://loadbalancer/dir | egrep Location:
Location: http://loadbalancer/dir/

Of course in real life the load balancer address would be something like http://www.sun.com.

About

Kai 'Oswald' Seidler writes about his life as co-founder of Apache Friends, creator of XAMPP, and technology evangelist for web tier products at Sun Microsystems.
