Wednesday Jan 20, 2010

Good Idea: Python with FastCGI (mod_fcgid)

A couple of days ago, I stumbled over an installation in which CGI was used to run a Python-based web application. Of course the applications ran terribly slow, and as I mentioned earlier in »Save energy! Stop using CGI!«, it's (nowadays) always a bad idea to use CGI. Not only it's tediously slow and bad software design, it's also soooo 90's.

What's the difference between CGI and FastCGI?

Let me use a metaphor to start. Imagine a well...

python_cgi.jpg python_fastcgi.jpg
Here you see the old-fasioned way of CGI: For every request you have to let the bucket all the way down into the well (fork a new process), allowing water to enter the bucket (initialize and execute your application), pull the bucket up to the surface and empty it (send the data to the web server and free all allocated memory). And here is the modern FastCGI way: Install the faucet (start the FastCGI process) and every time you need water, turn it on (connect and send a request), get water (calculate and get the answer), and turn it off (close the connection). No need to fork, initialize your application, and free the allocated memory on every single request.

Okay, seriously, let me show you how this works in practice.

Installing Python

For this demo I use Sun's Web Stack. It's probably the easiest way to demonstrate the performance differences between CGI and FastCGI. XAMPP doesn't support FastCGI, because with mod_perl for Perl and mod_php for PHP there is no real need for a FastCGI interface.

First, let me add Python to my basic web stack installation:

[oswald@sol10u7 ~/webstack1.5]% bin/pkg install sun-python26
DOWNLOAD                                    PKGS       FILES     XFER (MB)
Completed                                    1/1   2784/2784   12.65/12.65 

PHASE                                        ACTIONS
Install Phase                              2861/2861 
PHASE                                          ITEMS
Reading Existing Index                           7/7
Indexing Packages                                1/1
[oswald@sol10u7 ~/webstack1.5]% bin/setup-webstack

If you're familiar with Sun's Web Stack, you'll have noticed that I'm using the IPS installation of Web Stack. That's my favorite installation way, because it allows me to place the Web Stack in any directory I want and also allows me to run the stack without the need of root privileges.

Python with CGI

Setting up CGI is very, very easy and probably that's exactly the reason why so many people still use it.

Let me start with a simple "Hello World!" Python CGI script:

#!/home/oswald/webstack1.5/bin/python
print ""
print "Hello World!"

I named this file hello.py and put it into the cgi-bin directory of my Apache installation. In the case of Web Stack it's var/apache2/2.2/cgi-bin. Add execute permissions:

[oswald@sol10u7 ~]% chmod a+x var/apache2/2.2/cgi-bin/hello.py

Now I log into another box on the same network and use my favorite command-line web browser Lynx to test the newly created Hello World CGI:

[oswald@debian50 ~]% lynx -source http://sol10u7/cgi-bin/hello.py
Hello World!

Looks good. Now let's benchmark this script:

[oswald@debian50 ~]% ab -n 1000 http://sol10u7/cgi-bin/hello.py
...
Time taken for tests:   31.083 seconds
...
Total transferred:      256000 bytes
HTML transferred:       13000 bytes
Requests per second:    32.17 [#/sec] (mean)
...

32 requests/second. That's nothing to be proud of!

Python with FastCGI

And now let's try FastCGI by adding Apache's mod_fcgid to the Web Stack installation:

[oswald@sol10u7 ~/webstack1.5]% bin/pkg install sun-apache22-fcgid 
DOWNLOAD                                    PKGS       FILES     XFER (MB)
Completed                                    1/1         6/6     0.09/0.09 

PHASE                                        ACTIONS
Install Phase                                  24/24 
PHASE                                          ITEMS
Reading Existing Index                           7/7
Indexing Packages                                1/1
[oswald@sol10u7 ~/webstack1.5]% bin/setup-webstack

Activate the default configuration:

[oswald@sol10u7 ~/webstack1.5]% cp etc/apache2/2.2/samples-conf.d/fcgid.conf etc/apache2/
2.2/conf.d/

For those, who are not able or don't want to use Sun's Web Stack, the above fcgid.conf file basically contains the following directives:

LoadModule fcgid_module libexec/mod_fcgid.so
SharememPath /home/oswald/webstack1.5/var/run/apache2/2.2/fcgid_shm
SocketPath /home/oswald/webstack1.5/var/run/apache2/2.2/fcgid.sock
AddHandler fcgid-script .fcgi
<Location /fcgid>
    SetHandler fcgid-script
    Options ExecCGI
    allow from all
</Location>

As usual after changing Apache's configuration, we need to reload (aka graceful restart) the Apache to let the new configuration take effect:

[oswald@sol10u7 ~/webstack1.5]% apache2/2.2/bin/apachectl graceful

Now I create a new directory named fcgid directly inside of Apache's document root folder and change into that folder:

[oswald@sol10u7 ~/webstack1.5]% mkdir var/apache2/2.2/htdocs/fcgid
[oswald@sol10u7 ~/webstack1.5]% cd var/apache2/2.2/htdocs/fcgid

To let Python to talk with my Apache's mod_fcgid I need to install a so-called Python FastCGI/WSGI gateway. There are several solutions available for Python, but I personally prefer Allan Saddi's fcgi.py:

[oswald@sol10u7 htdocs/fcgid]% wget -q http://svn.saddi.com/py-lib/trunk/fcgi.py

The "Hello World!" Python FastCGI script looks a little different this time:

#!/home/oswald/webstack1.5/bin/python
from fcgi import WSGIServer
def app(environ, start_response):
	start_response('200 OK', [('Content-Type', 'text/html')])
	return('''Hello world!\\n''')
WSGIServer(app).run()

This time it's not the output of a script which is sent back to the browser, it's the return value of a function add() defining the data which goes to the user's browser. In this case it's the simple character string "Hello World!\\n".

Like in the CGI example above, the Python script needs to be executable:

[oswald@sol10u7 htdocs/fcgid]% chmod a+x hello.py

The content of my fcgid directory now looks like this:

[oswald@sol10u7 htdocs/fcgid]% ls -l
total 90
-rw-r--r--   1 oswald   other      44113 Jul 26  2006 fcgi.py
-rwxr-xr-x   1 oswald   other        223 Jan 19 12:48 hello.py

And - like in my CGI example above - I now test the script with Lynx:

[oswald@debian50 ~]% lynx -source http://sol10u7/fcgid/hello.py
Hello world!

And after everything looks fine, I start a little benchmark:

[oswald@debian50 ~]% ab -q -n 1000 http://sol10u7/fcgid/hello.py
...
Time taken for tests:   1.747 seconds
...
Total transferred:      235000 bytes
HTML transferred:       13000 bytes
Requests per second:    572.44 [#/sec] (mean)
...

Yes, gotcha. 572 requests per seconds: that sounds reasonable. Remember the 32 requests/second from CGI? Do you want the well or do you take the faucet? Sure, implementing a FastCGI program is far more challenging then coding a simple CGI solution, but 572 against 32 requests per second? Do I need to say more?

Fotos: On the right "Faucet" by Joe Shlabotnik, and on the left "Well" by echiner1. Both licensed under Creative Commons.

Thursday Jan 07, 2010

Cache, cache, cache! (Part 3: What to cache?)

Happy New Year everyone! Hope you could enjoy your holidays!!

Let's start this year with the third part of my little series of thoughts about caching. After my small memcached intro and thoughts about caching architectures, I now focus on the data you should consider to cache in your web application.

cache-what.png

[1] Cache HTML

Obviously the biggest performance win you can achieve is by caching the whole output of your web application: a simple reverse proxy scenario. This works very well for mostly static pages, but for highly dynamical and user-specific content this is not an option: there is no advantage in caching a web page, which gets obsolete within the next moment.

Probably the best way to solve this dilemma is to implement a so-called partial-page cache: Let your application cache just portions of the page and leave the rest, where it makes no sense to cache, dynamic.

It's very important that you implement this in a very top layer of your application. Probably exactly that layer, which software architects will call presentation layer. Sure, this is likely to break you framework architecture, but to quote chapter 55 of the Tao Te Ching:

The movement of the Tao
By contraries proceeds;
And weakness marks the course
Of Tao's mighty deeds.

But seriously: If you have to stay in the boundaries of a framework, Ajax is a good way to bypass this restrictions and helps to implement such a cache in a restricted architecture. But be aware that this will raise the number of HTTP requests on your frontend web servers.

An effective caching strategy will always mess your beautifully designed software architecture up. Having just one (central) caching layer looks great in system diagrams and it's better than no cache at all, but it's definitely not the end of the rope.

[2] Cache complex data structures

If you don't want to break your framework architecture or you don't like the idea of caching HTML at all, and I totally understand your point, you should consider about caching other (lower level, but still complex) structures of data.

Some examples for suitable data structures:

  • user profiles
  • friends lists
  • current user lists
  • list of locations, branches, countries, languages, ...
  • top 10 (whatever) lists
  • public statistical data
  • ...

The main challenge lies in identifying the most proper data structures. This is no easy task and strongly depends on the kind of web application you run or plan to run. Avoid caching simple data sets, like row-level data from the database. Don't think row-level. That's the best advice you should keep in mind. (Note to myself: I need to put this on a t-shirt. I found this phrase in Memcached Internals, a wonderful article inspired by a talk by Brian Aker and Alan Kasindorf.)

At a first glance Ajax may be an obvious technology to combine with such a cache. But please be aware that moving application logic away from the server-side application to the client side is always a very dangerous task, which easily may compromise the security of your application.

Which allows me to end this post with another quote from Laozi (Tao Te Ching, chapter 63):

All difficult things in the world
are sure to arise from a previous state
in which they were easy.
About

Kai 'Oswald' Seidler writes about his life as co-founder of Apache Friends, creator of XAMPP, and technology evangelist for web tier products at Sun Microsystems.

Search

Categories
Archives
« January 2010
SunMonTueWedThuFriSat
     
1
2
3
4
5
6
8
9
10
11
12
14
15
16
17
18
19
21
22
23
24
25
26
27
28
29
30
31
      
Today