Tuesday Oct 27, 2009

Easy HTTP load balancing with Apache

Usually a single AMP system is enough to serve - let's say - around 500 concurrent users. Sometimes more, sometimes less, strongly depending on the particular web application, the overall architecture of your system, of course the hardware itself, and how you define "concurrent users".

Nevertheless, if your server gets too slow, you'll need to take action. You can upgrade your server to the maximum (aka vertical scaling), optimize your software (aka refactoring), and finally add more servers (aka horizontal scaling). The whole process of horizontal scaling is quite complex and far too much for a single blog post, but here's a first shot. Others will follow.

Today I'll focus on one single aspect of horizontal scaling: an HTTP load balancer.


On the left: a whole crowd of people ready to visit our web site. On the right: our server farm (called workers). And in the middle: our current hero, the load balancer. The purpose of the load balancer (in this case an HTTP load balancer) is to distribute all incoming requests to our backend web servers. The load balancer hides all our backend servers from the public, and from the outside it looks like a single server doing all of the work.

The Recipe

Okay, let's start. Step by step.

  1. Since version 2.2 the Apache web server ships a load balancer module called mod_proxy_balancer. All you need to do is to enable this module and the modules mod_proxy and mod_proxy_http:

    LoadModule proxy_module mod_proxy.so
    LoadModule proxy_http_module mod_proxy_http.so
    LoadModule proxy_balancer_module mod_proxy_balancer.so

    Please don't forget to load mod_proxy_http, because you wouldn't get any error messages if it's not loaded. The balancer just won't work.

  2. Because mod_proxy makes Apache become an (open) proxy server, and open proxy servers are dangerous both to your network and to the Internet at large, I completely disable this feature:

    	ProxyRequests Off
    	<Proxy *>
    		Order deny,allow
    		Deny from all
    	</Proxy>

    The load balancer doesn't need this feature at all.

  3. Now I need to make sure all my backend web servers have the same content:

    serverA htdocs% cat index.html
    This is A.
    serverB htdocs% cat index.html
    This is B.
    serverC htdocs% cat index.html
    This is C.
    serverD htdocs% cat index.html
    This is D.

    Okay, in this case the content differs, but I need this to show how the load balancer works.

  4. And here's the actual load balancer configuration:

    	<Proxy balancer://clusterABCD>
    		BalancerMember http://serverA
    		BalancerMember http://serverB
    		BalancerMember http://serverC
    		BalancerMember http://serverD
    		Order allow,deny
    		Allow from all
    	</Proxy>
    	ProxyPass / balancer://clusterABCD/

    The <Proxy>...</Proxy> container defines which backend servers belong to my balancer. I chose the name clusterABCD for this server group, but you are free to choose any name you want.

    And the ProxyPass directive instructs the Apache to forward all incoming requests to this group of backend servers.

  5. That's all? Yes, that's all. Here's the proof:

    # repeat 12 lynx -source http://loadbalancer
    This is A.
    This is B.
    This is C.
    This is D.
    This is A.
    This is B.
    This is C.
    This is D.
    This is A.
    This is B.
    This is C.
    This is D.

    Each request to the load balancer is forwarded to one of the backend servers. By default Apache simply counts the number of requests and makes sure every backend server gets the same number of requests forwarded.

    If you want to know more about available balancing algorithms please refer to Apache's mod_proxy_balancer manual.
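
To make the default behaviour a bit more tangible, here is a small PHP simulation of request-counting balancing. It is only a sketch and not Apache's actual code: the function name pick_worker() and the equal weights are assumptions for illustration, loosely following the lbfactor/lbstatus description in the mod_proxy_balancer manual.

```php
<?php
// Simplified simulation of request-counting balancing (an illustration,
// not Apache's implementation). Every worker has a weight ("factor")
// and a running "status" value.
function pick_worker(array &$status, array $factor): string
{
    $total = array_sum($factor);
    $best = null;
    foreach ($factor as $name => $f) {
        // Every candidate earns its weight on each request
        $status[$name] += $f;
        if ($best === null || $status[$name] > $status[$best]) {
            $best = $name;
        }
    }
    // The chosen worker pays back the sum of all weights
    $status[$best] -= $total;
    return $best;
}

// Assumption: four equally weighted workers, as in the recipe above
$factor = ['serverA' => 1, 'serverB' => 1, 'serverC' => 1, 'serverD' => 1];
$status = array_fill_keys(array_keys($factor), 0);

$picked = [];
for ($i = 0; $i < 12; $i++) {
    $picked[] = pick_worker($status, $factor);
}

// Prints serverA serverB serverC serverD, repeated three times
echo implode(' ', $picked), "\n";
```

With equal weights this degenerates into a plain round robin, which matches the A, B, C, D pattern in the lynx output.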

Did you ever imagine setting up a load balancer would be this easy? Of course, there is more to say about (HTTP) load balancing and much more about vertical scaling too, but this is only a blog post and not the place for such an expansive reference. If time and space allow, I'll go into further detail on this in the near future.

Monday Oct 26, 2009

Roller: Category list with entry counter

Sun Microsystems' employee blogging site blogs.sun.com (affectionately named BSC) uses Apache Roller to manage the site and house all the blogs. Roller is open source Java blog software that also drives, for example, the US Government's blog.usa.gov and the IBM developerWorks blogs. But there is one feature I really missed: an entry counter in my sidebar's category list. It's a standard feature in the WordPress world, but in Roller I didn't find any option to enable such a counter.


But as they say, "If the mountain won't come to Mohammed, then Mohammed will go to the mountain": if the feature won't come to me, I'll have to take care of it myself.

And here is my mountain:

#set($rootCategory = $model.weblog.getWeblogCategory("nil"))

#set($cats = $rootCategory.getWeblogCategories())
#foreach($cat in $cats)
  #set($entriesList = $model.weblog.getRecentWeblogEntries($cat.name, 500))
  #set($count = $entriesList.size())
  #if($model.weblogCategory && $model.weblogCategory.path == $cat.path)
    <li class="selected"><a href="$url.category($cat.path)">$cat.name ($count)</a></li>
  #else
    <li><a href="$url.category($cat.path)">$cat.name ($count)</a></li>
  #end
#end

It's quite easy and straightforward: get a list of all categories, for each category get all the blog entries (in this case limited to 500, because I don't know how this scales), count the entries, and finally generate some HTML.

If only everything were that easy! :)

Thursday Oct 22, 2009

Typographic headlines with PHP (Part 2)

Yesterday, I started a small tutorial on how to implement typographic headlines with PHP. There were some aspects to be aware of, but in general it was an easy and straightforward process. The final result looked like this:


But there was one big issue I had with my script: it was far too slow (33 requests per second) for use in a production environment. So today I'll extend my previous script with a simple caching mechanism to make it ready for the real world.

Welcome to the Pleasuredome

Basically that's where I left off yesterday:

$font_file="./FFF Tusj.ttf";
$font_size = 64;
$text = "An Example";

$bb = imagettfbbox($font_size,0,$font_file,$text);

$bb_width = $bb[4]-$bb[6];
$bb_height = $bb[3]-$bb[5];

$image  =  imagecreate($bb_width, $bb_height);     

$fillcolor  =  imagecolorallocate($image, 255, 255, 255);    
$fontcolor  =  imagecolorallocate($image, 69, 138, 186);    

imagefill($image, 0, 0, $fillcolor);    

imagettftext($image, $font_size, 0, abs($bb[6]), $bb_height-$bb[3], $fontcolor, 
		$font_file, $text);

header("Content-Type: image/png");
imagepng($image);

I'll put this in a function - let's say - fancyheadline($text), add two lines of code at the beginning of this function, and change the last lines a little bit.

The final script will look like this (changes highlighted):

function fancyheadline($text)
{
	// Cache file name derived from the MD5 hash of the headline text
	$cache = "cache/".md5($text).".png";

	// Render the image only if it is not in the cache yet
	if(!file_exists($cache))
	{
		$font_file="./FFF Tusj.ttf";
		$font_size = 64;

		$bb = imagettfbbox($font_size,0,$font_file,$text);

		$bb_width = $bb[4]-$bb[6];
		$bb_height = $bb[3]-$bb[5];

		$image  =  imagecreate($bb_width, $bb_height);

		$fillcolor  =  imagecolorallocate($image, 255, 255, 255);
		$fontcolor  =  imagecolorallocate($image, 69, 138, 186);

		imagefill($image, 0, 0, $fillcolor);

		imagettftext($image, $font_size, 0, abs($bb[6]), $bb_height-$bb[3],
				$fontcolor, $font_file, $text);

		// Write the PNG to the cache file instead of sending it to the browser
		imagepng($image, $cache);
	}

	return '<img src="'.$cache.'" alt="'.htmlspecialchars($text).'">';
}

echo fancyheadline("An Example");

What's happening here is that I first generate an MD5 hash of my headline string $text and check whether a cache file named after this hash exists. If it doesn't exist, I create the file containing the rendered headline image. If it exists, I do nothing.

At the end of the script I return a string consisting of an IMG HTML tag which displays the cached image file. (With htmlspecialchars() I convert special characters like ", &, < and > to their corresponding HTML entities, because these characters may break the validity of the HTML.)
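
The cache lookup described above boils down to mapping a headline text to a file name. Here is a minimal sketch of just that mapping; the helper name headline_cache_path() and its $dir parameter are made up for this illustration and are not part of the original script.

```php
<?php
// Hypothetical helper: maps a headline text to its cache file path.
function headline_cache_path(string $text, string $dir = 'cache'): string
{
    // md5() always returns 32 lowercase hex characters, so the file
    // name is safe no matter which characters $text contains.
    return $dir . '/' . md5($text) . '.png';
}

$a = headline_cache_path('An Example');
$b = headline_cache_path('An Example');
$c = headline_cache_path('Another Example');

var_dump($a === $b);   // bool(true): same text, same cache file
var_dump($a === $c);   // bool(false): different text, different file
```

One thing to keep in mind with this scheme: the key depends on the text only, so after changing the font or the colors the cache directory has to be emptied by hand.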

Before I can run my script, I need to create the cache directory and give it appropriate permissions:

% mkdir cache
% chmod a+rwx cache

On a production server you probably don't want to give write permissions to everyone; instead, change the ownership of the directory to www-data or whatever user your web server runs as.

Now it's time to access the script in my trusted browser:


This looks exactly like yesterday's original example above, but this time the script generates the headline image "in the background" and only outputs some HTML code referring to this image:

<img src="cache/e6cf1c67e6acfa204bb784cd6b25839f.png" alt="An Example">

In other words: I reduced the use and need of PHP as much as possible.

Final words by ApacheBench

First let's check the PHP script itself:

% ab -n 1000 http://demo/headline.php
This is ApacheBench, Version 2.3
Document Path:          /headline.php
Document Length:        71 bytes
Requests per second:    1801.51 [#/sec] (mean)

1800 requests per second. Yes, that's what I wanted to hear. But it's easy to explain: The headline image is generated only once, upon the first request. And for all following requests the script only refers to the already generated image.

And if I benchmark the image itself:

% ab -n 1000 http://demo/cache/e6cf1c67e6acfa204bb784cd6b25839f.png
This is ApacheBench, Version 2.3
Document Path:          /cache/e6cf1c67e6acfa204bb784cd6b25839f.png
Document Length:        7888 bytes
Requests per second:    3308.09 [#/sec] (mean)

Of course: it's now just static data, so I get the most performance possible out of my server.

I started with 33 requests per second and ended up somewhere between 1800 and 3300 requests per second.

Moving at one million miles an hour... Welcome to the Pleasuredome!

Wednesday Oct 21, 2009

Typographic headlines with PHP (Part 1)

My recent blog post about scaling images with PHP gave me the idea to write something about creating typographic headlines with PHP. At Apache Friends we've been using this technique for many years to get away from the usual boring "web fonts" that are available everywhere, like Helvetica, Times and Verdana.

For this example I chose the font Tusj by Norwegian graphic designer Magnus Cederholm. Okay, this font only works for very large headlines, but it looks cool and it's perfect for this demo's purposes because the TTF file is huge (1.5 MB), which makes the processing in PHP quite slow. (Yes, in this case, I want to slow down my PHP script.)

The Basics

First, I define some basic parameters: the TTF font file, the font size, and an example text.

$font_file = "./FFF Tusj.ttf";
$font_size = 64;
$text = "An Example";

In the next step I have to find out how big my image needs to be to hold the rendered text. That's not so trivial because it strongly depends on the characters used, the chosen size and of course the font itself. To solve this problem PHP offers the imagettfbbox() function:

$bb = imagettfbbox($font_size, 0, $font_file, $text);
$bb_width = $bb[4]-$bb[6];
$bb_height = $bb[3]-$bb[5];
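
The index arithmetic is easier to follow with the layout of the returned array in mind. Here is a small sketch with a made-up bounding box; the values are invented for illustration and not measured from the Tusj font, and bbox_size() is a hypothetical helper name.

```php
<?php
// Index layout of the array returned by imagettfbbox():
//   0,1 = lower left (x,y)    2,3 = lower right (x,y)
//   4,5 = upper right (x,y)   6,7 = upper left (x,y)
function bbox_size(array $bb): array
{
    $width  = $bb[4] - $bb[6];   // upper right x minus upper left x
    $height = $bb[3] - $bb[5];   // lower right y minus upper right y
    return [$width, $height];
}

// Invented bounding box: the text starts 2 pixels left of the origin,
// rises 66 pixels above the baseline and drops 14 pixels below it.
$bb = [-2, 14, 398, 14, 398, -66, -2, -66];

list($bb_width, $bb_height) = bbox_size($bb);
echo "{$bb_width}x{$bb_height}\n";   // prints "400x80"
```

The negative coordinates are the reason why the text is later drawn at an x offset of abs($bb[6]) and a y offset of $bb_height-$bb[3].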

With imagecreate() I can now create the image using $bb_width and $bb_height for the size:

$image = imagecreate($bb_width, $bb_height);    

Define two colors: one for the background and one for the foreground.

$fillcolor = imagecolorallocate($image, 255, 255, 255);    
$fontcolor = imagecolorallocate($image, 69, 138, 186);    

Fill the background using $fillcolor:

imagefill($image, 0, 0, $fillcolor);    

Render the text to the image:

imagettftext($image, $font_size, 0, abs($bb[6]),$bb_height-$bb[3], $fontcolor,
		$font_file, $text);

I don't want to go into the details of this function; for a detailed explanation of all parameters please refer to the PHP manual.

And finally send it to the browser:

header("Content-Type: image/png");
imagepng($image);

Basically that's all you need to do. Here's the output of the above PHP script:


Because I used imagecreate() and imagepng() the file I got is an indexed-colored 8-bit PNG with a file size of 8 KB.

Some Variations

If I use imagecreatetruecolor() and imagepng() I will get a truecolored 24-bit PNG with a file size of 41 KB:


And if I use imagejpeg() instead of imagepng() I will get a truecolored 24-bit JPEG with a quality setting of 100 and a file size of 41 KB:


All three variations look exactly the same, but the indexed-colored 8-bit PNG is the smallest one. So for this purpose an 8-bit PNG seems to be the best choice.

Turning off antialiasing?

By default imagettftext() uses an antialiasing algorithm to smooth the output. Using the negative of a color index turns this feature off:

imagettftext($image, $font_size, 0, abs($bb[6]),$bb_height-$bb[3], -$fontcolor,
		$font_file, $text);

Sometimes (usually in case of small font sizes) this will give you a sharper and better looking result, but in this special case it definitely looks worse:


Welcome to the Real World

As I mentioned in the beginning, I intentionally chose a font based on a very large TTF file, which makes it very expensive for PHP to render a headline. Let's take a look at some quick benchmark results:

% ab -n 1000 http://demo/headline.php
This is ApacheBench, Version 2.3
Benchmarking demo (be patient)
Document Path:          /headline.php
Document Length:        7888 bytes
Requests per second:    33.52 [#/sec] (mean)

Ouch... 33 requests per second is far too slow for a real-world scenario. Yes, if I had chosen a smaller font the results would be much better, but the script would probably still not be suitable for use in a production environment. However, a simple caching mechanism should easily solve this issue.

But not today, stay tuned for part 2 of this tutorial. Live long and prosper.

Tuesday Oct 20, 2009

Scaling images in PHP (done right)

Scaling images in PHP is quite easy, but there are some things to consider. (If you're short of time, right at the end you'll find the final script.)

Read the original image with imagecreatefromjpeg()

First of all you'll need to read the original image. If it's a JPEG file the imagecreatefromjpeg() function is the right choice:

	$source_image = imagecreatefromjpeg("osaka.jpg");

If it's a GIF file you'll take imagecreatefromgif(), and if it's a PNG you will prefer imagecreatefrompng().
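
If the script has to deal with more than one file type, the choice of loader can be made from the file extension. This is a minimal sketch; image_loader_for() is a hypothetical helper name, and trusting the extension is a simplifying assumption.

```php
<?php
// Hypothetical helper: picks the GD loader function by file extension.
function image_loader_for(string $file): ?string
{
    $loaders = [
        'jpg'  => 'imagecreatefromjpeg',
        'jpeg' => 'imagecreatefromjpeg',
        'gif'  => 'imagecreatefromgif',
        'png'  => 'imagecreatefrompng',
    ];
    $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
    return $loaders[$ext] ?? null;   // null means: unsupported extension
}

echo image_loader_for('osaka.jpg'), "\n";   // imagecreatefromjpeg
echo image_loader_for('logo.PNG'), "\n";    // imagecreatefrompng
var_dump(image_loader_for('notes.txt'));    // NULL
```

Since PHP allows calling a function through a string variable, the returned name can be used directly, for example $loader = image_loader_for($file); followed by $source_image = $loader($file); once $loader has been checked against null.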

For this small tutorial I'll use this image from the Osaka Aquarium Kaiyukan.

Get the size of the original image: getimagesize() vs. imagesx() and imagesy()

Reading the image is quite easy, but you'll encounter the first pitfall as soon as you prepare to scale it and need to find out the dimensions of the original image.

Most tutorials will propose this way:

	$source_image = imagecreatefromjpeg("osaka.jpg");
	$source_image_size = getimagesize("osaka.jpg");

The problem with the getimagesize() function is that it needs to reopen the file to get the actual image size. This is usually not a big issue if you're reading a local file, but it will get critical if you're reading a file from the network like in:

	$source_image = imagecreatefromjpeg("http://someurl/osaka.jpg");
	$source_image_size = getimagesize("http://someurl/osaka.jpg");

Every time you call getimagesize() the whole image file gets transferred over the network again, and that quickly becomes a real performance problem. Since you already have the image loaded with imagecreatefromjpeg(), there is no need to load it again:

	$source_image = imagecreatefromjpeg("osaka.jpg");
	$source_imagex = imagesx($source_image);
	$source_imagey = imagesy($source_image);

Create the destination image: imagecreate() vs. imagecreatetruecolor()

Now we need to prepare the (scaled) destination image. There are two PHP functions which can be used to create an image: imagecreate() and imagecreatetruecolor(). The first creates a palette-based image (with a maximum of 256 different colors), and the second one creates a true color image (with a maximum - as far as I know - of 256*256*256 = roughly 16.7 million colors).

Let's compare the results of both functions: On the left imagecreate() and on the right imagecreatetruecolor():


It's obvious: As long as you work with photographic images you'll need more than 256 colors.

So let's decide to use imagecreatetruecolor() and define a target size of 300x200 pixels:

	$dest_imagex = 300;
	$dest_imagey = 200;
	$dest_image = imagecreatetruecolor($dest_imagex, $dest_imagey);
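
The fixed 300x200 target only fits sources with a 3:2 aspect ratio; other images would get stretched. As a hedged sketch, the target size could be computed from the source size instead. fit_within() is a hypothetical helper, and the sample dimensions are made up.

```php
<?php
// Hypothetical helper: largest size that fits into the given box
// while keeping the aspect ratio of the source image.
function fit_within(int $srcW, int $srcH, int $maxW, int $maxH): array
{
    // The smaller of the two ratios guarantees both sides fit
    $scale = min($maxW / $srcW, $maxH / $srcH);
    return [
        max(1, (int) round($srcW * $scale)),
        max(1, (int) round($srcH * $scale)),
    ];
}

// A 3:2 source fills the 300x200 box completely
list($dest_imagex, $dest_imagey) = fit_within(1200, 800, 300, 200);
echo "{$dest_imagex}x{$dest_imagey}\n";   // 300x200

// A 2:1 source is limited by its width
list($w, $h) = fit_within(1000, 500, 300, 200);
echo "{$w}x{$h}\n";                        // 300x150
```

In the tutorial's script the two source arguments would be $source_imagex and $source_imagey.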

Scale the image: imagecopyresized() vs. imagecopyresampled()

Now it's time to do the actual scaling of the image. Again PHP offers two functions for this purpose: imagecopyresized() and imagecopyresampled(). The first one uses a very simple algorithm to scale the image; it's fast, but the quality is really poor. The second one uses a better but slower algorithm, resulting in a very high quality image.

Poor quality, but fast:

	imagecopyresized($dest_image, $source_image, 0, 0, 0, 0, 
				$dest_imagex, $dest_imagey, $source_imagex, $source_imagey);

Best quality, but slow:

	imagecopyresampled($dest_image, $source_image, 0, 0, 0, 0, 
				$dest_imagex, $dest_imagey, $source_imagex, $source_imagey);

I don't want to go into the details of these functions; for a detailed explanation of all parameters please refer to the PHP manual.

Let's compare the results: On the left imagecopyresized() and on the right imagecopyresampled():


Again it's obvious: The quality of imagecopyresampled() is much better. In my opinion there is never a reason to use the faster imagecopyresized(). Why would I ever want a low quality image? Even if it's faster to get?

And finally push the image to the browser: imagejpeg() vs. imagepng()

After scaling the image, we now need to push the image to the user's browser. Probably the most popular image formats on the Internet are currently PNG and JPEG. Both work great with photographic images, but true-color, lossless PNG images usually result in larger file sizes than JPEG images.

To send a PNG image (with best compression rate 9) to the browser:

	header("Content-Type: image/png");
	imagepng($dest_image, NULL, 9);

Or a JPEG image (with best quality 100):

	header("Content-Type: image/jpeg");
	imagejpeg($dest_image, NULL, 100);

And in comparison, imagepng() on the left vs. imagejpeg() on the right:


Both look absolutely the same, but the JPEG image has a size of 57 KB (using the best quality of 100) and the PNG image is 102 KB (using the highest available compression rate).

What's the best JPEG quality to choose?

JPEG images are not only smaller but also give you the flexibility to choose the quality and thereby, indirectly, the file size. In PHP you can choose the quality in a range from 0 (worst quality, smallest file) to 100 (best quality, biggest file). Let's take a look.

Quality 100 (57 KB) and quality 80 (16 KB):

scaledjpg-100.jpg scaledjpeg-80.jpg

If you look very carefully at the quality 80 version on the right, you'll see very small artifacts caused by the JPEG compression.

Quality 60 (12 KB) and quality 40 (8 KB):

scaledjpeg-60.jpg scaledjpeg-40.jpg

The loss of quality gets worse, and in my opinion the quality 40 image on the right looks terrible.

And now the whole script...

After so many words, it all comes down to a small script:

	$source_image = imagecreatefromjpeg("osaka.jpg");
	$source_imagex = imagesx($source_image);
	$source_imagey = imagesy($source_image);

	$dest_imagex = 300;
	$dest_imagey = 200;
	$dest_image = imagecreatetruecolor($dest_imagex, $dest_imagey);

	imagecopyresampled($dest_image, $source_image, 0, 0, 0, 0, $dest_imagex, 
				$dest_imagey, $source_imagex, $source_imagey);

	header("Content-Type: image/jpeg");
	imagejpeg($dest_image, NULL, 80);

In this script I used a quality of 80, that's just my personal preference. You may choose whatever you like. But please, not less than 40.


In many tutorials the PHP script ends with several imagedestroy() function calls. imagedestroy() frees any memory associated with an image. This is a good idea if you sequentially work with different image resources within a single PHP script. But if the imagedestroy() is right at the end of a script, you may omit this function. When the script ends PHP will automatically free any resources.

Wednesday Oct 14, 2009

Easy deploy your web apps with Sun GlassFish Web Stack

Today I want to show you a quick installation walkthrough of Sun GlassFish Web Stack. I'm using Solaris 10 in this walkthrough, but installation on RHEL is absolutely the same. As a small deployment example for a web application I'll do an installation of WordPress.

Web Stack Installation

  1. Okay, first step: Get the Sun GlassFish Web Stack.

    Simply enter http://sun.com/webstack in your browser.


    After clicking on the Get It button, you're asked to pick your platform: Red Hat Enterprise Linux or Solaris 10. (You may wonder why there is no OpenSolaris download. That's quite simple: the Web Stack is already shipped with OpenSolaris (2009.06), so all you need to do to install the main components is run pkg install amp, and you'll get the Apache, MySQL and PHP components of Web Stack installed right on your system.)

    After picking the platform, you need to decide which kind of distribution you want to download: native packages or the IPS-based distribution:


    The native packaging distribution (aka RPM/SVR4) comes in two versions: one including Java-based components like Tomcat, GlassFish and Hudson, and one without.

    For this demo I picked my personal favorite, the IPS-based distribution, because it allows me a non-root install and gives me the ability to place my Web Stack anywhere I want in the directory tree of my system. So I don't need to have root access and can install Web Stack as a regular user.

  2. After the download is finished I open a terminal and take a look at my system environment:

    [oswald@localhost ~]$ df -h
    Filesystem            Size  Used Avail Use% Mounted on
                          7.2G  2.3G  4.5G  34% /
    /dev/hda1              99M   12M   83M  13% /boot
    tmpfs                 125M     0  125M   0% /dev/shm
    [oswald@localhost ~]$ pwd
    [oswald@localhost ~]$ id
    uid=500(oswald) gid=500(oswald) groups=500(oswald)
    [oswald@localhost ~]$ ls -l
    total 14452
    -rw-r--r--  1 oswald 14765264 Oct 12 14:43 webstack-image-1.5-b09-redhat-i586.zip

    My current working directory is my home directory. There is enough space left on my file system. I'm a non-root, regular user, and I see the downloaded IPS installer image.

  3. Now I'm unzipping the image to my home directory:

    [oswald@localhost ~]$ unzip -q webstack-image-1.5-b09-redhat-i586.zip 

    This creates a directory named webstack1.5 by default. You can rename it to anything you want or simply keep the default name. In this case I choose to name it demo:

    [oswald@localhost ~]$ mv webstack1.5 demo

  4. Now I change into that directory and start the Update Tool:

    [oswald@localhost ~]$ cd demo
    [oswald@localhost demo]$ bin/updatetool 

    So, what we're seeing here is the Update Tool. Our current Sun GlassFish Web Stack installation is highlighted on the left-hand side. If you have other products installed or multiple Web Stack installs, all these images will show up too. In this case I have only one image, so I simply click on Available Add-ons.

    Now I pick all the components I want to use in this demo, so I select Apache HTTP server, MySQL server, PHP Server and the PHP MySQL connector. By the way, if I realize I need more packages later on, I can come back here anytime and install whatever I need. For now, I've selected all I want at this point, and all I have to do now is to press the beautiful green install button above the package list.

  5. After downloading, I get back to the shell and use the command line tool pkg to review the list of installed packages:

    [oswald@localhost demo]$ bin/pkg list
    NAME (PUBLISHER)                              VERSION         STATE      UFIX
    pkg                                           1.111.3-30.2210 installed  ----
    pkg-toolkit-incorporation                     2.2.1-30.2210   installed  ----
    python2.4-minimal                    installed  ----
    sun-apache22                                  2.2.11-1.5      installed  ----
    sun-mysql51                                   5.1.30-1.5      installed  ----
    sun-mysql51lib                                5.1.30-1.5      installed  ----
    sun-php52                                     5.2.9-1.5       installed  ----
    sun-php52-mysql                               5.2.9-1.5       installed  ----
    sun-wsbase                                    1.5-1.5         installed  ----
    updatetool                                    2.2.1-30.2210   installed  ----
    wxpython2.8-minimal                           2.8.8-30.2210   installed  ----

    You can also use pkg to install components, like in the Update Tool. So for the terminal addicted: if you don't like a GUI, there is also a command-line alternative.

  6. Okay, let's go on. I now start the Apache web server:

    [oswald@localhost demo]$ bin/sun-apache22 start
    Starting apache22

    And to check if the Apache is really running and working I'll take my browser to http://localhost. But wait a second...

    As we probably all know, by default, the Apache web server runs on port 80, which is the default port on which most Web servers run. But in the Unix world all port numbers below 1024 require root permissions to bind. And since I started Apache as a normal non-root user, the Apache will be unable to bind to port 80.

    If you use Web Stack as a non-root user, Web Stack will add 10000 to every port number below 1024. So in this case Apache will use 10080 as its favorite port.

  7. At http://localhost:10080/ we find the welcome page of Sun GlassFish Web Stack:

  8. But what's life with just Apache? Let's also start MySQL and let the database server join our team:

    [oswald@localhost demo]$ bin/sun-mysql51 start
    Starting mysql51

  9. Because everyone on my system and on my network can now access my MySQL server, I really need to secure my installation by setting a password for MySQL's root user:

    [oswald@localhost demo]$ mysql/5.1/bin/mysqladmin -u root password "demo"

    Okay, demo is probably not a secure password for a production environment, but for this demo's purposes it's a very appropriate one.

That's all. Your AMP stack is now ready.
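
The port rule from step 6 can be written down in a few lines. This is only a sketch of the rule as described there, not code taken from Web Stack; webstack_port() is a hypothetical name, and leaving ports untouched for root is my assumption.

```php
<?php
// Hypothetical helper mirroring the port rule described in step 6.
// Assumption: ports are left untouched when running as root.
function webstack_port(int $defaultPort, bool $runsAsRoot): int
{
    // Binding to ports below 1024 requires root privileges on Unix
    if (!$runsAsRoot && $defaultPort < 1024) {
        return $defaultPort + 10000;
    }
    return $defaultPort;
}

echo webstack_port(80, false), "\n";    // 10080
echo webstack_port(443, false), "\n";   // 10443
echo webstack_port(3306, false), "\n";  // 3306
echo webstack_port(80, true), "\n";     // 80
```

Ports at or above 1024, such as MySQL's 3306, are not affected by the rule.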

Example Deployment

As an example I will now install the famous blogging software WordPress within my new AMP stack. Not because it's so difficult to do, but because it's a good and pragmatic way to show that the AMP stack is working and capable of running popular web applications out of the box.

  1. First I need to create a database and a user for WordPress to access the database and to store its data.

    Okay, that should be an everyday task for a MySQL DBA:

    [oswald@localhost demo]$ bin/mysql -uroot -pdemo
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 2
    Server version: 5.1.30-log Source distribution
    Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
    mysql> CREATE DATABASE wordpress;
    Query OK, 1 row affected (0.01 sec)
    mysql> GRANT ALL PRIVILEGES ON wordpress.* TO 'wordpress'@'localhost'
        -> IDENTIFIED BY 'demo';
    Query OK, 0 rows affected (0.98 sec)
    mysql> QUIT

    I start the MySQL monitor, aka the MySQL command line tool, to connect to the database server. I use the CREATE DATABASE statement to create a database named wordpress. And with the GRANT statement I create a user named wordpress identified by the password demo. Again this is probably not a good password for production use, but for a demo this is perfecto.

  2. After changing into Apache's document root directory I wget the latest WordPress version:

    [oswald@localhost demo]$ cd var/apache2/2.2/htdocs
    [oswald@localhost htdocs]$ wget -q http://wordpress.org/latest.tar.gz

    With GNU tar I extract the archive:

    [oswald@localhost htdocs]$ /usr/sfw/bin/tar xfz latest.tar.gz

    This creates a directory called wordpress.

  3. Now I can access my WordPress installation by pointing my browser to http://localhost:10080/wordpress.


    From now on it's very easy. Simply enter the details for the database server connection:

    The database name: wordpress.
    The user name: wordpress.
    The password: demo.
    The database host: localhost.
    And the table prefix wp_.

    And on the next page:


    I choose a name for my blog, and enter my email address. Press Install WordPress. And now it only takes a few moments and the installation is done. (Write down your random admin password, you will probably need it later.)

  4. Now I want to check if the WordPress installation was successful and again point my browser to http://localhost:10080/wordpress.


    And, voilà, there it is: my shiny new "Sun GlassFish Web Stack" blog. Hooray.

That's all. Welcome to the wonderful world of AMP.

Tuesday Oct 13, 2009

What is Java?

Just learned that the question "what is java" is in third place in Google's Zeitgeist 2008 ranking of "what is" questions - right after "what is love" (first place) and "what is life" (second place).

Indeed a good question. Let's take a look at Google's top hits for this question:

About.com (Hit #9)

Java is a computer programming language. It enables programmers to write computer instructions using English based commands, instead of having to write in numeric codes. It’s known as a “high-level” language because it can be read and written easily by humans. Like English, Java has a set of rules that determine how the instructions are written. These rules are known as its “syntax”. Once a program has been written, the high-level instructions are translated into numeric codes that computers can understand and execute. (Source: http://java.about.com/od/gettingstarted/a/whatisjava.htm)

Webopedia.com (Hit #7)

A high-level programming language developed by Sun Microsystems. Java was originally called OAK, and was designed for handheld devices and set-top boxes. Oak was unsuccessful so in 1995 Sun changed the name to Java and modified the language to take advantage of the burgeoning World Wide Web.
(Source: http://www.webopedia.com/TERM/J/Java.html)

Wikipedia.org (Hit #4)

Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities. Java applications are typically compiled to bytecode (class file) that can run on any Java Virtual Machine (JVM) regardless of computer architecture. (Source: http://en.wikipedia.org/wiki/Java_(programming_language))

Boutell.com (Hit #3)

Java is a technology that allows software designed and written just once for an idealized "virtual machine" to run on a variety of real computers, including Windows PCs, Macintoshes, and Unix computers. On the web, Java is quite popular on web servers, used "under the hood" by many of the largest interactive websites. Here it serves the same role that PHP, ASP or Perl might, although traditionally Java has been used for larger-scale projects.
(Source: http://www.boutell.com/newfaq/definitions/java.html)

Java.com (Hit #1)

What is Java? Java allows you to play online games, chat with people around the world, calculate your mortgage interest, and view images in 3D, just to name a few. It's also integral to the intranet applications and other e-business solutions that are the foundation of corporate computing.
(Source: http://www.java.com/en/download/whatis_java.jsp)

As you see, it's not an easy question to answer!

Thursday Oct 08, 2009

Apache's graceful restart (reprise)

In last week's blog entry Urban legends: Apache reload(ed) I tried to prove that an Apache reload is pretty much the same as a restart of an Apache web server.

One of my dear readers - yes, at least someone seems to read this blog - pointed out that this is not always true, and that a reload sometimes works and sometimes doesn't, in contrast to a restart, which always works like a charm.

I strongly assume that's a classic observer effect:

In physics, the term observer effect refers to changes that the act of observation will make on the phenomenon being observed.

Let's imagine: You have a web site, you have an Apache web server. With your browser you're on your web site, you change your Apache's configuration, you reload your Apache, you reload your browser, and - surprise - you don't see the new configuration active. You reload your browser again and again. Still, the old configuration. What's wrong?

Okay, let's do this step by step.

  1. My Apache is running:

    # apache2ctl status
                           Apache Server Status for localhost
       Current Time: Friday, 02-Oct-2009 11:46:29 CEST
       Restart Time: Friday, 18-Sep-2009 09:55:56 CEST
       Parent Server Generation: 12
       Server uptime: 14 days 1 hour 50 minutes 32 seconds
       87 requests currently being processed, 89 idle workers

    Three children are closing the connection (C), 3 are sending a reply to a browser (W), 89 are waiting for a new connection (_) and 81 children are kept alive by KeepAlive (K). The one red K represents my own browser's connection.

  2. Now I'm reloading my Apache:

    # /etc/init.d/apache2 reload

    Wait a second, and ask again for the status:

    # apache2ctl status
                           Apache Server Status for localhost
       Current Time: Friday, 02-Oct-2009 11:47:04 CEST
       Restart Time: Friday, 18-Sep-2009 09:55:56 CEST
       Parent Server Generation: 13
       Server uptime: 14 days 1 hour 51 minutes 8 seconds
       71 requests currently being processed, 80 idle workers

    Now, after one or two seconds, 51 Apache children are waiting for their graceful end, including the G representing my own browser's connection.

  3. In parallel I'm reloading my browser (which accesses the web site my Apache is hosting) every 3 seconds.

  4. After 2 minutes I look again at my Apache's status:

    # apache2ctl status
                           Apache Server Status for localhost
       Current Time: Friday, 02-Oct-2009 11:48:40 CEST
       Restart Time: Friday, 18-Sep-2009 09:55:56 CEST
       Parent Server Generation: 13
       Server uptime: 14 days 1 hour 52 minutes 44 seconds
       77 requests currently being processed, 51 idle workers

    There is still one child process waiting for its graceful end. That's the one I kept alive with my own browser. This child still has its old configuration active, and that's why I never see the new config in my own browser, while everyone else already gets the new configuration.

  5. To catch up, I just have to wait at least KeepAliveTimeout seconds, and then do a final reload in my browser.

    # apache2ctl status
                           Apache Server Status for localhost
       Current Time: Friday, 02-Oct-2009 11:49:00 CEST
       Restart Time: Friday, 18-Sep-2009 09:55:56 CEST
       Parent Server Generation: 13
       Server uptime: 14 days 1 hour 53 minutes 4 seconds
       63 requests currently being processed, 76 idle workers

    No more gracefully dying children anywhere. And finally I see the new configuration in my own browser.

That's the reason why people think an Apache reload sometimes works and sometimes doesn't.
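Instead of guessing how long to wait, the number of children still finishing gracefully can be counted from the status output. The following is only a minimal sketch: the function name count_graceful is made up for this example, and it assumes the scoreboard appears in the status text as lines made up solely of the mod_status key characters (_ S R W K D C L G I and the dot).

```shell
# Count the 'G' (gracefully finishing) slots in a status report read from stdin.
# Only pure scoreboard lines are considered, so a 'G' in ordinary text such as
# "Parent Server Generation" is not counted.
count_graceful() {
  grep -E '^[[:space:]]*[_SRWKDCLGI.]+[[:space:]]*$' |
    tr -cd 'G' |
    wc -c |
    tr -d '[:space:]'
}
```

Used as apache2ctl status | count_graceful, a result of 0 would mean that no child of the old generation is left.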

Wednesday Oct 07, 2009

Qs about you and Linux

My last blog entry about Linus Torvalds' thoughts on the goto statement brought an old email interview back to my mind, which I had the honor of conducting with him a long time ago, in 1994.

From cs.Helsinki.FI!Linus.Torvalds Wed Jun 22 20:45:59 1994
From: torvalds@cs.Helsinki.FI (Linus Torvalds)
Date: Wed, 22 Jun 1994 21:45:52 +0300
In-Reply-To: Kai Seidler's message as of Jun 22, 12:13
X-Mailer: Mail User's Shell (7.2.4 2/2/92)
To: oswald@duplox.wz-berlin.de (Kai Seidler)
Subject: Re: Qs about you and linux
Status: RO

Kai Seidler: "Qs about you and linux" (Jun 22, 12:13):
> Did your role as linux programmer has changed over the time? From the
> alone linux programmer (1991) to a linux god? How much time do you
> spend today in programming in opposition to manage kernel-patches,
> answer stupid questions (about bugs, and like this one :), visit
> congresses?

Oh, it has changed, all right.  In 1991, I essentially coded 8 hours a
day and didn't mind about other people or "secondary" stuff like
portability etc.  As it stands now, I get to code occasionally when I
find some time and have something interesting to do, but most of my
linux time is simply "management" these days.  I'm not wearing any
suits, though :-)

Just reading mail takes about 2 hours a day - I also read the newsgroups
when I can, but that usually means just col.announce and selected
articles from col.development.  Applying patches isn't that bad: I have
people I trust that do the large patches and then I just need to check
them over against obvious problems.  The "un-trusted" patches are much

Actually, the above sounds worse than it is.  The fact is that the basic
kernel mostly works well enough and one reason for me not coding quite
as much as I use to do is simply that the basic functionality that I've
personally always concentrated on doesn't need that much care any more. 
The patches these days are mostly networking and device drivers with the
occasional smaller stuff elsewhere. 

Conferences haven't been a problem until recently, and on the whole they
haven't really proved too distractive.  I don't really like giving
talks, but I like meeting people and traveling and I also feel it's
simething that needs to be done at this point. 

> Why do you, and all the other people, such an enormous work for free?
> What do you mean? Is it fame? What did you get back from the Linux
> community?

Well, the fame certainly doesn't hurt, of course: I expect to be able to
get a good job once I get my studies completed and decide to leave the
university.  But mostly it's just a project I like doing, and one which
people appreciate.  A hobby of the best kind, in short..  I think that's
true for most of the kernel developers.

> How is your relation to the FreeBSD community? As far as I see, the 
> Linux and FreeBSD don't like each other very much, but I may be wrong.
> Maybe it's "only" the old "war" between SysV and BSD?

Actually, we are on a friendly standing with the FreeBSD people (I
haven't been much in contact with NetBSD).  The communities easily get
inte flames over which is better, but I know both the linux and BSD
kernel developers are much too involved with their (our) own projects to
really mind any of the flames. 

I've met with some of the FreeBSD people a few times (on conferences),
and they are nice (jkh has something like 11 cats: I just have two).
It's hard to co-operate too much, though: it takes a lot of time and it
seems to actually be easier just to concentrate on your own project.


That was 15 years ago. Nobody would have thought at that time that Linux would later become such a big competitor for commercial Unix-esque operating systems.

Tuesday Oct 06, 2009

Is goto the root of all evil?

When PHP 5.3 introduced the goto statement, the old discussion about the evilness of goto came back to the surface of the Internet.

By coincidence I stumbled over this six-year-old discussion with Linus Torvalds about this topic. He argues that goto can make source code more readable, but read his thoughts and decide for yourself:

From: Linus Torvalds
Subject: Re: any chance of 2.6.0-test\*?
Date: 	Sun, 12 Jan 2003 12:22:26 -0800 (PST)

On Sun, 12 Jan 2003, Rob Wilkens wrote:
> However, I have always been taught, and have always believed that
> "goto"s are inherently evil.  They are the creators of spaghetti code

No, you've been brainwashed by CS people who thought that Niklaus Wirth
actually knew what he was talking about. He didn't. He doesn't have a
frigging clue.

> (you start reading through the code to understand it (months or years
> after its written), and suddenly you jump to somewhere totally
> unrelated, and then jump somewhere else backwards, and it all gets ugly
> quickly).  This makes later debugging of code total hell.  

Any if-statement is a goto. As are all structured loops.

And sometimes structure is good. When it's good, you should use it.

And sometimes structure is _bad_, and gets into the way, and using a 
"goto" is just much clearer.

For example, it is quite common to have conditionals THAT DO NOT NEST.

In which case you have two possibilities

 - use goto, and be happy, since it doesn't enforce nesting

	This makes the code _more_ readable, since the code just does what 
	the algorithm says it should do.

 - duplicate the code, and rewrite it in a nesting form so that you can 
   use the structured jumps.

	This often makes the code much LESS readable, harder to maintain, 
	and bigger.

The Pascal language is a prime example of the latter problem. Because it 
doesn't have a "break" statement, loops in (traditional) Pascal end up 
often looking like total shit, because you have to add totally arbitrary 
logic to say "I'm done now".


Read the full discussion at: http://kerneltrap.org/node/553/2131

Friday Oct 02, 2009

Sun GlassFish Web Stack 1.6 available as development and testing version

Great news! Sriram announced the availability of the upcoming Sun GlassFish Web Stack 1.6 as a development and testing version for Solaris 10 U5+ and RHEL 5.2+.

The development version is currently only available as an IPS (Image Packaging System) distribution, which basically gives you a small installer-like application that lets you download and pick only those components you really need. IPS also allows you to pick up updates easily, and - for me - the most impressive feature is the non-root install and the ability to place your web stack anywhere you want in the directory tree of your system. So you don't need root access and you can install Web Stack as a regular user. That's quite an amazing feature. And as far as I know, no other web stack software for Unix offers this kind of feature.

Find all the details at Sriram's blog: Web Stack 1.6 development builds are available for testing.

Wednesday Sep 30, 2009

How does PHP handle $variable data?

If you're using PHP you usually don't care how PHP stores variables internally. But if you start working with references you had better know what's going on behind the scenes.


(Without) References

Let's assume the following code:


You would probably assume that PHP now keeps the string Zaphod three times in memory. Actually $a, $b and $c all internally(!) reference the same string Zaphod in memory. See diagram #1.

You, the user, will never know you are actually working with references, because PHP hides this very well. For example: If you change the value of any of these variables, its reference just points to the new string:


Now $c internally references the new value Beeblebrox. See diagram #2.

And after deleting the variable $b:


the variable $a still references Zaphod. See diagram #3.

With References

If you start using references it's - of course - slightly different. Let's start over:


Now $b and $c are real references to the value of $a. If you output the values you'll see no difference. See diagram #4.

But if you change the value of $b...


...you'll also change the value of $a and $c because both are pointing to the same value (btw, a value is stored in a so-called zval structure). See diagram #5.

Deleting a variable works much the same as without references:


The variable $b disappears, but the value (as long as it is referenced by any other variable) stays in memory. See diagram #6.


Even if you don't know it, you're probably using references all the time. You don't need to use references to save memory, because PHP already uses references internally, and hides this very well.
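Both cases described above can be tried from the command line. This is a minimal sketch, not the original listings: the variable names and strings are taken from the text, the function name php_reference_demo is made up, and it assumes a php command line binary is on the PATH.

```shell
# Feed a small PHP script to the PHP CLI on stdin and print one line per case.
php_reference_demo() {
  php <<'PHP'
<?php
// Without references: from the user's point of view each variable has its own value.
$a = "Zaphod";
$b = $a;
$c = $a;
$c = "Beeblebrox";   // only $c changes
unset($b);           // $a keeps its value
echo "plain: $a $c\n";

// With references: $b and $c are aliases for the value of $a.
$a = "Zaphod";
$b = &$a;
$c = &$a;
$b = "Beeblebrox";   // changes $a and $c as well
unset($b);           // removes only the name $b, the value stays
echo "refs: $a $c\n";
PHP
}
```

Calling php_reference_demo prints the values of $a and $c for each case, so the effect of changing one variable and then deleting $b can be compared directly.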

Monday Sep 28, 2009

Urban legends: Apache reload(ed)

What's the difference between reloading and restarting an Apache web server? If you google for this you'll find a lot of (wrong) information which may be summed up like this:

A reload just lets Apache re-read its configuration file, without restarting Apache. But if you need to make bigger changes to the config, like adding or removing modules or virtual hosts, you may need to do a real restart.

Something like this.

If Penn & Teller cared about Apache configuration, they would agree: This is bullshit!

Apache never supported something like a reload mechanism. And therefore there is no such functionality. If you accept this fact, you're a step closer to the truth.


One origin of this legend is probably to be found in the fact that classic Unix daemons have a "reload" mechanism which is triggered by sending a HUP (hang up) signal to the process. A process getting such a HUP signal didn't hang up but reloaded its own configuration file, without the need to restart. Later this functionality was made available through the System V init scripts, which are still the most common and popular way of controlling Unix services. That's what we use when we call some script within /etc/init.d, /etc/rc.d, etc.

Most of these scripts enable the user (aka root) to start, stop, or reload a specific service (aka daemon). For example, if you look into /etc/init.d/crond on RHEL 5.2 you can track down the reload to a single HUP signal:

echo -n $"Reloading cron daemon configuration: "
killproc crond -HUP

As with every other system daemon, Apache's init scripts also offer a way to reload the Apache web server. An example from Debian 5.0:

# /etc/init.d/apache2 
Usage: /etc/init.d/apache2 {start|stop|restart|reload|...}.

BTW: On OpenSolaris it's called refresh instead of reload. But that's just another wording.

The Truth

If you track down this reload functionality you'll find something like this:

On Debian's /etc/init.d/apache2:

log_daemon_msg "Reloading web server config" "apache2"
$APACHE2CTL graceful $2

Or on OpenSolaris /lib/svc/method/http-apache22:

${APACHE_BIN}/apachectl ${STARTUP_OPTIONS} ${cmd}

So if you track down the "reload" you end up with a "graceful". And now we're at the beginning of this blog entry: Apache never supported something like a reload mechanism. And therefore there is no such functionality.

And "graceful" means according to the Apache HTTP Server 2.2 Documentation:

Graceful Restart: The USR1 or graceful signal causes the parent process to advise the children to exit after their current request (or to exit immediately if they're not serving anything). The parent re-reads its configuration files and re-opens its log files. As each child dies off the parent replaces it with a child from the new generation of the configuration, which begins serving new requests immediately.

So the "reload" ends up in a restart of all the Apache children, and from an internal, configuration-related point of view a graceful restart is exactly the same as a regular restart. It's just better for the stability of your web site, because the children are ended after finishing their current HTTP request and not terminated while serving a client.

The Conclusion

If you change your Apache's configuration, do a "reload" or whatever your system calls it. There is no need for a regular restart. But you may need to wait a few seconds until all the tiny Apache child processes have caught up with the new configuration.


On my RHEL 5.2 an Apache "reload" actually ends in an HUP signal:

echo -n $"Reloading $prog: "
killproc $httpd -HUP

This would be right for a usual Unix daemon like crond, but in the case of Apache it means a restart and not a graceful restart. A graceful restart is triggered by a USR1 signal. Looks like a copy and paste error in RHEL 5.2. Probably this is fixed in newer releases.
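To keep the signals apart, here is a small lookup as a sketch. The function name apache_signal_for is made up for this example; the mapping itself follows the "Stopping and Restarting" page of the Apache HTTP Server 2.2 documentation.

```shell
# Map an apachectl-style action to the signal the Apache 2.2 parent process expects.
apache_signal_for() {
  case "$1" in
    stop)          echo TERM  ;;   # stop now
    restart)       echo HUP   ;;   # restart now, children are killed off
    graceful)      echo USR1  ;;   # graceful restart
    graceful-stop) echo WINCH ;;   # graceful stop
    *)             echo "unknown action: $1" >&2; return 1 ;;
  esac
}
```

A hand-made reload could then look like kill -"$(apache_signal_for graceful)" "$(cat /var/run/apache2.pid)", where the location of the pid file is an assumption and differs between systems.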

Friday Sep 25, 2009

How to upgrade VirtualBox Guest Additions on Solaris/OpenSolaris?

Today I needed to work something out on RHEL and OpenSolaris. I have both systems running in VirtualBox on my Mac, and because of the latest update to VirtualBox I was also supposed to update the so-called Guest Additions on RHEL and OpenSolaris.

The update went smoothly on RHEL, but on OpenSolaris I got this frightening message:

Current administration requires that a unique instance of the
<SUNWvboxguest> package be created.  However, the maximum number of
instances of the package which may be supported at one time on the
same system has already been met.

No changes were made to the system.

As you may already have noticed, I'm not a native English speaker, but are "requires that a unique instance" and "maximum number of instances has already been met" a proper way to express: The package is already installed, and that's why it can't be installed another time?

Indeed, as the VirtualBox User Manual (3.0.6) states:

The Guest Additions should be updated by first uninstalling the existing Guest Additions and then installing the new ones. Attempting to install new Guest Additions without removing the existing ones is not possible.

But how? That's not mentioned in the user manual. Why? I'm new to VirtualBox and it's the first time I (want to) upgrade the guest additions.

»Look Dave, I can see you're really upset about this. I honestly think you ought to sit down calmly, take a stress pill, and think things over.«

Okay, you're right, calm down. To cut a long story short:

  1. To uninstall the current package:
    # pkgrm SUNWvboxguest
    The following package is currently installed:
       SUNWvboxguest  Sun VirtualBox Guest Additions
                      (i386) 3.0.4,REV=r50677.2009.
    Do you want to remove this package? [y,n,?,q] y
    ## Removing installed package instance <SUNWvboxguest>
    This package contains scripts which will be executed with super-user
    permission during the process of removing this package.
    Do you want to continue with the removal of this package [y,n,?,q] y
    ## Verifying package <SUNWvboxguest> dependencies in global zone
    ## Processing package information.
    ## Executing preremove script.
    Removal of <SUNWvboxguest> was successful.
  2. And install the new package:
    /media/VBOXADDITIONS_3.0.6_52128# pkgadd -d VBoxSolarisAdditions.pkg 
    The following packages are available:
      1  SUNWvboxguest     Sun VirtualBox Guest Additions
                           (i386) 3.0.6,REV=r52128.2009.
    Select package(s) you wish to process (or 'all' to process
    all packages). (default: all) [?,??,q]: a
    Processing package instance <SUNWvboxguest> from 
    Sun VirtualBox Guest Additions(i386) 3.0.6,REV=r52128.2009.
    VirtualBox Personal Use and Evaluation License (PUEL)
    This package contains scripts which will be executed with super-user
    permission during the process of installing this package.
    Do you want to continue with the installation of <SUNWvboxguest> [y,n,?] y
    Installing Sun VirtualBox Guest Additions as <SUNWvboxguest>
    Please re-login to activate the X11 guest additions.
    If you have just un-installed the previous guest additions a REBOOT is required.
    Installation of <SUNWvboxguest> was successful.
  3. And because I just un-installed the previous guest additions I now initiate a reboot:
    # /sbin/init 6
εὕρηκα! ;)
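
The three steps above can be put into one small script. This is only a sketch: the helper names run and upgrade_guest_additions and the DRY_RUN switch are made up for this example, and by default it just prints the commands instead of executing them.

```shell
# Print the command in dry-run mode (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Remove the old Guest Additions, install the new package, then reboot.
# $1 is the path to VBoxSolarisAdditions.pkg on the mounted additions image.
upgrade_guest_additions() {
  pkg_file=$1
  run pkgrm SUNWvboxguest &&
    run pkgadd -d "$pkg_file" &&
    run /sbin/init 6
}
```

Calling upgrade_guest_additions /media/VBOXADDITIONS_3.0.6_52128/VBoxSolarisAdditions.pkg prints the three commands; with DRY_RUN=0 and as root they would actually run, including the interactive questions of pkgrm and pkgadd.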

Thursday Sep 24, 2009


I love AWK. It is a wonderful tool for data processing on Unix systems. I truly love it. There is certainly no better tool to process and aggregate log files. I remember that back when I introduced AWK to my students, there was always an immediate appreciative murmur in the room when the first AWK scripts showed their power.

And it's so stable. Perhaps even more stable than all the new-fashioned stuff nowadays. I have one AWK monster script that has been running unchanged in a production environment since 1992, without ever encountering any problems. No need to change anything after countless system upgrades. A dream within a dream. In the same time PHP would have gone through ten new major releases and certainly would no longer be compatible with itself.


In times of HTML entities, URL encoding and XML you quickly run into the limits of AWK, and it becomes a very hard job to use AWK in this new context. Ever tried to URL decode or encode some string within AWK? Or process an XML file?


Since PHP 5 the command line interface of PHP offers the wonderful option -R with its siblings -B and -E. With these three fellows you can run PHP in a quasi-AWK mode: -R processes the input stream line by line, -B is the equivalent of AWK's BEGIN, and -E the equivalent of AWK's END.

A small example

An example as old as the world: counting lines with AWK.

# time awk 'BEGIN {s=0} {s++} END {print s}' access_log
0.148u 0.040s 0:00.17

The same in awkish PHP:

# time php -B '$s=0;' -R '$s++;' -E 'print "$s\n";' < access_log
1.064u 0.044s 0:01.20

Yes, of course one would use wc in the real world:

# time wc -l access_log
327970 access_log
0.036u 0.048s 0:00.07

The time command in these examples shows the big drawback of PHP: In comparison to AWK it's really slow. But this is understandable, because PHP is far more complex than AWK, and PHP was made for web programming and not as a sysadmin's power tool. Always use a tool for the job it was designed to do. On the other hand, PHP has all the functions you absolutely need in today's work environments. (Please don't mention XMLgawk or WebAWK now.)
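
For something slightly closer to real log work than counting lines, here is a sketch that counts requests per HTTP status code. It assumes an access log in Common or Combined Log Format, where the status is the ninth whitespace-separated field; the function name status_counts is made up for this example.

```shell
# Count requests per HTTP status code. Reads the given log files, or stdin
# if none are given. Assumes the status code is field 9 of each line.
status_counts() {
  awk '{ count[$9]++ } END { for (s in count) print s, count[s] }' "$@" | sort
}
```

Called as status_counts access_log, it prints one line per status code followed by the number of requests, sorted by status code.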

But to be honest: PHP is only a small blip in the history of the Internet. AWK is the alpha and omega, the beginning and the end, the first and the last. Aho, Weinberger and Kernighan. You're my trinity.


Kai 'Oswald' Seidler writes about his life as co-founder of Apache Friends, creator of XAMPP, and technology evangelist for web tier products at Sun Microsystems.

