Thursday Oct 29, 2009

Apache load balancer: If a worker doesn't show up....

Since the Apache load balancer seems to be my topic of the week, let's focus on another related question: What happens if a worker (backend server) doesn't show up for work?

Let's say server B needed to go down for maintenance and is no longer available for the cluster:

loadbalancer2bsc.jpg

For this example I simply shut server B's Apache daemon down. I made no other changes to my configuration. And voilà:

# repeat 12 lynx -source http://loadbalancer
This is A.
This is C.
This is D.
This is A.
This is C.
This is D.
This is A.
This is C.
This is D.
This is A.
This is C.
This is D.

The load balancer automatically notices that server B isn't available any more and simply skips it while cycling through its list of workers.

After getting server B up again, it takes up to 60 seconds until server B shows up again in my cluster (60 seconds is the default of mod_proxy's retry parameter and can be configured per worker):

# repeat 12 lynx -source http://loadbalancer
This is A.
This is B.
This is C.
This is D.
This is A.
This is B.
This is C.
This is D.
This is A.
This is B.
This is C.
This is D.

Nice.
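
The behaviour above can be modelled in a few lines. The following Python sketch is not how mod_proxy_balancer is implemented. It only illustrates the idea of skipping a worker that is marked as failed and trying it again after a retry period. The class name, the is_up callback and the injectable clock are assumptions made for this example.

```python
import itertools
import time


class RoundRobinBalancer:
    """Toy round-robin balancer that skips failed workers for `retry` seconds."""

    def __init__(self, workers, is_up, retry=60, clock=time.monotonic):
        self.workers = list(workers)
        self.is_up = is_up      # hypothetical health probe: worker -> bool
        self.retry = retry      # seconds before a failed worker is tried again
        self.clock = clock      # injectable so the example can be tested
        self.failed_at = {}     # worker -> time it was marked as failed
        self._cycle = itertools.cycle(self.workers)

    def next_worker(self):
        # Look at each worker at most once per call.
        for _ in range(len(self.workers)):
            worker = next(self._cycle)
            failed = self.failed_at.get(worker)
            if failed is not None and self.clock() - failed < self.retry:
                continue        # still inside its retry period: skip it
            if self.is_up(worker):
                self.failed_at.pop(worker, None)
                return worker
            self.failed_at[worker] = self.clock()
        raise RuntimeError("no worker available")
```

With server B reported as down, repeated calls to next_worker() cycle through A, C and D only. Once B is up again and the retry period has passed, B rejoins the rotation.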

Wednesday Oct 28, 2009

Apache load balancer: Redirections pwned

HTTP load balancers have one natural enemy: redirections. For example, a "trailing slash" redirect is issued when the server receives a request for a URL http://servername/dir where dir is a directory. In such a case the server redirects the browser to http://servername/dir/ (including the trailing slash):

# lynx -mime_header http://loadbalancer/dir | egrep Location:
Location: http://serverA/dir/
# lynx -mime_header http://loadbalancer/dir | egrep Location:
Location: http://serverB/dir/

Accessing http://loadbalancer/dir will result in a redirect to http://serverA/dir/ (if it's serverA's turn) instead of http://loadbalancer/dir/. This happens because serverA simply doesn't know about the load balancer at all.

The solution is to tell the load balancer to rewrite all serverX addresses to the load balancer's address:

	ProxyPassReverse / http://serverA/
	ProxyPassReverse / http://serverB/
	ProxyPassReverse / http://serverC/
	ProxyPassReverse / http://serverD/

Now all server-generated redirects will get rewritten to the load balancer's address:

# lynx -mime_header http://loadbalancer/dir | egrep Location:
Location: http://loadbalancer/dir/

Of course in real life the load balancer address would be something like http://www.sun.com.
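
To make the rewriting step more tangible, here is a small Python sketch of the idea behind those ProxyPassReverse lines. It is a simplification under my own assumptions (plain prefix matching, a function name I made up), not Apache's actual code: a Location header that starts with a known backend address gets the load balancer's address instead, everything else is left alone.

```python
# Backend prefixes, mirroring the four ProxyPassReverse lines above.
BACKENDS = (
    "http://serverA/",
    "http://serverB/",
    "http://serverC/",
    "http://serverD/",
)


def rewrite_location(location, public_base="http://loadbalancer/", backends=BACKENDS):
    """Map a backend Location header onto the public address, else leave it alone."""
    for backend in backends:
        if location.startswith(backend):
            # Keep the path, swap only the scheme-and-host prefix.
            return public_base + location[len(backend):]
    return location
```

A redirect to http://serverA/dir/ comes out as http://loadbalancer/dir/, while a redirect to some unrelated host passes through unchanged.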

Tuesday Oct 27, 2009

Easy HTTP load balancing with Apache

Usually a single AMP system is enough to serve - let's say - around 500 concurrent users. Sometimes more, sometimes less, strongly depending on the particular web application, the overall architecture of your system, of course the hardware itself, and how you define "concurrent users".

Nevertheless, if your server gets too slow, you'll need to take action. You may upgrade your server up to the maximum (aka vertical scaling), optimize your software (aka refactoring), and finally add more servers (aka horizontal scaling). The whole process of horizontal scaling is quite complex and far too much for a single blog post, but here's a first shot. Others will follow.

Today I'll focus on one single aspect of horizontal scaling: an HTTP load balancer.

loadbalancer1bsc.jpg

On the left: a whole crowd of people ready to visit our web site. On the right: our server farm (called workers). And in the middle: our current hero, the load balancer. The purpose of the load balancer (in this case an HTTP load balancer) is to distribute all incoming requests to our backend web servers. The load balancer hides all our backend servers from the public, and from the outside it looks like a single server doing all of the work.

The Recipe

Okay, let's start. Step by step.

  1. Since version 2.2, the Apache web server ships with a load balancer module called mod_proxy_balancer. All you need to do is enable this module along with mod_proxy and mod_proxy_http:

    LoadModule proxy_module mod_proxy.so
    LoadModule proxy_http_module mod_proxy_http.so
    LoadModule proxy_balancer_module mod_proxy_balancer.so

    Please don't forget to load mod_proxy_http, because you won't get any error message if it's not loaded. The balancer just won't work.

  2. Because mod_proxy can turn Apache into an (open) forward proxy server, and open proxy servers are dangerous both to your network and to the Internet at large, I completely disable this feature:

    	ProxyRequests Off
    	<Proxy *>
    		Order deny,allow
    		Deny from all
    	</Proxy>

    The load balancer doesn't need this feature at all.

  3. Now I need to make sure all my backend web servers have the same content:

    serverA htdocs% cat index.html
    This is A.
    serverB htdocs% cat index.html
    This is B.
    serverC htdocs% cat index.html
    This is C.
    serverD htdocs% cat index.html
    This is D.

    Okay, in this case the content differs, but I need this to show how the load balancer works.

  4. And here's the actual load balancer configuration:

    	<Proxy balancer://clusterABCD>
    		BalancerMember http://serverA
    		BalancerMember http://serverB
    		BalancerMember http://serverC
    		BalancerMember http://serverD
    		Order allow,deny
    		Allow from all
    	</Proxy>
    	ProxyPass / balancer://clusterABCD/
    

    The <Proxy>...</Proxy> container defines which backend servers belong to my balancer. I chose the name clusterABCD for this server group, but you are free to choose any name you want.

    And the ProxyPass directive instructs Apache to forward all incoming requests to this group of backend servers.

  5. That's all? Yes, that's all. Here's the proof:

    # repeat 12 lynx -source http://loadbalancer
    This is A.
    This is B.
    This is C.
    This is D.
    This is A.
    This is B.
    This is C.
    This is D.
    This is A.
    This is B.
    This is C.
    This is D.

    Each request to the load balancer is forwarded to one of the backend servers. By default Apache simply counts the number of requests and makes sure every backend server gets the same number of requests forwarded.

    If you want to know more about available balancing algorithms please refer to Apache's mod_proxy_balancer manual.
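
The request counting from step 5 can be sketched in Python like this. It is only an illustration of the principle, with a function name of my own choosing. Apache's real algorithm also takes per-worker load factors into account, which this sketch ignores.

```python
from collections import Counter


def pick_by_requests(workers, counts):
    """Return the worker that has handled the fewest requests so far."""
    # min() keeps the first worker on ties, which preserves the A, B, C, D order.
    worker = min(workers, key=lambda w: counts[w])
    counts[worker] += 1
    return worker
```

Called repeatedly with the same Counter, the function hands out the workers in turn, so after any multiple of four requests every backend has served the same number of them.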

Did you ever imagine setting up a load balancer would be this easy? Of course, there is more to say about (HTTP) load balancing and much more about vertical scaling too, but this is only a blog posting and not a place for such an expansive reference. If time and space allow, I'll go into further details on this in the near future.

Thursday Oct 08, 2009

Apache's graceful restart (reprise)

In last week's blog entry Urban legends: Apache reload(ed) I tried to prove that an Apache reload is pretty much the same as a restart of an Apache web server.

One of my dear readers - yes, at least someone seems to read this blog - pointed out that this is not always true, and that a reload sometimes works and sometimes doesn't. In contrast to a restart, which always works like a charm.

I strongly assume that's a classic observer effect:

In physics, the term observer effect refers to changes that the act of observation will make on the phenomenon being observed.

Let's imagine: You have a web site, you have an Apache web server. With your browser you're on your web site, you change your Apache's configuration, you reload your Apache, you reload your browser, and - surprise - you don't see the new configuration active. You reload your browser again and again. Still, the old configuration. What's wrong?

Okay, let's do this step by step.

  1. My Apache is running:

    # apache2ctl status
                           Apache Server Status for localhost
    
       Current Time: Friday, 02-Oct-2009 11:46:29 CEST
       Restart Time: Friday, 18-Sep-2009 09:55:56 CEST
       Parent Server Generation: 12
       Server uptime: 14 days 1 hour 50 minutes 32 seconds
       87 requests currently being processed, 89 idle workers
    
    ..KK_._K_K.KK_._KK.KKK.K_K.K_K_._KKKK._KKK__._KK_C_KK_.KK.C_K__K
    K_K.K_K_KKK_KK_C_KKK__K_KKKK.KWK__KKKK.K_.__..K.._..___K___W__KK
    W_K_K_K_._K_K____K_K__K__K..K_KK__K______K_KK_K_K____K___K___.__
    .____K_K__._.K..................................................
    

    Three children are closing the connection (C), 3 are sending a reply to a browser (W), 89 are waiting for a new connection (_) and 81 are kept alive by KeepAlive (K). One of the Ks (shown in red in the original post) represents my own browser's connection.

  2. Now I'm reloading my Apache:

    # /etc/init.d/apache2 reload

    Wait a second, and ask again for the status:

    # apache2ctl status
                           Apache Server Status for localhost
    
       Current Time: Friday, 02-Oct-2009 11:47:04 CEST
       Restart Time: Friday, 18-Sep-2009 09:55:56 CEST
       Parent Server Generation: 13
       Server uptime: 14 days 1 hour 51 minutes 8 seconds
       71 requests currently being processed, 80 idle workers
    
    G.GG_.__GG.GG_._GG.___.___.____._GG__.__GGG_.K_GGK__K_.G_.___K_G
    ___.G___GGG_____KG__GG__.GG_.__GG__KG_.__._W.._.._..___GG__K_GG_
    _K__G_G_.GK_K____GKG__G___..K..G.KK..G...K.KGGWGC....KG..G......
    .GG...._G.._.G..................................................
    

    Now, after one or two seconds, 51 Apache children are waiting for their graceful end, including the G representing my own browser's connection.

  3. In parallel I'm reloading my browser (which accesses the web site my Apache is hosting) every 3 seconds.

  4. After 2 minutes I look again at my Apache's status:

    # apache2ctl status
                           Apache Server Status for localhost
    
       Current Time: Friday, 02-Oct-2009 11:48:40 CEST
       Restart Time: Friday, 18-Sep-2009 09:55:56 CEST
       Parent Server Generation: 13
       Server uptime: 14 days 1 hour 52 minutes 44 seconds
       77 requests currently being processed, 51 idle workers
    
    KKK__KKKWKK__K_K..W_K_KKK___K_K____K_K_K__.K_K.._K_K.K..KWK._KKK
    .K_._KKK_KK___..KG_K_KK___.K.KK__KK_K.KK_K_W..K.....__K..W__K..W
    _KK_._....K.KKK__.K.KK.K_K..K...._W......K.K.._.K....K..........
    .......K...K....................................................

    There is still one child process waiting for its graceful end. That's the one I kept alive with my own browser. This child still has its old configuration active, and that's why I never notice the new config in my own browser, although everyone else already gets the new configuration.

  5. To catch up, I just have to wait at least KeepAliveTimeout seconds and then do a final reload in my browser.

    # apache2ctl status
                           Apache Server Status for localhost
    
       Current Time: Friday, 02-Oct-2009 11:49:00 CEST
       Restart Time: Friday, 18-Sep-2009 09:55:56 CEST
       Parent Server Generation: 13
       Server uptime: 14 days 1 hour 53 minutes 4 seconds
       63 requests currently being processed, 76 idle workers
    
    .K__KKKK_K_K______..KKK______W_K_C_K___KK.___KKK.___K_K__K__KKK.
    W_KC_KK_CK._K_KK___K_KK____._KK___K_____C__K__CK.___KK_K___K_KK_
    C_K..K__K.C..K..__......K......_..K....._....K......C.K.K._.....
    ................................................................

    No more gracefully dying children anywhere. And finally I see the new configuration in my own browser.

That's the reason why people think an Apache reload sometimes works and sometimes doesn't.
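
If you don't want to count the scoreboard characters by eye, a few lines of Python will do it. This is just a helper sketch with names I made up. It knows only the scoreboard keys that appear in this post, and anything else is reported under its raw character.

```python
from collections import Counter

# Scoreboard keys used in this post (a subset of what mod_status reports).
STATES = {
    "_": "waiting for connection",
    "K": "keepalive",
    "W": "sending reply",
    "C": "closing connection",
    "G": "gracefully finishing",
    ".": "open slot",
}


def summarize_scoreboard(scoreboard):
    """Count how many slots are in each state, ignoring whitespace."""
    counts = Counter(ch for ch in scoreboard if not ch.isspace())
    return {STATES.get(ch, ch): n for ch, n in counts.items()}
```

Feed it the scoreboard lines from an apache2ctl status output and watch the "gracefully finishing" entry: as long as it is present, some child is still running with the old configuration.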

Monday Sep 28, 2009

Urban legends: Apache reload(ed)

What's the difference between reloading and restarting an Apache web server? If you google for this you'll find a lot of (wrong) information, which may be summed up like this:

A reload just lets Apache re-read its configuration file, without restarting Apache. But if you need to make bigger changes to the config, like adding or removing modules or virtual hosts, you may need to do a real restart.

Something like this.

If Penn & Teller cared about Apache configuration, they would agree: This is bullshit!

Apache never supported something like a reload mechanism. And therefore there is no such functionality. If you accept this fact, you're a step closer to the truth.

Origins

One origin of this legend is probably to be found in the fact that classic Unix daemons have a "reload" mechanism which is triggered by sending an HUP (hang up) signal to the process. A process getting such a HUP signal didn't hang up but reloaded its own configuration file. Without the need of restarting. Later this functionality was made available by the System V init scripts, which are still the most common and popular way of controlling Unix services. That's what we use if we call some script within /etc/init.d, /etc/rc.d, etc.

Most of these scripts enable the user (aka root) to start, stop, or reload a specific service (aka daemon). For example, if you look into /etc/init.d/crond on RHEL 5.2 you can track down the reload to a single HUP signal:

echo -n $"Reloading cron daemon configuration: "
killproc crond -HUP

And just as for every other system daemon, Apache's init scripts also offer the user a way to reload the Apache web server. An example from Debian 5.0:

# /etc/init.d/apache2 
Usage: /etc/init.d/apache2 {start|stop|restart|reload|...}.

BTW: On OpenSolaris it's called refresh instead of reload. But that's just another wording.

The Truth

If you track down this reload functionality you'll find something like this:

On Debian's /etc/init.d/apache2:

log_daemon_msg "Reloading web server config" "apache2"
$APACHE2CTL graceful $2

Or on OpenSolaris /lib/svc/method/http-apache22:

cmd="graceful"
${APACHE_BIN}/apachectl ${STARTUP_OPTIONS} ${cmd}

So if you track down the "reload" you end up with a "graceful". And now we're at the beginning of this blog entry: Apache never supported something like a reload mechanism. And therefore there is no such functionality.

And according to the Apache HTTP Server 2.2 documentation, "graceful" means:

Graceful Restart: The USR1 or graceful signal causes the parent process to advise the children to exit after their current request (or to exit immediately if they're not serving anything). The parent re-reads its configuration files and re-opens its log files. As each child dies off the parent replaces it with a child from the new generation of the configuration, which begins serving new requests immediately.

So the "reload" ends up in a restart of all the Apache children, and from an internal, configuration-related view a graceful restart is exactly the same as a regular restart. It's just better for the stability of your web site, because the children are ended after finishing their current HTTP request and not terminated while serving a client.

The Conclusion

If you change your Apache's configuration, do a "reload" or whatever your system calls it. There is no need for a regular restart. But you may need to wait a few seconds until all the tiny Apache child processes have caught up with the new configuration.

Postscript

On my RHEL 5.2 an Apache "reload" actually ends in an HUP signal:

echo -n $"Reloading $prog: "
killproc $httpd -HUP

That would be right for a usual Unix daemon like crond, but in the case of Apache a HUP means a restart and not a graceful restart. A graceful restart is triggered by a USR1 signal. This looks like a copy and paste error in RHEL 5.2. It is probably fixed in newer releases.
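
To keep the signals apart, here is the mapping written down as a small Python sketch. The signal meanings are the ones from the Apache 2.2 documentation. The function names and the idea of passing in the parent's PID (for example read from Apache's PID file) are my own assumptions for this example.

```python
import os
import signal

# Signals understood by the Apache parent process.
APACHE_SIGNALS = {
    "stop": signal.SIGTERM,      # stop now
    "restart": signal.SIGHUP,    # restart now: children are killed immediately
    "graceful": signal.SIGUSR1,  # graceful restart: children finish their request
}


def signal_for(action):
    """Return the signal that corresponds to an apachectl-style action."""
    return APACHE_SIGNALS[action]


def send(action, parent_pid):
    """Deliver the signal to the Apache parent process (parent_pid is hypothetical input)."""
    os.kill(parent_pid, signal_for(action))
```

So an init script that wants a real "reload" in the sense of this post has to end up with USR1, not HUP.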

Friday Sep 18, 2009

Save energy! Stop using CGI!

The day CGI was invented was a great day for the Internet, but a dark day for the history of how-to-do-things-right. CGI was great, because it gave us (standard computer nerds) the ability to easily implement dynamically generated HTML pages - the predecessor of today's web applications. The interface was so ingeniously simple and powerful that everyone could use his favorite programming language for implementing web services. You could use C, Perl, AWK, PostScript or even the Bourne Shell to write your web application. And people did.

However, there is another side, a darker side, of CGI. It is not just ingeniously simple and powerful, in fact, it's also terribly slow. For every incoming HTTP request your system needs to fork a new process, and forking a process is an extremely expensive operation.

It was a great blessing, but nowadays there are so many programming languages especially suited for web development - designed to run in a web server environment - armed with the features and functions best suited to the needs of a web application. I say PHP, you say Java. Whatever.

But if you're stuck with a programming language of the past, please don't use CGI any more. Use FastCGI or something comparable to connect your application with your web server. Or if you're lucky and your programming language is already aware of the web, there may be an even easier and better way.

For example, if you're a real Perl programmer, then use mod_perl. It's so simple to run unaltered(!) Perl-CGI scripts with mod_perl:

PerlModule ModPerl::PerlRun
<Files ~ "\.pl$">
      SetHandler perl-script
      PerlResponseHandler ModPerl::PerlRun
      PerlOptions +ParseHeaders
      Options +ExecCGI
</Files>

Compared to a simple Hello World in Perl with CGI...

% ab -n 1000 http://demo/helloworld.pl
...
Requests per second:    304.84 [#/sec] (mean)
...

... the same script with mod_perl...

% ab -n 1000 http://demo/helloworld.pl
...
Requests per second:    956.16 [#/sec] (mean)
...

...is three times faster. Without touching your Perl script.

And if I actually write a Hello World which takes advantage of mod_perl, I get:

# ab -n 1000 http://demo/helloworld
...
Requests per second:    1777.50 [#/sec] (mean)
...

Nearly six times faster, and that's only a simple example script!
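
You can feel the cost of "one new process per request" even without a web server. The following Python sketch is my own toy comparison, not a replacement for the ab runs above: one handler answers from the already running interpreter, the other starts a fresh interpreter for every call, which is roughly what CGI does. All function names are made up, and the numbers you get depend entirely on your machine.

```python
import subprocess
import sys
import time


def handle_in_process():
    """Persistent-interpreter style: the handler is already loaded."""
    return "Hello World\n"


def handle_as_new_process():
    """CGI style: start a fresh interpreter for every single request."""
    result = subprocess.run(
        [sys.executable, "-c", "print('Hello World')"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def requests_per_second(handler, n=20):
    """Call the handler n times and return the observed rate."""
    start = time.perf_counter()
    for _ in range(n):
        handler()
    elapsed = time.perf_counter() - start
    return n / elapsed if elapsed > 0 else float("inf")
```

Comparing requests_per_second(handle_in_process) with requests_per_second(handle_as_new_process) shows the same kind of gap as the benchmarks above, just without Apache in between.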

Stop using CGI! Green your IT! Save energy! Save the world! Save the Cheerleader!

Wednesday Sep 16, 2009

Hide X-Powered-By: PHP

About two weeks ago I showed a simple way to beautify your URLs and hide the use of PHP as the backend of your web site. Since then I've got a lot of emails from people indirectly asking me how to also hide the X-Powered-By: PHP header, which still shows up in the web server's HTTP responses.

As with so much in our beautiful world of IT, it's very easy:

  1. Open your php.ini in the editor of your choice.
  2. Find this line:
    ; Decides whether PHP may expose the fact that it is installed on the server
    ; (e.g. by adding its signature to the Web server header).  It is no security
    ; threat in any way, but it makes it possible to determine whether you use PHP
    ; on your server or not.
    expose_php = On
    
  3. Change the On to an Off:
    expose_php = Off
    
  4. Reload your Apache and the X-Powered-By: PHP header is gone.

Very easy, no magic(k), and no rocket science.
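
If you want to check the result from a script instead of eyeballing the response, a tiny Python helper like the following could do. It is a sketch of mine, not part of PHP or Apache. It expects the response headers as a plain dict, however you obtained them, and the header value used in the example is made up.

```python
def exposes_php(headers):
    """Return True if an X-Powered-By header (any letter case) mentions PHP."""
    for name, value in headers.items():
        if name.lower() == "x-powered-by" and "php" in value.lower():
            return True
    return False


# Example with made-up header values:
before = {"Server": "Apache", "X-Powered-By": "PHP/5.3.0"}
after = {"Server": "Apache"}
```

Here exposes_php(before) is true and exposes_php(after) is false, which is what you should see once expose_php is set to Off and Apache has been reloaded.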

Wednesday Sep 02, 2009

Easily boost your AMP website combo with mod_rewrite

The Scenario

Imagine: You run a popular website using AMP technologies, but your hardware is awfully old and over the years, as your website became more and more popular, it's got slower and slower and slower. Today it runs with a load of 42 and smoke pours out of the TCP ports. And because you're a silly idealistic fool you've got no money for new hardware or for a reasonable hosting service. What now?

The answer is in the book, the book of caches.

The Example

Let me set up a simple example:

You've a PHP (or whatever) based system for your website and your requests usually look like:

http://domain/index.php?page=welcome

To get human readable and more SEOish addresses you're already using Apache's mod_rewrite in your .htaccess:

RewriteEngine on
RewriteRule ^([^/]*)\.html$ /index.php?page=$1 [L]

Now all your website's URLs look like static HTML pages:

http://domain/welcome.html

Perfecto.

Now let's focus on the backend. For this example I'll use the following very(!) simple(!) PHP code:

<?php
    $mysqli=mysqli_connect('localhost','user','password','db');

    if (!$mysqli)
        die("Can't connect to MySQL: ".mysqli_connect_error());

    include("header.php");
    include("navigation.php");

    $stmt = $mysqli->prepare("SELECT content FROM pages WHERE name=?");
    $stmt->bind_param('s', $_REQUEST['page']);
    $stmt->execute();
    $stmt->bind_result($content);
    $stmt->fetch();

    echo $content;

    $stmt->close();
    $mysqli->close();

    include("footer.php");
?>

The MySQL table pages looks like this:

+----+---------+-----------------------------------------+
| id | name    | content                                 |
+----+---------+-----------------------------------------+
|  1 | welcome | Dear Traveler, welcome to my AMP world! | 
|  2 | about   | This is about AMP!                      | 
|  3 | team    | Apache, MySQL, and PHP.                 | 
+----+---------+-----------------------------------------+

The files header.php, navigation.php and footer.php contain some mix of PHP and HTML to build the navigation and some basic page layout.

Everything put together may look like this in a browser:

screenshot.png

The Benchmark

Now, I'm using ApacheBench (included in every Apache installation) to fire 1000 sequential requests at my website:

% ab -n 1000 http://demo/welcome.html
...
Requests per second:    397.25 [#/sec] (mean)
...

In this case my AMP system was able to serve 397 requests per second. Not bad, but it's also a very(!) simple(!) PHP script.

Setting up the cache

First, I add some lines of code to my previous PHP script.

One line just before the include("header.php") statement:

    ob_start();

And these four lines at the end just after the include("footer.php") statement:

    $output=ob_get_contents();
    file_put_contents("cache/".basename($_REQUEST['page']).".html", $output);
    ob_end_clean();
    echo $output;

ob_start() instructs PHP to keep the generated output in an internal buffer. And the last 4 lines tell PHP to save this buffer into a file, for example: cache/welcome.html.

Now I create a directory called cache next to my index.php file and make sure my Apache is able to write to and access that directory:

% mkdir cache
% chmod a+rwx cache

If I now reload my welcome page in my browser, a file named welcome.html gets created in this cache directory:

% ls -l cache
total 4
-rw-r--r-- 1 www-data www-data 732 2009-09-02 13:02 welcome.html

Now I add these lines to my mod_rewrite configuration (the three lines in the middle are new):

RewriteEngine on

RewriteCond %{REQUEST_URI} \.html$
RewriteCond %{DOCUMENT_ROOT}/cache/%{REQUEST_URI} -s
RewriteRule . /cache/%{REQUEST_URI} [L]

RewriteRule ^([^/]*)\.html$ /index.php?page=$1 [L]

These three lines read like this: (first line) For all requests ending with ".html": (second line) If there is a non-empty file in the cache directory, named exactly like the resource my web server was asked for, then (third line) send this file to the browser. If there is no such file, continue with calling the PHP script.
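
For those who find mod_rewrite hard to read, here is the same decision written as a Python sketch. It only mimics the logic of the rules above. The function name is made up, and unlike the .htaccess pattern the regular expression here includes the leading slash of the request URI.

```python
import os
import re

# Counterpart of the last RewriteRule, applied to the full request URI.
PAGE_RULE = re.compile(r"^/([^/]*)\.html$")


def resolve(request_uri, document_root):
    """Serve a non-empty cached file if there is one, else call index.php."""
    if request_uri.endswith(".html"):                    # first RewriteCond
        relative = request_uri.lstrip("/")
        cached = os.path.join(document_root, "cache", relative)
        # "-s" is true for a regular file with a size greater than zero.
        if os.path.isfile(cached) and os.path.getsize(cached) > 0:
            return "/cache/" + relative                  # cache hit
    match = PAGE_RULE.match(request_uri)
    if match:
        return "/index.php?page=" + match.group(1)       # fall through to PHP
    return request_uri
```

The first request for /welcome.html resolves to /index.php?page=welcome. As soon as the PHP script has written cache/welcome.html, the same request resolves to /cache/welcome.html and PHP is no longer involved.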

The Rerun of the Benchmark

That's all, now I rerun my benchmark from earlier:

% ab -n 1000 http://demo/welcome.html
...
Requests per second:    1287.57 [#/sec] (mean)
...

Wow, that's about three times faster than the regular PHP version. And in this example I'm using a very(!) simple(!) PHP script. On a more complex system, the boost will be much higher. For example on www.apachefriends.org we're using a cache based on this recipe and we got a 300-fold performance gain. (That's because we have a very complex - some may say crappy - CMS running.)

Pros & Cons

Pros:
  • quite easy to set up
  • no additional software is needed, just an Apache with mod_rewrite
  • very high performance win on slow systems
  • works with every web programming language, not only PHP
Cons:
  • the cache will never refresh
  • the system doesn't work with user sessions
But all these drawbacks can be relatively easily solved by adding some more lines of program code or mod_rewrite configurations.
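
As one possible way to deal with the first drawback, stale cache files could simply be deleted from time to time, for example by a cron job. The following Python sketch shows the idea. The function name, the 300-second default and the cron approach are my assumptions, not part of the recipe above.

```python
import os
import time


def purge_stale(cache_dir, max_age=300, now=None):
    """Delete cached .html files older than max_age seconds; return their names."""
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        if name.endswith(".html") and now - os.path.getmtime(path) > max_age:
            os.remove(path)  # the next request regenerates it via index.php
            removed.append(name)
    return removed
```

Because the rewrite rules fall back to index.php whenever a cache file is missing, deleting a file is all it takes to get a fresh copy on the next request.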

Friday Aug 28, 2009

Apache: No listening sockets available, shutting down

One of the advantages of maintaining a popular project like XAMPP is the immense amount of community feedback you get. In other words: your inbox is a daily challenge. PHP 5.3 incompatibilities are hot topics these days. Here's an Apache related one which hits my inbox on a very regular basis.

Variant 1

Apache complains about:

(98)Address already in use: make_sock: could not bind to address [::]:80
no listening sockets available, shutting down
Unable to open logs

The "Unable to open logs" in this error message is very confusing and often points people in the wrong direction. There is nothing wrong with directory or file permissions. In this case there is simply another process already occupying port 80, and that prevents Apache from binding to this port. Find out which process is using this port and stop it. Alternatively, let Apache use another port.

Variant 2

If it actually is a permission issue, you will see:

(13)Permission denied: make_sock: could not bind to address [::]:80
no listening sockets available, shutting down
Unable to open logs

Here Apache tried to bind to port 80, but got rejected by the system because Apache wasn't started as root. On Unix systems only processes run by the user root are allowed to bind to ports below 1024. To fix this, let Apache bind to a port of 1024 or above, or start Apache as root.

Variant 3

If you just get this error, without "no listening sockets available...":

(98)Address already in use: make_sock: could not bind to address [::]:80

This looks very similar to the first variant, but in this case you probably just have multiple "Listen 80" directives in your Apache's configuration. Check all Include directives in your httpd.conf and remove all duplicate Listen directives.
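
To tell the variants apart without starting Apache, you can try to bind the port yourself. The following Python sketch does exactly that and translates the two error codes from above. The function name and the default of probing 127.0.0.1 are assumptions for this example, and it won't tell you which process holds the port.

```python
import errno
import socket


def probe_port(port, host="127.0.0.1"):
    """Try to bind the port and translate the outcome into a short diagnosis."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return "in use"             # "Address already in use"
            if exc.errno == errno.EACCES:
                return "permission denied"  # "Permission denied"
            raise
    return "free"
```

If probe_port(80) reports "in use" while Apache is stopped, some other process occupies the port (variant 1). If it reports "free" and Apache still complains about an address in use, duplicate Listen directives (variant 3) are the more likely culprit.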

About

Kai 'Oswald' Seidler writes about his life as co-founder of Apache Friends, creator of XAMPP, and technology evangelist for web tier products at Sun Microsystems.
