Wednesday Jun 02, 2010

Goodbye my Sun

Goodbye my Sun.

(I don't eat fish and the Douglas Adams quote is overused, anyway.)

I joined Sun Microsystems on March 10, 1997 so I was just about to complete thirteen years at Sun when the acquisition happened. It may have had its highs and lows but all in all it was a wonderful ride at a legendary company. Having been part of Sun was a dream fulfilled.

I'll always remember Sun for the relentless innovation and the highest standards for technical excellence. Sun was a place where engineering reason prevailed over bureaucrats, assumptions were questioned and critical thinking stood over mere paperwork. We may not have marketed it very well but it sure was the best technology on Earth.

I worked on many projects over the years, but the longest running and most special one was the Web Server [iPlanet|SunONE|JES|SJS - nobody loved continuous product renaming like Sun!]. Always a small team, but with the highest passion for producing the best code on Earth, even against all odds. Thanks for the good memories, Web Server team!

All wonderful things come to an end, I suppose, so did Sun and so does this blog now. Hopefully the articles will continue to be here for future reference but if not, I have also made them available on my own site at http://virkki.com/jyri/articles/.

Interesting challenges have fallen my way so it is time to pursue them.

You know where to find me. Keep in touch!


Saturday Mar 27, 2010

Joining the ZFS Revolution

For a long time now I've been meaning to migrate my home file storage over to a ZFS server but the project kept getting postponed due to other priorities. Finally it's alive!

For the last ten years or so my home fileserver has been the general-purpose debian box in the garage. It has three disks: one for the system and home directories, a larger one which gets exported over NFS, and the largest one which backs up the other two (nightly rsync). It has been an adequate solution, insofar as I've never lost data. But whenever a disk dies I always have several days of downtime and have to scramble to restore from backups and maybe reinstall.

There are many articles about this topic that make for good reading if you're considering the same. My goals were:

1. Data reliability, above all.
Initially I had visions of maximizing space, mainly for the geek value of having many terabytes of home storage. But in the end, I don't really need that much. The NFS export drive on my debian box is currently only 500GB, and that is used not only by the shared data (mostly pictures and documents) but also for MythTV storage. Since I wasn't planning on moving the MythTV data to the ZFS pool, even 500GB would be plenty adequate for some time.

2. Low power consumption.
Since this is another server that'll need to run 24/7, I wanted to keep an eye on the power it uses.

3. But useful for general computing.
Since this will be the only permanent (24/7) OpenSolaris box on my home network, I also wanted to be able to use it for general purpose development work and testing whenever needed. So despite the goal of low power consumption, I didn't want to go all out with the lowest possible power setup; I needed a compromise.

Here's the final setup:

CPU: AMD Phenom II X4 (quad core) 925. Reasonable power consumption and the quad cores give me something fun to play with.

Memory: 8GB ECC memory. Since I'm going primarily for data reliability, might as well go with ECC.

ZFS pool: 3 x 1TB drives. These are in a mirror setup, so total storage is just 1TB. That's still about three times as much as I really need right now. With three drives, even if two fail before I get to replace them I should be ok. I got each of the three drives from a different manufacturer, hopefully that'll make them fail at different times.

        NAME        STATE     READ WRITE CKSUM
        represa     ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c8d0    ONLINE       0     0     0
            c8d1    ONLINE       0     0     0
            c9d0    ONLINE       0     0     0
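
For reference, a three-way mirror like this can be created with a single zpool command. The pool and device names below match my setup; adjust them for your own disks:

```shell
# Create a pool named "represa" as a three-way mirror of the three drives.
# Usable capacity equals one drive; any two drives can fail without data loss.
zpool create represa mirror c8d0 c8d1 c9d0

# Check the layout and health; the output resembles the status listing above.
zpool status represa
```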

System disk: I expected to just use an older drive I had on the shelf, but after installing it I found it was running hot. Maybe it is fine, but I decided to do a two-way mirror of the rpool as well; it may save me some time down the road. I don't need much space here, so I found the cheapest drive I could get ($40) to add to the rpool. At that price, might as well mirror!

        NAME         STATE     READ WRITE CKSUM
        rpool        ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c9d1s0   ONLINE       0     0     0
            c10d1s0  ONLINE       0     0     0
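
Turning an existing single-disk rpool into a two-way mirror is one attach command; a sketch (device names match my setup, and the boot-block step applies to x86 OpenSolaris):

```shell
# Attach a second disk to the existing root pool, converting it into a
# two-way mirror. ZFS resilvers the new disk automatically.
zpool attach rpool c9d1s0 c10d1s0

# On x86 the new disk also needs boot blocks installed to be bootable:
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c10d1s0
```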

Total power consumption for the box hovers around 78-80W most of the time.


Friday Jan 22, 2010

Web Server Updates Available

A Sun Alert on Web Server has been published and updated bits for both 7.0 and 6.1 are already available for download.

As always, but more so for these updates, I urge you to update immediately.


Thursday Jan 21, 2010

Sun!

So here I sit, at the very end of Sun Microsystems.

(oblink to James Gosling's entry)

Who would've thought!

Close to twenty years ago the university received a shiny new SPARCserver/390. Sure we had other hardware from HP (HP-UX, ugh) and IBM (AIX, even worse!) but that 390 with SunOS was special. I cajoled my way into being the sysadmin for that lab mainly so I could get unlimited play time with it.

Later after finishing grad school I ended up elsewhere but Sun was still the coolest company on Earth. I quickly "found" (not by accident) myself with a SPARCstation10 which later became a 20 and so on... Today my 'desktop' is a SunFire server but since it is insanely noisy I keep it in a lab and display through a SunRay in my office.

Inevitably, I later ended up here at Sun (coincidentally, right when Bellcore got acquired) and the engineering culture was as great inside as the products were cool from a customer perspective (as to the management side of the company, the less said the better I suppose). So here we are, at the Sunset of it all. Now, a very red sunrise.

So, what's next for Sun Web Server?


Tuesday Jan 12, 2010

Web Stack and the TLS Vulnerability

Recently I have written to some length about the SSL/TLS renegotiation vulnerability (CVE-2009-3555) from the perspective of our Sun Web Server 7.

What about Web Stack?

Unlike Web Server, which uses NSS for its SSL/TLS implementation, the various components in Web Stack use OpenSSL for the same purpose. Therefore, the state of the vulnerability for Web Stack components depends (mostly) on whether OpenSSL has been updated to prevent the renegotiation attack.

The key is that Web Stack does not ship a private copy of OpenSSL - it uses the OpenSSL libraries present in the system. So it comes down to whether the system OpenSSL is vulnerable or not.


The current status across platforms:

    Web Stack 1.5 / OpenSolaris 2009.06:
        OpenSolaris:   Safe only when openssl-0.9.8l shows up in the release repo
        Solaris 10:    Safe, once Sun Alert[1] fixes are installed by the user
        Red Hat Linux: Vulnerable, no relief available

    Upcoming Web Stack 1.6 / OpenSolaris 2010.03:
        OpenSolaris:   Safe (openssl-0.9.8l already in the dev repo)
        Solaris 10:    Safe, once Sun Alert[1] fixes are installed by the user
        Red Hat Linux: Vulnerable[2], no relief available

Unfortunately on Red Hat the patched OpenSSL is not yet available, so Web Stack remains vulnerable on that platform.




[1] The Solaris 10 patches are documented in the corresponding Sun Alert: http://sunsolve.sun.com/search/document.do?assetkey=1-66-273029-1

[2] In Web Stack 1.6 the Apache httpd module mod_ssl has been patched to disable client-initiated renegotiation, offering partial relief. However, this combination is safe only if the customer ensures that no server-initiated renegotiation is configured.

Wednesday Dec 09, 2009

Where Did The Time Go?

Back in June I posted my initial entry on email time management with my intention of allocating time windows for email instead of spending all day on that alone. In August I posted some observations and a pie chart of the data so far.

As 2009 winds down, I now have about seven months of data so seems like a good time to revisit the numbers. I'll start with the bottom line, how much time went to what?

So the email timesuck has continued to be the problem it always was, with 45% of the time between May and December spent on that alone. No matter how I slice it, that's really bad. My observations from the August entry remain just as valid. The relentless email firehose is hard to shut down, so keeping it at a reasonable percentage of time is seemingly impossible. But at least segregating email reading into well-defined time windows during the day does help a great deal.

The really sad story in that pie chart is that the two slices which represent Real Work - the fun part of software engineering - add up to less than 10% of the total! The blue slice is Web Stack engineering work (8%) and the green slice is Web Server engineering work (ploticus obscured the text, but it's about 1.5%). So there you have it: about 90% of the time of a product architect/senior engineer at Sun is spent on email and overhead.

Put that way, those numbers are appalling beyond words. Clearly this must change, so I'm going to change it. Starting this week I will cut email time down to 90-120 minutes a day (I've been doing well over three hours minimum) and start to prioritize Real Work on my task list above TPS-report kinds of work.

Can this plan succeed? Well I guess you can read my update in a few months to see. I'm looking forward to a much more reasonably balanced pie chart...


Friday Dec 04, 2009

More Thoughts on Web Server 7 and TLS Vulnerability

Please read my article on Web Server 7 and the TLS vulnerability for background and recommendation on this issue.

In this entry I'll add some random thoughts and examples to illustrate the problem. The ideas in this entry are purely experimental.

Is My Web Application Vulnerable?

You may be tempted to wonder: even if the SSL/TLS connection to your Web Server is vulnerable to the renegotiation attack, maybe your web application cannot be exploited?

While technically the answer is "not necessarily", for most real web sites which exist today the answer is usually yes. Unless your web site is firmly entrenched in 1994 (nothing but static content and no processing of user input of any kind), a clever attacker can surely find ways to cause mischief (or worse) using this vulnerability. So I'd like to discourage attempting to talk yourself into believing your site is safe. Instead, upgrade to Web Server 7.0u7.

As one example, shortly after the vulnerability was made public, it was used to grab Twitter passwords.

As noted earlier, at a high level, the attack relies on the MITM attacker being able to interact with your Web Server to pre-establish some state and then trick the legitimate client into executing actions based on that state in their name.

What this means in practice will vary widely, depending on what your web application does and how it processes input.

Answering with certainty whether your web application can be successfully exploited requires analyzing in detail how the application handles user input and keeps state, so it is not possible to give a universal answer. However, given the complex and often unintended access points available into most web applications, it is safest to assume there are exploitable vulnerabilities unless you can prove otherwise.

A Textbook Worst Case

As one example, consider a simple banking web site subdivided as follows:

/index.html             Welcome page, no authentication required
/marketing/*            Company info, no authentication required
/clients/info.jsp       Show customer balance info, client-cert auth needed
/clients/payment.jsp    Form to initiate payments, client-cert auth needed
/clients/do-payment.jsp Process payment, client-cert auth needed

The site expects users to enter via the home page, click on a link which takes them to the protected area under /clients at which point the Web Server requires client cert authentication to proceed. Once authenticated, the client can view their account balances or click on the payment page (payment.jsp) which contains a form to enter payment info (dollar amount, recipient, etc). The form will do a POST to do-payment.jsp which actually processes the payment and removes the money from the customer account.

Exploiting the renegotiation vulnerability with this site is trivially easy:

  1. Looking to check their balance, a legitimate client sends a request for /clients/info.jsp (the user probably had it bookmarked)
  2. MITM attacker sends a POST to do-payment.jsp with the amount/recipient info of their choosing
  3. Because do-payment.jsp requires client authentication, the Web Server triggers a renegotiation and asks for a client certificate.
  4. The attacker hands off the connection to the legitimate client
  5. Legitimate client provides valid client certificate and renegotiation succeeds
  6. Web Server executes do-payment.jsp with the POST data sent earlier (by the attacker) and returns a transaction confirmation to the legitimate client
  7. User panics! Why did the bank just make a payment from my account to some unknown entity when all I requested was the balance page?!

This is a very real scenario. I have seen customer web applications doing something precisely analogous to what I describe above.

Application Layer Protection

Is it at all possible to take some precautions against the exploit at the application layer?

The renegotiation vulnerability is recent as of this writing and there have not been too many exploits in the wild yet. History teaches us that over time, vulnerabilities will be exploited in ways far more clever than anyone predicted at first. Given that we are barely in the initial stages of the adoption curve (so to speak) of this vulnerability, I'm only prepared to predict that we haven't seen its more devious applications yet.

For completeness, I'll share some thoughts on application layer protections. If your web application is handling anything important (and it probably is, since it is running https), I wouldn't today recommend relying on a purely application layer protection.

Conceptually, your web application should not trust any input it received before the "hand off" renegotiation for the purpose of taking actions after it (i.e. do-payment.jsp should not process the POST data it received before the renegotiation to complete a payment after it).

Unfortunately, while that is easy to say it is impossible to implement! That is because your web application has no way to know that a "hand off" renegotiation occurred. The Web Server itself does not necessarily know either. Remember the renegotiation may occur at any time and it happens directly at the SSL/TLS layer, invisible to both Web Server and application.

How about if we lower our goal and rephrase the guideline: the web application should not trust any input it received before successful authentication was complete for the purpose of taking actions after it. Since the web application does have access to authentication data (or lack of it), it becomes plausible to implement some defenses based on that knowledge. Is this lowered bar sufficient to protect against all attacks using the renegotiation vulnerability?

Picture a shopping web site with a flow like this:

  1. populate cart with items
  2. click on checkout (assume the site has payment info stored already)
  3. authenticate
  4. order is processed

Here the renegotiation is triggered at step 3, so when the legitimate client logs in they suddenly get an order confirmation screen for something they didn't order.

The flow could be restructured to be:

  1. populate cart with items
  2. click on checkout (assume the site has payment info stored already)
  3. authenticate
  4. present cart contents and order info again for review
  5. if user agrees again, order is processed

Here the legitimate user would enter the flow at step 3 and then see the unexpected order confirmation screen at which point they get a chance to hit cancel.
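
The restructured flow boils down to one guard: state captured before authentication is only ever re-presented for review, never committed directly. A minimal sketch of that idea (all names here are hypothetical, not Web Server API code):

```python
# Illustrative "review after authentication" guard. The session dict stands
# in for whatever session mechanism the web application actually uses.

def checkout(session):
    """Return the next page the user should see."""
    if not session.get("authenticated"):
        # Requesting checkout while anonymous triggers authentication
        # (this is where the SSL/TLS renegotiation would occur).
        return "login-page"
    if not session.get("confirmed"):
        # First request after authentication: the cart may have been
        # populated before the hand-off, possibly by an attacker, so
        # re-present it for review instead of processing the order.
        return "review-cart-page"
    return "order-confirmation: " + ", ".join(session["cart"])

# An attacker fills the cart pre-auth; the legitimate user then logs in:
session = {"cart": ["expensive-item"], "authenticated": False}
print(checkout(session))    # -> login-page
session["authenticated"] = True
print(checkout(session))    # -> review-cart-page (chance to hit cancel)
session["confirmed"] = True
print(checkout(session))    # -> order-confirmation: expensive-item
```

The key property is that the "confirmed" flag can only be set by a request made after authentication, so pre-auth input never directly causes the order to be processed.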

Do not be overconfident in such ordering. Just because the developer intended the request flow to be
info.jsp -> payment.jsp -> do-payment.jsp
nothing actually prevents the attacker from carefully crafting a request straight to do-payment.jsp. Paranoid enough yet?

Defensive web programming is a difficult problem, one that many web applications get very wrong. It is a vast topic, now made that much more difficult by the SSL/TLS renegotiation vulnerability. So I'll leave it at that for the moment.

So in closing, I'll just repeat that I'd like to discourage attempting to talk yourself into believing your site is safe. It probably is not. Instead, upgrade to Web Server 7.0u7.


Thursday Dec 03, 2009

Web Server 7 and the TLS renegotiation vulnerability

Web Server 7 and the SSL/TLS Vulnerability (CVE-2009-3555)

The recent SSL/TLS protocol vulnerability has been thoroughly covered in the press. Refer to the above link for the formal vulnerability report and refer to any one of many articles on the web for commentary on it.

While the vulnerability is at the SSL/TLS protocol level and impacts all products which support SSL/TLS renegotiation, this article covers it only from the Web Server angle.

Please keep in mind this is not a bug in Web Server nor a bug in NSS. It is a flaw in the SSL/TLS protocol specification itself.

What is the vulnerability?

To quote from the CVE report:

    [The protocol] 'does not properly associate renegotiation
    handshakes with an existing connection, which allows
    man-in-the-middle attackers to insert data into HTTPS sessions,
    and possibly other types of sessions protected by TLS or SSL, by
    sending an unauthenticated request that is processed retroactively
    by a server in a post-renegotiation context, related to a
    "plaintext injection" attack'

In terms of the Web Server, this means that the MITM (man-in-the-middle) attacker may interact with the web application running on the Web Server for a while and later "hand off" the same SSL/TLS session to the legitimate client in such a way that as far as the Web Server is concerned it was same [legitimate] client all along.

This "hand off" occurs when a renegotiation is done on the SSL/TLS connection. Note that renegotiation may be triggered by either the client (attacker) or the Web Server. The protocol is vulnerable either way, regardless of which party triggers the renegotiation (contrary to some popular belief).

A key point is that the vulnerability is at the SSL/TLS protocol level, in other words, at a lower level than the HTTP connection layer. Even if your Web Server is not configured to ever perform renegotiation explicitly, renegotiation can still occur and thus your site can still be vulnerable. There is nothing you can do to configure Web Server (prior to 7.0u7) to disable renegotiation from happening.

This is why you must upgrade to Web Server 7.0u7 (or later).

The rest of this article goes into more detail for the curious, but the bottom line remains that it is time to upgrade to Web Server 7.0u7 (or later).

Is My Web Server Vulnerable?

If you are not using https at all, your site is not vulnerable (of course, if the site is sending and receiving any sensitive data in clear text it is vulnerable to plenty of other problems, just not this one!)

If your Web Server (pre-7.0u7) is configured to use https and it is not configured to require client-auth, it is open to the renegotiation attack, period.

If client-auth is 'required' then that server is not vulnerable. Specifically, you are safe only if the http-listener has this in its configuration:

   <http-listener>
        ...
        <ssl>
            ...
            <client-auth>required</client-auth>
            ...
        </ssl>
        ...
   </http-listener>

When client-auth is 'required' it means that the Web Server will require the client to provide a valid certificate as part of the initial handshake when establishing the SSL/TLS connection. If no valid client certificate is provided at that point, the connection is never established. Because the HTTP-level connection is never created, there is no window of opportunity for the attacker to send data before the client authentication takes place, defeating the attack.

In short, if you are running Web Server 7.0u6 or earlier and using https the only way to remain safe from this attack is to set <client-auth>required</client-auth> on all the <http-listener> elements which use <ssl>.

Unfortunately there is a significant drawback to doing this. Now all the content on your https site requires client-cert authentication. Clients who access the site without a valid certificate not only cannot read even the home page, they also can't even get a useful error page. Because the connection attempt is rejected before it ever gets established at the HTTP level, it is not possible for the Web Server to redirect the client to a helpful error page. Your site is safe but the user experience will most likely not be acceptable.

Web Server 7.0u7 - What's New

Earlier I pointed out that there is nothing you can do to disable renegotiation from occurring. Even if the Web Server is never configured to trigger renegotiation, it can still happen transparently thus it remains vulnerable.

Web Server 7.0u7 includes the latest release of NSS (NSS 3.12.5). The significant change in this release is that SSL/TLS renegotiation is completely disabled. Any attempt to trigger renegotiation (whether initiated by the Web Server itself or by the remote client) will cause the connection to fail.

The good news is that by simply upgrading to Web Server 7.0u7 your site is now automatically safe from this vulnerability.

Whether there are bad news or not depends on whether your site had any legitimate need for renegotiation. If it did not, there is no bad news. Your site is now safe from this vulnerability and everything continues to work as before.

On the other hand if your site did make use of renegotiation, that capability is now broken.

Does My Site Use Renegotiation?

There is no check box anywhere that says whether the server needs renegotiation, so until this vulnerability became public you may not have given any thought to whether your Web Server configuration is using renegotiation.

The Web Server uses renegotiation when the web application is configured to require a client certificate for some parts of the content but not for all. This permits the client to request the anonymous areas without presenting a client certificate. If the client clicks on a link to a protected area the Web Server then triggers a renegotiation to obtain the client certificate.

There are a couple of ways to configure this in Web Server 7:

  • Using get-client-cert in obj.conf
    If obj.conf contains a fn="get-client-cert" dorequest="1" line, that will trigger renegotiation to obtain the client certificate under some conditions (depending on where and how in obj.conf it is invoked).
  • From Java Servlets, using the CLIENT-CERT auth-method in web.xml:
            <login-config>
                <auth-method>CLIENT-CERT</auth-method>
            </login-config>
      
    Same as with get-client-cert, this also triggers a renegotiation to obtain the client certificate only when needed. Refer to the Servlet specification for more info on web.xml.

If the server.xml <client-auth> element is not set to 'required' and your web application uses either of the above mechanisms to trigger the need for a client certificate for some parts of the application, then the Web Server is using renegotiation. This means this functionality will be broken after upgrading to Web Server 7.0u7.

Unfortunately there is no way around this. The current SSL/TLS renegotiation is fundamentally broken so it cannot be used safely.
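
For context, the CLIENT-CERT auth-method typically appears alongside a security-constraint that limits it to a subset of the application's URLs, which is exactly the "client certificate for some parts but not all" pattern that requires renegotiation. A hedged sketch of such a web.xml fragment (the url-pattern and role name here are made up for illustration):

```xml
<security-constraint>
    <web-resource-collection>
        <web-resource-name>protected-area</web-resource-name>
        <url-pattern>/clients/*</url-pattern>
    </web-resource-collection>
    <auth-constraint>
        <role-name>customer</role-name>
    </auth-constraint>
</security-constraint>
<login-config>
    <auth-method>CLIENT-CERT</auth-method>
</login-config>
```

Requests outside /clients/* need no certificate; the first request inside it is what triggers the renegotiation discussed above.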

But I Like It When My Web Site Is Vulnerable To Attacks!

Really?

If you absolutely must have renegotiation support, please reread this document from the top. There is no safe way to enable renegotiation, if you enable it your site is vulnerable.

If despite everything you still feel you must have the broken renegotiation support, it can be done as follows:

Environment variable:  NSS_SSL_ENABLE_RENEGOTIATION
Values: "0" or "Never"     (or just "N" or "n")
           disables ALL renegotiation; this is the default setting
        "1" or "Unlimited" (or just "U" or "u")
           re-enables the old type of renegotiation and IS VULNERABLE

If you set NSS_SSL_ENABLE_RENEGOTIATION=1 in the environment from where you start the Web Server 7 instance, renegotiation will work as it did in Web Server 7.0u6 and earlier. Which is to say, you'll be vulnerable to attacks again. Obviously, we never recommend doing this.

Other Possibilities

The current state is very unfortunate. Renegotiation was a useful mechanism for requesting client certificate authentication for only some parts of the web application. Now there is no way to do so safely. As noted earlier this vulnerability is not a bug in the Web Server implementation of SSL/TLS, it is a fundamental flaw in the protocol specification. Therefore there it can only be fixed at the protocol level (see next section). Until that happens there is nothing the Web Server can do to provide a safe implementation so it is a fact of life that renegotiation can no longer be used.

Here is one possibility which may ameliorate the limitation for some sites. It requires some site refactoring work but may offer relief (thanks to Nelson Bolyard of the NSS team for the idea):

Consider refactoring your https content into two separate http-listeners:

  • http-listener ls1: port 443 (standard SSL port), no client-auth
  • http-listener ls2: some other port (say, 2443), client-auth=required

Because you have upgraded to Web Server 7.0u7, listener ls1 is safe because renegotiation is disabled. Listener ls2 is also safe because it has client-auth=required.

Refactor your web application so that whenever a link into a protected area is accessed, it is sent to https://example.com:2443/... (where example.com is your site) instead.

This allows clients to access the anonymous content on https://example.com/ and also allows requesting client certificate authentication when needed, on https://example.com:2443/, all while avoiding any use of renegotiation.
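
In server.xml terms, the two listeners might be sketched roughly as below. This is illustrative only; apart from the <client-auth> element shown earlier, the exact element names and required children depend on your Web Server 7 configuration:

```xml
<http-listener>
    <name>ls1</name>
    <port>443</port>
    <ssl>
        <!-- no client-auth: anonymous https; safe on 7.0u7 because
             renegotiation is disabled -->
    </ssl>
</http-listener>
<http-listener>
    <name>ls2</name>
    <port>2443</port>
    <ssl>
        <client-auth>required</client-auth>
    </ssl>
</http-listener>
```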

If you decide to try this approach feel free to share your experiences on the Web Server forum. Keep in mind that if your Web Server is behind a reverse proxy or a load balancer or other such frontend, you'll need to arrange so the proper ports are reached as needed.

The Future

Work is underway on an enhanced TLS renegotiation protocol which will not be susceptible to the vulnerability. For info refer to: http://tools.ietf.org/html/draft-ietf-tls-renegotiation-01

As soon as the work is complete and a stable implementation is released, a future update of Web Server 7 will contain support for this enhanced renegotiation. Further details on it will be documented at that time.

Keep in mind that both the server and the clients will need to be upgraded in order to communicate via the new protocol. While Web Server 7 will be upgraded as soon as possible and browsers which use NSS (such as Firefox) will likely also be upgraded promptly, there will remain a vast installed base of older browsers which will not be compatible with the enhancements for a long time. Some clients, such as those in embedded devices, may well never be upgraded. Therefore, a full transition to the new renegotiation will take considerable time.


Wednesday Nov 04, 2009

Web Server and TLS/SSL

Well, I noticed tonight on Twitter that the recent SSL/TLS protocol vulnerability is being mentioned, and information has been published here.

As far as our Sun Web Server 7 is concerned, we have been looking at this for a number of weeks now, but the timing of the public discussion tonight is a bit of a surprise. I will point you to further information from here once it is officially available.


Thursday Aug 27, 2009

Endless Night (Take Four)

Is it that time of the year again? I guess so!

Back in February I posted my (sort-of) biannual review of SFW build times, so it has been six months. The SFW build continues to chug along toward collapsing under its own weight, as I predicted two years ago. I can't claim much in the way of visionary powers for the observation, since it is a rather obvious outcome of consolidating all sources into one tree. Unfortunately it was not obvious enough, since the practice still continues!

To review, refer to my original article on unconsolidating which exposes the problem with the Solaris build concept of building all applications together in one single source tree. I updated the data in June 2008 and later in February 2009.

Aside from the build times nothing has really changed so no other news. If you haven't read the previous articles check them out since I won't repeat the data here.

As of this month, the SFW build produces 416 packages, takes 9.7 hours to build, and a built workspace takes 19.4GB!

    Snapshot     Packages    Build time    Built workspace
    2007/12      158         2.8 hours     7.5 GB
    2008/06      205         3.6 hours     10 GB
    2009/02      302         7.5 hours     12.3 GB
    2009/08      416         9.7 hours     19.4 GB

    Predictions at 5000 packages: 88, 89, 116, 124 hours; 203, 233, 237, 244 GB

The time/space predictions for 5000 packages are within the range previously seen (the current prediction: 116 hours and 233GB), so no big surprises.

Well the big surprise is we're still building OpenSolaris applications this way!

For the Web Stack project and components we are now looking into dropping out of SFW since this is unsustainable and is consuming too much of our limited resources. Hopefully my next biannual SFW update will be that there isn't one! ;-) Time will tell...


Tuesday Aug 25, 2009

Request Processing Capacity

Q: How many requests per second can the Web Server handle?

Short answer: It depends.

Long answer: It really depends on many factors.

Ok, ok... silliness aside, can we make any ballpark estimates?

The Web Server can be modeled as a queue. By necessity such modeling will be a simplification at best, but it may provide a useful mental model to visualize request processing inside the server.

Let's assume your web application has a fairly constant processing time[1], so we'll model the Web Server as an M/D/c queue where c is the number of worker threads. In this scenario, the Web Server has a maximum sustainable throughput of c / (processing time).

To use some simple numbers, let's say your web app takes 1 second to process a request (that's a very slow web application!). If the Web Server has c=128 worker threads, that means it can indefinitely sustain a max request rate of:

128/1 = 128 requests per second

This makes a lot of sense if we think about it:

  • At t = 0 seconds, 128 requests come in and each one is taken by a worker thread, fully utilizing server capacity.
  • At t = 1 second, all those requests complete and responses are sent back to the client and at the same time 128 new requests come in and the cycle repeats.

At this request rate we don't need a connection queue at all[2] because all requests go straight to a worker thread. This also means that at this request rate the response time experienced by the end user is always 1 second.

To expand on that, the response time experienced by the end user is:

end user response time = (connection queue wait time) + (processing time)

Since we're not using the connection queue the end user response time is simply the same as the processing time[3].

So far so good. Now, what happens if the incoming request rate exceeds the maximum sustainable throughput?

  • At t = 10 seconds, 129 requests come in. 128 go straight to worker threads, 1 sits in wait in the connection queue.
  • At t = 11 seconds, 128 requests come in. 128 (the one which was waiting + 127 of the new ones) go straight to worker threads, 1 sits in wait in the connection queue.

The connection queue absorbs the bumps in the incoming request rate, so connections are not dropped and worker threads can remain fully utilized at all times. Notice that now out of every 128 requests, one of them will have a response time of 2 seconds.

So what happens next?

If we go back to receiving a steady 128 requests per second, there will always be one request in the connection queue.

If at some point we receive only 127 requests (or fewer), the server can "catch up" and the connection queue goes back to staying empty.

On the other hand, if the incoming request rate remains at 129 per second we're in trouble! Every second the connection queue waiting list will grow longer by one. When it reaches 129 entries, one end user will experience a response time of three seconds, and so on.

And of course, the connection queue is not infinite. If the max connection queue size is 4096 then 4096 seconds later it will fill up and from that point onwards, one incoming request will simply be dropped every second since it has no place to go. At this point the server has reached a steady state. It continues processing requests at the same rate as always (128 per second), it continues accepting 128 of the 129 new requests per second and dropping one. End users are certainly unhappy by now because they are experiencing response times of about 33 seconds (4096 / 128 = 32, so it takes 32 seconds for a new request to work its way through the queue, plus the 1 second of processing). Almost like going to the DMV...

If the incoming request rate drops below the maximum sustainable rate (here, 128/sec) only then can the server start to catch up and eventually clear the queue.
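The whole overload scenario above can be sketched as a tiny discrete-time simulation. The numbers (128 workers, 1-second processing, a 4096-entry queue cap) come from the example; the code is my own illustration, not how the Web Server is actually implemented:

```python
WORKERS = 128      # requests completed per second (c / processing time)
QUEUE_MAX = 4096   # connection queue capacity

def simulate(arrival_rate, seconds):
    """Return (queue length, total dropped) after `seconds` one-second ticks."""
    queue = dropped = 0
    for _ in range(seconds):
        queue += arrival_rate             # new connections arrive
        queue -= min(queue, WORKERS)      # workers drain up to 128 per second
        if queue > QUEUE_MAX:             # overflow: connections are dropped
            dropped += queue - QUEUE_MAX
            queue = QUEUE_MAX
    return queue, dropped

print(simulate(128, 100))    # (0, 0): sustainable, queue stays empty
print(simulate(129, 100))    # (100, 0): backlog grows by one per second
print(simulate(129, 10000))  # (4096, 5904): queue pinned, one drop per second
```

The last line is the steady state from the text: the queue pins at 4096 entries and from then on exactly one connection per second has nowhere to go.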

In summary, while this is certainly a greatly simplified model of the request queue behavior, I hope it helps visualize what goes on as request rates go up and down.

Theory aside, what can you do to tune the web server?

  • The single best thing to do, if possible, is to make the web app respond quicker!
  • If you want to avoid dropped connections at all costs, you can increase the connection queue size. This will delay the point where the server reaches a steady state and starts dropping connections. Whether this is useful really depends on the distribution of the incoming requests. In the example above we've been assuming a very steady incoming rate just above the maximum throughput rate. In such a scenario increasing the connection queue isn't going to help in practice because no matter how large you make it, it will fill up at some point. On the other hand, if the incoming request rate is very bumpy, you can damp it by using a connection queue large enough to avoid dropping connections. However... consider the response times as well. In the example above your end user is already seeing 33-second response times. Increasing the connection queue length will prevent dropped connections but will only make the response times even longer. At some point the user is simply going to give up, so increasing the connection queue any further won't help!
  • Another option is to increase the number of worker threads. Whether this will help or hurt depends entirely on the application. If the request processing is CPU bound then it won't help (actually, if it were truly CPU bound, which is rare, then you'll probably benefit from reducing the number of worker threads unless your server has 128+ CPUs/cores...) If the web app spends most of its time just waiting for I/O then increasing the worker threads may help. No set answer here, you need to measure your application under load to see.

[1] In reality the response time can't be deterministic. At best it may be more or less constant up to the point where the server scales linearly, but after that the response time is going to increase depending on load. On the flip side, caching might make some responses faster than expected. So M/D/c is certainly a simplification.

[2] Not true for several reasons, but it'll do for this simplified model and it helps to visualize it that way.

[3] Plus network transmission times but since we're modeling only the web server internals let's ignore that.


Monday Aug 24, 2009

Web Server 7 Request Limiting Revisited

Coincidentally, last week I heard a couple of related queries about check-request-limits from different customers. I haven't covered that feature in a while so it's a good time to revisit it for a bit.

To review, Web Server 7 has a feature (function) called check-request-limits which can be used to monitor and limit the request rate and/or concurrency of requests that match some criteria. It can be used to address denial-of-service attacks as well as just to limit request rates to some objects or from some clients for other reasons (for example to reduce bandwidth or CPU usage).

I usually refer to 'matching requests' when speaking of this capability. Matching what? Probably the most common use case is to match the client IP address. This is useful when you wish to limit request rates coming from a given client machine. Here's a basic example of that scenario:

PathCheck fn="check-request-limits" max-rps="10" monitor="$ip"

The common theme to both customer requests I heard last week was whether it is possible to limit requests based on something other than the client IP.

Yes, certainly!

The monitor parameter above is set to "$ip" which expands to the client IP address but you can set it to anything that you prefer. In my introduction to check-request-limits article I gave examples of both "$ip" and "$uri" (and even both combined). You're not restricted to only these though, you can use any of the server variables available in WS7 as the monitor value.

You can also construct more complicated scenarios using the If expressions of Web Server 7. I gave a few examples of that in this article on check-request-limits.

To give a couple more examples, let's say your web server is behind a proxy and thus the client $ip is always the same (the proxy IP). Clearly monitoring the $ip value isn't terribly useful in that case. Depending on how your application works you may be able to find other useful entries to monitor. For example, if the requests contain a custom header named "Usernum" which contains a unique user number, you could monitor that:

PathCheck fn="check-request-limits" max-rps="1" monitor="$headers{'usernum'}"

Or maybe there's a cookie named customer which can serve as the monitor key:

PathCheck fn="check-request-limits" max-rps="1" monitor="$cookie{'customer'}" 

These two are made-up examples; you'll need to pick a monitor value which is suitable for your application. But I hope these ideas will help you get started.
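Conceptually, whatever you pick as the monitor value just becomes the key of a per-key request counter inside the server. Here's a rough Python sketch of that idea (my own simplified illustration with a fixed one-second window, not the actual check-request-limits implementation):

```python
import time
from collections import defaultdict

class RequestLimiter:
    """Toy per-key rate limiter in the spirit of max-rps + monitor."""
    def __init__(self, max_rps, now=time.monotonic):
        self.max_rps = max_rps
        self.now = now
        self.counts = defaultdict(int)     # monitor key -> requests this window
        self.window_start = now()

    def allow(self, key):
        if self.now() - self.window_start >= 1.0:
            self.counts.clear()            # start a new one-second window
            self.window_start = self.now()
        self.counts[key] += 1
        return self.counts[key] <= self.max_rps

# The key can be anything: a client IP, a header value, a cookie value...
limiter = RequestLimiter(max_rps=10)
print(limiter.allow("10.0.0.5"))   # True (first request this second)
```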

By the way, check-request-limits can also be used to limit concurrency.


Tuesday Aug 18, 2009

Time Allocation

Last week I wrote about the time spent dealing with email. While writing that entry I thought it'd be nice to also visualize where all the time went, not just how much was spent on email. So tonight I went over the data to generate the following pie chart, showing relative allocation of working hours from early May until today, split into high-level categories:

The 'Email' slice is self-evident. 'ARC' is the time I've spent in my role in Sun's Architecture Review Committee. 'Communications' includes conferences, presentations, blog entries, articles and other related work. 'Administrivia' is a catch-all category for all kinds of mindless unproductive overhead. Finally, 'Engineering' represents the time spent doing "real work".

About the only thing I can add is that this is about as concise a representation as we can get on why very large companies have trouble competing with agile startups. Part of my goal in this exercise is to find ways to grow that nice blue pie slice, but I realize there's a limit to what can be achieved in this environment. All those TPS^H^H^HPTL reports need to be filed, after all.


Wednesday Aug 12, 2009

Let Me Check My Email

About two months ago I posted on attempting to keep email in check so it's a good time to review some statistics and results...

The following graph shows the percentage of time I spent reading email each day:

The average over the past three months is about 45%. Wow... So over the last quarter I've spent just under half of all working hours reading (and answering) email. No wonder it is hard to get concrete work done!

This is somewhat higher than the 37.5% (three hours a day out of eight) that I had predicted in the previous article a couple of months ago. This is largely explained by the recent release of Web Stack 1.5. Due to the impending release I found myself having to check email more often than scheduled to keep on top of last-minute pre-release activities.

A few points worth noting out of the experiment so far...

  • It is not easy to limit email activity to the scheduled two or three hours a day. Ideally the graph above should be mostly flat. While part of this is inevitably due to the release activities, I'll try harder going forward to stick to the scheduled email times.
  • While the total times may have fluctuated more than I wanted, I did (mostly) manage to contain my email activities to bounded windows of time within the day, instead of checking emails every three minutes all day long. This has helped a great deal. Even while spending nearly half my hours on email, I've managed to get many other non-email tasks done more productively than before. This part has been a success and I highly recommend it. Shut down that email client!
  • I found myself doing three (or even four) email sessions per day. This is too many. I need to more strictly limit myself to reading email only twice a day, at the beginning and end of the day. If these sessions need to be longer it is better to make them longer but stick with only two. Whenever I started inserting email tasks in the middle of the day, it fragmented my concentration too much, making the day less productive.
  • I'm convinced the ideal arrangement is to do one single email session per day, at the end of the day. That way all the concentration disruption occurs after the day's work is done, so it does no harm. The end of the day is also a good time to be entering new tasks into the to-do list so they'll be there tomorrow. Given our distributed time zones it is difficult to do only one email session per day, but that would be ideal. Maybe I'll try that at some point.

As a longer term goal I need to think of ways of reducing the time spent on email. Not sure how to do that yet but spending 45% or even "only" 37% of all working hours on email is totally insane. I suppose email overload is inevitable at a large company with tens of thousands of employees (all of whom, it seems at times, are emailing me) but there has to be a better way. I suppose I could cap my email time to an hour a day and let whatever goes unread just go unread. I'm sure people will be unhappy but will that unhappiness be greater than my productivity gain at doing real work? It's all about tradeoffs, after all. Hard to say what's worse.


Tuesday Aug 11, 2009

What's Taking So Long

While Sun's Web Server has a very nice threading model, once a worker thread is processing a specific request it will continue working on that request even if it takes a while or blocks.

This is rarely an issue. Static content is served very quickly and code which generates dynamic application content needs to be written so it responds promptly. If the application code takes a long time to generate response data the site has more problems than one, so the web application developers have a motivation to keep it snappy.

But what if you do have a bad application which occasionally does take a long time? As requests come in and worker threads go off to process them, each long-running request ties up another worker thread. If requests are coming in faster than the application code can process them, eventually the Web Server will have all its worker threads busy on existing connections.

As you can infer from Basant's blog entry, the server will still continue accepting new connections because the acceptor thread(s) are separate from the worker threads. But there won't be any spare worker threads to take those new connections from the connection queue.

If you're the client making the request, you'll experience the server accepting your request but it won't answer for a (possibly long) while. Specifically, until one of the previous long-running requests finally completes and a worker thread frees up to take on your request (and of course, there may be many other pending requests piled up).
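The effect is easy to reproduce with any fixed-size worker pool. In this hypothetical Python sketch (my own illustration, not Web Server code), two workers and slow handlers mean a third request is accepted immediately but its response has to wait for a worker to free up:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_handler(request_id):
    time.sleep(0.5)                    # misbehaving application code
    return "response %d" % request_id

# Two "worker threads": requests 1 and 2 tie them both up, so request 3
# sits in the queue even though its connection was accepted.
with ThreadPoolExecutor(max_workers=2) as pool:
    start = time.monotonic()
    futures = [pool.submit(slow_handler, i) for i in (1, 2, 3)]
    results = [f.result() for f in futures]
    elapsed = time.monotonic() - start

print(results)           # all three responses eventually arrive
print(elapsed >= 0.9)    # True: request 3 had to wait for a free worker
```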

If this is happening with your application one option is to check the perfdump output and see which requests are taking a while. But, as these things are bound to do, it'll probably happen sporadically and never when you're watching.

So how can we easily gather a bit more info? It's been said countless times but always worth repeating.. dtrace really is the greatest thing since sliced bread (and I like bread). I can't imagine attempting to maintain a system without dtrace in this day and age, it would be limiting beyond belief! One of the many key benefits is being able to gather arbitrary data right from the production machine without any prior preparation (such as producing debug builds) or downtime or even any access to the sources you're diagnosing.

So in that spirit, I tried to gather a bit more data about the requests which appear to be taking a while using dtrace and without attempting to look at what the code is actually doing (well, also because I only had fairly limited time to dedicate to this experiment so didn't want to go looking at the code ;-). Although, I should mention, since Sun's Web Server is open source you certainly could go review the source code if you wish to know more detail.

So what am I looking for? Basically I'd like to know when the worker thread starts on a request and when it is done with it. If the time between those two grows "too long", I'd like to see what's going on. Sounds simple enough. Searching around a bit I saw Basant's article on dtrace and Web Server so using his pid$1::flex_log:entry as an exit point seems like a suitable thing to try. I didn't find (on a superficial search, anyway) a mention of an adequate entry point so instead I took a number of pstack snapshots and looked for something useful there and wound up selecting "pid$1::__1cLHttpRequestNHandleRequest6MpnGnetbuf_I_i_:entry" (ugly mangled C++ function name). With that, ran the following dtrace script on the Web Server process:

% cat log.d
#!/usr/sbin/dtrace -qs

pid$1::__1cLHttpRequestNHandleRequest6MpnGnetbuf_I_i_:entry
{
  self->begin = timestamp;
  printf("ENTER %d, %d to n\n", tid, self->begin);
}

pid$1::flex_log:entry
/self->begin/
{
  self->end = timestamp;
  printf("DONE %d, %d to %d\n", tid, self->begin, self->end);
  self->begin = 0;
}

This gets me entry/exit tick marks as the threads work their way through requests. On a mostly unloaded server it's easy enough to just watch that output, but then you're probably not experiencing this problem on an unloaded server. So we need a little bit of helper code to track things for us. Twenty minutes of perl later, I have

#!/usr/bin/perl

$PATIENCE = 9;                  # seconds - how long until complaints start

$pid = shift @ARGV;
$now = 0;
$npat = $PATIENCE * 1000000000;

open(DD, "./log.d $pid |");
while (<DD>)
{
    chomp;
    ($op, $tid, $t1, $t2) = /(\S*) (\d*), (\d*) to (.*)/;
    if ($t1 > $now) { $now = $t1; }

    # dtrace output can be out of order so include start time in hash key
    $key = "$tid:$t1";

    if ($op eq "ENTER") {
        if (exists $pending{$key} && $pending{$key} == -1) {
            delete $pending{$key}; # DONE was already seen; request finished
        } else {
            $pending{$key} = $t1 + $npat; # value is deadline time
        }

    } else {
        $took = (($t2 - $t1) / 1000000000);
        if (!exists $pending{$key}) {
            $pending{$key} = -1; # if DONE seen before ENTER, just ignore it
        } else {
            delete $pending{$key};
        }
    }

    # Once a second, review which threads have been working too long
    # and do a pstack on those.
    #
    # Note: we only reach here after processing each line of log.d output
    # so if there isn't any more log.d output activity we'll never get here.
    # A more robust implementation is left as an exercise to the reader.
    #
    if ($now > $nextlook) {
        $c = 0;
        foreach $k (keys %pending)
        {
            if ($pending{$k} != -1 && $pending{$k} < $now) {
                ($tid, $started) = $k =~ /(\d*):(\d*)/;
                $pastdue = ($now - $started) / 1000000000;
                print "=================================================\n";
                system("date");
                print "Thread $tid has been at it $pastdue seconds\n";
                system("pstack $pid/$tid");
                $c++;
            }
        }
        if ($c) { print "\n"; }
        $nextlook = $now + 1000000000;
    }

}

The perl code keeps track of the ENTER/DONE ticks (which may occasionally be out of order) and if too long (more than $PATIENCE) goes by, gives you pstack output showing what's going on.

I don't actually have a suitably misbehaving application so I'll leave it at that. If I had a real application issue, it'd be useful to fine tune the dtrace script to key off of more specific entry and exit points and it'd also be useful to trigger more app-specific data gathering instead of (or in addition to) the pstack call (for instance, checking database availability if you suspect a database response problem, or whatever is suitable for your concrete application).

dtrace is like Lego blocks; there are a thousand and one ways of coming up with something similar. Care to try an alternative or more efficient approach? Please share it in the Web Server forum!

