Tuesday Dec 22, 2009

Ruby and Lighttpd updates in OpenSolaris

As of build 129 of OpenSolaris, Ruby, Lighttpd and RubyGems have been updated to the following versions:

  • Ruby 1.8.7 patch level 174
  • RubyGems 1.3.5 
  • Lighttpd 1.4.23

Lighttpd is a couple of revs behind. When we started the update, 1.4.24 had just been released with a couple of issues that we would have needed to patch. At that time darix and stbuehler weren't sure whether they would release a 1.4.25 to fix these issues, so we took an executive decision to go with 1.4.23 and update to a later version via the OpenSolaris /dev repository after the release of OpenSolaris 2010.03. Version 1.4.25 made it out before we integrated, but given the processes we have to ensure component and build quality, it's not a good idea to change versions at the 11th hour. We'll probably push Lighttpd 1.4.25 out via the /webstack repo.

These same version updates are going into WebStack 1.6, which will be available around the same time as OpenSolaris 2010.03.

Tuesday Jan 20, 2009

Lighttpd and Olio Rails

We were trying to use Lighttpd to run the Apache Olio Rails application on OpenSolaris recently and we found that because the Lighttpd workers run as a non-root user (in this case as webservd), the image_science gem was unable to access the shared library built for it by RubyInline. The error that we saw was:

ActionView::TemplateError (Permission denied - /root/.ruby_inline) on line #10 of events/_filtered_events.html.erb

(The exact error varies depending on whether you are looking at the error page returned to the browser or the logfile.) We knew from some of the problems that we had getting image_science up and running on OpenSolaris that RubyInline defaulted to building libraries in the root user's home directory, but up until then we had been using Mongrel and Thin and running them as root (which is food for thought).

The fix is simple. RubyInline builds libraries in $HOME/.ruby_inline unless the environment variable $INLINEDIR is set, in which case it builds them in $INLINEDIR/.ruby_inline. You can pass environment variables on to the FastCGI processes that Lighttpd spawns by setting them in the fastcgi.server directive in the Lighttpd config file. Here is the one from our rig:

fastcgi.server =  ( ".fcgi" =>
                    ( "localhost" =>
                      ( "min-procs" => 1,
                        "max-procs" => 5,
                        "socket" => "/tmp/ruby-olioapp.fastcgi",
                        "bin-path" => "/export/faban/olio_rails/olioapp/public/dispatch.fcgi",
                        "bin-environment" => (
                           "RAILS_ENV" => "production",
                           "INLINEDIR" => "/export/faban/olio_rails/olioapp/tmp"
                        )  
                      )
                    )
                  )

I've included the whole thing as it's sometimes tough to see the nesting of the options. Basically, if you don't have a 'bin-environment' section add one after 'bin-path' (watch for the commas).

With this config file, RubyInline will build (rebuild in this case) the libraries of the gems that make use of it in /export/faban/olio_rails/olioapp/tmp/.ruby_inline, so as long as the user that Lighttpd runs its worker processes as has access to that directory, you should be good to go.
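The lookup rule described above can be restated in a few lines of Python. This is only a sketch of the rule as described in this entry, not RubyInline's own code, and the function name inline_build_dir is made up for illustration.

```python
import posixpath


def inline_build_dir(env):
    """Return the directory RubyInline would build in, per the rule above.

    `env` is a dict of environment variables. This mirrors the rule as
    described in this entry; it is not taken from RubyInline's source.
    """
    # INLINEDIR wins if it is set; otherwise fall back to HOME.
    base = env.get("INLINEDIR") or env["HOME"]
    return posixpath.join(base, ".ruby_inline")


# HOME is /root and INLINEDIR is not set: the build lands in root's home.
print(inline_build_dir({"HOME": "/root"}))

# With the bin-environment from the config above, it lands in the app's tmp dir.
print(inline_build_dir({
    "HOME": "/root",
    "INLINEDIR": "/export/faban/olio_rails/olioapp/tmp",
}))
```

The first call gives the path from the error message, the second the path that the webservd user needs to be able to write to.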

BTW: In case you are wondering, image_science is a native Ruby Gem that can resize images and create thumbnails, but instead of being built on install, it's built and managed by the RubyInline gem when you first go to use it.



Tuesday Oct 14, 2008

Lighttpd on CMT running Web 2.0 workloads

Earlier in the year I spent many a night burning candles at both ends testing a Web 2.0 workload on a Lighttpd/PHP/Memcached/MySQL stack. The results have been made into a Sun Blueprint entitled "An Open Source Web Solution - Lighttpd Web Server and Chip Multithreading Technology". The bottom line is that we were able to get 416 ops/sec with 2250 users before we ran out of network bandwidth. That was with a 5 second cycle time between operations, so 450 ops/sec (2250 users / 5 seconds) would have been the best we could theoretically get. This was all done on a Sun SPARC Enterprise T5120 running with 8 cores (64 threads) at 1.2GHz. On reflection I wish I'd spent more time on Memcached and MySQL analysis, but we wasted a lot of cycles trying to team a couple of network cards using a switch that was not up to the job, and we just ran out of time.

The keys to all of the testing were the Faban test harness, which not only ran all of the tests for us but also collected and presented most of the data used in the Blueprint document, and the Web 2.0 kit, which is now in incubation on Apache.org as Project Olio (http://incubator.apache.org/projects/olio.html).
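The theoretical ceiling is just the number of users divided by the cycle time. A small sketch of that arithmetic, using only the figures quoted in this entry (the function name is made up for illustration):

```python
def max_ops_per_sec(users, cycle_time_s):
    """Upper bound on throughput when each user issues one op per cycle."""
    return users / cycle_time_s


ceiling = max_ops_per_sec(users=2250, cycle_time_s=5)
achieved = 416  # ops/sec reported in this entry

print(ceiling)                       # 450.0
print(round(achieved / ceiling, 3))  # 0.924
```

So the run reached a little over 92% of the best possible figure for that user count.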

Friday Oct 03, 2008

Lighttpd in OpenSolaris

Lighttpd went into the SFW consolidation 6 weeks ago and made it into build 97 of Solaris Nevada (which sees the light of day as Solaris Express Community Edition). It appeared in the OpenSolaris package repository in the first week of September and you can install Lighttpd using the following command:

pfexec pkg install SUNWlighttpd14

If it doesn't find the package, run pfexec pkg refresh and then try again.

BTW: You don't need the pfexec part of the commands if you are logged in as root.

I've not blogged about this before because I tried to be clever with dependencies: I built the Lighttpd MySQL vhost module against the MySQL 5.0 version that's integrated into SFW, but I deliberately didn't call out the dependency in the packages. This should have meant that when you installed Lighttpd, it wouldn't pull down MySQL, but once you had installed the MySQL packages (if you chose to) then MySQL vhost support would just work. The missing library dependency was spotted by the Gatekeeper and he added the MySQL dependency to the packages by hand. So now when you install Lighttpd, you get MySQL too. Sorry about that.

We have a CR open for this and in the next couple of weeks we'll integrate a fix that packages MySQL support separately. If in the meantime I come up with a way of bypassing the dependency check I'll blog it.

In the meantime, give it a try and if you find anything that doesn't work the way it should or if there's some change you'd like to suggest, reply to this blog entry or join the webstack-discuss alias at http://opensolaris.org/os/discussions/ and send us an email. 

BTW: We don't have Lua support at the moment, so the likes of mod_magnet won't work. We are hoping to fix this in the near future.


Thursday May 29, 2008

Lighttpd and temporary files

I think that this blog is turning into a list of my shortcomings, but these kinds of silly problems need to be stamped out and you can only do that by documenting them.

I was testing our Web 2.0 application on Lighttpd/PHP/MySQL and my throughput tailed off at around 800 users. I did all of the usuals: tuned MySQL a little, enabled XCache, and used the private interfaces instead of the ones hooked up to our internal network. Still I was seeing the same tail-off: I was getting 160 ops/sec for 900 and 1000 users and < 160 ops/sec for > 1000 users.

As I was just quickly ramping this test up I wasn't monitoring iostat during the run, so I enabled it (Faban allows you to specify a whole bunch of tools that you can run on each system in the rig) and ran it again. iostat showed that the root disk was 87% busy on average with average service times of around 80ms, which is way too high, particularly for a disk that shouldn't have been doing anything. I realised then that either PHP or Lighttpd was writing file upload data temporarily to /var/tmp, the default for both of them. It was in fact Lighttpd. I just added the line (which I really thought I had already done):

server.upload-dirs     = ( "/tmp")

to the Lighttpd config file and restarted. On Solaris, /tmp is an in-memory file system, so you can get real benefits from writing temporary data there rather than to /var/tmp (which by default is on a disk-based filesystem). Just make sure that you don't run out of memory or you'll start swapping (writing memory pages to disk, probably your application's pages), and don't use /tmp for ordinary files: UFS and ZFS both have caches which will benefit read/write access to existing files, and all of the data in /tmp is lost when you reboot.
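Since I had assumed the setting was already there, a quick check of the config text would have saved me a test run. The following is a rough sketch in Python, not a real Lighttpd config parser: it only looks for a single-line server.upload-dirs assignment, and the function name upload_dirs is made up for illustration.

```python
import re

# Matches an uncommented line such as: server.upload-dirs = ( "/tmp" )
_UPLOAD_DIRS = re.compile(
    r'^\s*server\.upload-dirs\s*=\s*\((.*?)\)', re.MULTILINE
)


def upload_dirs(config_text):
    """Return the directories listed in server.upload-dirs, or [] if unset."""
    match = _UPLOAD_DIRS.search(config_text)
    if match is None:
        return []
    # Pull out each double-quoted entry between the parentheses.
    return re.findall(r'"([^"]*)"', match.group(1))


print(upload_dirs('server.upload-dirs     = ( "/tmp")\n'))  # ['/tmp']
print(upload_dirs('server.port = 80\n'))                    # []
```

An empty list means the option is not set in that text, in which case uploads go to the default location mentioned above.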

The config file option for PHP that causes it to store file upload data to /tmp is:

upload_tmp_dir = /tmp

You can also cause session data to be saved to /tmp with:

session.save_path = "/tmp"

Wednesday Dec 05, 2007

Lighttpd SMF troubles

We came across an issue recently when running Lighttpd with /dev/poll on Solaris under SMF. You would start the service and immediately the CPU would peg at 100% and the Lighttpd error log would fill up with the message "(server.c.1429) fdevent_poll failed: Invalid argument". 

SMF (the Solaris Service Management Facility) allows the deployer of a service to specify which user and group the processes that belong to the service should run under. In this case Lighttpd was being started as user webservd with group webservd. This would be similar to logging on to a system as webservd and then running the lighttpd executable. When we did exactly that we saw the same problem as we did when running under SMF. If we started Lighttpd as root with the same config file it ran fine and no errors were logged. So the problem came down to starting Lighttpd as webservd with /dev/poll specified as the event handler in the Lighttpd config file.

The workaround is to start Lighttpd as root and specify the user name and group for Lighttpd to run under through the Lighttpd config file (the server.username and server.groupname options). This is fairly standard practice for starting both Lighttpd and Apache. If you've run into this problem, it may be because you've somehow obtained a Service Manifest file that specifies "webservd" as the user and group. The easy way to modify the service so that Lighttpd is started as root is to create a copy of the current manifest and, in the copy, remove the entire <method_credential> element that you'll see here:

...
...

<exec_method
  type='method'
  name='start'
  exec='/opt/coolstack/lib/svc/method/svc-csklighttpd start'
  timeout_seconds='60'>
  <method_context>
    <method_credential
      user='webservd' group='webservd'
      privileges='basic,!proc_session,!proc_info,!file_link_any,net_privaddr' />
  </method_context>
</exec_method>
...
...

 
You can leave the <method_context> and </method_context> tags with nothing between them, or you can delete the closing tag and use an empty tag, i.e. <method_context />. Just don't remove it, as it's a useful marker. The above snippet is from an example that I saw when I first came across this issue. Yours may be different, but in that case hopefully you wrote it and understand how to change it.

What you are left with is:

...
...
<exec_method
  type='method'
  name='start'
  exec='/opt/coolstack/lib/svc/method/svc-csklighttpd start'
  timeout_seconds='60'>
  <method_context />
</exec_method>
...
...
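If you'd rather script the edit than do it by hand, the same change can be sketched with Python's xml.etree.ElementTree. This is an illustration only: the function name strip_start_credential is made up, and it works on the exec_method fragment shown above. ElementTree does not preserve the DOCTYPE line, so on a real manifest you would need to put that back by hand, or simply make the edit in a text editor.

```python
import xml.etree.ElementTree as ET


def strip_start_credential(xml_text):
    """Remove <method_credential> from the 'start' method's <method_context>."""
    root = ET.fromstring(xml_text)
    # root.iter() includes the root itself, so a bare <exec_method> works too.
    for method in list(root.iter("exec_method")):
        if method.get("name") != "start":
            continue
        for context in method.findall("method_context"):
            for credential in context.findall("method_credential"):
                context.remove(credential)
            if len(context) == 0:
                # Drop leftover whitespace so the tag is written out empty.
                context.text = None
    return ET.tostring(root, encoding="unicode")


fragment = """<exec_method
  type='method'
  name='start'
  exec='/opt/coolstack/lib/svc/method/svc-csklighttpd start'
  timeout_seconds='60'>
  <method_context>
    <method_credential
      user='webservd' group='webservd'
      privileges='basic,!proc_session,!proc_info,!file_link_any,net_privaddr' />
  </method_context>
</exec_method>"""

print(strip_start_credential(fragment))
```

The <method_context> element itself is kept, in line with the advice above not to remove it.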


Once you've changed the copy of the manifest, import it using svccfg as follows:

svccfg -v import <manifest filename>

This will take a snapshot of the current state of the service and name it previous then delete all of the entries that you removed from the copy of the manifest. They will be named start/group, start/user, start/privileges plus a few others that would have been set to their default values. It will then take another snapshot of the service and call it last-import. Finally it will "refresh" the service, which means pushing out the changes to the running service. If the Lighttpd service was running it will probably go to the state called "Maintenance" at this point. It's best to disable and enable the service after a refresh (see the man page for svcadm) so you should do that now. Lighttpd should then be running correctly.

I'll post some example manifests in another blog entry.

Root Cause

It turns out that when Solaris 10 came along, this same problem was seen when using /dev/poll and starting Lighttpd as root. Lighttpd bases its maximum number of connections on the number of File Descriptors available to the process. The result is that all of the available File Descriptors are locked away for use when creating connections and none are left for /dev/poll to use, so every call to /dev/poll results in an error. A more detailed discussion is available on this thread on the Sun forums. A workaround was added to Lighttpd that effectively sets the max connections to 10 less than the max File Descriptors. Unfortunately it only works for the root user, as the number of connections for a non-root user is set in a different code path. See Lighttpd ticket 1465. We are working on getting a workaround added for non-root users, so watch this space.

Oh, and also, if the process has its max File Descriptors set to, say, 65535 and you specify server.max-fds = 1000 in the Lighttpd config file, Lighttpd will reset the max number of File Descriptors available to it to 1000. So you can't get around the problem simply by specifying a lower number for server.max-fds than what is available to the process (according to ulimit -n in the shell from which you start Lighttpd).
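To make the arithmetic concrete, here is a simplified model of the behaviour described above, written in Python. It is based purely on the description in this entry, not on Lighttpd's source, and all of the function names are made up for illustration.

```python
RESERVED_FDS = 10  # the margin the workaround described above leaves free


def process_fd_limit():
    """Soft limit on open files for this process (what ulimit -n reports)."""
    import resource  # Unix only
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def effective_max_fds(process_limit, server_max_fds=None):
    """Descriptors Lighttpd ends up with: server.max-fds replaces the limit."""
    return server_max_fds if server_max_fds is not None else process_limit


def max_connections(max_fds, is_root):
    """Simplified model: only the root code path keeps descriptors back."""
    return max_fds - RESERVED_FDS if is_root else max_fds


def fds_left_over(max_fds, is_root):
    """Descriptors not claimed by connections, e.g. for /dev/poll."""
    return max_fds - max_connections(max_fds, is_root)


# Started as root: the workaround leaves a few descriptors spare.
print(fds_left_over(65535, is_root=True))    # 10

# Started as webservd: every descriptor is claimed for connections.
print(fds_left_over(65535, is_root=False))   # 0

# Lowering server.max-fds does not help, because the limit itself is reset.
lowered = effective_max_fds(65535, server_max_fds=1000)
print(fds_left_over(lowered, is_root=False)) # 0
```

In this model the non-root case always ends up with nothing left over, whatever value server.max-fds is given, which matches what we saw.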
