Thursday Jul 09, 2009

Rack and content_type for multipart requests

I wrote this entry a while back but never posted it. Seems a shame to waste it.

While updating the Apache Olio app to do caching, we found that on the new test rig adding events and people (it's a social networking app) would fail. We had moved to the new rig because our time on the lab systems had run out, but fortunately we had left an overlap and were able to go back and compare the Rails stacks to see what had changed. The failure itself gave us a clue: all of the parameters in the HTTP POST request that contained the person or event details looked like this:

Parameters: {"address"=>{"city"=>#<File:/tmp/RackMultipart4952-35>, "zip"=>#<File:/tmp/RackMultipart4952-37>, "country"=>#<File:/tmp/RackMultipart4952-38>, "street1"=>#<File:/tmp/RackMultipart4952-33>, "street2"=>#<File:/tmp/RackMultipart4952-34>, "state"=>#<File:/tmp/RackMultipart4952-36>}, "commit"=>#<File:/tmp/RackMultipart4952-20>, "event_image"=>#<File:/tmp/RackMultipart4952-30>, "event_document"=>#<File:/tmp/RackMultipart4952-31>, "authenticity_token"=>#<File:/tmp/RackMultipart4952-39>, "event"=>{"title"=>#<File:/tmp/RackMultipart4952-21>, "event_timestamp(1i)"=>#<File:/tmp/RackMultipart4952-25>, "event_timestamp(2i)"=>#<File:/tmp/RackMultipart4952-26>, "event_timestamp(3i)"=>#<File:/tmp/RackMultipart4952-27>, "telephone"=>#<File:/tmp/RackMultipart4952-24>, "description"=>#<File:/tmp/RackMultipart4952-23>, "summary"=>#<File:/tmp/RackMultipart4952-22>, "event_timestamp(4i)"=>#<File:/tmp/RackMultipart4952-28>, "event_timestamp(5i)"=>#<File:/tmp/RackMultipart4952-29>}, "tag_list"=>#<File:/tmp/RackMultipart4952-32>}

instead of like this:

Parameters: {"address"=>{"city"=>"aaynaiuotrtgs", "zip"=>"81602", "country"=>"USA", "street1"=>"49857 Pk Ln", "street2"=>"", "state"=>"BC"}, "commit"=>"Create", "event_image"=>#<File:/tmp/RackMultipart4833-2>, "event_document"=>#<File:/tmp/RackMultipart4833-3>, "authenticity_token"=>"VdNDLR/dCCJe96Ua3zEC9ZOwPg2DxujQ5D6pxI9E0ws=", "event"=>{"title"=>"aa rygrtokldq t ", "event_timestamp(1i)"=>"2008", "event_timestamp(2i)"=>"10", "event_timestamp(3i)"=>"20", "telephone"=>"0014879641640", "description"=>"kw sjnieb vui fslzpn jokjw xjijsm jzeweyio dthti vckudre osoempc jurldvyi adusy twghtlzwluh cowiczskxg wql ctulke km yxtuost enixrl qv to ltszeriord lpxrlp cokjtrehwc mbrnchxh fdnxwie x nuuzpvvv pqlwqghg thwtgc svuzbnzdokgv iqwsrvokviuw l z gnr trkmc aspwbgckozcg so jq dcjxl vluosk dypk rkhg iseurrximrvk qnepyyzxu iugxbgmvcui mahnpibcoa wbhvplqym ogompcsikpz engr ugipr uvj w duk dqefcurj zoztkh ", "summary"=>"x c ztsg ncccoca e dspe azhzwvcz blfdtdllh zpbothd gctqotpln eunpoudzboef fcbzcstxh ", "event_timestamp(4i)"=>"20", "event_timestamp(5i)"=>"10"}, "tag_list"=>"tag1"}

So all of the parameters in the request were being treated as file uploads and it very much looked like Rack might be the cause. We had been using Thin as our Rails runtime and a check on the two systems showed that the old system had Thin 1.0.0 and Rack 0.9.1 and the new system had Thin 1.2.2 and Rack 1.0.0. Going back to the older versions fixed the issue.

Rack processes multipart form data with a couple of passes through the class method parse_multipart(env) (rack/utils.rb), where env is the Rack environment, which carries the request body as an IO object (a StringIO in our case). The first pass processes the StringIO and extracts the form data and its parts. To determine whether a part is a file upload, it used to run the following check:

filename = head[/Content-Disposition:.* filename="?([^\";]*)"?/ni, 1]
if filename
  body = Tempfile.new("RackMultipart")
  body.binmode  if body.respond_to?(:binmode)
end

This locates lines like the following in the form data:

    Encapsulated multipart part:  (image/jpeg)
        Content-Disposition: form-data; name="event_image"; filename="event.jpg"\r\n

In Rack 1.0.0 the conditional changed to:

if content_type || filename
  body = Tempfile.new("RackMultipart")
  body.binmode  if body.respond_to?(:binmode)
end

The code that extracts content_type had always been there, so I won't list it. The upshot is that the conditional now reads: if either content_type or filename (or both) is set, treat this part as a file upload.
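The effect of the changed conditional can be reproduced outside Rack. The following is a minimal sketch and not Rack's actual code: the helper upload?, the constant names and the two sample headers are made up for illustration, and the regular expressions are simplified versions of the ones Rack uses.

```ruby
# Simplified stand-ins for the header checks in Rack's multipart parser.
# upload? is a hypothetical helper, not part of Rack.
FILENAME_RE     = /Content-Disposition:.* filename="?([^";]*)"?/i
CONTENT_TYPE_RE = /Content-Type: (.*)\r\n/i

# Returns true if the part would be written to a Tempfile.
# rule is :old (the filename-only check) or :new (content_type || filename).
def upload?(head, rule)
  filename     = head[FILENAME_RE, 1]
  content_type = head[CONTENT_TYPE_RE, 1]
  if rule == :old
    !filename.nil?
  else
    !(content_type || filename).nil?
  end
end

# A real file upload: has a filename and a Content-Type.
file_head = "Content-Disposition: form-data; name=\"event_image\"; filename=\"event.jpg\"\r\n" \
            "Content-Type: image/jpeg\r\n"
# A plain text field that nevertheless carries a Content-Type.
text_head = "Content-Disposition: form-data; name=\"event[title]\"\r\n" \
            "Content-Type: text/plain; charset=US-ASCII\r\n"

puts upload?(file_head, :old)  # true
puts upload?(file_head, :new)  # true
puts upload?(text_head, :old)  # false
puts upload?(text_head, :new)  # true
```

Under the old rule only the part with a filename is treated as an upload; under the new rule a text field that happens to carry a Content-Type header is treated as one too.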

In the Apache Olio Rails driver (the code that drives load to the app) we have to assemble POST requests by hand, and the code is all based on Apache HttpClient 2 (we use version 3 now, but with the same code, which relies on methods that are deprecated in 3). Up to this point, what we had been doing to add text parameters to the POST request was:

MultipartPostMethod post = new MultipartPostMethod(addEventResultURL);
post.addParameter("event[title]", <randomly generated String data>);

This had the unfortunate effect of adding a Content-Type header to the form-data part, with the result that the request looked like this:

    Encapsulated multipart part:  (text/plain)
        Content-Disposition: form-data; name="event[title]"\r\n
        Content-Type: text/plain; charset=US-ASCII\r\n
        Content-Transfer-Encoding: 8bit\r\n\r\n
        Line-based text data: text/plain

We modified the code to use a StringPart and addPart() instead of addParameter():

StringPart tmpPart = new StringPart("event[title]", <randomly generated String data>);

post.addPart(tmpPart);

and we also had to explicitly set the content type to null on the new part:

tmpPart.setContentType(null);

and the form data in the request now looks like this:

    Encapsulated multipart part:
        Content-Disposition: form-data; name="event[title]"\r\n
        Content-Transfer-Encoding: 8bit\r\n\r\n
        Data (19 bytes)
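To make the difference between a text field and a file upload concrete, here is a small Ruby sketch that assembles multipart parts by hand. It is illustrative only and is not the Olio driver code (which is Java); the helper names text_part and file_part, the boundary value and the sample field values are all assumptions made for this example.

```ruby
CRLF = "\r\n"

# Builds a plain form field. No Content-Type header is emitted, so a
# parser that keys on Content-Type will not mistake it for an upload.
def text_part(boundary, name, value)
  "--#{boundary}#{CRLF}" \
  "Content-Disposition: form-data; name=\"#{name}\"#{CRLF}" \
  "#{CRLF}" \
  "#{value}#{CRLF}"
end

# Builds a file upload field, which carries both a filename and a Content-Type.
def file_part(boundary, name, filename, content_type, data)
  "--#{boundary}#{CRLF}" \
  "Content-Disposition: form-data; name=\"#{name}\"; filename=\"#{filename}\"#{CRLF}" \
  "Content-Type: #{content_type}#{CRLF}" \
  "#{CRLF}" \
  "#{data}#{CRLF}"
end

boundary = "example-boundary" # hypothetical value
body = text_part(boundary, "event[title]", "Some title") +
       file_part(boundary, "event_image", "event.jpg", "image/jpeg", "image bytes go here") +
       "--#{boundary}--#{CRLF}"
puts body
```

Only the part built by file_part carries a filename and a Content-Type, which is the shape that lets either version of the check tell the two kinds of field apart.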
 
  

Thursday May 14, 2009

Apache Olio: Web 2.0 toolkit for Java EE

This week saw the Apache Olio project release the code for its Java EE version, adding to the versions already available for PHP and Rails.

If you know Apache Olio and want the specifics of the Java EE version, I'll cover those first. If you want to know more about Apache Olio in general, read on.

To run the Java EE version of Olio, you'll need:

  • Java SE 5 or 6
  • Java EE 5 compliant application server (tested so far with GlassFish v2)
  • A Java Persistence API (JPA) provider (EclipseLink is the JPA provider packaged with GlassFish v2)
  • MySQL Database (any DB could be used but we have scripts and instructions for MySQL)

Some of the technologies that the Java EE version features:

  • JPA for the Object-Relational persistence solution
  • AJAX
  • REST-based services (partially implemented)
  • jMaki widget wrappers for Yahoo and Dojo widgets

In planning are the following changes/features:

  • Re-implementation of the memcached layer for caching (this was stripped out for this release but needs to be put back)
  • REST-based services with JAX-RS (JSR-311). I've started this already using the Jersey implementation.
  • Replacement of the jMaki widgets with an appropriate alternative
  • Minor features to 'catch up' with the PHP and JRuby versions
  • Investigation of a distributed file system, e.g. Hadoop (the current implementation only uses the local filesystem)

If you want to get involved then visit our page at http://incubator.apache.org/olio/

You can contribute patches, submit bugs or RFEs, or just tell us which components you have successfully used the app with.

What is Apache Olio?

Apache Olio is a Web 2.0 toolkit: basically, a Web 2.0 application and a load generator. You deploy the application to a configuration that you want to test, fire up the load generator, drive load to the application and then analyze the results. The application isn't that fussy about what it runs on. For the Java EE app you need a Java EE web container (GlassFish or Tomcat, for example) and a database; a schema is provided that can be used to set that up. You also need a filestore and a web server to act as a remote web service (for looking up geolocations).

Apache Olio uses Faban to drive load along with a custom Faban driver. Faban is a benchmark driver framework and harness that is designed to allow you to model the usage of your application and drive load for 1000s of simulated users. It also can be used to manage the runtime environment and it gathers the results from test runs.

Once you've deployed the application you can load it up with dummy users and events (it's a Social Networking app) and use the driver to simulate load. At the end of a test run, you get all of the data from the run presented to you in graphical form (depending on the platform). I spend a vast amount of time using Olio and Faban and can't recommend them enough.

Kim is the lead developer of the Java EE version of Apache Olio and he has a blog entry that goes into lots of detail on how Apache Olio Java EE works and what it looks like.



Tuesday Jan 20, 2009

Lighttpd and Olio Rails

We were trying to use Lighttpd to run the Apache Olio Rails application on OpenSolaris recently and we found that because the Lighttpd workers run as a non-root user (in this case as webservd), the image_science gem was unable to access the shared library built for it by RubyInline. The error that we saw was:

ActionView::TemplateError (Permission denied - /root/.ruby_inline) on line #10 of events/_filtered_events.html.erb

(The exact error varies depending on whether you are looking at the error page returned to the browser or the logfile.) We knew from some of the problems we had getting image_science up and running on OpenSolaris that RubyInline defaulted to building libraries in the root user's home directory, but up until then we had been using Mongrel and Thin and running them as root (which is food for thought).

The fix is simple. RubyInline defaults to building libraries in $HOME/.ruby_inline unless the environment variable INLINEDIR is set, in which case it builds them in $INLINEDIR/.ruby_inline. You can pass environment variables on to the FastCGI processes that Lighttpd spawns by setting them in the fastcgi.server directive in the Lighttpd config file. Here is the one from our rig:

fastcgi.server =  ( ".fcgi" =>
                    ( "localhost" =>
                      ( "min-procs" => 1,
                        "max-procs" => 5,
                        "socket" => "/tmp/ruby-olioapp.fastcgi",
                        "bin-path" => "/export/faban/olio_rails/olioapp/public/dispatch.fcgi",
                        "bin-environment" => (
                           "RAILS_ENV" => "production",
                           "INLINEDIR" => "/export/faban/olio_rails/olioapp/tmp"
                        )  
                      )
                    )
                  )

I've included the whole thing as it's sometimes tough to see the nesting of the options. Basically, if you don't have a 'bin-environment' section, add one after 'bin-path' (watch for the commas).

With this config file, RubyInline will build (rebuild in this case) the libraries of the gems that make use of it in /export/faban/olio_rails/olioapp/tmp/.ruby_inline, so as long as the user that Lighttpd runs its worker processes as has access to that directory, you should be good to go.
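The lookup rule described above can be sketched in a few lines of Ruby. This is a hypothetical helper that mirrors the behaviour as described in this post, not RubyInline's own code; the name inline_build_dir is an assumption.

```ruby
# Hypothetical helper mirroring the lookup described above.
# env is any Hash-like object with string keys; it defaults to the
# process environment.
def inline_build_dir(env = ENV)
  root = env["INLINEDIR"] || env["HOME"]
  File.join(root, ".ruby_inline")
end

# Without INLINEDIR the build directory lands under the user's home.
puts inline_build_dir({ "HOME" => "/root" })
# /root/.ruby_inline

# With INLINEDIR set (as in the bin-environment section) it moves.
puts inline_build_dir({ "HOME" => "/root",
                        "INLINEDIR" => "/export/faban/olio_rails/olioapp/tmp" })
# /export/faban/olio_rails/olioapp/tmp/.ruby_inline
```

Whichever directory is chosen, the user the worker processes run as still needs write access to it.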

BTW: In case you are wondering, image_science is a native Ruby gem that can resize images and create thumbnails, but instead of being built on install, it's built and managed by the RubyInline gem the first time you use it.


