Wednesday Oct 07, 2009

Last day at Sun

I really enjoyed my tenure at Sun, but I am leaving to work at NASA on MCT. I will announce my new blog once I get some time to get things going; in the meantime, you can find me (Chris Webster) on LinkedIn.

Tuesday Sep 08, 2009

Profiling an Ant process

I had an opportunity recently to look at what was causing a significant increase in the time required for automated builds. The build system is based on Ant, so what I wanted was a list of the top targets and subtasks that would let me easily find the performance hotspots. I was hoping for a performance flag that would generate this information, but I didn't see anything readily accessible that would give me what I needed.

After looking around the Ant documentation, I found the BuildListener interface, which provides notifications of build events (target started, target completed, ...). I wanted to implement this interface to capture the starting and ending of tasks and targets so that I could generate an ordered list of the targets consuming the most time. I created a TimingBuildListener class that does the following:

  • Keep a list of the N targets that take the most time. Each target maintains the list of its tasks (and their times) so that decomposition within a target is possible.
  • Manage the target lists according to the current stack of running targets.
  • Provide thread safety to work with the parallel task. How events are generated is not well documented, so this code was written empirically against Ant 1.7.1's behavior; it works for the intended project, but you may see different results.
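Setting the Ant-specific classes aside, the core bookkeeping can be sketched roughly as follows. This is a simplified, single-threaded illustration driven by plain strings rather than Ant's BuildEvent objects (the class and method names here are mine, not Ant's); the real listener wires the same logic into the BuildListener callbacks and adds the thread safety mentioned above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of the bookkeeping inside a timing build listener.
class TargetTimer {
    static class Timing {
        final String name;
        long start;
        long elapsed;
        final Map<String, Long> taskTimes = new HashMap<>();
        Timing(String name) { this.name = name; }
    }

    private final int topN;                                   // how many targets to report
    private final Deque<Timing> running = new ArrayDeque<>(); // stack of nested targets
    private final List<Timing> finished = new ArrayList<>();

    TargetTimer(int topN) { this.topN = topN; }

    void targetStarted(String name, long now) {
        Timing t = new Timing(name);
        t.start = now;
        running.push(t);
    }

    void taskFinished(String taskName, long elapsed) {
        // Attribute the task's time to the innermost running target.
        Timing current = running.peek();
        if (current != null) {
            current.taskTimes.merge(taskName, elapsed, Long::sum);
        }
    }

    void targetFinished(String name, long now) {
        Timing t = running.pop();
        t.elapsed = now - t.start;
        finished.add(t);
    }

    /** The topN most expensive targets, most expensive first. */
    List<Timing> topTargets() {
        finished.sort(Comparator.comparingLong((Timing t) -> t.elapsed).reversed());
        return finished.subList(0, Math.min(topN, finished.size()));
    }
}
```

The stack is what makes nested targets (e.g. via antcall or depends) attribute task time to the right place.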

Once I had this code working on a simple build file (adding a build listener only requires passing the listener's fully qualified class name via the -listener command-line switch, along with the -lib switch to put the class on the classpath), I progressed to profiling the full build.

The profiling information identified a problem with the way we were compressing CSS and HTML files, which was taking more than 50% of the build time (including unit tests), so we now get continuous build results in half the time :)

Monday Jun 01, 2009

Community One presentation on zembly architecture

Girish and I presented a CommunityOne session, What You Need to Know About Creating and Running a Scalable Web Site, based on our experience on zembly. Here are the slides; the full set of CommunityOne slides is also available.

Tuesday Apr 28, 2009

The Agile Development Process used to create Zembly

I recently saw a blog about agile development where the author was doing a production deployment with every commit that successfully passed the unit tests, so I thought I would share the development process we used to create Zembly.


Each planning cycle covers a single three-week development cycle (sprints, for those familiar with the lingo). Planning is done both top down and bottom up; that is, features and tasks come from strategic features as well as infrastructure-related requirements. The input from everyone on the extended team is a categorized set of tasks and features in our bug tracking system (Jira). The planning meeting further categorizes this list into committed and target features (committed features are expected to be delivered before the end of the sprint, while target features will be delivered if there is time).

While the above sounds straightforward, there are several things to be aware of:

  • The issues must be clear and should have an accurate time estimate for the work. One-line descriptions with no time estimate make it difficult to determine the priority and importance of a task. The time estimate makes it possible to determine the load on each person (tools can help a lot at this point, specifically with dependencies and total time).
  • Make sure to include things like blogging, demos, presentations. Not including these will result in leftover tasks at the end of the sprint.
  • Bugs, or time for bug fixing, also need to be included. What we have done is to work with higher-level tasks (something like "bug fixing") where each engineer can commit time for bug fixing and determine the bugs to fix. Trying to prioritize the individual bugs proved too time-consuming, but considering the time impact is important.
  • Try to avoid adding additional work during the planning meeting; the meeting will become extremely long with feature debates anyway, and specifying a new feature well enough to get a time estimate will be difficult.
  • Input to the planning is open to everyone, but the prioritization done during the planning meeting should be limited to a few people. This makes reaching consensus faster and also reduces the time everyone spends in meetings. We introduced a rotating sprint lead role (responsible for representing the engineering team during planning and update meetings) so that everyone would get to attend and participate in a planning meeting, but not all at once.


The development process works as follows:

  1. A developer completes a unit of work which is production ready (the assumption is that tests are being developed in conjunction with code) and commits the change to the trunk. Code reviews are incorporated into the development cycle, mostly informal (peer review of change sets), but we have formal code reviews for large and complicated changes.
  2. A continuous build system (Hudson) detects when changes occur and runs a checkout, build, test cycle. If the build or tests fail, a mail is sent to the development team. If the build is successful, the binary is published for deployment to our continuous integration server.
  3. The continuous integration server picks up the last successful build (up to once an hour), deploys it, then runs a series of functional tests. The functional tests exercise the zembly platform APIs and ensure the integration is working; they serve as black box tests, where the unit tests are white box tests. If the deployment or functional tests fail, a mail is sent to the development team for evaluation. If the functional tests are successful, another set of UI tests is executed, verifying the basic functionality of the user interface across multiple browsers (you get the picture if the tests fail).
  4. In addition to the above automated checking, there is a nightly build which measures code coverage and also runs FindBugs on the Java code. The JavaScript code is run through JSLint on each build, which helps us detect problems early.
  5. A set of performance tests runs nightly and detects performance variations across builds. The key here is to look for trends and specifically to verify that performance optimizations are really working. One of the biggest challenges is making the tests repeatable (if you can avoid network effects, that will save you lots of headaches).

There are a few interesting things that we found work well:

  • Breaking the build should be a big deal. Everyone will do it, but promptly fixing the problem is essential. The larger and more distributed the team, the more expensive a broken build is, as updating and figuring out why the build is broken is a pain. Establish a rule of waiting until the continuous build is successful before leaving. Also, peer pressure is effective.
  • Establish a culture of writing tests, imposing this or trying to write tests after the fact is difficult.
  • Make sure commit messages are understandable. A commit message describing the change lives with the code and may be referenced long after the change is committed, so describing the change as well as its potential impact on other code helps both future developers and people testing the code (we used a QE impact section describing what areas would be affected).

Deployment and Codeline management

We split a sprint into a number of deployment units (we have done several variations between daily and weekly), but this really depends on how fast the codeline can be vetted and stabilized. This time has grown as the SLA our users expect has grown. Each deployment unit has a release manager, who is responsible for ensuring the codeline is branched (more below) and the bits are tested, stabilized, and pushed to production before the next release manager takes over. The release manager is announced, but also gets to wear a special badge; the NASA-style vest was not available.

Here are more details on what happens:

  • There are three codelines all the time:
    • trunk - this is never closed, but for larger changes it is better to commit them right after the staging codeline is created. This allows more testing on developer machines and reduces the risk of stopping testing immediately after the staging deployment.
    • staging - this is created from the specific trunk revision that is currently running in the staging environment. Bugs detected in the staging environment that are serious enough to prevent a production deployment are fixed in this branch.
    • production - represents the code which is currently running in production. This branch is used if there are blocker issues in production. Changes in this branch are emergency changes and keeping the production environment running is the highest priority.
  • There is an automated deployment which deploys the latest good bits (these are the bits which pass the test described in the development section) to the staging environment (a small replica of what we run in production) as well as runs the automated tests. The staging environment automated tests are the same as those that run continuously; however, these run in a horizontally scaled environment which has different characteristics.
  • The release manager's job begins after this deployment: ensure the automated test suites pass and the codelines described above are up to date, which essentially means moving the staging codeline to production and copying the specific trunk revision to the staging codeline (copy and move in the Subversion sense). The release manager also does a manual sanity check on the bits to look for things that are difficult to detect in UI tests (alignment issues, for example).
  • At this point the build is ready for testing. This can be done by a formal QE team and also perhaps through developer brown bag sessions (we use both). The developer brown bag session allows the team to try the software as a group and build or create things, so in addition to finding bugs, things like usability issues will surface.
  • If there are any defects the release manager will determine if the defects require immediate fixes and if so will ensure the fix gets committed to the right codeline and the changes get pushed to staging.
  • Once everything is working as expected, the build is given a go and deployed to production and the cycle starts again. 
The release manager role lets most of the team focus on the code and also provides a way for everyone to get a chance to be the release manager. This is a difficult job, and walking in the release manager's shoes helps everyone think about the development process.

Saturday Feb 14, 2009

zembly update

We just finished pushing an update to the zembly production site. If you haven't been there in a while, it is a good time to check it out, as we have made major and minor improvements to the user experience, and we would like to know what you think.

Saturday Feb 07, 2009

MySQL 5.1 v 5.0

I recently had an opportunity to compare the performance of MySQL 5.1 v. 5.0. The daily performance test suite for zembly simulates the various loads that we see in our production environment, so our experiment was to change the JDBC connection pool in glassfish to point to an instance of MySQL 5.1 and run the tests.  Without any code changes, we saw a better than 25% improvement. Nice work guys!

We also use the MySQL Enterprise Monitor in both our testing and production environments. The 2.0 version has a query analyzer which gives you the query statistics rather than just telling you there are slow queries. This functionality uses the MySQL proxy to aggregate the query information. There is a performance impact (we noticed it during our performance testing scenarios), but we did discover some queries that were running too often, so it proved useful.

Wednesday Dec 24, 2008

zembly book is almost ready for release

The zembly book is now available from Amazon (preorder). I am still waiting to see it in print, but I am hoping it will be useful to people interested in zembly directly or in the technologies permeating the social application development space.

Friday Nov 14, 2008

How to pass a parameter into an XSLT stylesheet

First step is to set the parameter in Java. Here is some sample code:

import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

TransformerFactory tf = TransformerFactory.newInstance();
File transform = new File("myTransform.xsl");
Transformer t = tf.newTransformer(new StreamSource(transform));
t.setParameter("aParam", "value");
t.transform(new StreamSource(new File("input.xml")),
            new StreamResult(new File("output.xml")));

Above is the basic code to get a stylesheet working; performance could be improved by compiling the stylesheet using TransformerFactory.newTemplates. The Transformer.setParameter call is what actually passes the parameter into the stylesheet.

The next step is to start using the parameter in the stylesheet. The parameter is treated just like a param element, so it can be accessed by adding a param element as a child of the stylesheet element.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

   <xsl:param name="aParam"/>

Now $aParam can be used within the stylesheet just like any other parameter, for example with <xsl:value-of select="$aParam"/>.
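Putting the pieces together, here is a self-contained sketch that compiles the stylesheet with newTemplates and passes the parameter through. The inline stylesheet, input document, and class name are illustrative, just so the example runs without external files:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltParamDemo {
    // A minimal stylesheet that echoes the parameter into the output.
    static final String XSL =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:output method='text'/>"
      + "<xsl:param name='aParam'/>"
      + "<xsl:template match='/'>Param is <xsl:value-of select='$aParam'/></xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String paramValue) {
        try {
            // Compile once; a Templates object is thread-safe and reusable.
            Templates compiled = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(XSL)));
            Transformer t = compiled.newTransformer();
            t.setParameter("aParam", paramValue);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader("<input/>")),
                        new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

In a real application you would create the Templates once and call newTransformer per transformation.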

Monday Sep 01, 2008

Safari Rough Cut release of zembly book

I have been working on a book describing how to use zembly to create the social web (think wiki for code, targeted at social networks and other Web 2.0 platforms like blogs), which was released via Safari Rough Cuts. There is still time to influence the book's content, so please use the feedback mechanism.

Friday Jun 06, 2008


I have been working hard on zembly, which began its "perpetual beta" today. At zembly, we are trying to provide a service to easily create social network applications that run on social networking sites like Facebook and Meebo, as well as hosting widgets.

One of our focal points is making it easy to invoke RESTful services. There are several difficult issues (not necessarily in terms of technology) to deal with when consuming RESTful services, including API key management and, perhaps most challenging, finding and evaluating relevant data providers.

We have also tried to remove the distinction between code and deployment space by developing and "deploying" applications in the browser. This is similar to how blogs and wikis have taken over what in the past was the tedious process of editing and uploading HTML and other content.

If you are interested in taking a look at zembly, please visit the site. This is currently in private beta but there are some invitations available :)  

Wednesday Apr 09, 2008

Interested in an Internship

The team I am working on has several openings for interns. We are looking for people who want to build Facebook, MySpace, iPhone, and Meebo applications. If this sounds fun to you, please take a look at the job posting.



Sunday Mar 30, 2008

Working with the Facebook data API

I just finished working with the Facebook Data API to see how it compared with other storage services like Amazon S3. According to the documentation there is no intention to charge for normal usage of this service, so if you are building a facebook application it is worth considering this API. 

The Data API requires referencing objects by identity. All objects have an intrinsic identity, which is returned whenever an object is created and can be referenced from FQL as '_id'. The Data API also supports referencing an object by a user-defined key (called a hash key in the documentation). Just as in object-oriented programming, objects can have properties, which must be one of the following types: integer, string (255 characters max), or text stored as a blob.

I used this API to associate tags with a user. I started by defining the data model using the DataStoreAdmin link on the application page. Using this tool, you can quickly create object and association types as well as execute FQL queries. I created two object types, user (containing the user id) and tag (containing the tag value), as well as the tagged association (a symmetric association between user and tag).

When a user is tagged the following operations are performed:

  • Create a tag object by invoking data.setHashValue, passing the tag string as the hash key along with the value property. This service returns the object id and allows the object to be referenced via hash key in addition to the object id. In this data model, invoking setHashValue is idempotent, so it can be called every time a user is tagged.

  • Create a user object by invoking data.setHashValue, passing the user id as the hash key along with the id property.

  • Create an association using the data.setAssociation service to link the user object and the tag object, using the ids returned by setHashValue.

When the tags for a user are displayed, the tagged association can be queried, using fql.query, to determine the set of tags applied to the user. The query for extracting the tags would roughly be:

SELECT value FROM app.tag WHERE _id IN
   (SELECT tag FROM app.tagged WHERE user = tagged_user)

The storage mechanism makes this data model pretty straightforward. The hash key mechanism can also store non-scalar data such as JSON in a property, which supports using the getHashKey method to retrieve more than a single value. There is also an interesting hash key feature, data.incHashKey, which allows atomically modifying an integer property value.

 The example above could be extended to support a tag cloud, by counting how many instances of the tag have been added. One way to achieve this is by adding a count object with a single count integer property. The count object type would have an association with both the user and tag objects to have a unique count for each user. The count would be obtained through a query over both associations. Another implementation technique would be to use a synthetic hash key and a unique tag object for each user. The tag object would be extended with the count property. The tag object's hash key would combine the user id and the tag name. The count property would be incremented each time the user is tagged.


Some observations from working with the API:

  • The API would be smaller if only hash keys were required for each object. The object id is somewhat difficult to work with, and domain data typically provides unique identifiers.
  • The batch API is required for any significant use of this API, as a single logical operation will translate to multiple service invocations. It would be nice to have more support directly in the data API.
  • A lot of common use cases could be supported by extending the increment hash key capability to any object property. This could be done by passing the last version id and the current update and rejecting the change if an intervening change has occurred. In relational databases, this is implemented using an optimistic locking technique such as a version column.
  • This technology is worth evaluating if you are working on a viral Facebook application.
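The versioned-update idea in the third bullet can be sketched independently of the Data API. The class and method names below are illustrative (nothing here is part of Facebook's API): an update carries the version it was based on and is rejected if an intervening change occurred.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative optimistic-locking sketch: writers must present the version
// they read; a stale version means another writer got there first.
class VersionedValue {
    static final class Entry {
        final long version;
        final String value;
        Entry(long version, String value) { this.version = version; this.value = value; }
    }

    private final AtomicReference<Entry> ref = new AtomicReference<>(new Entry(0, null));

    Entry read() { return ref.get(); }

    /** Returns true if the update was applied, false if the version was stale. */
    boolean update(long expectedVersion, String newValue) {
        Entry current = ref.get();
        if (current.version != expectedVersion) {
            return false; // an intervening change occurred
        }
        return ref.compareAndSet(current, new Entry(expectedVersion + 1, newValue));
    }
}
```

A relational database achieves the same effect with a version column checked in the UPDATE's WHERE clause.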


Monday Jan 21, 2008

Caching static resources in glassfish

I am working on a project which has a fixed daily deployment schedule: Monday through Friday there is a 9:45 deployment. I wanted to take advantage of the known deployment schedule to add cache control directives when serving static resources. Currently Glassfish serves the static resources, so I wrote a simple Filter which I applied to the static resource patterns. The filter is instantiated when the web module loads, so the time of the next deployment determines the caching period (the period varies by day; the Friday deployment is cached over the weekend).

Glassfish (as well as other servers like Apache) will set the Last Modified and ETag headers. These headers still require the client to issue an HTTP request for the resource; however, the server may respond with a 304 status code which indicates the content was not modified since the given date and etag (a mechanism for detecting change in the contents of the resource, like a hashcode). This is accomplished by the client sending the If-Modified-Since header along with the If-None-Match header to let the server know the date and "signature" of the file currently cached. The server will return a 304 if the response would return the same resource which saves the network transmission time for the resource.
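A typical conditional-request exchange looks roughly like this (the date and validator values are illustrative):

```
GET /images/logo.png HTTP/1.1
If-Modified-Since: Mon, 14 Jan 2008 10:00:00 GMT
If-None-Match: "3e86-410-3596fbbc"

HTTP/1.1 304 Not Modified
ETag: "3e86-410-3596fbbc"
```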

This approach doesn't address the fact that browsers generally request only two resources from the same domain in parallel, so if there are hundreds of resources on a page, the overhead of determining whether each resource is current can impact the page load and hence the user experience. Instead, I wanted to use the regular deployment cycle to reduce the number of requests required to determine whether resources are up to date.

The filter sets both the HTTP 1.0 Expires header and the HTTP 1.1 Cache-Control directive, to ensure both protocols are covered. If you are using Apache to serve static (or dynamic) resources, there are good resources for setting up Apache modules to do something similar. Glassfish also has admin console capabilities for configuring caching, which should be evaluated before doing everything in code.
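The only interesting logic in such a filter is computing how long a response may be cached, i.e. the time until the next scheduled deployment. Here is a sketch of that calculation with java.time (the class name is mine; the 9:45 weekday schedule is this project's). The filter would feed the result into the Expires header and the Cache-Control max-age directive:

```java
import java.time.DayOfWeek;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;

class DeploymentSchedule {
    private static final LocalTime DEPLOY_TIME = LocalTime.of(9, 45);

    /** Seconds until the next weekday 9:45 deployment after 'now'. */
    static long secondsUntilNextDeployment(LocalDateTime now) {
        LocalDateTime candidate = now.toLocalDate().atTime(DEPLOY_TIME);
        if (!candidate.isAfter(now)) {
            candidate = candidate.plusDays(1);
        }
        // Skip the weekend: deployments happen Monday through Friday only,
        // so a Friday-afternoon response is cached until Monday morning.
        while (candidate.getDayOfWeek() == DayOfWeek.SATURDAY
                || candidate.getDayOfWeek() == DayOfWeek.SUNDAY) {
            candidate = candidate.plusDays(1);
        }
        return Duration.between(now, candidate).getSeconds();
    }
}
```

Recomputing this per request (rather than once at filter load) keeps the max-age accurate as the deployment approaches.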

If you want to give it a try in a web module, you will need to edit web.xml to add a filter and a filter mapping:

        <filter>
            <filter-name>CacheFilter</filter-name>
            <!-- filter-name and class are illustrative; use your filter implementation -->
            <filter-class>com.example.CacheFilter</filter-class>
        </filter>
        <filter-mapping>
            <filter-name>CacheFilter</filter-name>
            <url-pattern>/images/*</url-pattern> <!-- these patterns should match cached resources -->
        </filter-mapping>

Add the filter class to the web module and adjust the timeout as appropriate.


Saturday Jan 05, 2008

Interesting Site for a collection of web links

One of my colleagues sent me a link to the 50 most popular web design posts. This has some really cool and interesting Web 2.0 design topics, worth a look. 

Wednesday Jan 02, 2008

Dapper Camp

Dapper is an interesting software-as-a-service website that serves as a transformation engine for the content of a website. The content can be converted to a variety of formats, including RSS, XML, and JSON. One of the common use cases for Dapper is selecting a subset of a website's content, which is certainly interesting. However, a more interesting use case for "Web Service" style applications is the ability to create an external API for web sites which do not have APIs defined. These APIs are constructed using the Dapper interface, which allows publishing a RESTful service that can then be consumed by server-side mashup services. There is a free Dapper Camp in San Francisco in February where you can learn more about this technology.

In addition to the Dapper Camp, a contest to build a NetBeans plugin is underway. If you are considering enhancing the NetBeans REST or other web technology support, this provides an interesting incentive.



