Trip Report: Java ONE, June 28 - July 1 2004

My goal for this year's Java ONE conference (my first) was to learn as much as I could about the "state of the art" of XML processing in Java. Given the pervasiveness of XML in the J2EE platform, this was not an easy task. Sessions that dealt with XML sometimes ran concurrently, and session overload kept me away from the evening BOFs after Tuesday. That said, here are some highlights.

On Monday, Jeff Suttor, Bhakti Mehta and Ramesh Mandava surveyed the tools that make up JAXP 1.3 (JSR 206), Java's most recent API for XML processing. Their survey's leitmotif was performance, and the star performer was XSLTC, Sun's XSL transformation compiler. Bhakti reviewed performance tests that demonstrated a ten-fold decrease in run time for transformations ported from Xalan to XSLTC. With 1.3, XSLTC has been made the default transformer, which means it's what the factory will give you unless you know the URI for Xalan. This is an interesting decision, because it means that even when an application loads a stylesheet at transformation time for a single use, the factory will have to pre-compile it. The benefits are analogous to those of just-in-time compiling for the JVM, and given that HotSpot is now the default JRE, it makes sense that XSLTC would become the default transformer. Compiled "translets" are thread-safe, and can be cached for re-use.
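The compile-once, reuse-many pattern can be sketched with the standard javax.xml.transform classes. This is a minimal illustration, not a benchmark: the stylesheet and the class and method names are made up for the example, and which transformer the factory actually returns (XSLTC or another) depends on the JAXP implementation on the classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class CompiledStylesheetDemo {
    // Hypothetical stylesheet, inlined so the example is self-contained.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/greeting'>"
      + "Hello, <xsl:value-of select='@to'/>!"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Compile the stylesheet once. The resulting Templates object is
    // documented as safe to share between threads.
    static Templates compile() throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        return factory.newTemplates(new StreamSource(new StringReader(XSL)));
    }

    // Each transformation gets its own Transformer, because a Transformer
    // instance must not be used by several threads at once.
    static String apply(Templates templates, String xml)
            throws TransformerException {
        Transformer transformer = templates.newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        Templates cached = compile();   // pay the compilation cost once
        System.out.println(apply(cached, "<greeting to='JavaOne'/>"));
        System.out.println(apply(cached, "<greeting to='XSLTC'/>"));
    }
}
```

In a real application the Templates object would typically live in a cache keyed by stylesheet location, so that the compilation cost is amortized over many requests.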

Fans of alternative validation methodologies will be happy to hear that Sun's Multi-Schema Validator has been bundled with JAXP 1.3, and is available through the API. Grammars may be pre-parsed and cached for repeated, thread-safe invocation. In addition to built-in support for XML Schema and Relax NG, there is a ValidatorHandler interface that can be used as the basis for a customized validator. Nota bene: there is no support for DTDs (the entity model makes this awkward). One of MSV's many advantages is that a compiled grammar can be used to validate a document in memory, and not just at parse-time, which opens up a wide range of new applications. For example, validation can be inserted among a series of transformations or SAX filters as "assertions" during testing and debugging.
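As a rough sketch of in-memory validation, the following uses the javax.xml.validation package to compile a schema once and then check DOM trees that have already been parsed. The one-element schema and the class and method names are invented for the example. It uses W3C XML Schema only, since whether a Relax NG factory is available depends on the JAXP implementation in use.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class InMemoryValidationDemo {
    // Hypothetical schema: a single <id> element holding an integer.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='id' type='xs:int'/>"
      + "</xs:schema>";

    // Compile the grammar once; a Schema object can be shared and reused.
    static Schema compileSchema() throws SAXException {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(new StreamSource(new StringReader(XSD)));
    }

    // Parse without validating, to get a document that is only in memory.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder()
                  .parse(new InputSource(new StringReader(xml)));
    }

    // Validate the DOM tree after the fact. With no ErrorHandler set,
    // a validation error surfaces as a SAXException.
    static boolean isValid(Schema schema, Document doc) throws Exception {
        Validator validator = schema.newValidator();
        try {
            validator.validate(new DOMSource(doc));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Schema schema = compileSchema();
        System.out.println(isValid(schema, parse("<id>1234</id>")));
        System.out.println(isValid(schema, parse("<id>abc</id>")));
    }
}
```

A check like isValid could be dropped between two stages of a pipeline during testing, in the spirit of the "assertions" described above.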

JAXP 1.3 includes for the first time a stand-alone implementation of XPath 1.0. XPath is the W3C language for addressing parts of an XML document. Expressions can be pre-compiled for repeated evaluation, and applied against any document that can be represented as an abstract object model. There is built-in support for DOM, but users can plug in their own object models. The value of an XPath expression can be returned as any type supported by the XPath 1.0 data model, which is limited to strings, numbers, booleans, and node-sets (more on datatypes later).
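Here is a small sketch of the stand-alone XPath API in javax.xml.xpath, evaluated against a DOM document. The sample document and the class and method names are made up for illustration. Each expression is compiled once, then evaluated with a different requested return type.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathDemo {
    // Hypothetical sample data.
    static final String XML =
        "<emps>"
      + "<emp id='1'><name>Ann</name></emp>"
      + "<emp id='2'><name>Bob</name></emp>"
      + "</emps>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                   .parse(new InputSource(new StringReader(xml)));
    }

    // Ask for the result as a number.
    static double countEmployees(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        XPathExpression expr = xpath.compile("count(/emps/emp)");
        return (Double) expr.evaluate(doc, XPathConstants.NUMBER);
    }

    // Ask for the result as a string: the string value of the first
    // matching node in document order.
    static String firstName(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        XPathExpression expr = xpath.compile("/emps/emp/name");
        return expr.evaluate(doc);
    }

    // Ask for the result as a node-set, surfaced as a DOM NodeList.
    static NodeList allNames(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        XPathExpression expr = xpath.compile("/emps/emp/name");
        return (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(countEmployees(doc));
        System.out.println(firstName(doc));
        System.out.println(allNames(doc).getLength());
    }
}
```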

Though there is no support for XQuery (XML Query) 1.0 in JAXP 1.3, there is a separate project under way (JSR 225) to develop XQJ, the XQuery API for Java. As the standard is still in candidate status, the API will likely continue to evolve. Andrew Eisenberg (IBM) and Jim Melton (Oracle) gave an overview of the standard and a preview of the new API in a talk Tuesday morning. XML Query is a functional, side-effect-free language for asking questions of collections of XML resources, whether these be documents, relational databases, or object repositories; whether stored as XML, or viewed as XML via some kind of middleware.

An XQuery expression always returns a sequence (think SQL cursor), made up of zero or more atomic types or document nodes. As the language is functional in design, expressions can also operate on sequences - or the return values of other expressions. This allows for a much more flexible processing model than that supported by XSLT.

XQuery 1.0 is a strongly typed language. Its type system is borrowed from XML Schema. Future versions of XPath and XSLT will also make use of XML Schema's type system. Norman Walsh gave an overview of XPath 2.0 and XSLT 2.0 on Thursday afternoon.

The new versions of the standards are currently in last call, and it is hoped that they will become candidate recommendations by the end of this year. But the wheels of the W3C turn slowly, so don't hold your breath.

Like XQuery 1.0, XPath 2.0 will operate on sequences of items. An item may be a node, or it may be any atomic type supported by XML Schema. Up to now, operations like comparison have always taken their operands to be strings or nodes, but with a full range of types available, comparison will be more clearly defined. If types are not correctly cast, a comparison could result in an error, and such errors must interrupt processing. This could cause problems for users who wish to upgrade their 1.0 stylesheets to take advantage of the 2.0 type system. To make use of the new types, XPath 2.0 adds operators like "eq", "lt" and "gt" for performing value comparisons, and "<<" and ">>" for comparing nodes by document order.

XSLT 2.0 will include XPath 2.0 and a host of new features. Regular expressions will make it easier to replace characters with markup (such as when one wants to preserve space in a transformation to HTML). Improvements to the template priority mechanism will make it easier to specify fallback templates. Direct access to the result tree will make it easier to perform some complicated, multi-pass operations, such as sort and iterate.

All JAXP tools are based on two interfaces to an XML document: SAX, the parsing event-handler interface, and DOM, the in-memory, random-access document model. Chris Fry and Scott Ziegler, both from BEA, gave a presentation about StAX, an XML "stream reader" interface, that defines an alternate interface to an XML document. StAX is being developed as part of the Java Community Process (JSR 173). The reader interface allows the caller to control parsing, for example:

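    import java.io.FileInputStream;
    import javax.xml.stream.*;

    XMLStreamReader reader = XMLInputFactory.newInstance()
        .createXMLStreamReader( new FileInputStream( "foo.xml" ) );
    while( reader.hasNext() ) {
        int event = reader.next();  // the caller advances the parse
        // Get info about start element, end element, or character event
    }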

The nice thing about StAX is that it maps directly to the iterator pattern. The caller can stop the parse at any time, skip ahead some number of events, or switch context. Applications that bind XML to data models will generally be much simpler to code if pull parsing is used.
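To make the "stop the parse at any time" point concrete, here is a minimal sketch that pulls events until it finds the element it wants and then simply returns. It assumes the javax.xml.stream API as it was eventually standardized under JSR 173, which may differ from what was shown at the session. The sample document and the class and method names are invented for the example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PullParseDemo {
    // Hypothetical sample data.
    static final String XML =
        "<emp><id>1234</id><name>"
      + "<first_name>Gregory</first_name><last_name>Murphy</last_name>"
      + "</name></emp>";

    // Returns the text of the first element with the given local name,
    // or null if there is none. The rest of the document is never parsed
    // once a match is found, because the caller drives the loop.
    static String firstText(String xml, String localName)
            throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(localName)) {
                    // Valid here because the target elements are text-only.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(firstText(XML, "id"));
        System.out.println(firstText(XML, "last_name"));
    }
}
```

The same logic under SAX would need a handler that records state across callbacks and some way of aborting the parse, usually by throwing an exception. With a pull reader it is an ordinary loop with an early return.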

Finally, I attended a preview of JDBC 4.0, which will include support for all of the SQL 2003 standard. This includes the new XML datatype, and SQL select operators for dynamically generating XML - albeit within the constraints of a tabular result set. For example, the expression

    SELECT generateElement( id, generateElement( 'name',
        generateElement( first_name ), generateElement( last_name )))
    FROM emp;

would return a single column, of type XML, with rows looking something like

    <id>1234</id>
    <name>
        <first_name>Gregory</first_name><last_name>Murphy</last_name>
    </name>

I have misgivings about the addition of this type to SQL. Since there is no obvious way to define a document context, and no way to simulate one within the scope of an RDBMS cursor, handling namespaces gracefully will be difficult. There is also no way to break out of the tabular result model. My prediction is that in five years we will all be using XQuery to extract relational data into a structured context, and the XML select functions will be deprecated relics.

gjmurphy
