Friday Jun 02, 2006

Fast Infoset 1.1: Improvements to API when using External Vocabularies

I bumped up the latest stable version of FastInfoset to 1.1 to reflect changes I made to the org.jvnet.fastinfoset API (in addition to general tidying up and removing some crud) that improve support for external vocabularies.

In a previous blog I presented some code that showed how to create and utilize external vocabularies. That code is a bit messy and also depends on stuff that can potentially change in the future. So, in an attempt not to pull the rug from under developers' feet, I have created some classes in the org.jvnet.fastinfoset package. You can still use the previous mechanism but I cannot guarantee that it will not change in the future.

As in the previous blog the following code will prepare information from schema and sample documents to generate a vocabulary:
  String args[] = ... // e.g. args from main

  // Process the schema
  SchemaProcessor sp = new SchemaProcessor(
      new File(args[0]).toURL(), true);

  // Process the sample documents
  SAXParserFactory spf = SAXParserFactory.newInstance();
  SAXParser p = spf.newSAXParser();
  FrequencyHandler fh = new FrequencyHandler(sp);
  for (int i = 1; i < args.length; i++) {
      p.parse(new File(args[i]), fh);
  }

Once all information has been generated and sorted a Vocabulary instance can be obtained from the FrequencyHandler:

Vocabulary v = fh.getVocabulary();
String externalVocabularyURI = args[0];

(Much simpler than before, and there is no need to specify the API for which the vocabulary is intended.)

Next the vocabulary can be set on the SAX serializer as an external vocabulary:

  ExternalVocabulary ev = new ExternalVocabulary(
  FastInfosetWriter writer =
      new SAXDocumentSerializer();

(Note that the FastInfosetWriter interface is being used. I would encourage developers, when they instantiate concrete Fast Infoset parsers/serializers, to use the appropriate API interfaces where possible so as to minimize changes when upgrading.)

Then the external vocabulary can be set on the SAX parser:

  FastInfosetReader reader = new SAXDocumentParser();
  Map externalVocabularyMap = new HashMap();

Hopefully you will agree that this is an improvement!

Wednesday May 31, 2006

Another Fast Infoset implementation pops up

Noemax's product SDO, a communication/security framework for .Net applications, supports Fast Infoset. You can get the LGPL open source version for free here.

The source shows that XmlReader and XmlWriter implementations have been developed to support the serialization and parsing of FI documents using the .Net XML APIs.

Monday May 15, 2006

FIFI at JavaOne

Some of you with a certain "je ne sais quoi" of French please hold off those thoughts! Tous seront indiqués...

FIFI stands for Fast Infoset for Indigo. Gerry, Santiago and I will host a BOF on FIFI:

BOF-2535 Project FIFI: Bridging the Interoperability Chasm Birds-of-a-Feather  Wednesday
09:30 PM - 10:20 PM
Moscone Center
Hall E 135

If you are interested in finding out more come along!

Monday May 08, 2006

FastInfoset is now in the maven repository

As part of the push to move Glassfish jars to the Maven repository I have just pushed FastInfoset.

Friday May 05, 2006

Data binding

When combining XML binding with Fast Infoset it is possible to better utilize more advanced features of Fast Infoset.

For example, Fast Infoset supports encoding information such as integers and real numbers in binary form using encoding algorithms; such information would otherwise be represented in lexical form and encoded in, say, UTF-8. So a Java float[] can be encoded in IEEE (big endian) binary representation using the "float" encoding algorithm. Availing of such functionality does not require a schema, as the encoding algorithms are not tightly coupled to the XSD datatypes (an informal mapping between the two can be specified). So an attribute value described as an xsd:list of xsd:float and represented as float[] in Java can utilize the "float" encoding algorithm. The important thing to note is that a processor of fast infoset documents containing such information does not require the schema to decode the documents.
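To make the size difference concrete, here is a minimal sketch in plain Java. It does not use the Fast Infoset API; the class and method names are made up for illustration. It writes the same float[] once in lexical form as UTF-8 and once as big-endian IEEE 754 octets, and reads the binary form back.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the Fast Infoset API.
final class FloatEncodingSketch {

    // Lexical form: space-separated decimal strings, as in a list of floats in XML.
    static byte[] lexical(float[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(Float.toString(values[i]));
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Binary form: 4 octets per float, IEEE 754 single precision, big endian.
    static byte[] binary(float[] values) {
        ByteBuffer buf = ByteBuffer.allocate(4 * values.length).order(ByteOrder.BIG_ENDIAN);
        for (float v : values) buf.putFloat(v);
        return buf.array();
    }

    // Reads the binary form back without any parsing of characters.
    static float[] fromBinary(byte[] octets) {
        ByteBuffer buf = ByteBuffer.wrap(octets).order(ByteOrder.BIG_ENDIAN);
        float[] values = new float[octets.length / 4];
        for (int i = 0; i < values.length; i++) values[i] = buf.getFloat();
        return values;
    }
}
```

The binary form is always 4 octets per value, while the lexical form varies with the number of digits and needs parsing on the way back in.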

It is also possible to combine XML binding with Fast Infoset, especially in the case of serialization, for increased performance. Since the binding tool has knowledge of the schema and element/attribute declarations it is possible to perform the mapping of an element/attribute qualified name or a local name to an index in O(1) time and thus for such information no hash map lookup is required (a simple little trick is used to assign indexes in the order of occurrence). Such a technique will also reduce dynamic memory allocation.
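One way to picture this is the sketch below. It is only my guess at the shape of such a trick, not the code used by any binding tool: a hypothetical binding compiler gives every declared name a fixed slot, and the serializer keeps an int array recording which index each slot was assigned the first time it occurred. All names here are invented for the example.

```java
// Hypothetical sketch; slot constants stand in for what a binding compiler might generate.
final class StaticNameIndexSketch {
    static final int ORDER = 0, ITEM = 1, PRICE = 2;
    static final String[] NAMES = {"order", "item", "price"};

    // Index assigned to each slot; 0 means the name has not occurred yet.
    private final int[] indexBySlot = new int[NAMES.length];
    private int nextIndex = 1;

    // Returns the literal name on first occurrence and "#<index>" afterwards.
    // The lookup is a plain array access, so no string hashing takes place.
    String write(int slot) {
        int index = indexBySlot[slot];
        if (index == 0) {
            indexBySlot[slot] = nextIndex++;
            return NAMES[slot];
        }
        return "#" + index;
    }
}
```

Indexes are handed out in the order names first occur, and no per-name objects are allocated while serializing.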

Fast Infoset and Liquid XML

Liquid Technologies have released a beta implementation of Fast Infoset for the latest release of their Liquid XML data binding product. Currently there are implementations in C++ and C#. Java and VB6 implementations will follow. It is great to see so many different implementations available and planned!

From browsing information on Liquid XML it looks like a very comprehensive XML binding technology for multiple platforms.

Now that there are multiple implementations of Fast Infoset it would be great to show that they are interoperable.

Monday Apr 17, 2006

How to use an external vocabulary

In a previous blog I explained how external vocabularies can be generated from a schema and a set of sample documents. In this blog I will explain how you can use the Fast Infoset implementation with an external vocabulary.

All the referenced classes are found in either FastInfoset.jar or FastInfosetUtilities.jar. Currently the FastInfosetUtilities sub-project is not packaged up into the distribution so you will have to get this from CVS and build it yourself for now. I will sort out the packaging very soon for those that want access via the distribution.

First a schema needs to be processed to obtain all relevant information in the schema ordered lexicographically (namely the element and attribute declarations):
  String args[] = ... // e.g. args from main
  SchemaProcessor sp = new SchemaProcessor(
      new File(args[0]).toURL(), true);

Next the information from the schema processor is used to initialize a frequency handler. The frequency handler processes a list of sample documents and orders the information in the sample documents and schema according to the frequency of occurrence of such information (it is not necessary that the sample documents are valid according to the schema but obviously the sample documents should have a close association with the schema):

  SAXParserFactory spf = SAXParserFactory.newInstance();
  SAXParser p = spf.newSAXParser();
  FrequencyHandler fh = new FrequencyHandler(sp);
  for (int i = 1; i < args.length; i++) {
      p.parse(new File(args[i]), fh);
  }

Once all information has been generated and sorted the serializer and parser vocabularies can be generated:

  VocabularyGenerator vg = new VocabularyGenerator(
  SerializerVocabulary externalSerializerVocabulary =
  ParserVocabulary externalParserVocabulary =
  String externalVocabularyURI = args[0];

It is important to specify the API for which the SerializerVocabulary and ParserVocabulary will be generated by the VocabularyGenerator, as there are slightly different representations and optimizations for each API. This is something I want to improve by having a canonical representation of a vocabulary, whereby it is up to the parser/serializer implementation to convert the canonical representation to the appropriate internal and optimal representation.

The SerializerVocabulary instance, externalSerializerVocabulary, can then be set on an instance of a SAXDocumentSerializer:

  SerializerVocabulary initialVocabulary = 
      new SerializerVocabulary();
      externalSerializerVocabulary, false);       
  SAXDocumentSerializer saxSerializer =
new SAXDocumentSerializer();

An initial vocabulary needs to be created that contains the external vocabulary and the URI that is to be used for the external vocabulary. The ParserVocabulary instance, externalParserVocabulary, can then be set on an instance of a SAXDocumentParser:

  SAXDocumentParser saxParser = new SAXDocumentParser();
  Map externalVocabularyMap = new HashMap();

A Map is used to associate an external vocabulary with the URI since the same parser instance could parse two documents that have different external vocabulary URIs.
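The reason for the Map can be shown with a small stand-alone sketch. The class below is a hypothetical stand-in and does not use the real parser or vocabulary classes: each document names its vocabulary by URI, and the parser picks the matching table before resolving any index.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a parser holding several external vocabularies.
final class VocabularyLookupSketch {
    private final Map<String, String[]> vocabulariesByUri = new HashMap<>();

    void register(String uri, String[] table) {
        vocabulariesByUri.put(uri, table);
    }

    // Resolve an index using the vocabulary named by the document's URI.
    String resolve(String documentVocabularyUri, int index) {
        String[] table = vocabulariesByUri.get(documentVocabularyUri);
        if (table == null) {
            throw new IllegalStateException(
                "Unknown external vocabulary: " + documentVocabularyUri);
        }
        return table[index - 1]; // indexes start at 1 in this sketch
    }
}
```

With two vocabularies registered, the same index resolves to different names depending on which URI the document references.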

The saxSerializer and saxParser instances are now correctly initialized so that they can interoperate.

As you can see it is important that they both agree on the external vocabulary. There is currently no general standard to specify an external vocabulary generated from schema and sample documents so it is necessary to define the nature of the external dependence on a per case basis, preferably in a known tightly coupled scenario or by standards that explicitly declare the external vocabulary by other formal means.

It is also possible to create your own serializer and parser vocabularies but I would recommend using the SchemaProcessor, FrequencyHandler and VocabularyGenerator if possible as this will minimize dependencies due to changes to the vocabulary API.

Wednesday Apr 12, 2006

Realistic generation and use of external vocabularies

In Fast Infoset terminology an external vocabulary is something a fast infoset document can reference (via a URI). The (referenced) external vocabulary contains tables mapping information to indexes. Thus fast infoset documents referencing the external vocabulary do not need to encode the information that is already present in the external vocabulary and can encode the indexes instead. To benefit from an external vocabulary encoders and decoders must share the external vocabulary out-of-band (so it can be considered external knowledge, a bit like a schema is external knowledge).

Such use can reduce the size of fast infoset documents and consequently increase the encoding and decoding performance because less data is produced and consumed.
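As a toy illustration of the idea (not the Fast Infoset encoding itself), the sketch below has an encoder and a decoder that are both constructed from the same shared table. Names present in the table travel as "#index" tokens and everything else travels literally. The class name and the token syntax are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a shared (external) vocabulary; not the real encoding.
final class ExternalVocabularySketch {
    private final List<String> table;
    private final Map<String, Integer> indexes = new HashMap<>();

    ExternalVocabularySketch(List<String> sharedTable) {
        this.table = new ArrayList<>(sharedTable);
        for (int i = 0; i < table.size(); i++) indexes.put(table.get(i), i + 1);
    }

    // Encode each name as "#index" if the shared table has it, else literally.
    // XML names cannot start with '#', so the two cases cannot be confused.
    List<String> encode(List<String> names) {
        List<String> out = new ArrayList<>();
        for (String n : names) {
            Integer idx = indexes.get(n);
            out.add(idx != null ? "#" + idx : n);
        }
        return out;
    }

    // Decode by looking indexes up in the same shared table.
    List<String> decode(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.startsWith("#") ? table.get(Integer.parseInt(t.substring(1)) - 1) : t);
        }
        return out;
    }
}
```

Decoding only works because both sides hold the same table, which is the out-of-band agreement described above.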

External vocabularies are suited for the case where the unique markup of an XML document represents a high proportion of the overall XML document size, which can be the case for UBL and FpML infosets.

Size results generated when using an external vocabulary for UBL and FpML infosets showed promise. However, the generation of the external vocabulary was from the document itself, which is entirely unrealistic in practice.

To be realistic, external vocabularies should be generated from a schema and a set of representative sample documents. The schema primes the external vocabulary with information and the set of samples optimizes the vocabulary so that the most frequently occurring information is assigned the smallest indexes. I have created just such functionality as part of a newly created sub-project of the Fast Infoset project, FastInfosetUtilities. The SchemaProcessor class generates information from schema, the FrequencyHandler class orders that information according to a set of sample documents, and the VocabularyGenerator generates external vocabularies, from the ordered information, to be used for encoding and decoding.
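The priming and ordering steps can be sketched in a few lines of plain Java. This is a simplified model under my own assumptions, not the actual SchemaProcessor or FrequencyHandler code: schema names start with a count of zero, each occurrence in a sample adds one, and names are then sorted by descending count.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of schema priming plus frequency ordering.
final class FrequencyOrderingSketch {

    static List<String> order(List<String> schemaNames, List<String> sampleOccurrences) {
        // Prime with schema names so that names absent from the samples still get an index.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String n : schemaNames) counts.put(n, 0);
        for (String n : sampleOccurrences) counts.merge(n, 1, Integer::sum);

        List<String> ordered = new ArrayList<>(counts.keySet());
        // Stable sort by descending count: ties keep their schema order.
        ordered.sort(Comparator.comparing((String n) -> counts.get(n)).reversed());
        return ordered; // the name at position i (0-based) gets index i + 1
    }
}
```

For a schema declaring a, b, c and d, and samples in which c occurs three times, b twice and d once, the resulting order is c, b, d, a.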

So now we are in a position to compare size results using realistic and unrealistic external vocabularies.

The FpML data has quite a few samples, 90 in all. So using these samples with the complete FpML schema should present a realistic use-case. I created a few Japex config files and some drivers to operate on the Japex config parameters to compare various configurations for measuring the size of fast infoset documents, then ran Japex and it produced the following chart for the means of % of bytes relative to the size of the XML document:

The red "XML" bar represents the size of XML documents. I configured Japex to measure relative to the "XML" bar which is why it is always at 100%.
The blue "FastInfoset" bar represents the size of fast infoset documents when using default settings for the Fast Infoset encoder.
The green "FastInfoset_UseSchema" bar represents the size of fast infoset documents when using an external vocabulary generated from the FpML schema but without using sample documents.
The yellow "FastInfoset_UseSchema_UseSamples" bar represents the size of fast infoset documents when using an external vocabulary generated from the FpML schema and using sample documents.
The orange "FastInfoset_UseTestCaseDocument" bar represents the size of fast infoset documents when using an external vocabulary generated from the (test case) document itself.

When using no external vocabulary the fast infoset documents are about 50% of the XML documents (when comparing the arithmetic means). When using a schema to generate an external vocabulary fast infoset documents are about 33% of the XML documents. When using a schema with samples to generate an external vocabulary fast infoset documents are about 29% of the XML documents, which is slightly smaller than the size of the fast infoset documents using an external vocabulary generated from the document itself.

Initially i was surprised at the last observation but after a little reflection it makes sense since when using a set of samples the most frequent information is given the smaller indexes, which is not the case when using the document itself (the most frequently used information could occur towards the end of the document).
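A toy calculation shows the effect. The cost model here is an assumption chosen purely for illustration and is not the Fast Infoset encoding rules: indexes up to a small limit cost one octet and larger indexes cost two.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy cost model; the one-octet/two-octet split is hypothetical.
final class IndexCostSketch {

    // Total octets needed to write every occurrence as an index into the table.
    // Assumes every occurring name is present in the table.
    static int cost(List<String> occurrences, List<String> table, int smallLimit) {
        Map<String, Integer> indexOf = new HashMap<>();
        for (int i = 0; i < table.size(); i++) indexOf.put(table.get(i), i + 1);
        int octets = 0;
        for (String name : occurrences) {
            octets += indexOf.get(name) <= smallLimit ? 1 : 2;
        }
        return octets;
    }
}
```

Take a document with the names a, b, c, c, c, c and a limit of 2. With indexes in document order (a, b, c) the cost is 1 + 1 + 4 × 2 = 10 octets. With indexes in frequency order (c, a, b) it is 4 × 1 + 1 + 2 = 7 octets.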

What is nice about this result is that although previous size results have been presented using an unrealistic technique the results using the realistic technique are actually slightly better.

Tuesday Mar 14, 2006

How to enable Fast Infoset

I have had a number of questions about how Fast Infoset can be enabled in JWSDP 1.6 and JWSDP 2.0. The answer is deceptively simple for developers even if the details (the devil is always in them) can quickly get somewhat complex, something i think i could have done better to highlight in the release notes.

The release notes for JWSDP 1.6 and JWSDP 2.0 contain links to Fast Infoset and how it can be used (see here and here respectively).

Enabling Fast Infoset requires no changes to WSDL and there are no changes to the JAX-RPC and JAX-WS APIs. It will work with any doc/lit, rpc/lit or even with rpc/encoded services. It will work with any statically or dynamically generated client side artifacts.

To enable Fast Infoset (in JAX-WS) one simply sets a property on the client side object obtained from the service as follows:
stubOrDispatch = ...;    // Obtain reference to Stub or Dispatch
((BindingProvider) stubOrDispatch).getRequestContext().put(
"", "pessimistic");

That is it! (For JAX-RPC it is almost identical; see the release notes for the differences.)

How can that work?

Setting the CONTENT_NEGOTIATION_PROPERTY property to "pessimistic" will inform a JAX-WS client that it should add an HTTP Accept request-header field that includes a Fast Infoset MIME type when sending the HTTP request with the XML encoded SOAP message.

A service that is Fast Infoset enabled can look at the HTTP Accept request-header and if it includes a Fast Infoset MIME type the service can respond with a Fast Infoset encoded SOAP message.

This is referred to as HTTP agent-driven negotiation.

When the client receives a Fast Infoset encoded SOAP message it knows the service can support Fast Infoset. So for any subsequent requests (for the lifetime of the stub) the client can encode using Fast Infoset. This is why the content negotiation is referred to as "pessimistic", since the client pessimistically assumes on the first request that the service does not support Fast Infoset. The advantage of being pessimistic is that interoperability between XML-based Web services is supported while transparently accelerating those services that are Fast Infoset enabled.
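The client-side behaviour amounts to a tiny state machine, sketched below in plain Java. This is a model of the protocol as described here, not the JAX-WS implementation. The class is hypothetical, and the MIME types shown (text/xml and application/fastinfoset) are the ones I would expect for SOAP 1.1; treat them as assumptions.

```java
// Hypothetical model of a client doing pessimistic content negotiation.
final class PessimisticNegotiationSketch {
    static final String XML = "text/xml";
    static final String FI = "application/fastinfoset";

    // Pessimistic start: assume the service only understands XML.
    private boolean serviceSpeaksFastInfoset = false;

    // Content-Type to use for the next request.
    String requestContentType() {
        return serviceSpeaksFastInfoset ? FI : XML;
    }

    // The Accept header always advertises Fast Infoset as well as XML.
    String acceptHeader() {
        return FI + ", " + XML;
    }

    // Called with the Content-Type of each response received.
    void onResponse(String responseContentType) {
        if (FI.equals(responseContentType)) serviceSpeaksFastInfoset = true;
    }
}
```

A service that never answers in Fast Infoset keeps receiving XML, so nothing breaks; a service that does answer in Fast Infoset gets Fast Infoset requests from the second exchange onward.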

By default all JWSDP 1.6 and JWSDP 2.0 services are Fast Infoset-enabled and so they are prepared to correctly process HTTP Accept request-headers and to also accept requests that are encoded using Fast Infoset.

Monday Jan 23, 2006

Difference between Fast Web Services and Fast Infoset

A question i often get asked is:

What is the difference between Fast Web Services and Fast Infoset?

It is easy to get confused between the two because of the way I first used the "Fast Web Service" term (in 2003) and then the adoption of the term for the standard in the ITU-T and ISO organisations. Here I will try and clearly explain the connections between them. When I refer to "Fast Web Services" and "Fast Infoset" it means I am referring to the standards.

Fast Web Services and Fast Infoset are complementary.

Fast Infoset specifies how to encode an XML infoset in binary.

Fast Web Services specifies two ways of encoding SOAP messages in binary: using Fast Infoset, or using a more optimized form.

The first part specifies how to encode SOAP message infosets using Fast Infoset. This is the part of Fast Web Services that has been implemented in JWSDP 1.6 and JWSDP 2.0.

The second more optimized, and complex, part of Fast Web Services specifies the use of WSDL, X.694, an ASN.1 schema for the SOAP message infoset, and the ASN.1 packed encoding rules. This can produce highly compact binary SOAP messages that are very efficient to serialize and parse.

The first, using Fast Infoset, is easier to integrate into an existing Web service stack that uses the JAX-RPC, JAX-WS and JAXB 1.x/2.0 APIs, whereas the second is more complex. In addition the first does not depend on WSDL so can be considered more "loosely-coupled". So, for now, we have focused our open source development and product efforts on Fast Web Services using Fast Infoset.

Wednesday Jan 18, 2006

Solving a problem

One of the most pleasurable aspects of engineering is getting positive feedback on how one's work is put to use.

Fast Infoset solved a performance issue related to the embedding of binary data for Tony (see full email on the Fast Infoset users list), who writes:
The performance was pretty horrible though and I was *just* about to
write a binary encoded XML mechanism when I came across your Fast
Infoset stuff - which did *exactly* what I needed and saved a whole
bunch of development and testing. It worked first time too <smile>!

Thanks Tony! you made me <smile> too :-)

The SAX and StAX implementations of Fast Infoset can be used to embed binary data directly into a fast infoset document that would otherwise have to be base64 encoded and embedded as characters in an XML document, or the binary data would have to be included as an attachment in an MTOM encoded XML document (that includes XOP include infoset, which references the attachment).
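To give a feel for what base64 costs in size alone, here is a minimal sketch using the JDK's java.util.Base64 (the class name Base64OverheadSketch is made up). It has nothing Fast Infoset-specific in it; it just measures how many octets of text a block of binary data becomes.

```java
import java.util.Base64;

// Measures the size of binary data once base64 encoded for embedding as characters.
final class Base64OverheadSketch {

    // Number of octets of base64 text produced for the given binary data.
    static int base64Size(byte[] data) {
        return Base64.getEncoder().encode(data).length;
    }
}
```

Every 3 octets of input become 4 characters of output, so 3000 octets of binary data turn into 4000 characters, before counting the time spent encoding and decoding.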

For embedding of binary data using an extension to the StAX API see this email thread.

The Xj3D project for rendering and manipulating X3D and VRML documents utilizes Fast Infoset for parsing and serializing X3DB documents. The Fast Infoset implementation provides extensions to the SAX API (using a primitive and algorithm handler), which are used to get access to binary data, integers, floats etc. more efficiently than as characters.


