In a previous blog
I discussed some Fast Infoset and bnux performance results and how the
measurement of Fast Infoset was very different from the measurement
of bnux ('oranges-to-apples').
For the first iteration, to get close to an 'apples-to-apples'
comparison, I have developed an optimal FI XOM parser that integrates
into XOM in such a manner that it assumes the hallowed status of a
trusted parser that performs correct well-formed checking (see the
previous blog for the configuration). This was done for reasons of
expediency, as it was easy to modify the existing FI DOM parser (which
already performs well-formed checking). Such a solution is not
practical given that it avails of the package-private methods for XOM
node construction. However, the results will give a good indication
of Fast Infoset performance and whether it is worth pursuing a more
practical implementation along the lines of how bnux is integrated.
The differences between the FI and bnux measurements have been
reduced from different layers of integration and two sets of
well-formed checks to the same layer of integration and one set of
well-formed checks. The only difference that remains is where the
well-formed checks are being performed. The FI XOM parser performs
the well-formed checks and the bnux XOM parser defers such checks to
XOM (which is probably one of the, if not the, most rigorous
well-formed checkers there is).
This is not an exact 'apples-to-apples' comparison, but I would
classify it as different types of apple, say 'red delicious' to
'golden delicious'. Perhaps I should have originally compared two
fruits that have an old common ancestor (whatever that might be; I
briefly looked at the tree of life
but could not find the apple or orange tree present in the database!)
and then chosen a fruit with a closer common ancestor that is not an apple.
A 'red delicious' is great for snacking but not good for baking,
kind of like the FI XOM implementation since it is not designed to be
baked into XOM for practical use. A 'golden delicious' is great for
snacking, cooking and baking, kind of like the bnux XOM
implementation. I digress somewhat but I hope the point is clear: the
measurement process is a lot closer to 'apples-to-apples' than
before. (I may have swung slightly in favour of FI over bnux because
the FI well-formed checking is likely to be more efficient than the
XOM well-formed checking. I will return to this point later on).
The data chosen for measurement was the set of 56 XML documents
supplied with the nux distribution. I made sure the FI XOM parser
passed tests for round-trip consistency on all the XML documents to
ensure that all information was reported correctly and the parser was
not cheating and gaining an unfair performance advantage.
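By way of illustration, a round-trip consistency check amounts to something like the sketch below: parse the XML with XOM, encode it as a fast infoset document, parse that back, and verify the two trees serialize to the same XML. The 'FastInfosetSerializer' and 'FastInfosetBuilder' names are hypothetical stand-ins for the prototype FI XOM serializer and parser; only the XOM calls are real API.

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.File;

    import nu.xom.Builder;
    import nu.xom.Document;

    public class RoundTripCheck {
        public static void main(String[] args) throws Exception {
            File xmlFile = new File(args[0]);
            Document original = new Builder().build(xmlFile);

            // Encode the document as a fast infoset byte stream
            // (FastInfosetSerializer is a hypothetical helper, not a real class name).
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            FastInfosetSerializer.write(original, out);

            // Parse it back with the prototype FI XOM parser
            // (FastInfosetBuilder is also a hypothetical name).
            Document roundTripped = new FastInfosetBuilder().build(
                    new ByteArrayInputStream(out.toByteArray()));

            // The round-tripped tree should serialize to exactly the same XML.
            if (!original.toXML().equals(roundTripped.toXML())) {
                throw new AssertionError("round trip failed for " + xmlFile);
            }
        }
    }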
To measure performance I created Japex drivers to parse the fast infoset and bnux documents using the respective XOM parsers, and to report the sizes of the encoded documents.
(I reused the FI size Japex driver from the Japex XML driver
library to report the size of FI documents.)
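For the parse measurements a Japex driver is just a small Java class that the Japex runtime calls repeatedly while timing it. The sketch below shows the general shape of such a driver, assuming the standard 'japex.inputFile' test case parameter; it parses with the plain XOM Builder for illustration, whereas the real drivers wrap the FI and bnux XOM parsers.

    import java.io.ByteArrayInputStream;
    import java.io.File;
    import java.io.FileInputStream;

    import com.sun.japex.JapexDriverBase;
    import com.sun.japex.TestCase;

    import nu.xom.Builder;

    public class XomParseDriver extends JapexDriverBase {

        private byte[] data;      // the test document, read once into memory
        private Builder builder;  // swap in the FI or bnux XOM parser here

        public void prepare(TestCase testCase) {
            try {
                File file = new File(testCase.getParam("japex.inputFile"));
                data = new byte[(int) file.length()];
                FileInputStream in = new FileInputStream(file);
                in.read(data); // simplification: assumes the file is read in one call
                in.close();
                builder = new Builder();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

        public void run(TestCase testCase) {
            try {
                // Japex times repeated invocations of run(); each one parses the
                // in-memory document once.
                builder.build(new ByteArrayInputStream(data));
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }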
Then, I created Japex configuration files to describe the
measurement process to be performed by the Japex runtime (one for
size and one for parsing). Running both configuration files produced
some nice pretty charts :-)
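For those unfamiliar with Japex, a configuration file is a small XML document that names the drivers to compare and the test documents to feed them. The sketch below shows the general shape only; the driver class names, file names and parameter values are illustrative rather than the exact ones used for these measurements.

    <testSuite name="FI vs bnux parse time"
               xmlns="http://www.sun.com/japex/testSuite">
        <param name="japex.warmupTime" value="10"/>
        <param name="japex.runTime" value="30"/>

        <!-- Illustrative driver class names; the real drivers wrap the FI and bnux XOM parsers -->
        <driver name="FastInfosetXOMParseDriver">
            <param name="japex.driverClass" value="example.FastInfosetXomParseDriver"/>
        </driver>
        <driver name="BnuxXOMParseDriver">
            <param name="japex.driverClass" value="example.BnuxXomParseDriver"/>
        </driver>

        <!-- One test case per XML document from the nux distribution -->
        <testCase name="doc1">
            <param name="japex.inputFile" value="data/doc1.xml"/>
        </testCase>
        <testCase name="doc2">
            <param name="japex.inputFile" value="data/doc2.xml"/>
        </testCase>
    </testSuite>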
Rather than present the individual results in all their gory
detail it is easier in this forum to show the average of all the
results and provide a link to the zip file containing detailed
results. Ideally I would have liked to show a log-log scale scatter
plot of all the results (size/throughput vs parser time) but Japex
does not have this functionality (not yet, but I want to add this
capability!). So average results it is, with a 'government health
warning' that these averages are taken over a wide range of sizes and
parse times, so please download the detailed results if you want to dig into the individual numbers.
The size results are presented in the following chart (the detailed
report may be found here).
The average results are presented as a percentage of the XML document size, so
the smaller the better. The amount of compression achieved by FI can
depend on the degree to which repeating text content and attribute
values are replaced with small integer values (commonly referred to
as indexing). The FI serializers can control this degree using a
simple heuristic based on the length of the character strings for
text content and attribute values: if the length is less than 'n'
then the character string is indexed, otherwise it is not. The red
bar labeled 'FastInfoset7SizeDriver' presents results for the fast
infoset documents with the indexing of character strings less than 7
in length (the default value for the implementation of the Fast
Infoset serializers); the green bar labeled 'FastInfoset64SizeDriver'
presents results for the indexing of character strings less than 64
in length. It is my understanding that bnux indexes all character
strings up front at the head of the document.
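To make the heuristic concrete, here is a minimal sketch of the idea (my own illustration, not the actual Fast Infoset code): character strings shorter than the configured limit are added to a vocabulary table and, on repeat occurrences, written as small integer indexes; longer strings are always written literally.

    import java.util.HashMap;
    import java.util.Map;

    // Minimal illustration of the length-based indexing heuristic described above;
    // not the Fast Infoset implementation, just the idea in miniature.
    public class IndexingHeuristicSketch {

        private final int lengthLimit; // the 'n' in the text, e.g. 7 or 64
        private final Map<String, Integer> vocabulary = new HashMap<String, Integer>();

        public IndexingHeuristicSketch(int lengthLimit) {
            this.lengthLimit = lengthLimit;
        }

        // Describes how a text content or attribute value string would be encoded.
        public String encode(String value) {
            if (value.length() >= lengthLimit) {
                return "literal(\"" + value + "\")"; // too long to be worth indexing
            }
            Integer index = vocabulary.get(value);
            if (index != null) {
                return "index(" + index + ")"; // repeat occurrence: small integer only
            }
            vocabulary.put(value, vocabulary.size());
            return "literal+index(\"" + value + "\")"; // first occurrence: literal, added to table
        }

        public static void main(String[] args) {
            IndexingHeuristicSketch sketch = new IndexingHeuristicSketch(7);
            System.out.println(sketch.encode("red"));   // literal+index("red")
            System.out.println(sketch.encode("red"));   // index(0)
            System.out.println(sketch.encode("a much longer string")); // literal(...)
        }
    }

A larger limit simply makes more strings eligible for indexing, and hence for replacement by small integers on repeat occurrences.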
When an appropriate degree of indexing is chosen, FI is in most cases
slightly more efficient than bnux. In three cases bnux is more
efficient than FI. My initial guess as to why this is the case is
that FI, for the most part, is slightly more efficient at 'packing'
information, but in some cases the length-prefix encoding of character
strings for FI is more costly than the zero-terminated encoding of
character strings for bnux.
The parse results are presented in the following chart (the detailed
report may be found here).
The average results are presented as a percentage of the bnux
parse time, so the smaller the better. These results show that FI is
slightly faster than bnux. In three cases bnux is slightly faster.
Although the average results do not show a difference between the
degrees of indexing (64 and 4096), there is a difference in individual
results when correlated to differences in size. These results are encouraging!
As I said earlier, FI is likely to have an unfair advantage over
bnux because of where the well-formed checking is performed. To
ascertain how unfair this might be I hacked together an FI SAX parser
that performed no well-formed checking, then measured this against the
well-formed checking SAX parser and the bnux parser using a XOM null
node factory (which performs no creation of nodes and therefore no
well-formed checking). The detailed report may be found here.
From these and the previous results it is possible to obtain the
approximate cost of an FI XOM parser with XOM well-formed checking by
doing the following (a small worked illustration follows the list):

1. subtract the null node factory parse time from the bnux parse time;
2. add the value obtained in 1 to the parse time of the hacked FI SAX parser that performs no well-formed checking;
3. compare the value obtained in 2 against the bnux parse time.
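To make the arithmetic explicit, the estimate can be expressed as a tiny calculation. The parse times below are purely hypothetical placeholder values, not numbers from the actual measurements.

    // Illustration of the three-step estimate above; all times are hypothetical
    // example values in milliseconds, not measured results.
    public class EstimateFiXomParseTime {
        public static void main(String[] args) {
            double bnuxXomParseTime = 10.0;        // bnux XOM parser (well-formed checks done by XOM)
            double bnuxNullFactoryParseTime = 7.0; // bnux parser with the null node factory
            double fiSaxNoChecksParseTime = 7.5;   // hacked FI SAX parser, no well-formed checking

            // Step 1: approximate cost of XOM node construction plus XOM well-formed checking.
            double xomOverhead = bnuxXomParseTime - bnuxNullFactoryParseTime;

            // Step 2: add that overhead to the unchecked FI SAX parse time to estimate an
            // FI XOM parser that defers well-formed checking to XOM.
            double estimatedFiXomParseTime = fiSaxNoChecksParseTime + xomOverhead;

            // Step 3: compare against the bnux XOM parse time.
            System.out.printf("estimated FI XOM parse time is %.0f%% of the bnux parse time%n",
                    100 * estimatedFiXomParseTime / bnuxXomParseTime);
        }
    }

(With these made-up inputs the estimate works out at 105% of the bnux parse time, i.e. 5% slower, which happens to line up with the average quoted next.)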
The data indicates that such an FI XOM parser would be anywhere
between 5% and 20% slower (depending on the document), with an average
of 5%. This is also encouraging! Intuitively this is to be expected,
as bnux is a simpler format than Fast Infoset, so I would expect it to
be more efficient in this respect. (But note that the differences
between the Fast Infoset encoding and the bnux encoding are not that
great so, intuitively, I would not expect the differences to be that great.)
Overall the results using the FI XOM parser (with well-formed
checking performed by the parser) and the possible results using a
theoretical FI XOM parser (with well-formed checking performed by XOM)
indicate that it is worth implementing an FI XOM parser in the same
manner as bnux for practical use with XOM and also nux. Development
of such a parser, in addition to an efficient serializer, is likely
to take longer than the development of the prototype parser. I may
need to rethink the design of vocabulary tables and external
vocabularies such that implementations can optimize the vocabulary table
representation and convert to and from a canonical vocabulary table representation.