Fast Infoset and its use with scientific data


I read with interest a paper called A Binary XML for Scientific Applications:

"In this paper we present a binary XML format and implementation for scientific data called Binary XML for Scientific Applications (BXSA). We show that performance is comparable to that of commonly used scientific data formats such as netCDF. These results challenge the prevailing practice of handling control and data separately in scientific applications, with web services for control and specialized binary formats for data."

And it concludes:

"This rough performance parity demonstrates that binary XML is suitable as an alternative to specialized scientific data formats. Storing and sending as XML would then also directly facilitate the use of various data techniques such as semantic mediation, ontologies for data, and provenance tracking."

"Web services will not solve all format problems. They are a promising, widely-adopted approach, however, with an active research and development community. By unifying control and data into one framework with binary XML efforts such as BXSA, the promise of web services can be more fully realized for scientific computing."

I agree very much with the conclusion that binary XML efforts can be very useful in network communication for scientific computing.

However, what I do not agree with is their review of Fast Infoset:

"Fast Infoset. The Fast Infoset [19] is an effort by Sun to develop a fast serialization for XML. Unfortunately, since it is being submitted as an ISO specification, it is currently not publicly available for review and comment. It is based on ASN.1 [15], which is commonly used in the telecommunications industry. ASN.1 is a large, complex standard, and is probably not ideally suited to the needs of scientists."

On the contrary, I think Fast Infoset is very well suited to the needs of scientists as discussed in the paper.

Fast Infoset was standardized jointly by ITU-T and ISO. It is true that the Fast Infoset standardization process at ITU-T and ISO does not have a public review period like the W3C's, but the standard will soon be available for free from ITU-T and ISO. One disadvantage of not having a public review period is that information about Fast Infoset is not always easy to find. However, a search will quickly turn up all you need. For a clear and precise description, see the Wikipedia entry.

Fast Infoset is specified using all the formal semantics of ASN.1, but a Fast Infoset implementation does not require an ASN.1 toolkit. The Fast Infoset implementation at java.net is an entirely standalone Java-based implementation that supports the SAX, StAX and DOM APIs, and this implementation is used in our application server (and soon to be used in Glassfish).

Fast Infoset supports the direct encoding of arrays of floats and doubles as specified by IEEE 754. The Fast Infoset specification is being used by Web3D as the foundation for specifying the binary encoding of X3D constructs. X3D makes extensive use of numeric data. Fast Infoset has an extension mechanism that allows X3D to specify optimal encodings of numeric data.
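To make the point about IEEE 754 arrays concrete, here is a minimal sketch in plain Java. It does not use the Fast Infoset library at all; the class and method names are hypothetical, and the big-endian byte order is an assumption chosen for illustration rather than something quoted from the specification. It only shows the general idea: an array of floats written as raw 4-byte IEEE 754 values has a fixed size and round-trips exactly, whereas the same array written as XML character data has to be formatted and parsed as text.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any Fast Infoset implementation.
final class Ieee754ArrayDemo {

    // Pack each float as 4 bytes of IEEE 754 single precision.
    // Big-endian order is an assumption made for this sketch.
    static byte[] encodeBinary(float[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Float.BYTES)
                                   .order(ByteOrder.BIG_ENDIAN);
        for (float v : values) {
            buf.putFloat(v);
        }
        return buf.array();
    }

    // Reverse of encodeBinary: read 4 bytes at a time back into floats.
    static float[] decodeBinary(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN);
        float[] out = new float[bytes.length / Float.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getFloat();
        }
        return out;
    }

    // What a textual XML document would carry instead:
    // a space-separated list of decimal strings, encoded as UTF-8.
    static byte[] encodeText(float[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(Float.toString(values[i]));
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        float[] samples = {1.0f, -2.5f, 3.14159f, 6.02214e23f};
        byte[] binary = encodeBinary(samples);
        byte[] text = encodeText(samples);
        System.out.println("binary bytes: " + binary.length);
        System.out.println("text bytes:   " + text.length);
        System.out.println("round trip ok: "
                + java.util.Arrays.equals(samples, decodeBinary(binary)));
    }
}
```

The binary form is always four bytes per float, so a reader can size its buffers up front and skip number parsing entirely, which is the property that matters for large scientific arrays.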

Fast Infoset supports many of the properties that the BXSA format supports. It would be interesting to compare the performance of Fast Infoset with BXSA and the scientific data formats netCDF and HDF5.

An example of using Fast Infoset with arrays of Java primitive types can be found here.

I would be very interested in understanding how to make Fast Infoset and Web services more usable for the scientific community. If you are a member of such a community and have investigated, or want to investigate, Fast Infoset or binary XML in general, then feel free to drop me an email.
