XML to typed FI utility

I have just created a simple utility and UNIX shell script that takes an XML schema and an XML document as input and converts the document to a Fast Infoset document.

The XML schema is used to work out XS data types associated with lexical values (text content and attribute values) within the XML document. The lexical values are converted to binary data using a simple mapping between the XS data types and the Fast Infoset encoding algorithms and restricted alphabets.
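To make the idea of a type-driven mapping concrete, here is a rough sketch in Python. The names on the right are the built-in encoding algorithms defined by the Fast Infoset standard; the table itself (which XS types map to which algorithm) and the helper `pick_algorithm` are assumptions for illustration, not the utility's actual mapping.

```python
# Hypothetical mapping from XML Schema simple types to Fast Infoset
# built-in encoding algorithm names. The utility's real table is not
# shown in this post, so every entry here is an assumption.
XS_TO_FI_ALGORITHM = {
    "xs:base64Binary": "base64",
    "xs:hexBinary": "hexadecimal",
    "xs:short": "short",
    "xs:int": "int",
    "xs:long": "long",
    "xs:boolean": "boolean",
    "xs:float": "float",
    "xs:double": "double",
}


def pick_algorithm(xs_type):
    """Return the assumed algorithm name, or None to keep the value as text."""
    return XS_TO_FI_ALGORITHM.get(xs_type)


print(pick_algorithm("xs:float"))   # a type with a binary encoding
print(pick_algorithm("xs:string"))  # no binary encoding, stays lexical
```

A type with no entry in the table would simply be left as a character string in the output.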

To use it (on a UNIX system; warning: I have only tested on Solaris), download the latest Fast Infoset distribution here and unzip it. Set the FI_HOME and FI_UTILITIES_HOME environment variables to the unzipped location of the distribution, and include ${FI_HOME}/bin in your path.

As an example, download the latest Fast Infoset source distribution here and unzip it in the ${FI_HOME} directory. Change directory to samples/data, then type:

  xmltosaxtotypedfi schema/Content.xsd content.xml content.fi

This will convert the XML document:

<content xmlns="http://www.sun.com/xml/content">
    <base64 value="AAAAAAAAAAAAAAAA">AAAAAAAAAAAAAAAA</base64>
    <floats value="3.14159265 2.71828183">3.14159265 2.71828183 1.0</floats>
</content>


to a Fast Infoset document. The base64 lexical values will be converted to bytes (in this case all zeros), and the arrays of float lexical values will be converted to arrays of IEEE floats.
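The conversion of lexical values to octets can be sketched in a few lines of Python. This is only an illustration of the idea, not the utility's code: the function names are made up, and the big-endian single-precision layout is an assumption based on the octets this sample document produces.

```python
import base64
import struct


def base64_to_octets(lexical):
    """Decode a base64 lexical value into raw octets."""
    return base64.b64decode(lexical)


def floats_to_octets(lexical):
    """Pack a whitespace-separated list of floats as big-endian IEEE 754 singles."""
    values = [float(token) for token in lexical.split()]
    return struct.pack(">%df" % len(values), *values)


# The two text values from the sample document.
print(base64_to_octets("AAAAAAAAAAAAAAAA").hex())
print(floats_to_octets("3.14159265 2.71828183 1.0").hex())
```

The first call yields twelve zero octets. The second yields the run 40 49 0f db 40 2d f8 54 3f 80 00 00, which is what the binary form of the floats element's content looks like.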

The octets of the FI document are as follows (using od -A x -tx1a):

0000000  e0  00  00  01  00  38  cd  1d  68  74  74  70  3a  2f  2f  77
           ` nul nul soh nul   8   M  gs   h   t   t   p   :   /   /   w
0000010  77  77  2e  73  75  6e  2e  63  6f  6d  2f  78  6d  6c  2f  63
           w   w   .   s   u   n   .   c   o   m   /   x   m   l   /   c
0000020  6f  6e  74  65  6e  74  f0  3d  81  06  63  6f  6e  74  65  6e
           o   n   t   e   n   t   p   = soh ack   c   o   n   t   e   n
0000030  74  92  02  0a  20  20  20  20  7d  81  05  62  61  73  65  36
           t dc2 stx  lf  sp  sp  sp  sp   } soh enq   b   a   s   e   6
0000040  34  78  04  76  61  6c  75  65  30  18  03  00  00  00  00  00
           4   x eot   v   a   l   u   e   0 can etx nul nul nul nul nul
0000050  00  00  00  00  00  00  00  f0  8c  06  09  00  00  00  00  00
         nul nul nul nul nul nul nul   p  ff ack  ht nul nul nul nul nul
0000060  00  00  00  00  00  00  00  f0  a0  7d  81  05  66  6c  6f  61
         nul nul nul nul nul nul nul   p  sp   } soh enq   f   l   o   a
0000070  74  73  00  30  67  40  49  0f  db  40  2d  f8  54  f0  8c  1a
           t   s nul   0   g   @   I  si   [   @   -   x   T   p  ff sub
0000080  09  40  49  0f  db  40  2d  f8  54  3f  80  00  00  f0  90  0a
          ht   @   I  si   [   @   -   x   T   ? nul nul nul   p dle  lf
0000090  ff
         del
0000091


Notice that you cannot see the lexical values for the base64 data or the arrays of floats.

The Fast Infoset document can be converted back to an XML document by doing the following:

  fitosaxtoxml content.fi

This writes the following to standard output:

<content xmlns="http://www.sun.com/xml/content">
    <base64 value="AAAAAAAAAAAAAAAA">AAAAAAAAAAAAAAAA</base64>
    <floats value="3.1415927 2.7182817">3.1415927 2.7182817 1.0</floats>
</content>


Notice that the documents are not identical! The exact lexical values of the floats are not preserved because the conversion of some lexical values to IEEE float is lossy. Also notice that the FI parser will, by default, convert binary data to lexical values, thus ensuring that typed Fast Infoset documents can always be parsed. Further notice that you do not require the schema to parse the Fast Infoset document.
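The loss of precision is easy to reproduce outside of Fast Infoset. The Python sketch below rounds a lexical value to a single-precision float and then searches for the shortest decimal string that maps back to the same float. It is an illustration only: the function names are hypothetical and this is not how the FI parser formats floats.

```python
import struct


def to_float32_octets(value):
    """Round a Python float (a double) to single precision; return its 4 octets."""
    return struct.pack(">f", value)


def shortest_float32_text(lexical):
    """Return the shortest decimal string that maps to the same float32 as `lexical`."""
    octets = to_float32_octets(float(lexical))
    rounded = struct.unpack(">f", octets)[0]  # the float32 value, widened to a double
    candidate = lexical
    for digits in range(1, 10):  # 9 significant digits always suffice for a float32
        candidate = "%.*g" % (digits, rounded)
        if to_float32_octets(float(candidate)) == octets:
            break
    return candidate


for lexical in ("3.14159265", "2.71828183"):
    print(lexical, "->", shortest_float32_text(lexical))
```

Both inputs carry nine significant digits, but a single-precision float cannot distinguish them from their eight-digit neighbours, so the text that comes back is shorter and slightly different.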
Comments:

Paul, The link pointing to the Fast Infoset distribution seems to be broken. Wilfred

Posted by Wilfred Springer on October 06, 2006 at 01:41 AM CEST #

Wilfred, thanks for pointing this out. I have updated the links to point to the directory where the latest dist and source may be obtained. IMHO the Java.Net documents & files system is very badly designed; it is not very REST-like. The URIs bear no intuitive relation to the names of the directories and files. When you upload new content to the same file name, the URI changes! OK, rant over :-)

Posted by Paul Sandoz on October 06, 2006 at 02:57 AM CEST #
