HTML in XML Support
By Tim Dexter-Oracle on Sep 12, 2011
I luckily still get to see some internal emails from the development team. I saw one today mentioning an upcoming 18.104.22.168 rollup patch set. This piqued my interest as I thought, was there one for August? Indeed there was, or is; check out patch 12831433. Digging into the patch readme I found a nugget of gold that many folks have been looking for ... out of the box HTML >> FO formatting.
I have written about this subject elsewhere in the blog and you can find those posts using the search box. This is new: out of the box support to convert your stored XHTML to the required format for Publisher's underlying language, XSL-FO. What do I mean by that? Being able to take
<B>This is bold text</B>
and convert it to its XSL-FO equivalent
<fo:inline font-weight="bold">This is bold text</fo:inline>
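To make that mapping concrete, here is a minimal Python sketch of this kind of tag-to-FO translation. It is not how Publisher implements the conversion; the function names (html_to_fo, convert) and the small tag table are made up for illustration, and it only copes with well-formed XHTML.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)

# Hypothetical mapping from a few inline XHTML tags to fo:inline attributes.
INLINE_STYLES = {
    "b": {"font-weight": "bold"},
    "i": {"font-style": "italic"},
    "u": {"text-decoration": "underline"},
}

def convert(node):
    """Recursively turn an XHTML element into an fo:inline element."""
    attrs = INLINE_STYLES.get(node.tag.lower(), {})
    out = ET.Element(f"{{{FO_NS}}}inline", attrs)
    out.text = node.text
    for child in node:
        converted = convert(child)
        converted.tail = child.tail  # keep the text that follows the child tag
        out.append(converted)
    return out

def html_to_fo(xhtml):
    """Parse a well-formed XHTML fragment and serialize it as XSL-FO inlines."""
    return ET.tostring(convert(ET.fromstring(xhtml)), encoding="unicode")

print(html_to_fo("<B>This is bold text</B>"))
```

The printed element carries the fo namespace declaration plus font-weight="bold", wrapped around the original text. Tags missing from the table simply become plain fo:inline elements in this sketch.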
There are some restrictions to what's available right now but it's a big step forward. Here are the details.
This patch supports HTML embedded in XML. The following new layout command is added to retain HTML formatting from the data in the final output.
<?html2fo: xpath?>
The HTML code you want converted of course needs to be within the report's XML data so that the template processor can work on it. Further, the html2fo command needs the HTML to be inside a CDATA section in the report data. A CDATA section holds text that the XML parser should not parse as markup. For example, the < and > characters normally denote the opening and closing of an XML element tag. In our case we want them to denote the opening and closing of HTML tags. Without the CDATA section the XML parser would attempt to parse the HTML tags as XML elements, with unexpected results to say the least.
In the example below the HTML code is embedded in the field RTECODE.
<?xml version="1.0" encoding="UTF-8"?>
<RTECODE>
<![CDATA[
<font style="font-style: italic; font-weight: bold;" size="3"><a href="http://www.oracle.com">oracle</a></font> <br/>
<font size="6"><a href="www.oracle.com">www.oracle.com</a></font><br/><br/>
]]>
</RTECODE>
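To see what the CDATA section buys you, here is a small Python sketch that parses one line of the sample HTML with and without the wrapper. It uses Python's standard XML parser as a stand-in; Publisher's own parser is not involved, and the variable names are just for illustration.

```python
import xml.etree.ElementTree as ET

html = '<font size="6"><a href="www.oracle.com">www.oracle.com</a></font><br/>'

with_cdata = ET.fromstring(f"<RTECODE><![CDATA[{html}]]></RTECODE>")
without_cdata = ET.fromstring(f"<RTECODE>{html}</RTECODE>")

# With CDATA the markup survives as one text string and RTECODE has no children.
print(repr(with_cdata.text), len(with_cdata))

# Without CDATA the parser turns the HTML tags into child elements,
# so RTECODE itself carries no text at all.
print(repr(without_cdata.text), [child.tag for child in without_cdata])
```

In the first case the field value is the HTML string itself, which is what a command like html2fo needs to work on. In the second the HTML has already been swallowed into the XML tree as font and br elements.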
The next question is how do you get the CDATA sections wrapped around your HTML code? Your HTML does not need to be stored in the database with the CDATA sections already attached. You can add them in your SQL query, but you need to do it in a very specific way:
select '<![CDATA' || '['|| RTECODE || ']' || ']>' as "RTECODE" from table x
You'll be thinking, hey, I can get rid of a set or two of those concatenation pipes. Don't do it! Please use the example above and stick to it. If you do remove some of the || then the CDATA section will not be preserved correctly by the extraction engine.
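If you want to check what that column value ends up looking like, the concatenation can be mirrored in a few lines of Python. This is only a sketch: wrap_in_cdata is a made-up helper name, and the guard reflects a general XML rule (a CDATA section ends at the first "]]>"), not anything documented for the patch.

```python
def wrap_in_cdata(html):
    """Mirror the SQL concatenation: '<![CDATA' || '[' || RTECODE || ']' || ']>'."""
    if "]]>" in html:
        # A CDATA section ends at the first "]]>", so such content
        # cannot be wrapped as-is.
        raise ValueError("HTML contains ']]>' and would close the CDATA section early")
    return "<![CDATA" + "[" + html + "]" + "]>"

print(wrap_in_cdata("<b>This is bold text</b>"))
# prints: <![CDATA[<b>This is bold text</b>]]>
```

How the extraction engine would treat stored HTML that already contains "]]>" is not covered here, so treat the guard as a reminder to check your data rather than a fix.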
Sample usage in an RTF template (assume the data.xml is as above):
<?html2fo:RTECODE?>
Supported HTML formats:
- Paragraph Font style (bold, italic, plain, underline, subscript, superscript and strikethrough)
- Font size
- Font family
- Background color
- Foreground color
- Paragraph alignment (center, left, right and justify)
- Paragraph indent
- URL link
- Bullet List
- Number List
- Nested List (List with Indent)
- Some HTML tags/attributes that are inserted manually rather than by an HTML editor, such as Table, Image, etc.