Tuesday Jan 29, 2013

Conversions in WebCenter Content

One of the guiding principles with WebCenter Content has been to make it as easy as possible to consume content.  Part of that means viewing content in a format that is optimal for the end user… regardless of the format the content was created in.  So WebCenter Content has a long history of converting files from one format to another.  Often this involves converting a proprietary desktop publishing format to something more open that can be viewed directly from a browser, or taking a high resolution image and creating a rendition that downloads quickly over a slow network.

(Figure: Conversion Decision Tree)

Over the life of the product, the types and methods for those conversions have grown to provide a broad range of options.  It’s sometimes confusing to know what conversions are available and where exactly they are done (Content Server or Inbound Refinery), so I've put together a flowchart and list describing all of the different types of conversion, how and where they are done, and the pros and cons of each.  This list covers what’s available as of the current release, WebCenter Content 11g PS5.

PDF Conversions

Where: Inbound Refinery
When: Upon check-in
How: Multiple ways
Platform: All (* but depends)

PDF conversions are probably the most common type of conversion done with WCC.  This involves converting a desktop publishing format (e.g. Microsoft Word) into Adobe PDF format.  The benefits include being able to read the document directly in the browser (with a PDF reader plug-in) without requiring the 3rd party product that reads the proprietary format.  PDFs also provide additional benefits such as being able to start viewing the document before the entire file downloads, possible reduction in file size through compression, and the ability to apply watermarks and additional security to the file.  Optionally, PDF/A can be chosen as the output format; it is recognized as an approved archival format.

Within PDF conversions, there are several different methods that can be used to create the PDF, depending on the needs and requirements.

PDFExportConverter – This method uses Oracle’s own OutsideIn filters to directly convert multiple format types into PDF.  The benefits include support for multiple platforms (any platform that WCC supports), the fastest conversion, and no 3rd party software requirements.  The main downside to this type of conversion is that it has the lowest fidelity to the original document, meaning the PDF won’t always exactly match the look and feel of the original.  These formats are supported by the OutsideIn filters for conversion to PDF.

WinNativeConverter – As the name implies, this type of conversion uses the native applications on Windows to do the conversion.  By using the application that was originally used to create the document, you get the best fidelity of the PDF compared to the original.  The downside is that the Inbound Refinery must run on Windows; other platforms are not supported.  It also requires a distiller engine to convert the PostScript output printed from the native applications into PDF.  The recommended choice for that is AFPL Ghostscript.

OpenOfficeConversion – The Open Office conversion is a compromise between the two types of conversion mentioned above.  It uses Apache Open Office to open and convert the native file.  In most cases, it gives better PDF fidelity than PDFExportConverter, but still not as good as WinNativeConverter.  It also runs on more than just Windows, so it has broader platform support than WinNativeConverter.
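The trade-offs between the three PDF methods can be summarized as a small decision function. This is a hypothetical sketch, not an Inbound Refinery API: the `choosePdfConverter` name, its options object, and the "best"/"better"/"basic" fidelity labels are assumptions for illustration. Only the converter names and their platform and fidelity characteristics come from the descriptions above.

```javascript
// Hypothetical helper, not a WebCenter Content API: picks a PDF conversion
// method using the trade-offs described above (11g PS5).
// platform: e.g. "windows", "linux"; fidelity: "best" | "better" | "basic".
// Returns null when no method satisfies the request.
function choosePdfConverter({ platform, fidelity }) {
  const onWindows = platform === "windows";

  if (fidelity === "best") {
    // Native applications give the closest match to the original, but the
    // Inbound Refinery must run on Windows (with a distiller such as
    // AFPL Ghostscript).
    return onWindows ? "WinNativeConverter" : null;
  }

  if (fidelity === "better") {
    // Open Office: usually better fidelity than OutsideIn, and not limited
    // to Windows.
    return "OpenOfficeConversion";
  }

  // OutsideIn filters: any platform WCC supports, fastest, no 3rd party
  // software, but the lowest fidelity.
  return "PDFExportConverter";
}

console.log(choosePdfConverter({ platform: "linux", fidelity: "better" })); // prints OpenOfficeConversion
```

Returning `null` for "best fidelity on a non-Windows refinery" makes the one hard constraint in the list (WinNativeConverter is Windows-only) explicit rather than silently falling back.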

Tiff Converter

Where: Inbound Refinery
When: Upon check-in
How: Uses a 3rd party (CVISION PdfCompressor) engine to perform OCR and PDF conversion
Platform: Windows Only

When needing to convert TIFF formatted files into PDFs, this can be done with either PDFExportConverter or Tiff Converter.  The major difference is whether optical character recognition (OCR) needs to be performed on the file in order to extract the full-text from the image.  If OCR is required, then Tiff Converter is used for that type of conversion.  In addition, a 3rd party tool, CVISION PdfCompressor, is required to do the actual OCR and conversion piece.  Tiff Converter acts as the controller between the Inbound Refinery and PdfCompressor.  But because PdfCompressor is a Windows-only application, the Inbound Refinery must also be on Windows.
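The TIFF routing rule reduces to two questions: is OCR needed, and is the refinery on Windows? The sketch below encodes just that. The function name, its options, and the returned object shape are hypothetical and are not WebCenter Content configuration settings.

```javascript
// Hypothetical helper: decides how a TIFF check-in is converted, based on
// the rules described above. The names are illustrative, not configuration
// keys in WebCenter Content.
function chooseTiffConversion({ needsOcr, refineryPlatform }) {
  if (!needsOcr) {
    // Without OCR, the OutsideIn-based PDFExportConverter can do the job.
    return { converter: "PDFExportConverter", requires: [] };
  }
  if (refineryPlatform !== "windows") {
    // Tiff Converter drives CVISION PdfCompressor, a Windows-only
    // application, so the Inbound Refinery must also be on Windows.
    throw new Error("OCR with Tiff Converter requires a Windows Inbound Refinery");
  }
  return { converter: "Tiff Converter", requires: ["CVISION PdfCompressor"] };
}

console.log(chooseTiffConversion({ needsOcr: true, refineryPlatform: "windows" }));
```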

XML Converter

Where: Inbound Refinery
When: Upon check-in
How: Uses Oracle OutsideIn filters to convert native formats into XML
Platform: All

The XML Converter allows for native documents to be converted into 2 flavors of XML: FlexionXML (based on FlexionDoc schema) and SearchML (based on the SearchML schema).  In addition, those formats can go through additional transformation with a custom XSLT.  Because the XML Converter utilizes the Oracle OutsideIn filter technology, it supports all platforms.

DAM Converter

Where: Inbound Refinery
When: Upon check-in and updates
How: Can use both Oracle OutsideIn filters as well as 3rd party applications to do image conversions.  Flip Factory is required for video conversions.
Platform: All (* but depends)

DAM Converter is used to create multiple renditions of image or video files.  The primary goal is to convert original formats, which are often high resolution and large in size, into other formats that are geared towards web or print delivery.  One thing unique to DAM Converter is that the metadata specifying the rendition set can be updated after the item has been submitted, which sends the file back to the Inbound Refinery to be reprocessed.

When using the image converter, the Inbound Refinery comes with the Oracle OutsideIn filters to create renditions, so nothing else is required and it can run on all platforms.  But the converter also supports other command-line driven image converters such as Adobe Photoshop, XnView NConvert, and ImageMagick.  Some are commercial and some are freeware.  Each has different capabilities for different use-cases and is supported on different platforms.  But for general purpose re-sizing, resolution, and format changes, OutsideIn can handle it.

For video conversion, Telestream’s Flip Factory is required.  The DAM Converter acts as the controller between the Inbound Refinery and Flip Factory.  What makes this integration a bit unique is that it is handled purely at a file system level.  This means that Flip Factory, which is a Windows-only application, does not need to reside on the same server as the Inbound Refinery.  They simply need shared file system access between servers.  So the Inbound Refinery can be on Linux while Flip Factory is on Windows.  

HTML Converter

Where: Inbound Refinery
When: Upon check-in
How: Uses Microsoft Office to convert Office documents into HTML
Platform: Windows Only

HTML Converter uses Microsoft Office to save the documents as HTML documents, collects the output (into a zip file if multiple files), and returns them to Content Server.  Using the HTML save output directly from Office, you get a very good fidelity of HTML compared to the original native format.  This is especially true for Excel and Visio which are less text-based.  The downside is you have no control over the HTML output to make any changes or provide consistency between conversions.  It’s simply formatted based on Office’s formatting.  Also, it does not apply any templating around the content to insert code before or after the content or present the document within the structure of a larger HTML page such as in the case of Site Studio.   

Dynamic Converter

Where: Content Server
When: Upon check-in or on-demand
How: Uses Oracle OutsideIn filters to convert native documents into HTML
Platform: All

Like HTML Converter, Dynamic Converter converts Office documents into HTML.  But there are several key differences between the two.  First, Dynamic Converter uses OutsideIn filters to convert to HTML, so it supports a wide range of native formats.  Second, the processing happens on the Content Server side and not in the Inbound Refinery.  This allows the conversion to happen on-demand the first time the HTML version is requested.  Alternatively, DC can be configured to do the conversion upon check-in and cache the results so they are immediately available and don’t need to go through conversion on first request.  DC also supports a wide range of controls over exactly how the HTML is formatted.  The result can be very minimal and clean HTML with various div or span tags to allow styling with CSS.  This can lead to a more consistent look and feel between converted documents.  It also allows for insertion of code before or after the content to embed the output within a template, which is what Site Studio uses.
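The on-demand versus check-in choice amounts to lazy versus eager population of a cache. The sketch below models only that idea: `HtmlRenditionCache` and the `convert` callback are hypothetical names standing in for Dynamic Converter's OutsideIn-based conversion, not Content Server APIs.

```javascript
// Minimal model of Dynamic Converter's two timing options. `convert` stands
// in for the OutsideIn-based HTML conversion; the class and method names are
// hypothetical, not Content Server APIs.
class HtmlRenditionCache {
  constructor(convert) {
    this.convert = convert;  // (docId) => HTML string
    this.cache = new Map();  // docId -> cached HTML
    this.conversions = 0;    // number of times convert() actually ran
  }

  // "Upon check-in": convert eagerly, so the first request is served from cache.
  onCheckin(docId) {
    this.cache.set(docId, this.runConversion(docId));
  }

  // "On-demand": convert on the first request, then reuse the cached result.
  get(docId) {
    if (!this.cache.has(docId)) {
      this.cache.set(docId, this.runConversion(docId));
    }
    return this.cache.get(docId);
  }

  runConversion(docId) {
    this.conversions += 1;
    return this.convert(docId);
  }
}

const dc = new HtmlRenditionCache((id) => `<div class="content">${id}</div>`);
dc.get("doc1");              // first request triggers the conversion
dc.get("doc1");              // second request is a cache hit
console.log(dc.conversions); // prints 1
```

The eager path moves the conversion cost to check-in in exchange for a fast first view; the lazy path avoids converting documents nobody requests.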

Thumbnail Creation

Where: Content Server or Inbound Refinery
When: Upon check-in
How: Uses Oracle OutsideIn filters to create a thumbnail representation of the document to be used on search results
Platform: All

As a new feature in PS5, thumbnails can now be generated directly in the Content Server and not require the document to be sent to the Inbound Refinery (if it doesn’t need other conversions).  This allows the document to become available much more quickly.  But if the file is sent to the Inbound Refinery for other types of conversions, the thumbnail can be generated at that point.
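The PS5 thumbnail behavior is a simple routing rule: generate the thumbnail wherever the document is already going to be processed. This sketch assumes a hypothetical `refineryConversions` list describing the other conversions a document needs; neither the function nor that input shape is a Content Server API.

```javascript
// Hypothetical sketch of where a thumbnail is generated in PS5, following
// the description above. `refineryConversions` lists any other conversions
// (PDF, XML, and so on) the document needs; the shape is an assumption.
function planThumbnail({ refineryConversions }) {
  const needsRefinery = refineryConversions.length > 0;
  return {
    sendToRefinery: needsRefinery,
    // With no other conversions, Content Server makes the thumbnail itself
    // and the document never has to go to the Inbound Refinery.
    thumbnailIn: needsRefinery ? "Inbound Refinery" : "Content Server",
  };
}

console.log(planThumbnail({ refineryConversions: [] }).thumbnailIn); // prints Content Server
```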

For further information on conversions, see the documentation on Conversions as well as Dynamic Converter.

Friday Oct 14, 2011

jQuery DataTables using Excel spreadsheets and Dynamic Converter

On a recent project I worked on, we needed to display a calendar on a site with a list of different events.  From the content owner's perspective, authoring and maintaining this calendar in Microsoft Excel was ideal, so using Dynamic Converter to convert it to HTML fit the bill.  But they wanted the calendar to be more interactive and dynamic than a static table, with features such as sorting, searching, and pagination.  That's where the DataTables jQuery plug-in makes a perfect solution.

While the default conversion of the Excel document to a HTML table was close, it still needed a bit of manipulation of the table format to fit what DataTables was looking for.  Luckily, jQuery makes that pretty easy to do as well.  

The following are the steps I took to create this conversion.

  1. The first step is to create your Excel document to work from and check it in.  The first row should contain your column headings and the rows below it your data.

  2. Open Internet Explorer and create a new Dynamic Converter template through Administration -> Dynamic Converter Admin -> Create New Template.  In 11g, for the Template Format, select 'Classic HTML Conversion Template'.  In 10g it should be set to 'GUI Template'.
  3. Edit the new template. Be sure to select Classic HTML Conversion Template as the Template Type.

    Note: If you are running Internet Explorer (IE) 8 or newer, you may encounter the error, "Internet Explorer has closed this webpage to help protect your computer.  A malfunctioning or malicious add-on has caused Internet Explorer to close this webpage."   To avoid this error, go to Tools -> Internet Options -> Advanced and uncheck 'Enable memory protection to help mitigate online attacks' near the bottom.  Restart IE and you should be able to bring up the template editor.

  4. Change the preview to point to the document submitted in step 1.
  5. First we'll remove the heading identifying the sheet from Excel.  Click on Element Setup and go to the Styles tab. 
  6. Click New and enter a Name of 'Heading 1'.  For the Associated element, click New and enter a Name of 'Heading 1'.  Click OK and OK.

  7. Go to the Elements tab and double-click on the Heading 1 check mark in the In Body column to change it to a red X.  Click OK.  The sheet heading should now disappear in the preview.

  8. Next we'll want to remove all of the formatting from the text.  Click the Formatting button.  Highlight 'Default Paragraph' and for the Font name, Font color, and Font size, choose 'Don't specify'.

  9. Click on the Paragraph tab and for Alignment, choose 'Don't specify'.
  10. Click on the Tables tab and click the Borders and Sizing button.  For Table width and Cell width, choose 'Don't specify'. 

  11. Check the 'Use column headings' box in the Heading section.  Click OK for the Formatting dialog.

  12. Next we need to insert the JavaScript needed to reformat our table into a DataTable. Click on the Globals button and click on the Head tab.
  13. Check the box for 'Include HTML or scripting code in the Head' and insert the code:

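    <!-- jQuery 1.6.4 -->
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.6.4/jquery.min.js" type="text/javascript" charset="utf-8"></script>

    <!-- jQuery DataTables -->
    <script type="text/javascript" language="javascript" src="http://datatables.net/release-datatables/media/js/jquery.dataTables.js"></script>

    <!-- Link to the jQuery DataTables stylesheet -->
    <link href="http://datatables.net/release-datatables/media/css/demo_table_jui.css" type="text/css" rel="stylesheet" />

    <script type="text/javascript" charset="utf-8">
        $(document).ready(function() {
            // Tag the converted spreadsheet table so it can be referenced by id
            $("table:contains('TEAM')").attr("id", "TeamTable");
            // Move the first (heading) row into a <thead>, which DataTables expects
            $('#TeamTable').prepend($('<thead></thead>').append($('#TeamTable tr:first').remove()));
            // Apply the class used by the DataTables stylesheet
            $('#TeamTable').attr("class", "display");
            // Turn the table into a DataTable with default settings
            $('#TeamTable').dataTable();
        });
    </script>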

    Let's take a look at this code. 

    The first script tag loads the jQuery JavaScript library, here from Google's hosted APIs.  The next script tag loads the DataTables plug-in, and the link tag loads a sample stylesheet to be used with the DataTables plug-in.  In this example, I'm calling out to the hosted files.  You may want to download, check in, and reference them locally to ensure they are always available.

    Inside the last script tag, the script waits until the page has finished loading and then runs its function.  The first line in the function adds an id attribute with a value of 'TeamTable' to the table so that we can easily reference it in the following statements.  To identify the table, it looks for the text 'TEAM'.  Adjust this to match text in your own table.

    The next line inserts the <thead> </thead> tags around the heading row in the table.  There is no way to configure Dynamic Converter to insert this, so jQuery helps us do it after the fact.

    The third line applies the class 'display' to the table to utilize the DataTables stylesheet to help format the table.  Again, there isn't a way to insert this class with Dynamic Converter, so jQuery can do it for us.

    And finally, it calls dataTable() to transform the table into a DataTable.  It's using the plug-in's basic 'zero configuration' settings without any options applied.

  14. Click OK to save the template.  Now use the Template Section Rules to target the appropriate spreadsheets with the new template.
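The four statements in the step 13 script can be modeled without a browser to make their effect explicit. This is a simplified sketch: representing a table as an array of rows of cell strings is an assumption, and `prepareForDataTables` is a hypothetical name, not part of jQuery or DataTables.

```javascript
// DOM-free model of the four statements in the step 13 script. A "table" is
// an array of rows and a row is an array of cell strings; this shape is an
// assumption so the logic can run outside a browser.
function prepareForDataTables(tables, marker) {
  // $("table:contains('TEAM')") matches tables whose text includes the
  // marker (case-sensitive). This model checks cell by cell.
  const matches = tables.filter((rows) =>
    rows.some((row) => row.some((cell) => cell.includes(marker)))
  );
  if (matches.length === 0) return null;

  // Every match gets id="TeamTable", but $('#TeamTable') only returns the
  // first element with that id, so only the first match is transformed.
  const [headerRow, ...bodyRows] = matches[0];
  return {
    id: "TeamTable",
    className: "display", // lets the DataTables stylesheet apply
    thead: [headerRow],   // first row moved into <thead>
    tbody: bodyRows,      // remaining rows stay in the body
  };
}

const result = prepareForDataTables(
  [[["TEAM", "DIVISION"], ["Team A", "NFC"], ["Team B", "AFC"]]],
  "TEAM"
);
console.log(result.thead, result.tbody.length);
```

One practical consequence shows up in the model: if more than one table on the page contains the marker text, only the first one becomes a DataTable, so pick marker text that is unique to your spreadsheet.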

Now when you view the HTML conversion of the spreadsheet, you should see it as a DataTable.  You can do things like sort columns, search, and have pagination.

But now that we have it as a DataTable, we can use the different options it offers to give it a different look and experience.

We can first add an additional JavaScript library and stylesheet from the jQuery UI project.  Edit the template again and add the following to the code in the Head section, after the jQuery script tag (jQuery UI depends on jQuery).

<!-- jQuery UI 1.8.16 -->
<script src='https://ajax.googleapis.com/ajax/libs/jqueryui/1.8.16/jquery-ui.min.js' type='text/javascript' charset='utf-8'></script>

<!-- jQuery UI smoothness theme -->
<link href='http://ajax.googleapis.com/ajax/libs/jqueryui/1.7.2/themes/smoothness/jquery-ui.css' type='text/css' rel='stylesheet' />

Then we can add some additional options to the DataTable:

oTable = $('#TeamTable').dataTable({
                'bJQueryUI': true,
                'sPaginationType': 'full_numbers'
});

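The 'bJQueryUI' option styles the table with the jQuery UI library and theme included above, and the 'full_numbers' pagination type shows page numbers (along with First, Previous, Next, and Last buttons) instead of just arrows.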

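Then we'll add an option list to do filtering on the table.  Add this line within the $(document).ready(function() { ... }) block, after the dataTable() call, so the option list is moved next to the 'Show entries' control that DataTables generates:

$('#ChangeDivision').appendTo($('#TeamTable_length'));
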
Then add an additional function to call when the option list changes:

function fnFilterType( area ) {
        oTable.fnFilter( area, 1 );
}
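The call `oTable.fnFilter( area, 1 )` uses the DataTables 1.x API to filter on the column at index 1, the second column. The sketch below roughly approximates that behavior outside the browser. It assumes the division (NFC/AFC) sits in the second column, and it deliberately skips DataTables' richer "smart" filtering; `filterByColumn` is a hypothetical helper, not a DataTables function.

```javascript
// Simplified model of oTable.fnFilter(area, 1): keep rows whose column at
// index 1 (the second column) contains `area`, ignoring case; an empty
// string clears the filter. Assumes the division (NFC/AFC) is in that
// column. DataTables' own default "smart" filtering does more than this
// (for example, it matches multi-word input word by word).
function filterByColumn(rows, area, column = 1) {
  if (area === "") return rows.slice();
  const needle = area.toLowerCase();
  return rows.filter((row) =>
    String(row[column] ?? "").toLowerCase().includes(needle)
  );
}

const rows = [["Team A", "NFC"], ["Team B", "AFC"], ["Team C", "NFC"]];
console.log(filterByColumn(rows, "NFC").length); // prints 2
console.log(filterByColumn(rows, "").length);    // prints 3
```

This is why the "All types" option uses an empty value: passing an empty string clears the column filter and shows every row again.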

Finally, we'll add the HTML option list to the page.  Click on the HTML tab and add this code in the 'Include HTML or scripting code before the content'. 

<span id="ChangeDivision"><br />
<span class="style6">Show</span> <select onchange="fnFilterType(this.value)" name="Division">
<option value="" selected="selected">All types</option>
<option value="NFC">only NFC</option>
<option value="AFC">only AFC</option>
</select>
</span>
When you make these additions in the editor, it will complain about a runtime error on the page.  This only occurs in the preview window and can be ignored.

Now we have our updated DataTable:

You can download the completed GUI template for 11g here. The 10g version is here.  If using 11g, be sure to submit it as a "Classic HTML Conversion Template" and as a "GUI Template" in 10g.

Special thanks to Paul Thaden for the code on this example!

Monday Aug 30, 2010

PDF Conversion on UNIX without OpenOffice


Kyle Hatlestad is a Solution Architect in the WebCenter Architecture group (A-Team) who works with WebCenter Content and other products in the WebCenter & Fusion Middleware portfolios. The WebCenter A-Team blog can be found at: https://blogs.oracle.com/ateam_webcenter/

