Tuesday Jan 29, 2013

Conversions in WebCenter Content

One of the guiding principles with WebCenter Content has been to make it as easy as possible to consume content.  And part of that means viewing content in a format that is optimal for the end user… regardless of the format the content was created in.  So WebCenter Content has a long history of converting files from one format to another.  Often this involves converting a proprietary desktop publishing format to something more open that can be viewed directly from a browser.  Or taking a high resolution image and creating a rendition that download quickly over a slow network.

Conversion Decision TreeOver the life of the product, the types and methods for those conversions has grown to provide a broad range of options.  It’s sometimes confusing to know what conversion are available and where exactly they are done (Content Server or Inbound Refinery), so I've put together a flowchart and list describing all of the different types of conversion, how and where they are done, and the pros and cons of each.  This list covers what’s available as of the current release – WebCenter Content 11g PS5.

PDF Conversions

Where: Inbound Refinery
When: Upon check-in
How: Multiple ways
Platform: All (* but depends)

So PDF conversions are probably the most common type of conversion done with WCC.  This involves converting a desktop publishing format (e.g. Microsoft Word) into Adobe PDF format.  The benefits obviously include being able to read the document directly in the browser (with a PDF reader plug-in) and not requiring the 3rd party product to read the proprietary format. In addition, PDFs also provide additional benefits such as being able to start viewing the document before the entire file downloads, possible compression on file size, and the ability to provide watermarks and additional security on the file.  And optionally, PDF/A format can be chosen which is recognized as an approved archival format.

Within PDF conversions, there are several different methods that can be used to create the PDF, depending on the needs and requirements.

PDFExportConverter – This method uses Oracle’s own OutsideIn filters to directly convert multiple format types into PDF.  The benefits include multiple platform support (any platform that WCC supports), fastest conversion, and no 3rd party software requirements.  The main downside to this type of conversion is it has the lowest fidelity to the original document. Meaning it won’t always exactly match the look and feel of the original document.  These formats are supported by the OutsideIn filters for conversion to PDF.

WinNativeConverter – Like the name implies, this type of conversion uses the native applications on Windows to do the conversion.  By using the original application that was used to create the document, you will get the best fidelity of PDF compared to the original.  The downside is that the Inbound Refinery can only be run on Windows and not other platforms.  It also requires a distiller engine to convert the PostScript format that gets printed from the native applications to PDF.  The recommended choice for that is AFPL Ghostscript

OpenOfficeConversion – The Open Office conversion is a bit of a compromise between the two types of conversions mentioned above.  It uses Apache Open Office to open and convert the native file. In most cases, it will give you better fidelity of PDF then the PDFExportConverter, but still not as good as WinNativeConverter.  Also, it does support more than just Windows, so it has broader platform support then WinNativeConverter. 

Tiff Converter

Where: Inbound Refinery
When: Upon check-in
How: Uses a 3rd party (CVISION PdfCompressor) engine to perform OCR and PDF conversion
Platform: Windows Only

When needing to convert TIFF formatted files into PDFs, this can be done with either PDFExportConverter or Tiff Converter.  The major difference is if optical character recognition (OCR) needs to be performed on the file in order to extract the full-text off the image.  If OCR is required, then Tiff Converter is used for that type of conversion.  In addition, a 3rd party tool, CVISION PdfCompressor, is required to do the actual OCR and conversion piece.  Tiff Converter acts as the controller between the Inbound Refinery and PdfCompressor.  But because PdfCompressor is a Windows-only application, the Inbound Refinery must also be on Windows. 

XML Converter

Where: Inbound Refinery
When: Upon check-in
How: Uses Oracle OutsideIn filters to convert native formats into XML
Platform: All

The XML Converter allows for native documents to be converted into 2 flavors of XML: FlexionXML (based on FlexionDoc schema) and SearchML (based on the SearchML schema).  In addition, those formats can go through additional transformation with a custom XSLT.  Because the XML Converter utilizes the Oracle OutsideIn filter technology, it supports all platforms.

DAM Converter

Where: Inbound Refinery
When: Upon check-in and updates
How: Can use both Oracle OutsideIn filters as well as 3rd party applications to do image conversions.  Flip Factory is required for video conversions.
Platform: All (* but depends)

DAM Converter is used to create multiple renditions of either image or video files.  The primary goal is to convert original formats which can typically be high resolution and large in size into other formats that are geared towards web or print delivery.  One thing that is unique to DAM Converter is the metadata that is used to specify the rendition set can be updated after the item has been submitted which will send the file back to the Inbound Refinery to be reprocessed.

When using the image converter, the Inbound Refinery comes with the Oracle OutsideIn filters to create renditions, so nothing else is required and it can run on all platforms.  But the converter also supports other types of image converters which are command-line driven such as Adobe Photoshop, XnView NConvert, ImageMagick.  Some are commercial and some are freeware.  Each has different capabilities for different use-cases and are supported on various platforms.  But for general purpose re-sizing, resolution, and format changes, OutsideIn can handle it.

For video conversion, Telestream’s Flip Factory is required.  The DAM Converter acts as the controller between the Inbound Refinery and Flip Factory.  What makes this integration a bit unique is that it is handled purely at a file system level.  This means that Flip Factory, which is a Windows-only application, does not need to reside on the same server as the Inbound Refinery.  They simply need shared file system access between servers.  So the Inbound Refinery can be on Linux while Flip Factory is on Windows.  

HTML Converter

Where: Inbound Refinery
When: Upon check-in
How: Uses Microsoft Office to convert Office documents into HTML
Platform: Windows Only

HTML Converter uses Microsoft Office to save the documents as HTML documents, collects the output (into a zip file if multiple files), and returns them to Content Server.  Using the HTML save output directly from Office, you get a very good fidelity of HTML compared to the original native format.  This is especially true for Excel and Visio which are less text-based.  The downside is you have no control over the HTML output to make any changes or provide consistency between conversions.  It’s simply formatted based on Office’s formatting.  Also, it does not apply any templating around the content to insert code before or after the content or present the document within the structure of a larger HTML page such as in the case of Site Studio.   

Dynamic Converter

Where: Content Server
When: Upon check-in or on-demand
How: Uses Oracle OutsideIn filters to convert native documents into HTML
Platform: All

Like HTML Converter, Dynamic Converter converts Office documents into HTML.  But there are several key differences between the two.  First is Dynamic Converter uses OutsideIn filters to convert to HTML so it supports a wide range of different native formats. Another difference is the processing happens on the Content Server side and not Inbound Refinery.  This allows the conversion to happen on-demand the first time the HTML version is requested.  Alternatively, DC can be configured to do the conversion upon check-in and cache the results so they are immediately available and don’t need to go through conversion on first request. DC also supports a wide range of controls over how the HTML is precisely formatted.  The result can be very minimal and clean HTML with various div or span tags to allow styling with CSS.  This can lead to a more consistent look and feel between converted documents.  In also allows for insertion of code before or after the content to embed the output within a template and is what is used within Site Studio.

Thumbnail Creation

Where: Content Server or Inbound Refinery
When: Upon check-in
How: Uses Oracle OutsideIn filters to create a thumbnail representation of the document to be used on search results
Platform: All

As a new feature in PS5, thumbnails can now be generated directly in the Content Server and not require the document to be sent to the Inbound Refinery (if it doesn’t need other conversions).  This allows the document to become available much more quickly.  But if the file is sent to the Inbound Refinery for other types of conversions, the thumbnail can be generated at that point.

For further information on conversions, see the documentation on Conversions as well as Dynamic Converter

Monday Dec 10, 2012

Expanding on requestaudit - Tracing who is doing what...and for how long

One of the most helpful tracing sections in WebCenter Content (and one that is on by default) is the requestaudit tracing.  This tracing section summarizes the top service requests happening in the server along with how they are performing.  By default, it has 2 different rotations.  One happens every 2 minutes (listing up to 5 services) and another happens every 60 minutes (listing up to 20 services).  These traces provide the total time for all the requests against that service along with the number of requests and its average request time.  This information can provide a good start in possibly troubleshooting performance issues or tracking a particular issue.  

>requestaudit/6 12.10 16:48:00.493 Audit Request Monitor !csMonitorTotalRequests,47,1,0.39009329676628113,0.21034042537212372,1
>requestaudit/6 12.10 16:48:00.509 Audit Request Monitor Request Audit Report over the last 120 Seconds for server wcc-base_4444****
requestaudit/6 12.10 16:48:00.509 Audit Request Monitor -Num Requests 47 Errors 1 Reqs/sec. 0.39009329676628113 Avg. Latency (secs) 0.21034042537212372 Max Thread Count 1
requestaudit/6 12.10 16:48:00.509 Audit Request Monitor 1 Service FLD_BROWSE Total Elapsed Time (secs) 3.5320000648498535 Num requests 10 Num errors 0 Avg. Latency (secs) 0.3531999886035919

requestaudit/6 12.10 16:48:00.509 Audit Request Monitor 2 Service GET_SEARCH_RESULTS Total Elapsed Time (secs) 2.694999933242798 Num requests 6 Num errors 0 Avg. Latency (secs) 0.4491666555404663
requestaudit/6 12.10 16:48:00.509 Audit Request Monitor 3 Service GET_DOC_PAGE Total Elapsed Time (secs) 1.8839999437332153 Num requests 5 Num errors 1 Avg. Latency (secs) 0.376800000667572
requestaudit/6 12.10 16:48:00.509 Audit Request Monitor 4 Service DOC_INFO Total Elapsed Time (secs) 0.4620000123977661 Num requests 3 Num errors 0 Avg. Latency (secs) 0.15399999916553497
requestaudit/6 12.10 16:48:00.509 Audit Request Monitor 5 Service GET_PERSONALIZED_JAVASCRIPT Total Elapsed Time (secs) 0.4099999964237213 Num requests 8 Num errors 0 Avg. Latency (secs) 0.051249999552965164
requestaudit/6 12.10 16:48:00.509 Audit Request Monitor ****End Audit Report*****

To change the default rotation or size of output, these can be set as configuration variables for the server:

RequestAuditIntervalSeconds1 – Used for the shorter of the two summary intervals (default is 120 seconds)
RequestAuditIntervalSeconds2 – Used for the longer of the two summary intervals (default is 3600 seconds)
RequestAuditListDepth1 – Number of services listed for the first request audit summary interval (default is 5)
RequestAuditListDepth2 – Number of services listed for the second request audit summary interval (default is 20)

If you want to get more granular, you can enable 'Full Verbose Tracing' from the System Audit Information page and now you will get an audit entry for each and every service request. 

>requestaudit/6 12.10 16:58:35.431 IdcServer-68 GET_USER_INFO [dUser=bob][StatusMessage=You are logged in as 'bob'.] 0.08765099942684174(secs)

What's nice is it reports who executed the service and how long that particular request took.  In some cases, depending on the service, additional information will be added to the tracing relevant to that  service.

>requestaudit/6 12.10 17:00:44.727 IdcServer-81 GET_SEARCH_RESULTS [dUser=bob][QueryText=%28+dDocType+%3cmatches%3e+%60Document%60+%29][StatusCode=0][StatusMessage=Success] 0.4696030020713806(secs)

You can even go into more detail and insert any additional data into the tracing.  You simply need to add this configuration variable with a comma separated list of variables from local data to insert.

RequestAuditAdditionalVerboseFieldsList=TotalRows,path

In this case, for any search results, the number of items the user found is traced:

>requestaudit/6 12.10 17:15:28.665 IdcServer-36 GET_SEARCH_RESULTS [TotalRows=224][dUser=bob][QueryText=%28+dDocType+%3cmatches%3e+%60Application%60+%29][Sta...

I also recently ran into the case where services were being called from a client through RIDC.  All of the services were being executed as the same user, but they wanted to correlate the requests coming from the client to the ones being executed on the server.  So what we did was add a new field to the request audit list:

RequestAuditAdditionalVerboseFieldsList=ClientToken

And then in the RIDC client, ClientToken was added to the binder along with a unique value that could be traced for that request.  Now they had a way of tracing on both ends and identifying exactly which client request resulted in which request on the server.

Tuesday Jun 14, 2011

Updating metadata in a custom element in Site Studio

I was recently asked if it was possible to edit metadata on a Site Studio data file from an element instead of having to go to the Metadata tab specifically.  Users were finding it disorienting to move back and forth between the Elements and Metadata tabs when updating both content and metadata.

So I put together an example that places a metadata update form within a custom element. It uses a couple of AJAX calls to the server to get the current metadata values (DOC_INFO) and to update them (UPDATE_DOCINFO). The sample includes 3 fields; Title, Type, and Comments. 

It can be easily changed to add additional fields.  Or you could make the entire custom element to be one particular metadata field.  Much of the code in the sample has comments, so I won't go into detail here.  

One function I did find particularly useful was the ability to pull in the dID and dDocName of the data file into the custom element form.  The functions ElementAPI.GetFormProperty('dID') and ElementAPI.GetFormProperty('dDocName') were used in order to run the services against the content to pull and push the metadata information.

The sample can be downloaded here (right-click and save). Simply submit the file with the Web Site Object Type of 'Custom Element Form'.  Then create a new Custom Element that points to the form. I've tried it in both 10gR4 and 11g.   

Hopefully some other folks will find this helpful.

About

Kyle Hatlestad is a Solution Architect in the WebCenter Architecture group (A-Team) who works with WebCenter Content and other products in the WebCenter & Fusion Middleware portfolios. The WebCenter A-Team blog can be found at: https://blogs.oracle.com/ ateam_webcenter/

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today