Tuesday Jan 29, 2013

Conversions in WebCenter Content

One of the guiding principles with WebCenter Content has been to make it as easy as possible to consume content. Part of that means viewing content in a format that is optimal for the end user, regardless of the format the content was created in. So WebCenter Content has a long history of converting files from one format to another. Often this involves converting a proprietary desktop publishing format to something more open that can be viewed directly in a browser, or taking a high-resolution image and creating a rendition that downloads quickly over a slow network.

Conversion Decision Tree

Over the life of the product, the types and methods of conversion have grown to provide a broad range of options. It’s sometimes confusing to know which conversions are available and where exactly they are done (Content Server or Inbound Refinery), so I've put together a flowchart and a list describing all of the different types of conversion, how and where they are done, and the pros and cons of each. This list covers what’s available as of the current release – WebCenter Content 11g PS5.

PDF Conversions

Where: Inbound Refinery
When: Upon check-in
How: Multiple ways
Platform: All (* but depends)

PDF conversions are probably the most common type of conversion done with WCC. This involves converting a desktop publishing format (e.g. Microsoft Word) into Adobe PDF format. The obvious benefits are being able to read the document directly in the browser (with a PDF reader plug-in) and not needing the 3rd party product that reads the proprietary format. PDFs also offer other advantages, such as being able to start viewing the document before the entire file downloads, potentially smaller file sizes through compression, and the ability to apply watermarks and additional security to the file. Optionally, the PDF/A format can be chosen, which is recognized as an approved archival format.

Within PDF conversions, there are several different methods that can be used to create the PDF, depending on the needs and requirements.

PDFExportConverter – This method uses Oracle’s own OutsideIn filters to convert multiple format types directly into PDF. The benefits include multiple platform support (any platform that WCC supports), the fastest conversion, and no 3rd party software requirements. The main downside is that it has the lowest fidelity to the original document, meaning the PDF won’t always exactly match the look and feel of the original. The formats that can be converted this way are limited to those the OutsideIn filters support for PDF output.

WinNativeConverter – As the name implies, this type of conversion uses the native applications on Windows to do the conversion. Because it uses the original application that created the document, it produces the PDF with the best fidelity to the original. The downside is that the Inbound Refinery must run on Windows rather than other platforms. It also requires a distiller engine to convert the PostScript output printed from the native applications into PDF. The recommended choice for that is AFPL Ghostscript.

OpenOfficeConversion – The OpenOffice conversion is a bit of a compromise between the two types of conversions mentioned above. It uses Apache OpenOffice to open and convert the native file. In most cases, it will give you better PDF fidelity than PDFExportConverter, but still not as good as WinNativeConverter. Also, because it runs on more than just Windows, it has broader platform support than WinNativeConverter.
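The trade-offs between the three PDF methods can be summarized as a small decision function. This is a minimal sketch, assuming a hypothetical three-level "fidelity" requirement and a yes/no answer on whether the Inbound Refinery can run on Windows; the method names come from the list above, but the function itself is illustrative and not part of WebCenter Content.

```python
from dataclasses import dataclass

# Hypothetical scale standing in for "how closely must the PDF match the original?"
FIDELITY_LEVELS = ("low", "medium", "high")


@dataclass
class PdfRequirements:
    refinery_on_windows: bool  # Can the Inbound Refinery run on Windows?
    fidelity: str              # One of FIDELITY_LEVELS


def choose_pdf_converter(req: PdfRequirements) -> str:
    """Pick a PDF conversion method from the trade-offs described above."""
    if req.fidelity not in FIDELITY_LEVELS:
        raise ValueError(f"unknown fidelity level: {req.fidelity!r}")
    if req.fidelity == "high" and req.refinery_on_windows:
        # Native applications give the best fidelity, but need Windows
        # plus a PostScript distiller such as AFPL Ghostscript.
        return "WinNativeConverter"
    if req.fidelity in ("medium", "high"):
        # Middle ground: better fidelity than OutsideIn, not limited to Windows.
        return "OpenOfficeConversion"
    # Fastest, runs on every WCC platform, no 3rd party software required.
    return "PDFExportConverter"
```

Note that a "high" requirement on a non-Windows refinery falls back to OpenOfficeConversion, since that is the best fidelity available off Windows.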

Tiff Converter

Where: Inbound Refinery
When: Upon check-in
How: Uses a 3rd party (CVISION PdfCompressor) engine to perform OCR and PDF conversion
Platform: Windows Only

When converting TIFF files into PDFs, you can use either PDFExportConverter or Tiff Converter. The major difference is whether optical character recognition (OCR) needs to be performed on the file in order to extract the full text from the image. If OCR is required, Tiff Converter is used. In addition, a 3rd party tool, CVISION PdfCompressor, is required to do the actual OCR and conversion. Tiff Converter acts as the controller between the Inbound Refinery and PdfCompressor. Because PdfCompressor is a Windows-only application, the Inbound Refinery must also be on Windows.
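The OCR rule above reduces to a two-question check. This is a minimal sketch; the function and its parameters are hypothetical and only encode the rule as stated in this post.

```python
def choose_tiff_converter(needs_ocr: bool, refinery_on_windows: bool) -> str:
    """Pick the TIFF-to-PDF converter, following the OCR rule described above."""
    if not needs_ocr:
        # Plain image-to-PDF conversion: OutsideIn handles it on any platform.
        return "PDFExportConverter"
    if not refinery_on_windows:
        # Tiff Converter drives CVISION PdfCompressor, a Windows-only application.
        raise ValueError("OCR via Tiff Converter requires a Windows Inbound Refinery")
    return "Tiff Converter"
```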

XML Converter

Where: Inbound Refinery
When: Upon check-in
How: Uses Oracle OutsideIn filters to convert native formats into XML
Platform: All

The XML Converter allows native documents to be converted into two flavors of XML: FlexionXML (based on the FlexionDoc schema) and SearchML (based on the SearchML schema). In addition, the output can be further transformed with a custom XSLT. Because the XML Converter uses Oracle OutsideIn filter technology, it supports all platforms.

DAM Converter

Where: Inbound Refinery
When: Upon check-in and updates
How: Can use both Oracle OutsideIn filters as well as 3rd party applications to do image conversions.  Flip Factory is required for video conversions.
Platform: All (* but depends)

DAM Converter is used to create multiple renditions of image or video files. The primary goal is to convert original formats, which are typically high resolution and large in size, into other formats geared towards web or print delivery. One thing unique to DAM Converter is that the metadata used to specify the rendition set can be updated after the item has been submitted, which sends the file back to the Inbound Refinery to be reprocessed.

When using the image converter, the Inbound Refinery comes with the Oracle OutsideIn filters to create renditions, so nothing else is required and it can run on all platforms. But the converter also supports other command-line driven image converters, such as Adobe Photoshop, XnView NConvert, and ImageMagick. Some are commercial and some are freeware. Each has different capabilities suited to different use cases, and each runs on different platforms. But for general-purpose resizing, resolution, and format changes, OutsideIn can handle it.

For video conversion, Telestream’s Flip Factory is required.  The DAM Converter acts as the controller between the Inbound Refinery and Flip Factory.  What makes this integration a bit unique is that it is handled purely at a file system level.  This means that Flip Factory, which is a Windows-only application, does not need to reside on the same server as the Inbound Refinery.  They simply need shared file system access between servers.  So the Inbound Refinery can be on Linux while Flip Factory is on Windows.  
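To make the file-system handoff concrete, here is a minimal Python sketch of the general "watch folder" pattern that this kind of integration relies on. The folder names, the write-then-rename convention, and the polling loop are all assumptions for illustration; this is not how DAM Converter or Flip Factory are actually configured.

```python
import shutil
import time
from pathlib import Path
from typing import List


def submit_to_watch_folder(source: Path, inbox: Path) -> Path:
    """Copy a file into a shared 'inbox' folder that a transcoder watches (hypothetical)."""
    inbox.mkdir(parents=True, exist_ok=True)
    # Write under a temporary name, then rename, so the watcher never
    # picks up a partially copied file.
    tmp = inbox / (source.name + ".part")
    shutil.copyfile(source, tmp)
    final = inbox / source.name
    tmp.rename(final)
    return final


def wait_for_output(outbox: Path, stem: str,
                    timeout_s: float = 60.0, poll_s: float = 1.0) -> List[Path]:
    """Poll a shared 'outbox' folder until renditions named stem.* appear."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        matches = sorted(p for p in outbox.glob(stem + ".*")
                         if not p.name.endswith(".part"))
        if matches:
            return matches
        time.sleep(poll_s)
    raise TimeoutError(f"no output for {stem!r} in {outbox} after {timeout_s}s")
```

Because both sides only read and write files, the two machines just need the same shared folders mounted, which is why the refinery and the transcoder can run on different operating systems.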

HTML Converter

Where: Inbound Refinery
When: Upon check-in
How: Uses Microsoft Office to convert Office documents into HTML
Platform: Windows Only

HTML Converter uses Microsoft Office to save documents as HTML, collects the output (into a zip file if there are multiple files), and returns it to Content Server. Because it uses Office's own HTML output, you get very good fidelity compared to the original native format. This is especially true for Excel and Visio, which are less text-based. The downside is that you have no control over the HTML output, either to make changes or to provide consistency between conversions; it’s simply formatted the way Office formats it. Also, it does not apply any templating around the content, such as inserting code before or after the content or presenting the document within the structure of a larger HTML page, as in the case of Site Studio.
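The "zip it if there are multiple files" step can be sketched as follows. This is a hypothetical illustration of the packaging behavior described above, not HTML Converter's actual code; the output directory layout (for example, an HTML page plus a folder of images) is an assumption.

```python
import zipfile
from pathlib import Path


def package_html_output(output_dir: Path, zip_path: Path) -> Path:
    """Return the single HTML file, or zip everything if several files were produced."""
    files = sorted(p for p in output_dir.rglob("*") if p.is_file())
    if not files:
        raise FileNotFoundError(f"no output in {output_dir}")
    if len(files) == 1:
        return files[0]
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            # Keep paths relative so supporting files (images, CSS) still resolve.
            zf.write(f, arcname=f.relative_to(output_dir).as_posix())
    return zip_path
```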

Dynamic Converter

Where: Content Server
When: Upon check-in or on-demand
How: Uses Oracle OutsideIn filters to convert native documents into HTML
Platform: All

Like HTML Converter, Dynamic Converter converts Office documents into HTML, but there are several key differences between the two. First, Dynamic Converter uses OutsideIn filters to convert to HTML, so it supports a wide range of native formats. Second, the processing happens on the Content Server side, not the Inbound Refinery. This allows the conversion to happen on demand the first time the HTML version is requested. Alternatively, DC can be configured to do the conversion upon check-in and cache the results, so they are immediately available and don’t need to go through conversion on the first request. DC also provides a wide range of controls over exactly how the HTML is formatted. The result can be minimal, clean HTML with div or span tags that allow styling with CSS, which leads to a more consistent look and feel between converted documents. It also allows code to be inserted before or after the content to embed the output within a template; this is what Site Studio uses.
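The difference between on-demand and check-in conversion comes down to when a cached result is produced. The class below is a minimal sketch of that caching idea plus before/after template insertion; the class, its methods, and the pluggable `convert` function are hypothetical and are not Dynamic Converter's API.

```python
from typing import Callable, Dict, Tuple


class RenditionCache:
    """Hypothetical sketch: on-demand vs. check-in HTML conversion with caching."""

    def __init__(self, convert: Callable[[bytes], str],
                 header: str = "", footer: str = ""):
        self._convert = convert   # Stand-in for the native-to-HTML conversion
        self._header = header     # Code inserted before the converted content
        self._footer = footer     # Code inserted after it
        self._cache: Dict[Tuple[str, int], str] = {}

    def _render(self, native: bytes) -> str:
        return self._header + self._convert(native) + self._footer

    def on_checkin(self, doc_id: str, revision: int, native: bytes) -> None:
        # "Convert upon check-in": pay the cost up front so the first request is fast.
        self._cache[(doc_id, revision)] = self._render(native)

    def get_html(self, doc_id: str, revision: int,
                 load_native: Callable[[], bytes]) -> str:
        # "On demand": convert the first time the HTML is requested, then reuse it.
        key = (doc_id, revision)
        if key not in self._cache:
            self._cache[key] = self._render(load_native())
        return self._cache[key]
```

Keying the cache on revision as well as document ID means a new check-in never serves a stale conversion.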

Thumbnail Creation

Where: Content Server or Inbound Refinery
When: Upon check-in
How: Uses Oracle OutsideIn filters to create a thumbnail representation of the document to be used on search results
Platform: All

As a new feature in PS5, thumbnails can now be generated directly in the Content Server and not require the document to be sent to the Inbound Refinery (if it doesn’t need other conversions).  This allows the document to become available much more quickly.  But if the file is sent to the Inbound Refinery for other types of conversions, the thumbnail can be generated at that point.
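The PS5 thumbnail behavior described above can be sketched as a routing decision at check-in. This is a simplified, hypothetical model: the conversion labels are illustrative, and it assumes the refinery always makes the thumbnail when the file goes there anyway, whereas the post only says it can.

```python
from typing import Dict, Set, Union


def plan_thumbnail(conversions: Set[str]) -> Dict[str, Union[bool, str]]:
    """Decide where a checked-in item's thumbnail is generated (simplified PS5 model)."""
    other = conversions - {"thumbnail"}
    if other:
        # The file is going to the Inbound Refinery anyway, so the
        # thumbnail is generated there alongside the other conversions.
        return {"send_to_refinery": True, "thumbnail_on": "Inbound Refinery"}
    # No other conversions: the Content Server makes the thumbnail itself,
    # skipping the refinery round trip so the item is available sooner.
    return {"send_to_refinery": False, "thumbnail_on": "Content Server"}
```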

For further information on conversions, see the documentation on Conversions as well as Dynamic Converter.

Monday Jan 14, 2013

Migrating folders and content together in WebCenter Content

When migrating from one WebCenter Content instance to another, there are several tools within the system to accomplish the migration, depending on what you need to move over.

This post focuses on the use case of moving a specific set of folders and their contents from one instance to another. The folder architecture in this example is Folders_g. Although Framework Folders is the recommended folders component for WebCenter Content 11g PS5 and later, there are still cases where you must use Folders_g (e.g. WebCenter Portal, Fusion Applications, Primavera, etc.). Or perhaps you are on an older version where Folders_g is the only option.

To prepare, you must first have the FoldersStructureArchive component enabled on both the source and target instances. If you are on UCM 10g, this component is available in the CS10gR35UpdateBundle/extras folder. In addition to enabling the component, there is a configuration flag to set. By default, the config variable ArchiveFolderStructureOnly is set to false, which means content will be exported along with the folders, so that can be left alone. The config variable AllowArchiveNoneFolderItem is set to true by default, which means the archive will export content in the selected folder structure as well as content that was not selected, or even content outside of folders. Basically, it means you must use the Export Criteria in the archive to control which content is exported. In our use case, we only want the content within the folders we select, so set AllowArchiveNoneFolderItem=false. Now only content in the selected folders will be exported into the archive. This can be set in the General Configuration in the Admin Server.
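If you script environment setup, the flag ends up as a plain name=value line. The helper below is a hypothetical sketch that sets such a line in config-file text; which file the Admin Server writes to is not covered in this post, so treat any file handling around it as an assumption and prefer the General Configuration page.

```python
import re


def set_config_var(cfg_text: str, name: str, value: str) -> str:
    """Set name=value in config-file text, replacing existing entries or appending one."""
    pattern = re.compile(rf"^{re.escape(name)}=.*$", re.MULTILINE)
    line = f"{name}={value}"
    if pattern.search(cfg_text):
        # Use a function replacement so backslashes in the value are kept literally.
        return pattern.sub(lambda _m: line, cfg_text)
    sep = "" if cfg_text == "" or cfg_text.endswith("\n") else "\n"
    return cfg_text + sep + line + "\n"
```

For example, `set_config_var(text, "AllowArchiveNoneFolderItem", "false")` flips the default described above while leaving ArchiveFolderStructureOnly alone.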

You will also need to make sure the custom metadata fields on both instances are identical. If they are mismatched, the folders will not import into the target instance correctly. You can use the Configuration Migration Utility to migrate those metadata fields.

Once the component is enabled and configurations set, go to Administration -> Admin Applets -> Archiver and select Edit -> Add... to create a new archive.  

New archive

Now that the archive is established, go back to the browser and go to Administration -> Folder Archiver Configuration. The Collection Name defaults to the local collection; change it if your archive is in a different collection. Then select your Archive Name from the list.

archive select

Expand the folder hierarchy and select the specific folder(s) you want to migrate. The thing to keep in mind is the parent folders of the ones you are selecting. If you want to migrate a certain section of the folder hierarchy to the other server and have it land in the same place in the target instance, make sure the parent folder already exists in the target. It is possible to migrate a folder and place it within a different parent folder in the target instance, but then you need to set the import maps correctly to specify the destination folder (more on that later).

Select folders

Once they are selected, click the Add button to save the configuration.  This will add the right criteria to the archive. Now go back to the Archiver applet.  Highlight the archive and select Actions -> Export.  Be sure 'Export Tables' is selected.  Note: If you try using the Preview on either the contents or the Table data, both will show everything and not just what you selected.  This is normal. The filtering of content and folders is not reflected in the Preview. Once completed, you can click on the View Batch Files... button to verify the results.  You should see an entry for the Collections_arTables and one or more for the content items.  

View batches

If you highlight the Collections row and click Edit, you can view and verify the results.

Verify collections table

You can do the same for the document entries as well.

Once you have exported the archive, you need to transfer it from the source to the target instance. If I don't have outgoing providers set up to do the transfer, I sometimes cheat and copy the archive folder from <cs instance dir>\archives\{archive name} directly to the other instance. Then I manually modify the collection.hda file on the target to let it know about the archive:

@ResultSet Archives
2
aArchiveName
aArchiveDescription
exportfoldersandfiles
Export some folders and files

@end
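Editing collection.hda by hand is easy to get wrong, so here is a hypothetical Python helper that appends a row to the Archives result set. It assumes the layout shown above (a field count, then the field names, then one value per line, closed by @end), touches nothing outside that block apart from normalizing line endings, and is not an Oracle tool; back up collection.hda before trying anything like it.

```python
def add_archive_row(hda_text: str, name: str, description: str) -> str:
    """Append an archive entry to the '@ResultSet Archives' block of collection.hda text."""
    lines = hda_text.splitlines()
    try:
        start = lines.index("@ResultSet Archives")
    except ValueError:
        raise ValueError("no '@ResultSet Archives' block found") from None
    field_count = int(lines[start + 1])
    # Field lines may carry type info after the name, so keep only the first token.
    fields = [f.split()[0] for f in lines[start + 2 : start + 2 + field_count]]
    if fields[:2] != ["aArchiveName", "aArchiveDescription"]:
        raise ValueError(f"unexpected Archives fields: {fields}")
    rows_start = start + 2 + field_count
    end = lines.index("@end", rows_start)
    values = lines[rows_start:end]
    # The example above has a blank line before @end; drop trailing blanks
    # only when they do not complete a row, so real empty values survive.
    while values and values[-1] == "" and len(values) % field_count:
        values.pop()
    if name in values[0::field_count]:
        return hda_text  # Archive already registered; nothing to do.
    new_row = [name, description] + [""] * (field_count - 2)
    insert_at = rows_start + len(values)
    lines[insert_at:insert_at] = new_row
    return "\n".join(lines) + "\n"
```

Running it twice with the same archive name is harmless, because an existing entry is detected and the text is returned unchanged.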

Or if I have Site Studio installed and my archive is fairly small, I'll take the approach described in this earlier post.

Before you import the archive on the target, you need to make sure the folders will go into the right parent folder. If you've already migrated the parent of your folders to the target instance, the IDs should match between instances and you should not need any import mappings. But if you are migrating folders whose parent IDs will be different on the target (such as the main Contribution Folders or WebCenter Spaces root folder), then you will have to map those values.

First, to find a folder's ID, hover your mouse over the link to that folder. The ID appears as dCollectionID in the URL. Do this on both the source and target instances.
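If you are collecting several IDs, reading them out of copied links is easy to automate. This is a minimal sketch that pulls the dCollectionID query parameter from a URL; the host, path, and IdcService value in the example link are illustrative assumptions, since the exact link shape may differ on your instance.

```python
from urllib.parse import parse_qs, urlparse


def collection_id_from_url(url: str) -> str:
    """Return the dCollectionID query parameter from a folder link."""
    values = parse_qs(urlparse(url).query).get("dCollectionID")
    if not values:
        raise ValueError(f"no dCollectionID in {url!r}")
    # Keep it as a string: these IDs are 18 digits long and are only
    # ever compared or copied, never used in arithmetic.
    return values[0]


# Hypothetical folder link copied from the source instance.
source_link = ("https://source.example.com/cs/idcplg?"
               "IdcService=COLLECTION_DISPLAY&dCollectionID=826127598928000002")
print(collection_id_from_url(source_link))  # 826127598928000002
```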

Get dCollectionID

In this example, the dCollectionID on the source instance for the parent folder (Contribution Folders) is 826127598928000002. On the target instance, its Contribution Folders ID is 838257920156000002. That means when the top-level 'Product Management' folder in our archive moves over, its parent ID needs to be mapped to the new value. Now we have all the information we need for the mapping.

Go to the Archiver on the target instance and highlight the archive. Click on the Import Maps tab and then on the Table tab. Double-click on the folder and then expand the date entry. It should then show the Collections table.

Import tables

Click on the Edit button for the Value Maps. For the Input Value, you want to enter the value of the dCollectionID of the parent folder from the source instance. In our example, this is 826127598928000002. For the Field, you want to change this to be the dParentCollectionID. And for the Output Value, you want this to be the dCollectionID of the parent folder in the target instance.  In our example, this is 838257920156000002.  Click the Add button.  
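Conceptually, the value map rewrites one field in the imported Collections rows whenever it holds the input value. The function below is a hypothetical model of that effect, not Archiver code; the two parent IDs come from this example, while the child folder's name and the dCollectionID values 1001 and 1002 are invented for illustration.

```python
from typing import Dict, List

Row = Dict[str, str]


def apply_value_map(rows: List[Row], field: str,
                    input_value: str, output_value: str) -> List[Row]:
    """Return copies of rows where `field` equal to input_value becomes output_value."""
    mapped = []
    for row in rows:
        new_row = dict(row)  # Leave the caller's rows untouched.
        if new_row.get(field) == input_value:
            new_row[field] = output_value
        mapped.append(new_row)
    return mapped


rows = [
    # Top-level folder: its parent is the source's Contribution Folders.
    {"dCollectionID": "1001", "dCollectionName": "Product Management",
     "dParentCollectionID": "826127598928000002"},
    # Hypothetical child folder: its parent is in the archive, so it is not remapped.
    {"dCollectionID": "1002", "dCollectionName": "Roadmaps",
     "dParentCollectionID": "1001"},
]
mapped = apply_value_map(rows, "dParentCollectionID",
                         "826127598928000002", "838257920156000002")
```

Only the top-level folder changes parent; folders nested under it keep the parent IDs carried in the archive.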

Value map

This will now map the folders into the correct location on target.

The archive is now ready to be imported.  Click on Actions -> Import and be sure the 'Import Tables' check-box is checked. To check for any issues, be sure to go to the logs at Administration -> Log Files -> Archiver Logs.

And that's it.  Your folders and files should now be migrated over.

Thursday Jan 10, 2013

Adding browser search engines in WebCenter Content

In a post I made a few years ago, I described how you can add WebCenter Content (UCM at the time) search to the browser's search engines.  I think this is a handy shortcut if you find yourself performing searches often enough in WCC. 

Well, in the PS5 release, this was actually included as a new feature.  You need to enable the DesktopIntegrationSuite component in order to access it.  Once you do, go to the My Content Server -> My Downloads link.  There you will see the 'Add browser search' link. 

Add Browser Search

Once clicked, an OpenSearchDescription XML file is produced, which modern browsers support for adding the search engine.

Browser Search Bar

The one piece that's missing is something I mentioned in my earlier post: forcing authentication. If you haven't logged into the server, your search will be performed anonymously and you will only get back content that is available to the guest role. To make sure the search is performed as your user, the extra parameter Auth=Internet can be passed to the server, causing it to challenge your request and force a login if needed. Because the search engine URL is defined within the DesktopIntegrationSuite component, a new custom component can be added to override it. Basically, the new component must override the dis_search_plugin resource and modify the Url locations. Below is an example:

<@dynamichtml dis_search_plugin@>
<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/"
                       xmlns:moz="http://www.mozilla.org/2006/browser/search/">
    <ShortName><$if DIS_SearchPluginTitle$><$DIS_SearchPluginTitle$><$else$>Oracle WebCenter Content Server Search<$endif$></ShortName>
    <Description><$lc("wwDISSearchPluginDescription")$></Description>
    <Url type="text/html" method="get" template="<$xml(HttpBrowserFullCgiPath & "?IdcService=DESKTOP_BROWSER_SEARCH&Auth=Internet&MiniSearchText={searchTerms}")$>" />
    <$iconlocation=strReplace(HttpBrowserFullCgiPath,HttpCgiPath,"") & HttpImagesRoot & "desktopintegrationsuite/dis_search_plugin.ico"$>
    <Image height="16" width="16" type="image/x-icon"><$iconlocation$></Image>
    <Developer>Oracle Corporation</Developer>
    <InputEncoding>UTF-8</InputEncoding>
    <moz:SearchForm><$xml(HttpBrowserFullCgiPath & "?IdcService=DESKTOP_BROWSER_SEARCH&Auth=Internet&MiniSearchText=")$></moz:SearchForm>
</OpenSearchDescription>
<$setContentType("application/xml")$>
<$setHttpHeader("Content-Disposition","inline; filename=search_plugin.xml")$>
<$setHttpHeader("Cache-Control", "public")$>
<@end@>
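When you search from the browser bar, the browser substitutes your query into the `{searchTerms}` placeholder of the Url template. The sketch below reproduces that substitution so you can see the request the server receives, with Auth=Internet included; the host and the /cs/idcplg path are hypothetical stand-ins for whatever HttpBrowserFullCgiPath resolves to on your server, and real browsers may encode spaces slightly differently.

```python
from urllib.parse import quote


def expand_search_template(template: str, terms: str) -> str:
    """Substitute URL-encoded search terms into an OpenSearch Url template."""
    return template.replace("{searchTerms}", quote(terms, safe=""))


# Hypothetical expansion of the template defined in dis_search_plugin above.
TEMPLATE = ("https://wcc.example.com/cs/idcplg?IdcService=DESKTOP_BROWSER_SEARCH"
            "&Auth=Internet&MiniSearchText={searchTerms}")
print(expand_search_template(TEMPLATE, "budget report"))
```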

I've included a pre-built custom component that does just that.

UPDATE (Jan 15, 2013)

In addition to enabling the component, you must also enable a configuration preference. After enabling the Desktop Integration Suite component, go to the 'advanced component manager'. At the bottom, in the 'Update Component Configuration' list, select DesktopIntegrationSuite and click Update. The first entry is 'Enable web browser search plug-in'. Check it and click Update.

DIS Configuration

If you've already restarted to enable the DIS component, you do not need to restart for this configuration to take effect.

About

Kyle Hatlestad is a Solution Architect in the WebCenter Architecture group (A-Team) who works with WebCenter Content and other products in the WebCenter & Fusion Middleware portfolios. The WebCenter A-Team blog can be found at: https://blogs.oracle.com/ateam_webcenter/
