and archiving

At the ODF Workshop last week, a number of the delegates were asking about the right way to handle archiving of their documents. Obviously ODF offers a baseline file format that promises long-term readability and editability, but the question remains of how best to handle files. With the release of OpenOffice.org 3.0, there are now two alternatives, and we heard at the conference of a third alternative coming in the future from ODF.

  1. ODF plus PDF

    Most of the archivists I have spoken to have insisted that one should always keep the original document in its original format, regardless of other choices. The easiest option for archiving is to retain the original file, with an optional copy filtered to ODF if the original is not in ODF, and then accompany the file with a PDF image. Technology exists to automatically create all this.
  2. PDF Container

    OpenOffice.org 3.0 includes extensive new PDF handling features, including PDF/A support, access to PDF's distribution and use controls and the ability to include the original ODF in a "container" inside a "hybrid PDF". This last feature offers a fine archiving alternative, where a single file is created but within it the original ODF is retained for future use.
  3. Read-Only ODF

    At the workshop, we heard from Jomar Silva on the future of ODF 1.2. One of the features he described was signed, read-only ODF, allowing the preservation of the document exactly as used (it's on slide 4).

Choosing which to use is obviously a decision for each archiving authority, but the richness of the new PDF support means that the options open to archivists just grew enormously.
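
As a rough illustration of how the first option ("ODF plus PDF") could be automated, here is a minimal Python sketch. It only builds the conversion commands. It assumes a LibreOffice-style `soffice` binary that accepts `--headless`, `--convert-to` and `--outdir`; OpenOffice.org 3.0 itself may not accept these flags. The function name `plan_archive` and the extension mapping are hypothetical, chosen for illustration.

```python
from pathlib import PurePosixPath

# ODF extensions: files already in ODF need no extra ODF copy.
ODF_EXTENSIONS = {".odt", ".ods", ".odp", ".odg"}

# Illustrative mapping from a few legacy formats to an ODF target (an assumption).
ODF_TARGETS = {".doc": "odt", ".xls": "ods", ".ppt": "odp"}


def plan_archive(original, outdir, soffice="soffice"):
    """Return the conversion commands for the 'ODF plus PDF' option.

    The original file is always kept untouched; the commands only add copies.
    """
    suffix = PurePosixPath(original).suffix.lower()
    commands = []
    # Optional ODF copy, only when the original is not already ODF.
    if suffix not in ODF_EXTENSIONS and suffix in ODF_TARGETS:
        commands.append([soffice, "--headless", "--convert-to",
                         ODF_TARGETS[suffix], "--outdir", outdir, original])
    # Accompanying PDF rendering of the original.
    commands.append([soffice, "--headless", "--convert-to", "pdf",
                     "--outdir", outdir, original])
    return commands
```

Each returned command could then be passed to `subprocess.run(cmd, check=True)`; keeping the planning step separate makes it easy to review what will be run before touching an archive.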


Or you could just use HTML.

Posted by kai Hendry on October 14, 2008 at 12:49 AM PDT #

@kai: As long as you want neither layout fidelity nor formatting metadata preserved that's fine...

Posted by Simon Phipps on October 14, 2008 at 01:12 AM PDT #

Don’t use image PDFs, which are no more useful than GIFs. You want tagged PDFs. Technically all PDF/As have to be tagged, but that doesn’t mean they’ll be tagged correctly.

Posted by Joe Clark on October 14, 2008 at 02:09 AM PDT #

I am curious about how fonts are to be preserved in an ODF document? It's my understanding that fonts are embedded in PDF documents and that OpenOffice developers have decided to not implement this because of real or imagined legal issues.

It seems to me that it would be important to preserve the original fonts used to create a document when the document is archived; otherwise document layout, etc., will likely be a mess to anyone retrieving the document.

Also I think there are cases where it is desirable to have the ability to modify archived documents.

If any of these issues were discussed please share the details with us.


Posted by RickJ on October 14, 2008 at 04:15 AM PDT #

History teaches us that, besides the fact that nobody learns anything from history, the best archival format is plain ASCII text.

Posted by icebox on October 14, 2008 at 09:42 PM PDT #

ASCII is all very cool for English language, near-unformatted documents which don't contain non-text objects. Unfortunately the world is not the English speaking world and people find graphics useful when trying to assimilate information.

Posted by Reece Hutchinson on October 14, 2008 at 10:29 PM PDT #

History teaches that anything saved in pure US-ASCII has no structure at all.

Posted by Joe Clark on October 16, 2008 at 05:18 AM PDT #

Are you sure that the PDF Options panel shown by Export as PDF in OpenOffice 3.0 contains a check box labeled "Create hybrid file", as in the screen capture linked from this blog post above? I have OpenOffice 3.0 for Mac OS X Intel (the new Aqua version), and I just spent time downloading OpenOffice 3.0 for Windows XP (because I was eager to try out the hybrid file option after reading your blog post). Neither the XP nor the OS X Intel Aqua 3.0 version provides the "Create hybrid file" check box in the PDF Options panel that results from selecting Export as PDF.

Why oh why oh why do software developers do this to "mere mortals" time and time and time again? They say "oh look at the pretty bird in the tree ... look at what this way cool software feature can do for you" and then when you take the chance by expending time downloading and trying it out, the feature either is missing or doesn't work. Sorry to be so harsh, but you raised my hopes up high for the hybrid file option and it is missing from both the Mac and Windows version (if it was only missing from the Mac version I could understand since most of the world still believes that Microsoft is the almighty omnipotent power be all end all that must get software development treatment first and foremost before the Mac). But the fact that "Create hybrid file" is missing from both 3.0 versions is highly disappointing after reading this blog post.

How will Sun Microsystems compensate people whose time was wasted (the ennui tax) by reading this blog post with innocuous information? I spent about an hour of my life on reading this blog post, and then further reading about PDF/A and about the 2nd international ODF conference and then testing to see if I could get the hybrid file approach to work. My life now has one less hour that I could have been doing something else such as spending time with my family or watching the World Series. Who is going to be held accountable for posting this misinformation? Lawyers wouldn't accept this nonsense because they charge by the hour!

Separately, what is the format of the hybrid file?

Posted by eddie on October 19, 2008 at 12:52 PM PDT #
