Shrink big presentations with ooshrink

I work in an environment where people use presentations a lot. Of course, we like to use StarOffice, which is based on OpenOffice, for all of our office needs.

Presentation files can be big. Very big. Never-send-through-email big. Especially when they come from marketing departments and contain lots of pretty pictures. I just tried to send a Sun Systems overview presentation (which I created myself, so less marketing fluff), and it was still over 22 MB!

So here comes the beauty of Open Source, and in this case, Open Formats. It turns out that OpenOffice and StarOffice documents are actually ZIP files that contain XML for the document itself, plus all the image files associated with it, in a simple directory structure. A few years ago I wrote a script that takes an OpenOffice document, unzips it, looks at all the images in the document's structure and optimizes their compression algorithm, size and other settings based on some simple rules. That script was very popular with my colleagues. It got lost for a while, and thanks to Andreas it was found again. Still, once in a while colleagues ask me about "that script, you know, that used to shrink those StarOffice presentations."
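
Because the container is an ordinary ZIP archive, any ZIP library can look inside it. Here is a minimal Python sketch of that idea. It is not part of ooshrink (which is a shell script), and the function name list_pictures is made up for illustration. The Pictures/ directory name is the one that shows up in the verbose output further down.

```python
import zipfile


def list_pictures(document_path):
    """Return (name, size_in_bytes) for every file under Pictures/ in the package."""
    with zipfile.ZipFile(document_path) as package:
        return [
            (info.filename, info.file_size)
            for info in package.infolist()
            # Skip directory entries; only real image files are interesting.
            if info.filename.startswith("Pictures/") and not info.is_dir()
        ]
```

Calling list_pictures("Presentation_Example.odp") would give one entry per embedded image, which is a quick way to spot the files that dominate a presentation's size.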

Today, I brushed it up a little and taught it to accept the newer od[ptdc] extensions, and it still works remarkably well. Here are some examples:

  • The Sun homepage has a small demo presentation with a few vacation photos. Let's see what happens:
    bash-3.00$ ls -al Presentation_Example.odp
    -rw-r--r--   1 constant sun       392382 Mar 10  2006 Presentation_Example.odp
    bash-3.00$ ooshrink -s Presentation_Example.odp
    bash-3.00$ ls -al Presentation_Example.*
    -rw-r--r--   1 constant sun       337383 Nov 27 11:36 Presentation_Example.new.odp
    -rw-r--r--   1 constant sun       392382 Mar 10  2006 Presentation_Example.odp

    Well, that was a 14% reduction in file size. Not earth-shattering, but we're getting there. BTW: The -s flag is for "silence"; we're just after results (for now).

  • On BigAdmin, I found a presentation with some M-Series config diagrams:

    bash-3.00$ ls -al Mseries.odp
    -rw-r--r-- 1 constant sun 1323337 Aug 23 17:23 Mseries.odp
    bash-3.00$ ooshrink -s Mseries.odp
    bash-3.00$ ls -al Mseries.*
    -rw-r--r-- 1 constant sun 379549 Nov 27 11:39 Mseries.new.odp
    -rw-r--r-- 1 constant sun 1323337 Aug 23 17:23 Mseries.odp

    Now we're getting somewhere: This is a reduction by 71%!

  • Now for a real-world example. My next victim is a presentation by Teera about JRuby. I just used Google to search for "site:sun.com presentation odp", so Teera is completely innocent. This time, let's take a look behind the scenes with the -v flag (verbose):
    bash-3.00$ ooshrink -v jruby_ruby112_presentation.odp
    Required tools "convert, identify" found.
    ooshrink 1.2
    Check out "ooshrink -h" for help information, warnings and disclaimers.

    Creating working directory jruby_ruby112_presentation.36316.work...
    Unpacking jruby_ruby112_presentation.odp...
    Optimizing Pictures/1000020100000307000000665F60F829.png.
    - This is a 775 pixels wide and 102 pixels high PNG file.
    - This image is transparent. Can't convert to JPEG.
    - We will try re-encoding this image with PNG compression level 9.
    - Failure: Old: 947, New: 39919. We better keep the original.
    Optimizing Pictures/100000000000005500000055DD878D9F.jpg.
    - This is a 85 pixels wide and 85 pixels high JPEG file.
    - We will try re-encoding this image with JPEG quality setting of 75%.
    - Failure: Old: 2054, New: 2089. We better keep the original.
    Optimizing Pictures/1000020100000419000003C07084C0EF.png.
    - This is a 1049 pixels wide and 960 pixels high PNG file.
    - This image is transparent. Can't convert to JPEG.
    - We will try re-encoding this image with PNG compression level 9.
    - Failure: Old: 99671, New: 539114. We better keep the original.
    Optimizing Pictures/10000201000001A00000025EFBC8CCCC.png.
    - This is a 416 pixels wide and 606 pixels high PNG file.
    - This image is transparent. Can't convert to JPEG.
    - We will try re-encoding this image with PNG compression level 9.
    - Failure: Old: 286677, New: 349860. We better keep the original.
    Optimizing Pictures/10000000000000FB000001A6E936A60F.jpg.
    - This is a 251 pixels wide and 422 pixels high JPEG file.
    - We will try re-encoding this image with JPEG quality setting of 75%.
    - Success: Old: 52200, New: 46599 (-11%). We'll use the new picture.
    Optimizing Pictures/100000000000055500000044C171E62B.gif.
    - This is a 1365 pixels wide and 68 pixels high GIF file.
    - This image is too large, we'll resize it to 1280x1024.
    - We will convert this image to PNG, which is probably more efficient.
    - Failure: Old: 2199, New: 39219. We better keep the original.
    Optimizing Pictures/100000000000019A000002D273F8C990.png.
    - This is a 410 pixels wide and 722 pixels high PNG file.
    - This picture has 50343 colors, so JPEG is a better choice.
    - Success: Old: 276207, New: 32428 (-89%). We'll use the new picture.
    Patching content.xml with new image file name.
    Patching styles.xml with new image file name.
    Patching manifest.xml with new image file name.
    Optimizing Pictures/1000000000000094000000E97E2C5D52.png.
    - This is a 148 pixels wide and 233 pixels high PNG file.
    - This picture has 4486 colors, so JPEG is a better choice.
    - Success: Old: 29880, New: 5642 (-82%). We'll use the new picture.
    Patching content.xml with new image file name.
    Patching styles.xml with new image file name.
    Patching manifest.xml with new image file name.
    Optimizing Pictures/10000201000003E3000003E4CFFA65E3.png.
    - This is a 995 pixels wide and 996 pixels high PNG file.
    - This image is transparent. Can't convert to JPEG.
    - We will try re-encoding this image with PNG compression level 9.
    - Failure: Old: 196597, New: 624633. We better keep the original.
    Optimizing Pictures/100002010000013C0000021EDE4EFBD7.png.
    - This is a 316 pixels wide and 542 pixels high PNG file.
    - This image is transparent. Can't convert to JPEG.
    - We will try re-encoding this image with PNG compression level 9.
    - Failure: Old: 159495, New: 224216. We better keep the original.
    Optimizing Pictures/10000200000002120000014A19C2D0EB.gif.
    - This is a 530 pixels wide and 330 pixels high GIF file.
    - This image is transparent. Can't convert to JPEG.
    - We will convert this image to PNG, which is probably more efficient.
    - Failure: Old: 39821, New: 56736. We better keep the original.
    Optimizing Pictures/100000000000020D0000025EB55F72E3.png.
    - This is a 525 pixels wide and 606 pixels high PNG file.
    - This picture has 17123 colors, so JPEG is a better choice.
    - Success: Old: 146544, New: 16210 (-89%). We'll use the new picture.
    Patching content.xml with new image file name.
    Patching styles.xml with new image file name.
    Patching manifest.xml with new image file name.
    Optimizing Pictures/10000000000000200000002000309F1C.png.
    - This is a 32 pixels wide and 32 pixels high PNG file.
    - This picture has 256 colors, so JPEG is a better choice.
    - Success: Old: 859, New: 289 (-67%). We'll use the new picture.
    Patching content.xml with new image file name.
    Patching styles.xml with new image file name.
    Patching manifest.xml with new image file name.
    Optimizing Pictures/10000201000001BB0000006B7305D02E.png.
    - This is a 443 pixels wide and 107 pixels high PNG file.
    - This image is transparent. Can't convert to JPEG.
    - We will try re-encoding this image with PNG compression level 9.
    - Failure: Old: 730, New: 24071. We better keep the original.
    All images optimized.
    Re-packing...
    Success: The new file is only 67% as big as the original!
    Cleaning up...
    Done.

    Neat. We just shaved a third off of a 1.3MB presentation file and it still looks as good as the original!

    As you can see, the script goes through each image one by one and tries to come up with better ways of encoding images. The basic rules are:

    • If an image is PNG or GIF and it has more than 128 colors, it's probably better to convert it to JPEG (if it doesn't use transparency). The script also tries recompressing GIFs and other legacy formats as PNGs if JPEG is not an option.
    • Images bigger than 1280x1024 don't make a lot of sense in a presentation, so they're resized to be at most that size.
    • JPEG lets you set a quality level. 75% is "good enough" for presentation purposes, so we'll try that and see how much it buys us.
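
To make these rules concrete, here is a rough Python sketch of the decisions involved. It is an illustration only, not the script's actual code. All names are hypothetical. Treating every non-JPEG format the same way, preserving the aspect ratio when resizing, and keeping the original on a size tie are assumptions on my part. The real image work would still be done by ImageMagick.

```python
MAX_WIDTH, MAX_HEIGHT = 1280, 1024  # size limit from the rules above
COLOR_THRESHOLD = 128               # more colors than this suggests a photo
JPEG_QUALITY = 75                   # "good enough" for slides


def choose_format(current_format, color_count, transparent):
    """Pick the format to try for one image."""
    if current_format.upper() == "JPEG":
        return "JPEG"  # stays JPEG, re-encoded at JPEG_QUALITY
    # Assumption: PNG, GIF and other legacy formats are all handled alike.
    if color_count > COLOR_THRESHOLD and not transparent:
        return "JPEG"
    return "PNG"  # JPEG is not an option, so try PNG recompression


def fit_within(width, height, max_width=MAX_WIDTH, max_height=MAX_HEIGHT):
    """Shrink (width, height) to fit the limit; never enlarge.

    Assumption: the aspect ratio is preserved.
    """
    scale = min(1.0, max_width / width, max_height / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def keep_new_file(old_size, new_size):
    """Only a strictly smaller result replaces the original (tie: keep original)."""
    return new_size < old_size
```

The last function mirrors the Success/Failure lines in the verbose output: a re-encoded image is only used when it actually turns out smaller than the one it replaces.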
    The hard part is patching the XML files with the new image names. They don't contain any newlines, which makes basic Unix scripting tools hiccup, so the script uses a more conservative approach to patching. But it works.
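
For comparison, one way to sidestep the long-line problem is to treat the whole XML file as a single string. The Python sketch below shows that idea. It is not how the script does it, and the function name patch_image_name is made up for illustration.

```python
def patch_image_name(xml_text, old_name, new_name):
    """Replace every reference to old_name; return the new text and the hit count."""
    hits = xml_text.count(old_name)
    return xml_text.replace(old_name, new_name), hits
```

Because nothing here works line by line, the missing newlines don't matter. The hit count is handy as a sanity check: if it is zero for an image that was converted, something about how the document references that image is different from what was expected.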

 

Before I give you the script, here's the obvious
Disclaimer: Use this script at your own risk. Always check the shrunk presentation for any errors that the script may have introduced. It only works 9 out of 10 times (sometimes there's some funkiness going on in how OpenOffice uses images that I still don't understand...), so you have to check that it didn't damage your file.

The script works with Solaris (of course), but it should also work on Linux or any other Unix just fine. It relies on ImageMagick to do the image heavy lifting, so make sure you have identify(1) and convert(1) in your path.
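
If you want to check for the two tools up front, here is a small Python sketch of such a check. The function name missing_tools is hypothetical, and this is not the check the script itself performs.

```python
import shutil


def missing_tools(required=("convert", "identify")):
    """Return the names of required commands that are not found on the PATH."""
    return [tool for tool in required if shutil.which(tool) is None]
```

An empty list means both ImageMagick commands were found.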

My 22 MB Systems Overview presentation was successfully shrunk to 13 MB, so I'm happy to report that after so many years, this little script is still very useful. I hope it helps you too. Let me know how you use it and what shrink ratios you have experienced!

Comments:

[Trackback] Constantin wrote a neat tool: ooshrink. This tool shrinks presentation by doing some tricks on the pictures of the presentation. It doesn't help me much for my own presentations, as i use Keynote for them, but it's fscking useful for sending company p...

Posted by c0t0d0s0.org on November 27, 2007 at 05:10 AM CET #

Hello,

interesting tool. A couple fixes:

line 111: patch_lines=`wc -l "${1}" | sed -e 's,^[ ]*\([0-9][0-9]*\)[ ].*$,\1,'`
(gnu wc does not output spaces at the beginning of each line)

line 121: replace -ne by !=

By the way, I was surprised to see that with convert you often got an image much larger than the original. Using convert myfile png8:newfile.png would be more efficient (but then you may want to check what transparency features are needed).

Posted by Marc on November 27, 2007 at 10:56 AM CET #

Hi Marc, thank you for your comment and the fixes. I'll have to look into the -ne vs != issue, probably a difference between the GNU and Solaris test commands.

Yes, convert is not always optimal in the way it compresses files, but then the script leaves them alone if that happens. But OTOH, ImageMagick is a great command-line tool for scripting. I was thinking of implementing the script from scratch in Java with Java Advanced Imaging, but that's probably for a future pet project :). The script detects the use of transparency in GIF and PNG images and then decides to leave them alone, just to be sure, so png8 wouldn't be good to add.

Cheers,
Constantin

Posted by Constantin Gonzalez on November 27, 2007 at 02:24 PM CET #

Hello,

for the -ne issue, I was using the bash builtin, not the coreutils command, don't know if that changes much.

I don't really understand your conclusion saying that png8 would not be good to add. Converting to png8 compresses better than converting to png, the only issue being that it cannot handle all cases of transparency. So when the script detects that there is no transparency, it could use png8 and benefit from the better compression.

As for reimplementing, it could be an interesting project for a student, but the script is already nice as it is.

Posted by Marc on November 28, 2007 at 06:30 AM CET #

Hi Marc,

sorry, I was unclear. I think PNG8 is a good idea. I just think the decision process should be:

- Transparency?
Yes: Convert to regular PNG if not already, or leave GIF if better (in many cases, GIF is already better)
No: # of colours?
>256: JPEG
<=256: PNG8

I haven't built in PNG8 yet for the sub-256/no-transparency case and I'll do that because it indeed is a good idea.

I'm hesitant to use PNG8 for >256 colours although there may be a threshold where the tradeoff is ok (such as 1024 colors). I'll try out some cases and see what works best, maybe add a switch to set that threshold with a conservative default.

Posted by Constantin Gonzalez on November 28, 2007 at 07:44 AM CET #

This is a nice idea. I see this problem often. It happens with emails as well. Someone pastes in an image, maybe a screen shot of an error message, but they take it at full resolution and 24-bit color depth just to scale it to a much smaller size for display. But the full version is stored in the document.

This same general technique could also apply to Writer documents as well, right?

Maybe also two modes -- display and print. A presentation targeted for eventual hardcopy output probably requires higher quality and resolution images to reflect the difference in DPI between a screen and a printer.

Posted by Rob Weir on November 28, 2007 at 09:48 AM CET #

Hi Rob,
thanks for your comment. Yes, I've thought of splitting up the script into an Open/StarOffice specific part (that does the unpacking and scanning and XML-patching) and a separate image optimizer script that applies the conversion and the image codec heuristics as well as sizing constraints. That would enable the use of "imageshrink file.png" and you'll get a size-optimized version for email etc.

Erwin hinted at an upcoming extension to StarOffice that would allow optimization for different needs: http://blogs.sun.com/dancer/entry/shrinking_odf_files

Maybe that's a good point in time to split up my efforts and concentrate on the image optimization, making the Open/StarOffice part optional.

BTW: ooshrink today also works on Writer and Calc documents. And you can easily patch it to accept any other Open/StarOffice document in case I forgot an extension (such as templates).

Cheers,
Constantin

Posted by Constantin Gonzalez on November 28, 2007 at 11:35 AM CET #

I did something similar a while ago with a program called OptimOD. I think a few more ideas can be taken from that as it took the approach of throwing hours of computer time at brute force compression.

Posted by ed on November 29, 2007 at 03:22 PM CET #

This is going places. Congratulations for the tool/work.

Posted by Tiago Silva on November 29, 2007 at 05:52 PM CET #

Hi Ed, Tiago,

thank you for your comments. Based on the feedback I got, I'm going to implement a few enhancements to the script. Stay tuned!

Cheers,
Constantin

Posted by Constantin Gonzalez on November 30, 2007 at 04:04 AM CET #

Hi Constantin,

thanks for the good work! This script has proven very useful to me: One of my colleagues asked me to shrink a seminar presentation on Northern Ireland (with loads of pictures of course). It was 116MB and is now roughly 9MB!

I am looking forward to the improvements ;-)

BTW: Is there an easy way to run it under Windows (without Cygwin!)?
Greets,
Philipp

Posted by Philipp Decker on December 10, 2007 at 05:12 AM CET #

Hi Philipp,
congratulations, this is the biggest shrink I've ever seen with ooshrink so far!

If you're interested, there's now an official OpenOffice.org extension that has a presentation minimization wizard available at: http://extensions.services.openoffice.org/project/PresentationMinimizer

This is obviously more comfortable than ooshrink (and less scriptable, hehe) and it should work very well with Windows (without Cygwin) too.

Cheers,
Constantin

Posted by Constantin on December 10, 2007 at 07:50 AM CET #
