Tuesday Nov 27, 2007

Shrink big presentations with ooshrink

I work in an environment where people use presentations a lot. Of course, we like to use StarOffice, which is based on OpenOffice, for all of our office needs.

Presentation files can be big. Very big. Never-send-through-email big. Especially when they come from marketing departments and contain lots of pretty pictures. I just tried to send a Sun Systems overview presentation (which I created myself, so less marketing fluff), and it was still over 22 MB!

So here comes the beauty of Open Source, and in this case, Open Formats. It turns out that OpenOffice and StarOffice documents are actually ZIP files that contain XML for the actual document, plus all the image files associated with it, in a simple directory structure. A few years ago I wrote a script that takes an OpenOffice document, unzips it, looks at all the images in the document's structure and optimizes their compression algorithm, size and other settings based on some simple rules. That script was very popular with my colleagues. It got lost for a while, and thanks to Andreas it was found again. Colleagues still ask me once in a while about "that script, you know, that used to shrink those StarOffice presentations."
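To see that structure for yourself, here is a minimal Python sketch (not part of ooshrink, whose source isn't shown in this post) that opens a document as a ZIP archive and lists the images with their sizes. It assumes the images live under Pictures/, as in the verbose output further down. The names list_pictures and build_demo_package are made up for this example, and the demo package is only a stand-in, not a valid presentation.

```python
import io
import zipfile


def list_pictures(odf_file):
    """Return (name, uncompressed size in bytes) for each entry under Pictures/."""
    with zipfile.ZipFile(odf_file) as package:
        return [
            (info.filename, info.file_size)
            for info in package.infolist()
            if info.filename.startswith("Pictures/") and not info.is_dir()
        ]


def build_demo_package():
    """Build a tiny stand-in package in memory (hypothetical contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("content.xml", "<office:document-content/>")
        package.writestr("Pictures/demo.png", b"\x89PNG" + b"\0" * 96)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Pass a real file name such as "Mseries.odp" instead of the demo package.
    for name, size in list_pictures(build_demo_package()):
        print(f"{size:>8}  {name}")
```

Since zipfile accepts either a path or a file-like object, the same function works on a real .odp file on disk.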

Today, I brushed it up a little and taught it to accept the newer od[ptdc] extensions, and it still works remarkably well. Here are some examples:

  • The Sun homepage has a small demo presentation with a few vacation photos. Let's see what happens:
    bash-3.00$ ls -al Presentation_Example.odp
    -rw-r--r--   1 constant sun       392382 Mar 10  2006 Presentation_Example.odp
    bash-3.00$ ooshrink -s Presentation_Example.odp
    bash-3.00$ ls -al Presentation_Example.*
    -rw-r--r--   1 constant sun       337383 Nov 27 11:36 Presentation_Example.new.odp
    -rw-r--r--   1 constant sun       392382 Mar 10  2006 Presentation_Example.odp

    Well, that was a reduction of about 14% in file size. Not earth-shattering, but we're getting there. BTW: The -s flag is for "silence"; we're just after results (for now).

  • On BigAdmin, I found a presentation with some M-Series config diagrams:

    bash-3.00$ ls -al Mseries.odp
    -rw-r--r-- 1 constant sun 1323337 Aug 23 17:23 Mseries.odp
    bash-3.00$ ooshrink -s Mseries.odp
    bash-3.00$ ls -al Mseries.*
    -rw-r--r-- 1 constant sun 379549 Nov 27 11:39 Mseries.new.odp
    -rw-r--r-- 1 constant sun 1323337 Aug 23 17:23 Mseries.odp

    Now we're getting somewhere: This is a reduction of 71%!

  • Now for a real-world example. My next victim is a presentation by Teera about JRuby. I just used Google to search for "site:sun.com presentation odp", so Teera is completely innocent. This time, let's take a look behind the scenes with the -v flag (verbose):
    bash-3.00$ ooshrink -v jruby_ruby112_presentation.odp
    Required tools "convert, identify" found.
    ooshrink 1.2
    Check out "ooshrink -h" for help information, warnings and disclaimers.

    Creating working directory jruby_ruby112_presentation.36316.work...
    Unpacking jruby_ruby112_presentation.odp...
    Optimizing Pictures/1000020100000307000000665F60F829.png.
    - This is a 775 pixels wide and 102 pixels high PNG file.
    - This image is transparent. Can't convert to JPEG.
    - We will try re-encoding this image with PNG compression level 9.
    - Failure: Old: 947, New: 39919. We better keep the original.
    Optimizing Pictures/100000000000005500000055DD878D9F.jpg.
    - This is a 85 pixels wide and 85 pixels high JPEG file.
    - We will try re-encoding this image with JPEG quality setting of 75%.
    - Failure: Old: 2054, New: 2089. We better keep the original.
    Optimizing Pictures/1000020100000419000003C07084C0EF.png.
    - This is a 1049 pixels wide and 960 pixels high PNG file.
    - This image is transparent. Can't convert to JPEG.
    - We will try re-encoding this image with PNG compression level 9.
    - Failure: Old: 99671, New: 539114. We better keep the original.
    Optimizing Pictures/10000201000001A00000025EFBC8CCCC.png.
    - This is a 416 pixels wide and 606 pixels high PNG file.
    - This image is transparent. Can't convert to JPEG.
    - We will try re-encoding this image with PNG compression level 9.
    - Failure: Old: 286677, New: 349860. We better keep the original.
    Optimizing Pictures/10000000000000FB000001A6E936A60F.jpg.
    - This is a 251 pixels wide and 422 pixels high JPEG file.
    - We will try re-encoding this image with JPEG quality setting of 75%.
    - Success: Old: 52200, New: 46599 (-11%). We'll use the new picture.
    Optimizing Pictures/100000000000055500000044C171E62B.gif.
    - This is a 1365 pixels wide and 68 pixels high GIF file.
    - This image is too large, we'll resize it to 1280x1024.
    - We will convert this image to PNG, which is probably more efficient.
    - Failure: Old: 2199, New: 39219. We better keep the original.
    Optimizing Pictures/100000000000019A000002D273F8C990.png.
    - This is a 410 pixels wide and 722 pixels high PNG file.
    - This picture has 50343 colors, so JPEG is a better choice.
    - Success: Old: 276207, New: 32428 (-89%). We'll use the new picture.
    Patching content.xml with new image file name.
    Patching styles.xml with new image file name.
    Patching manifest.xml with new image file name.
    Optimizing Pictures/1000000000000094000000E97E2C5D52.png.
    - This is a 148 pixels wide and 233 pixels high PNG file.
    - This picture has 4486 colors, so JPEG is a better choice.
    - Success: Old: 29880, New: 5642 (-82%). We'll use the new picture.
    Patching content.xml with new image file name.
    Patching styles.xml with new image file name.
    Patching manifest.xml with new image file name.
    Optimizing Pictures/10000201000003E3000003E4CFFA65E3.png.
    - This is a 995 pixels wide and 996 pixels high PNG file.
    - This image is transparent. Can't convert to JPEG.
    - We will try re-encoding this image with PNG compression level 9.
    - Failure: Old: 196597, New: 624633. We better keep the original.
    Optimizing Pictures/100002010000013C0000021EDE4EFBD7.png.
    - This is a 316 pixels wide and 542 pixels high PNG file.
    - This image is transparent. Can't convert to JPEG.
    - We will try re-encoding this image with PNG compression level 9.
    - Failure: Old: 159495, New: 224216. We better keep the original.
    Optimizing Pictures/10000200000002120000014A19C2D0EB.gif.
    - This is a 530 pixels wide and 330 pixels high GIF file.
    - This image is transparent. Can't convert to JPEG.
    - We will convert this image to PNG, which is probably more efficient.
    - Failure: Old: 39821, New: 56736. We better keep the original.
    Optimizing Pictures/100000000000020D0000025EB55F72E3.png.
    - This is a 525 pixels wide and 606 pixels high PNG file.
    - This picture has 17123 colors, so JPEG is a better choice.
    - Success: Old: 146544, New: 16210 (-89%). We'll use the new picture.
    Patching content.xml with new image file name.
    Patching styles.xml with new image file name.
    Patching manifest.xml with new image file name.
    Optimizing Pictures/10000000000000200000002000309F1C.png.
    - This is a 32 pixels wide and 32 pixels high PNG file.
    - This picture has 256 colors, so JPEG is a better choice.
    - Success: Old: 859, New: 289 (-67%). We'll use the new picture.
    Patching content.xml with new image file name.
    Patching styles.xml with new image file name.
    Patching manifest.xml with new image file name.
    Optimizing Pictures/10000201000001BB0000006B7305D02E.png.
    - This is a 443 pixels wide and 107 pixels high PNG file.
    - This image is transparent. Can't convert to JPEG.
    - We will try re-encoding this image with PNG compression level 9.
    - Failure: Old: 730, New: 24071. We better keep the original.
    All images optimized.
    Re-packing...
    Success: The new file is only 67% as big as the original!
    Cleaning up...
    Done.

    Neat. We just shaved a third off of a 1.3MB presentation file and it still looks as good as the original!

    As you can see, the script goes through each image one by one and tries to come up with better ways of encoding images. The basic rules are:

    • If an image is PNG or GIF and it has more than 128 colors, it's probably better to convert it to JPEG (if it doesn't use transparency). It also tries recompressing GIFs and other legacy formats as PNGs if JPEG is not an option.
    • Images bigger than 1280x1024 don't make a lot of sense in a presentation, so they're resized to be at most that size.
    • JPEG lets you set a quality level. 75% is "good enough" for presentation purposes, so we'll try that and see how much it buys us.
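The rules above, together with the Success/Failure lines in the verbose output, can be sketched roughly like this in Python. This is my reading of the post, not the script's actual code: the step names and function names are invented, the fallback for PNGs that can't become JPEGs is inferred from the log, and the truncating integer arithmetic in shrink_percent is a guess that happens to reproduce the percentages shown there.

```python
MAX_WIDTH, MAX_HEIGHT = 1280, 1024  # size limit named in the rules
COLOR_LIMIT = 128                   # above this, JPEG is probably the better choice
JPEG_QUALITY = 75                   # "good enough" for presentations


def plan_steps(fmt, width, height, colors, transparent):
    """Return the (hypothetical) steps to try for one image, in order."""
    steps = []
    if width > MAX_WIDTH or height > MAX_HEIGHT:
        steps.append("resize")
    if fmt == "JPEG":
        steps.append("reencode-jpeg")  # re-encode at JPEG_QUALITY
    elif fmt in ("PNG", "GIF") and colors > COLOR_LIMIT and not transparent:
        steps.append("convert-to-jpeg")
    elif fmt == "PNG":
        # Assumption from the log: such PNGs are retried at compression level 9.
        steps.append("reencode-png")
    else:
        steps.append("convert-to-png")  # GIFs and other legacy formats
    return steps


def shrink_percent(old_size, new_size):
    """Percentage saved, using truncating integer arithmetic (an assumption)."""
    return 100 - (new_size * 100 // old_size)


def keep_new(old_size, new_size):
    """Use the re-encoded image only if it is actually smaller."""
    return new_size < old_size


if __name__ == "__main__":
    # Values taken from the verbose output: a 410x722 PNG with 50343 colors.
    print(plan_steps("PNG", 410, 722, 50343, False))
    print(keep_new(276207, 32428), shrink_percent(276207, 32428))
```

The important design point is the last function: every candidate is compared against the original, so a "better" format that turns out bigger is simply thrown away.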

    The hard part is patching the XML files with the new image names. They don't contain any newlines, so basic Unix scripting tools may hiccup; the script therefore uses a more conservative approach to patching, but it works.
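For comparison, here is how the patching step might look in Python, where a file without newlines is no problem because the whole thing is handled as one string. This is only a sketch and not the approach the script takes (the post doesn't show it). The function name patch_references and the sample XML fragment are made up for illustration.

```python
def patch_references(xml_text, old_name, new_name):
    """Replace every occurrence of old_name with new_name.

    Returns the patched text and the number of replacements, so the
    caller can notice when a file did not reference the image at all.
    """
    count = xml_text.count(old_name)
    return xml_text.replace(old_name, new_name), count


if __name__ == "__main__":
    # Hypothetical one-line fragment standing in for content.xml.
    content = (
        '<draw:frame><draw:image xlink:href="Pictures/demo.png"/></draw:frame>'
        '<draw:frame><draw:image xlink:href="Pictures/demo.png"/></draw:frame>'
    )
    patched, hits = patch_references(content, "Pictures/demo.png", "Pictures/demo.jpg")
    print(hits, patched)
```

In a real run the same replacement would be applied to content.xml, styles.xml and manifest.xml, the three files named in the verbose output.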

 

Before I give you the script, here's the obvious
Disclaimer: Use this script at your own risk. Always check the shrunk presentation for any errors that the script may have introduced. It only works 9 out of 10 times (sometimes there's some funkiness going on in how OpenOffice uses images that I still don't understand...), so you have to check that it didn't damage your file.

The script works with Solaris (of course), but it should also work on any Linux or other Unix just fine. It relies on ImageMagick to do the image heavy lifting, so make sure you have identify(1) and convert(1) in your path.
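If you want to check your path before running anything, a few lines of Python will do. This is just a sketch of the kind of check behind the 'Required tools "convert, identify" found.' line in the verbose output, not the script's own code. The function name missing_tools is made up.

```python
import shutil

REQUIRED_TOOLS = ("convert", "identify")  # the two ImageMagick commands named above


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing required tools:", ", ".join(missing))
    else:
        print("Required tools found.")
```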

My 22 MB Systems Overview presentation was successfully shrunk into a 13 MB one, so I'm happy to report that after so many years, this little script is still very useful. I hope it helps you too. Let me know how you use it and what shrink ratios you have experienced!
