Sunday, September 2, 2012

PDF to Image Conversion in Java

By: Geertjan Wielenga | Product Manager

In the past, I created a NetBeans plugin for loading images as slides into NetBeans IDE. That meant you had to manually create an image from each slide first. This time, I took it a step further: you choose a PDF file, which is automatically converted to one image per page, and each page is presented as a node that can be clicked to open the slide in the main window.

The remaining problem is font rendering. Currently I'm using PDFBox. Are there any alternatives that render fonts better?

This is the createKeys method of the child factory. Ideally, it would be replaced by code from some other library that handles font rendering better:

@Override
protected boolean createKeys(List<ImageObject> list) {
    mylist = new ArrayList<ImageObject>();
    try {
        if (file != null) {
            ProgressHandle handle = ProgressHandleFactory.createHandle(
                    "Creating images from " + file.getPath());
            handle.start();
            try {
                PDDocument document = PDDocument.load(file);
                try {
                    List<PDPage> pages = document.getDocumentCatalog().getAllPages();
                    for (int i = 0; i < pages.size(); i++) {
                        PDPage pDPage = pages.get(i);
                        // Render the page to a BufferedImage and keep its page index
                        mylist.add(new ImageObject(pDPage.convertToImage(), i));
                    }
                } finally {
                    // Release the document even if rendering a page fails
                    document.close();
                }
            } finally {
                // Always end the progress display, including on errors
                handle.finish();
            }
        }
        list.addAll(mylist);
    } catch (IOException ex) {
        Exceptions.printStackTrace(ex);
    }
    return true;
}

The import statements from PDFBox are as follows:

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
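
Since convertToImage() returns a BufferedImage, one way to compare font rendering between libraries is to write each library's page images to disk and look at them side by side. The following is only a sketch of that idea using the standard javax.imageio API. The class name PageImageDump, the method writePages, and the file naming scheme are hypothetical and not part of the plugin or of PDFBox.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of the plugin: dumps rendered pages as PNG files.
class PageImageDump {

    /**
     * Writes each page image to "<prefix>-<pageNumber>.png" in the given
     * directory and returns the written files in page order.
     */
    static List<File> writePages(List<BufferedImage> pages, File dir, String prefix)
            throws IOException {
        List<File> written = new ArrayList<File>();
        for (int i = 0; i < pages.size(); i++) {
            File out = new File(dir, prefix + "-" + (i + 1) + ".png");
            // ImageIO.write returns false when no suitable writer is found
            if (!ImageIO.write(pages.get(i), "png", out)) {
                throw new IOException("No PNG writer available for " + out);
            }
            written.add(out);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Blank stand-in for images produced by a renderer such as convertToImage()
        List<BufferedImage> pages = new ArrayList<BufferedImage>();
        pages.add(new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB));

        File dir = Files.createTempDirectory("pages").toFile();
        for (File f : writePages(pages, dir, "pdfbox")) {
            System.out.println(f.getAbsolutePath());
        }
    }
}
```

Using a different prefix per library (for example "pdfbox" and "jpedal") keeps the outputs for the same page next to each other in a directory listing.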

Comments ( 7 )
  • kvaso Sunday, September 2, 2012

    I am using Pdf-renderer (http://java.net/projects/pdf-renderer/) and I am quite satisfied.


  • mark stephens Monday, September 3, 2012

    You could download the LGPL version of JPedal from http://www.jpedal.org/open_source_pdf_viewer_download.php and generate the images using this code:

    /** instance of PdfDecoder to convert PDF into image */
    PdfDecoder decode_pdf = new PdfDecoder(true);

    /** set mappings for non-embedded fonts to use */
    FontMappings.setFontReplacements();

    /** open the PDF file - can also be a URL or a byte array */
    try {
        decode_pdf.openPdfFile("C:/myPDF.pdf"); //file
        //decode_pdf.openPdfFile("C:/myPDF.pdf", "password"); //encrypted file
        //decode_pdf.openPdfArray(bytes); //bytes is byte[] array with PDF
        //decode_pdf.openPdfFileFromURL("http://www.mysite.com/myPDF.pdf",false);

        /** get each page as an image; page numbers start at 1 */
        int start = 1, end = decode_pdf.getPageCount();
        for (int i = start; i <= end; i++) {
            // braces are required: a declaration cannot be the bare body of a for loop
            BufferedImage img = decode_pdf.getPageAsImage(i);
            // use img here, for example add it to a list or write it to disk
        }

        /** close the pdf file */
        decode_pdf.closePdfFile();
    } catch (PdfException e) {
        e.printStackTrace();
    }

    The PDF file format has a lot of exceptions (Adobe tries to make sure as many files as possible open in Acrobat even if they do not meet the spec). So there are lots of 'gotchas' in the PDF world (like all TrueType fonts are Mac encoded unless they are not!). We write up the more interesting cases on our blog at http://www.jpedal.org/PDFblog

    We also built the open source library into our PDF viewer plugin for NetBeans. Come to our talk at JavaOne, where we will be talking about it (and NetBeans/JavaFX)!


  • guest Tuesday, July 16, 2013

    I have recently started to use Aspose products (http://www.aspose.com/java/pdf-component.aspx) because they are becoming popular among developers. They also provide Java code for various conversions. Last week, for example, they published code for converting a specific page, or all pages, of a PDF to PNG, which was very useful for developers like me. Below is the code:

    Convert particular PDF page to PNG Image

    //open document
    com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document("input.pdf");
    // create stream object to save the output image
    java.io.OutputStream imageStream = new java.io.FileOutputStream("Converted_Image.png");
    //create Resolution object
    com.aspose.pdf.Resolution resolution = new com.aspose.pdf.Resolution(300);
    //create PngDevice object with particular resolution
    com.aspose.pdf.PngDevice pngDevice = new com.aspose.pdf.PngDevice(resolution);
    //convert a particular page and save the image to stream
    pngDevice.process(pdfDocument.getPages().get_Item(1), imageStream);
    //close the stream
    imageStream.close();

    Convert all PDF pages to PNG Images

    //open document
    com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document("input.pdf");
    // loop through all the pages of PDF file
    for (int pageCount = 1; pageCount <= pdfDocument.getPages().size(); pageCount++)
    {
        // create stream object to save the output image
        java.io.OutputStream imageStream = new java.io.FileOutputStream("Converted_Image" + pageCount + ".png");
        //create Resolution object
        com.aspose.pdf.Resolution resolution = new com.aspose.pdf.Resolution(300);
        //create PngDevice object with particular resolution
        com.aspose.pdf.PngDevice pngDevice = new com.aspose.pdf.PngDevice(resolution);
        //convert a particular page and save the image to stream
        pngDevice.process(pdfDocument.getPages().get_Item(pageCount), imageStream);
        //close the stream
        imageStream.close();
    }


  • Leila Holmann Tuesday, August 13, 2013

    Hi Geertjan

    Have you looked at our Java library products?

    http://www.qoppa.com/javapdf/

    These are commercial products. We have jPDFImages that can convert PDF pages to images. We also have jPDFViewer which can directly take and render a PDF document.

    We have advanced support for fonts, including font substitution, but if you run into any specific font-related issue, contact us at support@qoppa.com.

    Leila


  • Tilman Saturday, June 14, 2014

    The font problem is still in the 1.8.x versions of PDFBox, but it is solved in the unreleased 2.0 version. The API is slightly different, but it is easy to find out by looking at the examples (PDFToImage) or at the test cases.


  • JonyGreen Wednesday, September 30, 2015

    I found a free online PDF-to-image converter (http://www.online-code.net/pdf-to-image.html). You can use it to convert PDF to JPG online for free.


  • DC Monday, March 27, 2017

    I render the pages temporarily, loading the PDF from my assets path:

    /** renders every page of the PDF and adds the images to the Image-List */
    private void createImage() throws Exception {
        File songfile = Paths.get(App.getInstance().getAssetPath("/songs/" + song.getId() + "/pdf.pdf")).toFile();
        // File songfile = Paths.get(App.getInstance().getAssetPath("/songs/" + song.getId() + ".pdf")).toFile();
        List<Image> mylist = new ArrayList<Image>();
        // try-with-resources closes the document even if something unexpected is thrown
        try (PDDocument document = PDDocument.load(songfile)) {
            PDFRenderer pdfRenderer = new PDFRenderer(document);
            for (int page = 0; page < document.getNumberOfPages(); ++page) {
                try {
                    // render at 300 DPI and convert the BufferedImage to a JavaFX Image
                    mylist.add(SwingFXUtils.toFXImage(pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB), null));
                } catch (IOException e) {
                    App.getInstance().showAlert("Could not Render Image.");
                    Logger.getLogger(App.class.getName()).log(Level.SEVERE, "Could not Render Image.", e);
                }
            }
        }
        notePages.addAll(mylist);
    }

    _____________________

    Or save all files permanently and load them:

    /** adds all images from path to Image-ArrayList and showNotes from Index 0 */
    notePages = new ArrayList<Image>();
    for (int i = 0; i < song.getNotePageChangeTimes().size() + 1; i++) {
        String imgPath = App.getInstance().getAssetPath("/songs/" + song.getId() + "/notes-" + Integer.toString(i) + ".png");
        // the Image constructor reads the whole stream, so it can be closed right after
        try (InputStream is = new FileInputStream(new File(imgPath))) {
            Image img = new Image(is);
            notePages.add(img);
        }
    }
    showNotes(0);
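
A note on the 300 DPI setting used in the renderImageWithDPI(page, 300, ImageType.RGB) call and in Resolution(300) above: PDF page sizes are measured in points, with 72 points per inch by default, so the pixel size of a rendered page grows with dpi / 72. The sketch below works through that arithmetic for an assumed A4 page (595 x 842 points) and assumes 4 bytes per pixel, as in an int-backed RGB BufferedImage. The class RenderSize and its methods are hypothetical, and the exact rounding is renderer-specific, so treat the numbers as estimates.

```java
// Hypothetical helper: estimates the size of a rendered PDF page image.
class RenderSize {

    /** Default PDF user space: 72 units (points) per inch. */
    static final double POINTS_PER_INCH = 72.0;

    /** Converts a length in points to pixels at the given resolution. */
    static int pixels(double points, int dpi) {
        return (int) Math.round(points * dpi / POINTS_PER_INCH);
    }

    /** Approximate bytes of pixel data, assuming 4 bytes per pixel. */
    static long approxBytes(double widthPt, double heightPt, int dpi) {
        return 4L * pixels(widthPt, dpi) * pixels(heightPt, dpi);
    }

    public static void main(String[] args) {
        // Assumed page size: A4, roughly 595 x 842 points
        double widthPt = 595, heightPt = 842;
        int dpi = 300;
        System.out.println(pixels(widthPt, dpi) + " x " + pixels(heightPt, dpi) + " pixels");
        System.out.println(approxBytes(widthPt, heightPt, dpi) / (1024 * 1024) + " MiB per page");
    }
}
```

Under these assumptions, an A4 page at 300 DPI comes to about 2479 x 3508 pixels, or roughly 33 MiB of pixel data. Keeping every page of a long document in a list at that resolution can therefore use a lot of memory, so a lower DPI may be enough for on-screen slides.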

