Today, while parsing PDF files, I ran into a requirement: extract the pictures embedded in a file and save them. I used PDFBox, the popular open-source Apache library, but still hit a pitfall: the code only works with a matching PDFBox version, and a version that is too new or too old will not work. The library does not handle compatibility between versions well, and some methods were simply dropped in later releases. I have not had time to study the differences between versions, so here is a solution that works with one specific version.
PDFBox version: 1.8.13
Maven dependency:
<!-- working with PDF files -->
<!-- https://mvnrepository.com/artifact/org.apache.pdfbox/pdfbox -->
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>1.8.13</version>
</dependency>
Code:
package com.tbd.util;

import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.Map;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;

/**
 * Extract images from PDF
 * PDFBox version 1.8.13
 * @author Charlie Wu
 * 2018/05/24
 */
public class Pdf2Image {

    /**
     * Extract every image in the PDF and save it.
     * @param file the PDF file
     * @param targetFolder picture storage directory
     * @return true on success, false if an I/O error occurred
     */
    public static boolean extractImages(File file, String targetFolder) {
        PDDocument document = null;
        try {
            document = PDDocument.load(file);
            // getAllPages() returns a raw List in PDFBox 1.8
            @SuppressWarnings("unchecked")
            List<PDPage> pages = document.getDocumentCatalog().getAllPages();
            int count = 0;
            for (PDPage page : pages) {
                PDResources resources = page.getResources();
                if (resources == null) {
                    continue; // page has no resources
                }
                Map<String, PDXObjectImage> images = resources.getImages();
                if (images == null) {
                    continue; // page has no images
                }
                for (PDXObjectImage image : images.values()) {
                    count++;
                    String name = file.getName() + "_" + count; // picture file name
                    image.write2file(targetFolder + name);      // save picture; PDFBox appends the suffix
                }
            }
            return true;
        } catch (IOException ex) {
            ex.printStackTrace();
            return false;
        } finally {
            // always release the document, even if extraction failed
            if (document != null) {
                try {
                    document.close();
                } catch (IOException ignored) {
                    // nothing useful to do if closing fails
                }
            }
        }
    }

    public static void main(String[] args) {
        File file = new File("F:/test.pdf");
        String targetFolder = "F:/target/";
        extractImages(file, targetFolder);
    }
}
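One fragile spot in the code above is the output path: it is built by plain string concatenation, so it only works if the target folder already exists and ends with a slash. Below is a minimal sketch of a helper that builds the base name more defensively, using only the JDK. The class name ImageOutputPath and the method baseName are hypothetical names made up for this illustration, and stripping the ".pdf" extension is my own choice here (the original code keeps it in the picture name).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper, not part of PDFBox: builds the base name for an extracted picture.
public class ImageOutputPath {

    /**
     * Returns "<targetFolder>/<pdf name without .pdf>_<count>" and creates
     * the target folder if it does not exist yet. No image suffix is added,
     * because write2file appends one itself.
     */
    public static String baseName(String targetFolder, String pdfFileName, int count)
            throws IOException {
        Path dir = Paths.get(targetFolder);
        Files.createDirectories(dir); // no-op if the folder is already there

        // Assumption: drop a trailing ".pdf" so the picture is not named "test.pdf_1"
        String stem = pdfFileName;
        if (stem.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
            stem = stem.substring(0, stem.length() - 4);
        }

        // resolve() inserts the path separator, so a missing trailing slash is harmless
        return dir.resolve(stem + "_" + count).toString();
    }
}
```

In the extraction loop this could be used as image.write2file(ImageOutputPath.baseName(targetFolder, file.getName(), count)), which should behave the same whether targetFolder is "F:/target/" or "F:/target".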
Java uses PDFBox to extract pictures from PDF files