PDFBox's built-in support for converting PDFs to HTML is limited. Pdfdom (pdf2dom) is built on top of PDFBox and improves the quality of the HTML conversion.
Maven
<dependency>
    <groupId>net.sf.cssbox</groupId>
    <artifactId>pdf2dom</artifactId>
    <version>1.6</version>
</dependency>
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.4</version>
</dependency>
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox-tools</artifactId>
    <version>2.0.4</version>
</dependency>
Usage
// Requires org.apache.pdfbox.pdmodel.PDDocument and org.fit.pdfdom.PDFDomTree
public void generateHTMLFromPDF(String filename) throws IOException, ParserConfigurationException {
    PDDocument pdf = PDDocument.load(new File(filename));
    Writer output = new PrintWriter("pdf.html", "utf-8");
    new PDFDomTree().writeText(pdf, output); // render the PDF as HTML
    output.close();
    pdf.close();
}
Or
public void convertPdf2Html(File input, Writer out) throws IOException, ParserConfigurationException {
    PDDocument pdf = PDDocument.load(input);
    PDFDomTree tree = new PDFDomTree();
    tree.writeText(pdf, out); // the caller owns and closes the writer
    pdf.close();
}
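The first method always writes to pdf.html, and the second leaves the output location to the caller. As a minimal sketch, assuming you want each HTML file written next to its source PDF, the helper below derives the output path and manages the writer. The class name PdfHtmlPaths and the Converter interface are hypothetical and not part of Pdfdom or PDFBox. The sketch uses only the Java standard library, so the actual Pdfdom call is passed in by the caller.

```java
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class PdfHtmlPaths {

    /** Hypothetical hook: plug in a Pdfdom-based conversion here. */
    interface Converter {
        void convert(Path pdf, Writer out) throws Exception;
    }

    /** Maps "report.pdf" to "report.html" in the same directory. */
    static Path htmlPathFor(Path pdf) {
        String name = pdf.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // Keep the whole name when there is no extension (or only a leading dot).
        String base = (dot > 0) ? name.substring(0, dot) : name;
        return pdf.resolveSibling(base + ".html");
    }

    /** Converts one file; the UTF-8 writer is closed even if conversion fails. */
    static Path convertToSibling(Path pdf, Converter converter) throws Exception {
        Path html = htmlPathFor(pdf);
        try (Writer out = Files.newBufferedWriter(html, StandardCharsets.UTF_8)) {
            converter.convert(pdf, out);
        }
        return html;
    }
}
```

With the second method above in scope, a call could look like PdfHtmlPaths.convertToSibling(Paths.get("in.pdf"), (pdf, out) -> convertPdf2Html(pdf.toFile(), out)). The try-with-resources block means the output file is flushed and closed without an explicit close() call.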
Reposted from: here
Using Pdfdom to convert PDFs to HTML