I. Introduction
Apache PDFBox is an open source Java library for working with PDF documents. It can be used to create new PDF documents, modify existing ones, and extract content from them. Apache PDFBox also includes a number of command-line tools.
At the time of writing, the latest release of Apache PDFBox is 1.8.2.
II. Features
Apache PDFBox has the following main features:
1) Text extraction: Extracts text from a PDF document.
2) Merge and split: You can combine multiple PDF documents into a single file, or you can split a single PDF into multiple PDF documents.
3) Form filling: You can extract data from a PDF form, or fill in a PDF form.
4) PDF/A validation: Verifies that a PDF document conforms to the PDF/A ISO standard.
5) PDF printing: Sends a PDF document to a printer using Java's printing API.
6) PDF conversion: You can convert a PDF document into an image file.
7) PDF creation: You can create a new PDF document from scratch.
8) Lucene integration: PDF documents can be indexed with the Lucene search engine.
III. Download addresses
1) pdfbox-1.8.2.jar
Address: http://archive.apache.org/dist/pdfbox/1.8.2/pdfbox-1.8.2.jar
Description: The core library; it covers general PDF operations.
2) pdfbox-app-1.8.2.jar
Address: http://archive.apache.org/dist/pdfbox/1.8.2/pdfbox-app-1.8.2.jar
Description: The standalone PDFBox application, which bundles the command-line tools.
3) fontbox-1.8.2.jar
Address: http://archive.apache.org/dist/pdfbox/1.8.2/fontbox-1.8.2.jar
Description: The font library used by PDFBox.
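The three download addresses share one layout: the archive directory, then the version, then the artifact name with the version appended. The sketch below rebuilds them from that pattern using only the JDK. The class and method names (PdfBoxDownloads, archiveUrl) are hypothetical, and whether other releases follow the same layout is an assumption that should be checked against the archive before relying on it.

```java
// Sketch: rebuild the download addresses from their shared pattern.
public class PdfBoxDownloads {
    // Base directory taken from the addresses in the download list.
    public static final String BASE = "http://archive.apache.org/dist/pdfbox/";

    // Hypothetical helper: <BASE><version>/<artifact>-<version>.jar
    public static String archiveUrl(String artifact, String version) {
        return BASE + version + "/" + artifact + "-" + version + ".jar";
    }

    public static void main(String[] args) {
        String version = "1.8.2";
        // The three artifacts named in the download list.
        for (String artifact : new String[] {"pdfbox", "pdfbox-app", "fontbox"}) {
            System.out.println(archiveUrl(artifact, version));
        }
    }
}
```

Running main prints one address per artifact for version 1.8.2.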
package com.fk.extract.pdf;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

public class ExtractTextFromPdf {

    // Parses the given PDF file and prints its text content to standard output.
    public void readPdf(String filename) {
        File file = new File(filename);
        FileInputStream in = null;
        PDDocument pdDocument = null;
        try {
            in = new FileInputStream(file);
            // Parse the raw stream into a document model.
            PDFParser parser = new PDFParser(in);
            parser.parse();
            pdDocument = parser.getPDDocument();
            // Extract the text of all pages.
            PDFTextStripper stripper = new PDFTextStripper();
            String result = stripper.getText(pdDocument);
            System.out.println("Content of PDF file " + file.getAbsolutePath() + ":");
            System.out.println(result);
        } catch (IOException e) {
            // Report the failure instead of silently swallowing it.
            e.printStackTrace();
        } finally {
            // Release the document and the stream even if parsing failed.
            try {
                if (pdDocument != null) {
                    pdDocument.close();
                }
                if (in != null) {
                    in.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) {
        new ExtractTextFromPdf().readPdf("src/match.pdf");
    }
}
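The parser expects its input to be a PDF, so a quick look at the file header before extracting text can give a clearer message than a parse error. The following is a minimal sketch using only the JDK, not part of PDFBox. The class and method names (PdfHeaderCheck, looksLikePdf) are hypothetical, and testing only the first five bytes for "%PDF-" is a simplifying assumption: it is a strict check that may reject files with leading junk that some readers tolerate.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class PdfHeaderCheck {

    // Hypothetical helper: returns true if the file begins with the bytes "%PDF-".
    public static boolean looksLikePdf(String filename) throws IOException {
        byte[] expected = {'%', 'P', 'D', 'F', '-'};
        InputStream in = new FileInputStream(filename);
        try {
            for (byte b : expected) {
                // read() returns -1 at end of file, which never matches.
                if (in.read() != b) {
                    return false;
                }
            }
            return true;
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the sample path used in the extraction example.
        String filename = args.length > 0 ? args[0] : "src/match.pdf";
        System.out.println(filename + " looks like a PDF: " + looksLikePdf(filename));
    }
}
```

A caller could run this check first and only hand the file to the text extraction code when it returns true.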