Sometimes we need to process PDF files: extracting text, merging documents, and so on. We used to rely on the free A-PDF Text Extractor tool, but why not write one yourself?
For this we can use PDFBox-0.7.3, an open source class library. After downloading the package, add references to:
PDFBox-0.7.3.dll
IKVM.GNU.Classpath.dll
Create a new project. The code is simple:
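// Requires: using org.pdfbox.pdmodel; using org.pdfbox.util;
public static string ParseToTxtStringUsingPdfBox(string filename)
{
    PDDocument doc = PDDocument.load(filename);
    PDFTextStripper stripper = new PDFTextStripper();
    string text = stripper.getText(doc);
    doc.close(); // release the underlying file
    return text;
}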
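When only part of a document is needed, the stripper can be limited to a page range. The following is a minimal sketch, assuming the PDFBox 0.7.3 PDFTextStripper exposes setStartPage and setEndPage (1-based, inclusive) as in its Java API. The class name PdfPageRange and both method names are made up for illustration.

```csharp
using System;
using org.pdfbox.pdmodel;   // PDDocument (from PDFBox-0.7.3.dll)
using org.pdfbox.util;      // PDFTextStripper

public static class PdfPageRange   // hypothetical helper class
{
    // Reject ranges that cannot be valid before touching the PDF.
    public static void ValidatePageRange(int startPage, int endPage)
    {
        if (startPage < 1)
            throw new ArgumentOutOfRangeException("startPage", "Pages are numbered from 1");
        if (endPage < startPage)
            throw new ArgumentOutOfRangeException("endPage", "End page must not precede start page");
    }

    // Extract text from pages startPage..endPage (assumed 1-based, inclusive).
    public static string ParsePagesUsingPdfBox(string filename, int startPage, int endPage)
    {
        ValidatePageRange(startPage, endPage);
        PDDocument doc = PDDocument.load(filename);
        try
        {
            PDFTextStripper stripper = new PDFTextStripper();
            stripper.setStartPage(startPage);
            stripper.setEndPage(endPage);
            return stripper.getText(doc);
        }
        finally
        {
            doc.close();   // always release the file, even if extraction fails
        }
    }
}
```

For example, ParsePagesUsingPdfBox("report.pdf", 1, 2) would return the text of the first two pages only, which is handy for large files where just a cover page or summary is wanted.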
Take the resulting text string and write it to a file on disk, like this:
public static void WriteToTextFile(string str, string txtPath)
{
    if (string.IsNullOrEmpty(txtPath))
        throw new ArgumentNullException("txtPath", "Output file path should not be null");
    // The using block flushes and closes the writer, so no explicit Close() is needed.
    using (var txtWriter = new StreamWriter(txtPath))
    {
        txtWriter.Write(str);
    }
}
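To see how the two steps fit together, here is a small console program sketched under a few assumptions: the class name PdfToTextDemo and the GetOutputPath helper are invented for this example, and the project is assumed to reference the PDFBox and IKVM assemblies listed earlier.

```csharp
using System;
using System.IO;
using org.pdfbox.pdmodel;   // PDDocument
using org.pdfbox.util;      // PDFTextStripper

public static class PdfToTextDemo   // hypothetical class name
{
    // Derive the output name from the input, e.g. "report.pdf" -> "report.txt".
    public static string GetOutputPath(string pdfPath)
    {
        return Path.ChangeExtension(pdfPath, ".txt");
    }

    // Step 1: extract all text from the PDF.
    public static string ParseToTxtStringUsingPdfBox(string filename)
    {
        PDDocument doc = PDDocument.load(filename);
        try
        {
            return new PDFTextStripper().getText(doc);
        }
        finally
        {
            doc.close();
        }
    }

    // Step 2: write the extracted text to disk.
    public static void WriteToTextFile(string str, string txtPath)
    {
        if (string.IsNullOrEmpty(txtPath))
            throw new ArgumentNullException("txtPath", "Output file path should not be null");
        using (var txtWriter = new StreamWriter(txtPath))
        {
            txtWriter.Write(str);
        }
    }

    public static void Main(string[] args)
    {
        if (args.Length < 1)
        {
            Console.Error.WriteLine("Usage: PdfToTextDemo <input.pdf>");
            return;
        }
        string pdfPath = args[0];
        string txtPath = GetOutputPath(pdfPath);
        WriteToTextFile(ParseToTxtStringUsingPdfBox(pdfPath), txtPath);
        Console.WriteLine("Wrote " + txtPath);
    }
}
```

Running the program with a PDF path as its only argument should leave a .txt file with the same base name next to the input.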
You can explore the other features yourself. The library currently supports:
PDF to Text extraction
Merge PDF Documents
PDF Document encryption/decryption
Lucene Search Engine Integration
Fill in form data from FDF and XFDF
Create a PDF from a text file
Create images from PDF pages
Print a PDF