Sometimes we need to process PDF files: extract the text, merge documents, and so on. We used to rely on the free A-PDF Text Extractor tool, but why not write one yourself?
The open source PDFBox-0.7.3 class library makes this possible. Download the package and add references to the following assemblies:
PDFBox-0.7.3.dll
IKVM.GNU.Classpath.dll
Once the project is created, the extraction code is simple:
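// Requires: using org.pdfbox.pdmodel; using org.pdfbox.util;
public static string ParseToTxtStringUsingPdfBox(string filename)
{
    // Load the PDF, extract all of its text, then release the document.
    PDDocument doc = PDDocument.load(filename);
    PDFTextStripper stripper = new PDFTextStripper();
    string text = stripper.getText(doc);
    doc.close();
    return text;
}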
Once you have the text string, you can write it to a file on disk, as shown in the following code:
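// Requires: using System; using System.IO;
public static void WriteToTextFile(string str, string txtpath)
{
    // Reject a missing or empty output path before opening the file.
    if (string.IsNullOrEmpty(txtpath))
        throw new ArgumentNullException("txtpath", "Output file path should not be null");
    // The using block flushes and closes the writer, so no explicit Close() is needed.
    using (var txtWriter = new StreamWriter(txtpath))
    {
        txtWriter.Write(str);
    }
}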
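Text extracted from a PDF often contains non-ASCII characters, so it can be worth naming the output encoding explicitly and checking that the file reads back unchanged. The following is a minimal sketch of that idea, not part of PDFBox: the class name TextFileDemo and the RoundTrip helper are hypothetical, and the sample string passed in stands in for whatever the extraction function returns, so the code runs without the PDFBox assemblies.

```csharp
using System;
using System.IO;
using System.Text;

public static class TextFileDemo
{
    // Writes text to a file as UTF-8 so that non-ASCII characters survive.
    public static void WriteToTextFile(string text, string txtPath)
    {
        if (string.IsNullOrEmpty(txtPath))
            throw new ArgumentNullException("txtPath", "Output file path should not be null or empty");

        // 'false' overwrites an existing file instead of appending to it.
        using (var writer = new StreamWriter(txtPath, false, Encoding.UTF8))
        {
            writer.Write(text);
        }
    }

    // Hypothetical helper: writes the text to a temporary file,
    // reads it back, and returns what was read.
    public static string RoundTrip(string text)
    {
        string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".txt");
        try
        {
            WriteToTextFile(text, path);
            return File.ReadAllText(path, Encoding.UTF8);
        }
        finally
        {
            // Clean up the temporary file whether or not the write succeeded.
            File.Delete(path);
        }
    }
}
```

Calling TextFileDemo.RoundTrip with a string that mixes accented and CJK characters should return the same string. In a real program, the argument would be the text returned by the extraction function shown earlier.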
You can explore the other features on your own. The class library currently supports:
PDF to text extraction
Merge PDF Documents
PDF Document Encryption/Decryption
Lucene Search Engine Integration
Fill in form data FDF and XFDF
Create a PDF from a text file
Create images from PDF pages
Print a PDF