There are two main class libraries used to read PDF text in .NET: PDFBox and iTextSharp.
Let's start with PDFBox. This class library is said to be very powerful; here I will only give a brief introduction:
1. Download PDFBox
Download: http://sourceforge.net/projects/pdfbox/
2. Reference the DLLs
Unpack the downloaded PDFBox archive and find the bin directory. These are the DLL files you need to reference in the project:
IKVM.GNU.Classpath.dll
PDFBox-0.7.3.dll
FontBox-0.1.0-dev.dll
IKVM.Runtime.dll
After referencing the above four files in the project, import the following two namespaces in the source file:
using org.pdfbox.pdmodel;
using org.pdfbox.util;
3. API usage; see the code:
- using System.IO;
- using System.Text;
- using org.pdfbox.pdmodel;
- using org.pdfbox.util;
- // Extracts the text of a PDF file and writes it to a text file.
- public void pdf2txt(FileInfo file, FileInfo txtfile)
- {
-     PDDocument doc = PDDocument.load(file.FullName);
-     PDFTextStripper pdfStripper = new PDFTextStripper();
-     string text = pdfStripper.getText(doc);
-     doc.close(); // release the document once the text has been extracted
-     // Overwrite the target file, encoding the text as GB2312.
-     StreamWriter swPdfChange = new StreamWriter(txtfile.FullName, false, Encoding.GetEncoding("gb2312"));
-     swPdfChange.Write(text);
-     swPdfChange.Close();
- }
iTextSharp is mostly used to generate PDF files, but its ability to read PDF files is not bad either. It is used as follows:
1. Download iTextSharp
Download: http://sourceforge.net/projects/itextsharp/
2. Reference the DLL
Unpack itextsharp-dll-core.zip from the downloaded package to obtain itextsharp.dll, then add a reference to itextsharp.dll in the project.
Import the following three namespaces in the source file:
using iTextSharp;
using iTextSharp.text;
using iTextSharp.text.pdf;
3. API usage; see the code:
- // Requires: using System; using System.IO; in addition to the iTextSharp namespaces.
- private string OnCreated(string filepath)
- {
-     try
-     {
-         string pdffilename = filepath;
-         PdfReader pdfReader = new PdfReader(pdffilename);
-         int numberOfPages = pdfReader.NumberOfPages;
-         string text = string.Empty;
-         // Page numbers are 1-based.
-         for (int i = 1; i <= numberOfPages; ++i)
-         {
-             // GetPageContent returns the raw bytes of the page's content stream.
-             byte[] bufferOfPageContent = pdfReader.GetPageContent(i);
-             text += System.Text.Encoding.UTF8.GetString(bufferOfPageContent);
-         }
-         pdfReader.Close();
-         return text;
-     }
-     catch (Exception ex)
-     {
-         // Append the failure to mylog.log in the application base directory.
-         StreamWriter wlog = File.AppendText(System.AppDomain.CurrentDomain.SetupInformation.ApplicationBase + "\\mylog.log");
-         wlog.WriteLine("error file:" + filepath + " cause:" + ex.ToString());
-         wlog.Flush();
-         wlog.Close();
-         return null;
-     }
- }
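Note that GetPageContent returns the raw content stream of each page, so the string produced above can contain PDF drawing operators mixed in with the text. As a minimal sketch of an alternative, assuming an iTextSharp 5.x build that includes the iTextSharp.text.pdf.parser namespace, PdfTextExtractor can be asked for the text of each page directly. The class name PdfTextSketch and the method name ExtractText are hypothetical and only serve the illustration:

```csharp
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper class for illustration; not part of iTextSharp.
public static class PdfTextSketch
{
    // Returns the text of every page, one page after another.
    public static string ExtractText(string pdfPath)
    {
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            StringBuilder sb = new StringBuilder();
            // Page numbers are 1-based in iTextSharp.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // Assumes the default extraction strategy is good enough
                // for the document at hand.
                sb.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page));
            }
            return sb.ToString();
        }
        finally
        {
            // Close the reader even if extraction throws.
            reader.Close();
        }
    }
}
```

It would be called as string text = PdfTextSketch.ExtractText(filepath); the quality of the result still depends on how the PDF encodes its fonts and text.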