[Repost] Reading PDF text in .NET

Source: Internet
Author: User

This article is reposted from: http://blog.csdn.net/wangqiuyun/article/details/8548779

Two main libraries are used to read PDF text in .NET: PDFBox and iTextSharp.

First, PDFBox. This class library is said to be very powerful; I give only a brief introduction here:

1. Download PDFBox

Download address: http://sourceforge.net/projects/pdfbox/

2. Reference the dynamic link libraries

Extract the downloaded PDFBox package and locate its bin directory. Add references to the following four DLL files in your project: IKVM.GNU.Classpath.dll, PDFBox-0.7.3.dll, FontBox-0.1.0-dev.dll, and IKVM.Runtime.dll. Then import the following two namespaces in the source file: using org.pdfbox.pdmodel; using org.pdfbox.util;

3. Use the API, as the following code shows:

    using System.IO;
    using System.Text;
    using org.pdfbox.pdmodel;
    using org.pdfbox.util;

    // Extract all text from a PDF file and write it to a text file.
    public void Pdf2Txt(FileInfo file, FileInfo txtFile)
    {
        PDDocument doc = PDDocument.load(file.FullName);
        PDFTextStripper pdfStripper = new PDFTextStripper();
        string text = pdfStripper.getText(doc);
        doc.close(); // release the loaded document
        // Write the extracted text using the GB2312 encoding.
        StreamWriter swPdfChange = new StreamWriter(txtFile.FullName, false, Encoding.GetEncoding("gb2312"));
        swPdfChange.Write(text);
        swPdfChange.Close();
    }

iTextSharp is mostly used to generate PDFs, but it can also read them reasonably well. Use it as follows:

1. Download iTextSharp

Download address: http://sourceforge.net/projects/itextsharp/

2. Reference the dynamic link library

Unzip itextsharp-dll-core.zip from the downloaded package to get itextsharp.dll, and add a reference to itextsharp.dll in the project. Then import the following three namespaces in the source file: using iTextSharp; using iTextSharp.text; using iTextSharp.text.pdf;

3. Use the API, as the following code shows:

    using System;
    using System.IO;
    using iTextSharp.text.pdf;

    // Read the content of every page of a PDF; returns null on failure.
    private string OnCreated(string filepath)
    {
        try
        {
            string pdfFileName = filepath;
            PdfReader pdfReader = new PdfReader(pdfFileName);
            int numberOfPages = pdfReader.NumberOfPages;
            string text = string.Empty;
            // Page numbers start at 1.
            for (int i = 1; i <= numberOfPages; ++i)
            {
                // GetPageContent returns the page's raw content stream as bytes.
                byte[] bufferOfPageContent = pdfReader.GetPageContent(i);
                text += System.Text.Encoding.UTF8.GetString(bufferOfPageContent);
            }
            pdfReader.Close();
            return text;
        }
        catch (Exception ex)
        {
            // Append the error to MyLog.log in the application's base directory.
            StreamWriter wlog = File.AppendText(AppDomain.CurrentDomain.SetupInformation.ApplicationBase + "\\MyLog.log");
            wlog.WriteLine("Error file: " + filepath + " Cause: " + ex.ToString());
            wlog.Flush();
            wlog.Close();
            return null;
        }
    }
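
Note that GetPageContent returns a page's raw content stream (drawing operators together with encoded strings), so decoding those bytes as UTF-8 does not always give clean, readable text. As a possible alternative, the sketch below uses PdfTextExtractor. It assumes an iTextSharp 5.x release that ships the iTextSharp.text.pdf.parser namespace; the class name PdfTextSketch and the method name ExtractText are hypothetical and do not come from the original article.

```csharp
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper class for illustration only.
public static class PdfTextSketch
{
    // Returns the text of all pages, one page after another.
    public static string ExtractText(string pdfPath)
    {
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            StringBuilder sb = new StringBuilder();
            // Page numbers start at 1.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // Uses the library's default text extraction strategy.
                sb.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page));
            }
            return sb.ToString();
        }
        finally
        {
            // Close the reader even if extraction throws.
            reader.Close();
        }
    }
}
```

A call such as PdfTextSketch.ExtractText(filepath) could then stand in for the loop over GetPageContent, with the same error logging kept around it.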
