A summary of methods for extracting text from PDF files in the .NET environment

Source: Internet
Author: User

1. PDFBox (IKVM build): as far as I know, the IKVM-compiled build of PDFBox is the only one of these options that extracts text from PDF files reasonably well. For more information about PDFBox, visit http://www.pdfbox.org; see also the CodeProject article at http://www.codeproject.com/csharp/4102text.asp
2. Acrobat SDK: the Adobe Acrobat SDK can be used, but it is not cheap.
3. XPDF: if conditions permit, consider XPDF's pdftotext. XPDF is a PDF parsing library written in C that provides several tools along with its source code (if you are familiar with both C and .NET, you may be able to compile it for use in the .NET environment). It is released under the GNU GPL, however, so commercial applications require a paid license. More information: http://www.foolabs.com/xpdf
4. Ghostscript: another option to consider. The official website is www.cs.wisc.edu/~ghost/; for the text extraction method, search Google for "ps2txt".
5. Other related resources:
http://www.mj10777.de/NETFramework/Desktop/SharpZipLib/PdfToTxt/index.htm
Extract Text from PDF File: http://www.codeproject.com/Purgatory/DotNetPDF.asp?df=100&forumid=104443
Code to extract plain text from a PDF file: http://www.codeproject.com/cpp/ExtractPDFText.asp?df=100&forumid=47947
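
As a rough illustration of option 3, the sketch below calls the pdftotext command-line tool from C# and captures what it writes to standard output. It assumes that pdftotext is installed and reachable on the PATH. The names PdfToTextRunner, BuildArguments and ExtractText are hypothetical, made up for this example; only the `-enc` option and the `-` output target belong to pdftotext itself.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Hypothetical helper for illustration; not part of XPDF or of .NET.
public static class PdfToTextRunner
{
    // Builds the pdftotext command line: UTF-8 output encoding, the quoted
    // input path, and "-" so that the extracted text goes to stdout.
    public static string BuildArguments(string pdfPath)
    {
        return "-enc UTF-8 \"" + pdfPath + "\" -";
    }

    // Runs pdftotext and returns the extracted text.
    // Assumption: exePath resolves to an installed pdftotext binary.
    public static string ExtractText(string pdfPath, string exePath = "pdftotext")
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = BuildArguments(pdfPath),
            RedirectStandardOutput = true,   // capture stdout instead of printing it
            UseShellExecute = false,         // required when redirecting streams
            CreateNoWindow = true,
            StandardOutputEncoding = Encoding.UTF8
        };

        using (var process = Process.Start(startInfo))
        {
            // Read everything before waiting, so a full pipe cannot block the child.
            string text = process.StandardOutput.ReadToEnd();
            process.WaitForExit();

            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException(
                    "pdftotext exited with code " + process.ExitCode);
            }
            return text;
        }
    }
}
```

A call such as `string text = PdfToTextRunner.ExtractText(@"C:\docs\sample.pdf");` (the path is only a placeholder) would return the document text as a single string. Running the tool as a separate process avoids compiling the C sources for .NET, at the cost of starting one process per document.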

By the way, many people have asked about text extraction in iTextSharp. At present, iTextSharp does not support text extraction, and it does not offer image extraction either. Through experimentation I have only managed to extract images in the simplest format (JPEG); I am still working out how to handle the other formats.
