Methods for Extracting Text from PDF Files in the .NET Environment

Source: Internet
Author: User
Tags: dotnet
1. IKVM version of PDFBox: as far as I know, only the IKVM version of PDFBox extracts text from PDF files reasonably well. For more information about PDFBox, visit http://www.pdbox.org;
for an example application, see http://www.codeproject.com/csharp/4102text.asp;
2. Acrobat SDK: use the Acrobat SDK (it is not cheap);
3. xpdf: if conditions permit, you can use xpdf's pdftotext tool.
xpdf is a PDF parsing library written in C. It provides several tools and its source code is available (if you are familiar with C and .NET, you can compile it for use in the .NET environment), but it is released under the GNU GPL, so commercial applications require a paid license;
More information: http://www.foolabs.com/xpdf
4. Ghostscript: another option to consider is Ghostscript. The official website is www.cs.wisc.edu/~ghost/; for the text extraction method, search Google for ps2txt;
5. Other related resources:
http://www.mj10777.de/NETFramework/Desktop/SharpZipLib/PdfToTxt/index.htm
Extract text from PDF file: http://www.codeproject.com/Purgatory/DotNetPDF.asp?df=100&forumid=104443
Code to extract plain text from a PDF file: http://www.codeproject.com/cpp/ExtractPDFText.asp?df=100&forumid=47947
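
The xpdf and Ghostscript options above are command-line tools, so one way to use them from a .NET application is to run them as external processes. The following is only a minimal sketch and is not from the original article: the function name pdf_to_text is hypothetical, and it assumes that either xpdf's pdftotext or a Ghostscript build that includes the txtwrite output device is installed and on the PATH.

```shell
#!/bin/sh
# Hypothetical helper (not from the original article): extract text from a PDF
# using whichever command-line tool is installed.
# Usage: pdf_to_text INPUT.pdf OUTPUT.txt
pdf_to_text() {
    src="$1"
    dst="$2"

    # Fail early if the input file does not exist.
    if [ ! -f "$src" ]; then
        echo "pdf_to_text: input not found: $src" >&2
        return 2
    fi

    if command -v pdftotext >/dev/null 2>&1; then
        # xpdf's pdftotext; -layout tries to preserve the page layout.
        pdftotext -layout "$src" "$dst"
    elif command -v gs >/dev/null 2>&1; then
        # Ghostscript's txtwrite output device (assumes a version that ships it).
        gs -q -dNOPAUSE -dBATCH -sDEVICE=txtwrite -sOutputFile="$dst" "$src"
    else
        echo "pdf_to_text: neither pdftotext nor gs found on PATH" >&2
        return 127
    fi
}
```

For example, pdf_to_text report.pdf report.txt would write the extracted text to report.txt. From .NET, the same command line could be started with System.Diagnostics.Process and the output file read back afterwards; extraction quality will depend on the tool and on the PDF itself.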

By the way, many friends have asked about text extraction in iTextSharp. At present, iTextSharp does not support this function, nor does it support extracting images. Through my own experiments I have only managed to extract images in the simplest format (JPEG); I am still studying how to handle the others.
