A Summary of Methods for Extracting Text from PDF Documents in the .NET Environment

Source: Internet
Author: User
Tags: dotnet

1. IKVM version of PDFBox: As far as I know, only the IKVM build of PDFBox extracts text from PDFs reasonably well. PDFBox is a Java library, and IKVM.NET is what makes it usable from .NET. For more information on PDFBox, visit http://www.pdbox.org; for an example of using it, see the CodeProject article http://www.codeproject.com/csharp/pdf2text.asp;
2. Use Adobe's Acrobat SDK (which is not cheap);
3. XPDF: If circumstances allow, consider XPDF's pdftotext. Xpdf is an open-source PDF parsing library written in C++ that comes with a number of command-line tools (if you are familiar with C++ and .NET, you may be able to compile it for use in a .NET environment). However, it is released under the GNU GPL, so using it in a closed-source commercial product requires a paid license. More information: http://www.foolabs.com/xpdf;
4. Ghostscript: Another option is Ghostscript, whose official website is www.cs.wisc.edu/~ghost/. To learn how to extract text with it, search Google for Ps2txt;
5. Other related resources:
http://www.mj10777.de/NETFramework/desktop/sharpziplib/pdftotxt/index.htm
Extract Text from PDF file: http://www.codeproject.com/Purgatory/DotNetPDF.asp?df=100&forumid=104443
Code to extract plain text from a PDF file: http://www.codeproject.com/cpp/ExtractPDFText.asp?df=100&forumid=47947
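Items 3 and 4 are command-line tools, so one way to use them from a program is to spawn them as child processes and read their output. The Python sketch below only builds and runs the command lines and assumes the executables are installed and on PATH. Several details go beyond the post and should be treated as assumptions: the pdftotext options used (`-layout`, `-enc`, and `-` for standard output), and Ghostscript's `txtwrite` output device, which ships with newer Ghostscript releases and is used here in place of the Ps2txt approach mentioned above. The executable names `pdftotext` and `gs` are also assumptions; on Windows, Ghostscript's console executable is usually `gswin64c` or `gswin32c`. From .NET, the same argument lists could be passed to `System.Diagnostics.Process`.

```python
"""Sketch: drive Xpdf's pdftotext and Ghostscript's txtwrite device.

Assumes the tools are installed and on PATH. Executable names and the
chosen options are assumptions; adjust them for your platform and version.
"""
import shutil
import subprocess


def pdftotext_command(pdf_path, layout=True, encoding="UTF-8", exe="pdftotext"):
    """Build an argv list for pdftotext that writes the text to stdout."""
    cmd = [exe]
    if layout:
        cmd.append("-layout")      # try to preserve the physical page layout
    cmd += ["-enc", encoding]      # text encoding of the output
    cmd += [str(pdf_path), "-"]    # "-" as the output file means stdout
    return cmd


def ghostscript_command(pdf_path, exe="gs"):
    """Build an argv list for Ghostscript's txtwrite device, output to stdout."""
    return [
        exe,
        "-q",                  # suppress startup messages
        "-dNOPAUSE",           # do not pause between pages
        "-dBATCH",             # exit when done instead of waiting for input
        "-dSAFER",             # restrict file operations
        "-sDEVICE=txtwrite",   # text output device (newer Ghostscript releases)
        "-sOutputFile=-",      # write extracted text to stdout
        str(pdf_path),
    ]


def run_extractor(cmd):
    """Run one of the commands above and return its stdout decoded as text."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    result = subprocess.run(cmd, check=True, capture_output=True)
    return result.stdout.decode("utf-8", errors="replace")
```

For example, `run_extractor(pdftotext_command("report.pdf"))` would return the text of a hypothetical `report.pdf`. Building the argument list separately from running it keeps the command easy to inspect and test even on a machine without the tools installed.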

By the way, many readers have asked how to extract text with iTextSharp. For the time being, iTextSharp does not support this feature. Nor does it support extracting images: so far I have only managed to extract images in the simplest format (JPEG), and I am still working out how to handle the other formats.
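Why JPEG is the easy case: in a PDF, an image whose only filter is `/DCTDecode` stores a complete JPEG file as its stream data, so the bytes can be copied out unchanged, while other formats must be decoded and re-encoded. The sketch below illustrates this in Python; it is not iTextSharp's API, and it is a heuristic rather than a real PDF parser. It assumes an unencrypted file, ignores the cross-reference table and `/Length` values, and keeps only `/DCTDecode` streams whose bytes begin with the JPEG start-of-image marker (which also skips streams wrapped in an extra filter such as `/FlateDecode`).

```python
"""Heuristic sketch: copy raw JPEG (DCTDecode) image streams out of a PDF.

Assumes an unencrypted PDF; does not parse the xref table or /Length.
"""
import re

# "stream" keyword followed by CRLF or LF, but not the tail of "endstream".
STREAM_START = re.compile(rb"(?<!end)stream\r?\n")
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def extract_jpegs(pdf_bytes):
    """Return the raw bytes of each DCTDecode image stream that looks like a JPEG."""
    images = []
    for match in STREAM_START.finditer(pdf_bytes):
        # The stream dictionary lies between the nearest preceding "obj"
        # keyword and the "stream" keyword.
        obj_start = max(pdf_bytes.rfind(b"obj", 0, match.start()), 0)
        header = pdf_bytes[obj_start:match.start()]
        if b"/DCTDecode" not in header:
            continue
        data_start = match.end()
        data_end = pdf_bytes.find(b"endstream", data_start)
        if data_end == -1:
            continue
        data = pdf_bytes[data_start:data_end]
        # Drop the end-of-line before "endstream" by cutting after the last EOI.
        eoi = data.rfind(EOI)
        if data.startswith(SOI) and eoi != -1:
            images.append(data[: eoi + len(EOI)])
    return images
```

To use it, read a file in binary mode (for example `open("photos.pdf", "rb").read()` for a hypothetical `photos.pdf`) and write each returned `bytes` object to its own `.jpg` file.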
