1. PDFBox (IKVM version): As far as I know, the IKVM version of PDFBox is the only option that extracts text from PDFs reasonably well. For more information on PDFBox, visit http://www.pdbox.org; for an example of its use, see the CodeProject article: http://www.codeproject.com/csharp/pdf2text.asp;
2. Acrobat SDK: Use Adobe's Acrobat SDK (it is not cheap);
3. Xpdf: If circumstances allow, consider using Xpdf's pdftotext. Xpdf is a PDF parsing library written in C++ that also ships a number of command-line tools. It is open source (if you are familiar with both C++ and .NET, you may be able to compile it for use in a .NET environment), but it is released under the GPL, so a commercial application needs a paid license. More information: http://www.foolabs.com/xpdf
4. Ghostscript: Another option to consider is Ghostscript; the official website is www.cs.wisc.edu/~ghost/. For the text extraction method, search Google for Ps2txt;
5. Other related resources:
http://www.mj10777.de/NETFramework/desktop/sharpziplib/pdftotxt/index.htm
Extract Text from pdf file: http://www.codeproject.com/Purgatory/DotNetPDF.asp?df=100&forumid=104443
Code to extract plain text from a pdf file: http://www.codeproject.com/cpp/ExtractPDFText.asp?df=100&forumid=47947
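As an illustration of option 3, here is a minimal sketch of driving Xpdf's pdftotext from another program by launching it as an external process and reading its output. It is written in Python only to keep it short; the same idea should carry over to a .NET program that starts the tool as a child process. The function names build_pdftotext_command and extract_text are hypothetical helpers made up for this example, and it assumes the pdftotext executable is installed and on the PATH.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, keep_layout=False, exe="pdftotext"):
    """Build the argument list for pdftotext (hypothetical helper).

    "-enc UTF-8" selects the output encoding, "-layout" asks the tool to
    keep the physical page layout, and "-" as the output file name sends
    the extracted text to standard output.
    """
    cmd = [exe, "-enc", "UTF-8"]
    if keep_layout:
        cmd.append("-layout")
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path, keep_layout=False):
    """Run pdftotext on pdf_path and return the extracted text as a string."""
    # Assumption: the executable is installed and reachable through PATH.
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext was not found on PATH")
    result = subprocess.run(
        build_pdftotext_command(pdf_path, keep_layout),
        capture_output=True,
        check=True,  # raise CalledProcessError if the tool reports failure
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Calling extract_text("sample.pdf") would return the text of a (hypothetical) sample.pdf. Running the tool as a separate process rather than linking against its source code is one way to keep the application's own code apart from the tool's, though whether that settles the licensing concern above is not something this sketch can answer.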
By the way, many people have asked how to extract text with iTextSharp. For the time being, iTextSharp does not support this feature. Nor can I extract images in general with it: I can only extract the simplest image format (JPEG), and I am still studying how to handle the others.
Summary: some ways to extract text from a PDF document in a .NET environment