Converting a Scanned-Image PDF to TXT

Source: Internet
Author: User

Although PDF files can be viewed on the Nokia E61, zooming does not seem to help with scanned-image PDFs: even when the page is enlarged to 1000%, the text cannot be read clearly.

Here is the process I went through to convert such a PDF file to a TXT file:

1. A scanned PDF cannot be converted directly to TXT with a conversion tool, because its pages are images rather than text. OCR is required.

2. Print the PDF file as an MDI file. After opening it with Microsoft Office Document Imaging, the text could not be recognized. The quality of the scanned text is low, and Microsoft Office Document Imaging could not recognize it at all.

3. CAJViewer recognition: the recognition quality is very good, but only one selected region of text can be recognized at a time. If the entire document is saved as TXT, the output is garbled.

4. My final solution was to use the pdf2jpg tool to convert the PDF file to JPG images, and then use "Shang Shu 7 OCR" to recognize the text in the images (because the OCR function of Shang Shu 7 cannot open PDF files directly). The recognition result is acceptable, with accuracy above 90%.
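
The two-stage approach in step 4 (render each page to a JPG, then run OCR on the images) can be sketched in Python. This is only an illustration, not the author's setup: it swaps in the open-source command-line tools pdftoppm (from Poppler) and tesseract in place of pdf2jpg and Shang Shu 7 OCR, and assumes both are installed and on the PATH. The function names, the "page" file prefix, and the 300 dpi default are hypothetical choices.

```python
import subprocess
from pathlib import Path


def pdf_to_jpg_cmd(pdf_path, out_prefix, dpi=300):
    """Build a pdftoppm command that renders each PDF page to a JPG file."""
    return ["pdftoppm", "-jpeg", "-r", str(dpi), str(pdf_path), str(out_prefix)]


def ocr_cmd(image_path, out_base, lang="eng"):
    """Build a tesseract command; tesseract writes its text to out_base + '.txt'."""
    return ["tesseract", str(image_path), str(out_base), "-l", lang]


def page_number(image_path):
    """Extract the page number from a name such as 'page-07.jpg'."""
    return int(Path(image_path).stem.rsplit("-", 1)[1])


def pdf_to_txt(pdf_path, work_dir, txt_path, lang="eng"):
    """Render every page to JPG, OCR each page, and join the text in page order."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdf_to_jpg_cmd(pdf_path, work_dir / "page"), check=True)

    pages = []
    # Sort numerically so page 10 does not come before page 2.
    for image in sorted(work_dir.glob("page-*.jpg"), key=page_number):
        out_base = image.with_suffix("")
        subprocess.run(ocr_cmd(image, out_base, lang), check=True)
        pages.append(out_base.with_suffix(".txt").read_text(encoding="utf-8"))

    Path(txt_path).write_text("\n".join(pages), encoding="utf-8")
```

A call such as `pdf_to_txt("scan.pdf", "pages", "scan.txt", lang="chi_sim")` would be the Chinese-text variant, assuming the Simplified Chinese language data for tesseract is installed. A higher dpi value may help with low-quality scans, at the cost of larger images and slower recognition.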

I hope to find a better solution.
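
When comparing candidate tools, it helps to put a number on recognition quality. The sketch below is one possible way to do that, assuming a short passage has been typed by hand as a reference. It uses the similarity ratio from Python's standard difflib module, which is not necessarily the same measure behind the "90%" figure above; the function name is hypothetical.

```python
from difflib import SequenceMatcher


def char_accuracy(ocr_text, reference_text):
    """Return a 0..1 similarity ratio between OCR output and a reference text."""
    # Collapse whitespace so line-wrapping differences are not counted as errors.
    ocr = " ".join(ocr_text.split())
    ref = " ".join(reference_text.split())
    return SequenceMatcher(None, ocr, ref, autojunk=False).ratio()
```

Running each tool on the same sample page and comparing the scores gives a rough ranking. A ratio near 1.0 means the OCR output closely matches the reference.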
