Developing a Tesseract OCR-based text recognition app for Android: collected materials

Source: Internet
Author: User





I. Tesseract OCR engine


  Tesseract OCR is a commercial-grade OCR (Optical Character Recognition) engine developed by HP between 1985 and 1995, and its source code was opened in 2005. Its SourceForge project page is:

  http://sourceforge.net/projects/tesseract-ocr/?source=directory

The project has since moved to Google Code:

  https://code.google.com/p/tesseract-ocr/

Download list for the source code and language data:

  https://code.google.com/p/tesseract-ocr/downloads/list

Google's servers are not reliably reachable from within China, so the required source package and language-data packs can also be downloaded from http://pkgs.fedoraproject.org:

 > Source code: http://pkgs.fedoraproject.org/repo/pkgs/tesseract/tesseract-ocr-3.02.02.tar.gz/

 > Chinese (Simplified) language data: http://pkgs.fedoraproject.org/repo/pkgs/tesseract-langpack/tesseract-ocr-3.02.chi_sim.tar.gz/

 > English language data: http://pkgs.fedoraproject.org/repo/pkgs/tesseract/tesseract-ocr-3.02.eng.tar.gz/


II. Tesseract tools for Android


  Tess-two is a fork of Tesseract Tools for Android (tesseract-android-tools) that adds some features. Tesseract Tools for Android is a set of Android APIs and build files for the Tesseract OCR and Leptonica image processing libraries. Its GitHub URL is:

  https://github.com/rmtheis/tess-two

For the tess-two build process, refer to the URL above. In my experience, the "android update project --path ." step also needs the --target option. Its value is the id reported by "android list targets", and it is specified as follows (remember to connect your Android phone):

  /home/work/tess-two# android list targets
  Available Android targets:
  ----------
  id: 1 or "android-18"
       Name: Android 4.3
       Type: Platform
       API level: 18
       Revision: 1
       Skins: WVGA854, WXGA800, WSVGA, WVGA800 (default), WQVGA400, WXGA720, QVGA, WQVGA432, WXGA800-7in, HVGA
       ABIs : armeabi-v7a
  /home/work/tess-two# android update project --path . --target 1
Here "1" is the value shown after "id:". Then follow the instructions on the tess-two GitHub page to import the project into Eclipse.
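Looking up the id can also be scripted. Below is a minimal sketch in Java, the language of the Android projects discussed here. The class `TargetId` and its methods are hypothetical. The sketch assumes the `id: N or "android-XX"` line format shown in the listing above. It extracts the id for a named target and builds the corresponding update command.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: finds the numeric id that `android list targets`
 * prints for a target name (e.g. "android-18") and builds the matching
 * `android update project` command. Assumes the listing format shown above.
 */
public class TargetId {

    // Matches lines such as: id: 1 or "android-18"
    private static final Pattern ID_LINE =
            Pattern.compile("^id:\\s*(\\d+)\\s+or\\s+\"([^\"]+)\"", Pattern.MULTILINE);

    /** Returns the numeric id for targetName, or -1 if it is not listed. */
    static int findTargetId(String listOutput, String targetName) {
        Matcher m = ID_LINE.matcher(listOutput);
        while (m.find()) {
            if (m.group(2).equals(targetName)) {
                return Integer.parseInt(m.group(1));
            }
        }
        return -1;
    }

    /** Builds the update command for the project in the current directory. */
    static String updateCommand(int id) {
        return "android update project --path . --target " + id;
    }

    public static void main(String[] args) {
        // Sample text copied from the listing above.
        String sample = String.join("\n",
                "Available Android targets:",
                "----------",
                "id: 1 or \"android-18\"",
                "     Name: Android 4.3",
                "     Type: Platform",
                "     API level: 18");
        int id = findTargetId(sample, "android-18");
        System.out.println(updateCommand(id));
    }
}
```

The sketch matches on the target name rather than hard-coding `1` because the ids are assigned from the platforms installed on each machine, so they may differ between setups.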


III. Android applications based on tess-two


  With tess-two providing the Tesseract library for Android, the next step is to test and use it from an Android app. The android-ocr project on GitHub is such an application:

  https://github.com/rmtheis/android-ocr

The application I tested is the one from Mike_wong, described in the article "Analysis of the Android OCR character recognition", which includes the source code. After extracting the source and importing it into Eclipse, delete the project's "gen" directory, then close and reopen the project so that Eclipse regenerates "gen" and its contents. The source already includes libtess.so and liblept.so built from tess-two, as well as libjpeg.so. The only step needed before building and installing the app is to create a tessdata directory on the target phone's SD card, i.e. "/sdcard/tessdata/", and copy the Chinese and English language data described above into it.
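Before installing the app, it can help to confirm that the language data is where the recognizer will look for it. Below is a minimal Java sketch. It assumes Tesseract's convention of one `<lang>.traineddata` file per language inside a `tessdata` directory, and a `+`-joined language string such as `chi_sim+eng`. `TessdataCheck` and `missingTrainedData` are hypothetical names, not part of tess-two.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: checks that every language in a Tesseract language
 * string such as "chi_sim+eng" has a matching .traineddata file under
 * <dataPath>/tessdata/. On the phone, dataPath would be "/sdcard/".
 */
public class TessdataCheck {

    /** Returns the paths of missing .traineddata files; empty means ready. */
    static List<String> missingTrainedData(String dataPath, String languages) {
        File tessdata = new File(dataPath, "tessdata");
        List<String> missing = new ArrayList<>();
        for (String lang : languages.split("\\+")) {
            File f = new File(tessdata, lang + ".traineddata");
            if (!f.isFile()) {
                missing.add(f.getPath());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Parent directory of tessdata/; defaults to the SD card root.
        String dataPath = args.length > 0 ? args[0] : "/sdcard/";
        List<String> missing = missingTrainedData(dataPath, "chi_sim+eng");
        if (missing.isEmpty()) {
            System.out.println("tessdata is ready");
        } else {
            System.out.println("Missing: " + missing);
        }
    }
}
```

On the device, the same parent path (here `/sdcard/`) is what would be passed as the data path to tess-two's `TessBaseAPI.init(datapath, language)`. That method expects the parent directory of `tessdata/`, not `tessdata/` itself.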

  In testing, recognition turned out to be fairly slow: a paragraph of Chinese text took about 20 seconds to process. Recognition accuracy is also limited. When a photo contains a lot of text, accuracy drops; when there are relatively few characters and they are relatively large, accuracy is higher.
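The observations above are qualitative. One common way to put a number on OCR accuracy is character accuracy, defined as 1 minus (edit distance / length of the reference text). This was not the original author's method. The Java sketch below is a hypothetical helper, `OcrAccuracy`. It computes Levenshtein distance over Unicode code points, so each Chinese character counts as one unit.

```java
/**
 * Hypothetical helper for quantifying OCR accuracy:
 * character accuracy = 1 - editDistance(reference, recognized) / reference length,
 * clamped to [0, 1]. Operates on Unicode code points.
 */
public class OcrAccuracy {

    /** Levenshtein distance between two code-point arrays (two-row DP). */
    static int editDistance(int[] a, int[] b) {
        int[] prev = new int[b.length + 1];
        int[] curr = new int[b.length + 1];
        for (int j = 0; j <= b.length; j++) prev[j] = j;
        for (int i = 1; i <= a.length; i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length; j++) {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            // After the swap, prev holds the row just computed.
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length];
    }

    /** Character accuracy in [0, 1]; 0 when the OCR output is far off. */
    static double characterAccuracy(String reference, String recognized) {
        int[] ref = reference.codePoints().toArray();
        int[] hyp = recognized.codePoints().toArray();
        if (ref.length == 0) return hyp.length == 0 ? 1.0 : 0.0;
        double acc = 1.0 - (double) editDistance(ref, hyp) / ref.length;
        return Math.max(0.0, acc);
    }

    public static void main(String[] args) {
        // One substituted character out of nine.
        System.out.println(characterAccuracy("Tesseract", "Tesseroct"));
    }
}
```

Scoring the same reference text photographed at different text densities and character sizes would make the trend described above measurable.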


Appendix: Other Reference Articles


1. "Android OCR tesseract": http://www.cnblogs.com/hangxin1940/archive/2012/01/13/2321507.html

2. "Android platform: tess-two image recognition finally succeeds": http://www.cnblogs.com/muyun/archive/2012/06/12/2546693.html

3. "Tesseract-ocr Training Method": http://my.oschina.net/lixinspace/blog/60124

4. "Training methods for Tesseract 3 language data": http://blog.wudilabs.org/entry/f25efc5f/

