An analysis of Android OCR character recognition

Source: Internet
Author: User

This semester a course instructor asked us to implement an OCR text recognition program in Java, so I spent some time studying how to do it on Android.

The OCR engine is the open source project Tesseract-OCR. The Android version is at: https://code.google.com/p/tesseract-android-tools/

However, I kept getting errors when compiling it myself, so I found a prebuilt tess-two for Android that someone had shared online and imported it into the project (see the article at http://www.cnblogs.com/hangxin1940/archive/2012/01/13/2321507.html).

I implemented two ways to get an image for recognition: taking a photo, and selecting a picture from the album.

But a phone's computing power is limited: when the picture is large and high-resolution, recognition takes a very long time. So when a picture is selected, the app calls the system cropping function, and recognition runs on a separate thread.
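The two ideas above (shrink the input, and keep recognition off the main thread) can be sketched in plain Java. This is only an illustration under stated assumptions: the class name OcrTaskSketch, the method names, and the maxSide limit are hypothetical and not taken from the original project, and the recognizer is passed in as a function so the sketch does not depend on any OCR library.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical helper; all names here are illustrative. */
final class OcrTaskSketch {

    /**
     * Returns the smallest power-of-two factor by which both sides must be
     * divided so that neither exceeds maxSide (assumed to be positive).
     */
    static int sampleSize(int width, int height, int maxSide) {
        int factor = 1;
        while (width / factor > maxSide || height / factor > maxSide) {
            factor *= 2;
        }
        return factor;
    }

    /**
     * Submits the recognizer to a worker thread so the calling (UI) thread
     * is not blocked while recognition runs.
     */
    static <I> Future<String> recognizeAsync(ExecutorService worker,
                                             Function<I, String> recognizer,
                                             I image) {
        Callable<String> task = () -> recognizer.apply(image);
        return worker.submit(task);
    }
}
```

On Android, a power-of-two factor like this is the kind of value usually passed as BitmapFactory.Options.inSampleSize when decoding the picture, and the result of the Future would be posted back to the UI thread rather than waited on there.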

I recommend not using very large pictures when testing.

A teammate in the same group had written image preprocessing code in Java, so I took it over to try to improve the recognition success rate.

Unfortunately, Android cannot use the classes in the java.awt package, so it took some time to replace them with classes from android.graphics that provide the same functionality.
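One way to keep such preprocessing portable is to work on a plain int[] of packed ARGB pixels, which is the layout that both android.graphics.Bitmap.getPixels and java.awt.image.BufferedImage.getRGB produce. The sketch below is an assumption about how this could be done, not the teammate's actual code; the class name GrayscaleSketch and the integer luma weights (299, 587, 114) are my own choices.

```java
/** Hypothetical helper; operates on packed ARGB ints, no java.awt or Android types. */
final class GrayscaleSketch {

    /** Converts packed ARGB pixels to gray levels in the range 0..255. */
    static int[] toGrayLevels(int[] argb) {
        int[] gray = new int[argb.length];
        for (int i = 0; i < argb.length; i++) {
            int p = argb[i];
            int r = (p >> 16) & 0xFF;
            int g = (p >> 8) & 0xFF;
            int b = p & 0xFF;
            // Weighted sum with rounding; the weights add up to 1000.
            gray[i] = (r * 299 + g * 587 + b * 114 + 500) / 1000;
        }
        return gray;
    }

    /** Packs gray levels back into opaque ARGB pixels (R = G = B = level). */
    static int[] toArgb(int[] levels) {
        int[] argb = new int[levels.length];
        for (int i = 0; i < levels.length; i++) {
            int v = levels[i] & 0xFF;
            argb[i] = 0xFF000000 | (v << 16) | (v << 8) | v;
        }
        return argb;
    }
}
```

Because only the read and write steps touch the platform image class, the same pixel loop can run on the desktop and on the phone.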

Testing showed that converting the image to grayscale improves the recognition rate somewhat. On a computer, binarizing the grayscale image with the three binarization algorithms improves the recognition rate further.

On the phone, however, binarization with Otsu's method and the maximum entropy method took too long (it almost never finished), so in the end these two methods are not called. Only the iterative method is used, and its results are not ideal.
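For reference, here is a minimal sketch of two of the methods named above, the iterative method and Otsu's method, both computed from a 256-bin histogram of gray levels. It is not the project's code, and I do not know how the original implementations were organized; the class name ThresholdSketch, the iteration cap of 256, and the convention that levels above the threshold become white are assumptions. The maximum entropy method is not shown.

```java
/** Hypothetical helper; gray levels are assumed to be in the range 0..255. */
final class ThresholdSketch {

    static int[] histogram(int[] gray) {
        int[] hist = new int[256];
        for (int v : gray) hist[v]++;
        return hist;
    }

    /** Iterative method: move the threshold to the midpoint of the two class means until it stops changing. */
    static int iterativeThreshold(int[] hist) {
        long total = 0, sum = 0;
        for (int i = 0; i < 256; i++) { total += hist[i]; sum += (long) i * hist[i]; }
        if (total == 0) return 0;
        int t = (int) (sum / total);               // start from the global mean
        for (int iter = 0; iter < 256; iter++) {   // cap is a safety assumption
            long nLow = 0, sLow = 0;
            for (int i = 0; i <= t; i++) { nLow += hist[i]; sLow += (long) i * hist[i]; }
            long nHigh = total - nLow, sHigh = sum - sLow;
            if (nLow == 0 || nHigh == 0) return t;
            double meanLow = (double) sLow / nLow;
            double meanHigh = (double) sHigh / nHigh;
            int next = (int) Math.round((meanLow + meanHigh) / 2.0);
            if (next == t) return t;
            t = next;
        }
        return t;
    }

    /** Otsu's method: pick the threshold that maximizes the between-class variance. */
    static int otsuThreshold(int[] hist) {
        long total = 0, sum = 0;
        for (int i = 0; i < 256; i++) { total += hist[i]; sum += (long) i * hist[i]; }
        long wLow = 0, sLow = 0;
        double best = -1.0;
        int bestT = 0;
        for (int t = 0; t < 256; t++) {
            wLow += hist[t];
            sLow += (long) t * hist[t];
            if (wLow == 0) continue;
            long wHigh = total - wLow;
            if (wHigh == 0) break;
            double diff = (double) sLow / wLow - (double) (sum - sLow) / wHigh;
            double between = (double) wLow * wHigh * diff * diff;
            if (between > best) { best = between; bestT = t; }
        }
        return bestT;
    }

    /** Levels above the threshold become 255 (white), the rest 0 (black). */
    static int[] binarize(int[] gray, int threshold) {
        int[] out = new int[gray.length];
        for (int i = 0; i < gray.length; i++) out[i] = gray[i] > threshold ? 255 : 0;
        return out;
    }
}
```

Both functions make one pass over the pixels to build the histogram and then work only on its 256 bins, so their cost after that pass does not grow with the picture size.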

Recognition requires language packs, which must be placed in the SD card root directory. (I only downloaded the Simplified Chinese and English language packs.)
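A missing or misplaced language pack is an easy mistake to make, so it can help to check for the files before starting the engine. The sketch below assumes the usual Tesseract layout, in which each language is a file named <code>.traineddata inside a tessdata directory, and that the engine is given the parent directory of tessdata (as I understand tess-two's TessBaseAPI.init(datapath, language) to expect). The class name TessdataCheck and its method are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of tess-two. */
final class TessdataCheck {

    /**
     * Returns the language codes whose trained data file is missing under
     * dataRoot/tessdata. An empty list means every requested language is present.
     */
    static List<String> missingLanguages(File dataRoot, String... languages) {
        File tessdata = new File(dataRoot, "tessdata");
        List<String> missing = new ArrayList<>();
        for (String lang : languages) {
            File trained = new File(tessdata, lang + ".traineddata");
            if (!trained.isFile()) {
                missing.add(lang);
            }
        }
        return missing;
    }
}
```

On the phone, dataRoot would be the SD card root, and the codes for the two packs mentioned here would be chi_sim and eng.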

Here are a few recognition results (the first two are from photos taken with the phone; the last is from a sample image):

In the end, the app can recognize fairly regular text. It is best to crop the picture so that it contains only the text to be recognized, and the text should be as clear as possible.

It can also recognize some simple English and numeric CAPTCHAs.

The recognition rate still needs to improve, and so does the speed.

Here are the source code, language packs, etc.:

CSDN Download

Baidu Network Disk Download

tessdata is the language pack directory; it needs to be put in the root directory of the phone's SD card.
tess-two is the OCR engine (compiled from version 3.01; version 3.02 is now available) and needs to be imported into the project (the project in the download already has it imported).
