This semester a course instructor asked us to implement an OCR text recognition program in Java, so I spent some time studying how to do it on Android.
The OCR engine is the open source project Tesseract OCR. The Android version is here: https://code.google.com/p/tesseract-android-tools/
I kept running into errors when compiling it myself, so I used a prebuilt tess-two for Android that someone else had published online and imported it into the project (see the article at http://www.cnblogs.com/hangxin1940/archive/2012/01/13/2321507.html).
I implemented two input paths: recognizing a freshly taken photo, and recognizing a picture selected from the album.
A phone has limited computing power, and when the picture is large and high resolution, recognition takes a very long time. So when a picture is selected, the system's crop function is called, and recognition runs on a separate thread.
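To illustrate the separate-thread part, here is a minimal sketch in plain Java. It is not the project's actual code: `Recognizer` and `BackgroundOcr` are hypothetical names I made up for the example, and `Recognizer` stands in for whatever call really performs the OCR.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for the OCR call; not part of tess-two. */
interface Recognizer {
    String recognize(int[] argbPixels, int width, int height);
}

/** Runs recognition on a single worker thread so the caller is not blocked. */
final class BackgroundOcr {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Recognizer recognizer;

    BackgroundOcr(Recognizer recognizer) {
        this.recognizer = recognizer;
    }

    /** Queues one image and returns immediately with a handle to the result. */
    Future<String> submit(final int[] argbPixels, final int width, final int height) {
        return worker.submit(new Callable<String>() {
            @Override
            public String call() {
                return recognizer.recognize(argbPixels, width, height);
            }
        });
    }

    /** Stops the worker once no more images will be submitted. */
    void shutdown() {
        worker.shutdown();
    }
}
```

In an Android activity the recognized text would still have to be handed back to the UI thread before touching any view, for example with `Activity.runOnUiThread`.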
When testing, I recommend not using pictures that are too large.
A teammate in my group had written image preprocessing in Java, so I took it over to try to improve the recognition success rate.
Unfortunately, Android cannot use the java.awt package, so I spent some time replacing those classes with ones from android.graphics to get the same functionality.
Testing showed that converting to grayscale improves the recognition rate somewhat. On the computer, applying binarization after grayscale (we tried three binarization algorithms) improves the recognition rate further.
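As a rough sketch of the grayscale step, the code below works on packed ARGB integers, which is the pixel layout that `Bitmap.getPixels` in android.graphics fills in. The class name `Grayscale` and the choice of the common 0.299 / 0.587 / 0.114 luminance weights are my assumptions for the example; my teammate's preprocessing may weight the channels differently.

```java
/** Grayscale conversion on packed ARGB pixels (hypothetical helper, not the project's code). */
final class Grayscale {
    private Grayscale() {}

    /** Returns one gray level (0..255) per pixel using integer luminance weights. */
    static int[] toGray(int[] argbPixels) {
        int[] gray = new int[argbPixels.length];
        for (int i = 0; i < argbPixels.length; i++) {
            int p = argbPixels[i];
            int r = (p >> 16) & 0xFF;
            int g = (p >> 8) & 0xFF;
            int b = p & 0xFF;
            // 0.299 R + 0.587 G + 0.114 B, scaled by 1000 to stay in integer math;
            // adding 500 rounds to the nearest level instead of truncating.
            gray[i] = (299 * r + 587 * g + 114 * b + 500) / 1000;
        }
        return gray;
    }

    /** Packs gray levels back into opaque ARGB pixels with R = G = B. */
    static int[] toArgb(int[] gray) {
        int[] out = new int[gray.length];
        for (int i = 0; i < gray.length; i++) {
            int v = gray[i];
            out[i] = 0xFF000000 | (v << 16) | (v << 8) | v;
        }
        return out;
    }
}
```

On Android the result of `toArgb` can be written back into a mutable bitmap with `Bitmap.setPixels`, which avoids any dependency on java.awt.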
On the phone, however, binarization with Otsu's method and with the maximum entropy method took far too long (it almost never finished), so in the end I stopped calling those two. With only the iterative method, the binarization result is not ideal.
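For reference, here is a small sketch of one common formulation of the iterative threshold method: start from the midpoint of the darkest and brightest gray levels, split the pixels into two classes, move the threshold to the midpoint of the two class means, and repeat until it stops changing. This is not necessarily the version used in the project. The class name, the iteration cap of 256, and the decision to work from a histogram are my own assumptions; the histogram means each iteration touches only 256 bins regardless of image size.

```java
/** Iterative threshold selection on gray levels 0..255 (hypothetical helper). */
final class IterativeThreshold {
    private IterativeThreshold() {}

    /** Returns a threshold in 0..255; every input value must be in 0..255. */
    static int findThreshold(int[] gray) {
        // Build the histogram once so later iterations do not rescan the image.
        long[] hist = new long[256];
        for (int v : gray) {
            hist[v]++;
        }
        int min = 0;
        while (min < 255 && hist[min] == 0) min++;
        int max = 255;
        while (max > 0 && hist[max] == 0) max--;

        int threshold = (min + max) / 2;
        // The cap guards against oscillation caused by integer rounding.
        for (int iter = 0; iter < 256; iter++) {
            long lowCount = 0, lowSum = 0, highCount = 0, highSum = 0;
            for (int level = 0; level < 256; level++) {
                if (level <= threshold) {
                    lowCount += hist[level];
                    lowSum += level * hist[level];
                } else {
                    highCount += hist[level];
                    highSum += level * hist[level];
                }
            }
            if (lowCount == 0 || highCount == 0) {
                return threshold; // only one class present, nothing to refine
            }
            int next = (int) ((lowSum / lowCount + highSum / highCount) / 2);
            if (next == threshold) {
                return threshold;
            }
            threshold = next;
        }
        return threshold;
    }

    /** Maps levels above the threshold to 255 (white) and the rest to 0 (black). */
    static int[] binarize(int[] gray, int threshold) {
        int[] out = new int[gray.length];
        for (int i = 0; i < gray.length; i++) {
            out[i] = gray[i] > threshold ? 255 : 0;
        }
        return out;
    }
}
```

A single global threshold like this tends to struggle with unevenly lit photos, which may be part of why the result on phone pictures was not ideal.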
Recognition requires language packs, which must be placed in the root directory of the SD card. (I only downloaded the Simplified Chinese and English language packs.)
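A missing or misplaced language pack is an easy mistake to make, so a check before starting the engine can save some confusion. The sketch below assumes the usual Tesseract 3.x layout, where the data path passed to tess-two's `TessBaseAPI.init(datapath, language)` is the parent folder of a `tessdata` directory containing `<lang>.traineddata` files (for example `eng` and `chi_sim`). The class and method names are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Checks that the expected language files exist (hypothetical helper). */
final class TessdataCheck {
    private TessdataCheck() {}

    /** Returns the languages whose dataPath/tessdata/LANG.traineddata file is missing. */
    static List<String> missingLanguages(File dataPath, String... languages) {
        File tessdata = new File(dataPath, "tessdata");
        List<String> missing = new ArrayList<String>();
        for (String lang : languages) {
            if (!new File(tessdata, lang + ".traineddata").isFile()) {
                missing.add(lang);
            }
        }
        return missing;
    }
}
```

On Android, `dataPath` would be the SD card root, typically obtained from `Environment.getExternalStorageDirectory()`, and the call would look like `missingLanguages(root, "eng", "chi_sim")`.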
Here are a few recognition results (the first two are photos taken with the phone, the last one is a recognized sample image):
In the end, the program can recognize fairly regular text. It is best to crop the picture down to just the text portion, and the picture should be reasonably sharp.
It can also recognize some simple English and numeric CAPTCHAs.
Recognition accuracy still needs improvement, and so does speed.
Here are the source code, language packs, and so on:
CSDN Download
Baidu Network Disk Download
tessdata is the language pack folder; it needs to be placed in the root directory of the phone's SD card.
tess-two is the OCR engine (built from version 3.01; version 3.02 is now available). It needs to be imported into the project (the project in the download already has it imported).