Android OCR Demo


I haven't written a blog post in a long time because of work. (I have seen many comments and messages, but I can't reply to each one.)


Before the New Year, Oracle organized an internal programming marathon. The topic I chose was OCR-related, but the results were not very good, and I have been meaning to reorganize the code and improve them.


With the domestic Internet booming right now, the demand for image processing seems to have grown with it. I always felt it was hard to find a good image processing job, so when I was job hunting I switched to natural language processing; now I find that Internet companies are recruiting plenty of image processing engineers too...


Without further ado, let's look at OCR on Android.


For OCR you need an existing engine, such as Tesseract, an open-source project hosted by Google (which unfortunately seems to be difficult to access from within China).

Tesseract is written in C/C++, so using it on Android requires going through the JNI call mechanism. Interested readers can refer to: http://blog.csdn.net/watkinsong/article/details/9849973


For most people there is no need to manipulate the underlying C code directly; after all, debugging through JNI calls is painful. To make things easier, there is a project on GitHub called tess-two that wraps Tesseract's native API in a Java API that can be used directly on Android, so referencing that project is enough to start OCR development.
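At its core, the wrapped API comes down to a handful of calls on TessBaseAPI. Here is a minimal sketch; the class name SimpleOcr and the dataPath parameter are placeholders for illustration, and the full demo code appears in step four below:

    import android.graphics.Bitmap;
    import com.googlecode.tesseract.android.TessBaseAPI;

    public class SimpleOcr {
        // Minimal use of the tess-two Java wrapper. dataPath must contain a
        // "tessdata" subdirectory holding eng.traineddata (see step three below).
        public static String recognize(String dataPath, Bitmap bitmap) {
            TessBaseAPI api = new TessBaseAPI();
            api.init(dataPath, "eng");        // language code matches the .traineddata file name
            api.setImage(bitmap);             // the image to recognize
            String text = api.getUTF8Text();  // run recognition and get the result as UTF-8 text
            api.end();                        // release the native Tesseract resources
            return text;
        }
    }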


If you do not want to read the long-winded explanation below, just download my demo project and read the code directly: https://github.com/weixsong/libra. Note that this repository is a collection of projects; the OCRDemo folder is the one corresponding to this article.


Step one: Download the latest tess-two project from https://github.com/weixsong/tess-two and import the project into Eclipse.

Step two: Create your own OCR project and reference the tess-two project, so that you can use the APIs and the .so files provided by tess-two.




Step three: Go to https://code.google.com/p/tesseract-ocr/ and download the trained data for the language(s) you need (I am not sure whether this data is a template, a neural network, or a classifier...), then put the trained data into the assets directory of your Android project.
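The snippets below refer to several constants defined in MainActivity (DATA_PATH, DATA_PATH_TESSDATA, LANG_EN, LANG_ZH, and the langs array). A rough sketch of what they might look like follows; the concrete paths and language codes here are assumptions for illustration, and the authoritative definitions are in the demo source:

    import android.app.Activity;
    import android.os.Environment;

    public class MainActivity extends Activity {
        // Language codes must match the .traineddata file names placed under assets/tessdata/.
        public static final String LANG_EN = "eng";
        public static final String LANG_ZH = "chi_sim";
        // Tesseract expects the trained data inside a directory literally named "tessdata",
        // so DATA_PATH is the parent directory later passed to TessBaseAPI.init().
        public static final String DATA_PATH =
                Environment.getExternalStorageDirectory().toString() + "/OCRDemo/";
        public static final String DATA_PATH_TESSDATA = DATA_PATH + "tessdata/";
        // Languages whose trained data gets copied out of the APK assets.
        private static final String[] langs = { LANG_EN, LANG_ZH };
    }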


It is important to note that, at runtime, your Android program needs to copy the trained data files out of the assets and onto device storage:

    // Requires java.io.*, android.content.res.AssetManager and android.util.Log.
    private void checkAndCopyFiles() {
        // Make sure the data directories exist on storage.
        String[] paths = new String[] { DATA_PATH, DATA_PATH_TESSDATA };
        for (String path : paths) {
            File dir = new File(path);
            if (!dir.exists()) {
                if (!dir.mkdirs()) {
                    Log.v(TAG, "ERROR: Creation of directory " + path + " on SDCard failed");
                    return;
                } else {
                    Log.v(TAG, "Created directory " + path + " on SDCard");
                }
            }
        }

        // Copy each language's trained data out of the APK assets if it is not there yet.
        for (String lang : langs) {
            String traineddata_path = DATA_PATH_TESSDATA + lang + ".traineddata";
            String asset_tessdata = "tessdata/" + lang + ".traineddata";
            if (!(new File(traineddata_path)).exists()) {
                try {
                    AssetManager assetManager = getAssets();
                    InputStream in = assetManager.open(asset_tessdata);
                    OutputStream os = new FileOutputStream(traineddata_path);
                    // Transfer bytes from in to out.
                    byte[] buf = new byte[1024];
                    int len;
                    while ((len = in.read(buf)) > 0) {
                        os.write(buf, 0, len);
                    }
                    in.close();
                    os.close();
                    Log.v(TAG, "Copied " + lang + " traineddata");
                } catch (IOException e) {
                    Log.e(TAG, "Unable to copy " + lang + " traineddata " + e.toString());
                }
            }
        }
    }
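A natural place to call this helper is in onCreate(), before any TessBaseAPI instance is created. A sketch (the layout resource name is a placeholder):

    // In MainActivity:
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);  // placeholder layout name
        checkAndCopyFiles();                     // copy trained data to storage before OCR runs
    }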


Step four: Now you can use tess-two to do the OCR recognition; of course, you need to supply a picture...


    package com.example.homework;

    import android.graphics.Bitmap;
    import android.util.Log;

    import com.googlecode.tesseract.android.TessBaseAPI;

    public class TessTwoOCR {

        private static final String TAG = "TessTwoOCR";

        private TessBaseAPI ocr_eng;
        private TessBaseAPI ocr_chi;

        public TessTwoOCR() {
            Log.v(TAG, "BaseApi initializing...");
            // One engine per language, initialized with the data path from step three.
            ocr_eng = new TessBaseAPI();
            ocr_eng.setDebug(true);
            ocr_eng.init(MainActivity.DATA_PATH, MainActivity.LANG_EN);
            ocr_chi = new TessBaseAPI();
            ocr_chi.setDebug(true);
            ocr_chi.init(MainActivity.DATA_PATH, MainActivity.LANG_ZH);
        }

        public String doOCR(Bitmap bitmap, String lang) {
            String result = "";
            if (lang.equals(MainActivity.LANG_EN)) {
                ocr_eng.setImage(bitmap);
                result = ocr_eng.getUTF8Text();
            } else if (lang.equals(MainActivity.LANG_ZH)) {
                ocr_chi.setImage(bitmap);
                result = ocr_chi.getUTF8Text();
            } else {
                // unsupported language: return an empty result
            }
            return result.trim();
        }
    }
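Using the class is then a one-liner per image. For example, decoding an image file from storage and recognizing it as English (a sketch; the class name OcrUsageExample and the file path are placeholders):

    import android.graphics.Bitmap;
    import android.graphics.BitmapFactory;

    public class OcrUsageExample {
        // Decode an image from storage and run English OCR on it.
        // The file path here is only a placeholder for illustration.
        public static String recognizeSample() {
            Bitmap photo = BitmapFactory.decodeFile("/sdcard/DCIM/sample.jpg");
            TessTwoOCR ocr = new TessTwoOCR();
            return ocr.doOCR(photo, MainActivity.LANG_EN);
        }
    }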

I use the above code for both Chinese and English OCR recognition, because in my demo you can switch between recognizing Chinese and English.



Since the demo's source code is available, I have only described the whole process briefly. Please refer to the demo code when developing your own OCR recognition. Demo: https://github.com/weixsong/libra; the source code also contains a detailed description of how to use the project.


Below are the results of the OCR recognition; overall, the current results are quite good:
[Screenshots of the OCR recognition results]