Java Implementation of OCR Image Recognition Technology (I)

Source: Internet
Author: User


Recently I needed image recognition for a project, so I am recording what I learned here.
Many open source OCR tools are available, such as OCRE (OCR Easy), Clara OCR, OCRAD, and Tesseract-OCR. This post covers a Java implementation based on Tesseract-OCR; usage on Android will be covered later:

  • Installation of TESSERACT-OCR
  • Program Implementation
1. TESSERACT-OCR Installation

First download the .exe installer. The installer used here is tesseract-ocr-setup-3.05.01.exe. We recommend installing to the default path.

After installation, the installation directory contains a "tessdata" folder, which holds the language data files.
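Before running any recognition, it can help to confirm that the language you plan to use is actually present in "tessdata". The sketch below is a minimal illustration, not part of the original post: the class name TessDataCheck and its methods are hypothetical, and it assumes the language data files follow the usual "<lang>.traineddata" naming and that Tesseract sits in the default install path used later in this article.

```java
import java.io.File;

/** Hypothetical helper (not from the original post): checks for a Tesseract language file. */
final class TessDataCheck {

    /** Expected location of the data file for a language code such as "chi_sim". */
    static File languageFile(String tessPath, String lang) {
        // Assumption: language data is stored as <install dir>/tessdata/<lang>.traineddata
        return new File(new File(tessPath, "tessdata"), lang + ".traineddata");
    }

    /** True if the language data file exists as a regular file. */
    static boolean isInstalled(String tessPath, String lang) {
        return languageFile(tessPath, lang).isFile();
    }

    public static void main(String[] args) {
        // Assumed default install path; adjust if you installed elsewhere.
        String tessPath = "C:/Program Files (x86)/Tesseract-OCR";
        for (String lang : new String[] {"chi_sim", "eng"}) {
            System.out.println(lang + " installed: " + isInstalled(tessPath, lang));
        }
    }
}
```

If a language is reported as missing, copying the matching data file into "tessdata" is usually enough for the "-l" option to find it.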

2. Program Implementation

With Tesseract-OCR installed in the previous step, the code is shown directly below.

Core code: OCRHelper.java:

package com.liping.test;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

/**
 * Helper class for recognizing text in images.
 *
 * @author Felix Li
 * @create 2017-12-19
 */
public class OCRHelper {
    private final String LANG_OPTION = "-l";
    private final String EOL = System.getProperty("line.separator");

    /** Tesseract-OCR installation path */
    private String tessPath = "C://Program Files (x86)//Tesseract-OCR";
    // private String tessPath = new File("tesseract").getAbsolutePath();

    /**
     * @param imageFile the input image file
     * @return the recognized string
     */
    public String recognizeText(File imageFile) throws Exception {
        // Base name of the output file; Tesseract writes output.txt next to the image
        File outputFile = new File(imageFile.getParentFile(), "output");
        StringBuffer strB = new StringBuffer();

        List<String> cmd = new ArrayList<String>();
        cmd.add(tessPath + "\\tesseract");
        cmd.add(""); // placeholder for the image name, set below
        cmd.add(outputFile.getName());
        cmd.add(LANG_OPTION);
        cmd.add("chi_sim");
        // cmd.add("eng");

        ProcessBuilder pb = new ProcessBuilder();
        // Set this process builder's working directory to the image's folder.
        pb.directory(imageFile.getParentFile());
        cmd.set(1, imageFile.getName());
        pb.command(cmd);
        pb.redirectErrorStream(true);

        long startTime = System.currentTimeMillis();
        System.out.println("Start Time: " + startTime);
        Process process = pb.start();
        // Equivalent command line: tesseract.exe 1.jpg 1 -l chi_sim
        // Using Runtime instead of ProcessBuilder gives the same result:
        // Runtime.getRuntime().exec("tesseract.exe 1.jpg 1 -l chi_sim");
        // System.out.println(cmd.toString());

        // Exit value of the process; by convention, 0 indicates normal termination.
        int w = process.waitFor();
        if (w == 0) {
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    new FileInputStream(outputFile.getAbsolutePath() + ".txt"), "UTF-8"));
            String str;
            while ((str = in.readLine()) != null) {
                strB.append(str).append(EOL);
            }
            in.close();

            long endTime = System.currentTimeMillis();
            System.out.println("End Time: " + endTime);
            System.out.println("Time consumed: " + (endTime - startTime) + " milliseconds");
        } else {
            String msg;
            switch (w) {
                case 1:
                    msg = "Errors accessing files. There may be spaces in your image's filename.";
                    break;
                case 29:
                    msg = "Cannot recognize the image or its selected region.";
                    break;
                case 31:
                    msg = "Unsupported image format.";
                    break;
                default:
                    msg = "Errors occurred.";
            }
            throw new RuntimeException(msg);
        }
        // Remove the temporary output file and strip all whitespace from the result.
        new File(outputFile.getAbsolutePath() + ".txt").delete();
        return strB.toString().replaceAll("\\s*", "");
    }
}
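The command that OCRHelper assembles has the shape "tesseract <image> <output base> -l <language>". As a minimal sketch of that step on its own, the helper below builds the same argument list as a pure function so it can be inspected without launching a process. The class TessCommand and its build method are hypothetical names introduced for illustration; joining several language codes with "+" (for example chi_sim+eng) is a Tesseract command-line feature, but it is not something the original code uses.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not from the original post): builds the tesseract argument list. */
final class TessCommand {

    /**
     * Builds: <tessPath>/tesseract <imageName> <outputBase> [-l lang1+lang2...]
     * Tesseract appends ".txt" to outputBase when writing the result.
     */
    static List<String> build(String tessPath, String imageName, String outputBase, String... langs) {
        List<String> cmd = new ArrayList<>();
        cmd.add(new File(tessPath, "tesseract").getPath());
        cmd.add(imageName);
        cmd.add(outputBase);
        if (langs.length > 0) {
            cmd.add("-l");
            // Several languages are passed as one argument joined by '+'.
            cmd.add(String.join("+", langs));
        }
        return cmd;
    }

    public static void main(String[] args) {
        // Assumed install path and file names, for illustration only.
        List<String> cmd = build("C:/Program Files (x86)/Tesseract-OCR", "1.jpg", "1", "chi_sim");
        System.out.println(cmd);
        // The list can be handed to ProcessBuilder exactly as OCRHelper does:
        // new ProcessBuilder(cmd).directory(imageDir).redirectErrorStream(true).start();
    }
}
```

Passing the arguments as a list rather than one command string also means an install path containing spaces does not need manual quoting.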

Test.java:

package com.liping.test;

import java.io.File;
import java.io.IOException;

/**
 * Main test class.
 *
 * @author Felix Li
 * @create 2017-12-19
 */
public class Test {
    public static void main(String[] args) {
        try {
            // The image file to be recognized
            File file = new File("C://Program Files (x86)//Tesseract-OCR//shouye.png");
            String recognizeText = new OCRHelper().recognizeText(file);
            System.out.print(recognizeText + "\t");
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

That covers using Tesseract-OCR from Java.
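One detail worth noting: recognizeText removes every whitespace character from the result, which suits Chinese text (chi_sim) but would run English words together if you switch to "eng". The sketch below contrasts the two clean-up styles. OcrTextCleaner and both method names are hypothetical and not part of the original code; treat this as one possible adjustment rather than the author's approach.

```java
/** Hypothetical helper (not from the original post): two ways to tidy OCR output. */
final class OcrTextCleaner {

    /** Same effect as the original code: every whitespace character is removed. */
    static String stripAllWhitespace(String text) {
        return text.replaceAll("\\s+", "");
    }

    /** Alternative for space-separated languages: runs of whitespace become one space. */
    static String collapseWhitespace(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // Sample string standing in for raw OCR output.
        String raw = "  Hello \n  world \r\n";
        System.out.println("[" + stripAllWhitespace(raw) + "]");
        System.out.println("[" + collapseWhitespace(raw) + "]");
    }
}
```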

Notes
Some characters read from the image may be recognized incorrectly. This comes from the language data used for recognition. You can train the language data to improve recognition accuracy and speed.
If you have any questions, please leave a message for discussion!

Copyright Disclaimer: This article is an original article by the blogger and cannot be reproduced without the permission of the blogger.
