Java OCR with Tesseract: Implementing Intelligent Character Recognition of Images in Java


Continuing the OCR topic: the previous article introduced basic use of Tesseract from the command line. To integrate it into our own program, though, we still need code. Here is a Java implementation to share.

[Image: the sample images to be recognized]

We will use the code to scan the image above and print the results.

The main idea is to have Java launch the tesseract executable as an external system process.

Here's the core code:

package com.zhy.test;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

import org.jdesktop.swingx.util.OS;

public class OCRHelper {
    private final String LANG_OPTION = "-l";
    private final String EOL = System.getProperty("line.separator");
    /**
     * Location of the Tesseract folder; I put it in the project's root directory.
     */
    private String tessPath = new File("tesseract").getAbsolutePath();

    /**
     * @param imageFile the image file to recognize
     * @return the recognized string
     */
    public String recognizeText(File imageFile) throws Exception {
        // Base name of the output file, created next to the image; tesseract appends ".txt"
        File outputFile = new File(imageFile.getParentFile(), "output");
        StringBuilder strB = new StringBuilder();
        List<String> cmd = new ArrayList<String>();
        if (OS.isWindowsXP()) {
            cmd.add(tessPath + "\\tesseract");
        } else if (OS.isLinux()) {
            cmd.add("tesseract");
        } else {
            cmd.add(tessPath + "\\tesseract");
        }
        cmd.add(""); // placeholder for the image file name, set below
        cmd.add(outputFile.getName());
        cmd.add(LANG_OPTION);
        // cmd.add("chi_sim");
        cmd.add("eng");
        ProcessBuilder pb = new ProcessBuilder();
        // Set this process builder's working directory to the image's folder
        pb.directory(imageFile.getParentFile());
        cmd.set(1, imageFile.getName());
        pb.command(cmd);
        pb.redirectErrorStream(true);
        Process process = pb.start();
        // Equivalent command line: tesseract.exe 1.jpg 1 -l chi_sim
        // Runtime.getRuntime().exec("tesseract.exe 1.jpg 1 -l chi_sim");
        // The exit value of the process. By convention, 0 indicates normal termination.
        int w = process.waitFor();
        if (w == 0) { // 0 means normal exit
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    new FileInputStream(outputFile.getAbsolutePath() + ".txt"), "UTF-8"));
            String str;
            while ((str = in.readLine()) != null) {
                strB.append(str).append(EOL);
            }
            in.close(); // close only after the whole file has been read
        } else {
            String msg;
            switch (w) {
            case 1:
                msg = "Errors accessing files. There may be spaces in your image's filename.";
                break;
            case 29:
                msg = "Cannot recognize the image or its selected region.";
                break;
            case 31:
                msg = "Unsupported image format.";
                break;
            default:
                msg = "Errors occurred.";
            }
            throw new RuntimeException(msg);
        }
        new File(outputFile.getAbsolutePath() + ".txt").delete();
        // Strip all whitespace (including line breaks) from the result
        return strB.toString().replaceAll("\\s*", "");
    }
}
The code is straightforward. The ProcessBuilder part in the middle is essentially the same as Runtime.getRuntime().exec("tesseract.exe 1.jpg 1 -l chi_sim"); I'm just not used to using Runtime.
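One practical difference between the two: ProcessBuilder receives the command as a list, so each argument reaches tesseract intact, while Runtime.exec(String) splits its single string on whitespace (its Javadoc says it uses a StringTokenizer). The sketch below only builds and compares the two argument lists; it does not launch tesseract. The class and method names (TesseractCommandSketch, buildCommand, tokenizeLikeRuntimeExec) are hypothetical, chosen for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TesseractCommandSketch {

    // Hypothetical helper: builds "tesseract <image> <outputBase> -l <lang>"
    // as separate list elements, the form ProcessBuilder expects.
    static List<String> buildCommand(String exe, String image, String outputBase, String lang) {
        List<String> cmd = new ArrayList<>();
        cmd.add(exe);
        cmd.add(image);      // stays one argument even if it contains spaces
        cmd.add(outputBase); // tesseract appends ".txt" to this base name
        cmd.add("-l");
        cmd.add(lang);
        return cmd;
    }

    // Hypothetical helper: splits a command line the way Runtime.exec(String)
    // does, on whitespace via StringTokenizer, with no quote handling.
    static List<String> tokenizeLikeRuntimeExec(String commandLine) {
        List<String> parts = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(commandLine);
        while (st.hasMoreTokens()) {
            parts.add(st.nextToken());
        }
        return parts;
    }

    public static void main(String[] args) {
        // prints [tesseract, my image.jpg, output, -l, eng]
        System.out.println(buildCommand("tesseract", "my image.jpg", "output", "eng"));
        // prints [tesseract, my, image.jpg, output, -l, eng]
        System.out.println(tokenizeLikeRuntimeExec("tesseract my image.jpg output -l eng"));
    }
}
```

The single-string form turns "my image.jpg" into two arguments. To actually run the command, the list would be passed to new ProcessBuilder(cmd), as the helper above does.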

Test code:

package com.zhy.test;

import java.io.File;

public class Test {
    public static void main(String[] args) {
        try {
            File testDataDir = new File("testdata");
            System.out.println(testDataDir.listFiles().length);
            int i = 0;
            for (File file : testDataDir.listFiles()) {
                i++;
                String recognizeText = new OCRHelper().recognizeText(file);
                System.out.print(recognizeText + "\t");
                // Start a new line after every 5 results
                if (i % 5 == 0) {
                    System.out.println();
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Output:


Compare it with the first picture: pretty much perfect, isn't it? Haha. Of course, if all you need is to read captchas (verification codes), the code above is enough. Next, let's cover a bit more image processing.



------------------------------------------------------------ Divider ------------------------------------------------------------

Of course, sometimes an image is so badly distorted or blurred that it is hard to recognize, so I'd like to introduce a noise-removal helper class. It works remarkably well; take a look first:

[Image: denoising results]

A close-up:


It is a single class that depends on no external jar, and it removes the interference lines from the image. Pretty powerful, right? Recognizing the cleaned image should then work better. Try the experiment yourself~
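For reference, the per-pixel steps in cleanImage (the code that follows) can be written out as formulas. This is a reading of that code, not a standard definition: c is one 8-bit channel value (R, G or B), k is the brightening offset (its value did not survive in the extracted source text, so it is left symbolic here), and T is the threshold returned by ostu():

```latex
c' = \min\bigl(255,\ \lfloor 1.1\,c + k \rfloor\bigr), \qquad c \in \{R, G, B\}

\text{gray} = \Bigl\lfloor \bigl(0.2973\,R'^{\,2.2} + 0.6274\,G'^{\,2.2} + 0.0753\,B'^{\,2.2}\bigr)^{1/2.2} \Bigr\rfloor

\text{pixel} =
\begin{cases}
\text{white}, & \text{gray} > T \\
\text{black}, & \text{gray} \le T
\end{cases}
```

Because the three weights sum to 1, a pixel with R' = G' = B' keeps that value as its gray level (up to floating-point rounding), and T is chosen from the image's own histogram rather than fixed in advance.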

Code:

package com.zhy.test;

import java.awt.Color;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

import javax.imageio.ImageIO;

public class ClearImageHelper {

    public static void main(String[] args) throws IOException {
        File testDataDir = new File("testdata");
        final String destDir = testDataDir.getAbsolutePath() + "/tmp";
        for (File file : testDataDir.listFiles()) {
            if (file.isFile()) { // skip the tmp output directory itself
                cleanImage(file, destDir);
            }
        }
    }

    /**
     * @param sfile   image to be denoised
     * @param destDir directory where the cleaned image is saved
     * @throws IOException
     */
    public static void cleanImage(File sfile, String destDir) throws IOException {
        File destF = new File(destDir);
        if (!destF.exists()) {
            destF.mkdirs();
        }

        BufferedImage bufferedImage = ImageIO.read(sfile);
        int h = bufferedImage.getHeight();
        int w = bufferedImage.getWidth();

        // Grayscale
        int[][] gray = new int[w][h];
        for (int x = 0; x < w; x++) {
            for (int y = 0; y < h; y++) {
                int argb = bufferedImage.getRGB(x, y);
                // Brighten the image (recognition works much better on bright images).
                // The "+ 30" offset is assumed: this constant was lost from the extracted text.
                int r = (int) (((argb >> 16) & 0xFF) * 1.1 + 30);
                int g = (int) (((argb >> 8) & 0xFF) * 1.1 + 30);
                int b = (int) (((argb >> 0) & 0xFF) * 1.1 + 30);
                if (r >= 255) {
                    r = 255;
                }
                if (g >= 255) {
                    g = 255;
                }
                if (b >= 255) {
                    b = 255;
                }
                gray[x][y] = (int) Math.pow((Math.pow(r, 2.2) * 0.2973 + Math.pow(g, 2.2) * 0.6274
                        + Math.pow(b, 2.2) * 0.0753), 1 / 2.2);
            }
        }

        // Binarize
        int threshold = ostu(gray, w, h);
        BufferedImage binaryBufferedImage = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        for (int x = 0; x < w; x++) {
            for (int y = 0; y < h; y++) {
                if (gray[x][y] > threshold) {
                    gray[x][y] |= 0x00FFFF;
                } else {
                    gray[x][y] &= 0xFF0000;
                }
                binaryBufferedImage.setRGB(x, y, gray[x][y]);
            }
        }

        // Print the result as a character matrix
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (isBlack(binaryBufferedImage.getRGB(x, y))) {
                    System.out.print("*");
                } else {
                    System.out.print(" ");
                }
            }
            System.out.println();
        }

        ImageIO.write(binaryBufferedImage, "jpg", new File(destDir, sfile.getName()));
    }

    public static boolean isBlack(int colorInt) {
        Color color = new Color(colorInt);
        // Cutoff 300 is assumed: this constant was lost from the extracted text
        return color.getRed() + color.getGreen() + color.getBlue() <= 300;
    }

    public static boolean isWhite(int colorInt) {
        Color color = new Color(colorInt);
        // Cutoff 300 is assumed: this constant was lost from the extracted text
        return color.getRed() + color.getGreen() + color.getBlue() > 300;
    }

    public static int isBlackOrWhite(int colorInt) {
        // Lower cutoff 30 is assumed (lost from the extracted text); 730 is from the source
        if (getColorBright(colorInt) < 30 || getColorBright(colorInt) > 730) {
            return 1;
        }
        return 0;
    }

    public static int getColorBright(int colorInt) {
        Color color = new Color(colorInt);
        return color.getRed() + color.getGreen() + color.getBlue();
    }

    public static int ostu(int[][] gray, int w, int h) {
        int[] histData = new int[256]; // one bin per 8-bit gray level
        // Calculate histogram
        for (int x = 0; x < w; x++) {
            for (int y = 0; y < h; y++) {
                int red = 0xFF & gray[x][y];
                histData[red]++;
            }
        }
        // Total number of pixels
        int total = w * h;
        float sum = 0;
        for (int t = 0; t < 256; t++) {
            sum += t * histData[t];
        }
        float sumB = 0;
        int wB = 0;
        int wF = 0;
        float varMax = 0;
        int threshold = 0;
        for (int t = 0; t < 256; t++) {
            wB += histData[t]; // Weight background
            if (wB == 0) continue;
            wF = total - wB; // Weight foreground
            if (wF == 0) break;
            sumB += (float) (t * histData[t]);
            float mB = sumB / wB; // Mean background
            float mF = (sum - sumB) / wF; // Mean foreground
            // Calculate between-class variance
            float varBetween = (float) wB * (float) wF * (mB - mF) * (mB - mF);
            // Check if new maximum found
            if (varBetween > varMax) {
                varMax = varBetween;
                threshold = t;
            }
        }
        return threshold;
    }
}
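The threshold step is Otsu's method (the helper's ostu() method). To check it in isolation, here is a standalone sketch of the same computation, separated from the image code. The names OtsuSketch and otsuThreshold are hypothetical; the sketch takes a ready-made 256-bin histogram and uses long/double accumulators instead of int/float, which is a choice of this sketch, not of the original:

```java
public class OtsuSketch {

    // Otsu's method over an 8-bit histogram: returns the t that maximizes the
    // between-class variance wB * wF * (mB - mF)^2, where pixels with value <= t
    // form the background class and pixels with value > t the foreground class.
    static int otsuThreshold(int[] hist) {
        long total = 0;
        double sum = 0;
        for (int t = 0; t < hist.length; t++) {
            total += hist[t];
            sum += (double) t * hist[t];
        }
        long wB = 0;          // background pixel count
        double sumB = 0;      // sum of background gray levels
        double varMax = 0;
        int threshold = 0;
        for (int t = 0; t < hist.length; t++) {
            wB += hist[t];
            if (wB == 0) continue;       // no background pixels yet
            long wF = total - wB;        // foreground pixel count
            if (wF == 0) break;          // no foreground pixels left
            sumB += (double) t * hist[t];
            double mB = sumB / wB;             // background mean
            double mF = (sum - sumB) / wF;     // foreground mean
            double varBetween = (double) wB * wF * (mB - mF) * (mB - mF);
            if (varBetween > varMax) {         // strict '>' keeps the first maximum
                varMax = varBetween;
                threshold = t;
            }
        }
        return threshold;
    }

    public static void main(String[] args) {
        int[] hist = new int[256];
        hist[40] = 100;  // dark "text" pixels
        hist[200] = 300; // light "background" pixels
        System.out.println(otsuThreshold(hist)); // prints 40
    }
}
```

With dark pixels at 40 and light pixels at 200, every t from 40 to 199 separates the two groups equally well; the strict comparison keeps the first, so 40 is returned, and in cleanImage only pixels brighter than that would become white.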


That's all. If you found this article useful, please give it a like~




