[Tesseract] Simple digit recognition with Tesseract
Image recognition normally draws on theory such as the Fourier transform, image morphology, filtering and matrix transformations.
Tesseract makes it possible to build a working recognizer quickly, without first mastering that theoretical background.
Preparation:
1. Collect sample images and preprocess them (about 20 samples per character on average).
2. Train on the samples and run an initial recognition pass.
3. Correct (calibrate) the training library.
Test:
1. Preprocess the images to be recognized.
2. Recognize them using the training library.
Example 1: Invert the colors of an image
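```csharp
using System.Drawing;

// Inverts the colors of an image pixel by pixel and saves the result to outName.
// Note: SetPixel is not supported for indexed-color images (e.g. 8-bit GIFs).
private static void Reverse(string fileName, string outName)
{
    using (var pic = (Bitmap)Image.FromFile(fileName))
    {
        for (int i = 0; i < pic.Width; i++)
        {
            for (int j = 0; j < pic.Height; j++)
            {
                var c = pic.GetPixel(i, j);
                // 255 - v flips each channel; the three-argument FromArgb sets alpha to 255.
                c = Color.FromArgb(255 - c.R, 255 - c.G, 255 - c.B);
                pic.SetPixel(i, j, c);
            }
        }
        pic.Save(outName);
    }
}
```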
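GetPixel and SetPixel are convenient but slow on large images. A common alternative is to copy the pixels out with Bitmap.LockBits and Marshal.Copy, process the raw array, and copy it back. The sketch below shows only the array step, assuming 32 bpp ARGB pixels packed one per int (the LockBits plumbing is omitted); PixelOps.InvertColors is a hypothetical helper name, not part of System.Drawing.

```csharp
using System;

public static class PixelOps
{
    // Inverts R, G and B of every 32-bit ARGB pixel in place, leaving alpha untouched.
    // For a channel value v in 0..255, 255 - v equals v XOR 0xFF, so XOR-ing the
    // whole pixel with 0x00FFFFFF flips all three color channels at once.
    public static void InvertColors(int[] argbPixels)
    {
        if (argbPixels == null) throw new ArgumentNullException(nameof(argbPixels));
        for (int i = 0; i < argbPixels.Length; i++)
        {
            argbPixels[i] ^= 0x00FFFFFF;
        }
    }
}
```

Unlike Reverse, this keeps the original alpha instead of forcing it to 255; for opaque images the result is the same.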
Example 2: Extract a specified region of an image
```csharp
using System.Configuration; // requires a reference to System.Configuration
using System.Drawing;

// For convenience, the method takes a file name rather than an Image.
private Image Analyse(string fileName)
{
    using (var map = Image.FromFile(fileName) as Bitmap)
    {
        if (map == null) return null;
        Point p1;
        Point p2;
        var p = GetConfig(out p1, out p2);
        var pic = new Bitmap(p.X, p.Y);
        var x = 0;
        var y = 0;
        for (int i = 0; i < map.Height; i++)
        {
            if (i >= p1.Y && i <= p2.Y)          // row lies inside the region
            {
                for (int j = 0; j < map.Width; j++)
                {
                    if (j >= p1.X && j <= p2.X)  // column lies inside the region
                    {
                        pic.SetPixel(x, y, map.GetPixel(j, i));
                        x++;
                    }
                }
                x = 0;
                y++;
            }
        }
        return pic;
    }
}

// Reads the two corners from App.config, e.g. <add key="p1" value="10,20" />.
private Point GetConfig(out Point p1, out Point p2)
{
    var p1Str = ConfigurationManager.AppSettings["p1"].Split(',');
    var p2Str = ConfigurationManager.AppSettings["p2"].Split(',');
    p1 = new Point { X = int.Parse(p1Str[0]), Y = int.Parse(p1Str[1]) };
    p2 = new Point { X = int.Parse(p2Str[0]), Y = int.Parse(p2Str[1]) };
    // Size of the output bitmap: the inclusive range needs +1, so +2 leaves one spare pixel.
    return new Point { X = p2.X - p1.X + 2, Y = p2.Y - p1.Y + 2 };
}

// Nested in the same class, so it takes precedence over System.Drawing.Point.
class Point
{
    /// <summary>
    /// X coordinate of the corner, or the width
    /// </summary>
    public int X { get; set; }
    /// <summary>
    /// Y coordinate of the corner, or the height
    /// </summary>
    public int Y { get; set; }
}
```
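Once the pixels are in a row-major array (pixel (x, y) at index y * width + x), the region copy in Analyse reduces to index arithmetic. A minimal sketch under that assumption, with both corners inclusive as in Analyse; RegionOps.CropRegion is a hypothetical name. Unlike Analyse, it returns an exact-size buffer (no spare pixel) and rejects out-of-range corners instead of clipping them.

```csharp
using System;

public static class RegionOps
{
    // Copies the inclusive rectangle (x1, y1)..(x2, y2) out of a row-major pixel buffer
    // and reports the width of the cropped region through cropWidth.
    public static int[] CropRegion(int[] src, int width, int x1, int y1, int x2, int y2, out int cropWidth)
    {
        if (src == null) throw new ArgumentNullException(nameof(src));
        if (width <= 0 || src.Length % width != 0)
            throw new ArgumentException("Buffer length must be a multiple of width.");
        int height = src.Length / width;
        if (x1 < 0 || y1 < 0 || x2 >= width || y2 >= height || x1 > x2 || y1 > y2)
            throw new ArgumentOutOfRangeException(nameof(src), "Corners are outside the image.");

        cropWidth = x2 - x1 + 1;   // inclusive range, so +1
        int cropHeight = y2 - y1 + 1;
        var dst = new int[cropWidth * cropHeight];
        for (int y = y1; y <= y2; y++)
        {
            // Each row of the region is contiguous in the source, so copy it in one call.
            Array.Copy(src, y * width + x1, dst, (y - y1) * cropWidth, cropWidth);
        }
        return dst;
    }
}
```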
Steps to build the recognition (training) library:
1. Convert the sample images to TIFF format, then use jTessBoxEditor to merge all of them into a single multi-page TIFF.
2. Install tesseract-ocr-setup-3.01-1.exe (with the installer version, you do not need to set environment variables).
3. At the command prompt, run: tesseract.exe {0}.tif {0} batch.nochop makebox to generate the box file ({0} is the file name).
4. Run: tesseract.exe {0}.tif {0} nobatch box.train to generate the .tr file.
5. Run: unicharset_extractor.exe {0}.box to generate the unicharset file.
6. In the same directory, create a file named font_properties with no extension, containing the line: {0} 1 0 0 1 0 (the font name followed by the italic, bold, fixed, serif and fraktur flags).
7. Run: cntraining.exe {0}.tr
8. Run: mftraining.exe -F font_properties -U unicharset {0}.tr
9. Add the prefix "{0}." to the names of the four generated files: unicharset, inttemp, normproto and pffmtable.
10. Run: combine_tessdata {0}. (note the trailing dot) to merge all the files into the training library {0}.traineddata.
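Steps 4 to 10 can be scripted once the box file from step 3 has been checked (for example in jTessBoxEditor). The sketch below is a hypothetical helper, not part of Tesseract: it assumes the Tesseract 3.01 training tools can be found on the PATH, that the .tif and .box files sit in workDir, and that a non-zero exit code means failure. It only runs the commands listed above.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

public static class TrainingScript
{
    // Commands for steps 4, 5, 7 and 8 as (executable, arguments); name plays the role of {0}.
    public static List<string[]> BuildCommands(string name)
    {
        return new List<string[]>
        {
            new[] { "tesseract.exe", name + ".tif " + name + " nobatch box.train" },
            new[] { "unicharset_extractor.exe", name + ".box" },
            new[] { "cntraining.exe", name + ".tr" },
            new[] { "mftraining.exe", "-F font_properties -U unicharset " + name + ".tr" },
        };
    }

    // Line written to font_properties in step 6.
    public static string FontPropertiesLine(string name) => name + " 1 0 0 1 0";

    // The four files renamed in step 9.
    public static readonly string[] FilesToPrefix = { "unicharset", "inttemp", "normproto", "pffmtable" };

    // Runs one tool in workDir and stops on a non-zero exit code (assumed to mean failure).
    static void Run(string exe, string args, string workDir)
    {
        var psi = new ProcessStartInfo(exe, args) { WorkingDirectory = workDir, UseShellExecute = false };
        using (var p = Process.Start(psi))
        {
            p.WaitForExit();
            if (p.ExitCode != 0)
                throw new InvalidOperationException(exe + " exited with code " + p.ExitCode);
        }
    }

    public static void Train(string name, string workDir)
    {
        var cmds = BuildCommands(name);
        Run(cmds[0][0], cmds[0][1], workDir);                       // step 4: .tr file
        Run(cmds[1][0], cmds[1][1], workDir);                       // step 5: unicharset
        File.WriteAllText(Path.Combine(workDir, "font_properties"), // step 6
                          FontPropertiesLine(name) + Environment.NewLine);
        Run(cmds[2][0], cmds[2][1], workDir);                       // step 7
        Run(cmds[3][0], cmds[3][1], workDir);                       // step 8
        foreach (var f in FilesToPrefix)                            // step 9
        {
            var dest = Path.Combine(workDir, name + "." + f);
            if (File.Exists(dest)) File.Delete(dest);               // allow re-runs
            File.Move(Path.Combine(workDir, f), dest);
        }
        Run("combine_tessdata.exe", name + ".", workDir);           // step 10: name.traineddata
    }
}
```

Keeping the command strings in BuildCommands separate from Train makes it easy to print or check them before anything is executed.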
Code steps:
1. Unzip tesseractdotnet_v3020.r590.zip and add a reference to tesseract.dll.
2. Initialize the processor and recognize the image:
```csharp
var tp = new TesseractProcessor();
// Treat the image as a single line of text.
tp.SetPageSegMode(ePageSegMode.PSM_SINGLE_LINE);
// {0} is the path name, e.g. c:\test\; {1} is the name of the training library file, without extension.
tp.Init("{0}", "{1}", (int)eOcrEngineMode.OEM_DEFAULT);
// pic is the image to recognize.
var result = tp.Recognize(pic);
```
3. Set the project's platform target to x86.
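If the library was trained only on digits, it can still help to post-process the recognized text, for example to drop a trailing line break or stray spaces. A minimal sketch, assuming result is the recognized text as a string; DigitFilter.KeepDigits is a hypothetical helper. Silently dropping characters can hide recognition errors, so you may prefer to flag unexpected characters instead.

```csharp
using System.Text;

public static class DigitFilter
{
    // Keeps only the ASCII digits 0-9 from OCR output (drops spaces, line breaks, etc.).
    public static string KeepDigits(string text)
    {
        if (string.IsNullOrEmpty(text)) return string.Empty;
        var sb = new StringBuilder(text.Length);
        foreach (char ch in text)
        {
            if (ch >= '0' && ch <= '9') sb.Append(ch);
        }
        return sb.ToString();
    }
}
```

Usage: var digits = DigitFilter.KeepDigits(result);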