results, whichever... The results are worrying. Fortunately, there are ways to improve the recognition rate. To improve digit recognition, restrict the range of characters to recognize: locate tessdata\configs in the installation directory and open the digits file with a text editor. On my machine it is installed at D:\Program Files (x86)\tesseract-ocr\tessdata\configs\digits. You will see the following line; we only need to recognize digits
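The same kind of restriction can be put in a config file of your own rather than in the stock digits file. The sketch below writes a one-line config that sets tessedit_char_whitelist, the Tesseract variable that limits which characters the engine may output. The class name WhitelistConfig and its method names are illustrative assumptions, not part of the original article.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: produces a Tesseract config file that restricts output characters.
final class WhitelistConfig {

    /** Returns one config line in the "variable value" form used by Tesseract config files. */
    static String configLine(String allowedChars) {
        if (allowedChars == null || allowedChars.isEmpty()) {
            throw new IllegalArgumentException("allowedChars must not be empty");
        }
        return "tessedit_char_whitelist " + allowedChars;
    }

    /** Writes the config line to a file chosen by the caller and returns that path. */
    static Path write(Path configFile, String allowedChars) throws IOException {
        String content = configLine(allowedChars) + System.lineSeparator();
        return Files.write(configFile, content.getBytes(StandardCharsets.UTF_8));
    }
}
```

A file written this way and saved under tessdata\configs would be named as the last argument on the tesseract command line, in the same position where the built-in digits config is given.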
recognition. Next, the other parameters. Parameter 3: -l. Parameter 4: the language library to use. Parameter 3, -l, tells Tesseract that parameter 4 names the language library. The default is English, which is why the example above recognized English text without parameters 3 and 4 being supplied. Our experiment continues: we prepared a picture, then ran tesseract zhongwen.jpg 7-
Continuing from the previous OCR article: it introduced simple use of tesseract on the command line. To integrate it into our own program, of course, we need a code implementation. Here is a sample Java implementation to share. Take the code to scan the image a
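One straightforward way to call Tesseract from Java is to launch the command-line program as a child process. The sketch below assumes the tesseract executable is on the PATH. The class name TesseractCli and its methods are illustrative and are not the sample the article goes on to share.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the tesseract command-line program.
final class TesseractCli {

    /** Builds: tesseract <image> <outputBase> [-l <lang>] [configs...] */
    static List<String> buildCommand(String image, String outputBase, String lang, String... configs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");   // assumption: the executable is on the PATH
        cmd.add(image);
        cmd.add(outputBase);    // tesseract appends ".txt" to this name itself
        if (lang != null && !lang.isEmpty()) {
            cmd.add("-l");      // omitted entirely when no language is given (English default)
            cmd.add(lang);
        }
        cmd.addAll(Arrays.asList(configs)); // e.g. "digits"
        return cmd;
    }

    /** Runs tesseract and returns the recognized text read from <outputBase>.txt. */
    static String recognize(String image, String outputBase, String lang, String... configs)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(image, outputBase, lang, configs))
                .inheritIO()    // show tesseract's own messages on the console
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        return new String(Files.readAllBytes(Paths.get(outputBase + ".txt")), StandardCharsets.UTF_8);
    }
}
```

For example, TesseractCli.recognize("zhongwen.jpg", "out", "chi_sim") would run the Simplified Chinese language data against the picture, provided chi_sim.traineddata is present in tessdata.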
Java verification code recognition: training samples with jTessBoxEditorFX and Tesseract-OCR. Tool preparation: jTessBoxEditorFX download: https://github.com/nguyenq/jTessBoxEditorFX. Tesseract-OCR download: https://sourceforge.net/projects/tesseract-ocr/. Main steps:
jTessBoxEditorFX, tesseract-ocr (environment variable configu
The links below contain the installation package, the jar packages the program needs to run, and the Chinese language pack. How to use the Chinese pack: find the tessdata directory of the installation (on my machine: C:\Program Files (x86)\tesseract-ocr\tessdata) and replace eng.traineddata with chi_sim.traineddata by renaming chi_sim.traineddata to eng.traineddata. Resource bundle: HTTP://PAN.BAIDU.COM/S/1DFC0EM1. For the code, please refer to: ht
Optical character recognition (OCR) refers to the process of scanning text material and then analyzing and processing the image files to obtain the text and layout information. OCR technology is highly specialized; it is used mostly by practitioners in the printing industry, who can quickly convert paper documents into electronic data with it. For Chinese OCR, the leading domestic products come from Tsinghua Wen Tong, Han Wang, and Shang Shu; the products differ from one another, and none is cheap.
Tesseract installation, tesseract
[1] Direct installation. 1) On Ubuntu 14.04 you can install the release package tesseract-ocr directly: sudo apt-get install tesseract-ocr. With this method the data files are in /usr/share/tesseract-ocr/tessdata, with the program under /usr/bin (The eng p
To use the Tesseract library in VS, you must use a DLL and lib that were compiled with the corresponding VS version. For example, in VS 2013 you must use a Tesseract library compiled with VS 2013. Here is a Tesseract library that compiles under VS 2013: http://pan.baidu.com/s/1o7JqXmU. After extracting, the contents are as follows. With the
tool jTessBoxEditor to train on samples and improve our accuracy.
2. Tesseract training:
The general process is: install jTessBoxEditor -> get sample files -> merge sample files -> generate the box file -> define the character configuration file -> correct characters -> run the batch file -> copy the generated traineddata into tessdata. Installing jTessBoxEditor
Download jTessBoxEditor from https://sourceforge.net/projects/vietocr/files/jTessBoxEditor/; after decompres
Using the jTessBoxEditor tool to train Tesseract 3.02.02 on samples and improve the verification code recognition rate; Tesseract training samples. 1. Background
The previous article briefly introduced the installation and basic use of the Tesseract OCR engine. It mentioned that using the -l eng parameter to restrict the language library can improve recognition accuracy and efficiency.
This article will condu
/tesseract-ocr/tessdata/4.00/chi_sim.traineddata. Traditional Chinese language pack: https://github.com/tesseract-ocr/tessdata/raw/4.0/chi_tra.traineddata. Step two: install. Run the downloaded tesseract-ocr-setup-4.00.00dev.exe and click Next through the installer. Step three: configure environment variables. Note: my system is Windows 7; other systems should be the sam
Once step 4 is completed, several files should have been generated in the directory. The four files unicharset, inttemp, normproto, and pffmtable must be renamed so that each carries the training name prefix "haijia.".
14. Run "combine_tessdata haijia." on the command line to merge the generated files into the haijia.traineddata training file. When this step completes, a haijia.traineddata file should appear in the folder. This is the trained data file used for recognition. You only need this haijia file. You can
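Before running combine_tessdata it can help to confirm that the renamed files are all in place. The sketch below lists the expected file names for a training name and reports which of them are missing from a directory. It covers only the four files the excerpt names, spelled as the training tools emit them (pffmtable). The class and method names are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-flight check for the files that combine_tessdata merges.
final class TrainingFiles {

    // The four component files named in the excerpt.
    private static final List<String> COMPONENTS =
            Arrays.asList("unicharset", "inttemp", "normproto", "pffmtable");

    /** Returns the prefixed names, e.g. "haijia.unicharset" for training name "haijia". */
    static List<String> expectedNames(String trainingName) {
        List<String> names = new ArrayList<>();
        for (String component : COMPONENTS) {
            names.add(trainingName + "." + component);
        }
        return names;
    }

    /** Returns the expected files that do not exist as regular files in dir. */
    static List<String> missing(Path dir, String trainingName) {
        List<String> missing = new ArrayList<>();
        for (String name : expectedNames(trainingName)) {
            if (!Files.isRegularFile(dir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

An empty result from TrainingFiles.missing(dir, "haijia") suggests the directory is ready for the combine_tessdata step.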
2008 - v2.03. 2.02 was unrunnable, due to a last-minute 'simple' change. 2.03 fixes the problem and also adds an include check for leptonica to make it more usable. Tesseract Release Notes April 2008 - v2.02
Improvements to clustering, training and classifier.
Major internationalization improvements for large-character-set languages, eg Kannada.
Removed some compiler warnings.
Added MultiPage TIFF support for training and running.
Updated graphics output to the new
1. What is Tesseract-OCR? An OCR engine that was developed at HP Labs between 1985 and 1995 ... and now at Google. It is an open-source recognition engine based on the Leptonica (http://leptonica.com/) image processing library. It supports the Linux, Windows, and Mac platforms, and .NET, C++, Python, Java, and other development languages: https://code.google.com/p/t
Reprinted from: http://www.jianshu.com/p/a53c732d8da3. Tesseract-OCR Learning Series (c): Simple example. Tesseract API basic example using CMake configuration. Reference document: https://github.com/tesseract-ocr/tesseract/wiki/APIExample. The API provided by Tesseract can be found in the baseapi.h file. However, if there ar
Java code:
for (int y = minY; y < height; y++) {          // height, width: image bounds (assumed names)
    for (int x = minX; x < width; x++) {
        int rgb = buffImg.getRGB(x, y);
        Color color = new Color(rgb);          // R, G, B components are obtained from the packed int value
        int value = 255 - color.getBlue();
        if (value > average) {
            Color newColor = new Color(0, 0, 0);
            buffImg.setRGB(x, y, newColor.getRGB());
        } else {
            Color newColor = new Color(255, 255, 255);
            buffImg.setRGB(x, y, newColor.getRGB());
        }
    }
}
Results. 3. Next is binarization: take the average grayscale of the picture as the threshold; everything below it becomes 0 and everything above it becomes 255. Java code:
for (int y = minY; y < height; y++) {          // height, width: image bounds (assumed names)
    for (int x = minX; x < width; x++) {
        int rgb = buffImg.getRGB(x, y);
        Color color = new Color(rgb);          // R, G, B components are obtained from the packed int value
        int value = 255 - color.getBlue();
        if (value > average) {
            Color newColor = new Color(0, 0, 0);
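The average-threshold binarization that this excerpt describes can be sketched as a self-contained class. As in the excerpt, it assumes the image has already been converted to grayscale, so the blue channel alone stands in for brightness. The class name AverageThreshold and the way the average is computed are assumptions, since the excerpt does not show where its average variable comes from.

```java
import java.awt.image.BufferedImage;

// Hypothetical sketch of binarization with the image's own average as the threshold.
final class AverageThreshold {

    /** Inverted blue channel, as in the excerpt: higher means darker. */
    static int darkness(int rgb) {
        return 255 - (rgb & 0xFF);
    }

    /** Mean darkness over all pixels (assumed definition of the threshold). */
    static int averageDarkness(BufferedImage img) {
        long sum = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                sum += darkness(img.getRGB(x, y));
            }
        }
        return (int) (sum / ((long) img.getWidth() * img.getHeight()));
    }

    /** Pixels darker than average become black; all others become white. */
    static void binarize(BufferedImage img) {
        int average = averageDarkness(img);
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                boolean dark = darkness(img.getRGB(x, y)) > average;
                img.setRGB(x, y, dark ? 0xFF000000 : 0xFFFFFFFF); // opaque black or white
            }
        }
    }
}
```

A global average is a simple threshold; on pictures with uneven lighting or heavy noise it may separate text from background poorly, in which case a different threshold choice would be needed.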
As for the Tesseract recognition tool, Google provides a version called tesseract-android-tools, but there is also tess-two, which is very useful; here we use tess-two. Tesseract is implemented in C++ and requires encapsulation of
Installing Tesseract-OCR. Preparatory work: Build environment: gcc, gcc-c++, make (most machines already have these, so this step can be skipped):
yum install gcc gcc-c++ make
Dependent packages: autoconf automake libtool libjpeg-devel libpng-devel libtiff-devel zlib-devel leptonica (1.67 or later). 1. autoconf automake libtool libjpeg-devel libpng-devel libtiff-devel zlib-devel can be installed via yum:
yum install