Simple Use and Training of Tesseract-OCR
Tesseract is an open-source OCR (optical character recognition) engine developed by HP Labs and maintained by Google. Compared with Microsoft Office Document Imaging (MODI), we can continue to train the library, so
Update: now on version w64-v4.0.0, still experimenting. Download links are attached.
Tesseract download address: digi.bib.uni-mannheim.de/tesseract/
Tesseract-OCR package download address: github.com/tesseract-ocr/tesseract/wiki/data
indicates the name of the output .txt file, and eng indicates that the English language file is used for recognition.
3. Open the result.txt file in the tesseract-ocr directory and check the recognition result, which is 7542315857. Three characters are recognized incorrectly, so the recognition rate is not very high. Is there any way to improve the recognition rate?
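The recognition rate discussed in step 3 can be expressed as the share of characters recognized correctly. The following is a minimal Python sketch, assuming a position-by-position comparison against a known correct string. The `truth` value is hypothetical, chosen only so that three positions differ, because the text of the original image is not given here; `char_accuracy` is likewise an illustrative name.

```python
def char_accuracy(recognized: str, truth: str) -> float:
    """Fraction of positions where the OCR output matches the ground truth.

    Assumes the two strings are aligned; any length difference counts as errors.
    """
    if not recognized and not truth:
        return 1.0
    matches = sum(r == t for r, t in zip(recognized, truth))
    return matches / max(len(recognized), len(truth))


if __name__ == "__main__":
    recognized = "7542315857"  # content of result.txt from step 3
    truth = "7642375851"       # hypothetical ground truth, 3 positions differ
    print(f"{char_accuracy(recognized, truth):.0%}")
```

With three wrong characters out of ten, this measure comes to 70%. A plain positional comparison is the simplest choice; an edit-distance measure would be more robust when characters are dropped or inserted.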
For OCR verification code recognition, see another article on this blog:
Two Common OCR Verification Code Recognition Methods and Practical Experience
This article is a further technical note on Tesseract-OCR, covering what to do if the default recognition rate is relatively low.
Don't worry, Tesseract-OCR's own tools provide a way for us
Java verification code recognition: training samples with jTessBoxEditorFX and Tesseract-OCR
Tool preparation:
jTessBoxEditorFX download: https://github.com/nguyenq/jTessBoxEditorFX
Tesseract-OCR download: https://sourceforge.net/projects/tesseract-ocr/
Main steps:
jTessBoxEditor
To improve the recognition rate of the Tesseract Chinese character library, it can be trained.
1. Install Tesseract first. Be sure to use the installer here, because the installed program includes the additional training programs; the compiled version does not contain these tools.
2. Download the jTessBoxEditor tool. This tool is written in Java and requires the JRE to run. T
Optical character recognition (OCR) refers to the process of scanning text material and then analyzing and processing the resulting image files to obtain the text and layout information. OCR is a fairly specialized technology, used mostly by practitioners in the printing industry, and it can quickly convert paper documents into electronic data. About Chinese OC
Tesseract is an open-source OCR engine released under the Apache License 2.0. Here's how to compile Tesseract on the Android platform and how to quickly create a simple OCR application. Original reference: Making an Android OCR application with
The first step is to download all the relevant code; GitHub is the most convenient source: https://github.com/tesseract-ocr/tesseract
Point 1: CPPAN, a C++ package manager, is very convenient. It has to be reached through a proxy to get past the firewall, and so does the installation package. It deserves to become popular and will surely catch on, because it makes installing C++ dependencies on Windows as easy as on Linux, and it is a cross-platform solution as well!
Paste the code first:
#1. Install tesseract-ocr*.exe from http://jaist.dl.sourceforge.net/project/tesseract-ocr-alt/tesseract-ocr-setup-3.02.02.exe
#2. Install Pillow as "pip install" from *.whl
#3. Install pytesseract as "pip install
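Once Tesseract, Pillow, and pytesseract are installed, a recognition call might look like the following minimal sketch. It assumes the `tesseract` executable is on PATH; the file name `captcha.png` and the helpers `recognize` and `digits_only` are hypothetical names introduced for illustration and are not taken from the original code.

```python
def digits_only(text: str) -> str:
    """Keep only the digit characters of an OCR result (hypothetical helper)."""
    return "".join(ch for ch in text if ch.isdigit())


def recognize(image_path: str, lang: str = "eng") -> str:
    """Run Tesseract on an image file and return the raw recognized text."""
    # Imported lazily so that digits_only works without the OCR stack installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang)


if __name__ == "__main__":
    raw = recognize("captcha.png")  # hypothetical file name
    print(digits_only(raw))
```

Filtering the output to digits is only useful when the verification code is known to be numeric; for mixed codes the raw text would be kept instead.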
Reprinted from: http://www.jianshu.com/p/a53c732d8da3
Tesseract-OCR Learning Series (c): Simple Example
Tesseract API basic example using CMake configuration
Reference document: https://github.com/tesseract-ocr/tesseract/wiki/APIExample
The API provided by
times the confidence is higher, yet the recognition result is wrong.
After these steps, you can perform OCR in Japanese. For the above code to run successfully, however, you must also install the VC++ 2012 runtime; otherwise it will fail with an error.
I tested scanned images with the above method and found the recognition accuracy to be relatively high, especially after specifying the region and the PageSegMode parameter. But the Japanese font also has some low-level errors
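Restricting recognition to a region and setting the page segmentation mode can also be tried from Python. This is a minimal sketch and not the code the excerpt refers to: it assumes pytesseract and Pillow are installed, a Tesseract 4.x executable (which accepts `--psm`; 3.x releases used `-psm`), and an installed `jpn` language pack. The names `psm_config` and `recognize_region` are hypothetical.

```python
def psm_config(psm: int) -> str:
    """Build the page-segmentation option string for Tesseract 4.x."""
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    return f"--psm {psm}"


def recognize_region(image_path, box, lang="jpn", psm=6):
    """OCR only the rectangle box = (left, upper, right, lower) of an image."""
    # Imported lazily: these packages are needed only when OCR actually runs.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        region = img.crop(box)
        return pytesseract.image_to_string(
            region, lang=lang, config=psm_config(psm)
        )
```

A call such as `recognize_region("scan.png", (0, 0, 400, 120), psm=7)` would treat the cropped area as a single text line; the file name and coordinates here are placeholders.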
\tessdata after decompression:
tesseract yourpic.png eng (use the default eng language pack)
tesseract yourpic.png sim -l chi_sim (use the chi_sim language pack)
tesseract yourpic.png tra -l chi_tra (use the chi_tra language pack)
Select the result closest to the real data, which makes later correction easier.
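The commands in this excerpt appear to follow the pattern `tesseract IMAGE OUTPUT_BASE [-l LANG]`, where the output base name becomes the name of the resulting .txt file. Under that reading, the three runs can be scripted as in this minimal sketch; `tesseract_command` and `run_all` are hypothetical helper names, and the `tesseract` executable is assumed to be on PATH with the language packs installed.

```python
import subprocess


def tesseract_command(image, output_base, lang=None):
    """Build the argument list for `tesseract IMAGE OUTPUT_BASE [-l LANG]`."""
    cmd = ["tesseract", image, output_base]
    if lang is not None:
        cmd += ["-l", lang]  # omit -l to use the default eng language pack
    return cmd


def run_all(image="yourpic.png"):
    """Run one recognition per language pack; each writes <output_base>.txt."""
    runs = [("eng", None), ("sim", "chi_sim"), ("tra", "chi_tra")]
    for output_base, lang in runs:
        subprocess.run(tesseract_command(image, output_base, lang), check=True)


if __name__ == "__main__":
    run_all()
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.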
3. Advanced use: training
A handful of trainin
Installing Tesseract-OCR
Preparatory work:
Compilation environment: gcc gcc-c++ make (most machines already have these, so this step can usually be skipped)
yum install gcc gcc-c++ make
Dependencies: autoconf automake libtool libjpeg-devel libpng-devel libtiff-devel zlib-devel leptonica (1.67 or later)
1. autoconf, automake, libtool, libjpeg-devel, libpng-devel, libtiff-devel, and zlib-devel can be installed via yum:
yum install
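Before installing anything, it can help to see which of the build tools named above are already present. The following minimal sketch only checks for executables on PATH; it does not detect the `-devel` libraries or Leptonica. The helper name `missing_tools` is hypothetical, and the tool list assumes that the gcc-c++ package provides the `g++` executable.

```python
import shutil


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    build_tools = ["gcc", "g++", "make", "autoconf", "automake", "libtool"]
    missing = missing_tools(build_tools)
    print("missing:", ", ".join(missing) if missing else "none")
```

Anything reported as missing would then be installed through yum, as described in the text.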
Using the jTessBoxEditor tool to train Tesseract 3.02.02 on samples and improve the verification code recognition rate
1. Background
The previous article briefly introduced the installation and basic use of the Tesseract OCR engine. It mentioned th
First of all, I have to admit that my interest in Tesseract-OCR was sparked by the catchy title of the following article, which cracks a website verification code in 26 lines of Groovy code:
http://www.kellyrob99.com/blog/2010/03/14/breaking-weak-captcha-in-slightly-more-than-26-lines-of-groovy-code/
Of course, after reading it I realized that it actually calls a third-party library, Tesseract-
Introduction to the Tesseract OCR engine and its installation for Python
1. Introduction to Tesseract
Tesseract is an open-source OCR project supported by Google. Its project address is https://github.com/tesseract-
Using Tesseract-OCR to crack website verification codes
Blog category:
Image recognition, machine learning, data mining
Groovy, HP, Google, blog
First, I have to admit that my interest in Tesseract-OCR was sparked by the catchy title of the following article, which cracks a website verification code in 26 lines of Groovy code: http://www.