Chinese OCR
1.1. First, download the command-line tools, the source code, and the Chinese language pack from the Tesseract project home page.
1.2. Unzip the command-line tools; the extracted directory looks as follows (1.jpg and 1.txt are not included):
1.3. For Chinese OCR, copy the Simplified Chinese language pack into the "tessdata" directory.
1.4. In a DOS prompt, switch to the Tesseract command-line directory and look at the tess
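As a sketch of step 1.4, the same command-line call can be driven from Python. This assumes the Tesseract 3.x syntax `tesseract imagename outputbase -l lang`, where Tesseract itself appends `.txt` to the output base; `build_tesseract_cmd` and `run_ocr` are hypothetical helper names, not part of Tesseract.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_cmd(image_path, output_base, lang="chi_sim", exe="tesseract"):
    """Return the argv list for a Tesseract 3.x command-line run.

    Tesseract writes its result to <output_base>.txt; the '.txt'
    suffix is added by Tesseract itself, so it is not passed here.
    """
    return [exe, str(image_path), str(output_base), "-l", lang]


def run_ocr(image_path, output_base, lang="chi_sim"):
    """Run Tesseract and return the recognized text (needs tesseract on PATH)."""
    exe = shutil.which("tesseract")
    if exe is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_cmd(image_path, output_base, lang, exe), check=True)
    # Tesseract writes its text output as UTF-8.
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")
```

For example, `build_tesseract_cmd("1.jpg", "1")` gives `["tesseract", "1.jpg", "1", "-l", "chi_sim"]`, which would write the result to 1.txt.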
Parsing it with the tesseract command in Terminal reports the following error (although it works from Python):
Tesseract Open Source OCR Engine v3.03 with Leptonica
Error in pixReadStreamPng: function not present
Error in pixReadStream: png: no pix returned
Error in pixRead: pix not read
Error in pixGetInputFormat: pix not defined
Reading ./Test/Python/t2.png as a list of filenames...
Error in fopenReadStream: file
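The `pixReadStreamPng: function not present` message comes from Leptonica and usually means the Leptonica library linked into tesseract was built without libpng support. A quick check is sketched below. It assumes your build's `--version` output lists the linked image libraries on one line separated by ` : ` (3.04-era builds print such a line; check what yours prints). `linked_image_libs` and `has_png_support` are hypothetical names.

```python
import subprocess


def linked_image_libs(version_text):
    """Collect library names such as 'libpng' from tesseract version output.

    Assumes the image libraries appear on one line separated by ':', e.g.
    'libgif 5.1.4 : libjpeg 8d : libpng 1.6.20'. Only names starting with
    'lib' are kept (so 'zlib' is ignored).
    """
    libs = set()
    for line in version_text.splitlines():
        for part in line.split(":"):
            words = part.split()
            if words and words[0].startswith("lib"):
                libs.add(words[0])
    return libs


def has_png_support(exe="tesseract"):
    """Run `<exe> --version` and report whether libpng is listed."""
    # Some builds print the version banner to stderr, so read both streams.
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return "libpng" in linked_image_libs(result.stdout + result.stderr)
```

If `has_png_support()` returns False, rebuilding Leptonica with libpng-devel installed (one of the dependencies listed in the installation section) is the usual fix.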
Installing TESSERACT-OCR
Preparatory work:
Compilation environment: gcc, gcc-c++, make (most machines already have these, so this step can usually be skipped)
yum install gcc gcc-c++ make
Dependent packages: autoconf, automake, libtool, libjpeg-devel, libpng-devel, libtiff-devel, zlib-devel, and Leptonica (1.67 or later).
1. autoconf, automake, libtool, libjpeg-devel, libpng-devel, libtiff-devel, and zlib-devel can be installed via yum:
yum install
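Before compiling, it can help to check which of these tools are already installed. The sketch below uses Python's `shutil.which` to look for the executables; the -devel packages and Leptonica are libraries rather than commands, so they are not checked. `missing_tools` and `yum_command` are hypothetical helpers, and mapping `g++` to the gcc-c++ package is an assumption about yum-based distributions.

```python
import shutil

# Executables from the compilation-environment and dependency lists above.
# The -devel packages and Leptonica are libraries, so they are not checked here.
BUILD_TOOLS = ["gcc", "g++", "make", "autoconf", "automake", "libtool"]


def missing_tools(tools=BUILD_TOOLS):
    """Return the entries of `tools` that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def yum_command(tools):
    """Build a yum install command line for the given missing tools."""
    # Assumption: on yum-based distributions g++ is provided by gcc-c++.
    packages = sorted({"gcc-c++" if tool == "g++" else tool for tool in tools})
    return ["yum", "install", "-y", *packages]


if __name__ == "__main__":
    missing = missing_tools()
    print(" ".join(yum_command(missing)) if missing else "All build tools found.")
```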
To use the Tesseract library in Visual Studio (VS), you must use a DLL and .lib compiled with the matching VS version. For example, in VS 2013 you must use a Tesseract library compiled with VS 2013. Here is a Tesseract library built with VS 2013: http://pan.baidu.com/s/1o7JqXmU
After extracting it, the contents look as follows. With the
Simple use and training of TESSERACT-OCR
Tesseract is an open-source OCR (Optical Character Recognition) engine developed by HP Labs and now maintained by Google. Compared with Microsoft Office Document Imaging (MODI), we can keep training its library, so its ability to convert images to text keeps improving. If a team needs to go deeper, it can also be used as a template to d
The first step is to download all the relevant code; GitHub is the most convenient source: https://github.com/tesseract-ocr/tesseract
Point 1: CPPAN, a C++ package manager, is very convenient, although it requires getting around the firewall (for example with a VPN), and so does downloading its installer. It should become popular because it is so convenient: it installs C++ dependencies on Windows the way you would on Linux, and it is also a cross-platform solution.
Introduction to the OCR engine Tesseract and its installation in Python
1. Introduction to Tesseract
Tesseract is an open-source OCR project supported by Google. Its project address is https://github.com/tesseract-ocr/tesseract. The latest source code can be down
@egorpugin (ref issue #209): https://www.dropbox.com/s/8t54mz39i58qslh/Tesseract-3.05.00dev-win32-vc19.zip?dl=1
You have to install the VC2015 x86 redist from Microsoft.com in order to run them. Leptonica is built with all libs except for libjp2k.
https://github.com/UB-Mannheim/tesseract/wiki
http://domasofan.spdns.eu/tesseract/
To summarize:
1.
The previous article briefly covered using TESSERACT-OCR to recognize English text in images (link: www.cnblogs.com/wj-1314/p/9428909.html), and the results looked good, so this article goes further and studies how TESSERACT-OCR recognizes Chinese text in images.
First, prepare the Chinese language data
Download the chi_sim.traineddata language file. To have this ability to r
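Once the file is downloaded, a quick way to confirm Tesseract can find it is to look for it where `TESSDATA_PREFIX` points. In the official releases the Simplified Chinese file is named chi_sim.traineddata. As far as I know, Tesseract 3.x expects `TESSDATA_PREFIX` to be the parent of the tessdata directory, while 4.x expects the tessdata directory itself, so this sketch checks both; `find_traineddata` is a hypothetical helper.

```python
import os
from pathlib import Path


def find_traineddata(lang, tessdata_dirs=None):
    """Return the Path of <lang>.traineddata, or None if it is not found.

    With no explicit directories, check $TESSDATA_PREFIX and
    $TESSDATA_PREFIX/tessdata, covering both the 3.x convention (prefix is
    the parent of tessdata) and the 4.x convention (prefix is tessdata).
    """
    if tessdata_dirs is None:
        prefix = os.environ.get("TESSDATA_PREFIX")
        tessdata_dirs = [prefix, os.path.join(prefix, "tessdata")] if prefix else []
    for directory in tessdata_dirs:
        candidate = Path(directory) / f"{lang}.traineddata"
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    print(find_traineddata("chi_sim") or "chi_sim.traineddata not found")
```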
Using the jTessBoxEditor tool for Tesseract 3.02.02 sample training to improve the verification code (CAPTCHA) recognition rate
1. Background
The previous article briefly introduced the installation and basic use of the Tesseract OCR engine. It mentioned that using the -l eng parameter to restrict the language library can improve recognition accuracy and efficiency.
This article will condu
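jTessBoxEditor edits Tesseract box files. In Tesseract 3.x, each line of a .box file holds one glyph followed by its bounding box and page number, `<char> <left> <bottom> <right> <top> <page>`, with coordinates measured from the bottom-left corner of the image. A minimal parser, handy for sanity-checking a box file before training, might look like this; `Box` and `parse_box_file` are illustrative names.

```python
from typing import NamedTuple


class Box(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_file(text):
    """Parse Tesseract 3.x box-file text into Box records.

    Each non-empty line is '<char> <left> <bottom> <right> <top> <page>',
    with the origin at the bottom-left corner of the image.
    """
    boxes = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split from the right so the glyph is whatever precedes the five numbers.
        char, *nums = line.rsplit(" ", 5)
        left, bottom, right, top, page = map(int, nums)
        boxes.append(Box(char, left, bottom, right, top, page))
    return boxes
```

Listing the parsed glyphs makes it easy to spot characters that were labeled wrongly before correcting them in jTessBoxEditor.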
, leaving only the text, which makes the picture clearer and easier for Tesseract to read:

from PIL import Image
import subprocess

def cleanFile(filePath, newFilePath):
    image = Image.open(filePath)
    # Filter the image by threshold and save it
    image = image.point(lambda x: 0 if x

Grab text from a website picture
Using Tesseract to read text from images on the hard drive is useful, but when we combine it with a web crawler, it becomes a powerful tool. To grab a tex
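The snippet above is cut off at the threshold value, so the value the original used is unknown. The sketch below shows the same thresholding idea with a placeholder threshold of 128 (an assumption; tune it for your images). `binarize` works on nested lists of 0-255 gray values so it runs without any image library, and `binarize_pil` applies the same mapping with Pillow's `Image.point`; both function names are hypothetical.

```python
def binarize(gray_rows, threshold=128):
    """Map each 0-255 gray value to pure black (0) or white (255).

    threshold=128 is a placeholder, not the value from the original article.
    """
    return [[0 if px < threshold else 255 for px in row] for row in gray_rows]


def binarize_pil(path_in, path_out, threshold=128):
    """Same mapping using Pillow (requires `pip install Pillow`)."""
    from PIL import Image  # imported here so binarize() works without Pillow

    # Convert to 8-bit grayscale ('L') so point() sees one value per pixel.
    image = Image.open(path_in).convert("L")
    image.point(lambda x: 0 if x < threshold else 255).save(path_out)
```

Converting to mode "L" first keeps the lambda simple: on an RGB image, `point` would apply it to each channel separately.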
Installation under Windows and Linux: for Linux, you can refer to the blog post "Linux installation TESSERACT-OCR". Windows is relatively simple: download the installer and run it. If you need other languages, you can select the Chinese language pack in the installation options; by default only English is installed. The version I installed is
Recently I needed to do text recognition without calling someone else's interface directly, so the only option was an open-source library. TESSERACT-OCR is an open-source text recognition project from Hewlett-Packard that lets us quickly build an image-to-text recognition system. Because I develop on Windows, I also had to install it in a Windows environment.
First step:
As is well known, this is excellent character recognition software. The open-source project can be downloaded from http://code.google.com/p/tesseract-ocr/downloads/list. Version 3 is recommended over version 2: version 2 can be used directly in a project, but obvious bugs and other issues often cause the program to fail to run or even crash. So we recommend using the command-line version of 3. In addition to downloadin
While working on text recognition in image processing, I found that some tutorials are incomplete and had to piece things together from several sources. So this post records the complete Windows installation; the application side is programmed in Python.
First, download the prerequisite packages.
Includes Windows Installer (tesseract-ocr-setup-3.05.01) with TESSERACT
Calling Tesseract as an API from Python to recognize image verification codes
I. Background
Previously, I introduced how to call the Tesseract OCR engine from Python, mainly in shell mode. In shell mode, the tesseract program must be installed, and efficiency is relatively low.
Today I will introduce how to call the API. Because
There are roughly two OCR approaches for Android applications, and the most popular engine is Tesseract. Here I write down the solutions I worked out over the last two days; please point out any shortcomings. There are two solutions. One is to use a Tesseract cloud service, which sends the image to the cloud and gets back the analysis results. The other works offline, with localized Anal
background.
Make sure the foreground is segmented as far as possible from the background (that is, no pixelated or distorted characters).
Apply skew correction (deskewing) to the input image so that the text is aligned correctly.
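Separating foreground from background is often done by binarizing with a global threshold. One standard way to pick that threshold automatically is Otsu's method, sketched below in plain Python. The original article does not say which method it uses, so treat this as a swapped-in technique; it also does not cover the deskewing step. `otsu_threshold` and `to_binary` are hypothetical names.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold t for a flat list of 0-255 gray values.

    Pixels <= t form one class and pixels > t the other; t is the first
    value that maximizes the between-class variance w0 * w1 * (m0 - m1)^2.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of gray values of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the split is meaningless
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def to_binary(pixels, t):
    """Dark pixels (<= t, typically text) become 0; the rest become 255."""
    return [0 if p <= t else 255 for p in pixels]
```

For dark text on a light background, `to_binary(pixels, otsu_threshold(pixels))` yields black text on white.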
Now, we apply OCR to the following sample image. (First download the sample image from the original article; the link is given below.) Go to your project path and enter the following command in your
Preparations for installing Tesseract-OCR: Compiling environment: gcc gcc-c++ make (generally available on most machines and can be ignored): yum install gcc gcc-c++ make. Dependent packages: autoconf automake libtool libjpeg-devel libpng-devel libtif... install Tesseract-OCR
Preparations:
Compiling environment: gcc-c++ make (this environment is generally available on machines and can be igno
indicates the name of the output .txt result file, and eng specifies English as the language file used for recognition.
3. Open result.txt in the tesseract-ocr directory; the recognition result is 7542315857. Three characters are misrecognized, so the recognition rate is not very high. Is there any way to improve it? Tesseract provides a set of training
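Reading "recognition rate" as per-character accuracy, the result above has 10 characters with 3 wrong, i.e. 7/10 = 70%, assuming all three errors are substitutions. The sketch below computes that figure for two strings of equal length. The expected string is not given in the article, so `char_accuracy` and its inputs are illustrative only.

```python
def char_accuracy(recognized, expected):
    """Fraction of positions where `recognized` matches `expected`.

    Assumes both strings have the same length (substitution errors only);
    insertions or deletions would need an edit-distance measure instead.
    """
    if len(recognized) != len(expected):
        raise ValueError("strings must have the same length")
    if not expected:
        return 1.0
    matches = sum(r == e for r, e in zip(recognized, expected))
    return matches / len(expected)
```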