I am just getting started and knew nothing about this beforehand; I simply followed a tutorial step by step.
Requirement: recognize the text in a picture
Environment: Windows
Development language: Python 3.5
Tools: 1. pyocr
2. PIL
3. tesseract-ocr
Steps:
1. Install pyocr
With network access, run the command directly:
pip install pyocr
Without network access, download it from https://pypi.python.org/pypi/pyocr/0.4.1 and install it manually.
2. Install PIL (I never got this to install; there seems to be no build for 3.5, only for 2.x. This step can be skipped, and I left it uninstalled. The script's "from PIL import Image" is most likely satisfied by Pillow, the maintained PIL fork, which pip normally pulls in as a pyocr dependency.)
With network access, run the command directly:
pip install PIL
Without network access, download it from http://www.pythonware.com/products/pil/index.htm and install it manually.
3. Install tesseract-ocr
http://jaist.dl.sourceforge.net/project/tesseract-ocr-alt/tesseract-ocr-setup-3.02.02.exe
This is an EXE file; run it after downloading. The default installer options are recommended. The default install directory is C:\Program Files (x86)\Tesseract-OCR
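Before running the script, it can help to confirm that the Python packages from steps 1 and 2 are actually importable. The following is a minimal sketch using only the standard library; the helper name missing_modules is made up for this illustration and is not part of pyocr or PIL.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "pyocr" and "PIL" are the import names used by the script in this article.
    missing = missing_modules(["pyocr", "PIL"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("pyocr and PIL are both importable")
```

If PIL is reported as missing even though pyocr is present, installing Pillow is probably the simplest fix, since it provides the same PIL import name.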
# coding=utf-8
__author__ = 'YJJ'
# https://github.com/tesseract-ocr
import sys
import importlib
# reload(sys)
importlib.reload(sys)
# sys.setdefaultencoding('utf-8')
import os
os.environ['NLS_LANG'] = 'SIMPLIFIED CHINESE_CHINA.UTF8'
try:
    from pyocr import pyocr
    from PIL import Image
except ImportError:
    print('Module import error, please install using pip; pytesseract depends on the following libraries:')
    print('http://www.lfd.uci.edu/~gohlke/pythonlibs/#pil')
    print('http://code.google.com/p/tesseract-ocr/')
    raise SystemExit
# Collect every OCR backend pyocr can find on this machine.
tools = pyocr.get_available_tools()[:]
if len(tools) == 0:
    print("No OCR tool found")
    sys.exit(1)
print("Using '%s'" % (tools[0].get_name()))
# The lang argument of this first call was lost in the original listing,
# so it is omitted here and Tesseract falls back to its default language.
print(tools[0].image_to_string(Image.open("D:\\123.png")))
print(tools[0].image_to_string(Image.open("D:\\3434.png"), lang='chi_sim'))
# print(tools[0].image_to_string(Image.open('d:\\3535.png'), lang='chi_sim'))
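The script passes a language code straight to image_to_string. A small wrapper can check first that the language is installed, which gives a clearer message than a Tesseract error. This is only a sketch: recognize is a hypothetical helper name, and it assumes the tool object behaves like a pyocr tool module, that is, it offers get_available_languages() and image_to_string(image, lang=...).

```python
def recognize(tool, image, lang=None):
    """Run OCR with `tool`, refusing languages the tool does not report.

    `tool` is expected to behave like a pyocr tool module: it provides
    get_available_languages() and image_to_string(image, lang=...).
    """
    if lang is not None:
        available = tool.get_available_languages()
        if lang not in available:
            raise ValueError(
                "language %r not installed; available: %s"
                % (lang, ", ".join(sorted(available)))
            )
    # With lang=None the tool falls back to its own default language.
    return tool.image_to_string(image, lang=lang)
```

In the script above it would be called as recognize(tools[0], Image.open("D:\\3434.png"), lang="chi_sim").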
Input files (put the pictures in the root of drive D:):
123.png
3434.png
Output:
Using 'Tesseract (sh)'
7364
Beg I only another u going r 1th generation
Problems you may encounter along the way
1. If the console prints "No OCR tool found", tesseract-ocr was not installed successfully. (Sometimes restarting the software is enough to make the error go away; odd, but that is what happened to me.) Step into get_available_tools with a debugger to see which OCR libraries pyocr looks for on this machine. There are three:
libtesseract,
tesseract,
cuneiform.
This article uses the second one, tesseract.
For Tesseract-specific installation details, please go to.
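One likely cause of "No OCR tool found" is that the tesseract executable is not on PATH. As far as I can tell, pyocr's command-line Tesseract backend looks the program up there, which would also explain why a restart sometimes helps, since PATH changes only reach newly started programs. The sketch below checks for the executable and, if needed, adds the install directory to PATH for the current process only. ensure_on_path is a hypothetical helper, not a pyocr function.

```python
import os
import shutil


def ensure_on_path(executable, install_dir, environ=None):
    """Return True if `executable` can be found, adding `install_dir` to PATH if needed."""
    environ = os.environ if environ is None else environ
    path = environ.get("PATH", "")
    if shutil.which(executable, path=path):
        return True
    # Try again with the install directory appended; this only affects
    # the given environment mapping, not the system-wide setting.
    candidate = path + os.pathsep + install_dir if path else install_dir
    if shutil.which(executable, path=candidate):
        environ["PATH"] = candidate
        return True
    return False
```

Calling ensure_on_path("tesseract", r"C:\Program Files (x86)\Tesseract-OCR") before pyocr.get_available_tools() is one way to use it; a False result suggests the installer did not put tesseract.exe in that directory.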
2. Recognizing pictures that contain Chinese can fail with an "allow_blob_division" error.
You need to download tesseract-ocr's Chinese language data from https://sourceforge.net/projects/tesseract-ocr-alt/files/tesseract-ocr-3.02.chi_sim.tar.gz/download. The archive contains several Tesseract language files; chi_sim.traineddata is the Simplified Chinese one. Put that file in the C:\Program Files (x86)\Tesseract-OCR\tessdata directory. For the detailed procedure, see https://www.cnblogs.com/syqlp/p/5462459.html
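To confirm the language file ended up in the right place, a short check of the tessdata directory can be run before calling the OCR. This is a minimal sketch; missing_traineddata is a hypothetical helper, and the directory passed to it should be whatever tessdata folder your installation actually uses.

```python
from pathlib import Path


def missing_traineddata(tessdata_dir, langs):
    """Return the languages in `langs` that have no <lang>.traineddata file."""
    folder = Path(tessdata_dir)
    return [
        lang for lang in langs
        if not (folder / (lang + ".traineddata")).is_file()
    ]
```

For example, missing_traineddata(r"C:\Program Files (x86)\Tesseract-OCR\tessdata", ["chi_sim"]) returns an empty list when chi_sim.traineddata is present and ["chi_sim"] when it is not.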