First, the environment
Windows 7 x64
Python 3+
Second, installation
1. Tesseract-OCR installation
http://digi.bib.uni-mannheim.de/tesseract/
2. pytesseract installation
pip install pytesseract
3. Pillow installation
pip install Pillow
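To confirm that the two pip installs worked, a quick check along these lines may help. It is only a sketch: the helper name missing_packages is made up for illustration, and it relies on the standard-library importlib.util.find_spec, which reports whether a module can be found without importing it. Note that Pillow is imported under the module name PIL.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in this Python environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# pytesseract is installed by "pip install pytesseract";
# PIL is the import name of the Pillow package.
missing = missing_packages(["pytesseract", "PIL"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("pytesseract and Pillow are both importable")
```

This only checks the Python packages. The Tesseract-OCR program from step 1 is a separate install and is not covered by this check.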
Third, usage
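# -*- coding: utf-8 -*-
import pytesseract
from PIL import Image

# Tell pytesseract where the Tesseract-OCR executable is installed
pytesseract.pytesseract.tesseract_cmd = 'C://Program Files (x86)//Tesseract-OCR//tesseract.exe'
# Tell Tesseract where the *.traineddata language files live
tessdata_dir_config = '--tessdata-dir "C://Program Files (x86)//Tesseract-OCR//tessdata"'


def main():
    image = Image.open('Code.png')
    # Recognize the text in the image using the English language data
    code = pytesseract.image_to_string(image, lang='eng', config=tessdata_dir_config)
    print(code)


if __name__ == '__main__':
    main()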
Fourth, lessons learned and pitfalls encountered
1. Support on Windows is not great: if the code only imports the pytesseract package, it keeps failing with a file-not-found error.
Cause: pytesseract cannot locate the Tesseract-OCR program that was installed in the installation step, so you need to set its path explicitly in your code through pytesseract.pytesseract.tesseract_cmd, as in the usage code above.
2. image_to_string needs two extra parameters. As far as I understand it:
lang='eng' makes Tesseract look for the eng.traineddata file in the tessdata folder at the path given in tessdata_dir_config,
config= passes that path.
Different recognition libraries can be selected according to the *.traineddata files in the tessdata directory (I am not sure this is correct; it is my guess).
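The guess in point 2 can be checked by listing the *.traineddata files directly, since each file name without its extension is a value that lang= should accept. The following is a rough sketch, not part of pytesseract: available_languages and build_tessdata_config are hypothetical helper names, and the tessdata path is the one assumed in the usage code.

```python
from pathlib import Path


def available_languages(tessdata_dir):
    """List language codes that have a <code>.traineddata file in tessdata_dir."""
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        return []
    return sorted(p.stem for p in folder.glob("*.traineddata"))


def build_tessdata_config(tessdata_dir):
    """Build the config string that points Tesseract at a tessdata folder."""
    return '--tessdata-dir "{}"'.format(tessdata_dir)


# Assumed install location, same as in the usage code; adjust to your machine.
TESSDATA_DIR = "C:/Program Files (x86)/Tesseract-OCR/tessdata"

print(available_languages(TESSDATA_DIR))   # empty list if the folder is missing
print(build_tessdata_config(TESSDATA_DIR))
```

If 'eng' appears in the returned list, passing lang='eng' together with config=build_tessdata_config(TESSDATA_DIR) to image_to_string should let Tesseract find the file.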
Error messages:
Traceback (most recent call last):
  File "d:\***\verifycodetest\src\main.py", line +, in <module>
    main()
  File "d:\***\verifycodetest\src\main.py", line one, in main
    code = pytesseract.image_to_string(image, lang='eng', config=tessdata_dir_config)
  File "c:\users\*\appdata\local\programs\python\python36\lib\site-packages\pytesseract\pytesseract.py", line 193, in image_to_string
    return run_and_get_output(image, 'txt', lang, config, nice)
  File "c:\users\*\appdata\local\programs\python\python36\lib\site-packages\pytesseract\pytesseract.py", line 140, in run_and_get_output
    run_tesseract(**kwargs)
  File "c:\users\*\appdata\local\programs\python\python36\lib\site-packages\pytesseract\pytesseract.py", line 111, in run_tesseract
    proc = subprocess.Popen(command, stderr=subprocess.PIPE)
  File "c:\users\*\appdata\local\programs\python\python36\lib\subprocess.py", line 707, in __init__
    restore_signals, start_new_session)
  File "c:\users\*\appdata\local\programs\python\python36\lib\subprocess.py", line 990, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
Traceback (most recent call last):
  File "d:\***\verifycodetest\src\main.py", line +, in <module>
    main()
  File "d:\***\verifycodetest\src\main.py", line one, in main
    code = pytesseract.image_to_string(image)  # , lang='eng', config=tessdata_dir_config)
  File "c:\users\*\appdata\local\programs\python\python36\lib\site-packages\pytesseract\pytesseract.py", line 193, in image_to_string
    return run_and_get_output(image, 'txt', lang, config, nice)
  File "c:\users\*\appdata\local\programs\python\python36\lib\site-packages\pytesseract\pytesseract.py", line 140, in run_and_get_output
    run_tesseract(**kwargs)
  File "c:\users\*\appdata\local\programs\python\python36\lib\site-packages\pytesseract\pytesseract.py", line 116, in run_tesseract
    raise TesseractError(status_code, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'eng\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')
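The two tracebacks above point to two different missing files: the FileNotFoundError means the Tesseract executable itself was not found, while the TesseractError means the eng.traineddata file was not found. A small pre-flight check like the sketch below can tell the two cases apart before image_to_string is called. The function name diagnose_setup and its messages are invented for illustration; it uses only the standard library and does not call pytesseract.

```python
import os


def diagnose_setup(tesseract_cmd, tessdata_dir, lang="eng"):
    """Return a list of readable problems; an empty list means both files exist."""
    problems = []
    # A missing executable leads to FileNotFoundError / WinError 2.
    if not os.path.isfile(tesseract_cmd):
        problems.append("Tesseract executable not found: " + tesseract_cmd)
    # Missing language data leads to TesseractError "Failed loading language".
    data_file = os.path.join(tessdata_dir, lang + ".traineddata")
    if not os.path.isfile(data_file):
        problems.append("Language data not found: " + data_file)
    return problems


# Assumed install paths, matching the usage code; adjust to your machine.
for problem in diagnose_setup(
        "C:/Program Files (x86)/Tesseract-OCR/tesseract.exe",
        "C:/Program Files (x86)/Tesseract-OCR/tessdata"):
    print(problem)
```

If nothing is printed, both files were found at the assumed locations, and the same two paths can be used for tesseract_cmd and --tessdata-dir.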
Referenced from: 51490291