Environment
Python 3.6.3
pip 9.0.1
tesseract-ocr-setup-3.05.00dev.exe
Windows 10
Installation
1. tesseract-ocr
Tesseract is an open source OCR engine. It was originally developed at HP Labs, later released as open source, and has since been improved, debugged, optimized, and redistributed with Google's support.
During installation, select the language data you need; languages you do not need can be left unselected. I installed Chinese, English, and Japanese. Otherwise the installer works like any other Windows installer.
2. pytesseract
pip install pytesseract
Configuring the Environment
1. Set the tesseract-ocr path
tesseract-ocr is not added to the system PATH by default, so calling it from pytesseract fails with FileNotFoundError: [WinError 2] The system cannot find the file specified.
Workaround:
* Method 1: Add C:\Program Files (x86)\Tesseract-OCR to the system PATH (the exact folder depends on where you installed it)
* Method 2: Edit the pytesseract.py file, changing it as follows
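A related approach that leaves the installed file untouched is to set pytesseract.pytesseract.tesseract_cmd from your own script. The sketch below is an assumption, not the modification referred to above: the helper names find_tesseract and configure_pytesseract are hypothetical, and the candidate folders are only common install locations that you should adjust to your own machine.

```python
import os
import shutil

# Hypothetical candidate locations; replace with your own install folder.
CANDIDATES = [
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
]


def find_tesseract(candidates=CANDIDATES):
    """Return the path of the tesseract executable, or None if not found."""
    # Prefer whatever is already reachable through PATH (Method 1).
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to the first candidate file that exists.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract():
    """Point pytesseract at the located executable without editing pytesseract.py."""
    import pytesseract  # imported here so the lookup above works without it

    cmd = find_tesseract()
    if cmd is None:
        raise FileNotFoundError("tesseract executable not found; check the install folder")
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Calling configure_pytesseract() once at the start of a script, before any image_to_string call, should be enough; if it raises, the install folder is probably different from the candidates listed.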
2. Set the location of the trained data
The trained data installed by default is also not on the system path, which produces the error pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-ocr\\tessdata/chi_sim.traineddata')
Workaround:
Set the environment variable TESSDATA_PREFIX to
C:\Program Files (x86)\Tesseract-OCR\tessdata
Example program
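import pytesseract
from PIL import Image

# Open the image to recognize (test.png in the current directory)
image = Image.open('test.png')
# Run OCR and get the recognized text as a string
code = pytesseract.image_to_string(image)
print(code)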
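If the "Error opening data file" message still appears, it can help to confirm that the language file really is where TESSDATA_PREFIX points before calling image_to_string. The following is a minimal sketch using only the standard library; the function name find_traineddata is hypothetical. Because Tesseract versions appear to differ on whether the variable should name the tessdata folder itself or its parent, the sketch checks both.

```python
import os


def find_traineddata(lang, prefix=None):
    """Return the path to <lang>.traineddata, or None if it cannot be found."""
    if prefix is None:
        # Default to the environment variable set in the configuration step.
        prefix = os.environ.get("TESSDATA_PREFIX", "")
    if not prefix:
        return None
    # Look in the folder itself, then in a "tessdata" subfolder of it.
    for folder in (prefix, os.path.join(prefix, "tessdata")):
        candidate = os.path.join(folder, lang + ".traineddata")
        if os.path.isfile(candidate):
            return candidate
    return None
```

For example, find_traineddata("chi_sim") returning None suggests that either TESSDATA_PREFIX is not set in the current process or the Chinese language data was not selected during installation.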
More references: https://pypi.python.org/pypi/pytesseract