1. Installing Pillow
pip install Pillow
2. Installing Tesseract-OCR
GitHub address: https://github.com/tesseract-ocr/tesseract
You can either install Tesseract from a pre-built binary package or build it from source.
Windows:
The latest installers can be downloaded here: tesseract-ocr-setup-3.05.01.exe and tesseract-ocr-setup-4.00.00dev.exe (experimental).
Ubuntu:
sudo apt-get install tesseract-ocr
Traineddata file path: /usr/share/tesseract-ocr/tessdata/
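To see which languages are actually installed, one option is to list the .traineddata files in that directory. The following is a minimal sketch using only the standard library; the helper name list_languages is hypothetical, and the default path is simply the Ubuntu location given above.

```python
from pathlib import Path


def list_languages(tessdata_dir="/usr/share/tesseract-ocr/tessdata/"):
    """Return the sorted language codes that have a .traineddata file.

    Hypothetical helper: it only inspects file names and does not call
    Tesseract itself. A missing directory yields an empty list.
    """
    directory = Path(tessdata_dir)
    if not directory.is_dir():
        return []
    # "eng.traineddata" -> "eng"
    return sorted(p.stem for p in directory.glob("*.traineddata"))


if __name__ == "__main__":
    print(list_languages())
```

An empty result usually means either that the path is different on your system or that no language data has been installed yet.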
3. Installing pytesseract
pip install pytesseract
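After the three installation steps, a quick check can confirm that each piece is visible from the current Python environment. This is only a sketch: check_install is a hypothetical name, and it only tests whether the modules can be located and whether an executable named tesseract is on PATH, not whether OCR actually works.

```python
import importlib.util
import shutil


def check_install():
    """Report which of the three components can be found.

    Hypothetical helper: find_spec only checks that a module can be
    located, and shutil.which only checks the PATH of this process.
    """
    return {
        "Pillow": importlib.util.find_spec("PIL") is not None,
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        "tesseract": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for name, found in check_install().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If tesseract is reported as missing on Windows even though it is installed, the cause is usually the PATH problem described under "Problems encountered".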
Problems encountered:
1. FileNotFoundError: [WinError 2] The system cannot find the file specified
Workaround:
Method 1 [Recommended]: Add the directory that contains tesseract.exe to the PATH environment variable.
For example: D:\Tesseract-OCR; the default installation path is C:\Program Files (x86)\Tesseract-OCR
Note: For the environment variable to take effect, close and reopen the cmd window, or restart your IDE (for example PyCharm).
Method 2: Edit pytesseract.py to specify the installation path of tesseract.exe
# CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY
tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
Method 3: Specify the path in the code you actually run
pytesseract.pytesseract.tesseract_cmd = 'D:\\Tesseract-OCR\\tesseract.exe'
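Methods 1 and 3 can be combined in code: look for tesseract on PATH first, and fall back to an explicit path only if that fails. The sketch below assumes the two example directories mentioned above; CANDIDATES, find_tesseract and configure_pytesseract are hypothetical names, while pytesseract.pytesseract.tesseract_cmd is the attribute used in Method 3.

```python
import os
import shutil

# Hypothetical candidate locations, taken from the examples above.
# Adjust them to wherever Tesseract is installed on your machine.
CANDIDATES = [
    r"D:\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
]


def find_tesseract(candidates=CANDIDATES):
    """Return a path to the tesseract executable, or None.

    PATH is checked first (Method 1), then each candidate (Method 3).
    """
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract(candidates=CANDIDATES):
    """Point pytesseract at the executable found by find_tesseract."""
    import pytesseract  # imported here so find_tesseract works without it

    found = find_tesseract(candidates)
    if found is None:
        raise FileNotFoundError(
            "tesseract executable not found on PATH or in candidates"
        )
    pytesseract.pytesseract.tesseract_cmd = found
    return found
```

Calling configure_pytesseract() once at program start avoids editing pytesseract.py (Method 2), which would be overwritten when the package is upgraded.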
2. pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \\Tesseract-OCR\\tessdata/eng.traineddata')
Workaround:
Method 1 [Recommended]:
Set the TESSDATA_PREFIX environment variable to the parent directory of the tessdata directory (by default, the Tesseract-OCR installation directory).
For example: C:\Program Files (x86)\Tesseract-OCR
Tesseract prints the same hint with the error: Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Method 2: Specify tessdata-dir in the config passed from your .py file
tessdata_dir_config = '--tessdata-dir "D:\\Tesseract-OCR\\tessdata"'
# tessdata_dir_config = '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'
pytesseract.image_to_string(image, config=tessdata_dir_config)
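Both workarounds can also be applied from Python, which may help when you cannot change system settings. The following is a rough sketch, not a definitive implementation: tessdata_prefix_for, tessdata_config and ocr_image are hypothetical helper names, and the parent-directory convention for TESSDATA_PREFIX follows the description above for the 3.05 installer. Other Tesseract versions may expect a different value, so check the hint printed with the error.

```python
import os


def tessdata_prefix_for(tessdata_dir):
    """Return the parent directory of a tessdata directory (Method 1).

    Uses os.path, so the path must use the separators of the platform
    the script runs on.
    """
    return os.path.dirname(os.path.normpath(tessdata_dir))


def tessdata_config(tessdata_dir):
    """Build the --tessdata-dir option string (Method 2)."""
    return f'--tessdata-dir "{tessdata_dir}"'


def ocr_image(image_path, tessdata_dir, use_env=False):
    """Run OCR on one image, applying one of the two workarounds.

    Requires Pillow and pytesseract; they are imported here so the
    two helpers above can be used without them.
    """
    from PIL import Image
    import pytesseract

    config = ""
    if use_env:
        # Child processes started later inherit this value.
        os.environ["TESSDATA_PREFIX"] = tessdata_prefix_for(tessdata_dir)
    else:
        config = tessdata_config(tessdata_dir)
    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, config=config)
```

Setting the variable through os.environ only affects the current Python process and the programs it launches; it does not change the system-wide setting.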
Attached: traineddata files: download the latest from github.com
Reference documentation:
https://pypi.python.org/pypi/pytesseract
https://github.com/tesseract-ocr/tesseract/wiki