Simple Digit Recognition in Images Using ImageMagick and Tesseract

Source: Internet
Author: User

Running Tesseract directly on the original image gives a very low recognition rate, so the image is first preprocessed with ImageMagick.

ImageMagick installation, configuration, and usage:
Platform: Windows XP
1. Install ImageMagick (ImageMagick website: http://www.imagemagick.org/script/index.php)
Download the Windows binary release from http://www.imagemagick.org/script/binary-releases.php#windows and install it.

 

Run convert -version to verify that the installation succeeded:

 
d:\>convert -version
Version: ImageMagick 6.8.9-4 Q16 x64 2014-06-22 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2014 ImageMagick Studio LLC
Features: DPC Modules OpenMP
Delegates: bzlib cairo freetype jbig jng jp2 jpeg lcms lqr pangocairo png ps rsvg tiff webp xml zlib

 

If the convert command is not recognized, add the installation directory to the system PATH so that its executables and DLLs can be found. An example of such a directory is D:\Program Files\OCR\ImageMagick-6.8.9-Q16.
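If you later drive the pipeline from a script, you may want the script to confirm which convert it finds on the PATH and which version that program reports. The sketch below assumes Python 3.7+ and is not part of the original article. It locates convert with shutil.which, runs convert -version, and parses the first line in the format shown above. The helper names parse_imagemagick_version and check_convert are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Matches a first line such as
# "Version: ImageMagick 6.8.9-4 Q16 x64 2014-06-22 http://www.imagemagick.org"
_VERSION_RE = re.compile(r"Version:\s*ImageMagick\s+(\S+)\s+(Q\d+)")


def parse_imagemagick_version(output: str) -> Optional[Tuple[str, str]]:
    """Return (version, quantum depth) from `convert -version` output, or None."""
    match = _VERSION_RE.search(output)
    if match is None:
        return None
    return match.group(1), match.group(2)


def check_convert() -> Tuple[str, str]:
    """Locate `convert` on PATH and return its ImageMagick version info."""
    path = shutil.which("convert")
    if path is None:
        raise RuntimeError("convert not found; add the ImageMagick directory to PATH")
    result = subprocess.run([path, "-version"], capture_output=True, text=True)
    parsed = parse_imagemagick_version(result.stdout)
    if parsed is None:
        # The program found first may be a different tool that is also named
        # convert (on Windows, for example, the system's own convert.exe).
        raise RuntimeError(f"{path} does not look like ImageMagick's convert")
    return parsed
```

With the installation shown above, check_convert() should return ("6.8.9-4", "Q16").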

 

The following describes how to use Tesseract together with ImageMagick for simple digit recognition on images.

In the version used here, Tesseract can only read BMP and TIF files, so first use ImageMagick to convert the image. The converted TIFF must be uncompressed; otherwise, Tesseract reports an error.

convert -compress none ./S16.gif ./S16.tif

 

Next, convert the image to grayscale with -colorspace gray, or binarize it directly into black and white with -monochrome. Either step raises the recognition rate.
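The following global-threshold sketch shows what grayscale conversion followed by binarization does to pixel values. It is only an illustration and does not reproduce ImageMagick's algorithm. The Rec. 601 luma weights and the threshold of 128 are assumptions. ImageMagick's -monochrome can also apply dithering instead of a fixed cutoff. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel 0-255


def to_gray(row: Sequence[Pixel]) -> List[int]:
    """Convert RGB pixels to 8-bit gray using Rec. 601 luma weights (an assumption)."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]


def binarize(gray: Sequence[int], threshold: int = 128) -> List[int]:
    """Map each gray value to 0 (black) or 255 (white) using one global threshold."""
    return [255 if v >= threshold else 0 for v in gray]
```

For example, binarize(to_gray([(0, 0, 0), (200, 200, 200), (50, 60, 70)])) yields [0, 255, 0]. Dark strokes become black and the light background becomes white. On real scans the threshold would probably need tuning.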

Then crop out exactly the region to be recognized with -crop WIDTHxHEIGHT+X+Y.

For example, -crop 320x40+0+1 crops a 320x40 region whose upper-left corner lies at (0, 1), measured from the upper-left corner of the image. Note that the separator is the letter x, not the multiplication sign *, and that the geometry must not contain spaces.
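You can build the crop geometry in code to avoid stray spaces or a * sign. The sketch below formats the geometry string and computes the pixel box that the crop covers. The helper names are hypothetical. The sketch only accepts non-negative offsets, although ImageMagick geometry also allows negative ones.

```python
from typing import Tuple


def crop_geometry(width: int, height: int, x: int, y: int) -> str:
    """Build a -crop geometry such as '320x40+0+1' (letter x, no spaces)."""
    if width <= 0 or height <= 0 or x < 0 or y < 0:
        # This sketch handles only positive sizes and non-negative offsets.
        raise ValueError("width/height must be positive and offsets non-negative")
    return f"{width}x{height}+{x}+{y}"


def crop_box(width: int, height: int, x: int, y: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the cropped area; right/bottom exclusive."""
    return x, y, x + width, y + height
```

For the example above, crop_geometry(320, 40, 0, 1) gives "320x40+0+1". The crop then covers columns 0 to 319 and rows 1 to 40 of the source image.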

Some sources online also recommend the -depth 8 and -alpha off parameters. In practice, adding or removing them made little difference.

All of these steps can be combined into a single command:

convert -compress none -depth 8 -alpha off -crop 535x24+2+2 -monochrome ./S16.gif ./S16.tif
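To run the one-step command above from a script, you can pass it to subprocess as an argument list, which avoids shell-quoting problems with paths. Because the author found that -depth 8 and -alpha off made little difference, the sketch below makes both optional. The function names preprocess_args and preprocess are hypothetical, and the sketch assumes convert is on the PATH.

```python
import subprocess
from typing import List


def preprocess_args(src: str, dst: str, geometry: str,
                    depth8: bool = True, alpha_off: bool = True) -> List[str]:
    """Build the argument list for the one-step convert command."""
    args = ["convert", "-compress", "none"]
    if depth8:
        args += ["-depth", "8"]
    if alpha_off:
        args += ["-alpha", "off"]
    args += ["-crop", geometry, "-monochrome", src, dst]
    return args


def preprocess(src: str, dst: str, geometry: str) -> None:
    """Run convert; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(preprocess_args(src, dst, geometry), check=True)
```

preprocess("./S16.gif", "./S16.tif", "535x24+2+2") runs the same command as the line above.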

 

convert.exe is the part of the ImageMagick suite that converts image formats. Its parameters mean the following:

  • -compress none: do not compress the converted image. Without this option, Tesseract reports the following error when it processes the file: read_tif_image: Error: Illegal image format: compression.
  • -depth 8: set the bit depth of the converted image to 8 bits per channel. Without this option, Tesseract fails as follows:

Tesseract open source OCR engine
Check_legal_image_size: Error: only 1, 2, 4, 5, 6, 8 BPP are supported: 16
Segmentation fault

  • -alpha off: turn off the alpha channel of the converted image. Without this option, the result is the same as above.
  • -crop 535x24+2+2: crop a 535x24 region whose upper-left corner is at (2, 2), measured from the upper-left corner of the image. Note that the separator is the letter x, not the multiplication sign *.
  • -monochrome: convert the image to black and white.
  • Next comes the file name of the image to be converted.
  • Finally comes the file name of the converted image.
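Both Tesseract errors above depend on how the TIFF is stored: its compression and its bits per pixel. So you can check a converted file's header before passing it to Tesseract. The sketch below is a minimal reader for the first IFD of a classic TIFF, written for illustration. It reads Compression (tag 259), BitsPerSample (tag 258), and SamplesPerPixel (tag 277). The limit of 8 bits per pixel comes from the error message above, not from Tesseract's documentation. The function names are hypothetical.

```python
import struct
from typing import Dict, Tuple

BITS_PER_SAMPLE, COMPRESSION, SAMPLES_PER_PIXEL = 258, 259, 277
SHORT, LONG = 3, 4
_SIZES = {SHORT: 2, LONG: 4}  # bytes per value for the field types handled here


def tiff_info(data: bytes) -> Dict[int, Tuple[int, ...]]:
    """Read the SHORT/LONG tags of a classic TIFF's first IFD into {tag: values}."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (count,) = struct.unpack(order + "H", data[ifd:ifd + 2])
    tags: Dict[int, Tuple[int, ...]] = {}
    for i in range(count):
        entry = data[ifd + 2 + 12 * i: ifd + 14 + 12 * i]
        tag, typ, n = struct.unpack(order + "HHI", entry[:8])
        if typ not in _SIZES:
            continue  # this sketch ignores all other field types
        size = _SIZES[typ] * n
        if size <= 4:
            raw = entry[8:8 + size]  # small values are stored inline
        else:
            (offset,) = struct.unpack(order + "I", entry[8:12])
            raw = data[offset:offset + size]  # larger values live at an offset
        tags[tag] = struct.unpack(order + ("H" if typ == SHORT else "I") * n, raw)
    return tags


def tesseract_ready(data: bytes) -> bool:
    """True if the image is uncompressed and uses at most 8 bits per pixel."""
    tags = tiff_info(data)
    compression = tags.get(COMPRESSION, (1,))[0]  # TIFF default: 1 = no compression
    spp = tags.get(SAMPLES_PER_PIXEL, (1,))[0]  # TIFF default: 1 sample per pixel
    bits = tags.get(BITS_PER_SAMPLE, (1,))  # TIFF default: 1 bit per sample
    bpp = bits[0] * spp if len(bits) == 1 else sum(bits)
    return compression == 1 and bpp <= 8
```

To check a real file, read its bytes with open("./S16.tif", "rb").read() and pass them to tesseract_ready. A 16-bit grayscale image fails the check. So does an 8-bit gray image that still has an alpha channel (8 + 8 = 16 bits per pixel), which matches the error reported for missing -depth 8 and -alpha off.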

 

After preprocessing, it is best to enlarge the image as well. In this example, enlarging it to between 500% and 600% gave satisfactory recognition.

convert ./S16.tif -scale 600% ./S16B.tif
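The following small calculation shows what the enlargement means in pixels. The sketch assumes that -scale N% multiplies both dimensions by N/100 and rounds to whole pixels. For the integer factors used here, the result is exact either way. The helper name scaled_size is hypothetical.

```python
from typing import Tuple


def scaled_size(width: int, height: int, percent: int) -> Tuple[int, int]:
    """Return the (width, height) after scaling by `percent`, rounded to whole pixels."""
    return round(width * percent / 100), round(height * percent / 100)
```

At 600%, the 535x24 crop from the one-step command becomes 3210x144. At 500%, it becomes 2675x120.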

Now run Tesseract to recognize the text:

tesseract ./S16B.tif ./S16B -l eng -psm 7

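-l eng selects the English language data, and -psm 7 tells Tesseract to treat the image as a single line of text. The recognized text is written to ./S16B.txt. Tesseract 4 and later spell the second option --psm. For other options, see Tesseract's help output.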
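The following sketch wraps the recognition step: it runs tesseract and reads back the text file that Tesseract writes to the output base name plus ".txt". The names tesseract_args and recognize are hypothetical. The new_style switch chooses between the -psm spelling used in the article and the --psm spelling of Tesseract 4 and later. The sketch assumes UTF-8 output and that tesseract is on the PATH.

```python
import subprocess
from pathlib import Path
from typing import List


def tesseract_args(image: str, outbase: str, lang: str = "eng", psm: int = 7,
                   new_style: bool = False) -> List[str]:
    """Build the tesseract command line; new_style selects the --psm spelling."""
    psm_flag = "--psm" if new_style else "-psm"
    return ["tesseract", image, outbase, "-l", lang, psm_flag, str(psm)]


def recognize(image: str, outbase: str, **options) -> str:
    """Run tesseract and return the stripped text it wrote to <outbase>.txt."""
    subprocess.run(tesseract_args(image, outbase, **options), check=True)
    return Path(outbase + ".txt").read_text(encoding="utf-8").strip()
```

recognize("./S16B.tif", "./S16B") runs the command above and returns the contents of ./S16B.txt. Pass new_style=True on newer Tesseract releases.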

 

 

 

 

 

 

 

 

 
