TESSERACT_OCR: recognizing characters with a font library and training a font

Source: Internet
Author: User
Tags: character set


Reprinted from: http://blog.csdn.net/why200981317/article/details/48265621


To experience what Tesseract can do, first install TESSERACT_OCR, which can be downloaded from http://code.google.com/p/tesseract-ocr/. Be sure to download version 3.0.1. I started with the latest version, 3.0.2, and the command that generates the character features would not run; I eventually forced it through, but the dictionary it produced recognized everything as empty characters.

After the installation is complete, take a look at the root directory.


The tessdata folder mainly stores the dictionary files. As long as the dictionary file for a language is in this folder, Tesseract can recognize text in that language.
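To see which languages an installation can recognize, the tessdata folder can be scanned for dictionary files. The following is a minimal sketch, assuming the dictionaries use the usual `<lang>.traineddata` naming; the function name `installed_languages` is hypothetical and the folder path has to be supplied by the caller.

```python
from pathlib import Path


def installed_languages(tessdata_dir):
    """Return the sorted language codes found as <lang>.traineddata files.

    Assumption: each dictionary is a single file named <lang>.traineddata
    directly inside the tessdata folder.
    """
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))
```

Calling `installed_languages` on the tessdata folder of the installation returns a list of language codes such as `eng`, depending on which dictionaries are present.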

Now let's start by recognizing a picture.


Put the picture into any folder, open cmd, cd to the directory that contains the picture, and then execute

tesseract 1.jpg 1
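The same recognition step can be driven from a script. This is a sketch only, assuming `tesseract` is on the PATH; the helper names `recognize_command` and `recognize` are hypothetical. The first argument is the input image and the second is the output base name, to which Tesseract appends `.txt`.

```python
import subprocess


def recognize_command(image, output_base, lang=None):
    """Build the argument list for: tesseract <image> <output_base> [-l <lang>]."""
    cmd = ["tesseract", image, output_base]
    if lang:
        # -l selects the dictionary (language) from the tessdata folder.
        cmd += ["-l", lang]
    return cmd


def recognize(image, output_base, lang=None):
    """Run Tesseract and return the text it wrote to <output_base>.txt."""
    subprocess.run(recognize_command(image, output_base, lang), check=True)
    with open(output_base + ".txt", encoding="utf-8") as f:
        return f.read()
```

For the picture above, `recognize("1.jpg", "1")` would run the same command as the one typed in cmd and read back the generated 1.txt.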



You can see that a txt file has been generated in the folder, and that the recognition result is not ideal. Why? The characters in my picture are distorted. Tesseract matches the characters in the picture against the characters in its dictionary and picks the most similar ones, but the dictionary does not contain this distorted font, so naturally the recognition goes wrong. To improve the recognition rate, we need to train a font of our own.

Training a font also requires a tool, jTessBoxEditor, which can be downloaded from http://sourceforge.NET/projects/vietocr/files/jTessBoxEditor/


Now let's get hands-on. First we generate a .tif image set, using jTessBoxEditor to merge multiple images into a single TIF file.

1, open jTessBoxEditor, select Tools->Merge TIFF, choose the pictures, and create an image set in TIF format




2, I generated an image set named why4.tif. cd into the directory that contains why4.tif and generate the corresponding .box file

Execute command

tesseract why4.tif why4 batch.nochop makebox


The .box file contains Tesseract's recognition output: for each character found in the image set it records the position, the size, and the recognized character.
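To make the contents of a .box file concrete, here is a small parsing sketch. It assumes the common Tesseract 3.x layout, one character per line as `<symbol> <left> <bottom> <right> <top> <page>`, with pixel coordinates measured from the bottom-left corner of the image. The names `Box`, `parse_box_line`, and `read_box_file` are hypothetical.

```python
from collections import namedtuple

Box = namedtuple("Box", "symbol left bottom right top page")


def parse_box_line(line):
    """Parse one box line into a Box tuple.

    Assumption: fields are whitespace separated. Lines without a page
    field (5 fields) are accepted and treated as page 0.
    """
    parts = line.split()
    if len(parts) == 5:
        parts.append("0")
    if len(parts) != 6:
        raise ValueError("unexpected box line: %r" % line)
    left, bottom, right, top, page = (int(v) for v in parts[1:])
    return Box(parts[0], left, bottom, right, top, page)


def read_box_file(path):
    """Read every non-empty line of a .box file into a list of Box tuples."""
    with open(path, encoding="utf-8") as f:
        return [parse_box_line(line) for line in f if line.strip()]
```

The width and height of a character follow from the corners: `box.right - box.left` and `box.top - box.bottom`.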


3, adjust the boxes. Because Tesseract's recognition is inaccurate, we use jTessBoxEditor to correct the position and the recognized character of each box.

Open the generated image set why4.tif with jTessBoxEditor. Note that the box file for why4.tif must be in the same folder as the image and must keep the same base name; otherwise jTessBoxEditor opens the image without any position or recognition information. Adjust the boxes and save when you are done.
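Before opening the image set, the pairing rule can be checked with a few lines of script. This is a sketch, assuming the box file is expected next to the image with the same base name and a `.box` extension; `matching_box_path` and `has_matching_box` are hypothetical helper names.

```python
from pathlib import Path


def matching_box_path(tif_path):
    """Return the path where the box file for this image is expected."""
    return Path(tif_path).with_suffix(".box")


def has_matching_box(tif_path):
    """True if a box file with the same base name sits next to the image."""
    return matching_box_path(tif_path).is_file()
```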



4, generate the .tr file

tesseract why4.tif why4 nobatch box.train



5, compute the character set, which is extracted from the generated box file

unicharset_extractor why4.box
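As a rough illustration of what "character set" means here, the sketch below collects the distinct symbols that appear in box lines, in first-seen order. It is not a reimplementation of unicharset_extractor, whose output file also records properties for each character; the function name `unique_symbols` is hypothetical.

```python
def unique_symbols(box_lines):
    """Return the distinct symbols (first field of each box line), in first-seen order."""
    seen = {}
    for line in box_lines:
        fields = line.split()
        if fields:
            # dict keys keep insertion order, so duplicates are dropped
            # without disturbing the order of first appearance.
            seen.setdefault(fields[0], None)
    return list(seen)
```

Running it over the lines of why4.box after the corrections in step 3 is a quick way to spot characters that were mislabeled or are missing from the training set.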


6, create the font properties file. In the same folder, create a new file with any name; its content has the format

<fontname> <italic> <bold> <fixed> <serif> <fraktur>

fontname is the name of the font and must match the prefix used in the file names of the image set's .tif and .box files. <italic>, <bold>, <fixed>, <serif>, and <fraktur> each take the value 1 or 0, indicating whether the font has that attribute.
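The format above can be produced with a small helper. This is a sketch under the stated format only; the function name `font_properties_line` and the font name `myfont` used below are hypothetical and do not come from the original post.

```python
def font_properties_line(fontname, italic=False, bold=False,
                         fixed=False, serif=False, fraktur=False):
    """Format one line as: <fontname> <italic> <bold> <fixed> <serif> <fraktur>."""
    if not fontname or any(ch.isspace() for ch in fontname):
        raise ValueError("fontname must be non-empty and contain no whitespace")
    flags = (italic, bold, fixed, serif, fraktur)
    # Each attribute is written as 1 (present) or 0 (absent).
    return " ".join([fontname] + [str(int(bool(flag))) for flag in flags])
```

For a plain font with none of the attributes, `font_properties_line("myfont")` gives `myfont 0 0 0 0 0`, and passing `bold=True` changes the third field to 1.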

For example, I created a new file named font with the contents of

