The use of OCR character recognition technology

Source: Internet
Author: User

ABBYY FineReader is office image text recognition (OCR) software that reads JPG, GIF, PNG, BMP, TIF, and PDF source files, including scanned PDFs. Text that cannot be edited in everyday work can be run through ABBYY FineReader, and the recognized text can then be edited freely. Many people ask the same question: what is the technical principle behind image text recognition software?

1, Text input: the document is entered into the computer through an input device, that is, the manuscript is digitized. The most commonly used device today is the scanner. The scanning quality of the document image is a precondition for correct recognition by OCR software. Choosing an appropriate scanning resolution and related parameters is the key to keeping the text clear and its features intact. In addition, the document should be placed as straight as possible, so that the tilt angle detected during pre-processing is small and tilt correction distorts the text image only slightly. These simple steps improve the recognition accuracy of the system. Conversely, with improper scan settings, a character with too many broken strokes may be split into partial character images. Broken strokes and touching strokes both cause features to be lost; when the character's features are compared with the feature library, the feature distance grows and the recognition error rate rises.
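As a rough illustration of why broken strokes raise the feature distance, the sketch below compares toy 5x5 glyphs against a tiny template set using a Hamming distance over pixels. The templates, the `distance` and `nearest` helpers, and the choice of Hamming distance are all assumptions made for this example; the article does not say which features or distance measure ABBYY FineReader uses.

```python
# Toy 5x5 binary glyphs: '#' is ink, '.' is background.
# This "feature library" is hypothetical and only serves the illustration.
TEMPLATES = {
    "L": ["#....",
          "#....",
          "#....",
          "#....",
          "#####"],
    "T": ["#####",
          "..#..",
          "..#..",
          "..#..",
          "..#.."],
}


def distance(a, b):
    """Hamming distance: number of pixels where the two glyphs differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def nearest(glyph):
    """Return (label, distance) for the closest template."""
    return min(
        ((label, distance(glyph, template)) for label, template in TEMPLATES.items()),
        key=lambda pair: pair[1],
    )


clean_L = TEMPLATES["L"]
broken_L = ["#....",
            ".....",   # vertical stroke broken by a poor scan
            "#....",
            "#....",
            "##.##"]   # gap in the horizontal stroke

print(nearest(clean_L))   # distance 0 to the "L" template
print(nearest(broken_L))  # still closest to "L", but at a larger distance
```

In this toy case the broken glyph is still matched correctly, but its distance to the right template is no longer zero. With more damage, or with a library that contains similar-looking characters, that margin can disappear, which is the failure the paragraph above describes.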

2, Pre-processing: after a printed document is scanned, the image is separated into individual character images that are handed to the recognition module; this process is called image preprocessing. Preprocessing covers the preparation done before character recognition, including image cleanup, which removes obvious noise (interference) from the original image. Its main tasks are measuring the tilt angle of the document, analyzing the document layout, confirming the typesetting of the selected text regions, determining whether the text lines run horizontally or vertically, and separating the character images and punctuation marks in each line. This phase is very important, because its results directly affect recognition accuracy. Layout analysis examines the document image as a whole: it finds all the text blocks in the document and distinguishes text paragraphs and their reading order from image and table regions. For each block, the region boundary (the start and end coordinates of the region in the image), the attribute (horizontal or vertical layout), and the connections to other blocks are stored in a data structure that is passed to the recognition module for automatic recognition. Text regions are recognized directly, table regions go through dedicated table analysis and recognition, and image regions are compressed or simply stored. Line segmentation first cuts the page image into lines and then separates the individual characters from each line image.
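The line segmentation step can be sketched with projection profiles: count the ink pixels in every row to find text lines, then count the ink pixels in every column of each line to find characters. This is a minimal sketch on a tiny binary page, assuming the page is already deskewed and denoised; the `runs` and `segment` helpers and the sample `PAGE` are hypothetical, and real software has to handle touching or broken characters that this approach does not.

```python
def runs(profile):
    """Return (start, end) pairs for each run of non-zero values; end is exclusive."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            spans.append((start, i))       # the run ended at the previous index
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(page):
    """Split a binary page into lines, then split each line into character boxes.

    Returns one list per text line; each box is (top, bottom, left, right),
    with bottom and right exclusive.
    """
    row_profile = [row.count("#") for row in page]
    lines = []
    for top, bottom in runs(row_profile):
        band = page[top:bottom]
        col_profile = [sum(row[x] == "#" for row in band) for x in range(len(band[0]))]
        lines.append([(top, bottom, left, right) for left, right in runs(col_profile)])
    return lines


# '#' is ink, '.' is background: two text lines with two characters each.
PAGE = [
    "..........",
    ".##..#.#..",
    ".##..###..",
    "..........",
    ".#...##...",
    ".#...##...",
    "..........",
]

for line in segment(PAGE):
    print(line)
```

Each box could then be cropped from the page and passed to the recognition step as a separate character image.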

3, Word recognition: character recognition is the core technology of OCR software. Detecting the character images in the scanned document and having the computer convert those graphics into standard text codes is the key to letting the computer "read"; this is what is called recognition technology. People recognize characters because features of the text, such as its structure and strokes, are stored in the human brain. For a computer to recognize characters, character features and related information must likewise be stored on the computer first. Deciding what information to store and how to obtain it is a complex process, and the result must reach a very high recognition rate to be useful. The usual practice is to analyze the strokes, feature points, projection information, and regional distribution of points of each character.
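Two of the features named above, projection information and the regional distribution of points, are easy to sketch. The example below builds a feature vector from row projections, column projections, and ink counts in a 2x2 grid of zones, then picks the library entry with the smallest Euclidean distance. The `features` and `classify` functions, the 4x4 glyphs, and the nearest-neighbor matching are assumptions for illustration only, not a description of how ABBYY FineReader works internally.

```python
import math


def features(glyph, zones=2):
    """Feature vector: row projections + column projections + ink count per zone."""
    height, width = len(glyph), len(glyph[0])
    rows = [row.count("#") for row in glyph]
    cols = [sum(row[x] == "#" for row in glyph) for x in range(width)]
    zone_counts = [0] * (zones * zones)
    for y, row in enumerate(glyph):
        for x, pixel in enumerate(row):
            if pixel == "#":
                # Map the pixel to its zone in a zones x zones grid.
                zone_counts[(y * zones // height) * zones + (x * zones // width)] += 1
    return rows + cols + zone_counts


def classify(glyph, library):
    """Return the label whose stored feature vector is closest to the glyph's."""
    vec = features(glyph)
    return min(library, key=lambda label: math.dist(vec, library[label]))


# Hypothetical reference glyphs; '#' is ink, '.' is background.
GLYPHS = {
    "I": ["..#.", "..#.", "..#.", "..#."],
    "L": ["#...", "#...", "#...", "####"],
}
LIBRARY = {label: features(glyph) for label, glyph in GLYPHS.items()}

# An "L" that lost one pixel of its horizontal stroke during scanning.
sample = ["#...", "#...", "#...", "###."]
print(classify(sample, LIBRARY))
```

Storing feature vectors rather than raw images is what the paragraph means by saving character information on the computer first: only the library of vectors is needed at recognition time.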

These three steps are the technical principle behind the recognition process of ABBYY FineReader. In some less mature software, every step requires manual input from the user, so the process is hard to complete without some expertise. ABBYY FineReader, by contrast, relies on mature technology and a high degree of automation: the software carries out all of these steps automatically, and recognition is completed with a single click.

This article is from: http://www.abbyychina.com/FRshiyongjiqiao/fr-tuwenshibieruanjian.html
