Six excellent open-source OCR (Optical Character Recognition) tools

Source: Internet
Author: User

Statement: This article is not the author's original work. Reprinted from: http://sigvc.org/bbs/thread-870-1-1.html

Paper is falling out of favor in many places. The paperless office has been discussed for more than 40 years, yet office environments have struggled to limit the paper mountains they generate. In the past few years, the concept of the paperless office has changed significantly. With the help of computer software, documents containing large amounts of important administrative data and information can be stored electronically more conveniently. The benefits of scanning documents are not purely archival.
To access paper-based information and integrate it into digital workflows, optical character recognition (OCR) technology is crucial. Selecting the right OCR tool depends on specific requirements. For example, online OCR services are useful to some people, but they may raise privacy concerns and impose file size restrictions. OCR software is not a mass-market product, so there are fewer open-source alternatives to the commercial heavyweights. In addition, OCR software requires sophisticated algorithms to correctly translate scanned images into actual text: images contain not only text but also layout, graphics, and tables, and a document may span multiple pages.
Excellent open source OCR software includes:
Tesseract
Tesseract-OCR is an OCR engine and library originally developed by HP and later released as open source. Its development has more recently been supported by Google. At the time of writing it had been updated to version 2.04.
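As a rough illustration of how Tesseract is typically driven from a script, the sketch below wraps its command-line form `tesseract IMAGE OUTPUTBASE [-l LANG]`, which writes the recognized text to `OUTPUTBASE.txt`. The helper names `build_tesseract_command` and `ocr_image` and the file name `scan.tif` are assumptions made for this example; they are not part of Tesseract. Accepted image formats and language codes vary between Tesseract versions, so treat this as a starting point rather than a definitive recipe.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_tesseract_command(image: str, output_base: str,
                            lang: Optional[str] = None) -> List[str]:
    """Build the argument list for: tesseract IMAGE OUTPUTBASE [-l LANG]."""
    cmd = ["tesseract", image, output_base]
    if lang:
        cmd += ["-l", lang]
    return cmd


def ocr_image(image: str, output_base: str,
              lang: Optional[str] = None) -> Optional[str]:
    """Run Tesseract on one image and return the recognized text.

    Returns None when the tesseract binary is not on PATH.
    """
    if shutil.which("tesseract") is None:
        return None
    subprocess.run(build_tesseract_command(image, output_base, lang), check=True)
    # Tesseract appends ".txt" to the output base name.
    return Path(output_base + ".txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    sample = "scan.tif"  # hypothetical input file
    if Path(sample).exists():
        print(ocr_image(sample, "scan_output", lang="eng"))
    else:
        print("No sample image found; command would be:",
              build_tesseract_command(sample, "scan_output", "eng"))
```

Keeping command construction separate from execution makes it possible to inspect or log the command without Tesseract installed.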
OCRopus
OCRopus (TM) is an advanced document analysis and OCR system that features pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-language support.
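To make the idea of "pluggable" stages more concrete, here is a toy sketch of a pipeline whose layout analysis and character recognition steps can be swapped independently. It does not use or reproduce the OCRopus API: every name in it (`OcrPipeline`, `skip_blank_lines`, `identity_recognizer`, `upper_case_recognizer`) is hypothetical, and plain strings stand in for image regions.

```python
from dataclasses import dataclass
from typing import Callable, List

# Toy stand-ins: a "page" is a list of strings instead of image data.
Page = List[str]
LayoutAnalyzer = Callable[[Page], List[str]]  # page -> text-line regions
Recognizer = Callable[[str], str]             # one region -> recognized text


@dataclass
class OcrPipeline:
    """Combine any layout analyzer with any recognizer."""
    layout: LayoutAnalyzer
    recognizer: Recognizer

    def run(self, page: Page) -> str:
        regions = self.layout(page)
        return "\n".join(self.recognizer(region) for region in regions)


def skip_blank_lines(page: Page) -> List[str]:
    """Hypothetical layout step: keep only non-empty regions."""
    return [line for line in page if line.strip()]


def identity_recognizer(region: str) -> str:
    """Hypothetical recognizer: return the region with whitespace trimmed."""
    return region.strip()


def upper_case_recognizer(region: str) -> str:
    """A second hypothetical recognizer, to show that stages can be swapped."""
    return region.strip().upper()


if __name__ == "__main__":
    page = ["  Hello ", "", "world"]
    print(OcrPipeline(skip_blank_lines, identity_recognizer).run(page))
    print(OcrPipeline(skip_blank_lines, upper_case_recognizer).run(page))
```

Because each stage is just a callable with a fixed signature, replacing the recognizer does not require touching the layout step, which is the property the word "pluggable" describes.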
Cuneiform
Cuneiform is the trade name of an OCR text recognition system. It was originally developed by Cognitive Technologies for Windows. This project is the port of that software to Linux.
GOCR
GOCR is an open-source optical character recognition program.

OCRFeeder
OCRFeeder is an open-source OCR suite for the GNOME desktop. It converts paper or image documents into electronic documents.

Lios
Linux-intelligent-OCR-solution (Lios) is another open-source OCR solution for Linux. It can convert printed documents into editable text.
