ocrodjvu 0.7.6 released: a wrapper program for OCR systems

Source: Internet
Author: User

ocrodjvu is a wrapper for OCR systems; its main purpose is to run OCR on DjVu files.

In version 0.7.6, when tesseract ≥ 3.00 is used, the bounding boxes of individual characters are now extracted with high precision. An HTML5 parser can optionally be used.

Code Demonstration:

$ wget -q 'http://ocropus.googlecode.com/svn/trunk/data/pages/alice_1.png'
$ gm convert -threshold 50% 'alice_1.png' 'alice.pbm'
$ cjb2 'alice.pbm' 'alice.djvu'
$ ocrodjvu --in-place 'alice.djvu'
Processing 'alice.djvu':
- Page #1
$ djvused -e print-txt 'alice.djvu' | head -n10
(page 0 0 2488 3507
 (line 470 2922 1383 2978
  (word 470 2927 499 2976 "1")
  (word 588 2926 787 2978 "down")
  (word 817 2925 927 2977 "the")
  (word 959 2922 1383 2976 "Rabbit-hole"))
 (line 465 2803 2073 2856
  (word 465 2819 569 2856 "Alice")
  (word 592 2819 667 2841 "was")
  (word 690 2808 896 2854 "beginning")
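
The hidden text layer printed by djvused is a nested S-expression: each zone gives its type, four bounding-box coordinates (xmin, ymin, xmax, ymax), and then either child zones or the recognized text. As a rough illustration of how such output could be post-processed, here is a minimal Python sketch. The helper names `parse` and `words` are hypothetical and are not part of ocrodjvu or djvused, and the sample is a shortened, balanced fragment in the style of the listing above (the real listing is cut off by `head`).

```python
import re

# Shortened sample in the format printed by `djvused -e print-txt`.
# Closing parentheses were added so that the fragment is balanced.
SAMPLE = '''
(page 0 0 2488 3507
 (line 470 2922 1383 2978
  (word 470 2927 499 2976 "1")
  (word 588 2926 787 2978 "down")
  (word 817 2925 927 2977 "the")
  (word 959 2922 1383 2976 "Rabbit-hole")))
'''

# Tokens: parentheses, double-quoted strings, or bare symbols/numbers.
TOKEN = re.compile(r'\(|\)|"(?:\\.|[^"\\])*"|[^\s()"]+')


def parse(text):
    """Parse a single S-expression into nested Python lists (hypothetical helper)."""
    stack = [[]]
    for tok in TOKEN.findall(text):
        if tok == '(':
            stack.append([])
        elif tok == ')':
            node = stack.pop()
            stack[-1].append(node)
        elif tok.startswith('"'):
            stack[-1].append(tok[1:-1])  # escape sequences are left undecoded
        elif tok.lstrip('-').isdigit():
            stack[-1].append(int(tok))
        else:
            stack[-1].append(tok)
    return stack[0][0]


def words(node):
    """Yield (text, (xmin, ymin, xmax, ymax)) for every word zone under node."""
    kind, x0, y0, x1, y1, *rest = node
    if kind == 'word':
        yield rest[0], (x0, y0, x1, y1)
    for child in rest:
        if isinstance(child, list):
            yield from words(child)


tree = parse(SAMPLE)
result = list(words(tree))
for text, box in result:
    print(text, box)
```

In practice the input would come from the full output of `djvused -e print-txt` rather than from a truncated `head` excerpt, since this sketch expects balanced parentheses.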

About OCR

OCR (optical character recognition) is the process by which an electronic device (such as a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates those shapes into computer text using character recognition methods. In other words, printed text is scanned, and the resulting image file is analyzed and processed to obtain the text and its layout information. How to use debugging or auxiliary information to improve recognition accuracy is the most important topic in OCR, and it gave rise to the term ICR (intelligent character recognition). The main indicators of an OCR system's performance are: rejection rate, false recognition rate, recognition speed, user-friendliness of the interface, product stability, ease of use, and feasibility.
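
The text names rejection rate and false recognition rate as performance indicators but does not define them. The Python sketch below assumes one common reading: the rejection rate is the share of characters the system declined to recognize, and the false recognition rate is the share of characters it recognized incorrectly. The function name `ocr_rates` and the sample data are made up for illustration only.

```python
def ocr_rates(results):
    """Compute simple per-character OCR quality rates.

    results: list of (expected, recognized) pairs; recognized is None
    when the system rejected the character. The definitions used here
    are an assumption, not taken from the article.
    """
    total = len(results)
    if total == 0:
        raise ValueError("results must not be empty")
    rejected = sum(1 for _, got in results if got is None)
    wrong = sum(1 for want, got in results if got is not None and got != want)
    return {
        'rejection_rate': rejected / total,
        'false_recognition_rate': wrong / total,
        'accuracy': (total - rejected - wrong) / total,
    }


# Made-up sample: 'l' misread as '1', and 'c' rejected.
sample = [('A', 'A'), ('l', '1'), ('i', 'i'), ('c', None), ('e', 'e')]
rates = ocr_rates(sample)
print(rates)
```

With these definitions, the three rates always sum to 1, which makes it easy to see how rejecting uncertain characters trades false recognitions for rejections.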

Software Information: http://jwilk.net/software/ocrodjvu

Download Address: http://pypi.python.org/packages/source/o/ocrodjvu/ocrodjvu-0.7.6.tar.gz
