A preliminary study on Tesseract

I. Introduction to the Framework

Tesseract is an image recognition (OCR) tool that extracts the text from a picture. It supports multiple languages (English is the default). The open-source files can be downloaded from GitHub; if you only want to use the tool and not dig into its internals, you can also search for it on Google Code and download it from there.

II. Contents of Tesseract

doc: the documentation.

tessdata: stores the language data files (chi_sim.traineddata for Chinese, and so on).

tesseract.exe: the executable that starts Tesseract. It can be called from CMD: first cd to the corresponding directory, then enter tesseract.exe followed by the image name and the output file name (for example: tesseract.exe 1.jpg 1). The recognition result for 1.jpg, located in the same directory as tesseract.exe, is saved to 1.txt. If you need Chinese support, add the chi_sim.traineddata file to tessdata.
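To see which languages an installation can recognize, it is enough to list the .traineddata files in tessdata. The following is a minimal Python sketch; the function name available_languages is made up for illustration and is not part of Tesseract.

```python
from pathlib import Path


def available_languages(tessdata_dir):
    """Return the language codes that have a .traineddata file in tessdata_dir."""
    # Each language is stored as <code>.traineddata, e.g. chi_sim.traineddata.
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))
```

Calling available_languages with the path of your tessdata folder returns a sorted list of codes; if chi_sim is not in it, the Chinese data file still has to be copied in.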

The call then looks like this: tesseract.exe 1.jpg 1 -l chi_sim, where -l specifies which language data to use for recognition. The image can be given as a full path, and so can the output. If you just want to use Tesseract as a tool for extracting text and do not need a very high success rate, this is enough: call the exe directly from CMD. If you want to try it out, you can download the attached simple-use Tesseract .rar package.
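If you would rather script the call than type it each time, the command line described above can be assembled programmatically. This is a rough Python sketch, assuming the executable is reachable as tesseract on PATH; the helper names build_tesseract_command and run_tesseract are hypothetical.

```python
import shutil
import subprocess


def build_tesseract_command(image, output_base, lang=None, exe="tesseract"):
    """Build the argument list: <exe> <image> <output_base> [-l <lang>]."""
    cmd = [exe, str(image), str(output_base)]
    if lang:
        # -l selects the language data, e.g. chi_sim for Chinese.
        cmd += ["-l", lang]
    return cmd


def run_tesseract(image, output_base, lang=None):
    """Run Tesseract if it is on PATH; the text lands in <output_base>.txt."""
    cmd = build_tesseract_command(image, output_base, lang)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    subprocess.run(cmd, check=True)
    return f"{output_base}.txt"
```

For example, run_tesseract("1.jpg", "1", "chi_sim") corresponds to the command shown above and returns the name of the text file Tesseract writes. Passing the arguments as a list avoids quoting problems when full paths contain spaces.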

III. Advanced Use in .NET Projects

If a .NET project needs to use Tesseract through DLL references, you can download the Tesseract_dll reference package.

The x86 and x64 folders contain the native DLLs that Tesseract depends on. The program picks the matching set at run time, depending on whether it is running as 32-bit or 64-bit.
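The idea of choosing a folder by bitness can be pictured as follows. This is only an illustration in Python, not the wrapper's actual code, and it assumes the choice depends on the pointer size of the running process rather than on the operating system.

```python
import struct


def native_folder(pointer_bits):
    """Map the process pointer size to the folder holding the native DLLs."""
    return "x64" if pointer_bits == 64 else "x86"


# Pointer size of the running interpreter: 8 bytes means a 64-bit process.
current_bits = struct.calcsize("P") * 8
print(native_folder(current_bits))
```

A 32-bit process on a 64-bit system would still need the x86 folder under this assumption, which is why both folders are shipped with the project.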

liblept168.dll: when the site is published to IIS on Windows Server 2003, this file fails to load with the error message: Failed to find the library "liblept168.dll" for platform x86. Published on a later version of Windows, it works properly. Tesseract.dll is added as a reference to the corresponding project. Here is the test code for Tesseract.dll:

    // Requires the Tesseract .NET wrapper (using Tesseract;) and System.Drawing.
    // imageFile, meanConfidenceLabel and resultText are controls on the ASP.NET page.
    using (var engine = new TesseractEngine(Server.MapPath(@"~/tessdata"), "eng", EngineMode.Default))
    {
        // Have to load the Pix via a Bitmap since Pix doesn't support loading a stream.
        using (var image = new System.Drawing.Bitmap(imageFile.PostedFile.InputStream))
        {
            using (var pix = PixConverter.ToPix(image))
            {
                using (var page = engine.Process(pix))
                {
                    // Mean confidence is formatted as a percentage.
                    meanConfidenceLabel.InnerText = String.Format("{0:P}", page.GetMeanConfidence());
                    resultText.InnerText = page.GetText();
                }
            }
        }
    }

TesseractEngine: notes on the constructor parameters. The first is the path to the language data, which must end in tessdata. The second is the language; use chi_sim if you need Chinese.
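Both constraints can be checked before the engine is created, which makes a misconfigured deployment easier to diagnose. Below is a small Python sketch of such a check; check_engine_args is a hypothetical helper, and the assumption that each language needs a <lang>.traineddata file in the data folder follows the file naming described earlier.

```python
from pathlib import Path


def check_engine_args(datapath, lang):
    """Return a list of problems with the two arguments; empty means OK."""
    problems = []
    folder = Path(datapath)
    if folder.name != "tessdata":
        problems.append("data path should end in 'tessdata'")
    # Each language needs a matching <lang>.traineddata file in that folder.
    if not (folder / f"{lang}.traineddata").is_file():
        problems.append(f"missing {lang}.traineddata")
    return problems
```

An empty list means the path and language look usable; otherwise each entry names one thing to fix before constructing the engine.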

From: https://www.cnblogs.com/cleanboy/p/4617438.html


