Image recognition technology has been mature for some time, but there is still very little material about it. I am summarizing what I know here (with a C# implementation) so that anyone who needs it can look it up, and as a note for myself.
What image recognition is used for: many people use it to crack website verification codes (CAPTCHAs) for automatic ticket booking or batch registration. What attracts me most, however, is the ability to turn written material into text on the computer automatically. Handwritten contracts, revised manuscripts and documents, monthly expense invoices that have to be entered or summarized, diary entries that should be moved to the computer: none of these need to be typed in by hand any more.
This article introduces two mainstream and mature recognition methods:
Method 1: Asprise-OCR.
Method 2: Microsoft Office Document Imaging (Office 2007) component implementation.
Method 1: Use Asprise-OCR.
Asprise-OCR download: aspriseOCR.zip
The package contains three DLL files: AspriseOCR.dll, DevIL.dll, and ILU.dll.
Note that these DLLs are native (unmanaged) libraries, so they cannot be added as ordinary assembly references; their functions must be imported in the program with DllImport. The key code is as follows:
// Requires: using System; using System.Runtime.InteropServices;

// Recognize the text in a whole image file.
[DllImport("AspriseOCR.dll", EntryPoint = "OCR", CallingConvention = CallingConvention.Cdecl)]
public static extern IntPtr OCR(string file, int type);

// Recognize the text in a rectangular region of an image file.
[DllImport("AspriseOCR.dll", EntryPoint = "OCRpart", CallingConvention = CallingConvention.Cdecl)]
static extern IntPtr OCRpart(string file, int type, int startX, int startY, int width, int height);

// Recognize the barcodes in a whole image file.
[DllImport("AspriseOCR.dll", EntryPoint = "OCRBarCodes", CallingConvention = CallingConvention.Cdecl)]
static extern IntPtr OCRBarCodes(string file, int type);

// Recognize the barcodes in a rectangular region of an image file.
[DllImport("AspriseOCR.dll", EntryPoint = "OCRpartBarCodes", CallingConvention = CallingConvention.Cdecl)]
static extern IntPtr OCRpartBarCodes(string file, int type, int startX, int startY, int width, int height);

The calling code is simple, just a single statement:
MessageBox.Show(Marshal.PtrToStringAnsi(OCRpart(img_path, -1, startX, startY, width, height)));
Here, img_path is the path of the image. startX and startY are both 0, and width and height are the width and height of the image, so the whole image is recognized.
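The call above can be wrapped so that the width and height are read from the image itself. This is only a minimal sketch: the class AspriseOcrSketch and the helper RecognizeWholeImage are hypothetical names that are not part of Asprise-OCR, and the code assumes the three DLLs sit next to the executable and that the project references System.Drawing.

```csharp
using System;
using System.Drawing;                   // Image, used to read the picture's dimensions
using System.Runtime.InteropServices;   // DllImport, Marshal

static class AspriseOcrSketch
{
    // Same import as in the key code above.
    [DllImport("AspriseOCR.dll", EntryPoint = "OCRpart", CallingConvention = CallingConvention.Cdecl)]
    static extern IntPtr OCRpart(string file, int type, int startX, int startY, int width, int height);

    // Hypothetical helper (not part of Asprise-OCR): recognizes the whole image.
    public static string RecognizeWholeImage(string imgPath)
    {
        int width, height;

        // Read the dimensions, then release the file before the native call opens it.
        using (Image img = Image.FromFile(imgPath))
        {
            width = img.Width;
            height = img.Height;
        }

        // startX = startY = 0 plus the full width and height covers the entire picture;
        // -1 is the type value used in the article's own call.
        IntPtr result = OCRpart(imgPath, -1, 0, 0, width, height);

        // Guard against a null pointer before converting the native string.
        return result == IntPtr.Zero ? string.Empty : Marshal.PtrToStringAnsi(result);
    }
}
```

With such a helper, the call becomes MessageBox.Show(AspriseOcrSketch.RecognizeWholeImage(img_path)); and the caller no longer has to know the image size in advance.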
Method 2: Microsoft Office Document Imaging (Office 2007) component implementation.
Before using it, be aware that the Imaging component has poor compatibility. On Windows 7 with Office 2007, you must install the Office 2007 SP1 or SP2 patch before it can read Chinese characters.
SP1 patch download (226 MB):
http://download.microsoft.com/download/1/6/5/1659d607-8696-4001-8072-efaedd70dd30/office2007sp1-kb936982-fullfile-zh-cn.exe
SP2 patch download (301 MB):
http://download.microsoft.com/download/A/3/9/A39E919E-AFA8-4128-9249-51629206C70F/office2007sp2-kb953195-fullfile-zh-cn.exe
Add a reference to the component to your project.
Code:
// Requires: using System.Text; (for StringBuilder)
StringBuilder sb = new StringBuilder();
MODI.Document doc = new MODI.Document();
doc.Create(img_Path);
MODI.Image image;
MODI.Layout layout;
doc.OCR(MODI.MiLANGUAGES.miLANG_CHINESE_SIMPLIFIED, true, true); // recognize Simplified Chinese
for (int i = 0; i < doc.Images.Count; i++)
{
    image = (MODI.Image)doc.Images[i];
    layout = image.Layout;
    sb.Append(layout.Text); // collect the text recognized on each page image
}
MessageBox.Show(sb.ToString());

Here, img_Path is the path of the image, and MODI.MiLANGUAGES is the enumeration that selects the language of the text to be recognized.
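Since the Imaging component is a COM component, it may be worth closing the document and catching COM errors around the recognition call. The following is only a rough sketch under assumptions: ModiOcrSketch and Recognize are hypothetical names, the project is assumed to already reference the MODI component, and it is assumed (not stated in this article) that the component reports a failed recognition as a COMException and that Document.Close takes a flag saying whether to save changes.

```csharp
using System;
using System.Runtime.InteropServices;   // COMException
using System.Text;                      // StringBuilder

static class ModiOcrSketch
{
    // Hypothetical helper (not part of MODI): returns the recognized text, or "" on failure.
    public static string Recognize(string imgPath)
    {
        MODI.Document doc = new MODI.Document();
        StringBuilder sb = new StringBuilder();
        bool opened = false;
        try
        {
            doc.Create(imgPath);
            opened = true;

            // Same call as in the article: recognize Simplified Chinese.
            doc.OCR(MODI.MiLANGUAGES.miLANG_CHINESE_SIMPLIFIED, true, true);

            for (int i = 0; i < doc.Images.Count; i++)
            {
                MODI.Image image = (MODI.Image)doc.Images[i];
                sb.Append(image.Layout.Text);
            }
            return sb.ToString();
        }
        catch (COMException)
        {
            // Assumption: recognition failures surface as COM errors.
            return string.Empty;
        }
        finally
        {
            // Assumption: Close(false) discards changes and releases the image file.
            if (opened)
            {
                doc.Close(false);
            }
        }
    }
}
```

A call such as MessageBox.Show(ModiOcrSketch.Recognize(img_Path)); would then show either the recognized text or an empty string rather than an unhandled exception.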
Source: http://files.cnblogs.com/stone_w/OCR.rar
Hanwang PDF OCR V8.1, Simplified Chinese version (I have used it, it works well, and I recommend it): http://xiazai.zol.com.cn/detail/34/335364.shtml