OCR is a broad concept, and the market built on it will remain an area of continuous development and strong demand.
In terms of difficulty, I think Chinese OCR is harder than English OCR, and handwriting OCR is harder than printed-text OCR. Combining these pairwise, Chinese handwriting is the hardest case (some handwriting is hard even for people to read), while printed English text and digits are the simplest. For that simplest case I think you can use Tesseract directly, since it ships with English and digit recognition built in. You can integrate this functionality as is, or go further and study its open source implementation to fuse it into your own application.
Download: http://uniontesting.com/thread-32030-1-1.html
A more detailed introduction: http://www.cnblogs.com/brooks-dotnet/archive/2010/10/05/1844203.html
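With command-line integration, the following core code already achieves a usable result:
    private void button1_Click(object sender, EventArgs e)
    {
        // Path to the Tesseract executable.
        string filePath = @"Tesseract.exe";
        // Arguments: input image, then output base name (the text is written to R.txt).
        string paramStr = @"bbb3.jpg R";
        System.Diagnostics.Process.Start(filePath, paramStr);
    }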
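The button handler starts Tesseract but neither waits for it to finish nor reads the result. The following is a minimal sketch of how that could be done, assuming the standard command-line form `tesseract <image> <outputbase> [-l <lang>]`, which writes the recognized text to `<outputbase>.txt`. The class name `TesseractCli` and its two methods are hypothetical helpers introduced for illustration. They are not part of Tesseract.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper (not part of Tesseract) that wraps the command-line call.
public static class TesseractCli
{
    // Builds the argument string "<image> <outputBase> [-l <lang>]".
    // Paths are quoted so that file names containing spaces still work.
    public static string BuildArguments(string imagePath, string outputBase, string lang = null)
    {
        string args = "\"" + imagePath + "\" \"" + outputBase + "\"";
        if (!string.IsNullOrEmpty(lang))
        {
            args += " -l " + lang;
        }
        return args;
    }

    // Runs Tesseract, waits for it to exit, and returns the recognized text.
    public static string Recognize(string exePath, string imagePath, string outputBase, string lang = null)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = BuildArguments(imagePath, outputBase, lang),
            UseShellExecute = false,  // start the executable directly, without the shell
            CreateNoWindow = true     // do not flash a console window from a GUI app
        };

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException(
                    "Tesseract exited with code " + process.ExitCode);
            }
        }

        // Tesseract appends ".txt" to the output base name.
        return File.ReadAllText(outputBase + ".txt");
    }
}
```

Inside the click handler this would be called as `string text = TesseractCli.Recognize(@"Tesseract.exe", @"bbb3.jpg", "R");`. Waiting for the process to exit matters here because the output file may not be complete until Tesseract has finished.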
As http://blog.csdn.net/fengbingchun/article/details/8493877 shows, OCR can also be called directly through the Tesseract API, which makes the integration tighter still. It can also be combined with OpenCV for synchronized or parallel processing.
Integration with Tesseract.exe for OCR