Image Recognition Exercise (Character CAPTCHAs, License Plate Numbers, ID Card Numbers)
Byoy 2012
You are welcome to discuss related issues with me. Contact: 1429013154
Code here (note: this is not the final version)
Optical character recognition (OCR) is a very useful technology, with applications such as CAPTCHA recognition, license plate recognition, and text recognition. Recognizing isolated characters is relatively easy (compared with recognizing running text).
When I saw a friend working on CAPTCHA recognition, my fingers started itching, so I gave it a try myself. Of course, it only handles simple CAPTCHAs.
The approach is not limited to CAPTCHAs: it can also recognize license plate numbers, ID card numbers, house numbers, and other miscellaneous content.
The recognition process is straightforward:
1. Preprocess the image
2. Perform a Y-axis projection
3. Analyze the projection histogram to find partitions
4. Split the image into individual characters based on the partitions (this is the key step: the better the split, the higher the later recognition rate)
5. Discard blank or invalid characters
6. Automatically rotate skewed characters, then recognize them
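Steps 2 to 5 can be sketched in a few lines of Python. This is a minimal illustration, not the original code. It assumes the image is already binarized into a list of rows of 0/1 values (1 = ink), reads "Y-axis projection" as counting ink pixels in each column (the usual way to separate characters written left to right), and uses hypothetical names (`y_axis_projection`, `partition`, `split_characters`) and filter parameters (`min_width`, `min_pixels`) chosen only for illustration.

```python
def y_axis_projection(binary):
    """Count ink pixels (value 1) in each column of a binary image."""
    width = len(binary[0]) if binary else 0
    return [sum(row[x] for row in binary) for x in range(width)]


def partition(projection, threshold=0):
    """Return (start, end) column ranges whose count exceeds threshold.

    `end` is exclusive; columns at or below the threshold are gaps.
    """
    segments = []
    start = None
    for x, count in enumerate(projection):
        if count > threshold and start is None:
            start = x                      # entering a character
        elif count <= threshold and start is not None:
            segments.append((start, x))    # leaving a character
            start = None
    if start is not None:                  # last character touches the edge
        segments.append((start, len(projection)))
    return segments


def split_characters(binary, threshold=0, min_width=1, min_pixels=1):
    """Cut the image into per-character sub-images (steps 4 and 5)."""
    chars = []
    for start, end in partition(y_axis_projection(binary), threshold):
        piece = [row[start:end] for row in binary]
        ink = sum(sum(row) for row in piece)
        # Step 5: drop slivers and near-empty pieces, which are likely noise.
        if end - start >= min_width and ink >= min_pixels:
            chars.append(piece)
    return chars


if __name__ == "__main__":
    img = [
        [1, 1, 0, 0, 1, 0],
        [1, 0, 0, 0, 1, 1],
        [1, 1, 0, 0, 1, 0],
    ]
    print(y_axis_projection(img))               # [3, 2, 0, 0, 3, 1]
    print(partition(y_axis_projection(img)))    # [(0, 2), (4, 6)]
    print(len(split_characters(img)))           # 2
```

Raising `threshold` above 0 tolerates thin noise strokes between characters, at the cost of merging or cutting characters that barely touch, which is exactly where adhesion causes trouble.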
If characters in a sample image stick together (adhesion), the partitioning may be inaccurate, and automatic rotation becomes difficult.
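Step 6 does not say how the rotation angle is found. One common heuristic, assumed here rather than taken from the original code, is to try a range of angles and keep the one at which the character's ink is narrowest horizontally. It also shows why adhesion hurts: the width of two touching characters says little about the tilt of either one. The function names and the candidate range of -30 to 30 degrees in 2-degree steps are hypothetical choices.

```python
import math


def ink_points(char):
    """(x, y) coordinates of ink pixels; char is a list of 0/1 rows."""
    return [(x, y) for y, row in enumerate(char) for x, v in enumerate(row) if v]


def rotated_width(points, angle_deg):
    """Horizontal extent of the points after rotating them by angle_deg."""
    a = math.radians(angle_deg)
    xs = [x * math.cos(a) - y * math.sin(a) for x, y in points]
    return max(xs) - min(xs)


def estimate_skew(char, angles=range(-30, 31, 2)):
    """Return the candidate angle (degrees) at which the character is narrowest.

    Rotating the ink points by this angle, as rotated_width does, is
    assumed to bring the character upright. Image y grows downward.
    """
    points = ink_points(char)
    if not points:
        return 0
    return min(angles, key=lambda ang: rotated_width(points, ang))


if __name__ == "__main__":
    # A stroke leaning one column to the right every two rows.
    slanted = [[1 if x == y // 2 else 0 for x in range(5)] for y in range(10)]
    print(estimate_skew(slanted))   # 26
```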
At present the characters can be separated; the next step is to study how to recognize them. (If the individual characters are fairly standard, a ready-made OCR control can be used.)
Here are some examples.
Ordinary CAPTCHA (no difficulty)
CAPTCHA with interference
High-intensity interference (the current partition algorithm fails here; a better algorithm, such as dynamic thresholding, is needed)
CSDN CAPTCHA (no problem)
ID card number
License plate number
A QQ CAPTCHA is added. It is hard to segment with a single threshold; the split points must also take the character width into account.
This is the result of single-threshold partitioning (no limit on character width), and it is poor.
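One way to bring the character width into the partition is sketched below as an assumption, not as the method actually used for the QQ CAPTCHA. It takes the column ranges from a single-threshold partition and, whenever a range is wider than an expected maximum character width, cuts it at its weakest column (the one with the fewest ink pixels), repeating until every piece fits. `refine_by_width`, `max_width`, and `min_width` are hypothetical names, and the widths would need tuning per CAPTCHA style.

```python
def refine_by_width(projection, segments, max_width, min_width=2):
    """Split over-wide (start, end) column ranges at their weakest column.

    projection[x] is the ink count of column x; ranges are end-exclusive.
    A range wider than max_width is assumed to hold touching characters
    and is cut where the projection is smallest, keeping each part at
    least min_width columns wide. Cutting repeats until all parts fit.
    """
    min_width = max(1, min_width)          # guarantees each cut makes progress
    result = []
    stack = list(reversed(segments))       # pop() yields left-to-right order
    while stack:
        start, end = stack.pop()
        if end - start <= max_width or end - start < 2 * min_width:
            result.append((start, end))    # fits, or too narrow to cut
            continue
        cut = min(range(start + min_width, end - min_width + 1),
                  key=lambda x: projection[x])
        stack.append((cut, end))           # right part, handled second
        stack.append((start, cut))         # left part, handled first
    return result


if __name__ == "__main__":
    proj = [2, 3, 3, 1, 3, 3, 2]
    print(refine_by_width(proj, [(0, 7)], max_width=4))  # [(0, 3), (3, 7)]
```

Cutting at the projection minimum assumes touching characters share only a thin bridge; for heavily overlapping glyphs a fixed-width split may work better.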
Next, I will keep studying how to optimize the partition algorithm and how to recognize individual characters (multiple recognition passes plus sample training could be considered).
Also attached is the CAPTCHA from the Pacific website.
It has some adhesion, but this can be solved with a fixed character width (the characters are basically all the same width).
Refer to this figure: take the total width, divide it by the number of characters to get each character's width, and extract each character separately.
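The fixed-width split just described can be sketched as follows. This assumes the "entire width" means the span from the first to the last column containing ink, and that the character count is known in advance; `ink_extent`, `fixed_width_bounds`, and `split_fixed_width` are hypothetical names.

```python
def ink_extent(binary):
    """First ink column and one past the last ink column (0/1 rows)."""
    width = len(binary[0]) if binary else 0
    cols = [x for x in range(width) if any(row[x] for row in binary)]
    if not cols:
        return 0, 0
    return cols[0], cols[-1] + 1


def fixed_width_bounds(left, right, n_chars):
    """Divide columns [left, right) into n_chars slices of nearly equal width."""
    step = (right - left) / n_chars
    # Adjacent slices share a boundary, so no column is lost or repeated.
    return [(left + round(i * step), left + round((i + 1) * step))
            for i in range(n_chars)]


def split_fixed_width(binary, n_chars):
    """Cut the ink region into n_chars equal-width character images."""
    left, right = ink_extent(binary)
    return [[row[s:e] for row in binary]
            for s, e in fixed_width_bounds(left, right, n_chars)]


if __name__ == "__main__":
    print(fixed_width_bounds(3, 13, 3))   # [(3, 6), (6, 10), (10, 13)]
```

When the total width is not divisible by the character count, rounding makes some slices one column wider than others, which is acceptable when the characters are "basically" the same width.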
For binarization I used Otsu's algorithm. Reference: "A threshold selection method from gray-level histograms", IEEE Transactions on Systems, Man, and Cybernetics 9(1), pp. 62-66, 1979.
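Otsu's method picks the threshold t that maximizes the between-class variance sigma_B^2(t) = w0(t) * w1(t) * (mu0(t) - mu1(t))^2, where w0 and w1 are the fractions of pixels at or below and above t, and mu0 and mu1 are their mean gray levels. The sketch below computes it from a 256-bin histogram in plain Python. It is not the author's code; the names `otsu_threshold` and `binarize`, the input format (8-bit gray levels as a list of rows), and the dark-text-on-light-background default are assumptions.

```python
def otsu_threshold(gray):
    """Otsu threshold for 8-bit gray levels given as a list of rows.

    Class 0 holds values <= t, class 1 values > t. Using pixel counts
    instead of probabilities scales the between-class variance by a
    constant, so the maximizing t is unchanged.
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w0 = 0          # number of pixels with value <= t
    sum0 = 0        # sum of their gray levels
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue                      # one class empty: no split here
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:            # strict: keep the lowest best t
            best_var, best_t = between, t
    return best_t


def binarize(gray, dark_text=True):
    """Map pixels to 1 (ink) or 0 (background) using Otsu's threshold."""
    t = otsu_threshold(gray)
    if dark_text:
        return [[1 if v <= t else 0 for v in row] for row in gray]
    return [[1 if v > t else 0 for v in row] for row in gray]


if __name__ == "__main__":
    gray = [[10, 10, 200, 200], [12, 11, 210, 205]]
    print(binarize(gray))   # [[1, 1, 0, 0], [1, 1, 0, 0]]
```

Otsu's method chooses one global threshold for the whole image, which is why heavily cluttered CAPTCHAs may need a locally varying (dynamic) threshold instead.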
On CAPTCHAs, this paper is very good; for details, see: "Text-based CAPTCHA Strengths and Weaknesses", ACM Conference on Computer and Communications Security 2011 (CCS '11).
Improved denoising (decontamination) algorithm
Character segmentation of a dirty license plate