Author: Caocao (Web hermit), http://www.caocao.name, http://www.caocao.mobi
When reprinting, please credit the source: http://www.javaeye.com/topic/149776
Some websites allow users to upload photos, posters, and other images. As developers, we inevitably find that a large share of the images users upload are duplicates or near-duplicates of each other. For this reason, the hermit wants to discuss an algorithm for recognizing similar images.
What features should such an algorithm have? The main ones the hermit has in mind are:
1. It can identify identical images.
2. It can recognize images that have been rotated, translated, scaled, proportionally distorted, or given an added border.
3. It can recognize a region taken from inside an image.
4. It can recognize images with a color cast, overexposure, underexposure, blur, or noise.
5. It can recognize images with watermarks.
6. It can recognize images that have been slightly retouched in Photoshop.
7. The false recognition rate must be quite low.
8. When recognizing an image, it does not scan the other images; it relies only on the feature data already extracted from them.
That may look like a long list of requirements. Without further ado, please refer to the figure below. Assume such an algorithm exists: the hermit inputs a floating-point number M as the minimum similarity, and every image whose similarity is greater than or equal to M is listed. In the figure, 01.jpg is the input image; the rest are the images found to be similar to it, arranged in descending order of similarity.
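The query described here (keep everything with similarity at least M, most similar first) can be sketched in a few lines. This is only an illustration: the function name `list_similar` and the scores are hypothetical, and how the similarity values are computed is not shown.

```python
from typing import Dict, List, Tuple


def list_similar(scores: Dict[str, float], m: float) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs with similarity >= m, most similar first."""
    hits = [(name, s) for name, s in scores.items() if s >= m]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical similarity scores of stored images against the query 01.jpg.
scores = {"02.jpg": 0.97, "03.jpg": 0.62, "04.jpg": 0.88, "05.jpg": 0.41}
result = list_similar(scores, 0.6)
```

With M = 0.6, `05.jpg` falls below the threshold and is dropped, and the remaining three images come back ordered from most to least similar.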
Judging by the figure, the results are not bad, and the requirements are basically met. The hermit will outline the ideas below. The first step is to extract the feature data of each image:
1. The hash code of the entire file, used to identify identical images.
2. Rotation-insensitive data, used to resist rotation and mirroring.
3. Aspect-ratio-insensitive data, used to resist scaling and proportional distortion.
4. Color-cast-insensitive data, used to resist color cast, overexposure, and underexposure.
5. Overall contour data, used to resist added borders, blur, noise, watermarks, and slight Photoshop retouching.
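The post does not say how these features are computed, so the following is only a rough sketch under stated assumptions, not the hermit's method. It assumes an image is already decoded into rows of 0-255 grayscale values, and it swaps in three well-known simple techniques: a SHA-256 digest for the file hash, an intensity histogram as one possible mirror- and rotation-insensitive feature, and a mean-threshold bit signature on a fixed grid as one possible feature that ignores aspect ratio and a uniform brightness shift. All function names are hypothetical.

```python
import hashlib
from typing import List

Gray = List[List[int]]  # assumed input: rows of 0-255 grayscale values


def file_hash(data: bytes) -> str:
    """Digest of the raw file bytes; equal digests indicate identical files."""
    return hashlib.sha256(data).hexdigest()


def histogram(img: Gray, bins: int = 4) -> List[float]:
    """Normalised intensity histogram; unchanged by mirroring or 90-degree rotation."""
    counts = [0] * bins
    for row in img:
        for v in row:
            counts[min(v * bins // 256, bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def resample(img: Gray, size: int = 8) -> Gray:
    """Nearest-neighbour resample to size x size, discarding the aspect ratio."""
    h, w = len(img), len(img[0])
    return [[img[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]


def mean_bits(img: Gray, size: int = 8) -> List[int]:
    """One bit per grid cell: 1 if the cell is brighter than the grid mean."""
    flat = [v for row in resample(img, size) for v in row]
    mean = sum(flat) / len(flat)
    return [1 if v > mean else 0 for v in flat]
```

Because each bit in `mean_bits` is relative to the mean, adding the same amount of brightness to every pixel leaves the signature unchanged, and because the grid has a fixed size, a horizontally stretched copy maps to the same cells. Contour data and robustness to cropping or watermarks are not covered by this sketch.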
By indexing these feature data and running a more complex computation over them, you can obtain a combined similarity score, so that images are recognized quickly without scanning the other images themselves. Because the algorithm is not yet very mature, the hermit hopes that interested readers will join in discussing it. Hermit's MSN: nethermit # hotmail.com.
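As a rough illustration of a combined similarity computed from stored feature data alone, the sketch below assumes each image has already been reduced to a small record. The record fields, the two per-feature measures (fraction of matching bits and histogram intersection), and the weights 0.7 and 0.3 are all assumptions made for this example; the post does not specify the actual formula or index structure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Features:
    file_hash: str      # digest of the file bytes
    bits: List[int]     # hypothetical bit signature of the image
    hist: List[float]   # hypothetical normalised intensity histogram


def bit_similarity(a: List[int], b: List[int]) -> float:
    """Fraction of positions where the two bit signatures agree."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def hist_similarity(a: List[float], b: List[float]) -> float:
    """Histogram intersection; 1.0 for identical normalised histograms."""
    return sum(min(x, y) for x, y in zip(a, b))


def similarity(q: Features, c: Features,
               w_bits: float = 0.7, w_hist: float = 0.3) -> float:
    """Combined score in [0, 1]; identical file hashes short-circuit to 1.0."""
    if q.file_hash == c.file_hash:
        return 1.0
    return (w_bits * bit_similarity(q.bits, c.bits)
            + w_hist * hist_similarity(q.hist, c.hist))


query = Features("h1", [1, 0, 1, 0], [0.5, 0.5])
near = Features("h2", [1, 0, 1, 1], [0.5, 0.5])
far = Features("h3", [0, 1, 0, 1], [1.0, 0.0])
```

Only the stored records are compared here; no image file is opened at query time. A real index would also need to avoid comparing the query against every record, which this sketch does not attempt.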