A Study of Algorithms for Recognizing Similar Images

Author: Caocao (Web hermit), http://www.caocao.name, http://www.caocao.mobi
When reprinting, please credit the source: http://www.javaeye.com/topic/149776

Some websites allow users to upload photos, posters, and other images. As developers, we inevitably find that a large share of the images users upload are duplicates or near-duplicates of each other. For this reason, the hermit wants to discuss an algorithm for recognizing similar images.

What features should such an algorithm have? The hermit considers these the main ones:
1. It identifies identical images.
2. It recognizes images that have been mirrored, rotated, translated, scaled, stretched out of proportion, or given added borders.
3. It recognizes a partial region taken from inside an image.
4. It recognizes images with a color cast, overexposure, underexposure, blur, or noise.
5. It recognizes images with added watermarks.
6. It recognizes images that have been lightly edited in Photoshop.
7. Its false-match rate is very low.
8. When recognizing an image, it does not scan the other images; it works only from feature data already extracted from them.

That looks like a lot of requirements. Without further ado, please refer to the figure below. Assuming such an algorithm exists, the hermit inputs a floating-point number m as the minimum similarity, and all images with a similarity greater than or equal to m are listed. In the figure, 01.jpg is the input image, and the rest are the images similar to it, arranged in descending order of similarity.
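The thresholding and ordering described here can be sketched in a few lines. This is only an illustration that assumes the similarity scores have already been computed by some means; the function name `list_similar` and the example scores are made up and do not come from the hermit's implementation.

```python
def list_similar(scores, min_similarity):
    """Return (name, score) pairs with score >= min_similarity, best match first.

    `scores` maps an image name to its similarity with the input image,
    where 0.0 means unrelated and 1.0 means identical (an assumed scale).
    """
    matches = [(name, s) for name, s in scores.items() if s >= min_similarity]
    # Descending order of similarity, as in the listing described above.
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores against the input image 01.jpg.
scores = {"02.jpg": 0.97, "03.jpg": 0.42, "04.jpg": 0.81}
for name, score in list_similar(scores, 0.8):
    print(name, score)
```

With m = 0.8, only 02.jpg and 04.jpg pass the threshold, and 02.jpg is listed first.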

The results look decent, and the requirements are basically met. The hermit's approach is as follows. The first step is to extract feature data from each image:
1. A hash code of the entire file, used to identify identical images.
2. Data that is insensitive to rotation angle, used to resist rotation and mirroring.
3. Data that is insensitive to aspect ratio, used to resist scaling and proportional distortion.
4. Data that is insensitive to color cast, used to resist color casts, overexposure, and underexposure.
5. Overall contour data, used to resist added borders, blur, noise, watermarks, and light Photoshop edits.
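The article does not say how each feature is computed, so the following is only a rough sketch of what items 1, 3, and 4 might look like. It swaps in a simple average-hash style signature, which is a stand-in technique and not necessarily the hermit's method. The names `file_hash`, `shrink`, and `contour_bits` are hypothetical, the image is assumed to be already decoded into a 2D list of grayscale values, and rotation and mirroring (item 2) are not handled.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """Item 1: hash of the raw file bytes; equal hashes mean identical files."""
    return hashlib.sha256(data).hexdigest()


def shrink(gray, size=8):
    """Average a 2D grayscale grid down to size x size cells.

    The target grid is always square, so the original width, height,
    and aspect ratio are deliberately discarded (item 3).
    """
    h, w = len(gray), len(gray[0])
    out = []
    for i in range(size):
        r0 = i * h // size
        r1 = max((i + 1) * h // size, r0 + 1)  # at least one row per cell
        row = []
        for j in range(size):
            c0 = j * w // size
            c1 = max((j + 1) * w // size, c0 + 1)  # at least one column per cell
            block = [gray[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def contour_bits(gray, size=8):
    """One bit per cell: 1 if the cell is brighter than the mean of all cells.

    Comparing against the image's own mean cancels a uniform brightness
    shift, a crude stand-in for exposure insensitivity (item 4).
    """
    cells = [v for row in shrink(gray, size) for v in row]
    mean = sum(cells) / len(cells)
    return [1 if v > mean else 0 for v in cells]
```

In this sketch, a uniformly brightened copy of an image, or a copy stretched to twice its width, yields the same bit signature as the original, while the file hash changes with any single byte.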

By building an index over this feature data and running a more involved calculation on it, the algorithm obtains a combined similarity score, so similar images can be recognized quickly without scanning the other images themselves. Because the algorithm is not yet mature, the hermit hopes that interested readers will join in discussing it. Hermit's MSN: nethermit # hotmail.com.
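The article leaves the "complex operations" unspecified. One plausible way to combine per-feature comparisons into a single score is a weighted average, sketched below. The weighting scheme, the feature names `contour` and `color`, and the functions `hamming_similarity` and `combined_similarity` are all assumptions made for illustration, not the hermit's actual formula. Note that only stored feature records are compared; no image is opened.

```python
def hamming_similarity(bits_a, bits_b):
    """Fraction of positions at which two equal-length bit lists agree."""
    if len(bits_a) != len(bits_b):
        raise ValueError("signatures must have the same length")
    same = sum(1 for a, b in zip(bits_a, bits_b) if a == b)
    return same / len(bits_a)


def combined_similarity(query, stored, weights):
    """Weighted average of per-feature similarities, in the range 0.0 to 1.0.

    `query` and `stored` are feature records (dicts) extracted earlier.
    A matching file hash short-circuits to 1.0 (identical files).
    """
    if query["file_hash"] == stored["file_hash"]:
        return 1.0
    total_weight = sum(weights.values())
    score = sum(
        weight * hamming_similarity(query[name], stored[name])
        for name, weight in weights.items()
    )
    return score / total_weight


# Hypothetical, tiny feature records; real signatures would be longer.
query = {"file_hash": "aa", "contour": [1, 0, 1, 1], "color": [1, 1, 0, 0]}
stored = {"file_hash": "bb", "contour": [1, 0, 0, 1], "color": [1, 1, 0, 0]}
weights = {"contour": 3, "color": 1}  # assumed weights
print(combined_similarity(query, stored, weights))
```

A linear pass over all stored records is already cheaper than reopening every image; a real index over the signatures would reduce the work further, but that is beyond this sketch.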
