The principle of similar image search


http://www.ruanyifeng.com/blog/2011/07/principle_of_similar_image_search.html

Ruan Yifeng

Date: July 21, 2011

Last month, Google officially added "similar image search" to its homepage.

You can use a picture to search for all the images on the internet that resemble it. Click the camera icon in the search box.

A dialog box will appear.

Enter the image's web address or upload the image directly, and Google will find similar images. The picture below shows the American actress Alyson Hannigan.

After uploading, Google returns the following results:

There are quite a few other "similar image search engines"; TinEye can even find out the background of a photograph.

==========================================================

What is the principle of this technology? How does a computer know that two pictures are similar?

According to Dr. Neal Krawetz's explanation, the principle is very simple and easy to understand. A fast algorithm is enough to achieve the basic effect.

The key technique here is called a "perceptual hash algorithm". It generates a "fingerprint" string for each image and then compares the fingerprints of different images. The closer the fingerprints are, the more similar the pictures are.

Here is one of the simplest implementations:

The first step is to reduce the size.

Reduce the image to 8x8, a total of 64 pixels. The purpose of this step is to remove the details of the picture and keep only basic information such as structure and shading, discarding the differences caused by different sizes and proportions.
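As a rough illustration of this step, here is a minimal pure-Python sketch. It assumes the picture is already available as a grid of gray values (a list of rows of integers) and shrinks it by averaging the pixels that fall into each of the 8x8 cells. The function name `shrink_to_8x8` and the box-averaging method are assumptions made for this example; the article does not say which resampling method to use.

```python
def shrink_to_8x8(pixels):
    """Downscale a grayscale image (list of rows of ints) to 8x8 by box averaging."""
    height, width = len(pixels), len(pixels[0])
    if height < 8 or width < 8:
        raise ValueError("image must be at least 8x8")
    small = []
    for by in range(8):
        # Source rows that fall into output row `by`.
        y0, y1 = by * height // 8, (by + 1) * height // 8
        row = []
        for bx in range(8):
            # Source columns that fall into output column `bx`.
            x0, x1 = bx * width // 8, (bx + 1) * width // 8
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) // len(block))
        small.append(row)
    return small


# A 16-row by 32-column horizontal gradient shrinks to an 8x8 grid.
image = [[x * 8 for x in range(32)] for _ in range(16)]
small = shrink_to_8x8(image)
```

Because every cell is an average, the original width, height, and aspect ratio no longer matter after this step.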

The second step is to simplify the color.

Convert the reduced image to 64 levels of gray. That is, every pixel takes one of only 64 gray values.
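One possible way to do this conversion is sketched below. It assumes each pixel is an (r, g, b) tuple with channels in the range 0 to 255, converts it to a gray value with the common BT.601 luma weights, and then divides by 4 to go from 256 levels to 64. The article does not name a conversion formula, so both the weights and the function name `to_gray64` are assumptions.

```python
def to_gray64(rgb_pixels):
    """Map rows of (r, g, b) tuples (0-255 per channel) to gray levels 0-63."""
    gray = []
    for row in rgb_pixels:
        gray_row = []
        for r, g, b in row:
            # BT.601 luma weights in integer arithmetic (an assumed choice).
            luma = (299 * r + 587 * g + 114 * b) // 1000
            # 256 gray levels -> 64 gray levels.
            gray_row.append(luma // 4)
        gray.append(gray_row)
    return gray


# Black, white, and pure red pixels.
levels = to_gray64([[(0, 0, 0), (255, 255, 255), (255, 0, 0)]])
```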

The third step is to calculate the average.

Calculate the average gray level of all 64 pixels.

The fourth step is to compare each pixel's gray level.

Compare the gray level of each pixel with the average. If it is greater than or equal to the average, record a 1; if it is less than the average, record a 0.
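Steps three and four can be sketched together in a few lines. The function name `threshold_bits` is made up for this example, and the input is assumed to be the 8x8 grid of gray levels produced by the earlier steps.

```python
def threshold_bits(gray):
    """Return one bit per pixel: 1 if the pixel is >= the average, else 0."""
    values = [v for row in gray for v in row]   # flatten row by row
    average = sum(values) / len(values)         # step three
    return [1 if v >= average else 0 for v in values]  # step four


# An 8x8 grid holding the gray levels 0..63 in order; the average is 31.5.
gray = [[r * 8 + c for c in range(8)] for r in range(8)]
bits = threshold_bits(gray)
```

For this grid the first 32 bits are 0 and the last 32 bits are 1, since exactly half of the pixels lie below the average.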

The fifth step is to calculate the hash value.

Combine the comparison results from the previous step into a 64-bit integer, which is the fingerprint of the image. The order in which the bits are combined does not matter, as long as the same order is used for every picture.

Example fingerprint (in hexadecimal): 8f373714acfcf4d0
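The packing in step five can be sketched as follows. This version puts the first bit in the most significant position, which is just one possible order, and the helper names are assumptions for illustration. The fingerprint 8f373714acfcf4d0 quoted in the article is used only to show the round trip between bits and hexadecimal text.

```python
def bits_to_fingerprint(bits):
    """Pack 64 bits (first bit = most significant) into one integer."""
    if len(bits) != 64:
        raise ValueError("expected exactly 64 bits")
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def fingerprint_hex(value):
    """Format a 64-bit fingerprint as 16 hexadecimal digits."""
    return format(value, "016x")


# Unpack the article's example fingerprint into bits, then pack it again.
example = 0x8F373714ACFCF4D0
example_bits = [(example >> (63 - i)) & 1 for i in range(64)]
packed = bits_to_fingerprint(example_bits)
text = fingerprint_hex(packed)
```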

Once you have the fingerprints, you can compare different pictures by counting how many of the 64 bits differ. In theory, this is equivalent to computing the "Hamming distance". If no more than 5 bits differ, the two pictures are similar; if more than 10 bits differ, they are two different pictures.
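For fingerprints stored as integers, the comparison comes down to an XOR followed by a bit count. The sketch below uses the thresholds from the paragraph above. The article does not say how to treat distances from 6 to 10, so the "uncertain" label for that range is an assumption, as are the function names.

```python
def hamming_distance(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def classify(distance):
    """Apply the thresholds from the text; the middle band is an assumption."""
    if distance <= 5:
        return "similar"
    if distance > 10:
        return "different"
    return "uncertain"


base = 0x8F373714ACFCF4D0
near = base ^ 0b111          # same fingerprint with 3 bits flipped
far = base ^ 0xFFFF          # same fingerprint with 16 bits flipped

near_result = classify(hamming_distance(base, near))
far_result = classify(hamming_distance(base, far))
```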

For a concrete implementation, see imghash.py, written in Python by Wote. The code is short, only 53 lines. When you run it, the first parameter is the base picture and the second parameter is the directory containing the other pictures to compare against; the output is the number of differing bits (the Hamming distance) between each pair of pictures.
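The script itself is not reproduced here. As a loose sketch of the same idea (one base picture compared against many others), the following assumes the fingerprints have already been computed and stored in a dictionary keyed by file name. The function name `rank_by_distance` and the file names are hypothetical and are not taken from imghash.py.

```python
def rank_by_distance(base, candidates):
    """Return (name, distance) pairs, most similar first."""
    scored = [(name, bin(base ^ fp).count("1")) for name, fp in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


base = 0x8F373714ACFCF4D0
# Hypothetical precomputed fingerprints for three other pictures.
candidates = {
    "other.jpg": base ^ 0xFFFF,  # 16 bits differ
    "copy.jpg": base,            # identical fingerprint
    "edit.jpg": base ^ 0b111,    # 3 bits differ
}
ranking = rank_by_distance(base, candidates)
```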

The advantages of this algorithm are that it is simple, fast, and unaffected by the size of the picture. The disadvantage is that the content of the picture must not change: if you add a few words to the picture, it will no longer be recognized. So its best use is finding the original image from a thumbnail.
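Both properties can be checked on a toy example. The sketch below strings the five steps together for a grayscale image given as rows of 0 to 255 integers, then compares a small test picture with an enlarged copy and with an edited copy. The box-average shrinking, the bit order, and all names are assumptions for this example; real photographs would need an image library to load the pixels.

```python
def average_hash(pixels):
    """Steps one to five on a grayscale image given as rows of 0-255 ints."""
    height, width = len(pixels), len(pixels[0])
    cells = []
    for by in range(8):
        y0, y1 = by * height // 8, (by + 1) * height // 8
        for bx in range(8):
            x0, x1 = bx * width // 8, (bx + 1) * width // 8
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # Shrink by averaging, then reduce 256 gray levels to 64.
            cells.append(sum(block) // len(block) // 4)
    average = sum(cells) / 64
    fingerprint = 0
    for value in cells:
        fingerprint = (fingerprint << 1) | (1 if value >= average else 0)
    return fingerprint


def hamming_distance(a, b):
    return bin(a ^ b).count("1")


# 16x16 test picture: dark left half, bright right half.
original = [[0] * 8 + [200] * 8 for _ in range(16)]
# The same picture at twice the size (every pixel doubled in both directions).
enlarged = [[v for v in row for _ in range(2)] for row in original for _ in range(2)]
# The same picture with a white patch painted over the top-left corner.
edited = [row[:] for row in original]
for y in range(8):
    for x in range(8):
        edited[y][x] = 255

resized_distance = hamming_distance(average_hash(original), average_hash(enlarged))
edited_distance = hamming_distance(average_hash(original), average_hash(edited))
print(resized_distance, edited_distance)  # prints: 0 16
```

The enlarged copy produces the same fingerprint, while the painted-over copy differs in 16 bits, which is past the "different pictures" threshold of 10.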

In practical applications, the more powerful pHash and SIFT algorithms are often used, because they can recognize images that have been deformed. As long as the deformation does not exceed 25%, they can match the original image. Although these algorithms are more complex, the principle is the same as in the simple algorithm above: first convert the image into a hash string, then compare.

UPDATE (2013.03.31)

This article has a sequel, please see here.

(End)
