Principles of similar image search (I)

Source: Internet
Author: User

Last month, Google put "search for similar images" on its homepage.

You can use an image to search for all similar images on the Internet. Click the camera icon in the search box.

A dialog box appears.

If you enter the URL of an online image or upload an image directly, Google will find similar images. The following figure shows Alyson Hannigan, an American actress.

After the upload, Google returns the following results:

There are many other search engines for similar images. TinEye can even find the background of a photo.

==========================================================

What is the principle of this technology? How does the computer know that two images are similar?

According to Dr. Neal Krawetz's explanation, the principle is simple and easy to understand. A quick algorithm is enough to achieve basic results.

The key technology here is the "perceptual hash algorithm". Its function is to generate a "fingerprint" string for each image and then compare the fingerprints of different images. The closer the fingerprints are, the more similar the images are.

The following is a simple implementation:

Step 1: reduce the size.

Reduce the image size to 8x8, with a total of 64 pixels. The purpose of this step is to remove the image details, retain only the basic information such as structure and brightness, and discard the image differences caused by different sizes and proportions.

Step 2: simplify the color.

Convert the reduced image to 64-level grayscale. That is to say, all pixels together use only 64 distinct gray levels.

Step 3: calculate the average value.

Calculate the average gray scale of all 64 pixels.

Step 4: compare the gray scale of pixels.

Compare the gray level of each pixel with the average value. If it is greater than or equal to the average, it is recorded as 1. If it is smaller than the average, it is recorded as 0.

Step 5: calculate the hash value.

The comparison results from the previous step are combined into a 64-bit integer, which is the fingerprint of the image. The order in which the bits are combined is not important, as long as the same order is used for all images.

Example fingerprint (in hexadecimal): 8f373714acfcf4d0
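The five steps above can be sketched in pure Python as follows. This is a minimal illustration, not the code referenced in this article. It assumes the image has already been decoded into a 2D list of gray values in the range 0-255, and the names `shrink` and `average_hash` are made up for the example. Simple block averaging stands in for whatever resampling filter a real image library would use.

```python
def shrink(pixels, size=8):
    """Step 1: reduce a 2D grid of gray values to size x size by block averaging."""
    height, width = len(pixels), len(pixels[0])
    small = []
    for row in range(size):
        # Source rows covered by this output row (always at least one).
        top = row * height // size
        bottom = max((row + 1) * height // size, top + 1)
        out_row = []
        for col in range(size):
            left = col * width // size
            right = max((col + 1) * width // size, left + 1)
            block = [pixels[y][x]
                     for y in range(top, bottom)
                     for x in range(left, right)]
            out_row.append(sum(block) / len(block))
        small.append(out_row)
    return small


def average_hash(pixels, size=8, levels=64):
    """Steps 1-5: return the fingerprint as a (size * size)-bit integer."""
    small = shrink(pixels, size)
    # Step 2: quantize 0..255 gray values down to `levels` gray levels.
    step = 256 / levels
    quantized = [int(value // step) for row in small for value in row]
    # Step 3: average gray level of all pixels.
    mean = sum(quantized) / len(quantized)
    # Steps 4 and 5: one bit per pixel, packed row by row,
    # with the first pixel ending up in the most significant bit.
    fingerprint = 0
    for value in quantized:
        fingerprint = (fingerprint << 1) | (1 if value >= mean else 0)
    return fingerprint


# A 16x16 test image: left half black, right half white.
image = [[0] * 8 + [255] * 8 for _ in range(16)]
print(format(average_hash(image), "016x"))  # 0f0f0f0f0f0f0f0f
```

Packing the bits row by row is just one possible order. As noted above, any order works as long as it is the same for every image.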

After obtaining the fingerprints, you can compare different images by counting how many of the 64 bits differ. In theory, this is equivalent to calculating the Hamming distance. If no more than 5 bits differ, the two images are very similar. If more than 10 bits differ, the two images are different.
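As a small sketch of this comparison, the Hamming distance between two integer fingerprints can be computed by XOR-ing them and counting the 1 bits. The function names `hamming_distance` and `judge` are hypothetical, and the "undecided" label for distances from 6 to 10 is an assumption, since the text gives no verdict for that range.

```python
def hamming_distance(fp_a, fp_b):
    """Count the bit positions in which two integer fingerprints differ."""
    return bin(fp_a ^ fp_b).count("1")


def judge(fp_a, fp_b):
    """Apply the thresholds from the text: <= 5 similar, > 10 different."""
    distance = hamming_distance(fp_a, fp_b)
    if distance <= 5:
        return "very similar"
    if distance > 10:
        return "different"
    return "undecided"  # 6..10: no verdict in the text; this label is an assumption


original = 0x8f373714acfcf4d0
altered = original ^ 0b1011  # flip three of the low bits

print(hamming_distance(original, altered))  # 3
print(judge(original, altered))             # very similar
```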

For a concrete implementation, see imgHash.py, written in Python by Wote. The code is very short, only 53 lines. When you run it, the first parameter is the reference image and the second parameter is the directory containing the other images to compare. The returned result is the number of differing bits (the Hamming distance) between the two images.
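That script is not reproduced here. As a rough illustration of the same workflow (one reference image compared against many others), the sketch below ranks precomputed fingerprints by Hamming distance. The function name `rank_by_distance`, the file names, and the candidate fingerprint values are all invented for the example. A real script would compute the fingerprints from the image files in the directory.

```python
def rank_by_distance(reference, candidates):
    """Return (name, distance) pairs sorted from most to least similar."""
    scored = [(name, bin(reference ^ fingerprint).count("1"))
              for name, fingerprint in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


reference = 0x8f373714acfcf4d0

# Hypothetical fingerprints, built by flipping a known number of bits.
candidates = {
    "unrelated.jpg": reference ^ 0xFFFFFFFFFFFFFFFF,  # all 64 bits differ
    "thumbnail.jpg": reference ^ 0x1,                 # 1 bit differs
    "edited.jpg": reference ^ 0xFF00,                 # 8 bits differ
}

for name, distance in rank_by_distance(reference, candidates):
    print(name, distance)
```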

The advantages of this algorithm are that it is simple and fast, and that it is not affected by scaling the image. The disadvantage is that it does not tolerate changes to the image content. If you add a few words of text to the image, the algorithm no longer recognizes it. Therefore, its best use is finding the source image from a thumbnail.

In practical applications, the more powerful pHash and SIFT algorithms are often used, which can recognize deformed images. As long as the degree of deformation does not exceed 25%, they can match the source image. Although these algorithms are more complex, they follow the same principle as the simple algorithm above: convert an image into a hash string first, and then compare.
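To give a feel for how a more robust fingerprint can be built on the same principle, here is a minimal sketch of one commonly described DCT-based perceptual hash. It is not the code of the pHash library. It assumes the image has already been reduced to a 32x32 grid of gray values, keeps the 8x8 block of lowest-frequency DCT coefficients, and compares each one with their median. The function names are hypothetical.

```python
import math


def dct_1d(values):
    """Unnormalized DCT-II of a sequence of numbers."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi / n * (i + 0.5) * k)
            for i, v in enumerate(values))
        for k in range(n)
    ]


def dct_2d(grid):
    """2D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in grid]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to row-major


def dct_hash(gray32, keep=8):
    """Fingerprint built from the low-frequency corner of the DCT."""
    coeffs = dct_2d(gray32)
    low = [coeffs[r][c] for r in range(keep) for c in range(keep)]
    ordered = sorted(low)
    middle = len(ordered) // 2
    median = (ordered[middle - 1] + ordered[middle]) / 2
    fingerprint = 0
    for value in low:
        fingerprint = (fingerprint << 1) | (1 if value > median else 0)
    return fingerprint


# A synthetic 32x32 gradient stands in for a decoded grayscale image.
gray32 = [[(x * 5 + y * 3) % 128 for x in range(32)] for y in range(32)]
print(format(dct_hash(gray32), "016x"))
```

Because the low-frequency coefficients describe the coarse structure of the image, fingerprints of this kind are compared with the Hamming distance in the same way as the simple fingerprint above.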
