The principle of similar image search

Last Update:2016-04-19 Source: Internet

Author: User


http://www.ruanyifeng.com/blog/2011/07/principle_of_similar_image_search.html

http://www.ruanyifeng.com/blog/2013/03/similar_image_search_part_ii.html

The principle of similar image search

Ruan Yifeng

Date: July 21, 2011

Last month, Google officially added "similar image search" to its homepage.

You can use a picture to search the internet for images that resemble it. Click the camera icon in the search box.

A dialog box will appear.

If you enter the image's web address or upload the image directly, Google will find similar images. The example used here is a picture of the American actress Alyson Hannigan.

After uploading, Google returns the following results:

There are many other similar image search engines; TinEye can even identify the background of a photograph.

==========================================================

What is the principle of this technology? How does a computer know that two pictures are similar?

According to Dr. Neal Krawetz's explanation, the principle is very simple and understandable. We can use a fast algorithm to achieve the basic effect.

The key technique here is called the "perceptual hash algorithm". It generates a "fingerprint" string for each image and then compares the fingerprints of different images. The closer the fingerprints are, the more similar the pictures are.

Here is one of the simplest implementations:

The first step is to reduce the size.

Shrink the image to 8x8, 64 pixels in total. This step removes the picture's details and keeps only basic information such as structure and light and shade, discarding differences caused by different sizes and proportions.

The second step is to simplify the color.

Convert the shrunken image to 64 levels of gray. That is, all the pixels together use only 64 colors.

The third step is to calculate the average.

Calculate the average gray level of all 64 pixels.

The fourth step is to compare each pixel's gray level.

Compare the gray level of each pixel with the average. If it is greater than or equal to the average, record 1; if it is less than the average, record 0.

The fifth step is to calculate the hash value.

Combining the results of the previous step gives a 64-bit integer, which is the fingerprint of the image. The order in which the bits are combined does not matter, as long as every picture uses the same order.

Example fingerprint: 8f373714acfcf4d0
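The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the imghash.py script. To avoid depending on an imaging library, it assumes the image is already a grayscale list of rows with values from 0 to 255. The function name `average_hash`, its parameters, the block-averaging downscale, and the row-by-row bit order are all choices made for this sketch.

```python
def average_hash(pixels, hash_size=8, levels=64):
    """Return an integer fingerprint for a grayscale image.

    `pixels` is a list of rows of 0-255 gray values (a hypothetical input
    format chosen so that the sketch needs no imaging library).
    """
    height, width = len(pixels), len(pixels[0])

    # Step 1: shrink to hash_size x hash_size by averaging each block.
    small = []
    for by in range(hash_size):
        y0 = by * height // hash_size
        y1 = max((by + 1) * height // hash_size, y0 + 1)
        for bx in range(hash_size):
            x0 = bx * width // hash_size
            x1 = max((bx + 1) * width // hash_size, x0 + 1)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            small.append(sum(block) / len(block))

    # Step 2: quantize to 64 gray levels (0-63).
    small = [int(v * levels / 256) for v in small]

    # Step 3: average of the reduced pixels.
    mean = sum(small) / len(small)

    # Steps 4 and 5: one bit per pixel, packed row by row into an integer.
    fingerprint = 0
    for v in small:
        fingerprint = (fingerprint << 1) | (1 if v >= mean else 0)
    return fingerprint
```

With the default 8x8 size, `format(average_hash(pixels), "016x")` renders the result as 16 hexadecimal digits, the same shape as the example fingerprint.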

After obtaining the fingerprints, you can compare different pictures by counting how many of the 64 bits differ. In theory, this is equivalent to computing the "Hamming distance". If no more than 5 bits differ, the two images are very similar; if more than 10 bits differ, they are two different pictures.
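The comparison can be sketched as follows, assuming the fingerprints are stored as Python integers. The function names and the "uncertain" label for the range between the two thresholds are illustrative additions; the text only states the cut-offs of 5 and 10.

```python
def hamming_distance(a, b):
    """Count the bit positions in which two integer fingerprints differ."""
    return bin(a ^ b).count("1")


def classify(a, b, similar_max=5, different_min=10):
    """Apply the thresholds from the text to two fingerprints."""
    distance = hamming_distance(a, b)
    if distance <= similar_max:
        return "similar"
    if distance > different_min:
        return "different"
    return "uncertain"  # the text gives no verdict for 6 to 10 differing bits
```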

A concrete implementation can be found in imghash.py, written in Python by Wote. The code is short, only 53 lines. When running it, the first parameter is the base picture and the second parameter is the directory containing the other images to compare; the output is the number of differing bits (the Hamming distance) between the two pictures.

The advantages of this algorithm are that it is simple and fast and is not affected by the size of the picture; the disadvantage is that the content of the picture cannot change. If you add a few words to the picture, it will no longer be recognized. So its best use is finding the original image from a thumbnail.

In practical applications, the more powerful pHash and SIFT algorithms are often used, because they can recognize deformed images. As long as the degree of deformation does not exceed 25%, they can match the original image. Although these algorithms are more complex, the principle is the same as the simple algorithm above: first convert the image into a hash string, and then compare.

UPDATE (2013.03.31)

This article has a sequel, please see here.

(End)

http://www.ruanyifeng.com/blog/2013/03/similar_image_search_part_ii.html

Ruan Yifeng

Date: March 31, 2013

Two years ago, I wrote "The principle of similar image search" and introduced the simplest implementation.

Yesterday, on isnowfy's website, I saw two other methods that are also very simple, so here are some notes.

First, the color distribution method

A histogram of the color distribution can be generated for every picture. If the histograms of two pictures are close enough, the pictures can be considered similar.

Every color is made up of the three primary colors red, green, and blue (RGB), so there are 4 histograms in total (one for each primary color, plus the final composite histogram).

If each primary color can take 256 values, the whole color space has about 16 million colors (256 cubed). Comparing histograms over 16 million colors is too expensive, so a simplified approach is needed. The range 0~255 can be divided into four zones: 0~63 is zone 0, 64~127 is zone 1, 128~191 is zone 2, and 192~255 is zone 3. Red, green, and blue then each have 4 zones, giving 64 combinations in total (4 cubed).

Any one color is bound to belong to one of these 64 combinations, so you can count the number of pixels that each combination contains.
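The zone counting can be sketched like this, assuming the picture is given as a sequence of (r, g, b) tuples with values from 0 to 255. The function name `color_signature` and the ordering of the 64 combinations (red zone most significant, blue zone least) are assumptions of this sketch; the text does not say how the combinations are ordered.

```python
def color_signature(pixels, zones=4):
    """Count pixels in each of the zones**3 color combinations.

    `pixels` is an iterable of (r, g, b) tuples with 0-255 components.
    """
    zone_width = 256 // zones          # 64 when there are 4 zones
    counts = [0] * zones ** 3          # 64 combinations for 4 zones
    for r, g, b in pixels:
        rz, gz, bz = r // zone_width, g // zone_width, b // zone_width
        # Assumed ordering: red zone, then green zone, then blue zone.
        counts[(rz * zones + gz) * zones + bz] += 1
    return counts
```

The returned list plays the role of the 64-dimensional vector described in the text.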

Extracting the last column of a picture's color distribution table gives a 64-dimensional vector (7414, 230, 0, 0, 8, ..., 109, 0, 0, 3415, 53929). This vector is the picture's feature value, or "fingerprint".

Looking for similar images then becomes a matter of finding the vectors most similar to this one. The similarity can be calculated with the Pearson correlation coefficient or with cosine similarity.
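Both measures are short to write down. The sketch below uses plain Python lists for the vectors; the function names are illustrative, and returning 0.0 for an all-zero vector is a convention chosen here rather than something the text specifies.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for this sketch: zero vectors match nothing
    return dot / (norm_u * norm_v)


def pearson_correlation(u, v):
    """Pearson coefficient: cosine similarity of the mean-centered vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine_similarity([a - mean_u for a in u], [b - mean_v for b in v])
```

For both measures, values near 1 indicate that the two color distributions are close.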

Second, the content characteristic method

In addition to the color composition, you can also start by comparing the similarity of the picture content.

First, convert the original into a smaller grayscale image, say 50x50 pixels. Then choose a threshold and use it to turn the grayscale image into a black and white one.

If two images are similar, their black and white outlines should be similar too. So the question becomes: how do we choose a reasonable threshold in the first step so that the outlines in the photo are rendered correctly?

Obviously, the greater the contrast between foreground and background, the clearer the outline. This means that if we can find a value that minimizes the intra-class variance of the foreground and the background, or maximizes the inter-class variance between them, then that value is the ideal threshold.

In 1979, the Japanese scholar Nobuyuki Otsu proved that "minimizing the intra-class variance" and "maximizing the inter-class variance" are the same thing, that is, they correspond to the same threshold. He proposed a simple algorithm for finding this threshold, known as "Otsu's method". Here is how it is calculated.

Suppose a picture has n pixels in total, of which n1 pixels have a gray value below the threshold and n2 pixels have a gray value greater than or equal to the threshold (n1 + n2 = n). w1 and w2 denote the proportions of these two groups of pixels.

w1 = n1 / n

w2 = n2 / n

Suppose further that the pixels with gray values below the threshold have mean μ1 and variance σ1², and the pixels with gray values greater than or equal to the threshold have mean μ2 and variance σ2². Then:

intra-class variance = w1 σ1² + w2 σ2²

inter-class variance = w1 w2 (μ1 - μ2)²

It can be shown that the two criteria are equivalent: the threshold that minimizes the intra-class variance is the one that maximizes the inter-class variance. In terms of computation, however, the latter is easier to calculate.
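One way to see the equivalence is to decompose the variance of all pixels. In the sketch below, μ and σ² denote the mean and variance of the whole image (symbols introduced here, not used in the text), and the weights, means, and variances of the two classes are written with subscripts 1 and 2.

```latex
\begin{aligned}
\mu &= w_1\mu_1 + w_2\mu_2, \qquad w_1 + w_2 = 1 \\
\sigma^2 &= w_1\left[\sigma_1^2 + (\mu_1-\mu)^2\right]
          + w_2\left[\sigma_2^2 + (\mu_2-\mu)^2\right] \\
         &= \underbrace{w_1\sigma_1^2 + w_2\sigma_2^2}_{\text{intra-class}}
          + \underbrace{w_1 w_2(\mu_1-\mu_2)^2}_{\text{inter-class}}
\end{aligned}
```

The last line uses μ1 - μ = w2(μ1 - μ2) and μ2 - μ = -w1(μ1 - μ2). Because σ² does not depend on the threshold, the two terms always sum to the same constant, so making one as small as possible makes the other as large as possible.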

The next step is brute force: try every threshold from the lowest gray level to the highest, substituting each one into the formulas above. The value that gives the smallest intra-class variance or the largest inter-class variance is the final threshold. For a worked example and a Java implementation, see here.
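The exhaustive search can be sketched in Python as follows. It maximizes the inter-class variance w1·w2·(μ1 - μ2)² and treats pixels below the threshold as one class and pixels at or above it as the other, as in the text. The function names, the flat list of integer gray values as input, and the choice to keep the first threshold when several tie are assumptions of this sketch, which is not the Java code the text refers to.

```python
def otsu_threshold(gray_values, levels=256):
    """Return the threshold that maximizes the inter-class variance.

    `gray_values` is a flat list of integer gray levels in [0, levels).
    """
    histogram = [0] * levels
    for v in gray_values:
        histogram[v] += 1
    total = len(gray_values)
    total_sum = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    n1 = 0    # number of pixels below the candidate threshold
    sum1 = 0  # sum of their gray values
    for t in range(1, levels):
        n1 += histogram[t - 1]
        sum1 += (t - 1) * histogram[t - 1]
        n2 = total - n1
        if n1 == 0 or n2 == 0:
            continue  # one class is empty, so the formula is undefined
        w1, w2 = n1 / total, n2 / total
        mu1, mu2 = sum1 / n1, (total_sum - sum1) / n2
        variance = w1 * w2 * (mu1 - mu2) ** 2
        if variance > best_variance:
            best_threshold, best_variance = t, variance
    return best_threshold


def binarize(gray_rows, threshold):
    """Turn rows of gray values into a 0-1 matrix (1 = at or above threshold)."""
    return [[1 if v >= threshold else 0 for v in row] for row in gray_rows]
```

Keeping running totals for the lower class means each candidate threshold costs constant time after the histogram is built.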

A 50x50 pixel black and white thumbnail is equivalent to a 50x50 matrix of 0s and 1s. Each value of the matrix corresponds to one pixel of the thumbnail: 0 for black and 1 for white. This matrix is the feature matrix of the picture.

The fewer the differences between two feature matrices, the more similar the two images are. This can be computed with XOR (the result is 1 if exactly one of the two values is 1, and 0 otherwise). XOR the feature matrices of different images: the fewer 1s in the result, the more similar the pictures.
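Counting the 1s in the XOR result takes one line. The sketch below assumes both feature matrices are lists of rows of 0s and 1s with the same dimensions; the function name is illustrative.

```python
def matrix_difference(a, b):
    """Count the positions where two same-sized 0-1 matrices differ."""
    return sum(
        x ^ y                      # 1 only when exactly one value is 1
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )
```

Dividing the count by the number of cells (2500 for a 50x50 matrix) gives a difference ratio between 0 and 1 that can be compared across image pairs.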

(End)
