The principle of similar image search

Last Update:2014-12-24 Source: Internet

Author: User


Last month, Google officially put "similar image search" on the homepage.

You can use a picture to search the internet for pictures similar to it. Click the camera icon in the search box.

A dialog box appears.

Enter the URL of an image, or upload the image directly, and Google will find images similar to it. Here is a picture of the American actress Alyson Hannigan.

After uploading, Google returns the following results:

There are many other "similar image search engines" of this kind. TinEye can even find out the background of a photograph.

==========================================================

What is the principle of this technique? How does a computer know that two pictures are similar?

According to Dr. Neal Krawetz, the principle is very straightforward. A simple, fast algorithm is enough to achieve basic results.

The key technology here is called the "perceptual hash algorithm". Its role is to generate a "fingerprint" string for each picture and then compare the fingerprints of different pictures. The closer the fingerprints are, the more similar the pictures are.

The following is one of the simplest implementations:

The first step is to reduce the size.

Reduce the picture to 8x8, which is 64 pixels in total. This step removes the details of the picture and keeps only basic information such as structure and light and dark, discarding the differences caused by different sizes and proportions.

The second step is to simplify the color.

Convert the reduced image to 64-level grayscale. That is, all pixels together take only 64 possible values.

The third step is to calculate the average.

Calculate the average grayscale of all 64 pixels.

The fourth step is to compare the pixel grayscale.

Compare the grayscale of each pixel with the average. If it is greater than or equal to the average, record 1; if it is less than the average, record 0.

The fifth step is to compute the hash value.

Combining the comparison results of the previous step gives a 64-bit integer, which is the fingerprint of the picture. The order in which the bits are combined does not matter, as long as every picture uses the same order.

Fingerprint of the example picture (hexadecimal): 8f373714acfcf4d0
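The five steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code the article refers to: the picture is assumed to be already decoded into a 2-D list of gray values in 0..255 and to be at least 8x8, the shrinking is done by simple block averaging, and the names `shrink` and `average_hash` are hypothetical.

```python
def shrink(gray, size=8):
    """Step 1: shrink a 2-D list of gray values to size x size by block averaging.

    Assumes the input is at least size x size (an assumption of this sketch).
    """
    height, width = len(gray), len(gray[0])
    small = []
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        row = []
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [gray[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(gray, size=8):
    """Return the fingerprint of a grayscale picture as an integer of size*size bits."""
    small = shrink(gray, size)
    # Step 2: reduce 256 gray levels to 64 levels (0..63).
    levels = [int(v) // 4 for row in small for v in row]
    # Step 3: average of all pixels.
    average = sum(levels) / len(levels)
    # Steps 4 and 5: one bit per pixel, packed row by row into an integer.
    fingerprint = 0
    for v in levels:
        fingerprint = (fingerprint << 1) | (1 if v >= average else 0)
    return fingerprint


# A 16x16 test picture: left half black, right half white.
picture = [[0] * 8 + [255] * 8 for _ in range(16)]
print(format(average_hash(picture), "016x"))  # prints 0f0f0f0f0f0f0f0f
```

Because the picture is shrunk first, a 32x32 version of the same black and white pattern produces the same fingerprint.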

Once you have the fingerprints, you can compare different pictures by counting how many of the 64 bits differ. In theory, this is equivalent to computing the Hamming distance. If no more than 5 bits differ, the two pictures are similar; if more than 10 bits differ, they are two different pictures.
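A minimal sketch of this comparison, assuming fingerprints are stored as Python integers. The function names are hypothetical, and the label "uncertain" for distances between 6 and 10 is an assumption of this sketch, since the text does not say how to treat that range.

```python
def hamming_distance(a, b):
    """Count the bits that differ between two integer fingerprints."""
    return bin(a ^ b).count("1")


def judge(distance):
    """Apply the thresholds from the text; the middle label is an assumption."""
    if distance <= 5:
        return "similar"
    if distance > 10:
        return "different"
    return "uncertain"


d = hamming_distance(0b1011, 0b0010)
print(d, judge(d))  # prints: 2 similar
```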

For a concrete implementation, see imghash.py, written in Python by Wote. The code is very short, only 53 lines. When you run it, the first parameter is the reference picture and the second parameter is the directory of the other pictures to compare against. The result returned is the number of differing bits (the Hamming distance) between the pictures.

The advantages of this algorithm are that it is simple and fast and is not affected by resizing the picture. The disadvantage is that the content of the picture must not change: if you add a few words to the picture, the algorithm no longer recognizes it. So its best use is finding the original image from a thumbnail.

In practical applications, the more powerful pHash and SIFT algorithms are often used, because they can recognize deformed versions of an image. As long as the deformation does not exceed 25%, they can match the original. These algorithms are more complex, but the principle is the same as in the simple algorithm above: convert the picture into a hash string first, and then compare.

Two years ago, I wrote "The principle of similar image search", which introduced a simple implementation.

Yesterday, on the website of Isnowfy, I saw two other methods that are also very simple. Here are some notes on them.

I. Color distribution method

For every picture you can build a histogram of its color distribution (a color histogram). If the histograms of two pictures are very close, the pictures can be considered very similar.

Any color is composed of the three primary colors red, green and blue (RGB), so there are 4 histograms (one for each of the three primary colors, plus the final combined histogram).

If each primary color can take 256 values, the whole color space has about 16 million colors (256 to the third power). Comparing histograms over 16 million colors takes too much computation, so a simplified approach is needed. The range 0~255 can be divided into four zones: 0~63 is zone 0, 64~127 is zone 1, 128~191 is zone 2, and 192~255 is zone 3. This means that red, green and blue each have 4 zones, giving 64 combinations in total (4 to the third power).

Any color must belong to one of these 64 combinations, so you can count the number of pixels that fall into each combination.

Above is the color distribution table of a picture. Extracting the last column of the table gives a 64-dimensional vector (7414, 230, 0, 0, 8, ..., 109, 0, 0, 3415, 53929). This vector is the feature value of the picture, or its "fingerprint".
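The counting can be sketched as follows, assuming the picture is available as a list of (r, g, b) tuples with each channel in 0..255. The function name and the ordering of the 64 combinations (red zone first, then green, then blue) are assumptions of this sketch; any fixed ordering works as long as every picture uses the same one.

```python
def color_fingerprint(pixels):
    """Count pixels in each of the 64 (red zone, green zone, blue zone) combinations."""
    counts = [0] * 64
    for r, g, b in pixels:
        # Integer division by 64 maps 0..255 onto zones 0..3.
        index = (r // 64) * 16 + (g // 64) * 4 + (b // 64)
        counts[index] += 1
    return counts


pixels = [(0, 0, 0), (255, 255, 255), (255, 255, 255), (64, 0, 0)]
vector = color_fingerprint(pixels)
print(vector[0], vector[16], vector[63])  # prints: 1 1 2
```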

So finding similar pictures becomes finding the vectors that are most similar to each other. This can be calculated with the Pearson correlation coefficient or with cosine similarity.
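Both measures are short to write out. The sketch below assumes the two vectors have the same length; the function names are hypothetical, and returning 0.0 for an all-zero vector is a convention chosen here, not something the text specifies. The Pearson coefficient is computed as the cosine similarity of the mean-centered vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention of this sketch for all-zero vectors
    return dot / (norm_u * norm_v)


def pearson(u, v):
    """Pearson correlation: cosine similarity after subtracting each mean."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine_similarity([a - mean_u for a in u], [b - mean_v for b in v])
```

Values close to 1 mean the two color fingerprints point in nearly the same direction, so the pictures have similar color distributions.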

II. Content feature method

Besides color composition, you can also start from the similarity of the image content.

First, turn the original image into a smaller grayscale picture, assuming 50x50 pixels. Then, determine a threshold value and turn the grayscale picture into a black-and-white picture.

If two pictures are similar, their black-and-white contours should be similar. So the question becomes: how do we determine a reasonable threshold that correctly renders the contours in the picture?

Obviously, the greater the contrast between the foreground and the background, the clearer the contour. This means that if we find a value that gives the foreground and the background each the "smallest difference within class" (minimizing the intra-class variance), or the "largest difference between classes" (maximizing the inter-class variance), then that value is the ideal threshold.

In 1979, the Japanese scholar Nobuyuki Otsu proved that the "smallest difference within class" and the "largest difference between classes" are the same thing, that is, they correspond to the same threshold. He proposed a simple algorithm for finding this threshold, known as "Otsu's method". Here is how it is calculated.

Suppose a picture has n pixels in total, of which n1 have a grayscale value less than the threshold and n2 have a grayscale value greater than or equal to the threshold (n1 + n2 = n). w1 and w2 denote the proportions of these two groups of pixels.

w1 = n1 / n

w2 = n2 / n

Next, suppose the mean and variance of all pixels with a grayscale value less than the threshold are μ1 and σ1², and the mean and variance of all pixels with a grayscale value greater than or equal to the threshold are μ2 and σ2². Then we get

difference within class = w1 σ1² + w2 σ2²

difference between classes = w1 w2 (μ1 - μ2)²

It can be proved that the two criteria are equivalent: the threshold that minimizes the "difference within class" is the same one that maximizes the "difference between classes", because the two quantities add up to the total variance of the picture, which does not depend on the threshold. In terms of computational effort, the latter is easier to calculate.
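The equivalence can be checked numerically on a small example. The sketch below is only an illustration: the function name is hypothetical, it uses population variances (dividing by the number of pixels), and it assumes the threshold leaves at least one pixel in each group.

```python
def class_differences(values, threshold):
    """Return (within-class, between-class, total) variance for one threshold."""
    low = [v for v in values if v < threshold]
    high = [v for v in values if v >= threshold]

    def mean(xs):
        return sum(xs) / len(xs)

    def variance(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    w1 = len(low) / len(values)
    w2 = len(high) / len(values)
    within = w1 * variance(low) + w2 * variance(high)
    between = w1 * w2 * (mean(low) - mean(high)) ** 2
    return within, between, variance(values)


within, between, total = class_differences([1, 2, 3, 10, 11, 12], threshold=5)
# within + between equals total up to floating-point rounding.
print(abs(within + between - total) < 1e-9)  # prints True
```

Since the total is fixed for a given picture, any threshold that lowers the first quantity raises the second by the same amount.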

The next step is exhaustive search: try every threshold from the lowest gray level to the highest, substituting each one into the formulas above. The value that gives the "smallest difference within class" or the "largest difference between classes" is the final threshold. For specific examples and a Java implementation, please see here.
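A minimal Python sketch of this exhaustive search, maximizing the "difference between classes". It assumes the thumbnail is given as a flat list of integer gray values in 0..255; the function names are hypothetical, and when several thresholds tie, this sketch keeps the lowest one. It is not the Java implementation the article links to.

```python
def otsu_threshold(gray_values, levels=256):
    """Try every threshold and keep the one with the largest between-class difference."""
    histogram = [0] * levels
    for v in gray_values:
        histogram[v] += 1
    n = len(gray_values)
    total_sum = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_between = 0, -1.0
    n1 = 0      # number of pixels with value < threshold
    sum1 = 0    # sum of those pixel values
    for threshold in range(1, levels):
        n1 += histogram[threshold - 1]
        sum1 += (threshold - 1) * histogram[threshold - 1]
        n2 = n - n1
        if n1 == 0 or n2 == 0:
            continue  # one class is empty, the formula is undefined
        w1, w2 = n1 / n, n2 / n
        mu1, mu2 = sum1 / n1, (total_sum - sum1) / n2
        between = w1 * w2 * (mu1 - mu2) ** 2
        if between > best_between:
            best_between, best_threshold = between, threshold
    return best_threshold


def binarize(gray_values, threshold):
    """Turn gray values into 0 (black, below threshold) and 1 (white)."""
    return [1 if v >= threshold else 0 for v in gray_values]
```

For a picture whose pixels are half dark (value 10) and half bright (value 200), any threshold between the two values separates them perfectly, and the search returns one of those.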

The 50x50 black-and-white thumbnail gives a 50x50 matrix of 0s and 1s. Each value of the matrix corresponds to one pixel of the thumbnail: 0 means black and 1 means white. This matrix is the feature matrix of the picture.

The fewer the differences between two feature matrices, the more similar the two pictures are. This can be implemented with XOR (the result is 1 if exactly one of the two values is 1, and 0 otherwise). XOR the feature matrices of different pictures: the fewer 1s in the result, the more similar the pictures.
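A short sketch of the XOR comparison, assuming each feature matrix is a list of rows containing only 0s and 1s and that both matrices have the same shape. The function name is hypothetical.

```python
def matrix_difference(a, b):
    """Count the positions where two 0-1 matrices of the same shape differ."""
    return sum(
        x ^ y
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


first = [[0, 1], [1, 1]]
second = [[0, 0], [1, 0]]
print(matrix_difference(first, second))  # prints 2
```

A result of 0 means the two black-and-white thumbnails are identical; larger counts mean less similar pictures.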

(End)

Document information
Copyright statement: free to reprint, non-commercial, no derivatives, attribution required | Creative Commons BY-NC-ND 3.0
Original website: http://www.ruanyifeng.com/blog/2013/03/similar_image_search_part_ii.html
Last modified: April 1, 2013 16:04

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
