Search for similar images
Similar image search is about calculating how similar two images are. The key technique is the "perceptual hash algorithm": it generates a "fingerprint" string for each image, and the fingerprints of different images are then compared. The closer the fingerprints, the more similar the images.
I. Perceptual Hash Algorithm
1. Reduce the size
Reduce the image size to 8x8, with a total of 64 pixels. The purpose of this step is to remove the image details, retain only the basic information such as structure and brightness, and discard the image differences caused by different sizes and proportions.
2. Simplified colors
Convert the reduced image to 64-level grayscale; that is, all pixels together use only 64 shades of gray.
3. Calculate the average value
Calculate the average gray scale of all 64 pixels.
4. Compare the gray scale of pixels
Compare the gray scale of each pixel with the average value. If the value is greater than or equal to the average value, it is recorded as 1. If the value is smaller than the average value, it is recorded as 0.
5. Calculate the hash value
Combine the comparison results from the previous step into a 64-bit integer; this is the fingerprint of the image. The order of combination does not matter, as long as the same order is used for every image.
After obtaining the fingerprints, you can compare two images by counting how many of the 64 bits differ. In theory, this is equivalent to calculating the Hamming distance. If no more than 5 bits differ, the two images are very similar; if more than 10 bits differ, they are probably different images.
6. Python implementation
#!/usr/bin/python
import glob
import os
import sys
from functools import reduce
from PIL import Image

EXTS = 'jpg', 'jpeg', 'JPG', 'JPEG', 'gif', 'GIF', 'png', 'PNG'

def avhash(im):
    """Return the 64-bit average-hash fingerprint of an image (path or Image)."""
    if not isinstance(im, Image.Image):
        im = Image.open(im)
    # Shrink to 8x8 and convert to grayscale (LANCZOS is the current name of
    # the old ANTIALIAS filter).
    im = im.resize((8, 8), Image.LANCZOS).convert('L')
    pixels = list(im.getdata())
    avg = sum(pixels) / 64.0
    # Each pixel contributes one bit: 1 if >= average, 0 otherwise.
    bits = (0 if p < avg else 1 for p in pixels)
    return reduce(lambda h, ib: h | (ib[1] << ib[0]), enumerate(bits), 0)

def hamming(h1, h2):
    """Count the bits that differ between two hashes (Hamming distance)."""
    h, d = 0, h1 ^ h2
    while d:
        h += 1
        d &= d - 1
    return h

if __name__ == '__main__':
    if len(sys.argv) <= 1 or len(sys.argv) > 3:
        print("Usage: %s image.jpg [dir]" % sys.argv[0])
    else:
        im, wd = sys.argv[1], '.' if len(sys.argv) < 3 else sys.argv[2]
        h = avhash(im)

        os.chdir(wd)
        images = []
        for ext in EXTS:
            images.extend(glob.glob('*.%s' % ext))

        seq = []
        prog = int(len(images) > 50 and sys.stdout.isatty())
        for f in images:
            seq.append((f, hamming(avhash(f), h)))
            if prog:
                perc = 100.0 * prog / len(images)
                x = int(2 * perc / 5)
                sys.stdout.write('\rCalculating... [%s%s] %.2f%% (%d/%d)' %
                                 ('#' * x, ' ' * (40 - x), perc, prog, len(images)))
                sys.stdout.flush()
                prog += 1
        if prog:
            print()
        # Print every image with its Hamming distance, most similar first.
        for f, ham in sorted(seq, key=lambda i: i[1]):
            print("%d\t%s" % (ham, f))
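For example, running the script as python avhash.py target.jpg photos/ (both names are placeholders) computes the fingerprint of target.jpg and then prints every image in the photos/ directory with its Hamming distance from that fingerprint, most similar first.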
II. Color Distribution Method
Each image can generate a color histogram. If the histograms of two images are very close, the two images can be considered very similar.
Any color is composed of the three primary colors red, green, and blue (RGB), so an image yields four histograms in total: one for each primary color, plus the combined histogram.
If each primary color can take 256 values, the whole color space contains about 16 million colors (256 to the third power). Comparing histograms over 16 million colors would be far too much work, so a simplified scheme is needed: divide the range 0–255 into four zones, where 0–63 is zone 0, 64–127 is zone 1, 128–191 is zone 2, and 192–255 is zone 3. Red, green, and blue each fall into one of four zones, giving 64 possible combinations (4 to the third power).
Any color belongs to exactly one of these 64 combinations, so you can count how many pixels fall into each combination.
The last column of an image's color distribution table can then be extracted to form a 64-dimensional vector, for example (7414, 230, 0, 0, 8, ..., 109, 0, 0, 3415, 53929). This vector is the feature value, or "fingerprint", of the image.
Searching for similar images then becomes a matter of finding the vector most similar to this one, which can be measured with the Pearson correlation coefficient or cosine similarity.
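A minimal sketch of this method in Python, using Pillow (the names color_fingerprint and cosine_similarity are my own, not from the original text). Each RGB channel is mapped to one of four zones with integer division by 64, the three zone numbers are packed into a single bin index from 0 to 63, and the 64-bin counts of two images are compared with cosine similarity.

from math import sqrt
from PIL import Image

def color_fingerprint(path):
    """Return the 64-dimensional color-distribution vector of an image."""
    im = Image.open(path).convert('RGB')
    bins = [0] * 64
    for r, g, b in im.getdata():
        # Each channel falls into one of 4 zones: 0-63, 64-127, 128-191, 192-255.
        bins[(r // 64) * 16 + (g // 64) * 4 + (b // 64)] += 1
    return bins

def cosine_similarity(v1, v2):
    """Cosine of the angle between two vectors; closer to 1 means more similar."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = sqrt(sum(a * a for a in v1)) * sqrt(sum(b * b for b in v2))
    return dot / norm if norm else 0.0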
III. Content Feature Method
In addition to the color composition, you can also compare the content of the images directly.
First, convert the source image into a small grayscale image, which is assumed to be 50x50 pixels. Then, determine a threshold value and convert the grayscale image into a black-and-white image.
If two images are similar, their black-and-white outlines should be close. The question then becomes: how do you determine a reasonable threshold that correctly presents the outlines in the photo?
Obviously, the greater the contrast between the foreground and the background, the clearer the outline. This means that if we can find a threshold that minimizes the "intra-class variance" of the foreground and background colors, or equivalently maximizes the "inter-class variance", that value is the ideal threshold.
In 1979, the Japanese scholar Nobuyuki Otsu proved that "minimum intra-class variance" and "maximum inter-class variance" amount to the same thing, i.e. they correspond to the same threshold, and he proposed a simple algorithm for finding it, now known as Otsu's method. His calculation works as follows.
Assume an image has n pixels in total, of which n1 have a gray value below the threshold and n2 have a gray value greater than or equal to the threshold (n1 + n2 = n). Let w1 and w2 denote the proportions of these two kinds of pixels:
w1 = n1 / n
w2 = n2 / n
Further assume that the mean and standard deviation of all pixels with gray value below the threshold are μ1 and σ1, and that the mean and standard deviation of all pixels with gray value greater than or equal to the threshold are μ2 and σ2. Then:
intra-class variance = w1 × σ1² + w2 × σ2²
inter-class variance = w1 × w2 × (μ1 − μ2)²
It can be proved that the two are equivalent: minimizing the intra-class variance is the same as maximizing the inter-class variance. In terms of computation, however, the latter is easier.
The last step is a brute-force search: try every possible threshold from the lowest gray value to the highest, plug each one into the formulas above, and take the value that minimizes the intra-class variance (or maximizes the inter-class variance) as the final threshold. For specific examples and a Java implementation, see here.
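A minimal sketch of this exhaustive search in Python follows (the function name otsu_threshold and the use of a Pillow grayscale thumbnail are my own choices for illustration). It builds a histogram of gray levels, tries every threshold from 0 to 255, and keeps the one that maximizes the inter-class variance w1 × w2 × (μ1 − μ2)².

from PIL import Image

def otsu_threshold(gray_im):
    """Return the threshold in 0-255 that maximizes the inter-class variance."""
    pixels = list(gray_im.getdata())
    n = len(pixels)
    hist = [0] * 256                      # histogram of gray levels 0..255
    for p in pixels:
        hist[p] += 1

    best_t, best_var = 0, -1.0
    for t in range(1, 256):               # candidate thresholds
        n1 = sum(hist[:t])                # pixels with gray value < t
        n2 = n - n1                       # pixels with gray value >= t
        if n1 == 0 or n2 == 0:
            continue
        w1, w2 = n1 / n, n2 / n
        mu1 = sum(i * hist[i] for i in range(t)) / n1
        mu2 = sum(i * hist[i] for i in range(t, 256)) / n2
        var_between = w1 * w2 * (mu1 - mu2) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

if __name__ == '__main__':
    # 'photo.jpg' is a placeholder file name.
    im = Image.open('photo.jpg').resize((50, 50)).convert('L')
    t = otsu_threshold(im)
    bw = im.point(lambda p: 255 if p >= t else 0)   # black-and-white outline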
A 50x50 pixel black-and-white thumbnail is equivalent to a 50x50 0-1 matrix. Each value of the matrix corresponds to a pixel of the source image. 0 indicates black and 1 indicates white. This matrix is the feature matrix of an image.
The fewer places where the two feature matrices differ, the more similar the two images. This can be computed with the XOR operation (which yields 1 when exactly one of the two bits is 1, and 0 otherwise): XOR the feature matrices of the two images, and the fewer 1s in the result, the more similar the images.
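A minimal sketch of this comparison (the helper names feature_matrix and matrix_distance are hypothetical; the 50x50 matrix is stored as a flat list of 2500 zeros and ones, and the threshold is assumed to come from Otsu's method as described above):

from PIL import Image

def feature_matrix(path, threshold, size=50):
    """Shrink the image to size x size, convert to grayscale, and binarize:
    1 for pixels at or above the threshold (white), 0 below it (black)."""
    im = Image.open(path).resize((size, size)).convert('L')
    return [1 if p >= threshold else 0 for p in im.getdata()]

def matrix_distance(m1, m2):
    """XOR the two 0/1 matrices element-wise and count the 1s;
    the smaller the count, the more similar the two images."""
    return sum(a ^ b for a, b in zip(m1, m2))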
Reference:
http://www.ruanyifeng.com/blog/2011/07/principle_of_similar_image_search.html
http://www.ruanyifeng.com/blog/2013/03/similar_image_search_part_ii.html