The perceptual hash algorithm generates a "fingerprint" string for each image and then compares the fingerprints of different images. The closer the fingerprints are, the more similar the images are.
Step 1: Reduce the size.
Reduce the image size to 8x8, with a total of 64 pixels. The purpose of this step is to remove the image details, retain only the basic information such as structure and brightness, and discard the image differences caused by different sizes and proportions.
Step 2: Simplify the color.
Convert the reduced image to 64-level grayscale, so that every pixel takes one of only 64 gray values.
Step 3: Calculate the average value.
Calculate the average gray scale of all 64 pixels.
Step 4: Compare the gray level of each pixel.
Compare the gray level of each pixel with the average value. If it is greater than or equal to the average, record a 1. If it is smaller than the average, record a 0.
Step 5: Calculate the hash value.
The comparison results from the previous step are combined into a 64-bit integer, which is the fingerprint of the image. The order in which the bits are combined does not matter, as long as the same order is used for every image.
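The five steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the code of any particular library: the function name average_hash is hypothetical, the input is assumed to be already decoded into a 2D list of gray values in the range 0 to 255, and the reduction to 8x8 is done by simple block averaging (a real image library would use its own resampling filter).

```python
def average_hash(pixels, size=8, levels=64):
    """Return a 64-bit fingerprint for a grayscale image.

    pixels: 2D list of gray values in 0..255, all rows the same length.
    (Assumption: the image has already been decoded and converted to gray.)
    """
    height, width = len(pixels), len(pixels[0])
    small = []
    for row in range(size):
        # Step 1: each output pixel is the mean of one block of source pixels.
        y0 = row * height // size
        y1 = max((row + 1) * height // size, y0 + 1)
        for col in range(size):
            x0 = col * width // size
            x1 = max((col + 1) * width // size, x0 + 1)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            mean = sum(block) / len(block)
            # Step 2: quantize 0..255 down to `levels` gray levels (0..63).
            small.append(int(mean) * levels // 256)

    # Step 3: average gray level of the 64 reduced pixels.
    average = sum(small) / len(small)

    # Steps 4 and 5: one bit per pixel, packed row by row into an integer.
    fingerprint = 0
    for value in small:
        fingerprint = (fingerprint << 1) | (1 if value >= average else 0)
    return fingerprint


# An image whose left half is black and right half is white.
half_and_half = [[0] * 8 + [255] * 8 for _ in range(16)]
print(format(average_hash(half_and_half), "016x"))
```

Because the image is reduced to 8x8 before anything else happens, the same picture at twice the resolution should produce the same fingerprint in this sketch.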
After obtaining the fingerprints, you can compare two images by counting how many of the 64 bits differ. This is equivalent to calculating the Hamming distance between the two fingerprints. If no more than 5 bits differ, the two images are very similar. If more than 10 bits differ, the two images are different.
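A minimal sketch of this comparison, assuming the fingerprints are stored as Python integers. The function names are hypothetical, and the "inconclusive" label for distances from 6 to 10 is an assumption of this sketch, since the thresholds above do not say how to treat that range.

```python
def hamming_distance(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def compare_fingerprints(a, b):
    """Apply the thresholds from the text: <= 5 similar, > 10 different."""
    distance = hamming_distance(a, b)
    if distance <= 5:
        return "very similar"
    if distance > 10:
        return "different"
    # Assumption: the text gives no verdict for distances 6..10.
    return "inconclusive"


print(hamming_distance(0b1011, 0b0001))   # the two values differ in 2 bits
print(compare_fingerprints(0x0F0F, 0x0F0E))
```

XOR leaves a 1 in every position where the two fingerprints disagree, so counting the 1 bits gives the Hamming distance directly.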
Advantages: simple and fast, and unaffected by scaling of the image.
Disadvantage: the image content must not be altered, or the match fails.
Purpose: locating the source image from a thumbnail.
The pHash and SIFT algorithms can match images as long as the deformation does not exceed 25%.
Source code and library: http://www.phash.org/download/