Three hash algorithms for similar image search

Source: Internet
Author: User

Most people have used the search-by-image feature of Google or Baidu; the original post opened with the results of searching with a picture of Edison. Three algorithms compare images by comparing their information fingerprints, and all three are easy to understand. They are described in turn below.


I. Average hash algorithm (ahash)

This algorithm compares each pixel of a grayscale image with the average gray value. It is best suited to finding thumbnails and enlarged copies of an image.

Steps:

1. Resize the image: To keep the structure and discard the details, and to remove differences in size and aspect ratio, scale every image to 8x8, 64 pixels in total.

2. Convert to a grayscale image: Convert the scaled image to a 256-level grayscale image.

Grayscale conversion methods (R = red, G = green, B = blue):

   1. Floating-point method: Gray = R * 0.3 + G * 0.59 + B * 0.11
   2. Integer method: Gray = (R * 30 + G * 59 + B * 11) / 100
   3. Shift method: Gray = (R * 76 + G * 151 + B * 28) >> 8
   4. Average method: Gray = (R + G + B) / 3
   5. Green only: Gray = G

3. Calculate the average value: Calculate the average value of all pixels in the image after grayscale processing.

4. Compare pixel gray values: Traverse every pixel of the grayscale image. If its value is greater than the average, record 1; otherwise, record 0.

5. Obtain the information fingerprint: Combine the 64 bits. The order is arbitrary as long as it is the same for every image.

6. Compare fingerprints: Compute the fingerprints of the two images and then the Hamming distance between them (the number of bit positions at which the two fingerprints differ). The larger the Hamming distance, the more the images differ; the smaller it is, the more similar they are. A distance of 0 means the images are identical. (As a rule of thumb, a distance greater than 10 means two completely different images.)
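The six steps above can be sketched in a few lines of Python. This is a minimal illustration and not the Java program mentioned in this post: the nearest-neighbour scaling, the helper names (`to_gray`, `shrink_to_gray`, `ahash`, `hamming`) and the synthetic `ramp` test image are all assumptions made for the example. The grayscale step uses the integer method from step 2.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each channel 0..255
Image = List[List[Pixel]]      # rows of pixels


def to_gray(pixel: Pixel) -> int:
    """Step 2, integer method: Gray = (R*30 + G*59 + B*11) / 100."""
    r, g, b = pixel
    return (r * 30 + g * 59 + b * 11) // 100


def shrink_to_gray(image: Image, size: int = 8) -> List[int]:
    """Steps 1-2: sample a size x size grid (nearest neighbour), then convert to gray."""
    height, width = len(image), len(image[0])
    return [
        to_gray(image[y * height // size][x * width // size])
        for y in range(size)
        for x in range(size)
    ]


def ahash(image: Image) -> int:
    """Steps 3-5: threshold each gray value against the mean, pack 64 bits row by row."""
    gray = shrink_to_gray(image)
    mean = sum(gray) / len(gray)
    fingerprint = 0
    for value in gray:
        fingerprint = (fingerprint << 1) | (1 if value > mean else 0)
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Step 6: number of bit positions at which two fingerprints differ."""
    return bin(a ^ b).count("1")


def ramp(width: int, height: int, invert: bool = False) -> Image:
    """Hypothetical test image: a horizontal black-to-white gradient."""
    row = [x * 255 // (width - 1) for x in range(width)]
    if invert:
        row = [255 - v for v in row]
    return [[(v, v, v) for v in row] for _ in range(height)]


original = ramp(64, 64)
thumbnail = ramp(16, 16)               # the same picture at a smaller size
negative = ramp(64, 64, invert=True)   # a very different picture

print(hamming(ahash(original), ahash(thumbnail)))  # 0
print(hamming(ahash(original), ahash(negative)))   # 64
```

A real implementation would load and resample the image with an imaging library; the threshold, packing and comparison logic stays the same.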


I wrote a Java implementation of this algorithm that can be run directly in Eclipse.

Download: http://download.csdn.net/detail/nash_/5093143

Source image to be compared:




Four images in the Image Library:



Output result:

Similar_pic.jpg is somewhat similar to the source image.
Google.gif is completely different from the source image.
Origin.jpg is the same image as the source image.
Ohter_word.jpg is very similar to the source Image


II. Perceptual hash algorithm (phash)

The average hash algorithm is too strict and not accurate enough, so it is better suited to searching for thumbnails. For more accurate results, you can use the perceptual hash algorithm, which applies the DCT (discrete cosine transform) and keeps only the low-frequency components.

Steps:

1. Shrink the image: 32*32 is a good size because it keeps the DCT computation simple.

2. Convert to a grayscale image: Convert the scaled image to a 256-level grayscale image. (For specific algorithms, see the average hash algorithm steps.)

3. Compute the DCT: The DCT decomposes the image into a set of frequency components.

4. Reduce the DCT: The DCT result is 32*32; keep only the 8*8 block in the upper-left corner, which holds the lowest frequencies of the image.

5. Calculate the average value: Calculate the average of the 64 DCT values that were kept.

6. Further reduce the DCT: For each retained DCT value, record 1 if it is greater than the average; otherwise, record 0.

7. Obtain the information fingerprint: Combine the 64 bits. The order is arbitrary as long as it is the same for every image.

8. Compare fingerprints: Compute the fingerprints of the two images and then the Hamming distance between them (the number of bit positions at which the two fingerprints differ). The larger the Hamming distance, the more the images differ; the smaller it is, the more similar they are. A distance of 0 means the images are identical. (As a rule of thumb, a distance greater than 10 means two completely different images.)
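Steps 3 to 8 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of the phash project: the input is assumed to be a gray image already shrunk to 32*32, the DCT is a direct orthonormal DCT-II that computes only the 8*8 block that is kept, and the names `dct_low_frequencies`, `phash`, `hamming` and the synthetic `wave` pattern are invented for the example. As in step 5, the average includes every retained value; other implementations differ here (for example by excluding the first, constant term).

```python
import math
from typing import List

Gray = List[List[float]]   # rows of gray values; assumed already shrunk to 32*32


def dct_low_frequencies(gray: Gray, keep: int = 8) -> List[List[float]]:
    """Steps 3-4: 2-D DCT-II, computing only the keep x keep lowest frequencies."""
    n = len(gray)  # the image is assumed to be square (n x n)
    cos_table = [
        [math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
        for u in range(keep)
    ]
    scale = [math.sqrt((1 if u == 0 else 2) / n) for u in range(keep)]
    # The 2-D DCT is separable: transform along each row, then along each column.
    rows = [
        [sum(row[x] * cos_table[u][x] for x in range(n)) for u in range(keep)]
        for row in gray
    ]
    return [
        [
            scale[u] * scale[v] * sum(rows[y][u] * cos_table[v][y] for y in range(n))
            for u in range(keep)
        ]
        for v in range(keep)
    ]


def phash(gray: Gray) -> int:
    """Steps 5-7: threshold the 64 retained values against their mean, pack the bits."""
    values = [c for row in dct_low_frequencies(gray) for c in row]
    mean = sum(values) / len(values)
    fingerprint = 0
    for c in values:
        fingerprint = (fingerprint << 1) | (1 if c > mean else 0)
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Step 8: number of bit positions at which two fingerprints differ."""
    return bin(a ^ b).count("1")


def wave(vertical: bool = False, n: int = 32) -> Gray:
    """Hypothetical test pattern: one smooth brightness wave across the image."""
    profile = [100 + 50 * math.cos((2 * i + 1) * math.pi / (2 * n)) for i in range(n)]
    if vertical:
        return [[profile[y]] * n for y in range(n)]
    return [list(profile) for _ in range(n)]


print(hamming(phash(wave()), phash(wave(vertical=True))))
```

Because the DCT is linear, multiplying every gray value by a constant scales all retained values and their mean by the same factor, so the fingerprint does not change. This is one reason the method tolerates contrast changes.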

For an implementation of this algorithm, see the open source project phash: http://www.phash.org/download/


III. Difference hash algorithm (dhash)

Compared with phash, dhash is much faster. Compared with ahash, dhash gives better results at almost the same cost. It is based on gradients.

Steps:

1. Shrink the image: Scale the image down to 9*8, 72 pixels in total.

2. Convert to a grayscale image: Convert the scaled image to a 256-level grayscale image. (For specific algorithms, see the average hash algorithm steps.)

3. Calculate the difference values: dhash works on adjacent pixels, so the nine pixels in each row yield 8 differences, and the 8 rows yield 64 difference values in total.

4. Obtain the fingerprint: If the left pixel is brighter than the right pixel, record 1; otherwise, record 0.
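The four steps above can be sketched as below. This is a minimal illustration under stated assumptions: the input is taken to be a gray image given as rows of values, scaling uses nearest-neighbour sampling for brevity, and the names `shrink`, `dhash`, `hamming` and the synthetic `slope` image are invented for the example.

```python
from typing import List

Gray = List[List[int]]   # rows of gray values, 0..255


def shrink(gray: Gray, width: int = 9, height: int = 8) -> Gray:
    """Step 1: sample a 9*8 grid (nearest neighbour, chosen here for brevity)."""
    src_h, src_w = len(gray), len(gray[0])
    return [
        [gray[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(gray: Gray) -> int:
    """Steps 3-4: one bit per adjacent pair, 1 if the left pixel is brighter."""
    fingerprint = 0
    for row in shrink(gray):
        for left, right in zip(row, row[1:]):
            fingerprint = (fingerprint << 1) | (1 if left > right else 0)
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of bit positions at which two fingerprints differ."""
    return bin(a ^ b).count("1")


def slope(rising: bool, offset: int = 0, width: int = 90, height: int = 80) -> Gray:
    """Hypothetical test image: brightness rises or falls from left to right."""
    row = [offset + 2 * x for x in range(width)]
    if not rising:
        row.reverse()
    return [list(row) for _ in range(height)]


falling = slope(rising=False)
brighter = slope(rising=False, offset=20)   # same picture, uniformly brighter
rising = slope(rising=True)

print(hamming(dhash(falling), dhash(brighter)))  # 0
print(hamming(dhash(falling), dhash(rising)))    # 64
```

Because only the sign of each neighbouring difference is recorded, adding the same amount of brightness to every pixel leaves the fingerprint unchanged, as the first comparison shows.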

Note that these fingerprint algorithms apply not only to image search but also to other multimedia formats. There are also many other feature extraction methods for image search, and the algorithms above can be improved in many ways. For example, for a photo of a person you can run face detection first and then hash only the face region, and for an image with a solid background you can filter out and crop the background first. Finally, the search results can be filtered by color, scenery, product, and so on.


========================================================== ========================================================== ============================

Author: Nash_. Reposting is welcome; sharing with others is the source of progress!

When reposting, please retain the original address: http://blog.csdn.net/nash_/article/details/8618775

========================================================== ========================================================== ==============================
