Image Search (I): Implementing the Dhash Algorithm in Python

I've recently been looking into image search, which is a pretty cool thing. Baidu and Google both offer search-by-image; if you're interested, give it a try. Of course, I haven't gone very deep. Going deeper means bringing in deep learning, and Python is certainly well suited to deep learning.

The most important part of this feature is how to make a computer recognize pictures.

This problem kept bothering me until I happened to come across perceptual hash algorithms. There are two types: the basic mean-based perceptual hash (Dhash) and the cosine-transform (DCT) perceptual hash (Phash). Dhash is my own name for it, chosen to distinguish it from Phash; the mean-based algorithm is more commonly known as average hash (aHash), and elsewhere "dHash" usually refers to the difference hash, which is a different algorithm. I've implemented both in Python ^_^.

The basic principles of a perceptual hash algorithm are:

1. Turn the picture into a recognizable string, called its hash value

2. Compare that string with the strings of other images

An algorithm can't just be talked about; the key point is how to turn the picture into a recognizable string. (I despise the articles online that copy one another, identical down to the wording.) Let's take a picture as an example.

First, shrink the image to 8x8 and convert it to grayscale. This blurs the picture and reduces the amount of computation.

(The 8x8 picture is too small to see, so the original post shows an enlarged copy here.)

An 8x8 picture has 64 pixel values. Compute the average of these 64 values; it is the threshold used in the next step, which further reduces noise.

Pixel values = [
    247, 245, 250, 253, 251, 244, 240, 240,
    247, 253, 228, 208, 213, 243, 247, 241,
    252, 226,  97,  80,  88, 116, 231, 247,
    255, 172,  79,  65,  51,  58, 191, 255,
    255, 168,  71,  60,  53,  69, 205, 255,
    255, 211,  65,  58,  56, 104, 244, 252,
    248, 253, 119,  42,  53, 181, 252, 243,
    244, 240, 218, 175, 185, 230, 242, 244]

Average = 185.359375
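The average can be checked with a few lines of plain Python. This is only a sketch: the names `pixels` and `average` are introduced here for illustration, and the values are copied from the list above.

```python
# Grayscale values of the 8x8 example image, copied from the list above.
# The variable names are illustrative; they are not from the original post.
pixels = [
    247, 245, 250, 253, 251, 244, 240, 240,
    247, 253, 228, 208, 213, 243, 247, 241,
    252, 226,  97,  80,  88, 116, 231, 247,
    255, 172,  79,  65,  51,  58, 191, 255,
    255, 168,  71,  60,  53,  69, 205, 255,
    255, 211,  65,  58,  56, 104, 244, 252,
    248, 253, 119,  42,  53, 181, 252, 243,
    244, 240, 218, 175, 185, 230, 242, 244,
]

# Divide by a float so the result is not truncated under Python 2.
average = sum(pixels) / float(len(pixels))
print(average)  # 185.359375
```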

Once the average is known, compare it with each pixel. A pixel value greater than the average is marked 1; a value less than or equal to the average is marked 0. The result is a string of 64 digits (it looks like a binary number).

Noise reduction result = [
    1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 0, 0, 0, 0, 1, 1,
    1, 0, 0, 0, 0, 0, 1, 1,
    1, 0, 0, 0, 0, 0, 1, 1,
    1, 1, 0, 0, 0, 0, 1, 1,
    1, 1, 0, 0, 0, 0, 1, 1,
    1, 1, 1, 0, 0, 1, 1, 1]

64-bit string = '1111111111111111110000111000001110000011110000111100001111100111'
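As a small check of the thresholding rule, the sketch below applies it to the third row of the pixel list only. The helper name `to_bits` is made up for this illustration and does not come from the original post.

```python
def to_bits(pixels, average):
    """Return '1' for each pixel above the average and '0' otherwise."""
    return ''.join('1' if p > average else '0' for p in pixels)

# Third row of the example image, thresholded with the average from the text.
row = [252, 226, 97, 80, 88, 116, 231, 247]
row_bits = to_bits(row, 185.359375)
print(row_bits)  # 11000011
```

The output matches the third row of the noise reduction result.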

A 64-character string is too long and awkward to compare, so take the characters in groups of 4 and convert each group from binary to hexadecimal. This leaves a string of length 16, which is the image's perceptual hash value.

Hash value = 'ffffc38383c3c3e7'
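The grouping step can be sketched on its own as follows. The function name `bits_to_hex` is an assumption made for this example, and it expects a string whose length is a multiple of 4.

```python
def bits_to_hex(bits):
    """Convert a binary string to hex, one hex digit per group of 4 bits."""
    return ''.join('%x' % int(bits[i:i + 4], 2) for i in range(0, len(bits), 4))

# The 64-bit string from the worked example above.
bits = '1111111111111111110000111000001110000011110000111100001111100111'
hash_value = bits_to_hex(bits)
print(hash_value)  # ffffc38383c3c3e7
```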

The Python code is as follows:

    #!/usr/bin/python2.7
    # coding: utf-8
    """
    author: haddy Yang (Yang Shi Ai)
    date: 2016-04-01 (the code was written on April 1; I was too busy in between and am only releasing it now)
    description: get image's hash value
    environ: python2.7 and python2.6
    """
    # yum install python-imaging  (installs the PIL image library, python2.6)
    # The official PIL release works with Python 2.6 but not with 2.7.
    # Python 2.7 can use the Pillow library instead.
    # import Image  # python2.6 PIL
    from PIL import Image  # python2.7 Pillow
    import sys


    def get_hash(image_path):
        """Get image hash string."""
        im = Image.open(image_path)
        # ANTIALIAS: anti-aliasing filter (newer Pillow releases call it Image.LANCZOS)
        # convert('L'): convert to grayscale
        im = im.resize((8, 8), Image.ANTIALIAS).convert('L')
        pixels = list(im.getdata())
        # avg: average of the 64 pixels
        avg = sum(pixels) / 64.0
        # Compare each pixel with avg to get a string of length 64:
        # '1' if the pixel is greater than avg, otherwise '0'
        bits = ''.join('1' if p > avg else '0' for p in pixels)
        # Split bits into groups of 4 characters and convert each group to hexadecimal
        return ''.join('%x' % int(bits[i:i + 4], 2) for i in range(0, 64, 4))


    if __name__ == '__main__':
        if len(sys.argv) < 2:
            print('#sample: python imghash.py filename')
        else:
            print(get_hash(sys.argv[1]))

Here are the hash values of the other pictures:

b.jpg: fff3fbe1e181c3ff

c.jpg: ffffdf818080d9f9

d.jpg: ffffcfc7c7c3c7ef

Compare the hash of each of these 3 images with the hash of a.jpg (the example image above). The comparison uses the Hamming distance: the number of positions at which the two strings have different characters. For example, comparing a.jpg and b.jpg:

    a.jpg: ffffc38383c3c3e7
    b.jpg: fff3fbe1e181c3ff

The characters differ at 11 positions, so the Hamming distance is 11. The smaller the Hamming distance, the more similar the pictures. A distance above 10 means the pictures are quite different.

The Hamming distance between a.jpg and c.jpg is 10, counting the differing characters in the two hashes listed above;

The Hamming distance between a.jpg and d.jpg is 7.

This shows that, of these 3 images, d.jpg is the most similar to a.jpg.

That is roughly the whole algorithm. I haven't given the Hamming distance code, since it is fairly simple. In practice the comparison is usually done inside the database, which returns the images whose perceptual hashes have the smallest distances.
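Since the post leaves the Hamming distance code out, here is a minimal sketch of one way to write it. It follows the definition used in this article (differing characters in the hexadecimal strings); comparing the underlying bits instead is a common alternative that is not shown. The function name `hamming_distance` and the in-memory dictionary standing in for a database are assumptions made for illustration.

```python
def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equal-length hash strings differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError('hashes must have the same length')
    return sum(1 for x, y in zip(hash_a, hash_b) if x != y)

target = 'ffffc38383c3c3e7'  # a.jpg
# Hypothetical stand-in for the table of stored image hashes.
candidates = {
    'b.jpg': 'fff3fbe1e181c3ff',
    'c.jpg': 'ffffdf818080d9f9',
    'd.jpg': 'ffffcfc7c7c3c7ef',
}

print(hamming_distance(target, candidates['b.jpg']))  # 11
print(hamming_distance(target, candidates['d.jpg']))  # 7

# Pick the candidate whose hash is closest to the target.
closest = min(candidates, key=lambda name: hamming_distance(target, candidates[name]))
print(closest)  # d.jpg
```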

Of course, this algorithm is seldom used in real applications because it is quite sensitive: if the same picture is rotated by some angle or deformed, the hash value changes a lot. However, it is the fastest to compute and is often used to find thumbnails.
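The sensitivity to rotation can be illustrated on the worked example. The sketch below rotates the 8x8 grid of bits by 90 degrees rather than rotating a real image file, which is a simplification: rearranging the pixels leaves their average unchanged, so rotating the thresholded bits gives the same result as thresholding the rotated 8x8 grid. The helper names are made up for this example.

```python
def rotate_bits_90(bits, size=8):
    """Rotate a size x size grid of bits (row-major string) 90 degrees clockwise."""
    rows = [bits[i:i + size] for i in range(0, size * size, size)]
    # Each new row is an old column read from bottom to top.
    return ''.join(rows[size - 1 - c][r] for r in range(size) for c in range(size))

def bits_to_hex(bits):
    """Convert a binary string to hex, one hex digit per group of 4 bits."""
    return ''.join('%x' % int(bits[i:i + 4], 2) for i in range(0, len(bits), 4))

# The 64-bit string of the example image (a.jpg).
original_bits = '1111111111111111110000111000001110000011110000111100001111100111'

original_hash = bits_to_hex(original_bits)
rotated_hash = bits_to_hex(rotate_bits_90(original_bits))
differing = sum(1 for x, y in zip(original_hash, rotated_hash) if x != y)

print(original_hash)  # ffffc38383c3c3e7
print(rotated_hash)   # ffe783030383ffff
print(differing)      # 10
```

Ten of the 16 characters change, even though the rotated grid contains exactly the same pixels.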

In the next blog post, I describe a Python implementation of the cosine-transform perceptual hash algorithm (Phash), which is more usable in practice: "Image Search (II): Implementing the Phash Algorithm in Python".

(Original blog post; if you repost it, please credit yshblog.com.)
