Recently, while reading up on moving-target tracking, I came across zouxy09's post "Visual Tracking Based on the Perceptual Hash Algorithm" and found it very interesting. So I looked into perceptual hashing and learned that it is widely used in image retrieval. The technique feels a bit like template matching: the core idea is to find the most similar image by comparing fingerprint similarity.
Based on zouxy09's C++ program, I rewrote a Python version and added an implementation of the difference hash (dHash).
——————————————————————————————————————————
I. Algorithm Principles and Process
1. aHash algorithm (average hash)
Downsampling the image preserves only its low-frequency information.
Working process:
① Reduce size, simplify color: shrink to an 8*8 grayscale image.
② Calculate the gray mean: the average of the 64 gray values.
③ Generate the hash code: compare each of the 64 gray values with the mean from the previous step. A value greater than the mean becomes 1; a value below the mean becomes 0. This yields a "fingerprint" of 64 elements.
④ Calculate the Hamming distance: compare two "fingerprints" by counting the positions where they differ. A distance of 0 means very similar; less than 5, similar; greater than 10, different images. (A minimal sketch of these four steps follows below.)
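To make steps ① through ④ concrete, here is a minimal standalone sketch with OpenCV and NumPy. The function names and the image paths are mine, for illustration only; the full tracker code follows in Section II.

import cv2
import numpy as np

def ahash(gray):
    # ① shrink to an 8*8 grayscale image
    small = cv2.resize(gray, (8, 8))
    # ② mean of the 64 gray values
    mean = small.mean()
    # ③ 64-bit fingerprint: 1 where the pixel exceeds the mean
    return small > mean

def hamming(code_a, code_b):
    # ④ number of differing fingerprint bits
    return int(np.count_nonzero(code_a != code_b))

# placeholder file names, for illustration only
img_a = cv2.imread('a.jpg', cv2.IMREAD_GRAYSCALE)
img_b = cv2.imread('b.jpg', cv2.IMREAD_GRAYSCALE)
print(hamming(ahash(img_a), ahash(img_b)))  # 0: very similar; <5: similar; >10: different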
2. pHash algorithm (perceptual hash)
Although the mean hash is simple, it depends on the mean itself. Gamma correction or histogram equalization shifts the mean and therefore changes the hash value. A more robust approach is to extract the low-frequency content with the discrete cosine transform (DCT).
The DCT is widely used in image compression; it transforms an image from the pixel domain to the frequency domain. Because a typical image contains a great deal of redundancy and correlation, after the transform only a few frequency components have non-zero coefficients, while most are zero (or close to zero). In the coefficient matrix obtained by applying the DCT to the Lena image, the frequency increases from the upper-left corner toward the lower-right corner: the values in the upper-left corner are relatively large, while those toward the lower-right are very small. In other words, almost all of the image's energy is concentrated in the low-frequency coefficients in the upper-left corner.
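This energy concentration is easy to verify numerically. A minimal sketch (the image path is a placeholder; cv2.dct requires an even-sized floating-point input, hence the resize):

import cv2
import numpy as np

gray = cv2.imread('lena.jpg', cv2.IMREAD_GRAYSCALE)  # placeholder path
gray = np.float32(cv2.resize(gray, (64, 64)))  # cv2.dct needs an even-sized float input
coeffs = cv2.dct(gray)
total = np.sum(coeffs ** 2)
low = np.sum(coeffs[:8, :8] ** 2)
# the small upper-left block typically holds the bulk of the energy
print('low-frequency energy share: %.1f%%' % (100.0 * low / total))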
Working process:
① Reduce size, simplify color: shrink to a 32*32 grayscale image.
② Compute the DCT: apply the discrete cosine transform to obtain a 32*32 DCT coefficient matrix.
③ Crop the DCT: only the upper-left corner carries the lowest-frequency content, so keep the 8*8 matrix in the upper-left corner.
④ Calculate the mean: the average of those 64 DCT coefficients.
⑤ Generate the hash code: compare the 64 DCT coefficients with the mean from the previous step; from here the process is the same as the mean hash.
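Written in the same style as the aHash sketch above (a minimal sketch; the phash name is mine, not the tracker's):

import cv2
import numpy as np

def phash(gray):
    # ① shrink to 32*32; ② cv2.dct requires float32 input
    small = np.float32(cv2.resize(gray, (32, 32)))
    coeffs = cv2.dct(small)
    # ③ keep the low-frequency 8*8 corner; ④⑤ compare against its mean
    low = coeffs[:8, :8]
    return low > low.mean()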
3. dHash algorithm (difference hash)
This hash is built from gradients (differences between adjacent pixels); compared with pHash, it has a clear speed advantage.
Working process:
① Reduce size, simplify color: shrink to a 9*8 grayscale image.
② Compute differences: each row of 9 pixels yields 8 difference values, for 64 difference values in total.
③ Generate the hash code: compare each difference with 0; greater than 0 becomes 1, otherwise 0. From here the process is the same as the mean hash.
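And a matching dHash sketch (again mine, not the tracker code; a signed integer type keeps the row differences from wrapping around):

import cv2
import numpy as np

def dhash(gray):
    # ① shrink to 9 columns by 8 rows; dsize is (width, height)
    small = np.int16(cv2.resize(gray, (9, 8)))
    # ② 8 differences per row: each pixel minus its right neighbor
    diff = small[:, :-1] - small[:, 1:]
    # ③ 64-bit fingerprint from the sign of each difference
    return diff > 0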
——————————————————————————————————————————
II. Code Implementation
# coding: utf-8
import time
from glob import iglob

import cv2
import numpy as np


class HashTracker:
    def __init__(self, path):
        # Read the current frame and convert it to grayscale
        self.img = cv2.imread(path)
        self.gray = cv2.cvtColor(self.img, cv2.COLOR_BGR2GRAY)

    def cal_hash_code(self, cur_gray):
        # aHash: shrink to 8*8 and compare each pixel with the mean
        s_img = cv2.resize(cur_gray, dsize=(8, 8))
        img_mean = cv2.mean(s_img)
        # Returns an 8*8 bool matrix
        return s_img > img_mean[0]

    def cal_phash_code(self, cur_gray):
        # pHash: shrink to 32*32
        m_img = cv2.resize(cur_gray, dsize=(32, 32))
        # cv2.dct requires a floating-point input
        m_img = np.float32(m_img)
        # Discrete cosine transform gives the DCT coefficient matrix
        img_dct = cv2.dct(m_img)
        img_mean = cv2.mean(img_dct[0:8, 0:8])
        # Returns an 8*8 bool matrix from the low-frequency corner
        return img_dct[0:8, 0:8] > img_mean[0]

    def cal_dhash_code(self, cur_gray):
        # dHash: shrink to 9*8; note dsize is (width, height)
        m_img = cv2.resize(cur_gray, dsize=(9, 8))
        # A signed type keeps the differences from wrapping around
        m_img = np.int16(m_img)
        # 8*8 difference matrix: each pixel minus its right neighbor
        m_img_diff = m_img[:, :-1] - m_img[:, 1:]
        return m_img_diff > 0

    def cal_hamming_distance(self, model_hash_code, search_hash_code):
        # Count the number of differing fingerprint bits
        diff = np.uint8(model_hash_code != search_hash_code)
        return cv2.countNonZero(diff)

    def hash_track(self, roi, rect, flag=0):
        # Width and height of the rectangular box
        width = abs(rect[0] - rect[2])
        height = abs(rect[1] - rect[3])
        # Size of the current frame (shape is (height, width))
        img_h, img_w = self.img.shape[:2]
        # Select the method according to flag and hash the previous frame's template
        if flag == 0:
            model_hash_code = self.cal_hash_code(roi)
        elif flag == 1:
            model_hash_code = self.cal_phash_code(roi)
        elif flag == 2:
            model_hash_code = self.cal_dhash_code(roi)
        # Initialize the Hamming distance to its maximum (64 bits)
        min_dis = 64
        # Sliding-window matching with a step size of 2
        for y in range(0, img_h - height, 2):
            for x in range(0, img_w - width, 2):
                window = self.gray[y:y + height, x:x + width]
                if flag == 0:
                    search_hash_code = self.cal_hash_code(window)
                elif flag == 1:
                    search_hash_code = self.cal_phash_code(window)
                elif flag == 2:
                    search_hash_code = self.cal_dhash_code(window)
                # Calculate the Hamming distance
                distance = self.cal_hamming_distance(model_hash_code, search_hash_code)
                # Keep the window with the minimum Hamming distance as the match
                if distance < min_dis:
                    rect = x, y, x + width, y + height
                    min_dis = distance
        # The matched box becomes the template for the next frame
        roi = self.gray[rect[1]:rect[3], rect[0]:rect[2]]
        # Draw the rectangle on the current frame
        cv2.rectangle(self.img, (rect[0], rect[1]), (rect[2], rect[3]), (255, 0, 0), 2)
        return roi


# Mouse callback: drag a box around the target on the first frame
box = [0] * 4


def mouse_handler(event, x, y, flags, param):
    if event == cv2.EVENT_LBUTTONDOWN:
        # Record the starting point
        box[0], box[1] = x, y
    elif event == cv2.EVENT_MOUSEMOVE and flags == cv2.EVENT_FLAG_LBUTTON:
        box[2], box[3] = x, y
    elif event == cv2.EVENT_LBUTTONUP:
        # Record the lower-right corner
        box[2], box[3] = x, y


def main():
    # Read the first picture, used to select the target box
    img = cv2.imread('./img/0001.jpg')
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    cv2.namedWindow('HashTracker', 1)
    cv2.setMouseCallback('HashTracker', mouse_handler)
    while True:
        cv2.imshow('HashTracker', img)
        if cv2.waitKey(1) == 27:  # press Esc once the box is drawn
            break
    # Initialize the template from the selected box
    model = gray[box[1]:box[3], box[0]:box[2]]
    # Get the picture sequence
    paths = iglob(r'./img/*.jpg')
    frame_count = 0
    for path in paths:
        frame_count += 1
        h = HashTracker(path)
        # Perceptual-hash tracking, timed per frame
        start_time = time.perf_counter()
        model = h.hash_track(model, box)
        fin_time = time.perf_counter()
        print('%d: delta time: %.2f' % (frame_count, fin_time - start_time))
        cv2.imshow('HashTracker', h.img)
        if cv2.waitKey(20) == 27:
            break


if __name__ == '__main__':
    main()
——————————————————————————————————————————
III. Experimental Results and Analysis
The following are the experimental results of tracking with the mean hash:
[Figures: tracking results at frame 25, frame 50, frame 100, and frame 200]
Results: the mean hash and the difference hash are relatively fast, while pHash tracks well but is very slow. Honestly, for use as a tracking algorithm, all of them feel rather slow to me. But the underlying idea is still a good one.
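For a rough sense of the per-hash cost behind this observation, the ahash, phash, and dhash sketch functions above can be timed on a stand-in frame (a sketch only; absolute numbers depend on the machine, and the frame here is random data rather than a real image):

import time
import numpy as np

gray = np.random.randint(0, 256, (240, 320), dtype=np.uint8)  # random stand-in frame
for name, fn in (('aHash', ahash), ('pHash', phash), ('dHash', dhash)):
    start = time.perf_counter()
    for _ in range(1000):
        fn(gray)
    # seconds for 1000 calls -> microseconds per call
    print('%s: %.1f us per hash' % (name, (time.perf_counter() - start) * 1e3))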
——————————————————————————————————————————
References:
① zouxy09, Visual Object Tracking Based on the Perceptual Hash Algorithm
② Three Similar-Image Retrieval Techniques Based on the Perceptual Hash Algorithm
③ The Principle of Similar-Image Search