Calculating Image Similarity (Python Series, Part 1)


Note: This article was first published on the blog of Ryong Yonghao (Lianhua Butterfly).

 

About the Python series: this series consists of experiments and code. It covers a wide range of areas, some of them fairly complex, including image processing and retrieval, Chinese word segmentation, text classification, Pinyin conversion, and error correction. To be frank, the reason for posting this series on a blog is to promote Python, so every article includes source code and test cases, which is one of the series' distinguishing features. However, the articles only scratch the surface and won't go deep into any specialized field; the aim is simply to show that Python is powerful and suits not only web and game development but also scientific research.

 

 

To calculate image similarity, we first have to find the image's features. Think of how you would describe a person's face: a square face, thick eyebrows, double eyelids, a straight nose, big thick lips. These features determine whether the person looks a bit like your colleagues, friends, or family. Images work the same way: to calculate similarity, some features must be abstracted, such as blue sky, white clouds, and green grass. Common image features include color features, texture features, shape features, and spatial relationship features. Color features are the most commonly used; they include color histograms, color sets, color moments, color coherence vectors, and color correlograms. A histogram describes the global distribution of colors in an image and is easy to understand and implement, so it is the usual choice for basic image similarity calculation. As an introductory article, this one is no exception.

Before running our tests, we need a batch of images as test cases. After searching high and low, I finally found, on my former colleague Simon's blog (http://blog.163.com/johnal1), a series of scenery photos (http://blog.163.com/johnal1/blog/static/9394912200812105654784) that he took in Tibet on the company's annual trip. With those, you could say the experiment was already 90% done. Oh yeah! Next, let's look at the most important group of photos (two photos):

 

 

 

With the test images in hand, we need an image library for Python. My choice is PIL (Python Imaging Library), which provides image processing for Python and supports dozens of image formats. (For an introduction to PIL, see my earlier article "Image Processing with Python": http://blog.csdn.net/lanphaday/archive/2007/10/28/1852726.aspx)

Although these two images happen to be the same size, for generality we should normalize every image to a standard specification. Here I chose a resolution of 256x256.

In RGB mode, PIL computes a histogram with 768 samples (256 per channel). That is a small enough workload that this article uses the histogram directly, without dimensionality reduction.

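```python
def make_regalur_image(img, size=(256, 256)):
    # Normalize every image to the same size and to RGB mode,
    # so histogram() always returns 3 x 256 = 768 bins.
    return img.resize(size).convert('RGB')
```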
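To make the 768 figure concrete, here is a minimal pure-Python sketch that assumes an image is a flat list of 8-bit (r, g, b) tuples. The helper rgb_histogram is hypothetical, not a PIL function; it mimics the layout PIL documents for Image.histogram() on an RGB image, where the per-band histograms are concatenated: 256 red counts, then 256 green, then 256 blue.

```python
def rgb_histogram(pixels):
    """Hypothetical helper: count 8-bit channel values into 768 bins.

    pixels: iterable of (r, g, b) tuples, each value in 0..255.
    Bins 0-255 hold red counts, 256-511 green, 512-767 blue.
    """
    hist = [0] * (3 * 256)
    for r, g, b in pixels:
        hist[r] += 1
        hist[256 + g] += 1
        hist[512 + b] += 1
    return hist


# A toy 2x2 "image": two pure-red pixels and two pure-blue pixels.
pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 0, 255)]
h = rgb_histogram(pixels)
print(len(h))        # 768
print(h[255], h[0])  # 2 2  (red channel: two 255s and two 0s)
print(sum(h))        # 12   (4 pixels x 3 channels)
```

For a 256x256 regular image, each 256-bin slice sums to 256 x 256 = 65,536.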

Once an image has been converted to a regular image, calling its histogram() method returns the histogram data. For example, the histograms of the two images above are as follows:

 

 

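With regular images in hand, computing image similarity reduces to computing a distance between histograms. This article quantifies histogram similarity with the following formula:

$$\mathrm{Sim}(G, S) = \frac{1}{N} \sum_{i=1}^{N} \left(1 - \frac{|g_i - s_i|}{\max(g_i, s_i)}\right)$$

where G and S are the two histograms, g_i and s_i are their i-th bins, and N is the number of sample points in the color space. A bin where g_i = s_i contributes exactly 1, which also covers the case where both bins are 0.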
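For intuition, here is a hand-worked example with two made-up four-bin histograms, G = (4, 0, 2, 2) and S = (2, 0, 2, 4); these numbers are illustrative only and do not come from the test images (real RGB histograms have N = 768 bins). Each bin contributes 1 - |g_i - s_i| / max(g_i, s_i), or exactly 1 when the two bins are equal:

$$\mathrm{Sim}(G, S) = \frac{1}{4}\left[\left(1 - \frac{2}{4}\right) + 1 + 1 + \left(1 - \frac{2}{4}\right)\right] = \frac{3}{4} = 0.75$$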

The corresponding Python code is as follows:

```python
def hist_similar(lh, rh):
    # Average per-bin similarity. Equal bins (including two empty bins)
    # score 1, which also avoids dividing by max(0, 0).
    assert len(lh) == len(rh)
    return sum(1 - (0 if l == r else float(abs(l - r)) / max(l, r))
               for l, r in zip(lh, rh)) / len(lh)

def calc_similar(li, ri):
    return hist_similar(li.histogram(), ri.histogram())
```

That completes the image similarity calculation in fewer than 10 lines of code. Even including the function that reads images from disk and the test code, the whole program is only about 20 lines:

```python
from PIL import Image

def calc_similar_by_path(lf, rf):
    li = make_regalur_image(Image.open(lf))
    ri = make_regalur_image(Image.open(rf))
    return calc_similar(li, ri)

if __name__ == '__main__':
    # Each test directory holds one image pair: 1.jpg and 2.jpg.
    path = r'test/test%d/%d.jpg'
    for i in range(1, 7):
        print('test_case_%d: %.3f%%'
              % (i, calc_similar_by_path(path % (i, 1), path % (i, 2)) * 100))
```

So how well does it work? Here are the test results (click here to download the test cases and code):

Test_case_1: 63.322%

Test_case_2: 66.950%

Test_case_3: 51.990%

Test_case_4: 70.401%

Test_case_5: 32.755%

Test_case_6: 42.203%

Compared with what our eyes tell us about the test cases, the program works reasonably well. However, test_case_4 exposes the weakness of the histogram: it describes only the global distribution of colors in an image, not their local distribution or where the colors are located. The regular images of test_case_4 are as follows:

We can see that their local color distributions are quite different, yet their global histograms are very similar:
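To see why this can happen, consider a toy pure-Python sketch (made-up data, not the test images): two 2x2 images contain exactly the same pixels, arranged differently. A histogram only counts how often each color occurs, so it cannot tell them apart. For brevity, color_counts is a hypothetical helper that counts whole (r, g, b) colors rather than PIL's per-channel bins; per-channel histograms would be identical here as well.

```python
from collections import Counter

# Two toy 2x2 images stored as rows of (r, g, b) pixels. The second is
# the first with its rows swapped: same colors, different layout.
img_a = [[(0, 0, 255), (0, 0, 255)],   # blue "sky" on top
         [(0, 128, 0), (0, 128, 0)]]   # green "grass" below
img_b = [img_a[1], img_a[0]]           # grass on top, sky below


def color_counts(img):
    """Hypothetical helper: global color histogram (count of each color)."""
    return Counter(px for row in img for px in row)


print(img_a == img_b)                              # False
print(color_counts(img_a) == color_counts(img_b))  # True
```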

 

Because the two histograms are so similar, the histogram algorithm reports a similarity of 70.4%, which is clearly unacceptable for these two images. So how can we overcome this weakness of histograms? The answer is to divide each regular image into blocks, compute the similarity of each pair of corresponding blocks, and take the average block similarity as the similarity of the whole image. In this experiment, each regular image is divided into 4x4 blocks, each with a resolution of 64x64:

 

 

The code that splits an image into blocks is:

```python
def split_image(img, part_size=(64, 64)):
    # With the defaults, a 256x256 regular image yields 4 x 4 = 16 blocks.
    w, h = img.size
    pw, ph = part_size

    assert w % pw == h % ph == 0

    # crop() takes a (left, upper, right, lower) box.
    return [img.crop((i, j, i + pw, j + ph)).copy()
            for i in range(0, w, pw)
            for j in range(0, h, ph)]
```
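The crop boxes that split_image visits can be listed without PIL. The sketch below uses a hypothetical helper, block_boxes, that repeats the same loops on a size tuple. Because the outer loop runs over x, the blocks come out column by column; that ordering is harmless here, since both images are split the same way and zip() pairs blocks at the same position.

```python
def block_boxes(size=(256, 256), part_size=(64, 64)):
    """Hypothetical helper: the (left, upper, right, lower) boxes split_image crops."""
    w, h = size
    pw, ph = part_size
    assert w % pw == h % ph == 0
    return [(i, j, i + pw, j + ph)
            for i in range(0, w, pw)   # outer loop: x (columns)
            for j in range(0, h, ph)]  # inner loop: y (rows)


boxes = block_boxes()
print(len(boxes))  # 16
print(boxes[:2])   # [(0, 0, 64, 64), (0, 64, 64, 128)]
print(boxes[-1])   # (192, 192, 256, 256)
```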

Correspondingly, change calc_similar(), the function that computes the similarity of two images:

```python
def calc_similar(li, ri):
    # return hist_similar(li.histogram(), ri.histogram())
    # Average the histogram similarity over the 16 corresponding blocks.
    return sum(hist_similar(l.histogram(), r.histogram())
               for l, r in zip(split_image(li), split_image(ri))) / 16.0
```

With this improvement, the algorithm reflects, to some extent, the local distribution and position of colors, which largely makes up for the shortcomings of the global histogram algorithm. The new algorithm produces the following results:

Test_case_1: 56.273%

Test_case_2: 54.925%

Test_case_3: 49.326%

Test_case_4: 40.254%

Test_case_5: 30.776%

Test_case_6: 39.460%

The similarity for test_case_4 drops from 70.4% to 40.25%, which roughly matches what the naked eye sees. The similarities of the other image pairs also drop slightly, because position now plays a role. In short, the block-based histogram similarity algorithm is simple and effective.

Image similarity calculation is the foundation of image search and recognition. This article introduces only the most basic method; if you go on to study better algorithms, remember that Python can help you there, too.
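As a rough illustration of why averaging block similarities helps, the sketch below runs both the global and the block-based calculation on two tiny made-up grayscale images: one dark on the left and bright on the right, the other its mirror image. It assumes 4 gray levels (0 to 3) instead of PIL's 768 RGB bins and 2x2 blocks instead of 64x64; histogram, blocks, global_similar, and block_similar are hypothetical helpers, while hist_similar is repeated from the article's code.

```python
def hist_similar(lh, rh):
    assert len(lh) == len(rh)
    return sum(1 - (0 if l == r else float(abs(l - r)) / max(l, r))
               for l, r in zip(lh, rh)) / len(lh)


def histogram(pixels, levels=4):
    """Hypothetical helper: count pixels on each of `levels` gray levels."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    return hist


def blocks(img, pw=2, ph=2):
    """Hypothetical helper: split a 2D list into pw x ph blocks, column by column."""
    h, w = len(img), len(img[0])
    return [[img[y][x] for y in range(j, j + ph) for x in range(i, i + pw)]
            for i in range(0, w, pw)
            for j in range(0, h, ph)]


def flatten(img):
    return [p for row in img for p in row]


def global_similar(a, b):
    return hist_similar(histogram(flatten(a)), histogram(flatten(b)))


def block_similar(a, b):
    pairs = list(zip(blocks(a), blocks(b)))
    return sum(hist_similar(histogram(l), histogram(r))
               for l, r in pairs) / len(pairs)


# Toy 4x4 images: dark left / bright right, and its mirror image.
img_a = [[0, 0, 3, 3] for _ in range(4)]
img_b = [[3, 3, 0, 0] for _ in range(4)]

print(global_similar(img_a, img_b))  # 1.0
print(block_similar(img_a, img_b))   # 0.5
```

The global histograms are identical, so the global score is a perfect 1.0, while the block score falls. Under this formula two empty bins count as a perfect match, which is why the block score is 0.5 here rather than 0.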

 

All the code and test cases for this experiment can be downloaded by clicking here. Thanks again to Simon for providing the images.

 
