Local Self-Similarity Descriptors: Notes on "Matching Local Self-Similarities across Images and Videos"


This week I began studying local self-similarity descriptors. Since there is relatively little material about them online, I decided to read the two main papers and write down my understanding here; if I can explain them clearly, I will know I have actually understood them.

The first is the CVPR 2007 paper "Matching Local Self-Similarities across Images and Videos":

http://www.vision.cs.chubu.ac.jp/CV-R/pdf/EliCVPR2007.pdf

The second, from ICCV 2009, extends the first and makes retrieval more efficient: "Efficient Retrieval of Deformable Shape Classes using Local Self-Similarities":

http://www.robots.ox.ac.uk/~vgg/publications-new/Public/2009/Chatfield09/chatfield09.pdf


So why do we need local self-similarity descriptors?

In the figure above (from the paper; not reproduced here), the five images all contain the same heart shape, yet apart from that shape they share almost nothing: the colors, textures, and edges of the other images are completely different. To be able to find the heart in all of them, the authors invented the local self-similarity descriptor. As I understand it, this gives a skeleton-like kind of matching, quite different from traditional gray-level or edge-based matching.


I. "Matching local self-similarities across Images and videos"

Each pixel q in the image gets a local self-similarity descriptor d_q (how d_q is computed is detailed below). The idea is to correlate a small image patch centered at q with a larger surrounding region, producing a "correlation surface"; this correlation surface is then converted into a log-polar bin representation.


1. Constructing the local self-similarity descriptor d_q:

(1) Take pixel q as the center and extract a small patch (5×5 in the paper) together with a larger surrounding region (radius 40);

(2) Compare the 5×5 patch with every position inside the radius-40 region, computing the sum of squared differences (SSD);

(3) Convert the SSD into a "correlation surface" S_q(x,y):

S_q(x,y) = exp( - SSD_q(x,y) / max(var_noise, var_auto(q)) )

where var_noise is a constant representing acceptable photometric variation (illumination, noise), and var_auto(q) is the maximal variance of the differences of all patches in a very small neighborhood of q (radius 1) relative to the patch centered at q;

(4) The correlation surface S_q(x,y) is then transformed into log-polar coordinates and partitioned into 80 bins (20 angles × 4 radial intervals); for each bin, the maximal correlation value falling inside it is taken as that bin's value;

(5) These 80 bin values form the local self-similarity descriptor of point q; finally, the descriptor vector is linearly stretched to the range [0,1] (a code sketch of the whole construction follows below).
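To make steps (1)-(5) concrete, here is a minimal NumPy sketch for a single pixel. It assumes a grayscale float image and a query pixel far enough from the border; the function name, the var_noise value, and the exact log-polar binning scheme are my own choices for illustration, not the paper's implementation.

```python
import numpy as np

def self_similarity_descriptor(img, qy, qx, patch=5, radius=40,
                               n_angles=20, n_radii=4, var_noise=25.0):
    # Minimal sketch of steps (1)-(5); assumes (qy, qx) is at least
    # radius + patch // 2 pixels away from every image border.
    h = patch // 2
    center = img[qy - h:qy + h + 1, qx - h:qx + h + 1]

    # (1)-(2): SSD between the central 5x5 patch and every patch
    # inside the surrounding region of the given radius.
    size = 2 * radius + 1
    ssd = np.zeros((size, size), dtype=np.float64)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = qy + dy, qx + dx
            cand = img[y - h:y + h + 1, x - h:x + h + 1]
            ssd[dy + radius, dx + radius] = np.sum((center - cand) ** 2)

    # var_auto(q): maximal patch difference in a radius-1 neighborhood of q.
    var_auto = ssd[radius - 1:radius + 2, radius - 1:radius + 2].max()

    # (3): correlation surface S_q = exp(-SSD / max(var_noise, var_auto)).
    corr = np.exp(-ssd / max(var_noise, var_auto))

    # (4): log-polar binning, 20 angles x 4 radial intervals,
    # keeping the maximal correlation value per bin.
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    rho = np.sqrt(ys ** 2 + xs ** 2)
    ang = (np.arctan2(ys, xs) + np.pi) / (2 * np.pi)          # in [0, 1]
    r_bin = np.minimum((np.log1p(rho) / np.log1p(radius) * n_radii).astype(int),
                       n_radii - 1)
    a_bin = np.minimum((ang * n_angles).astype(int), n_angles - 1)

    desc = np.zeros(n_angles * n_radii)
    inside = rho <= radius
    bins = a_bin * n_radii + r_bin
    for b in range(n_angles * n_radii):
        mask = inside & (bins == b)
        if mask.any():
            desc[b] = corr[mask].max()

    # (5): linearly stretch the descriptor to [0, 1].
    lo, hi = desc.min(), desc.max()
    return (desc - lo) / (hi - lo + 1e-12)
```

The paper computes such descriptors densely over the image (not just at one pixel); this per-pixel function is only meant to show the data flow from SSD to correlation surface to log-polar bins.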


2. Characteristics and advantages of the local self-similarity descriptor:

(1) The descriptor captures the local self-similarity structure around a pixel rather than a global property of the image, which lets it handle more challenging images;

(2) The log-polar binning makes the descriptor tolerant of local affine deformations;

(3) Taking the maximal correlation value inside each bin makes the descriptor insensitive to the exact position of the best match within that bin, and therefore to small non-rigid deformations;

(4) Using patches rather than single pixels captures more image information (local pattern and texture);

(5) The local self-similarity descriptor captures not only the rough "skeleton" of the target but also rich information coming from color, edges, and so on.


3. Detection steps:

(1) Compute the local self-similarity descriptors of the template F and of the image G to be searched;

(2) All descriptors of F are grouped into an "ensemble of descriptors"; when an ensemble similar to F's is found in G, F has been detected in G (the similarity here covers not only the descriptor values but also the relative geometric positions of the descriptors);

(3) Not every descriptor in the ensemble is informative (the paper mentions two degenerate cases, "saliency" and "homogeneity"), so the non-informative descriptors must be filtered out before normalization (see the sketch after this list);

(4) Matching method: a modified version of the "ensemble matching" algorithm from "Detecting Irregularities in Images and in Video". In practice, the descriptors of F are collected into a single ensemble, and a similar ensemble is searched for in G with the method of that paper. The similarity between individual descriptors is measured by passing their Manhattan (L1) distance through a logistic (sigmoid) function. Ensemble matching then builds a dense likelihood map over G, in which each location holds the likelihood that F appears there; the location with the highest likelihood is taken as the position of F in G.
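The following sketch illustrates points (3) and (4) in code: a filter for non-informative descriptors and a deliberately naive, rigid sliding-ensemble matcher built on the sigmoid-of-L1 similarity. The thresholds, the sigmoid constants, and the assumption that descriptors are sampled on a regular grid are all mine for illustration; the real ensemble matching also allows small local shifts of each descriptor, which this version omits.

```python
import numpy as np

def is_informative(raw_desc, salient_thresh=0.05, homog_thresh=0.95):
    # Filter applied to descriptors BEFORE normalization (assumed thresholds):
    # "saliency"    -> the patch matches nothing around it (all bins low),
    # "homogeneity" -> everything around it matches (all bins high).
    return not (raw_desc.max() < salient_thresh or raw_desc.min() > homog_thresh)

def descriptor_similarity(d1, d2, a=1.0, b=0.5):
    # Logistic (sigmoid) function of the L1 (Manhattan) distance;
    # the constants a and b are placeholders, not the paper's values.
    l1 = np.abs(d1 - d2).sum()
    return 1.0 / (1.0 + np.exp(a * (l1 - b)))

def naive_likelihood_map(template_desc, target_desc):
    # template_desc: (th, tw, 80) descriptors of F on a regular grid,
    # target_desc:   (gh, gw, 80) descriptors of G on the same grid spacing.
    th, tw, _ = template_desc.shape
    gh, gw, _ = target_desc.shape
    like = np.full((gh - th + 1, gw - tw + 1), -np.inf)
    for y in range(gh - th + 1):
        for x in range(gw - tw + 1):
            s = 0.0
            for i in range(th):
                for j in range(tw):
                    sim = descriptor_similarity(template_desc[i, j],
                                                target_desc[y + i, x + j])
                    s += np.log(sim + 1e-12)
            like[y, x] = s
    # The grid cell with the highest (log-)likelihood is the detected
    # position of the template F inside the image G.
    return like
```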


4. Scale processing:

Since self-similarities may appear at different scales and over regions of different sizes, a Gaussian image pyramid is used to obtain descriptors at multiple scales. The same parameters (patch size, surrounding region radius) are used at every scale; each scale produces its own descriptor ensemble, and each generates its own likelihood map through an independent search.

Combining the different scales: (1) each log-likelihood map is first normalized by the number of descriptors at its scale; (2) the maps are then combined by a weighted average, with weights based on the sparseness of each likelihood surface (a sketch follows below).
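A minimal sketch of this combination step, assuming the per-scale log-likelihood maps have already been resized to a common resolution and that per-scale sparseness weights are given (how sparseness is measured is not reproduced here):

```python
import numpy as np

def combine_scales(log_maps, n_descriptors, sparseness_weights):
    # log_maps:           list of per-scale log-likelihood maps (same shape),
    # n_descriptors:      number of descriptors used at each scale,
    # sparseness_weights: assumed per-scale weights reflecting how sparse
    #                     (peaked) each likelihood surface is.
    w = np.asarray(sparseness_weights, dtype=np.float64)
    w = w / w.sum()
    combined = np.zeros_like(log_maps[0], dtype=np.float64)
    for log_map, n, wi in zip(log_maps, n_descriptors, w):
        combined += wi * (log_map / float(n))   # normalize, then weight
    return combined
```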

