Descriptor Matching with Convolutional Neural Networks: a Comparison to SIFT

Source: Internet
Author: User

First, the main ideas

The main purpose of the paper is to examine the advantages and disadvantages of CNNs and SIFT in feature matching. SIFT is widely used in computer vision, and before 2012 local descriptors based on orientation histograms, such as SIFT, HOG, and SURF, held the dominant position. In 2012, however, Krizhevsky et al. introduced a new generation of CNN architectures that achieved unprecedented results. Two questions are still open: do CNNs mainly benefit from big data, or can they really learn abstract visual features? From the outset, SIFT has been very successful not only in recognition but also in feature matching. Until this paper, though, CNNs had not been studied for matching, so its main task is to measure how CNN features perform relative to SIFT in feature matching.

Second, CNN-based feature learning

The experiments are carried out with Caffe, Berkeley's open-source framework.

1. Supervised training

The supervised network is the model provided with Caffe, pre-trained on ImageNet.

2. Unsupervised training

The unsupervised network follows the algorithm from the paper "Unsupervised feature learning by augmenting single images".

Third, comparative experiments

Recognition tasks such as image classification and object detection rely on the semantic structure of the scene, whereas the matching of interest points should be independent of this information. This makes it interesting to learn features specifically for the matching task, and it raises a question: can feature descriptors learned on a classification task still achieve good matching results? SIFT and raw RGB values serve as the baseline methods.

1. Database

Two datasets are used: one with 48 images, and one with 416 images (16 seed images plus 400 images obtained by applying a series of transformations to them).

2. Evaluation method

(1) The MSER detector is used to extract regions of interest and the corresponding image patches.

(2) A descriptor is computed for each image patch, and descriptors are matched by Euclidean distance, which yields a series of descriptor pairs.

(3) A descriptor pair counts as a true positive if the ellipse of the descriptor in the target image and the ground-truth ellipse have an IoU (intersection over union) greater than 0.6; all other pairs count as false positives.
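The three steps above can be sketched in a few lines of Python. This is only a rough illustration under several assumptions, not the code used in the paper: regions are given as ellipses `(cx, cy, a, b, theta)`, the ground-truth ellipse of each source region in the target image is already known, the IoU is approximated by sampling a grid, and all function names are hypothetical.

```python
import math

def nearest_neighbor(query, candidates):
    """Return (index, distance) of the candidate closest to `query` (Euclidean)."""
    best_idx, best_dist = -1, math.inf
    for idx, cand in enumerate(candidates):
        dist = math.dist(query, cand)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, best_dist

def inside(ellipse, x, y):
    """True if point (x, y) lies inside the ellipse (cx, cy, a, b, theta)."""
    cx, cy, a, b, theta = ellipse
    dx, dy = x - cx, y - cy
    # Rotate the point into the ellipse's own axis-aligned frame.
    u = dx * math.cos(theta) + dy * math.sin(theta)
    v = -dx * math.sin(theta) + dy * math.cos(theta)
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0

def ellipse_iou(e1, e2, step=0.25):
    """Approximate intersection over union of two ellipses by grid sampling."""
    r1, r2 = max(e1[2], e1[3]), max(e2[2], e2[3])
    x_min, x_max = min(e1[0] - r1, e2[0] - r2), max(e1[0] + r1, e2[0] + r2)
    y_min, y_max = min(e1[1] - r1, e2[1] - r2), max(e1[1] + r1, e2[1] + r2)
    nx = int((x_max - x_min) / step) + 1
    ny = int((y_max - y_min) / step) + 1
    inter = union = 0
    for ix in range(nx):
        x = x_min + ix * step
        for iy in range(ny):
            y = y_min + iy * step
            in1, in2 = inside(e1, x, y), inside(e2, x, y)
            inter += in1 and in2
            union += in1 or in2
    return inter / union if union else 0.0

def label_matches(src_desc, tgt_desc, tgt_ellipses, gt_ellipses, iou_threshold=0.6):
    """Match each source descriptor to its nearest target descriptor and
    label the pair as a true positive (True) or false positive (False)."""
    labels = []
    for i, desc in enumerate(src_desc):
        j, dist = nearest_neighbor(desc, tgt_desc)
        iou = ellipse_iou(tgt_ellipses[j], gt_ellipses[i])
        labels.append((i, j, dist, iou > iou_threshold))
    return labels

# Toy input with made-up 2-D descriptors and ellipses.
src_desc = [(0.0, 0.0), (1.0, 1.0)]
tgt_desc = [(0.9, 1.1), (0.1, 0.0)]
tgt_ellipses = [(10, 10, 3, 2, 0.0), (50, 50, 4, 4, 0.0)]
gt_ellipses = [(30, 30, 4, 4, 0.0), (10, 10, 3, 2, 0.0)]
labels = label_matches(src_desc, tgt_desc, tgt_ellipses, gt_ellipses)
```

In this toy input the first source descriptor is matched to a region far from its ground-truth ellipse (a false positive), while the second lands exactly on its ground-truth ellipse (a true positive). Real descriptors would be SIFT vectors or CNN layer activations rather than 2-D points.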

3. Choice of parameters

Effect of patch size (416-image dataset)

The figure shows that the optimal patch size for SIFT is 69, while the accuracy of the CNNs keeps increasing with patch size. SIFT usually divides the image patch into 4×4 cells, and resizing the patch blurs the important gradients and brings in a lot of new information. The input size of a CNN is fixed, which means the receptive field is fixed as well; the higher a layer is, the more it benefits from a larger receptive field.
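To make the receptive-field remark concrete, the sketch below computes how the receptive field grows from layer to layer using the standard recurrence r_out = r_in + (k - 1) * j_in and j_out = j_in * s, where k is the kernel size, s the stride, and j the spacing between neighbouring outputs measured in input pixels. The layer list is an assumption: it uses AlexNet-style kernel sizes and strides for illustration, and the article does not state the exact architecture.

```python
def receptive_fields(layers):
    """Return [(name, receptive_field)] for a stack of conv/pool layers.

    `layers` is a list of (name, kernel_size, stride) tuples, input to output.
    """
    rf, jump = 1, 1  # a single input pixel sees itself; outputs are 1 pixel apart
    result = []
    for name, kernel, stride in layers:
        rf += (kernel - 1) * jump  # each extra kernel tap adds `jump` input pixels
        jump *= stride             # striding spreads the outputs further apart
        result.append((name, rf))
    return result

# Assumed AlexNet-style layer specification (hypothetical, for illustration only).
ALEXNET_LIKE = [
    ("conv1", 11, 4),
    ("pool1", 3, 2),
    ("conv2", 5, 1),
    ("pool2", 3, 2),
    ("conv3", 3, 1),
    ("conv4", 3, 1),
    ("conv5", 3, 1),
]

fields = dict(receptive_fields(ALEXNET_LIKE))
for name, rf in fields.items():
    print(f"{name}: {rf} x {rf} input pixels")
```

Under these assumptions a unit in the first convolutional layer sees an 11-pixel window, while a unit in the fifth sees 163 pixels, which is consistent with higher layers profiting more from large patches.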

4. Experimental results

(1) Different transformations (416-image dataset)

Except under blur, the CNNs perform better than SIFT. Under blur, the unsupervised CNN is still slightly better than SIFT. The gap between the CNNs and SIFT is about as large as the gap between SIFT and raw RGB, which leads to the conclusion that CNNs outperform SIFT, one of the best hand-crafted features. Strong blur is a particular problem for the supervised CNN, but it is less of a concern for the unsupervised CNN, because blur is part of the unsupervised CNN's training. Blur also occurs in object recognition, because small objects are usually upscaled before being recognized, which blurs them. Making CNNs more robust to blur should therefore also help improve recognition accuracy.

(2) Pairwise comparison of features (416-image dataset)


In the scatter plots, a point above the diagonal means the first method is better; a point below it means the second method is better. In most cases, the unsupervised CNN is better than the supervised CNN.
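Reading such a scatter plot amounts to counting on which side of the diagonal each point falls. The sketch below does this for two lists of per-image scores, assuming the first method is plotted on the vertical axis and the second on the horizontal axis. The function name is hypothetical and the scores are made-up numbers for illustration, not results from the paper.

```python
def compare_pairwise(first_scores, second_scores):
    """Count points above, below, and on the diagonal of a scatter plot
    with the first method on the y-axis and the second on the x-axis."""
    if len(first_scores) != len(second_scores):
        raise ValueError("both methods need one score per image pair")
    above = sum(1 for f, s in zip(first_scores, second_scores) if f > s)
    below = sum(1 for f, s in zip(first_scores, second_scores) if f < s)
    ties = len(first_scores) - above - below
    return above, below, ties

# Made-up scores for four image pairs (illustration only).
unsupervised = [0.62, 0.55, 0.71, 0.40]
supervised = [0.58, 0.57, 0.65, 0.40]
above, below, ties = compare_pairwise(unsupervised, supervised)
print(f"first method better: {above}, second better: {below}, equal: {ties}")
```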

(3) Results on the 48-image dataset

The results are similar to those on the 416-image dataset, so they are not plotted or described here.

5. Computation time


The computation times were measured on a single CPU. The CNN does need a lot of computing time there, but on a GPU it takes only 5.5 ms, which is still acceptable.

Fourth, summary

The paper reaches several important conclusions:

(1) Both CNNs outperform SIFT in the matching task.

(2) In image classification, where labels are available, supervised learning is very useful. In feature matching, however, the unsupervised CNN has the advantage.

(3) The blur transformation exposes a weakness of ImageNet training. The unsupervised CNN can handle some degree of blur, but blur remains a weakness of the neural networks and may be a common weakness of this architecture.

(4) SIFT has a great advantage in computation time.

For tasks where speed and simplicity are the main criteria, SIFT is still a feature worth considering. For most computer vision tasks that rely on feature matching, it makes sense to consider features learned by a CNN.


