Paper notes: Weakly-supervised Spatial Context Networks

Source: Internet
Author: User

    • Background

In text processing, "the idea of the local spatial context within a sentence proved to be an effective supervisory signal for learning distributed word vector representations." It can be used in two directions: given a word-tokenized corpus of text, learn a representation for a target word that allows it to predict the representations of the contextual words around it; or, conversely, given the contextual words, predict a representation of the "target word."
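To make the two directions concrete, here is a minimal Python sketch that turns a tokenized sentence into training examples. The window size and the function names `skipgram_pairs` and `cbow_examples` are illustrative assumptions; the first direction corresponds to what word2vec calls skip-gram, and the second to CBOW (continuous bag-of-words).

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """(target, context) pairs: each target word is used to predict its neighbors."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """(context_words, target) examples: the neighbors are used to predict the target."""
    examples = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, target))
    return examples


sentence = "the car is next to the airplane".split()
print(skipgram_pairs(sentence, window=1)[:4])
print(cbow_examples(sentence, window=1)[1])
```

With `window=1`, a word's context is just its immediate left and right neighbors, the purely one-dimensional notion of context that text provides.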

Text is one-dimensional and its basic constituent unit is the word, so the context of a word can be obtained by looking at the words to its left and right (this is a natural property of text). Finding such a "natural characteristic" in the object of study and exploiting it may point to a way around "the need to manually label data in image processing in order to learn a highly expressive deep network"; that is, even without manually labeled samples, we might still learn a network with strong representational power.

Now the question is: what is a "word" in an image? Pixels, edges, objects, or scenes?

And how is the context of an image "word" defined? The neighborhood of a pixel, the spatial layout of objects, or something else?

Different people answer these questions differently, which has led to a series of papers along this line of thinking. This note focuses on the view taken by this paper.

This paper holds that "the word should be an object, and its context should be the spatial layout of objects." In the original wording: "by working with the patches at the object scale, our network can focus on more object-centric features and potentially ignore some of the texture and color details that are likely less important for semantic tasks." Note that the spatial layout of objects does not require the objects to be next to each other.

Having understood this premise, we can look at the overall framework of this paper, as shown in Figure 1.

Figure 1 illustrates the idea: given "car" and its offset relative to "airplane", we predict the representation of "airplane".
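The offset between two object patches can be encoded in several ways, and these notes do not state the paper's exact encoding. As a hypothetical sketch, the code below represents each patch as a box `(x1, y1, x2, y2)` and takes the offset to be the difference between the two box centers, normalized by the image width and height.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def box_center(box: Box) -> Tuple[float, float]:
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def relative_offset(src: Box, dst: Box, img_w: float, img_h: float) -> Tuple[float, float]:
    """Offset from the source patch center to the target patch center.

    Hypothetical encoding: dividing by the image size keeps each component
    in [-1, 1] regardless of image resolution.
    """
    sx, sy = box_center(src)
    tx, ty = box_center(dst)
    return ((tx - sx) / img_w, (ty - sy) / img_h)


car = (100, 300, 260, 400)
airplane = (300, 40, 600, 160)
print(relative_offset(car, airplane, img_w=640, img_h=480))
```

Here the airplane lies to the right of and above the car, so the horizontal component is positive and the vertical one is negative (image y grows downward).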

    • Main points

With the basic idea in place, several technical problems remain to be solved:

1) How do we know whether the predicted representation of "airplane" is good or bad?

Here the authors use a "trick". Wouldn't it be enough to know a good representation of "airplane" in advance? Obtaining a very good representation of an object from the activations of an existing network such as VGG or AlexNet is trivial, so the features we learn can be compared against this existing "good representation". Based on this, the authors design the following network structure:

The top stream and the bottom stream have the same network structure, and the parameters of the top stream are fixed (so it consistently outputs good feature representations). The offset is fed into the spatial context module together with H1, the output of the bottom stream, and a loss compares the module's output with the output of the top stream. With this setup the whole design is conceptually sound; at least in principle, the idea can work.
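A minimal, dependency-free sketch of this two-stream forward pass follows. Everything in it is a stand-in chosen for illustration: each stream is a single linear layer plus ReLU instead of a pretrained CNN, the spatial context module is assumed to be one linear layer applied to `h1` concatenated with the offset, and the loss is assumed to be a squared Euclidean distance. Only the wiring (bottom stream output plus offset in, top stream output as the target) mirrors the description above.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def stream(w: Matrix, patch: Vector) -> Vector:
    """Stand-in for a CNN stream: one linear layer followed by ReLU."""
    return [max(0.0, v) for v in matvec(w, patch)]


def spatial_context(w_ctx: Matrix, h1: Vector, offset: Vector) -> Vector:
    """Predict the target patch feature from h1 concatenated with the offset."""
    return matvec(w_ctx, h1 + offset)


def l2_loss(pred: Vector, target: Vector) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target))


rng = random.Random(0)
patch_dim, feat_dim = 6, 4
w_top = random_matrix(feat_dim, patch_dim, rng)     # frozen top stream
w_bottom = [row[:] for row in w_top]                # same structure and initialization
w_ctx = random_matrix(feat_dim, feat_dim + 2, rng)  # learnable context module

car_patch = [rng.random() for _ in range(patch_dim)]
airplane_patch = [rng.random() for _ in range(patch_dim)]
offset = [0.42, -0.52]  # hypothetical (dx, dy) from "car" to "airplane"

h1 = stream(w_bottom, car_patch)           # bottom stream sees "car"
pred = spatial_context(w_ctx, h1, offset)  # predicted "airplane" feature
target = stream(w_top, airplane_patch)     # top stream sees "airplane"
loss = l2_loss(pred, target)
print(loss)
```

During training, gradients of `loss` would flow into `w_bottom` and `w_ctx` only; `w_top` just supplies targets.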

2) Network initialization

Both the bottom stream and the top stream are initialized with the weights of a model pretrained on ImageNet. The top stream's parameters are then kept fixed, while the parameters of the bottom stream and the spatial context module are learned.
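This initialization scheme amounts to copying the same pretrained weights into both streams and then updating only some parameter groups. The sketch below shows that bookkeeping with plain Python lists; the weight values, layer names, and the `sgd_step` helper are hypothetical. In a framework such as PyTorch, the same effect is usually achieved by calling `requires_grad_(False)` on the frozen module.

```python
import copy
from typing import Dict, List

Matrix = List[List[float]]
ParamGroup = Dict[str, Matrix]

# Hypothetical stand-in for ImageNet-pretrained weights (layer name -> matrix).
pretrained: ParamGroup = {
    "conv1": [[0.1, -0.2], [0.3, 0.4]],
    "fc": [[0.5, 0.6]],
}

# Both streams start from independent copies of the same pretrained weights.
params: Dict[str, ParamGroup] = {
    "top": copy.deepcopy(pretrained),
    "bottom": copy.deepcopy(pretrained),
    "context": {"fc": [[0.0, 0.0, 0.0]]},  # spatial context module
}
TRAINABLE = {"bottom", "context"}  # the top stream stays frozen


def sgd_step(params: Dict[str, ParamGroup], grads: Dict[str, ParamGroup], lr: float) -> None:
    """Apply w -= lr * g in place, skipping every parameter group that is frozen."""
    for group, layers in grads.items():
        if group not in TRAINABLE:
            continue
        for name, g in layers.items():
            w = params[group][name]
            for i, row in enumerate(g):
                for j, gij in enumerate(row):
                    w[i][j] -= lr * gij


# Pretend every parameter received a gradient of 1.0.
grads = {
    group: {name: [[1.0] * len(row) for row in m] for name, m in layers.items()}
    for group, layers in params.items()
}
sgd_step(params, grads, lr=0.1)
print(params["top"] == pretrained, params["bottom"] == pretrained)
```

Using `copy.deepcopy` matters here: if both streams shared the same lists, updating the bottom stream would silently modify the "frozen" top stream too.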

3) How are "car", "airplane", and other object patches produced?

Ready-made "object-agnostic" region proposal generation methods already exist, and the authors also compare the effects of several such algorithms in the paper.
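Whatever proposal method is used, its output still has to be turned into (source, target) patch pairs for training. The sketch below assumes proposals arrive as boxes with confidence scores; the filtering rule (`top_k`, `min_side`) and the random pair sampling are illustrative choices, not the paper's settings.

```python
import itertools
import random
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def select_proposals(boxes: List[Box], scores: List[float],
                     top_k: int = 10, min_side: float = 32.0) -> List[Box]:
    """Keep up to top_k highest-scoring proposals whose width and height are >= min_side."""
    ranked = sorted(zip(scores, boxes), key=lambda sb: sb[0], reverse=True)
    kept = [b for _, b in ranked
            if b[2] - b[0] >= min_side and b[3] - b[1] >= min_side]
    return kept[:top_k]


def sample_pairs(boxes: List[Box], num_pairs: int,
                 rng: random.Random) -> List[Tuple[Box, Box]]:
    """Sample ordered (source, target) pairs of two different proposals."""
    all_pairs = list(itertools.permutations(boxes, 2))
    return rng.sample(all_pairs, min(num_pairs, len(all_pairs)))


proposals = [(100, 300, 260, 400), (300, 40, 600, 160), (10, 10, 20, 20), (50, 50, 200, 250)]
scores = [0.9, 0.8, 0.95, 0.4]
kept = select_proposals(proposals, scores)
pairs = sample_pairs(kept, num_pairs=4, rng=random.Random(0))
print(kept)
print(pairs)
```

In this toy input the 10 x 10 box is discarded despite having the highest score, because a patch that small is unlikely to cover a whole object.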

    • Summary

1) Once the network structure and parameter initialization are settled, the remaining question is how well the algorithm actually performs. For that part, readers are better off going directly to the original paper.

2) The shortcoming of this paper is that it depends too heavily on the pretrained model; strictly speaking, it does not belong to the category of unsupervised learning. How well would the method work without a pretrained model? That remains unclear.

3) What is ingenious about this paper is that the same reliance on a pretrained model also lets it avoid the trivial solution that unsupervised learning of this kind is prone to.
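Why does a fixed pretrained target rule out a trivial solution? The toy example below contrasts two cases under stated assumptions. In case A, the target features come from a learnable copy that has collapsed to a constant output, so a constant prediction achieves zero loss on every pair without learning anything. In case B, the targets come from a fixed, input-dependent feature function (a hypothetical stand-in for the frozen top stream), so even the best constant prediction, the mean target feature, leaves a nonzero loss.

```python
from typing import List, Tuple

Vector = List[float]


def l2_loss(pred: Vector, target: Vector) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def collapsed_stream(patch: Vector) -> Vector:
    """A learnable stream that has collapsed: it ignores its input entirely."""
    return [0.0, 0.0, 0.0]


def frozen_top(patch: Vector) -> Vector:
    """Hypothetical fixed 'pretrained' features that do depend on the input."""
    return [max(0.0, v) for v in patch[:3]]


patches = [[0.2, 0.9, 0.1, 0.5], [0.7, 0.3, 0.8, 0.0], [0.4, 0.4, 0.6, 0.2]]
pairs: List[Tuple[Vector, Vector]] = [
    (patches[i], patches[j]) for i in range(3) for j in range(3) if i != j
]

# Case A: prediction and target both come from the collapsed learnable stream,
# so they match exactly and the loss is zero: a trivial solution.
loss_a = sum(l2_loss(collapsed_stream(src), collapsed_stream(dst)) for src, dst in pairs)

# Case B: targets come from the frozen stream. The best constant prediction
# is the mean target feature, and it still cannot match every target.
targets = [frozen_top(dst) for _, dst in pairs]
mean = [sum(col) / len(targets) for col in zip(*targets)]
loss_b = sum(l2_loss(mean, t) for t in targets)

print(loss_a, loss_b)
```

To push `loss_b` lower, the predictor must actually use the source patch and the offset, which is exactly the signal the method wants the bottom stream to learn.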
