Paper notes: Weakly-supervised Spatial Context Networks

Last Update: 2017-06-17 Source: Internet

Author: User


Background

In the field of text processing, "the idea of local spatial context within a sentence proved to be an effective supervisory signal for learning distributed word vector representations". The task takes one of two forms: "given a word tokenized corpus of text, to learn a representation for a target word that allows it to predict representations of contextual words around it; or vice versa, given contextual words to predict a representation of the target word."
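The two prediction directions in the quote can be illustrated with a small sketch. It only builds the training examples (no model is trained), and the function names, the `window` parameter, and the sample sentence are assumptions made for illustration, not something taken from the paper.

```python
def context_pairs(tokens, window=2):
    """(target, context) pairs: the target word is used to predict each neighbor."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def context_to_target(tokens, window=2):
    """(context_words, target) examples: the neighbors are used to predict the target."""
    examples = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, target))
    return examples


# Hypothetical one-sentence corpus, already tokenized into words.
sentence = "the car is parked near the airplane".split()
pairs = context_pairs(sentence, window=1)
examples = context_to_target(sentence, window=1)
```

With `window=1` the context of a word is just its left and right neighbors, which is the one-dimensional notion of context discussed next.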

Text is one-dimensional and its basic unit is the word; the context of a word can be obtained by looking at the words to its left and right (this is a natural property of text). Finding such a natural property in the object of study and exploiting it points to a way of removing the need for manually labeled data when learning a highly expressive deep network in image processing (that is, even without manually labeled samples, we may still be able to learn a network with strong representations).

Now the question is: what is the "word" of an image? Pixels, edges, objects, or scenes?

How is the context of an image "word" defined? The neighborhood of a pixel, the spatial layout of objects, or something else?

Different people answer these questions differently, which has led to a series of papers along this line of thinking. These notes focus on the view taken in this paper.

The paper takes the view that the "word" should be an object and its context should be the spatial layout of objects. In the original wording: "by working with patches at the object scale, our network can focus on more object-centric features and potentially ignore some of the texture and color details that are likely less important for semantic tasks". Spatial layout here does not require the objects to be next to each other.

Having understood this premise, we can look at the overall framework of the paper, as shown in Figure 1.

Figure 1 shows that, given the "car" patch and its offset to the "airplane" patch, the network predicts the representation of the "airplane".
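A minimal sketch of what an "offset" between two object patches could look like. The box format `(x_min, y_min, x_max, y_max)`, the center-to-center definition, the normalization by image size, and the example coordinates are all assumptions for illustration; the notes do not say how the paper encodes the offset.

```python
def box_center(box):
    """Center (x, y) of a box given as (x_min, y_min, x_max, y_max) in pixels."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def relative_offset(src_box, dst_box, image_size):
    """Offset from the center of src_box to the center of dst_box,
    divided by image width and height (an assumed normalization)."""
    width, height = image_size
    sx, sy = box_center(src_box)
    dx, dy = box_center(dst_box)
    return ((dx - sx) / width, (dy - sy) / height)


# Hypothetical boxes in a 500 x 400 image.
car = (40, 200, 140, 260)
airplane = (300, 20, 460, 100)
offset = relative_offset(car, airplane, image_size=(500, 400))
```

The two boxes in this example are far apart, in line with the remark that spatial layout does not require the objects to be adjacent.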

Main points

With the basic idea in place, several technical problems need to be solved.

1) How do we know whether the predicted "airplane" representation is good or bad?

Here the author uses a trick. It would be enough to know in advance what a good representation of "airplane" looks like, and obtaining a good representation of an object from an existing network such as VGG or AlexNet is trivial. The features we learn can then be compared against this existing "good representation". Based on this, the author designs the following two-stream network structure.

The top stream and the bottom stream have the same network structure, and the parameters of the top stream are fixed (so that it consistently outputs good feature representations). The offset is fed into the spatial context module together with the bottom stream's output H1, and the module's output is compared with the top stream's output through a loss. With this, no conceptual obstacle remains: at least in principle, the approach can work.
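The data flow described above can be sketched as a single forward pass. This is a toy illustration, not the paper's architecture: each stream is one small fully connected layer, random weights stand in for pretrained ones, patches are short feature vectors, and the squared-error loss is an assumption (the notes only say "loss"). All names and dimensions are hypothetical.

```python
import random


def linear(weights, bias, x):
    """Affine map: weights is a list of rows, x is a list of floats."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def random_layer(rng, n_out, n_in):
    """Small random weights and zero bias; a stand-in for real parameters."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def stream(params, patch):
    """One stream: patch features -> representation (a single layer here)."""
    weights, bias = params
    return relu(linear(weights, bias, patch))


def spatial_context_module(params, h1, offset):
    """Predict the target representation from h1 concatenated with the offset."""
    weights, bias = params
    return linear(weights, bias, h1 + list(offset))


def squared_error(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target))


rng = random.Random(0)
PATCH_DIM, FEAT_DIM = 6, 4
top_params = random_layer(rng, FEAT_DIM, PATCH_DIM)         # kept fixed
bottom_params = random_layer(rng, FEAT_DIM, PATCH_DIM)      # learnable
context_params = random_layer(rng, FEAT_DIM, FEAT_DIM + 2)  # learnable

car_patch = [rng.random() for _ in range(PATCH_DIM)]
airplane_patch = [rng.random() for _ in range(PATCH_DIM)]
offset = (0.58, -0.425)  # hypothetical (dx, dy) from car to airplane

h1 = stream(bottom_params, car_patch)       # bottom stream output
h2 = stream(top_params, airplane_patch)     # top stream output, the "good" target
h2_pred = spatial_context_module(context_params, h1, offset)
loss = squared_error(h2_pred, h2)
```

Training would minimize `loss` with respect to `bottom_params` and `context_params` only, leaving `top_params` untouched.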

2) Network initialization

Both the bottom stream and the top stream are initialized with the weights of a model pretrained on ImageNet. The parameters of the top stream are fixed, while the parameters of the bottom stream and the spatial context module are learned.
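The initialization and freezing scheme can be sketched with plain dictionaries. The weight values, layer names, dummy gradients, and the zero initialization of the context module are hypothetical; the point is only that both streams start from the same pretrained weights and that the update step skips the top stream.

```python
import copy

# Hypothetical pretrained weights; in practice these would come from an ImageNet model.
pretrained = {"conv": [0.2, -0.1, 0.4], "fc": [0.05, 0.3]}

params = {
    "top": copy.deepcopy(pretrained),     # fixed stream that provides the targets
    "bottom": copy.deepcopy(pretrained),  # same starting point, but fine-tuned
    "context": {"fc": [0.0, 0.0]},        # spatial context module (initial values assumed)
}
trainable = {"top": False, "bottom": True, "context": True}


def sgd_step(params, grads, trainable, lr=0.1):
    """Apply one SGD update, skipping every module marked as not trainable."""
    for module, layers in params.items():
        if not trainable[module]:
            continue
        for name, values in layers.items():
            layers[name] = [v - lr * g for v, g in zip(values, grads[module][name])]


# Dummy gradients of all ones, used only to show which modules move.
grads = {m: {n: [1.0] * len(v) for n, v in layers.items()} for m, layers in params.items()}
sgd_step(params, grads, trainable)
```

After the step, `params["top"]` still equals `pretrained`, while the bottom stream and the context module have changed.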

3) How are objects such as "car" and "airplane" produced?

Off-the-shelf "object agnostic" region proposal methods are available; the paper also compares the effect of several such algorithms.

Summary

1) Once the network structure and the parameter initialization are settled, the remaining question is how well the algorithm actually works. For that part, it is better to read the original paper directly.

2) The weakness of this paper is that it depends too heavily on the pretrained model, although strictly speaking the method is supposed to belong to the unsupervised category. Without a pretrained model, it is unclear how well the approach would work.

3) The clever part of this paper is also its reliance on the pretrained model, which lets it avoid the trivial solutions of unsupervised learning.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
