Squeeze-and-Excitation Networks

Paper: Squeeze-and-Excitation Networks, Hu et al. (arXiv:1709.01507)

In recent years, convolutional neural networks have made great breakthroughs in many fields. The convolution kernel, as the core of a convolutional neural network, is usually regarded as an aggregator of spatial information and channel-wise feature information over a local receptive field. A convolutional neural network consists of a series of convolution layers, non-linear layers, and downsampling layers, which allows it to capture image features over an increasingly global receptive field and use them to describe the image.

However, learning a very strong network is difficult, and the difficulties come from many directions. Recently, much work has been proposed to improve network performance at the level of the spatial dimension: for example, the Inception structure embeds multi-scale information, aggregating features from a variety of different receptive fields to obtain performance gains; the Inside-Outside Network considers contextual information in the spatial domain; and attention mechanisms have been introduced into the spatial dimension. All of this work has achieved quite good results.

As can be seen, there has already been a lot of work on the spatial dimension to improve network performance. It is natural to ask whether performance can also be improved from other aspects, such as the relationship between feature channels. The authors' work is based on exactly this point and proposes Squeeze-and-Excitation Networks (SENet for short). In the proposed structure, squeeze and excitation are the two key operations, hence the name. The authors' motivation is to explicitly model the interdependencies between feature channels. In addition, they do not introduce a new spatial dimension for fusing feature channels; instead, they adopt a new "feature recalibration" strategy. Specifically, the network learns to automatically obtain the importance of each feature channel, and then uses this importance to enhance useful features and suppress features that are of little use to the current task.

The figure above shows the SE block proposed by the authors. Given an input x with C1 feature channels, a series of convolutions and other general transformations produces features with C2 channels. Unlike a traditional CNN, these features are then recalibrated through three operations.

First comes the squeeze operation: features are compressed along the spatial dimensions, turning each two-dimensional feature map into a single real number. This real number has, to some extent, a global receptive field, and the output dimension matches the number of input feature channels. It represents the global distribution of responses over the feature channels, and it allows layers close to the input to obtain a global receptive field, which is very useful in many tasks.
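As a concrete illustration (a minimal PyTorch sketch, not the authors' code), the squeeze step can be written as a global average pool, the choice used later in this article, that collapses each channel's spatial map into one number:

```python
import torch
import torch.nn.functional as F

# x: a feature map of shape (N, C, H, W)
x = torch.randn(8, 64, 32, 32)

# Squeeze: global average pooling collapses each H x W map into one real number,
# giving a per-channel descriptor z of shape (N, C).
z = F.adaptive_avg_pool2d(x, 1).flatten(1)
print(z.shape)  # torch.Size([8, 64])
```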

Second comes the excitation operation, a mechanism similar to the gates in recurrent neural networks. A set of parameters is used to generate a weight for each feature channel, where these parameters are learned to explicitly model the correlations between feature channels.
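Continuing the same hypothetical sketch, the excitation step can be modelled as a small gating sub-network that maps the channel descriptor to one weight per channel (the reduction ratio of 16 follows the description later in this article):

```python
import torch
import torch.nn as nn

C, r = 64, 16  # number of channels and reduction ratio (16 is used later in the text)

# Excitation: a small gating sub-network turns the channel descriptor into
# one weight per channel, modelling inter-channel dependencies.
excite = nn.Sequential(
    nn.Linear(C, C // r),    # bottleneck: reduce dimensionality
    nn.ReLU(inplace=True),
    nn.Linear(C // r, C),    # restore one value per channel
    nn.Sigmoid(),            # gate: squash the weights into (0, 1)
)

z = torch.randn(8, C)        # channel descriptor from the squeeze step
s = excite(z)                # per-channel weights, shape (8, C)
```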

Finally comes the reweight operation: we regard the weights output by the excitation step as the importance of each feature channel after feature selection, and then multiply them channel-wise onto the previous features, completing the recalibration of the original features along the channel dimension.
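In the same sketch, the reweight step is simply a broadcast multiplication of each channel by its weight:

```python
import torch

# x: original feature map (N, C, H, W); s: per-channel weights (N, C) from excitation
x = torch.randn(8, 64, 32, 32)
s = torch.rand(8, 64)

# Reweight (scale): broadcast-multiply each channel of x by its weight.
y = x * s.view(8, 64, 1, 1)
print(y.shape)  # torch.Size([8, 64, 32, 32])
```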

The image on the left is an example of embedding an SE module into the Inception structure. The dimension information next to each box represents the output of that transformation. Here, global average pooling is used as the squeeze operation. Immediately afterwards, two fully connected layers form a bottleneck structure to model the correlations between channels, and they output the same number of weights as there are input feature channels. The feature dimension is first reduced to 1/16 of the input, passed through a ReLU activation, and then raised back to the original dimension through another fully connected layer. Compared with using a single fully connected layer directly, this has two advantages: (1) it has more nonlinearity, so it can better fit the complex correlations between channels; (2) it greatly reduces the number of parameters and the amount of computation. A sigmoid gate then produces weights between 0 and 1, and finally a scale operation applies the normalized weights to the features of each channel.
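Putting the three steps together, a possible SE block as described in this paragraph might look like the following (again a PyTorch sketch under the assumptions above, not the authors' implementation; the class name SEBlock and the default reduction of 16 mirror the description in the text):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Recalibrates the channels of an (N, C, H, W) feature map, as described above."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                  # squeeze: global average pooling
        self.fc = nn.Sequential(                             # excitation: bottleneck FC layers
            nn.Linear(channels, channels // reduction),      # reduce to 1/16 of the input
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),      # restore the original dimension
            nn.Sigmoid(),                                    # weights between 0 and 1
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        s = self.fc(self.pool(x).view(n, c))                 # per-channel weights
        return x * s.view(n, c, 1, 1)                        # scale: reweight each channel


# Usage: recalibrate a 64-channel feature map; the output shape is unchanged.
se = SEBlock(64)
out = se(torch.randn(8, 64, 32, 32))
```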

In addition, the SE module can be embedded into modules that contain skip connections. The image on the right is an example of embedding SE into a ResNet module; the process is basically the same as for SE-Inception, except that the features on the residual branch are recalibrated before the addition. If the features on the main (identity) branch of the addition were recalibrated instead, then, because of the 0-to-1 scale operation on the trunk, gradients would easily vanish near the input layer during backpropagation in deeper networks, making the model difficult to optimize.
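A hedged sketch of this idea, using a simplified, hypothetical ResNet-style block (not the paper's exact SE-ResNet module): the residual branch is recalibrated before the addition, while the identity path is left unscaled.

```python
import torch
import torch.nn as nn

class SEBasicBlock(nn.Module):
    """Simplified ResNet-style block with SE recalibration on the residual branch."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)
        # SE sub-network (squeeze + excitation), applied to the residual branch only.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c = x.shape[0], x.shape[1]
        residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
        s = self.fc(self.pool(residual).view(n, c))
        residual = residual * s.view(n, c, 1, 1)   # recalibrate the residual branch...
        return self.relu(x + residual)             # ...before adding the identity path


# Usage: the identity path is never scaled, which preserves gradient flow in deep networks.
block = SEBasicBlock(64)
out = block(torch.randn(2, 64, 32, 32))  # shape unchanged
```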

Most mainstream networks today are built by repeatedly stacking these two kinds of similar units. This means the SE module can be embedded into almost all current network structures. By embedding the SE module into the building-block units of the original network structure, we can obtain different kinds of SENets, such as SE-BN-Inception, SE-ResNet, and so on.

As can be seen from the above introduction, the SENet structure is very simple.

Source: http://www.sohu.com/a/161793789_642762
