Paper Notes: Squeeze-and-Excitation Networks


To enhance a network's representational power, much existing work focuses on strengthening the spatial encoding. In this paper, the authors instead focus on channel-wise information and propose the "Squeeze-and-Excitation" (SE) block, which explicitly makes the network attend to the relationships between channels ("adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels"). SENets took first place in ILSVRC 2017 classification with a top-5 error of 2.251%.

Some previous architectural designs focus on spatial dependencies:
Inception architectures: embedding multi-scale processing in its modules
ResNet, Stacked Hourglass
Spatial attention: Spatial Transformer Networks

The author's design ideas:
We investigate a different aspect of architectural design: the channel relationship.

Our goal is to improve the representational power of a network by explicitly modelling the interdependencies between the channels of its convolutional features. To achieve this, we propose a mechanism that allows the network to perform feature recalibration, through which it can learn to use global information to selectively emphasise informative features and suppress less useful ones.
The authors want to recalibrate the convolutional features, which, as I understand it, means weighting the channels.

Related work
Network structure:
VGGNets, Inception models, BN, ResNet, DenseNet, Dual Path Networks
Other approaches: grouped convolutions, multi-branch convolutions, cross-channel correlations
This approach reflects an assumption that channel relationships can be formulated as a composition of instance-agnostic functions with local receptive fields.

Attention, gating mechanisms

SE Block

Let \(F_{tr}: X \in {\mathbb R}^{W' \times H' \times C'} \rightarrow U \in {\mathbb R}^{W \times H \times C}\) be a transformation (e.g. a convolution).
Let \(V = [v_1, v_2, \ldots, v_C]\) denote the learned set of filter kernels, where \(v_c\) is the parameters of the c-th filter; the output is \(U = [u_1, u_2, \ldots, u_C]\) with:
\[ u_c = v_c * X = \sum\limits_{s = 1}^{C'} v_c^s * x^s \]
Here \(v_c^s\) is a 2D spatial kernel, i.e. a single channel of \(v_c\): each newly generated channel is the sum of all input channels convolved with their corresponding filter kernels. The relationships between channels are therefore implicitly embedded in \(v_c\), but this information is entangled with the spatial correlations captured by the filters. The authors' goal is to make the network pay more attention to the informative channels, which is accomplished in two steps: squeeze and excitation.
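As a concrete illustration (my own, not from the paper), take \(C' = 2\) input channels; the c-th output channel is then
\[ u_c = v_c^1 * x^1 + v_c^2 * x^2, \]
so every output channel sums contributions from all input channels. A plain convolution therefore mixes channels, but it has no mechanism to reweight them using context from outside its local receptive field, which is exactly what the SE block adds.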
Squeeze
The problem with existing networks: since convolution operates on a local receptive field, each unit of the transformation output can only exploit the spatial information inside that field.
To alleviate this problem, the squeeze operation is proposed: it encodes the global spatial information into a channel descriptor, specifically via global average pooling.

\[ z_c = F_{sq}(u_c) = {1 \over {W \times H}}\sum\limits_{i = 1}^{W} \sum\limits_{j = 1}^{H} u_c(i, j) \]

That is, the mean value of each channel is taken as its global descriptor.
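As a minimal sketch of the squeeze step (my own NumPy illustration, not the authors' code; the variable names are mine):

    import numpy as np

    def squeeze(u):
        """Global average pooling: (C, H, W) feature map -> (C,) channel descriptor z."""
        # z_c = (1 / (W * H)) * sum_{i,j} u_c(i, j)
        return u.mean(axis=(1, 2))

    u = np.random.randn(64, 7, 7)   # e.g. C = 64 channels on a 7x7 spatial map
    z = squeeze(u)                  # shape (64,)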
Excitation: adaptive recalibration
To exploit the information aggregated by the squeeze operation, a second operation is proposed. It must meet two requirements: first, it must be flexible enough to learn a nonlinear interaction between channels; second, it must be able to learn a non-mutually-exclusive relationship, which I understand to mean that multiple channels are allowed to be emphasised at the same time rather than forcing a single one to win.
\[ s = F_{ex}(z, W) = \sigma(g(z, W)) = \sigma(W_2 \, \delta(W_1 z)) \]
\(\delta\) is the ReLU, \(W_1 \in {\mathbb R}^{{C \over r} \times C}\), \(W_2 \in {\mathbb R}^{C \times {C \over r}}\). \(W_1\) is a bottleneck that reduces the number of channels and \(W_2\) restores it; the reduction ratio \(r\) is set to 16. Finally \(U\) is rescaled with \(s\), which in effect is a channel weighting. This gives the output of the block.
\[ \tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c \]
\(F_{scale}\) denotes the channel-wise multiplication between the feature map \(u_c \in {\mathbb R}^{W \times H}\) and the scalar \(s_c\).
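Building on the squeeze sketch above, a minimal NumPy illustration of the excitation and scaling steps (my own sketch; the weight shapes follow the definitions above, with reduction ratio r = 16):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def excitation(z, w1, w2):
        """z: (C,) channel descriptor; w1: (C//r, C); w2: (C, C//r). Returns weights s in (0, 1)."""
        # s = sigma(W2 * relu(W1 * z))
        return sigmoid(w2 @ np.maximum(0.0, w1 @ z))

    def scale(u, s):
        """Channel-wise rescaling: x~_c = s_c * u_c."""
        return u * s[:, None, None]

    C, r = 64, 16
    u = np.random.randn(C, 7, 7)
    w1 = 0.1 * np.random.randn(C // r, C)
    w2 = 0.1 * np.random.randn(C, C // r)
    x_tilde = scale(u, excitation(u.mean(axis=(1, 2)), w1, w2))   # same shape as u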

The activations act as channel weights adapted to the input-specific descriptor \(z\). In this regard, SE blocks intrinsically introduce dynamics conditioned on the input, helping to boost feature discriminability.

    1. Example

      The SE block can easily be added to other network structures; see the integration sketch after this list.
    2. MXNet code

        # Squeeze: global average pooling over the spatial map
        squeeze = mx.sym.Pooling(data=bn3, global_pool=True, kernel=(7, 7), pool_type='avg', name=name + '_squeeze')
        squeeze = mx.symbol.Flatten(data=squeeze, name=name + '_flatten')
        # Excitation: FC (bottleneck) -> ReLU -> FC -> sigmoid
        excitation = mx.symbol.FullyConnected(data=squeeze, num_hidden=int(num_filter * ratio), name=name + '_excitation1')  # bottleneck
        excitation = mx.sym.Activation(data=excitation, act_type='relu', name=name + '_excitation1_relu')
        excitation = mx.symbol.FullyConnected(data=excitation, num_hidden=num_filter, name=name + '_excitation2')
        excitation = mx.sym.Activation(data=excitation, act_type='sigmoid', name=name + '_excitation2_sigmoid')
        # Scale: channel-wise reweighting of the feature map
        bn3 = mx.symbol.broadcast_mul(bn3, mx.symbol.reshape(data=excitation, shape=(-1, num_filter, 1, 1)))
    3. Network structure

    4. Experiments
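For item 1 above, here is a sketch of one integration pattern (my own NumPy illustration, assuming the SE-ResNet-style placement described in the paper: the SE block recalibrates the residual branch before it is added to the identity):

    import numpy as np

    def se_block(u, w1, w2):
        """Squeeze, excite and scale a (C, H, W) feature map; w1: (C//r, C), w2: (C, C//r)."""
        z = u.mean(axis=(1, 2))                                     # squeeze
        s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(0.0, w1 @ z))))   # excitation
        return u * s[:, None, None]                                 # scale

    def se_residual_unit(x, residual_fn, w1, w2):
        """SE-ResNet-style unit: recalibrate the residual branch, then add the identity."""
        return x + se_block(residual_fn(x), w1, w2)

    # Toy usage with a placeholder residual branch that preserves the shape.
    C, r = 64, 16
    x = np.random.randn(C, 7, 7)
    w1 = 0.1 * np.random.randn(C // r, C)
    w2 = 0.1 * np.random.randn(C, C // r)
    out = se_residual_unit(x, lambda t: 0.5 * t, w1, w2)

In a real network the residual branch would be the usual stack of convolutions and the weights would be learned; here they are random placeholders used only to show where the SE block sits.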

Reference documents:
[1] Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-Excitation Networks." arXiv preprint arXiv:1709.01507 (2017).

Welcome to follow the WeChat public account Vision_home to study together; papers and resources are shared from time to time.
