Highway Networks in PyTorch

Source: Internet
Author: User

Introduction

This article discusses why deep neural networks are difficult to train, how Highway Networks address that difficulty, and how to implement Highway Networks in PyTorch.

I. The Relationship Between Highway Networks and Deep Networks

Deep neural networks generally outperform shallow ones and have achieved very good results in many areas, with especially large breakthroughs in image processing. As depth increases, however, deep networks become harder to train; the well-known vanishing gradient problem is one example. Highway Networks, a network structure proposed in 2015 by Rupesh Kumar Srivastava and inspired by the LSTM gate mechanism, address this training difficulty well. They let information flow through the layers of a deep network largely unimpeded, which effectively alleviates the gradient problem, so that a deep network no longer performs merely like a shallow one.

II. The Vanishing and Exploding Gradient Problem in Deep Networks

Consider a simple deep neural network with only a few hidden layers.

First, write out the formula for each layer.

Then take the derivative with respect to W1 and update the weights, where lr is the learning rate and g(t) is the gradient:

W = W - lr * g(t)

The formula above covers the case of only four hidden layers. When the number of hidden layers reaches dozens or even hundreds, the gradient picks up one more factor at every layer it passes through during backpropagation. When the weights are < 1, the product is close to zero after enough layers; for example, g(t) = 0.9^100 ≈ 2.7e-5 is already very small. As a result, only the layers that backpropagation reaches first are trained normally, while the weights of the remaining hidden layers are barely updated, and those layers act as little more than a fixed mapping of the input x. Conversely, when the weights are > 1, the gradient explodes: again only the layers reached first learn normally, while the updates to the remaining hidden layers become very large.
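
The arithmetic behind this argument can be checked with a short sketch. It assumes, purely for illustration, that every layer contributes the same constant factor to the backpropagated gradient (0.9 for the vanishing case used in the text, and a hypothetical 1.1 for the exploding case); in a real network the factor differs from layer to layer.

```python
def gradient_scale(per_layer_factor, num_layers):
    """Product of identical per-layer factors accumulated over num_layers layers."""
    scale = 1.0
    for _ in range(num_layers):
        scale *= per_layer_factor
    return scale


vanishing = gradient_scale(0.9, 100)  # factor < 1: the product shrinks toward 0
exploding = gradient_scale(1.1, 100)  # factor > 1: the product grows very large

print(f"0.9^100 = {vanishing:.3e}")
print(f"1.1^100 = {exploding:.3e}")
```

With these assumed factors the first product is on the order of 1e-5 and the second on the order of 1e4, which is why weights far from the output stop receiving useful updates in one case and receive unstable updates in the other.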

III. Highway Networks Formula
    • Notation

      (.) denotes element-wise multiplication of matrices

      sigmoid function: sigmoid(x) = 1 / (1 + e^(-x))

    • Highway Networks Formula

      In a plain neural network, the input x is transformed into the output y by a nonlinear transform H; equation (1) omits the bias. H is not limited to an activation function and can take other forms, such as a convolutional or recurrent layer.

      y = H(x, W_H)    (1)

      Highway Networks add two nonlinear transform layers: T (the transform gate) and C (the carry gate). In plain terms, T determines how much of the transformed (convolutional or recurrent) output is passed on, and C determines how much of the original input x is carried through unchanged, where T = sigmoid(Wx + b).

      y = H(x, W_H) (.) T(x, W_T) + x (.) C(x, W_C)    (2)

      For computational convenience, C = 1 - T is used here.

      y = H(x, W_H) (.) T(x, W_T) + x (.) (1 - T(x, W_T))    (3)

      Note that the dimensions of x, y, H, and T must be the same. To make them consistent, you can use sub-sampling or zero-padding, or use a plain linear layer to change the dimension.

      Comparing the formulas, equation (3) is more flexible than equation (1). Consider two special cases: when T = 0, y = x, so the original input is carried through unchanged; when T = 1, y = H, so the input is fully transformed and none of it is preserved, which is equivalent to a plain neural network.
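
The two special cases can be verified numerically with a minimal scalar sketch of y = H * T + x * (1 - T). The function name highway_output and the use of tanh as a stand-in for H are assumptions made only for this illustration; they do not come from the original article.

```python
import math


def sigmoid(z):
    """Logistic function used for the transform gate."""
    return 1.0 / (1.0 + math.exp(-z))


def highway_output(x, h, t):
    """y = h * t + x * (1 - t) for scalar x, h and a gate value t in [0, 1]."""
    return h * t + x * (1.0 - t)


x = 2.0           # original input
h = math.tanh(x)  # assumed stand-in for the transformed value H(x)

y_carry = highway_output(x, h, t=0.0)            # T = 0: the input is carried unchanged
y_transform = highway_output(x, h, t=1.0)        # T = 1: behaves like a plain layer
y_mixed = highway_output(x, h, t=sigmoid(-1.0))  # gate mostly closed: y stays near x

print(y_carry, y_transform, y_mixed)
```

For gate values strictly between 0 and 1 the output is a weighted blend of x and H(x), which is what lets a highway layer fall back to passing its input through when the transform is not useful.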

IV. Highway BiLSTM Networks
  • Highway BiLSTM Networks Structure Diagram
    The labels in the Highway BiLSTM Networks structure diagram are:
    Input: the word vectors of the input
    B: the bidirectional LSTM in this task, which plays the role of H in formula (2)
    T: the T in formula (2), the transform gate of the highway network
    C: the C in formula (2), the carry gate of the highway network
    Layer = n: the n-th layer of the highway network
    Highway: each box represents one Highway Networks layer
    In this structure, the output of layer n-1 of the highway network is the input of layer n

  • Highway BiLSTM Networks Demo
    Building a neural network in PyTorch generally means subclassing nn.Module and implementing its forward() function. Highway BiLSTM Networks are built here with two classes, connected through nn.ModuleList:


    import torch
    import torch.nn as nn

    class HBiLSTM(nn.Module):
        def __init__(self, args):
            super(HBiLSTM, self).__init__()
            ......
        def forward(self, x, hidden):
            # Implementation of the Highway BiLSTM Networks formula
            ......

    class HBiLSTM_model(nn.Module):
        def __init__(self, args):
            super(HBiLSTM_model, self).__init__()
            ...
            # args.layer_num_highway is the number of Highway BiLSTM Networks layers
            self.highway = nn.ModuleList([HBiLSTM(args) for _ in range(args.layer_num_highway)])
            ...
        def forward(self, x):
            ...
            # Call the forward() function of each HBiLSTM layer in turn
            for current_layer in self.highway:
                x, self.hidden = current_layer(x, self.hidden)

    The Highway BiLSTM Networks formula is implemented in the forward() function of the HBiLSTM class.
    First, compute H. As mentioned above, H can be a convolution or an LSTM; here normal_fc is the H we need.


    x, hidden = self.bilstm(x, hidden)
    # torch.transpose swaps the two given dimensions
    normal_fc = torch.transpose(x, 0, 1)

    As mentioned above, the dimensions of x, y, H, and T must be consistent, and two strategies were given. Here a plain Linear layer is used to convert the dimension:


    source_x = source_x.contiguous()
    information_source = source_x.view(source_x.size(0) * source_x.size(1), source_x.size(2))
    information_source = self.gate_layer(information_source)
    information_source = information_source.view(source_x.size(0), source_x.size(1), information_source.size(1))

    Alternatively, zero-padding can be used to make the dimensions consistent:

    # you can also choose the zero-padding strategy
    zeros = torch.zeros(source_x.size(0), source_x.size(1), carry_layer.size(2) - source_x.size(2))
    # torch.cat takes a sequence of tensors followed by the dimension to join along
    source_x = torch.cat((zeros, source_x), 2)
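
The zero-padding strategy can be tried in isolation with the sketch below. The helper name zero_pad_last_dim and the tensor sizes are hypothetical and chosen only for illustration; the zeros are placed before the original features, following the argument order in the snippet above.

```python
import torch


def zero_pad_last_dim(source_x, target_size):
    """Pad the last dimension of a 3-D tensor with leading zeros up to target_size."""
    missing = target_size - source_x.size(2)
    if missing < 0:
        raise ValueError("target_size must be >= source_x.size(2)")
    zeros = torch.zeros(
        source_x.size(0), source_x.size(1), missing,
        dtype=source_x.dtype, device=source_x.device,
    )
    # Concatenate along dimension 2 so only the feature size changes.
    return torch.cat((zeros, source_x), 2)


source_x = torch.ones(5, 2, 3)           # assumed layout: (seq_len, batch, features)
padded = zero_pad_last_dim(source_x, 8)  # feature size grows from 3 to 8
print(padded.shape)
```

Padding adds no parameters, whereas the Linear strategy learns a projection; which one works better for a given task is not something this sketch can answer.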


    Once the dimensions are consistent, the code follows the formula directly:

    The transform gate layer is T in the formula

    transformation_layer = torch.sigmoid(information_source)

    The carry gate layer is C in the formula

    carry_layer = 1 - transformation_layer

    Formula: y = H * T + x * C

    allow_transformation = torch.mul(normal_fc, transformation_layer)
    allow_carry = torch.mul(information_source, carry_layer)
    information_flow = torch.add(allow_transformation, allow_carry)


    The final information_flow is our output; however, its dimensions still need to be converted to keep them consistent.
    For more information, please refer to GitHub: Highway Networks implement in PyTorch
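
The fragments in this section can be put together as a small runnable sketch. It is not the author's implementation: the class names HighwayBiLSTMLayer and HighwayBiLSTM, the project and gate layers, and all sizes are hypothetical. Unlike the fragments above, it lets nn.LSTM create its own zero initial hidden state and uses separate Linear layers for the gate and for matching the dimension of x.

```python
import torch
import torch.nn as nn


class HighwayBiLSTMLayer(nn.Module):
    """One highway layer whose transform H is a bidirectional LSTM (illustrative)."""

    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.bilstm = nn.LSTM(input_dim, hidden_dim, bidirectional=True)
        # Plain linear layer mapping x to the BiLSTM output size (2 * hidden_dim).
        self.project = nn.Linear(input_dim, 2 * hidden_dim)
        # Gate layer producing T = sigmoid(W x + b).
        self.gate = nn.Linear(input_dim, 2 * hidden_dim)

    def forward(self, x):
        h, _ = self.bilstm(x)             # H: (seq_len, batch, 2 * hidden_dim)
        t = torch.sigmoid(self.gate(x))   # transform gate T
        carried = self.project(x)         # x brought to the same dimension as H
        return h * t + carried * (1 - t)  # y = H * T + x * C, with C = 1 - T


class HighwayBiLSTM(nn.Module):
    """Stack of highway layers; each layer's output feeds the next layer."""

    def __init__(self, input_dim, hidden_dim, num_layers):
        super().__init__()
        dims = [input_dim] + [2 * hidden_dim] * (num_layers - 1)
        self.layers = nn.ModuleList(
            [HighwayBiLSTMLayer(d, hidden_dim) for d in dims]
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


torch.manual_seed(0)
model = HighwayBiLSTM(input_dim=10, hidden_dim=8, num_layers=3)
x = torch.randn(7, 4, 10)  # assumed layout: (seq_len, batch, input_dim)
y = model(x)
print(y.shape)
```

Registering the layers through nn.ModuleList, as the article does, is what makes their parameters visible to the optimizer; a plain Python list would not register them.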

V. Highway BiLSTM Networks Experimental Results

The task in this experiment is sentiment classification with Highway BiLSTM Networks: each sentence is classified as positive or negative. The data comes from a Twitter sentiment classification dataset. The number of sentences for each label in the dataset is as follows:

The figure shows the test results for this task on the 2-class dataset. In the figure, 1-300 means that layer = 1 and the BiLSTM in Highway BiLSTM Networks has 300 dimensions.

Experimental results: simply stacking bidirectional LSTM layers does not improve sentiment analysis performance; beyond 10 layers in particular, the results are worse than random guessing. With Highway Networks, performance still declines gradually as layers are added, but the decline is considerably smaller.

References
    • Highway Networks (paper)

    • Training Very Deep Networks

    • Why deep neural networks are hard to train

    • Training Very Deep Networks -- Highway Networks

    • Very deep learning with Highway Networks

    • Highway Networks Study Notes
