Course 4 - Convolutional Neural Networks - Week 2, Assignment 2 (gesture classification based on a residual network)

0-Background

This post introduces deep convolutional neural networks built from Residual Networks (ResNets).
In theory, the more layers a neural network has, the more complex the functions it can represent. A CNN extracts low-, mid-, and high-level features: the more layers the network has, the richer the features it can extract at different levels, and the deeper the layer, the more abstract its features and the more semantic information they carry.
In practice, however, deep neural networks are hard to train. Simply increasing the number of layers leads to vanishing or exploding gradients (both refer to the backpropagation process), which makes gradient descent slower and slower.
Accuracy on the training set first rises, then saturates, and may even decline.
As the figure below shows, as the number of iterations grows, the gradient magnitude of the earlier layers quickly drops to nearly zero.

Figure 1: Vanishing gradient. The learning speed of the early layers decreases very rapidly as the network trains.
The idea of the residual is to subtract out the shared main part of the signal so that small changes stand out, somewhat like a differential amplifier.

1-Import dependencies

import numpy as np
import tensorflow as tf
from keras import layers
from keras.layers import Input, Add, Dense, Activation, ZeroPadding2D, BatchNormalization, Flatten, Conv2D, AveragePooling2D, MaxPooling2D, GlobalMaxPooling2D
from keras.models import Model, load_model
from keras.preprocessing import image
from keras.utils import layer_utils
from keras.utils.data_utils import get_file
from keras.applications.imagenet_utils import preprocess_input
import pydot
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot
from keras.utils import plot_model
from resnets_utils import *
from keras.initializers import glorot_uniform
import scipy.misc
from matplotlib.pyplot import imshow
%matplotlib inline

import keras.backend as K
K.set_image_data_format('channels_last')
K.set_learning_phase(1)
2-Create residual network

In a residual network, during forward propagation a later layer can take the output of an earlier layer directly as part of the input to its activation function; during backpropagation, the gradient of a later layer can be passed directly back to the earlier layer through this skip connection.

A deep network model can be built by stacking residual modules.
Because of the residual module, it is easy for the model to learn the identity function, so adding such a module has little negative impact on training performance. A minimal sketch of a single skip connection is shown below.
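As an illustration of the idea (not the assignment's graded code; the input shape and filter counts below are arbitrary assumptions), a single skip connection in the Keras functional API might look like this:

from keras.layers import Input, Conv2D, BatchNormalization, Activation, Add

# Illustrative sketch of one residual (skip) connection; the 32x32x64 input
# shape and the 64 filters are arbitrary assumptions.
X_input = Input(shape=(32, 32, 64))
X_shortcut = X_input                              # shortcut path: the input, untouched

X = Conv2D(64, (3, 3), padding='same')(X_input)   # main path, layer 1
X = BatchNormalization(axis=3)(X)
X = Activation('relu')(X)
X = Conv2D(64, (3, 3), padding='same')(X)         # main path, layer 2
X = BatchNormalization(axis=3)(X)

X = Add()([X, X_shortcut])                        # skip connection: add the input back in
X = Activation('relu')(X)                         # ReLU is applied after the addition

Because X_shortcut is added back unchanged, the two convolution layers only need to learn the residual (the difference from the identity function), which is what makes deep stacks of such modules easier to train.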
Depending on whether the input and output dimensions are the same, residual modules can be divided into two types: identity blocks and convolutional blocks.
2.1-Identity Block

The first type is used when the input and output dimensions are the same, that is, the dimension of the input a^{[l]} equals the dimension of the output a^{[l+2]}:

The arc above is called the shortcut path, and the path below is called the main path.
Note that the two are added together before the ReLU of the next layer.
One purpose of the BatchNorm layers is to speed up training.
The block above skips over 2 layers; the block below skips over 3 layers:

First component of the main path: the first convolution layer Conv2D has F_1 filters of size (1,1) with stride (1,1) and padding set to "valid", and is named conv_name_base + '2a'; seed=0 is used for the random initialization of its parameters. The first BatchNorm normalizes along the channel axis and is named bn_name_base + '2a'. The ReLU activation function needs no name and has no hyperparameters.

Second component of the main path: the second convolution layer Conv2D has F_2 filters of shape (f, f) with stride (1,1) and padding set to "same", and is named conv_name_base + '2b'; seed=0 is used for the random initialization of its parameters. The second BatchNorm normalizes along the channel axis and is named bn_name_base + '2b'. The ReLU activation function needs no name and has no hyperparameters.

Third component of the main path: the third convolution layer Conv2D has F_3 filters of shape (1,1) with stride (1,1) and padding set to "valid", and is named conv_name_base + '2c'; again seed=0 is used for the random initialization of its parameters. The third BatchNorm normalizes along the channel axis and is named bn_name_base + '2c'. Note that there is no ReLU activation function after this component.

Finally: the shortcut and the main-path output are summed, and the result of the summation is passed into a ReLU activation. This activation function is likewise unnamed and has no hyperparameters.

Specifically implemented as follows:

# GRADED FUNCTION: identity_block

def identity_block(X, f, filters, stage, block):
    """
    Implementation of the identity block as defined in Figure 4

    Arguments:
    X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
    f -- integer, specifying the shape of the middle CONV's window for the main path
    filters -- python list of integers, defining the number of filters in the CONV layers of the main path
    stage -- integer, used to name the layers, depending on their position in the network
    block -- string/character, used to name the layers, depending on their position in the network

    Returns:
    X -- output of the identity block, tensor of shape (n_H, n_W, n_C)
    """

    # Defining name basis
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # Retrieve Filters
    F1, F2, F3 = filters

    # Save the input value. You'll need this later to add back to the main path.
    X_shortcut = X

    # First component of main path
    X = Conv2D(filters=F1, kernel_size=(1, 1), strides=(1, 1), padding='valid',
               name=conv_name_base + '2a', kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2a')(X)
    X = Activation('relu')(X)

    ### START CODE HERE ###
    # Second component of main path (≈3 lines)
    X = Conv2D(filters=F2, kernel_size=(f, f), strides=(1, 1), padding='same',
               name=conv_name_base + '2b', kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2b')(X)
    X = Activation('relu')(X)

    # Third component of main path (≈2 lines)
    X = Conv2D(filters=F3, kernel_size=(1, 1), strides=(1, 1), padding='valid',
               name=conv_name_base + '2c', kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2c')(X)

    # Final step: add shortcut value to main path, and pass it through a RELU activation (≈2 lines)
    X = layers.add([X, X_shortcut])
    X = Activation('relu')(X)
    ### END CODE HERE ###

    return X

Identity Module Test:

tf.reset_default_graph()

with tf.Session() as test:
    np.random.seed(1)
    A_prev = tf.placeholder("float", [3, 4, 4, 6])
    X = np.random.randn(3, 4, 4, 6)
    A = identity_block(A_prev, f=2, filters=[2, 4, 6], stage=1, block='a')
    test.run(tf.global_variables_initializer())
    out = test.run([A], feed_dict={A_prev: X, K.learning_phase(): 0})
    print("out = " + str(out[0][1][1][0]))

The output results are as follows:

out = [0.94822985  0.          1.16101444  2.747859    0.          1.36677003]
2.2-Convolutional Block (convolution module)

The convolutional block is the other type of residual module besides the identity block. It is used when the input and output dimensions do not match. The convolutional block differs from the identity block in that the shortcut path contains an extra convolution layer.

The Conv2D on the shortcut path reshapes the input x to a dimension that can be added to the main-path output. For example, to halve both the height and the width, we can use a 1x1 convolution with a stride of 2. The Conv2D layer on the shortcut path does not use any non-linear activation function, because its job is simply to learn a linear function that reduces the dimension of the input so that the two branches match.
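For instance, a small sketch of this dimension-matching step (the input shape and filter count here are arbitrary assumptions, not values from the assignment):

import keras.backend as K
from keras.layers import Input, Conv2D

# A 1x1 convolution with stride 2 halves the height and width and lets us
# change the channel count; note that no activation is applied on this layer.
X = Input(shape=(64, 64, 3))                       # arbitrary example shape
X_down = Conv2D(filters=16, kernel_size=(1, 1), strides=(2, 2),
                padding='valid')(X)
print(K.int_shape(X_down))                         # (None, 32, 32, 16)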

First component of the main path: the first convolution layer Conv2D has F_1 filters of shape (1,1) with stride (s,s) and padding set to "valid", and is named conv_name_base + '2a'. The first BatchNorm normalizes along the channel axis and is named bn_name_base + '2a'. Then comes the ReLU activation function, which needs no name and has no hyperparameters.

Second component of the main path: the second convolution layer Conv2D has F_2 filters of shape (f,f) with stride (1,1) and padding set to "same", and is named conv_name_base + '2b'. The second BatchNorm normalizes along the channel axis and is named bn_name_base + '2b'. Then comes the ReLU activation function, which needs no name and has no hyperparameters.

Third component of the main path: the third convolution layer Conv2D has F_3 filters of shape (1,1) with stride (1,1) and padding set to "valid", and is named conv_name_base + '2c'. The third BatchNorm normalizes along the channel axis and is named bn_name_base + '2c'. As in the identity block, there is no ReLU after this component: the shortcut path (with its own convolution and BatchNorm, as described above) is added to the main path first, and the sum is then passed through a ReLU.
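The original post breaks off at this point, so the following is only a sketch of the convolutional block that follows the pattern described above, reusing the imports from section 1; the shortcut-path layer names (the '1' branch) and the seed=0 initialization are assumptions modeled on the identity block rather than a verbatim copy of the graded notebook.

def convolutional_block(X, f, filters, stage, block, s=2):
    # Defining name basis
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # Retrieve Filters
    F1, F2, F3 = filters

    # Save the input value for the shortcut path
    X_shortcut = X

    # First component of main path: 1x1 conv with stride (s, s)
    X = Conv2D(filters=F1, kernel_size=(1, 1), strides=(s, s), padding='valid',
               name=conv_name_base + '2a', kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2a')(X)
    X = Activation('relu')(X)

    # Second component of main path: f x f conv with stride (1, 1), "same" padding
    X = Conv2D(filters=F2, kernel_size=(f, f), strides=(1, 1), padding='same',
               name=conv_name_base + '2b', kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2b')(X)
    X = Activation('relu')(X)

    # Third component of main path: 1x1 conv with stride (1, 1), no ReLU afterwards
    X = Conv2D(filters=F3, kernel_size=(1, 1), strides=(1, 1), padding='valid',
               name=conv_name_base + '2c', kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2c')(X)

    # Shortcut path: 1x1 conv with stride (s, s) to match dimensions, plus BatchNorm, no ReLU
    X_shortcut = Conv2D(filters=F3, kernel_size=(1, 1), strides=(s, s), padding='valid',
                        name=conv_name_base + '1',
                        kernel_initializer=glorot_uniform(seed=0))(X_shortcut)
    X_shortcut = BatchNormalization(axis=3, name=bn_name_base + '1')(X_shortcut)

    # Add shortcut to main path and pass the sum through a ReLU activation
    X = layers.add([X, X_shortcut])
    X = Activation('relu')(X)

    return X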
