TensorFlow Combat Series 13 -- The LeNet-5 Model


The LeNet-5 model, presented by Professor Yann LeCun in his 1998 paper "Gradient-Based Learning Applied to Document Recognition", was the first convolutional neural network to be successfully applied to digit recognition. On the MNIST dataset, the LeNet-5 model can achieve an accuracy of approximately 99.2%. The LeNet-5 model has a total of 7 layers, and Figure 7 shows the architecture of the LeNet-5 model.


The structure of each layer of the LeNet-5 model is described in detail below. In the LeNet-5 model proposed in "Gradient-Based Learning Applied to Document Recognition", the implementation of the convolutional and pooling layers differs slightly from the TensorFlow implementation described above. Those specific details are not discussed at length here; the focus is on the overall framework of the model.

First layer, convolutional layer
The input to this layer is the raw image pixels; the LeNet-5 model accepts an input of size 32x32x1. The filters of the first convolutional layer have a size of 5x5 and a depth of 6, with no zero padding and a stride of 1. Because zero padding is not used, the output of this layer has a side length of 32-5+1=28 and a depth of 6. This convolutional layer has a total of 5x5x1x6+6=156 parameters, of which 6 are bias parameters. Because the node matrix of the next layer has 28x28x6=4704 nodes, and each node is connected to 5x5=25 nodes of the current layer, this convolutional layer has a total of 4704x(25+1)=122,304 connections.
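These counts are easy to verify with a few lines of Python. The helper below is a hypothetical sketch added purely for illustration (it is not part of the original program); conv_layer_stats and its argument names are chosen here, not taken from the source.

def conv_layer_stats(filter_size, in_depth, out_depth, out_size):
    # Trainable parameters: one weight per filter element per output channel, plus biases.
    params = filter_size * filter_size * in_depth * out_depth + out_depth
    # Connections: each output node connects to filter_size*filter_size inputs plus one bias.
    connections = out_size * out_size * out_depth * (filter_size * filter_size + 1)
    return params, connections

# First convolutional layer: 5x5 filters, input depth 1, output depth 6, 28x28 output.
print(conv_layer_stats(5, 1, 6, 28))   # (156, 122304)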
Second layer, pooling layer
The input of this layer is the output of the first layer, a 28x28x6 node matrix. The filter used in this layer has a size of 2x2, and the stride in both the length and width directions is 2, so the output matrix of this layer is 14x14x6. The filter used in the original LeNet-5 model differs in some small details from the one described in this article; those nuances are not covered here.
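The halving of the spatial dimensions can be confirmed directly in TensorFlow. The snippet below is a minimal sketch using the TensorFlow 1.x API (the same API style used in the rest of this article) with a dummy input; it is not part of the original program.

import tensorflow as tf

# A dummy 28x28x6 feature map (batch of 1), just to confirm the pooled output shape.
feature_map = tf.zeros([1, 28, 28, 6])
pooled = tf.nn.max_pool(feature_map, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')
print(pooled.get_shape())   # (1, 14, 14, 6)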
Third layer, convolutional layer
The input matrix size of this layer is 14x14x6. The filter size used is 5x5 and the depth is 16. This layer does not use zero padding, and the stride is 1. The output matrix size of this layer is 10x10x16. Following the standard convolutional layer, this layer has 5x5x6x16+16=2416 parameters and 10x10x16x(25+1)=41,600 connections.
Fourth layer, pooling layer
The input matrix size of this layer is 10x10x16, the filter size is 2x2, and the stride is 2. The output matrix size of this layer is 5x5x16.
Fifth layer, fully connected layer
The input matrix size of this layer is 5x5x16. This layer is called a convolutional layer in the LeNet-5 paper, but because the filter size is 5x5, it is no different from a fully connected layer, and it is also treated as a fully connected layer in the TensorFlow implementation that follows. If the nodes of the 5x5x16 matrix are flattened into a vector, this layer is the same as the fully connected layer described earlier. The number of output nodes in this layer is 120, for a total of 5x5x16x120+120=48,120 parameters.
Sixth layer, fully connected layer
The number of input nodes in this layer is 120, the number of output nodes is 84, and the total number of parameters is 120x84+84=10,164.
Seventh layer, fully connected layer
The number of input nodes in this layer is 84, the number of output nodes is 10, and the total number of parameters is 84x10+10=850.
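Putting the layers together, the short sketch below recomputes the parameter counts listed above and their sum. It is added here purely as an arithmetic check and is not part of the original program.

# Per-layer trainable parameter counts for LeNet-5 as described above.
layer_params = {
    'conv1': 5 * 5 * 1 * 6 + 6,        # 156
    'conv3': 5 * 5 * 6 * 16 + 16,      # 2416
    'fc5':   5 * 5 * 16 * 120 + 120,   # 48120
    'fc6':   120 * 84 + 84,            # 10164
    'fc7':   84 * 10 + 10,             # 850
}
for name, count in layer_params.items():
    print(name, count)
print('total', sum(layer_params.values()))   # 61706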
This article has introduced the structure and settings of each layer of the LeNet-5 model; what follows is a TensorFlow program that implements a convolutional neural network similar to LeNet-5 to solve the MNIST digit recognition problem. The process of training a convolutional neural network in TensorFlow is exactly the same as training the fully connected neural network described previously. The calculation of the loss function and the implementation of backpropagation can reuse the mnist_train.py program given in the previous article. The only difference is that, because the input layer of a convolutional neural network is a three-dimensional matrix, the format of the input data needs to be adjusted:

# Adjust the format of the input data placeholder; the input is now a four-dimensional matrix.
x = tf.placeholder(tf.float32, [
    BATCH_SIZE,                        # The first dimension is the number of examples in a batch.
    mnist_inference.IMAGE_SIZE,        # The second and third dimensions are the image size.
    mnist_inference.IMAGE_SIZE,
    mnist_inference.NUM_CHANNELS],     # The fourth dimension is the image depth; for RGB images the depth is 3.
    name='x-input')

# Similarly, reshape the input training data into a four-dimensional matrix, and pass the
# reshaped data into the sess.run call.
reshaped_xs = np.reshape(xs, (BATCH_SIZE,
                              mnist_inference.IMAGE_SIZE,
                              mnist_inference.IMAGE_SIZE,
                              mnist_inference.NUM_CHANNELS))
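For context, the reshaped batch is then fed into the training step roughly as follows. This is a minimal sketch: train_op, loss, global_step, and y_ are assumed to be the names used in the earlier mnist_train.py program and may need to be adapted.

# Feed the reshaped four-dimensional batch into one training step (assumed variable names).
_, loss_value, step = sess.run([train_op, loss, global_step],
                               feed_dict={x: reshaped_xs, y_: ys})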
After the input format is adjusted, it is only necessary to implement a forward propagation process similar to the LeNet-5 model structure in mnist_inference.py. The modified mnist_inference.py program is given below.

# -*- coding: utf-8 -*-
import tensorflow as tf

# Configure the parameters of the neural network.
INPUT_NODE = 784
OUTPUT_NODE = 10

IMAGE_SIZE = 28
NUM_CHANNELS = 1
NUM_LABELS = 10

# Size and depth of the first convolutional layer.
CONV1_DEEP = 32
CONV1_SIZE = 5
# Size and depth of the second convolutional layer.
CONV2_DEEP = 64
CONV2_SIZE = 5
# Number of nodes in the fully connected layer.
FC_SIZE = 512

# Defines the forward propagation process of the convolutional neural network. A new
# parameter train is added here to distinguish the training process from the test process.
# Dropout is used in this program; dropout can further improve the generalization of the
# model and prevent overfitting. Dropout is only used during training.
def inference(input_tensor, train, regularizer):
    # Declare the variables of the first convolutional layer and implement the forward
    # propagation process. This process is consistent with the one described in Section 6.3.1.
    # By using different namespaces to isolate the variables of different layers, the variable
    # names in each layer only need to make sense within the current layer, without worrying
    # about duplicate names. Unlike the standard LeNet-5 model, the input defined here is the
    # original 28x28x1 MNIST image pixels. Because the convolutional layer uses zero padding,
    # the output is a 28x28x32 matrix.
    with tf.variable_scope('layer1-conv1'):
        conv1_weights = tf.get_variable(
            "weight", [CONV1_SIZE, CONV1_SIZE, NUM_CHANNELS, CONV1_DEEP],
            initializer=tf.truncated_normal_initializer(stddev=0.1))
        conv1_biases = tf.get_variable(
            "bias", [CONV1_DEEP], initializer=tf.constant_initializer(0.0))
        # Use a filter with a side length of 5 and a depth of 32; the filter moves with a
        # stride of 1 and uses zero padding.
        conv1 = tf.nn.conv2d(input_tensor, conv1_weights,
                             strides=[1, 1, 1, 1], padding='SAME')
        relu1 = tf.nn.relu(tf.nn.bias_add(conv1, conv1_biases))

    # Implement the forward propagation process of the second layer, a pooling layer.
    # Max pooling is used here; the pooling filter has a side length of 2, zero padding is
    # used, and the stride is 2. The input of this layer is the output of the previous layer,
    # a 28x28x32 matrix; the output is a 14x14x32 matrix.
    with tf.name_scope('layer2-pool1'):
        pool1 = tf.nn.max_pool(relu1, ksize=[1, 2, 2, 1],
                               strides=[1, 2, 2, 1], padding='SAME')

    # Declare the variables of the third layer, a convolutional layer, and implement the
    # forward propagation process. The input of this layer is a 14x14x32 matrix; the output
    # is a 14x14x64 matrix.
    with tf.variable_scope('layer3-conv2'):
        conv2_weights = tf.get_variable(
            "weight", [CONV2_SIZE, CONV2_SIZE, CONV1_DEEP, CONV2_DEEP],
            initializer=tf.truncated_normal_initializer(stddev=0.1))
        conv2_biases = tf.get_variable(
            "bias", [CONV2_DEEP], initializer=tf.constant_initializer(0.0))
        # Use a filter with a side length of 5 and a depth of 64; the filter moves with a
        # stride of 1 and uses zero padding.
        conv2 = tf.nn.conv2d(pool1, conv2_weights,
                             strides=[1, 1, 1, 1], padding='SAME')
        relu2 = tf.nn.relu(tf.nn.bias_add(conv2, conv2_biases))

    # Implement the forward propagation process of the fourth layer, a pooling layer. Its
    # structure is the same as that of the second layer. The input of this layer is a
    # 14x14x64 matrix; the output is a 7x7x64 matrix.
    with tf.name_scope('layer4-pool2'):
        pool2 = tf.nn.max_pool(relu2, ksize=[1, 2, 2, 1],
                               strides=[1, 2, 2, 1], padding='SAME')

    # Convert the output of the fourth layer into the input format of the fifth, fully
    # connected layer. The output of the fourth layer is a 7x7x64 matrix, but the fifth
    # layer needs a vector as input, so this 7x7x64 matrix must be flattened into a vector.
    # The pool2.get_shape function gets the dimensions of the fourth layer's output without
    # manual computation. Note that because the input and output of every layer is a batch
    # of matrices, the resulting dimensions also include the number of examples in a batch.
    pool_shape = pool2.get_shape().as_list()
    # Compute the length of the flattened vector, which is the product of the length, width
    # and depth of the matrix. Note that pool_shape[0] is the number of examples in a batch.
    nodes = pool_shape[1] * pool_shape[2] * pool_shape[3]
    # Convert the output of the fourth layer into a batch of vectors via tf.reshape.
    reshaped = tf.reshape(pool2, [pool_shape[0], nodes])

    # Declare the variables of the fifth layer, a fully connected layer, and implement the
    # forward propagation process. The input of this layer is a batch of flattened vectors
    # of length 3136; the output is a batch of vectors of length 512. This layer is basically
    # the same as the fully connected layer described in Chapter 5; the only difference is
    # the introduction of dropout. Dropout randomly sets part of the outputs to 0 during
    # training, which helps avoid overfitting and makes the model perform better on test
    # data. Dropout is generally used only in fully connected layers, not in convolutional
    # or pooling layers.
    with tf.variable_scope('layer5-fc1'):
        fc1_weights = tf.get_variable(
            "weight", [nodes, FC_SIZE],
            initializer=tf.truncated_normal_initializer(stddev=0.1))
        # Only the weights of the fully connected layers need to be regularized.
        if regularizer is not None:
            tf.add_to_collection('losses', regularizer(fc1_weights))
        fc1_biases = tf.get_variable(
            "bias", [FC_SIZE], initializer=tf.constant_initializer(0.1))

        fc1 = tf.nn.relu(tf.matmul(reshaped, fc1_weights) + fc1_biases)
        if train:
            fc1 = tf.nn.dropout(fc1, 0.5)

    # Declare the variables of the sixth layer, a fully connected layer, and implement the
    # forward propagation process. The input of this layer is a batch of vectors of length
    # 512; the output is a batch of vectors of length 10. After passing through softmax, the
    # output of this layer gives the final classification result.
    with tf.variable_scope('layer6-fc2'):
        fc2_weights = tf.get_variable(
            "weight", [FC_SIZE, NUM_LABELS],
            initializer=tf.truncated_normal_initializer(stddev=0.1))
        if regularizer is not None:
            tf.add_to_collection('losses', regularizer(fc2_weights))
        fc2_biases = tf.get_variable(
            "bias", [NUM_LABELS], initializer=tf.constant_initializer(0.1))
        logit = tf.matmul(fc1, fc2_weights) + fc2_biases

    # Return the output of the sixth layer.
    return logit
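As a quick sanity check of this forward pass, the following sketch builds the graph with a dummy batch and prints the output shape. It is not part of the original program; the batch size of 8 is arbitrary, and it assumes the file above is saved as mnist_inference.py.

import numpy as np
import tensorflow as tf
import mnist_inference

with tf.Graph().as_default():
    # Four-dimensional input placeholder, matching the adjusted input format shown earlier.
    x = tf.placeholder(tf.float32, [8, mnist_inference.IMAGE_SIZE,
                                    mnist_inference.IMAGE_SIZE,
                                    mnist_inference.NUM_CHANNELS])
    logits = mnist_inference.inference(x, False, None)   # no dropout, no regularizer
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        dummy = np.zeros([8, 28, 28, 1], dtype=np.float32)
        print(sess.run(logits, feed_dict={x: dummy}).shape)   # (8, 10)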
After running the modified mnist_train.py and mnist_eval.py, you can get test results like the following:
$ python mnist_train.py
Extracting /home/tianlei/notebook/mnist_data/train-images-idx3-ubyte.gz
Extracting /home/tianlei/notebook/mnist_data/train-labels-idx1-ubyte.gz
Extracting /home/tianlei/notebook/mnist_data/t10k-images-idx3-ubyte.gz
Extracting /home/tianlei/notebook/mnist_data/t10k-labels-idx1-ubyte.gz
After 1 training step(s), loss on training batch is 6.45373.
After 1001 training step(s), loss on training batch is 0.824825.
After 2001 training step(s), loss on training batch is 0.646993.
After 3001 training step(s), loss on training batch is 0.759975.
After 4001 training step(s), loss on training batch is 0.68468.
After 5001 training step(s), loss on training batch is 0.630368.
The above program can bring the accuracy on MNIST to approximately 99.4%.
