Deep Learning Basics Series (1) | Understanding Each Layer of a Keras Model (How to Calculate the Output Size and the Number of Trainable Parameters)

Tags: keras

When studying mature network models such as VGG, Inception, and ResNet, the first question is usually: how are the parameters of each layer set? Likewise, when designing our own network model, how should we choose each layer's parameters? If the parameters are set incorrectly, the model often will not even run.

Therefore, we first need to understand what each layer of the model means, in particular its output size and number of trainable parameters. Once that is clear, when designing your own network you can first sketch the network as a flowchart on paper, set the parameters, calculate each layer's output size and trainable-parameter count, and only then implement it in code.

In Keras, whether we build a model ourselves or load a mature one, we can inspect the information of each layer with model.summary().

This article illustrates this with a simple example, based on the simple VGG-like model from the official Keras website, slightly modified. The code is as follows:

from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.layers import Conv2D, MaxPool2D

(train_data, train_labels), (test_data, test_labels) = keras.datasets.mnist.load_data()
train_data = train_data.reshape(-1, 28, 28, 1)
print("train data type: {}, shape: {}, dim: {}".format(type(train_data), train_data.shape, train_data.ndim))

# The first group
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(3, 3), strides=(1, 1), padding='valid', activation='relu', input_shape=(28, 28, 1)))
model.add(Conv2D(filters=32, kernel_size=(3, 3), strides=(1, 1), padding='valid', activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# The second group
model.add(Conv2D(filters=64, kernel_size=(3, 3), strides=(1, 1), padding='valid', activation='relu'))
model.add(Conv2D(filters=64, kernel_size=(3, 3), strides=(1, 1), padding='valid', activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# The third group
model.add(Flatten())
model.add(Dense(units=256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(units=10, activation='softmax'))
model.summary()

The data in this example comes from MNIST: 28*28 grayscale images with a single channel. The convolution-layer parameters mean the following:

    • filters: the number of filters; each filter is convolved with the corresponding input;
    • kernel_size: the size of each filter, usually an odd value such as 1, 3, or 5; here it is set to 3*3;
    • strides: the stride, i.e. the number of pixels the filter moves across the image at each step;
    • padding: whether to pad pixels at the edge of the image. There are generally two choices: the default 'valid', meaning no padding (so the image shrinks after convolution), and 'same', which pads pixels so that the output size matches the input size.

If you choose 'valid', then assuming the input size is n * n, the filter size is f * f, and the stride is s, the output size is [(n - f)/s + 1] * [(n - f)/s + 1]; if the result is not an integer, round down.

If you choose 'same', then assuming the input size is n * n, the filter size is f * f, and the padding width at each edge is p, solving n + 2p - f + 1 = n (for stride 1) gives p = (f - 1)/2.
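The two cases above can be sketched as a small helper function (a plain-Python sketch; the name conv_output_size is ours for illustration, not part of Keras):

```python
import math

def conv_output_size(n, f, s=1, padding='valid'):
    """Output width/height of a convolution on an n*n input
    with an f*f filter and stride s."""
    if padding == 'valid':
        # no padding: floor((n - f) / s) + 1
        return math.floor((n - f) / s) + 1
    if padding == 'same':
        # padded so that, with stride 1, the output size equals the input size
        return math.ceil(n / s)
    raise ValueError("unknown padding: {}".format(padding))

# The first two conv layers of the example: 3*3 filters, stride 1, 'valid'
print(conv_output_size(28, 3))  # 26
print(conv_output_size(26, 3))  # 24
```

With 'same' padding and stride 1, conv_output_size(28, 3, padding='same') returns 28, matching the rule that the output size stays equal to the input size.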

Running the above example produces the following output:

train data type: <class 'numpy.ndarray'>, shape: (60000, 28, 28, 1), dim: 4
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 26, 26, 32)        320
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 24, 24, 32)        9248
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 12, 12, 32)        0
_________________________________________________________________
dropout (Dropout)            (None, 12, 12, 32)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 10, 10, 64)        18496
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 8, 8, 64)          36928
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 4, 4, 64)          0
_________________________________________________________________
dropout_1 (Dropout)          (None, 4, 4, 64)          0
_________________________________________________________________
flatten (Flatten)            (None, 1024)              0
_________________________________________________________________
dense (Dense)                (None, 256)               262400
_________________________________________________________________
dropout_2 (Dropout)          (None, 256)               0
_________________________________________________________________
dense_1 (Dense)              (None, 10)                2570
=================================================================
Total params: 329,962
Trainable params: 329,962
Non-trainable params: 0

Let us read this output. First, the MNIST input data has shape (60000, 28, 28, 1). This is the typical NHWC layout, i.e. (number of images, height, width, number of channels).
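As a quick check of the NHWC reshape (using a zero-filled NumPy array as a stand-in for the MNIST download):

```python
import numpy as np

# stand-in for the MNIST training set: 60000 raw 28*28 grayscale images
raw = np.zeros((60000, 28, 28), dtype=np.uint8)

# add the trailing channel axis -> NHWC: (batch, height, width, channels)
nhwc = raw.reshape(-1, 28, 28, 1)
print(nhwc.shape, nhwc.ndim)  # (60000, 28, 28, 1) 4
```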

Second, note the Output Shape column of the table. It follows the same layout as the MNIST input, except that the first dimension is usually None, meaning the batch size is not yet determined; the remaining three dimensions are calculated by the rules above.

Finally, the Param # column gives the number of trainable parameters. Each type of layer calculates it differently:

    • For a convolution layer, if the filter size is f * f, the number of filters is n, the input has c channels, and bias is enabled (each filter has exactly one bias), then param = (f * f * c + 1) * n;
    • Pooling, Flatten, and Dropout layers have no trainable parameters, so their param is 0;
    • For a fully connected layer, if the input vector has size i, the output vector has size o, and bias is enabled, then param = i * o + o.
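The formulas above can be sketched as plain-Python helpers (the function names are ours, for illustration), and cross-checked against the summary table:

```python
def conv_params(f, c, n, use_bias=True):
    """Trainable parameters of a conv layer: (f*f*c + 1 bias) * n filters."""
    return (f * f * c + (1 if use_bias else 0)) * n

def dense_params(i, o, use_bias=True):
    """Trainable parameters of a fully connected layer: i*o weights + o biases."""
    return i * o + (o if use_bias else 0)

# cross-check against the Param # column of the example model
print(conv_params(3, 1, 32))    # 320    (conv2d)
print(conv_params(3, 32, 32))   # 9248   (conv2d_1)
print(conv_params(3, 32, 64))   # 18496  (conv2d_2)
print(dense_params(1024, 256))  # 262400 (dense)
print(dense_params(256, 10))    # 2570   (dense_1)
```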

The output size and trainable-parameter count can now be worked out group by group, following the three groups the code is divided into:

First group (input 28*28*1):

    • Conv2D, 32 filters of 3*3, 'valid': output (28-3+1) * (28-3+1) * 32 = 26*26*32, param = (3*3*1 + 1) * 32 = 320;
    • Conv2D, 32 filters of 3*3: output 24*24*32, param = (3*3*32 + 1) * 32 = 9248;
    • MaxPool2D 2*2: output 12*12*32, param = 0;
    • Dropout: output unchanged at 12*12*32, param = 0.

Second group:

    • Conv2D, 64 filters of 3*3: output 10*10*64, param = (3*3*32 + 1) * 64 = 18496;
    • Conv2D, 64 filters of 3*3: output 8*8*64, param = (3*3*64 + 1) * 64 = 36928;
    • MaxPool2D 2*2: output 4*4*64, param = 0;
    • Dropout: output unchanged at 4*4*64, param = 0.

Third group:

    • Flatten: output 4*4*64 = 1024, param = 0;
    • Dense, 256 units: param = 1024*256 + 256 = 262400;
    • Dropout: param = 0;
    • Dense, 10 units: param = 256*10 + 10 = 2570.

This covers the meaning of each layer of the model and the relevant calculation methods. I hope this article helps you better understand how a model is composed and how these quantities are calculated.

  
