Learning Notes TF032: Implementing Google Inception Net


Google Inception Net took first place in ILSVRC 2014. It controls both the amount of computation and the number of parameters while delivering very good classification performance. Inception V1: top-5 error rate 6.67%, 22 layers, 1.5 billion floating-point operations, 5 million parameters (versus AlexNet's 60 million). V1 keeps the parameter count low for two reasons: the larger the model, the more data it needs, and high-quality data is expensive; and the more parameters, the more computing resources are consumed. At the same time, a deeper model has stronger expressive power. V1 removes the final fully connected layer and replaces it with a global average pooling layer (reducing the feature map size to 1x1), which cuts parameters dramatically and makes training faster. The Inception Module improves the efficiency of parameter utilization, acting like a small network inside a large network. Its branch networks follow the ideas of NIN's cascaded convolution layers and MLPConv layers. In general, a convolution layer can increase its number of output channels to improve expressive power, but that increases computation and invites overfitting. Each output channel corresponds to one filter; a filter shares its parameters and can only extract one kind of feature. In NIN, output channels combine information across one another. MLPConv appends a 1x1 convolution and a ReLU activation after an ordinary convolution layer.

The Inception Module has four branches. The first branch applies a 1x1 convolution to the input. A 1x1 convolution organizes information across channels, improves the network's expressive power, and can increase or decrease the output channel dimension. All four branches use 1x1 convolutions to perform cross-channel feature transformations at low cost. The second branch applies a 1x1 convolution followed by a 3x3 convolution, two feature transformations. The third branch applies a 1x1 convolution followed by a 5x5 convolution. The fourth branch applies 3x3 max pooling followed by a 1x1 convolution. The 1x1 convolution is very cost-effective: for a small amount of computation it adds another feature transformation and more non-linearity. The four branch outputs are merged at the end (aggregated along the output-channel dimension). The Inception Module contains convolutions of three different sizes and one max pooling, increasing the network's adaptability to different scales. It lets the network grow efficiently in both depth and width, improving accuracy without overfitting.

Inception Net's central goal is to find the optimal sparse structural unit (the Inception Module). The Hebbian principle: sustained, repeated neural reflex activity persistently strengthens the stability of neuron connections; when two neurons are close together and one persistently, repeatedly participates in firing the other, metabolic changes make it a cell capable of exciting the other. Neurons that fire together wire together ("Cells that fire together, wire together"): the stimulation of the learning process increases the synaptic strength between neurons. The paper Provable Bounds for Learning Some Deep Representations shows that the probability distribution of a dataset can be represented by a large, sparse neural network, and that the optimal way to construct the network is layer by layer: cluster the highly correlated nodes of the previous layer and connect each small cluster together, so that highly correlated nodes end up connected.

In image data, neighboring regions are highly correlated, so adjacent pixels can be connected by convolution. The outputs of multiple convolution kernels at the same spatial location but in different channels are also highly correlated. Slightly larger convolutions (3x3, 5x5) still connect highly correlated nodes, so larger kernels are used in moderation to increase diversity. The Inception Module's four branches connect highly correlated nodes through small convolutions of several sizes (1x1, 3x3, 5x5).

Within an Inception Module, the 1x1 convolution takes the highest proportion (of the output channel count), with the 3x3 and 5x5 convolutions slightly lower. The whole network stacks multiple Inception Modules. Moving through the stack, the spatial concentration of the convolutions gradually decreases, capturing larger-area features and higher-level abstract features. Accordingly, in the later Inception Modules, the large-area convolution kernels (3x3 and 5x5) take a larger share of the output channels.

Inception Net has 22 layers. Besides the final layer's output, the intermediate nodes also classify well, so the network adds auxiliary classifiers: intermediate layer outputs are used as classification nodes, and their results are added to the final classification result with a small weight (0.3). This is equivalent to model fusion; it also feeds extra backpropagation gradient signals into the network and provides additional regularization.

The Google Inception Net family: September 2014, "Going Deeper with Convolutions", Inception V1, top-5 error rate 6.67%. February 2015, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift", Inception V2, top-5 error rate 4.8%. December 2015, "Rethinking the Inception Architecture for Computer Vision", Inception V3, top-5 error rate 3.5%. February 2016, "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning", Inception V4, top-5 error rate 3.08%.

Inception V2 uses two 3x3 convolutions in place of one 5x5 convolution, reducing the parameter count and overfitting. It also proposes Batch Normalization (BN). BN is a very effective regularization method that speeds up the training of large convolutional networks many times over, and after convergence the classification accuracy is also greatly improved. BN normalizes each mini-batch of data internally, normalizing the output toward an N(0, 1) normal distribution and reducing Internal Covariate Shift (changes in the distribution of internal neuron inputs). In a traditional deep neural network, the input distribution of each layer keeps changing, so only a small learning rate can be used. With BN, the learning rate of each layer can be increased many times; the number of iterations drops to 1/14 of the original, greatly shortening training time. Because BN also acts as a regularizer, Dropout can be reduced or removed, simplifying the network structure.
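As a rough illustration of the normalization step only (a minimal sketch; the real slim.batch_norm additionally learns scale/shift parameters gamma and beta and maintains moving averages for inference):

    import tensorflow as tf

    def batch_norm_sketch(x, epsilon=0.001):
        # Per-feature statistics of the current mini-batch (axis 0 = batch).
        mean, variance = tf.nn.moments(x, axes=[0])
        # Normalize each feature to approximately N(0, 1).
        return (x - mean) / tf.sqrt(variance + epsilon)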

To get the most out of BN: increase the learning rate and accelerate learning-rate decay, apply BN to normalize the data, remove Dropout, reduce L2 regularization, remove LRN, shuffle training samples more thoroughly, and reduce photometric distortions during data augmentation (BN trains faster, so each sample is seen fewer times, and more realistic samples help training).

Inception V3 introduces the idea of Factorization into small convolutions: a large two-dimensional convolution is split into two smaller one-dimensional convolutions, which saves many parameters, speeds up computation, reduces overfitting, and adds a layer of non-linearity, extending the model's expressive power. This asymmetric structural split is more effective than a symmetric split into equal smaller kernels: it handles more and richer spatial features and increases feature diversity.
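A hedged sketch of the idea in slim syntax (the scope names and channel count here are illustrative, not from the notes):

    import tensorflow as tf
    slim = tf.contrib.slim

    # One 7x7 kernel costs 7*7 = 49 weights per input/output channel pair;
    # a 1x7 followed by a 7x1 costs 7 + 7 = 14, with an extra ReLU in between.
    def factorized_7x7(net, depth):
        net = slim.conv2d(net, depth, [1, 7], scope='Conv2d_1x7')
        net = slim.conv2d(net, depth, [7, 1], scope='Conv2d_7x1')
        return net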

Inception V3 also optimizes the Inception Module structure, with three sizes: 35x35, 17x17, and 8x8. These modules use branches inside branches; the 8x8 structure can be seen as Network In Network In Network. (Inception V4 later combines Inception with Microsoft's ResNet.)

We use tf.contrib.slim to build the 42-layer Inception V3 network.

Inception V3 Network Structure
Type                     Kernel size / stride (or note)   Input size
Convolution              3x3 / 2                          299x299x3
Convolution              3x3 / 1                          149x149x32
Convolution              3x3 / 1                          147x147x32
Pooling                  3x3 / 2                          147x147x64
Convolution              3x3 / 1                          73x73x64
Convolution              3x3 / 2                          71x71x80
Convolution              3x3 / 1                          35x35x192
Inception Module group   3 Inception Modules              35x35x288
Inception Module group   5 Inception Modules              17x17x768
Inception Module group   3 Inception Modules              8x8x1280
Pooling                  8x8                              8x8x2048
Linear                   logits                           1x1x2048
Softmax                  classifier                       1x1x1000

Define a simple helper function, trunc_normal, to produce a truncated normal distribution initializer.
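In code this can be a single line, a sketch in the slim style the notes describe:

    import tensorflow as tf
    slim = tf.contrib.slim

    # Truncated normal initializer with mean 0.0 and the given standard deviation.
    trunc_normal = lambda stddev: tf.truncated_normal_initializer(0.0, stddev)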

Define the inception_v3_arg_scope function to generate the default parameters of commonly used network functions, including the convolution activation function, weight initialization method, and normalization tool. Set the L2 regularization weight_decay default to 0.00004, the standard deviation stddev default to 0.1, and the batch_norm_var_collection parameter default to moving_vars.

Define the batch normalization parameter dictionary: decay coefficient 0.997, epsilon 0.001, updates_collections set to tf.GraphKeys.UPDATE_OPS, and, in the variables_collections entry, beta and gamma set to None while moving_mean and moving_variance use [batch_norm_var_collection].

slim.arg_scope automatically assigns default values to function parameters. With slim.arg_scope([slim.conv2d, slim.fully_connected], weights_regularizer=slim.l2_regularizer(weight_decay)), the parameters of the two functions slim.conv2d and slim.fully_connected are assigned automatically: weights_regularizer defaults to slim.l2_regularizer(weight_decay). You no longer need to set these parameters each time, only when overriding them.

Nest another slim.arg_scope to assign default values to the convolution-layer generator slim.conv2d: set the weight initializer to trunc_normal(stddev), the activation function to ReLU, the normalizer to slim.batch_norm, and the normalizer parameters to batch_norm_params; then return the defined scope.
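Putting the last three paragraphs together, a sketch of inception_v3_arg_scope (continuing with the tf/slim imports above; the exact variables_collections layout follows the standard slim implementation and is my assumption):

    def inception_v3_arg_scope(weight_decay=0.00004,
                               stddev=0.1,
                               batch_norm_var_collection='moving_vars'):
        batch_norm_params = {
            'decay': 0.997,  # attenuation coefficient, as listed in the notes
            'epsilon': 0.001,
            'updates_collections': tf.GraphKeys.UPDATE_OPS,
            'variables_collections': {
                'beta': None,
                'gamma': None,
                'moving_mean': [batch_norm_var_collection],
                'moving_variance': [batch_norm_var_collection],
            }
        }
        # Outer scope: L2-regularize conv and fully connected weights.
        with slim.arg_scope([slim.conv2d, slim.fully_connected],
                            weights_regularizer=slim.l2_regularizer(weight_decay)):
            # Inner scope: defaults for every slim.conv2d.
            with slim.arg_scope(
                    [slim.conv2d],
                    weights_initializer=tf.truncated_normal_initializer(stddev=stddev),
                    activation_fn=tf.nn.relu,
                    normalizer_fn=slim.batch_norm,
                    normalizer_params=batch_norm_params) as sc:
                return sc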

Define the inception_v3_base function to generate the convolutional part of the Inception V3 network. The inputs parameter takes the image data tensor; scope carries the function's default parameter environment. Define a dictionary, end_points, to save key nodes. Use slim.arg_scope to set default values for the slim.conv2d, slim.max_pool2d, and slim.avg_pool2d functions: stride 1 and padding VALID. Then build the non-Inception-Module convolution layers. slim.conv2d creates a convolution layer: the first argument is the input tensor, the second the number of output channels, the third the convolution kernel size, the fourth the stride, and the fifth the padding mode. The first convolution layer has 32 output channels, kernel size 3x3, stride 2, and padding mode VALID.

The non-Inception-Module convolution layers mainly use small 3x3 kernels. Following Factorization into small convolutions, several small convolutions simulate one large convolution, reducing the parameter count while adding non-linearity. The 1x1 convolution provides low-cost cross-channel feature combination. The first convolution layer has stride 2, the remaining convolution layers stride 1. The pooling layers are 3x3 with stride 2. The network input is 299x299x3; after three stride-2 layers the size shrinks to 35x35x192, so the spatial size is greatly reduced while the output channels grow considerably. In total, five convolution layers and two pooling layers compress the input images and abstract their features.
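A sketch of these opening layers; the scope names follow standard slim naming and should be treated as assumptions:

    def inception_v3_base(inputs, scope=None):
        end_points = {}  # dictionary of key nodes
        with tf.variable_scope(scope, 'InceptionV3', [inputs]):
            with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
                                stride=1, padding='VALID'):
                # 299x299x3 -> 149x149x32
                net = slim.conv2d(inputs, 32, [3, 3], stride=2, scope='Conv2d_1a_3x3')
                # 149x149x32 -> 147x147x32
                net = slim.conv2d(net, 32, [3, 3], scope='Conv2d_2a_3x3')
                # 147x147x32 -> 147x147x64
                net = slim.conv2d(net, 64, [3, 3], padding='SAME', scope='Conv2d_2b_3x3')
                # 147x147x64 -> 73x73x64
                net = slim.max_pool2d(net, [3, 3], stride=2, scope='MaxPool_3a_3x3')
                # 73x73x64 -> 73x73x80
                net = slim.conv2d(net, 80, [1, 1], scope='Conv2d_3b_1x1')
                # 73x73x80 -> 71x71x192
                net = slim.conv2d(net, 192, [3, 3], scope='Conv2d_4a_3x3')
                # 71x71x192 -> 35x35x192
                net = slim.max_pool2d(net, [3, 3], stride=2, scope='MaxPool_5a_3x3')
        # ... the Inception Module groups described below follow here ...
        return net, end_points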

Next come three consecutive Inception Module groups.

The Inception Modules in the 1st group all have similar structures.

1st Inception Module group: 1st Inception Module, named Mixed_5b. slim.arg_scope sets the default parameters for all the Inception Module groups: the stride of all convolution, max pooling, and average pooling layers is set to 1, and the padding mode to SAME. Set the Inception Module's variable_scope name to Mixed_5b. There are four branches, Branch_0 to Branch_3. The first branch is a 1x1 convolution with 64 output channels. The second branch is a 1x1 convolution with 48 output channels connected to a 5x5 convolution with 64 output channels. The third branch is a 1x1 convolution with 64 output channels connected to two 3x3 convolutions with 96 output channels. The fourth branch is a 3x3 average pooling connected to a 1x1 convolution with 32 output channels. Finally, tf.concat merges the four branch outputs (along the third dimension, the output channels) to produce the module's final output. Since every layer has stride 1 and padding mode SAME, the image size is not reduced: the size stays 35x35 while the channel count grows, and the four branches' output channels sum to 64 + 64 + 96 + 32 = 256, so the final output tensor size is 35x35x256.
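A sketch of Mixed_5b as just described (scope names again follow standard slim naming and are assumptions):

    with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
                        stride=1, padding='SAME'):
        with tf.variable_scope('Mixed_5b'):
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(net, 48, [1, 1], scope='Conv2d_0a_1x1')
                branch_1 = slim.conv2d(branch_1, 64, [5, 5], scope='Conv2d_0b_5x5')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
                branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0b_3x3')
                branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0c_3x3')
            with tf.variable_scope('Branch_3'):
                branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                branch_3 = slim.conv2d(branch_3, 32, [1, 1], scope='Conv2d_0b_1x1')
            # Concatenate along the channel dimension: 64+64+96+32 = 256 channels.
            net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)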

1st Inception Module group: 2nd Inception Module, named Mixed_5c. Stride 1, padding mode SAME. Four branches; the difference is that the fourth branch now connects a 1x1 convolution with 64 output channels. The output tensor size is 35x35x288.

1st Inception Module group: 3rd Inception Module, named Mixed_5d. The output tensor size is 35x35x288.

The 2nd Inception Module group contains five Inception Modules; the 2nd through 5th have similar structures.

2nd Inception Module group: 1st Inception Module, named Mixed_6a. Three branches. The first branch is a 3x3 convolution with 384 output channels, stride 2, padding mode VALID, compressing the image size to 17x17. The second branch has three layers: a 1x1 convolution with 64 output channels, then two 3x3 convolutions with 96 output channels, the last with stride 2 and padding mode VALID; the branch output tensor size is 17x17x96. The third branch is a 3x3 max pooling layer, stride 2, padding mode VALID; its output tensor size is 17x17x288. The three outputs are merged along the channel dimension, giving a final output size of 17x17x(384 + 96 + 288) = 17x17x768. All five Inception Modules in the 2nd group have this same output tensor size.

2nd Inception Module group: 2nd Inception Module, named Mixed_6b. Four branches. The first branch is a 1x1 convolution with 192 output channels. The second branch has three layers: first a 1x1 convolution with 128 output channels, second a 1x7 convolution with 128 output channels, third a 7x1 convolution with 192 output channels. This is Factorization into small convolutions in action: the serial 1x7 and 7x1 convolutions are equivalent to a 7x7 convolution, with far fewer parameters, less overfitting, and an extra activation function for stronger non-linear feature transformation. The third branch has five layers: a 1x1 convolution with 128 output channels, a 7x1 convolution with 128 output channels, a 1x7 convolution with 128 output channels, a 7x1 convolution with 128 output channels, and a 1x7 convolution with 192 output channels, repeatedly splitting the 7x7 convolution in the same factorized style. The fourth branch is a 3x3 average pooling layer connected to a 1x1 convolution with 192 output channels. After merging the four branches, the tensor size is 17x17x(192 + 192 + 192 + 192) = 17x17x768.
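A sketch of Mixed_6b showing the factorized 1x7/7x1 stacks (scope names are assumptions in the standard slim style):

    with tf.variable_scope('Mixed_6b'):
        with tf.variable_scope('Branch_0'):
            branch_0 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
        with tf.variable_scope('Branch_1'):
            branch_1 = slim.conv2d(net, 128, [1, 1], scope='Conv2d_0a_1x1')
            branch_1 = slim.conv2d(branch_1, 128, [1, 7], scope='Conv2d_0b_1x7')
            branch_1 = slim.conv2d(branch_1, 192, [7, 1], scope='Conv2d_0c_7x1')
        with tf.variable_scope('Branch_2'):
            branch_2 = slim.conv2d(net, 128, [1, 1], scope='Conv2d_0a_1x1')
            branch_2 = slim.conv2d(branch_2, 128, [7, 1], scope='Conv2d_0b_7x1')
            branch_2 = slim.conv2d(branch_2, 128, [1, 7], scope='Conv2d_0c_1x7')
            branch_2 = slim.conv2d(branch_2, 128, [7, 1], scope='Conv2d_0d_7x1')
            branch_2 = slim.conv2d(branch_2, 192, [1, 7], scope='Conv2d_0e_1x7')
        with tf.variable_scope('Branch_3'):
            branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
            branch_3 = slim.conv2d(branch_3, 192, [1, 1], scope='Conv2d_0b_1x1')
        # 192 + 192 + 192 + 192 = 768 output channels at 17x17.
        net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)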

2nd Inception Module group: 3rd Inception Module, named Mixed_6c. The number of output channels in the leading convolution layers of the second and third branches changes from 128 to 160; the final output channel counts are still 192. Each time the network passes through an Inception Module, even if the output size is unchanged, the features are refined once more, and the added convolutions and non-linearity improve network performance.

2nd Inception Module group: 4th Inception Module, named Mixed_6d.

2nd Inception Module group: 5th Inception Module, named Mixed_6e. Mixed_6e is stored in end_points for later use by the Auxiliary Classifier.

The 3rd Inception Module group contains three Inception Modules; the 2nd and 3rd have similar structures.

3rd Inception Module group: 1st Inception Module, named Mixed_7a. Three branches. The first branch has two layers: a 1x1 convolution with 192 output channels connected to a 3x3 convolution with 320 output channels, stride 2, padding mode VALID, compressing the image size to 8x8. The second branch has four layers: a 1x1 convolution with 192 output channels, a 1x7 convolution with 192 output channels, a 7x1 convolution with 192 output channels, and a 3x3 convolution with 192 output channels, the last with stride 2 and padding mode VALID; the branch output tensor size is 8x8x192. The third branch is a 3x3 max pooling layer, stride 2, padding mode VALID; pooling does not change the channel count, so the branch output tensor size is 8x8x768. The three outputs are merged along the channel dimension, giving a final output size of 8x8x(320 + 192 + 768) = 8x8x1280. From this Inception Module on, the output image size shrinks and the channel count rises, while the total tensor size keeps decreasing.

3rd Inception Module group: 2nd Inception Module, named Mixed_7b. Four branches. The first branch is a 1x1 convolution with 320 output channels. In the second branch, the first layer is a 1x1 convolution with 384 output channels; the second layer splits into two sub-branches, a 1x3 convolution with 384 output channels and a 3x1 convolution with 384 output channels, merged with tf.concat to give a branch output tensor size of 8x8x(384 + 384) = 8x8x768. The third branch: first a 1x1 convolution with 448 output channels, then a 3x3 convolution with 384 output channels, then two sub-branches, a 1x3 convolution with 384 output channels and a 3x1 convolution with 384 output channels, merged into an 8x8x768 output tensor. The fourth branch is a 3x3 average pooling layer connected to a 1x1 convolution with 192 output channels. After merging the four branches, the tensor size is 8x8x(320 + 768 + 768 + 192) = 8x8x2048. From this Inception Module, the number of output channels increases from 1280 to 2048.
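A sketch of Mixed_7b showing the branch-within-a-branch pattern (scope names are assumptions in the standard slim style):

    with tf.variable_scope('Mixed_7b'):
        with tf.variable_scope('Branch_0'):
            branch_0 = slim.conv2d(net, 320, [1, 1], scope='Conv2d_0a_1x1')
        with tf.variable_scope('Branch_1'):
            branch_1 = slim.conv2d(net, 384, [1, 1], scope='Conv2d_0a_1x1')
            # Branch within a branch: 1x3 and 3x1 outputs are concatenated.
            branch_1 = tf.concat([
                slim.conv2d(branch_1, 384, [1, 3], scope='Conv2d_0b_1x3'),
                slim.conv2d(branch_1, 384, [3, 1], scope='Conv2d_0c_3x1')], 3)
        with tf.variable_scope('Branch_2'):
            branch_2 = slim.conv2d(net, 448, [1, 1], scope='Conv2d_0a_1x1')
            branch_2 = slim.conv2d(branch_2, 384, [3, 3], scope='Conv2d_0b_3x3')
            branch_2 = tf.concat([
                slim.conv2d(branch_2, 384, [1, 3], scope='Conv2d_0c_1x3'),
                slim.conv2d(branch_2, 384, [3, 1], scope='Conv2d_0d_3x1')], 3)
        with tf.variable_scope('Branch_3'):
            branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
            branch_3 = slim.conv2d(branch_3, 192, [1, 1], scope='Conv2d_0b_1x1')
        # 320 + 768 + 768 + 192 = 2048 output channels at 8x8.
        net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)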

3rd Inception Module group: 3rd Inception Module, named Mixed_7c. Its result is returned as the final output of the inception_v3_base function.

To summarize the Inception V3 structure: first the ordinary part, five convolution layers alternating with two pooling layers, then three Inception Module groups, each containing several modules of similar structure. This reflects an important design principle of Inception Net: the image size shrinks steadily, from 299x299 down to 8x8, through five convolution or pooling layers with stride 2, while the number of output channels keeps increasing, from 3 (RGB) to 2048. Every layer, whether convolution, pooling, or Inception Module, simplifies the spatial structure and converts spatial information into higher-level abstract feature information, turning spatial dimensions into channel dimensions. The total size of each layer's output tensor keeps decreasing, reducing computation. The Inception Module pattern: generally four branches; the 1st branch is a 1x1 convolution; the 2nd branch is a 1x1 convolution followed by factorized 1xn and nx1 convolutions; the 3rd branch is similar to the 2nd but deeper; the 4th branch is max or average pooling. The Inception Module thus combines simple feature abstraction (branch 1), more complex feature abstraction (branches 2 and 3), and a structure-simplifying pooling layer (branch 4): four feature abstractions and transformations at different levels, selectively retaining higher-order features at each layer and maximizing the network's expressive power.

Global average pooling, Softmax, and Auxiliary Logits come next, in the inception_v3 function. Its input parameters: num_classes, the number of classes to predict, defaulting to 1000 (the ILSVRC class count); is_training, a flag for the training process, with Batch Normalization and Dropout enabled only during training; dropout_keep_prob, the fraction of nodes Dropout keeps during training, defaulting to 0.8; prediction_fn, the classification function, slim.softmax by default; spatial_squeeze, whether to squeeze the output (removing dimensions of size 1); reuse, whether the network and its variables are reused; scope, the function's default parameter environment. Use tf.variable_scope to define the default network name and the reuse parameter, and slim.arg_scope to define the is_training flag default for Batch Normalization and Dropout. Build the convolutional part of the network with inception_v3_base, obtaining the final output net and the key-node dictionary end_points.
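A sketch of the function skeleton under these parameters (the bodies of the next two paragraphs slot in where marked):

    def inception_v3(inputs,
                     num_classes=1000,
                     is_training=True,
                     dropout_keep_prob=0.8,
                     prediction_fn=slim.softmax,
                     spatial_squeeze=True,
                     reuse=None,
                     scope='InceptionV3'):
        with tf.variable_scope(scope, 'InceptionV3', [inputs, num_classes],
                               reuse=reuse) as scope:
            # BN and Dropout behave differently during training and inference.
            with slim.arg_scope([slim.batch_norm, slim.dropout],
                                is_training=is_training):
                net, end_points = inception_v3_base(inputs, scope=scope)
                # The Auxiliary Logits branch and the classification head
                # described below are built here on top of net/end_points,
                # and the function finally returns logits, end_points.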

Auxiliary Logits, the auxiliary classification node, helps predict the classification result. Use slim.arg_scope to set the defaults for convolution, max pooling, and average pooling: stride 1 and padding mode SAME. Fetch Mixed_6e from end_points, then connect a 5x5 average pooling with stride 3 and padding VALID, changing the output size from 17x17x768 to 5x5x768. Connect a 1x1 convolution with 128 output channels, then a 5x5 convolution with 768 output channels whose weight initializer is reset to a normal distribution with standard deviation 0.01, padding mode VALID; the output size becomes 1x1x768. A final 1x1 convolution produces a 1x1x1000 output. Use tf.squeeze to remove the two spatial dimensions of size 1. Finally, store the auxiliary classification output aux_logits in the end_points dictionary.
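A sketch of this branch, continuing inside inception_v3 (the 0.001 standard deviation on the last layer follows the standard slim code and is my assumption; the notes only mention 0.01):

    with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
                        stride=1, padding='SAME'):
        aux_logits = end_points['Mixed_6e']
        with tf.variable_scope('AuxLogits'):
            # 17x17x768 -> 5x5x768
            aux_logits = slim.avg_pool2d(aux_logits, [5, 5], stride=3,
                                         padding='VALID', scope='AvgPool_1a_5x5')
            aux_logits = slim.conv2d(aux_logits, 128, [1, 1], scope='Conv2d_1b_1x1')
            # 5x5x128 -> 1x1x768
            aux_logits = slim.conv2d(aux_logits, 768, [5, 5],
                                     weights_initializer=trunc_normal(0.01),
                                     padding='VALID', scope='Conv2d_2a_5x5')
            # 1x1x768 -> 1x1x1000 (no activation / normalization on the logits)
            aux_logits = slim.conv2d(aux_logits, num_classes, [1, 1],
                                     activation_fn=None, normalizer_fn=None,
                                     weights_initializer=trunc_normal(0.001),
                                     scope='Conv2d_2b_1x1')
            if spatial_squeeze:
                aux_logits = tf.squeeze(aux_logits, [1, 2], name='SpatialSqueeze')
            end_points['AuxLogits'] = aux_logits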

Classification prediction logic. Apply an 8x8 global average pooling to the output of Mixed_7c, the final convolution layer, with padding mode VALID, changing the output tensor size to 1x1x2048. Connect a Dropout layer with node keep rate dropout_keep_prob. Connect a 1x1 convolution with 1000 output channels, with the activation function and normalization function set to None. tf.squeeze removes the output tensor's dimensions of size 1, and Softmax produces the final classification predictions. Finally, return the output logits together with end_points.
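A sketch of the classification head, again continuing inside inception_v3 (scope names assumed from the standard slim code):

    with tf.variable_scope('Logits'):
        # 8x8x2048 -> 1x1x2048 via global average pooling
        net = slim.avg_pool2d(net, [8, 8], padding='VALID', scope='AvgPool_1a_8x8')
        net = slim.dropout(net, keep_prob=dropout_keep_prob, scope='Dropout_1b')
        end_points['PreLogits'] = net
        # 1x1x2048 -> 1x1x1000, plain linear logits
        logits = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
                             normalizer_fn=None, scope='Conv2d_1c_1x1')
        if spatial_squeeze:
            logits = tf.squeeze(logits, [1, 2], name='SpatialSqueeze')
    end_points['Logits'] = logits
    end_points['Predictions'] = prediction_fn(logits, scope='Predictions')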

That completes Inception V3. Hyperparameter selection, including the number of layers, convolution kernel sizes, pooling positions, strides, when to factorize, and branch design, requires a great deal of exploration and practice.

Inception V3 computing performance test. The network structure is large, so batch_size is set to 32; the image size is 299x299. tf.random_uniform generates random image data as input. Use slim.arg_scope to load inception_v3_arg_scope(), which contains the Batch Normalization defaults and the default activation functions and parameter initialization methods. Inside the arg_scope, call the inception_v3 function with inputs to obtain logits and end_points. Create a Session and initialize all model parameters. Set the number of test batches to 100, and use time_tensorflow_run to test the forward performance of the Inception V3 network.
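A sketch of the benchmark script; time_tensorflow_run is the timing helper defined in the earlier AlexNet/VGGNet notes in this series and is assumed to be available here:

    batch_size = 32
    height, width = 299, 299
    inputs = tf.random_uniform((batch_size, height, width, 3))
    with slim.arg_scope(inception_v3_arg_scope()):
        logits, end_points = inception_v3(inputs, is_training=False)

    init = tf.global_variables_initializer()
    sess = tf.Session()
    sess.run(init)
    num_batches = 100
    # Time the forward pass over num_batches mini-batches.
    time_tensorflow_run(sess, logits, 'Forward')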

For the Inception V3 network, the image area is 78% larger than VGGNet's 224x224, yet the forward pass is faster than VGGNet's. Its 25 million parameters are more than Inception V1's 7 million, less than half of AlexNet's 60 million, and far less than VGGNet's 140 million. Across its 42 layers, the whole network needs only about 5 billion floating-point operations, more than Inception V1's 1.5 billion but still less than VGGNet. It can be deployed on an ordinary server to provide a fast-response service, or ported to a mobile phone for real-time image recognition.

Inception V3 backward performance test: add all network parameters to the parameter list and measure the time needed to compute gradients with respect to all of them, or directly download the ImageNet dataset and use real samples for training and evaluation.

Inception V3's Factorization into small convolutions is very effective: it reduces the parameter count, reduces overfitting, and increases the network's non-linear expressive power. From the input to the output of the convolutional network, the image size gradually shrinks and the number of output channels gradually increases: the spatial structure is simplified and spatial information is converted into higher-level abstract feature information. The Inception Module extracts high-level features at different levels of abstraction through multiple branches, which is very effective and enriches the network's expressive power.

 

References:
TensorFlow实战 (TensorFlow in Practice)

Paid consultation is welcome (150 RMB per hour). Contact: qingxingfengzi
