Learning Note TF032: Implementing Google Inception Net

Google Inception Net took first place in the ILSVRC 2014 competition. Its strength is that it keeps the amount of computation and the number of parameters under control while still classifying very well. Inception V1 has a top-5 error rate of 6.67%, is 22 layers deep, needs about 1.5 billion floating-point operations, and has about 5 million parameters (AlexNet has 60 million). V1 cuts the parameter count for two reasons: the more parameters a model has, the more high-quality data it needs to learn from, and the more parameters, the higher the cost in computing resources.

The model is deeper and more expressive, and it gets there in two ways. First, it removes the last fully connected layer and replaces it with a global average pooling layer (which reduces the feature map to 1x1). This reduces parameters, speeds up training, and reduces overfitting, an idea taken from the "Network In Network" (NIN) paper. Second, the Inception module improves how efficiently parameters are used. It is a small network inside the large network and, compared with NIN, adds branches.

NIN cascades convolutional layers with MLPConv layers. In general, giving a convolutional layer more output channels increases its expressive power, but it also increases computation and the risk of overfitting. Each output channel corresponds to one filter, and a filter shares its parameters across positions, so it can extract only one kind of feature. NIN's MLPConv allows information to be combined across output channels. An MLPConv layer is an ordinary convolutional layer followed by a 1x1 convolution and a ReLU activation.

The Inception module has 4 branches. The first branch applies a 1x1 convolution to the input. A 1x1 convolution organizes information across channels, improves the network's expressive power, and can raise or lower the number of output channels. All 4 branches use a 1x1 convolution as a low-cost cross-channel feature transformation. The second branch is a 1x1 convolution followed by a 3x3 convolution, that is, two feature transformations. The third branch is a 1x1 convolution followed by a 5x5 convolution. The fourth branch is a 3x3 max pooling followed by a 1x1 convolution. The 1x1 convolution is very cost-effective: for a small amount of computation it adds one more feature transformation and nonlinearity. The outputs of the 4 branches are merged by an aggregation operation that concatenates them along the output channel dimension. The Inception module thus contains convolutions of 3 different sizes and 1 max pooling, which makes the network more adaptable to different scales. The network can be expanded efficiently in depth and width, improving accuracy without overfitting.
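To make the "cost-effective" claim about 1x1 convolutions concrete, here is a minimal sketch that counts multiply-adds for a 5x5 branch with and without a 1x1 reduction in front of it. The feature map size and channel counts are hypothetical values picked only for illustration, and the helper conv_mult_adds is not part of any library.

```python
def conv_mult_adds(height, width, in_ch, out_ch, k):
    """Multiply-adds of a k x k convolution with stride 1 and SAME padding."""
    return height * width * in_ch * out_ch * k * k

# Hypothetical sizes, chosen only for illustration.
H, W, C_IN = 28, 28, 192
REDUCE, C_OUT = 16, 32

# 5x5 convolution applied directly to all input channels.
direct = conv_mult_adds(H, W, C_IN, C_OUT, 5)

# 1x1 convolution down to REDUCE channels, then the 5x5 convolution.
bottleneck = (conv_mult_adds(H, W, C_IN, REDUCE, 1)
              + conv_mult_adds(H, W, REDUCE, C_OUT, 5))

print("direct 5x5:    ", direct)
print("1x1 then 5x5:  ", bottleneck)
print("reduction factor: %.1f" % (direct / bottleneck))
```

With these assumed sizes the 1x1 reduction cuts the cost of the branch by roughly an order of magnitude, which is why every branch can afford one.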

The main goal of Inception Net is to find the optimal sparse structural unit (the Inception module). It draws on the Hebbian principle: when neural reflex activity persists and repeats, the stability of the connections between neurons is lastingly increased. When two neurons are close and one repeatedly and persistently takes part in exciting the other, metabolic changes turn it into a cell that can excite the other. In short, "cells that fire together, wire together": the learning process changes the synaptic strength between neurons. The paper "Provable Bounds for Learning Some Deep Representations" argues that if the probability distribution of a dataset can be expressed by a very large, sparse neural network, then the best way to construct the network is layer by layer: cluster the highly correlated nodes of the previous layer and connect each small cluster together. Highly correlated nodes should be connected to each other.

In image data, neighboring regions are highly correlated, so adjacent pixels are connected by convolution. With multiple convolution kernels, the outputs at the same spatial location in different channels are also highly correlated. Slightly larger convolutions (3x3, 5x5) also connect highly correlated nodes, so using some larger convolutions where suitable increases diversity. The 4 branches of the Inception module use small convolutions of different sizes (1x1, 3x3, 5x5) to connect highly correlated nodes.

Within an Inception module, the 1x1 convolutions have the highest share of the output channels, and the 3x3 and 5x5 convolutions a slightly lower share. Across the network, multiple Inception modules are stacked. In later Inception modules the spatial concentration of the convolutions should decrease, so that they capture features over larger areas and higher-order abstract features. The later modules therefore give a larger share of output channels to the 3x3 and 5x5 convolutions.

Inception Net has 22 layers. Besides the output of the last layer, the classification obtained at intermediate nodes is also good. The network therefore uses auxiliary classification nodes (auxiliary classifiers): the output of an intermediate layer is used for classification and added to the final classification result with a smaller weight (0.3). This amounts to model fusion, adds back-propagated gradient signal to the network, and provides additional regularization.
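A minimal sketch of the 0.3 weighting, under the assumption that it is applied to the training loss (which is how the Inception V1 paper uses its auxiliary classifiers). The function names and the probability vectors are hypothetical.

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])

def total_loss(main_probs, aux_probs, label, aux_weight=0.3):
    """Main loss plus the auxiliary loss scaled by a smaller weight."""
    return (cross_entropy(main_probs, label)
            + aux_weight * cross_entropy(aux_probs, label))

# Hypothetical predictions for a 3-class problem; the true class is 0.
main_probs = [0.7, 0.2, 0.1]
aux_probs = [0.5, 0.3, 0.2]
print("combined loss: %.4f" % total_loss(main_probs, aux_probs, label=0))
```

Because the auxiliary term carries a weight below 1, the final classifier still dominates the gradient, while the intermediate layers receive an extra training signal.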

The Google Inception Net family:
September 2014, "Going Deeper with Convolutions": Inception V1, top-5 error rate 6.67%.
February 2015, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift": Inception V2, top-5 error rate 4.8%.
December 2015, "Rethinking the Inception Architecture for Computer Vision": Inception V3, top-5 error rate 3.5%.
February 2016, "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning": Inception V4, top-5 error rate 3.08%.

Inception V2 uses two 3x3 convolutions in place of one large 5x5 convolution, which reduces the number of parameters and reduces overfitting, and it proposes the Batch Normalization (BN) method. BN is a very effective regularization method: it speeds up the training of large convolutional networks many times over, and the classification accuracy after convergence improves considerably. BN normalizes the data within each mini-batch so that the output follows an N(0,1) normal distribution, which reduces internal covariate shift (changes in the distribution of the internal neurons' data). In a traditional deep neural network, the input distribution of every layer keeps changing, so only a small learning rate can be used. With BN on every layer, the learning rate can be increased many times and the number of iterations drops to 1/14 of the original, shortening training time. BN also acts as a regularizer, so dropout can be reduced or removed, which simplifies the network structure.
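The normalization step itself is short. Below is a minimal sketch for a single feature over one mini-batch, assuming the usual formulation with a learned scale (gamma) and shift (beta); the moving averages used at inference time are left out, and the function name is hypothetical.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize one feature over a mini-batch, then scale and shift it."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n  # biased batch variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

# Hypothetical activations of one neuron across a mini-batch of 4 samples.
activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(activations)
print(["%.3f" % v for v in normalized])
```

The small eps keeps the division stable when a batch has almost no variance; with gamma=1 and beta=0 the output has mean 0 and variance very close to 1.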

Several adjustments go together with BN: increase the learning rate and speed up learning rate decay (to suit BN-normalized data), remove dropout, reduce L2 regularization, remove LRN, shuffle the training samples more thoroughly, and reduce the photometric distortions applied during data augmentation (BN trains faster, so each sample is seen fewer times, and more realistic samples help training).

Inception V3 introduces the idea of "Factorization into small convolutions": a larger two-dimensional convolution is split into two smaller one-dimensional convolutions. This saves a large number of parameters, speeds up computation, reduces overfitting, adds one more layer of nonlinearity, and extends the model's expressive power. This asymmetric split works noticeably better than splitting symmetrically into several identical small kernels: it handles more and richer spatial features and increases feature diversity.
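The parameter savings from both kinds of factorization can be checked with simple arithmetic. In this sketch the channel count is a hypothetical value (the ratios do not depend on it), biases are ignored, and conv_weights is an illustrative helper rather than a library function.

```python
def conv_weights(kh, kw, in_ch, out_ch):
    """Number of weights in a kh x kw convolution (biases ignored)."""
    return kh * kw * in_ch * out_ch

C = 192  # hypothetical channel count, the same for input and output

# Inception V2: one 5x5 convolution versus two stacked 3x3 convolutions.
full_5x5 = conv_weights(5, 5, C, C)
two_3x3 = 2 * conv_weights(3, 3, C, C)

# Inception V3: one 7x7 convolution versus a 1x7 followed by a 7x1.
full_7x7 = conv_weights(7, 7, C, C)
factorized = conv_weights(1, 7, C, C) + conv_weights(7, 1, C, C)

print("two 3x3 / 5x5:   %.2f" % (two_3x3 / full_5x5))
print("1x7+7x1 / 7x7:   %.2f" % (factorized / full_7x7))
```

The symmetric split keeps 18/25 of the weights, while the asymmetric 1xn plus nx1 split keeps only 2/n of them, so the gain grows with the kernel size.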

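Inception V3 also optimizes the structure of the Inception module. There are three kinds of module, for 35x35, 17x17, and 8x8 feature maps. Branches are used inside branches (in the 8x8 structure), giving a "network in network in network". Inception V4, compared with V3, mainly incorporates Microsoft's ResNet.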

The rest of this note uses tf.contrib.slim to help design a 42-layer Inception V3 network.

Inception V3 network structure

Type                     Kernel size/stride (or note)   Input size
Convolution              3x3/2                          299x299x3
Convolution              3x3/1                          149x149x32
Convolution              3x3/1                          147x147x32
Pooling                  3x3/2                          147x147x64
Convolution              3x3/1                          73x73x64
Convolution              3x3/2                          71x71x80
Convolution              3x3/1                          35x35x192
Inception module group   3 x Inception module           35x35x288
Inception module group   5 x Inception module           17x17x768
Inception module group   3 x Inception module           8x8x1280
Pooling                  8x8                            8x8x2048
Linear                   logits                         1x1x2048
Softmax                  classification output          1x1x1000

First define a simple function, trunc_normal, that produces a truncated normal distribution.

Then define the function inception_v3_arg_scope, which generates the default parameters for the functions commonly used in the network: the activation function of the convolutions, the weight initialization method, and the normalizer. The L2 regularization weight_decay defaults to 0.00004, the standard deviation stddev defaults to 0.1, and the parameter batch_norm_var_collection defaults to moving_vars.

Define the batch normalization parameter dictionary: the decay factor decay is 0.997, epsilon is 0.001, and updates_collections is tf.GraphKeys.UPDATE_OPS. In the variables_collections dictionary, beta and gamma are set to None, and moving_mean and moving_variance are set to batch_norm_var_collection.

slim.arg_scope automatically assigns default values to function parameters. With slim.arg_scope([slim.conv2d, slim.fully_connected], weights_regularizer=slim.l2_regularizer(weight_decay)), the two functions in [slim.conv2d, slim.fully_connected] get their parameters assigned automatically: the parameter weights_regularizer defaults to slim.l2_regularizer(weight_decay). You no longer need to set the parameter on every call, only where you want to change it.

A second slim.arg_scope is nested inside the first to assign default parameter values to the convolution layer function slim.conv2d: the weight initializer weights_initializer is set to trunc_normal(stddev), the activation function is set to ReLU, the normalizer is set to slim.batch_norm, and the normalizer parameters are set to batch_norm_params. The function returns the scope defined this way.

Define the function inception_v3_base, which generates the convolutional part of the Inception V3 network. The parameter inputs is the tensor of input image data, and scope is the environment holding the functions' default parameters. Define the dictionary end_points to save the key nodes. Use slim.arg_scope to set the default parameters of slim.conv2d, slim.max_pool2d, and slim.avg_pool2d: stride is set to 1 and padding is set to VALID. For the convolutional layers outside the Inception modules, slim.conv2d creates the layer: the first parameter is the input tensor, the second the number of output channels, the third the kernel size, the fourth the stride, and the fifth the padding mode. The first convolution has 32 output channels, a 3x3 kernel, stride 2, and padding mode VALID.

The convolutional layers outside the Inception modules mainly use small 3x3 kernels. Following the idea of Factorization into small convolutions, two one-dimensional convolutions are used to simulate a large two-dimensional convolution, which reduces the number of parameters and adds nonlinearity. A 1x1 convolution provides low-cost cross-channel feature combination. The first convolution has stride 2 and the remaining convolutions have stride 1. The pooling layers are overlapping max pooling with size 3x3 and stride 2. The network input is 299x299x3; after 3 layers with stride 2, the size is reduced to 35x35x192. The spatial size shrinks greatly while the number of output channels grows a lot. In total, 5 convolutional layers and 2 pooling layers compress the size of the input image data and abstract the image features.
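The path from 299x299x3 to 35x35x192 can be traced with the standard output-size formulas for VALID and SAME padding. The per-layer channel counts and padding modes in this sketch are assumptions modelled on the slim reference implementation; the text above only fixes the input size, the output size, and the count of 5 convolutions and 2 poolings.

```python
import math

def out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    if padding == "SAME":
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1  # VALID

# (name, kernel, stride, padding, output channels or None for pooling)
# Layer settings are assumed, see the note above the code.
STEM = [
    ("conv 3x3/2", 3, 2, "VALID", 32),
    ("conv 3x3/1", 3, 1, "VALID", 32),
    ("conv 3x3/1", 3, 1, "SAME", 64),
    ("max pool 3x3/2", 3, 2, "VALID", None),
    ("conv 1x1/1", 1, 1, "VALID", 80),
    ("conv 3x3/1", 3, 1, "VALID", 192),
    ("max pool 3x3/2", 3, 2, "VALID", None),
]

def trace(size=299, channels=3):
    """Return the output shape after each layer of the stem."""
    shapes = []
    for name, kernel, stride, padding, out_ch in STEM:
        size = out_size(size, kernel, stride, padding)
        if out_ch is not None:  # pooling keeps the channel count
            channels = out_ch
        shapes.append((name, size, size, channels))
    return shapes

for name, h, w, c in trace():
    print("%-16s -> %dx%dx%d" % (name, h, w, c))
```

Only the three stride-2 layers change the spatial size substantially (299 to 149, 147 to 73, 71 to 35); the VALID 3x3 convolutions each trim 2 pixels.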

Next come three consecutive Inception module groups.

The 1st Inception module group contains 3 Inception modules with similar structure.

The 1st Inception module of the 1st group is named Mixed_5b. slim.arg_scope sets the default parameters for all the Inception module groups: every convolutional, max pooling, and average pooling layer defaults to stride 1 and padding mode SAME. The variable_scope of this module is named Mixed_5b. It has 4 branches, Branch_0 to Branch_3. The first branch is a 1x1 convolution with 64 output channels. The second branch is a 1x1 convolution with 48 output channels followed by a 5x5 convolution with 64 output channels. The third branch is a 1x1 convolution with 64 output channels followed by two 3x3 convolutions with 96 output channels each. The fourth branch is a 3x3 average pooling followed by a 1x1 convolution with 32 output channels. Finally, tf.concat merges the 4 branch outputs along dimension 3 (the output channel dimension) to produce the final output of the module. Because every layer has stride 1 and padding mode SAME, the image size does not shrink and stays 35x35. The number of channels grows to the sum over the 4 branches, 64+64+96+32=256, so the final output tensor is 35x35x256.

The 2nd Inception module of the 1st group is named Mixed_5c. Stride is 1 and padding mode is SAME. It also has 4 branches; the difference is that the fourth branch ends in a 1x1 convolution with 64 output channels. The output tensor is 35x35x288.

The 3rd Inception module of the 1st group is named Mixed_5d. The output tensor is 35x35x288.
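The output sizes of the first group follow directly from concatenating the branches along the channel axis. In this small check, the branch widths of Mixed_5b and the 64-channel fourth branch of Mixed_5c come from the text; the remaining widths of Mixed_5c and all widths of Mixed_5d are assumed to match, since the note only gives their output size.

```python
# Output channels of the last layer in each branch (Branch_0 .. Branch_3).
# Entries not stated in the text are assumptions, see the note above.
GROUP_1 = {
    "Mixed_5b": [64, 64, 96, 32],
    "Mixed_5c": [64, 64, 96, 64],
    "Mixed_5d": [64, 64, 96, 64],
}

def concat_shape(height, width, branch_channels):
    """Shape after concatenating same-sized branches along the channel axis."""
    return (height, width, sum(branch_channels))

shapes = {name: concat_shape(35, 35, ch) for name, ch in GROUP_1.items()}
for name, shape in shapes.items():
    print(name, "x".join(str(d) for d in shape))
```

Since all layers in this group use stride 1 and SAME padding, only the channel dimension changes from module to module.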

The 2nd Inception module group contains 5 Inception modules. The 2nd to 5th modules have similar structure.

The 1st Inception module of the 2nd group is named Mixed_6a. It has 3 branches. The first branch is a 3x3 convolution with 384 output channels, stride 2, and padding mode VALID, which compresses the image size to 17x17. The second branch has 3 layers: a 1x1 convolution with 64 output channels and two 3x3 convolutions with 96 output channels; the last layer has stride 2 and padding mode VALID, so the branch output tensor is 17x17x96. The third branch is a 3x3 max pooling layer with stride 2 and padding mode VALID; pooling keeps the 288 input channels, so the branch output tensor is 17x17x288. The output channels of the three branches are concatenated, and the final output size is 17x17x(384+96+288) = 17x17x768. All 5 Inception modules of the 2nd group have this same output size.
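The shape arithmetic of this size-reducing module can be verified in a few lines. The sketch assumes the module receives the 35x35x288 output of Mixed_5d; the helper names are hypothetical.

```python
def valid_out(size, kernel, stride):
    """Output size of a layer with VALID padding."""
    return (size - kernel) // stride + 1

def mixed_6a_shape(in_size=35, in_channels=288):
    """Output shape of the 3-branch module described above."""
    out = valid_out(in_size, kernel=3, stride=2)  # same for all 3 branches
    branch_channels = [
        384,          # 3x3 convolution, stride 2
        96,           # 1x1 -> 3x3 -> 3x3 (stride 2)
        in_channels,  # max pooling keeps the input channel count
    ]
    return (out, out, sum(branch_channels))

print(mixed_6a_shape())
```

Note that the pooling branch contributes 288 channels, the channel count of its input, which is what makes the total come to 768.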

The 2nd Inception module of the 2nd group is named Mixed_6b. It has 4 branches. The first branch is a 1x1 convolution with 192 output channels. The second branch has 3 layers: a 1x1 convolution with 128 output channels, a 1x7 convolution with 128 output channels, and a 7x1 convolution with 192 output channels. This is the Factorization into small convolutions idea: a 1x7 convolution in series with a 7x1 convolution is equivalent to a synthesized 7x7 convolution, but the number of parameters is greatly reduced, overfitting is reduced, and an extra activation function is added, which strengthens the nonlinear feature transformation. The third branch has 5 layers: a 1x1 convolution with 128 output channels, a 7x1 convolution with 128 output channels, a 1x7 convolution with 128 output channels, a 7x1 convolution with 128 output channels, and a 1x7 convolution with 192 output channels. It is a model example of Factorization into small convolutions, splitting the 7x7 convolution repeatedly. The fourth branch is a 3x3 average pooling layer followed by a 1x1 convolution with 192 output channels. The four branches are merged, and the final output tensor is 17x17x(192+192+192+192) = 17x17x768.

The 3rd Inception module of the 2nd group is named Mixed_6c. In the second and third branches, the number of output channels of the earlier convolutional layers changes from 128 to 160, while the final output of each branch still has 192 channels. Every time the network passes through an Inception module, the features are refined once more, even if the output size stays the same. The rich set of convolutions and nonlinearities improves network performance.

The 4th Inception module of the 2nd group is named Mixed_6d.

The 5th Inception module of the 2nd group is named Mixed_6e. Mixed_6e is stored in end_points so that it can serve as the input of the auxiliary classifier that assists the model's classification.

The 3rd Inception module group contains 3 Inception modules. The 2nd and 3rd modules have similar structure.

The 1st Inception module of the 3rd group is named Mixed_7a. It has 3 branches. The first branch has 2 layers: a 1x1 convolution with 192 output channels followed by a 3x3 convolution with 320 output channels, stride 2, and padding mode VALID, which compresses the image size to 8x8. The second branch has 4 layers: a 1x1 convolution with 192 output channels, a 1x7 convolution with 192 output channels, a 7x1 convolution with 192 output channels, and a 3x3 convolution with 192 output channels; the last layer has stride 2 and padding mode VALID, so the branch output tensor is 8x8x192. The third branch is a 3x3 max pooling layer with stride 2 and padding mode VALID; the pooling layer does not change the number of channels, so the branch output tensor is 8x8x768. The output channels of the three branches are concatenated, and the final output size is 8x8x(320+192+768) = 8x8x1280. Starting with this Inception module, the output image size shrinks, the number of channels increases, and the total size of the tensor drops.

The 2nd Inception module of the 3rd group is named Mixed_7b. It has 4 branches. The first branch is a 1x1 convolution with 320 output channels. The second branch starts with a 1x1 convolution with 384 output channels and then splits into 2 sub-branches, a 1x3 convolution with 384 output channels and a 3x1 convolution with 384 output channels; tf.concat merges the two sub-branches into an output tensor of size 8x8x(384+384) = 8x8x768. The third branch starts with a 1x1 convolution with 448 output channels, followed by a 3x3 convolution with 384 output channels, and then splits into 2 sub-branches, a 1x3 convolution with 384 output channels and a 3x1 convolution with 384 output channels, which are merged into an 8x8x768 output tensor. The fourth branch is a 3x3 average pooling layer followed by a 1x1 convolution with 192 output channels. The four branches are merged, and the final output tensor is 8x8x(320+768+768+192) = 8x8x2048. In this Inception module, the number of output channels increases from 1280 to 2048.
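The "branches inside branches" layout is easy to express as a nested structure. This is only a sketch of the channel bookkeeping: an integer stands for the output channels of a branch's last layer, a list stands for a concatenation, and the names channels and MIXED_7B are hypothetical.

```python
def channels(node):
    """An int is a layer's output channels; a list is a concat of branches."""
    if isinstance(node, int):
        return node
    return sum(channels(child) for child in node)

MIXED_7B = [
    320,          # Branch_0: 1x1 convolution
    [384, 384],   # Branch_1: 1x3 and 3x1 sub-branches, concatenated
    [384, 384],   # Branch_2: 1x3 and 3x1 sub-branches, concatenated
    192,          # Branch_3: average pooling then 1x1 convolution
]

print("Branch_1 channels:", channels(MIXED_7B[1]))
print("module output: 8x8x%d" % channels(MIXED_7B))
```

The inner concatenations are what let a single branch emit 768 channels even though each of its last convolutions has only 384.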

The 3rd Inception module of the 3rd group is named Mixed_7c. The result of this Inception module is returned as the final output of the inception_v3_base function.

To summarize the Inception V3 network structure: first comes an ordinary structure of 5 convolutional layers alternating with 2 pooling layers, then 3 Inception module groups, each containing several Inception modules of similar structure. An important principle in designing Inception Net is that the image size keeps shrinking, from 299x299 down to 8x8 after 5 convolutional or pooling layers with stride 2, while the number of output channels keeps growing, from 3 at the start (the RGB colors) to 2048. With each convolutional layer, pooling layer, or Inception module group, the spatial structure is simplified and spatial information is converted into higher-order abstract feature information, that is, the spatial dimensions are converted into the channel dimension. The total size of each layer's output tensor keeps falling, which reduces the amount of computation. The Inception modules follow a pattern. There are generally 4 branches: the 1st branch is a 1x1 convolution; the 2nd branch is a 1x1 convolution followed by factorized 1xn and nx1 convolutions; the 3rd branch is similar to the 2nd but deeper; the 4th branch is max pooling or average pooling. An Inception module combines simple feature abstraction (branch 1), more complex feature abstraction (branches 2 and 3), and a structure-simplifying pooling layer (branch 4). With these 4 different degrees of feature abstraction and transformation, it can selectively retain high-order features at different levels, maximizing the expressive power of the network.

The last part of the network is global average pooling, softmax, and the auxiliary logits. The function inception_v3 takes the following parameters. num_classes is the number of classes to predict at the end, 1000 by default, the number of classes in the ILSVRC competition dataset. is_training flags whether this is the training process; batch normalization and dropout are only enabled during training. dropout_keep_prob is the fraction of nodes dropout keeps during training, 0.8 by default. prediction_fn is the classification function, slim.softmax by default. spatial_squeeze flags whether the output is squeezed (removing the dimensions of size 1). reuse flags whether the network and its variables are reused. scope holds the environment of the functions' default parameters. The function defines the network name and the default value of reuse with tf.variable_scope, and defines the default is_training flag of batch normalization and dropout with slim.arg_scope. It then builds the whole convolutional part of the network with inception_v3_base, obtaining the output of the last layer, net, and the dictionary of important nodes, end_points.

The auxiliary logits are an auxiliary classification node that helps predict the classification result. slim.arg_scope sets the default stride of convolution, max pooling, and average pooling to 1 and the default padding mode to SAME. Mixed_6e is fetched through end_points and followed by a 5x5 average pooling with stride 3 and padding VALID, which changes the size from 17x17x768 to 5x5x768. Next come a 1x1 convolution with 128 output channels and a 5x5 convolution with 768 output channels. For the 5x5 convolution, the weight initializer is reset to a normal distribution with standard deviation 0.01 and the padding mode is VALID, so the output size becomes 1x1x768. A final layer brings the output to 1x1x1000. The tf.squeeze function then removes the two size-1 dimensions of the output tensor. Finally, the output of the auxiliary classification node, aux_logits, is stored in the dictionary end_points.
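The shape changes in the auxiliary head can be followed step by step. This sketch only tracks shapes in NHWC order; the batch size of 32 is illustrative, the last layer is assumed to be a 1x1 convolution with one output channel per class, and the helpers valid_out and squeeze are written for this note.

```python
def valid_out(size, kernel, stride=1):
    """Output size of a layer with VALID padding."""
    return (size - kernel) // stride + 1

def squeeze(shape, axes):
    """Drop the given axes, which must all have size 1."""
    assert all(shape[a] == 1 for a in axes)
    return [d for i, d in enumerate(shape) if i not in axes]

batch = 32                          # illustrative batch size
shape = [batch, 17, 17, 768]        # Mixed_6e output

s = valid_out(17, kernel=5, stride=3)
shape = [batch, s, s, 768]          # 5x5 average pooling, stride 3, VALID
print("after avg pool:", shape)

shape = [batch, s, s, 128]          # 1x1 convolution, 128 channels
s = valid_out(s, kernel=5)
shape = [batch, s, s, 768]          # 5x5 convolution, VALID
print("after 5x5 conv:", shape)

shape = [batch, s, s, 1000]         # assumed 1x1 convolution to 1000 classes
aux_logits_shape = squeeze(shape, axes=(1, 2))
print("aux_logits:", aux_logits_shape)
```

After the squeeze, each image is left with a plain vector of 1000 class scores, the same layout as the main logits.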

Next is the logic for the ordinary classification prediction. The output of the last convolutional layer, Mixed_7c, goes through an 8x8 global average pooling with padding mode VALID, which changes the output tensor size to 1x1x2048. This is followed by a dropout layer with node retention rate dropout_keep_prob, and then a 1x1 convolution with 1000 output channels whose activation function and normalizer are set to None. tf.squeeze removes the size-1 dimensions of the output tensor, and softmax then produces the prediction. Finally, the function returns the output logits together with end_points, which contains the auxiliary node.
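Global average pooling and softmax are both simple enough to write out by hand. The sketch below works on a tiny hypothetical 2x2x2 feature map instead of the real 8x8x2048 one, and feeds the pooled values straight into softmax for brevity (the real network has dropout and a 1x1 convolution in between).

```python
import math

def global_avg_pool(feature_map):
    """feature_map[h][w][c] -> list with the mean of each channel."""
    height, width = len(feature_map), len(feature_map[0])
    depth = len(feature_map[0][0])
    return [
        sum(feature_map[i][j][k] for i in range(height) for j in range(width))
        / (height * width)
        for k in range(depth)
    ]

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 2x2 feature map with 2 channels.
feature_map = [
    [[1.0, 0.0], [2.0, 0.0]],
    [[3.0, 0.0], [4.0, 0.0]],
]
pooled = global_avg_pool(feature_map)
probs = softmax(pooled)
print("pooled:", pooled)
print("probabilities:", ["%.3f" % p for p in probs])
```

Global average pooling has no weights at all, which is where the parameter savings over a fully connected layer come from.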

The Inception V3 network is now complete. Choosing the hyperparameters, including the number of layers, the kernel sizes, the positions of the pooling layers, the strides, when to use factorization, and the design of the branches, takes a great deal of exploration and practice.

Now test the running performance of Inception V3. The network structure is large, so batch_size is set to 32. The image size is 299x299, and tf.random_uniform generates random image data as input. Load inception_v3_arg_scope() with slim.arg_scope; the scope contains the default parameters of batch normalization, the activation function, and the default parameter initialization. Inside this arg_scope, call the inception_v3 function, pass in inputs, and get logits and end_points. Create a session and initialize all model parameters. Set the number of test batches to 100 and measure the forward performance of the Inception V3 network with time_tensorflow_run.
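The helper time_tensorflow_run is not defined in this note. As a rough, framework-independent stand-in, the sketch below times any callable over a number of batches after a few warm-up runs and reports the mean and standard deviation per batch. The name time_run, the burn_in default, and the dummy workload are all assumptions; in the real test the callable would wrap something like a session.run on logits.

```python
import math
import time

def time_run(step_fn, num_batches, burn_in=10):
    """Time step_fn per call, skipping the first burn_in warm-up calls."""
    durations = []
    for i in range(num_batches + burn_in):
        start = time.perf_counter()
        step_fn()
        elapsed = time.perf_counter() - start
        if i >= burn_in:  # warm-up iterations are not counted
            durations.append(elapsed)
    mean = sum(durations) / num_batches
    variance = sum((d - mean) ** 2 for d in durations) / num_batches
    return mean, math.sqrt(variance)

# Dummy workload standing in for one forward pass.
mean, std = time_run(lambda: sum(range(10000)), num_batches=100)
print("forward across 100 batches: %.6f +/- %.6f sec / batch" % (mean, std))
```

Skipping the first few iterations matters for GPU frameworks in particular, because the first runs include one-time costs such as memory allocation.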

The input image area of Inception V3 is 78% larger than VGGNet's 224x224 (299x299 / 224x224 is about 1.78), yet its forward pass is faster than VGGNet's. It has 25 million parameters: more than the 7 million of Inception V1, less than half of AlexNet's 60 million, and far fewer than VGGNet's 140 million. With 42 layers, the whole network needs only 5 billion floating-point operations, more than the 1.5 billion of Inception V1 but still fewer than VGGNet. It can be deployed on an ordinary server to provide a fast-responding service, or ported to a mobile phone for real-time image recognition.

For the backward performance test of Inception V3, add all the parameters of the whole network to a parameter list and measure the time needed to compute the gradients with respect to all of them. Alternatively, download the ImageNet dataset directly, train with real samples, and evaluate the time required.

In Inception V3, Factorization into small convolutions is very effective: it reduces the number of parameters, reduces overfitting, and increases the network's nonlinear expressive power. From input to output of the convolutional network, the image size gradually shrinks and the number of output channels gradually grows; the spatial structure is simplified and spatial information is converted into higher-order abstract feature information. Using the multiple branches of the Inception module to extract high-order features at different levels of abstraction is very effective and enriches the expressive power of the network.

Resources:
"TensorFlow Practice"
