"TensorFlow Combat" tensorflow realization of the classical convolutional neural network vggnet


VGGNet

VGGNet is a deep convolutional neural network developed by the Visual Geometry Group at the University of Oxford together with researchers at Google DeepMind. VGGNet explores the relationship between the depth of a convolutional neural network and its performance: by repeatedly stacking small 3*3 convolution kernels and 2*2 max-pooling layers, VGGNet successfully constructed convolutional networks of 16 to 19 layers. Compared with previous state-of-the-art network structures, VGGNet's error rate dropped sharply, and it took second place in the classification track and first place in the localization track of ILSVRC 2014. The VGGNet structure is very concise: the entire network uses the same 3*3 convolution kernel size and 2*2 max-pooling size. To this day, VGGNet is still frequently used to extract image features. The trained VGGNet model parameters are open-sourced on the official website and can be re-trained for domain-specific image classification tasks (equivalent to providing very good initialization weights), so it is used in many places.

The VGGNet paper uses 3*3 convolution kernels and 2*2 pooling kernels throughout, improving performance by deepening the network structure. Figure 1 shows the network configurations of the different VGGNet levels, and Figure 2 shows the number of parameters at each level; detailed performance tests were done from the 11-layer network up to the 19-layer network. Most of the parameters sit in the fully connected layers at the back, but the convolutions are what take the most time to train. Configurations D and E are what we usually call VGGNet-16 and VGGNet-19. Configuration C is quite interesting: compared with B it adds a few 1*1 convolution layers. The point of a 1*1 convolution is mainly to apply a linear transformation across channels; here the numbers of input and output channels are unchanged, and no dimensionality reduction takes place.
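To make the role of the 1*1 convolution concrete, here is a minimal sketch (not from the original paper; it uses the same TensorFlow 1.x API as the implementation later in this article, and the 56*56*256 feature-map shape is just an illustrative assumption). A 1*1 convolution with stride 1 leaves the spatial size and channel count unchanged and applies the same linear transformation, followed by a ReLU, to every pixel's channel vector:

import tensorflow as tf

# Illustrative shapes: a 56x56 feature map with 256 channels (hypothetical values).
x = tf.placeholder(tf.float32, [None, 56, 56, 256])

# A 1*1 kernel: [kernel_h, kernel_w, in_channels, out_channels] = [1, 1, 256, 256].
kernel = tf.get_variable("conv1x1_w", shape=[1, 1, 256, 256], dtype=tf.float32)

# Stride 1 and 'SAME' padding: the output keeps the 56x56 spatial size and 256 channels.
y = tf.nn.relu(tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding='SAME'))

print(y.get_shape())  # (?, 56, 56, 256) -- only a per-pixel linear transform plus ReLU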

Figure 1: Network configurations of the different VGGNet levels

Figure 2: Number of parameters at each level

VGGNet has five convolutional blocks. Each block contains two or three convolution layers, and the end of each block is connected to a max-pooling layer that shrinks the feature map. The number of convolution kernels is the same within a block and grows block by block: 64 -- 128 -- 256 -- 512 -- 512.
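To see the block structure at a glance, the short sketch below (plain Python; the 224*224*3 input size is the one assumed by the implementation later in this article) traces the feature-map shape after each block: every max-pooling halves the side length, while the kernel counts follow 64 -- 128 -- 256 -- 512 -- 512:

# Trace the feature-map shape after each of VGGNet-16's five blocks.
side = 224                                    # input is 224x224x3 (as in the code below)
channels_per_block = [64, 128, 256, 512, 512]

for i, channels in enumerate(channels_per_block, start=1):
    side //= 2                                # each block ends with a 2*2, stride-2 max pool
    print("after block %d: %dx%dx%d" % (i, side, side, channels))

# after block 1: 112x112x64
# ...
# after block 5: 7x7x512  -> flattened to 7*7*512 = 25088 for the fully connected layers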

One of the most useful designs in VGGNet is stacking several identical 3*3 convolution layers. Two 3*3 convolution layers in series have the same receptive field as one 5*5 convolution layer, i.e. each output pixel is associated with a 5*5 neighborhood of the input. Three 3*3 convolution layers in series are equivalent in receptive field to one 7*7 convolution layer, but with fewer parameters: only (3*3*3)/(7*7) = 55% as many. Most importantly, three 3*3 convolution layers apply more nonlinear transformations than a single 7*7 convolution layer (the former can use three ReLUs while the latter uses only one), which makes the CNN better at learning features.
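The 55% figure is easy to verify. The sketch below (plain Python; C stands for an arbitrary channel count, assumed equal for input and output, which is the setting in which the comparison holds) compares the weight counts of three stacked 3*3 convolution layers with those of a single 7*7 convolution layer:

# Compare parameter counts for the same C-channel-in, C-channel-out mapping.
C = 256                              # any channel count; the ratio does not depend on it

stacked_3x3 = 3 * (3 * 3 * C * C)    # three 3*3 conv layers
single_7x7 = 7 * 7 * C * C           # one 7*7 conv layer

print(stacked_3x3 / single_7x7)      # 27/49 ~= 0.55, i.e. roughly 55% of the parameters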

VGGNet uses a small trick during training: it first trains the simple level-A network, and then reuses A's weights to initialize the deeper, more complex configurations, so training converges faster. When predicting, VGG uses a Multi-Scale method: the image is scaled to a size Q and fed into the network, a sliding window is applied to the last convolutional layer for classification, the classification results of the windows are averaged, and finally the results obtained with different sizes Q are averaged to get the final prediction. This improves the utilization of the image data and raises prediction accuracy. During training, VGGNet also uses Multi-Scale for data augmentation: the original image is scaled to a different size S, and a 224*224 crop is then taken at random. This greatly increases the amount of data and works very well at preventing model overfitting. In practice, the authors let S vary in the interval [256, 512], obtain multiple versions of the data with Multi-Scale, and train on them together.
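As a rough illustration of this Multi-Scale augmentation (this is not the authors' original preprocessing code; it is a minimal TensorFlow 1.x sketch, with the scale range [256, 512] taken from the paragraph above), one training sample could be produced like this:

import tensorflow as tf

def multi_scale_crop(image):
    """Rescale so the shorter side is a random S in [256, 512], then take a random 224x224 crop."""
    s = tf.random_uniform([], minval=256, maxval=513, dtype=tf.int32)
    shape = tf.shape(image)
    height, width = shape[0], shape[1]
    # Scale so that the shorter side becomes s, keeping the aspect ratio.
    shorter = tf.minimum(height, width)
    scale = tf.cast(s, tf.float32) / tf.cast(shorter, tf.float32)
    new_h = tf.cast(tf.cast(height, tf.float32) * scale, tf.int32)
    new_w = tf.cast(tf.cast(width, tf.float32) * scale, tf.int32)
    resized = tf.image.resize_images(image, [new_h, new_w])
    # Random 224x224 crop plus a random horizontal flip (also used in the paper).
    crop = tf.random_crop(resized, [224, 224, 3])
    return tf.image.random_flip_left_right(crop)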

The published results show that with Multi-Scale training both D and E reach a 7.5% error rate. The version finally submitted to ILSVRC 2014 is a fusion of six Single-Scale networks of different levels with a Multi-Scale D network, which reached a 7.3% error rate. After the competition, however, the authors found that fusing only the Multi-Scale D and E networks gives better results: the error rate reaches 7.0%, and with other optimization strategies the final error rate reaches 6.8%, very close to that year's champion, Google's Inception Net. When comparing the networks of the different levels, the authors also summarize the following points:

1. The LRN layer has little effect;

2. The deeper the network, the better the performance;

3. 1*1 convolutions are also effective, but not as good as 3*3 convolutions; larger convolution kernels can learn larger spatial features.

VGGNet-16 implementation:

from datetime import datetime
import math
import time
import tensorflow as tf

# Define a convolutional layer
def conv_op(input_op, name, kh, kw, n_out, dh, dw, p):
    # input_op is the input tensor
    # name is the name of this layer
    # kh is the kernel height, kw is the kernel width
    # n_out is the number of convolution kernels, i.e. the number of output channels
    # dh is the stride height, dw is the stride width
    # p is the parameter list
    n_in = input_op.get_shape()[-1].value  # number of channels of input_op
    with tf.name_scope(name) as scope:
        kernel = tf.get_variable(scope + "w",
                                 shape=[kh, kw, n_in, n_out],
                                 dtype=tf.float32,
                                 initializer=tf.contrib.layers.xavier_initializer_conv2d())
        conv = tf.nn.conv2d(input_op, kernel, (1, dh, dw, 1), padding='SAME')
        bias_init_val = tf.constant(0.0, shape=[n_out], dtype=tf.float32)
        biases = tf.Variable(bias_init_val, trainable=True, name='b')
        z = tf.nn.bias_add(conv, biases)
        activation = tf.nn.relu(z, name=scope)
        p += [kernel, biases]
        return activation

# Define a fully connected layer
def fc_op(input_op, name, n_out, p):
    n_in = input_op.get_shape()[-1].value
    with tf.name_scope(name) as scope:
        kernel = tf.get_variable(scope + "w",
                                 shape=[n_in, n_out],  # [input channels, output channels]
                                 dtype=tf.float32,
                                 initializer=tf.contrib.layers.xavier_initializer())
        # Initialize biases to 0.1 instead of 0 to avoid dead neurons
        biases = tf.Variable(tf.constant(0.1, shape=[n_out], dtype=tf.float32), name='b')
        activation = tf.nn.relu_layer(input_op, kernel, biases, name=scope)
        p += [kernel, biases]
        return activation

# Define the max-pooling layer
def mpool_op(input_op, name, kh, kw, dh, dw):
    return tf.nn.max_pool(input_op,
                          ksize=[1, kh, kw, 1],
                          strides=[1, dh, dw, 1],
                          padding='SAME',
                          name=name)

# Define the VGGNet-16 network structure
def inference_op(input_op, keep_prob):
    p = []
    # assume input_op shape is 224x224x3

    # block 1 -- outputs 112x112x64
    conv1_1 = conv_op(input_op, name="conv1_1", kh=3, kw=3, n_out=64, dh=1, dw=1, p=p)
    conv1_2 = conv_op(conv1_1, name="conv1_2", kh=3, kw=3, n_out=64, dh=1, dw=1, p=p)
    pool1 = mpool_op(conv1_2, name="pool1", kh=2, kw=2, dw=2, dh=2)

    # block 2 -- outputs 56x56x128
    conv2_1 = conv_op(pool1, name="conv2_1", kh=3, kw=3, n_out=128, dh=1, dw=1, p=p)
    conv2_2 = conv_op(conv2_1, name="conv2_2", kh=3, kw=3, n_out=128, dh=1, dw=1, p=p)
    pool2 = mpool_op(conv2_2, name="pool2", kh=2, kw=2, dh=2, dw=2)

    # block 3 -- outputs 28x28x256
    conv3_1 = conv_op(pool2, name="conv3_1", kh=3, kw=3, n_out=256, dh=1, dw=1, p=p)
    conv3_2 = conv_op(conv3_1, name="conv3_2", kh=3, kw=3, n_out=256, dh=1, dw=1, p=p)
    conv3_3 = conv_op(conv3_2, name="conv3_3", kh=3, kw=3, n_out=256, dh=1, dw=1, p=p)
    pool3 = mpool_op(conv3_3, name="pool3", kh=2, kw=2, dh=2, dw=2)

    # block 4 -- outputs 14x14x512
    conv4_1 = conv_op(pool3, name="conv4_1", kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
    conv4_2 = conv_op(conv4_1, name="conv4_2", kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
    conv4_3 = conv_op(conv4_2, name="conv4_3", kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
    pool4 = mpool_op(conv4_3, name="pool4", kh=2, kw=2, dh=2, dw=2)

    # Up to this point each block of VGGNet-16 halves the side length of the image,
    # while the number of output channels doubles (until it reaches 512).
    # block 5 -- outputs 7x7x512
    conv5_1 = conv_op(pool4, name="conv5_1", kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
    conv5_2 = conv_op(conv5_1, name="conv5_2", kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
    conv5_3 = conv_op(conv5_2, name="conv5_3", kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
    pool5 = mpool_op(conv5_3, name="pool5", kh=2, kw=2, dw=2, dh=2)

    # Flatten: the output of the fifth convolutional block is flattened into a
    # one-dimensional vector of 7*7*512 = 25088 elements
    shp = pool5.get_shape()
    flattened_shape = shp[1].value * shp[2].value * shp[3].value
    resh1 = tf.reshape(pool5, [-1, flattened_shape], name="resh1")

    # Fully connected layers
    fc6 = fc_op(resh1, name="fc6", n_out=4096, p=p)
    fc6_drop = tf.nn.dropout(fc6, keep_prob, name="fc6_drop")

    fc7 = fc_op(fc6_drop, name="fc7", n_out=4096, p=p)
    fc7_drop = tf.nn.dropout(fc7, keep_prob, name="fc7_drop")

    fc8 = fc_op(fc7_drop, name="fc8", n_out=1000, p=p)
    softmax = tf.nn.softmax(fc8)
    predictions = tf.argmax(softmax, 1)
    return predictions, softmax, fc8, p

# Define the evaluation function
def time_tensorflow_run(session, target, feed, info_string):
    num_steps_burn_in = 10
    total_duration = 0.0
    total_duration_squared = 0.0
    for i in range(num_batches + num_steps_burn_in):
        start_time = time.time()
        _ = session.run(target, feed_dict=feed)
        duration = time.time() - start_time
        if i >= num_steps_burn_in:
            if not i % 10:
                print('%s: step %d, duration = %.3f' %
                      (datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration
    mn = total_duration / num_batches
    vr = total_duration_squared / num_batches - mn * mn
    sd = math.sqrt(vr)
    print('%s: %s across %d steps, %.3f +/- %.3f sec / batch' %
          (datetime.now(), info_string, num_batches, mn, sd))

# Define the main benchmark function; the input data is still randomly generated
def run_benchmark():
    with tf.Graph().as_default():
        image_size = 224
        images = tf.Variable(tf.random_normal([batch_size, image_size, image_size, 3],
                                               dtype=tf.float32,
                                               stddev=1e-1))
        keep_prob = tf.placeholder(tf.float32)
        predictions, softmax, fc8, p = inference_op(images, keep_prob)
        init = tf.global_variables_initializer()
        config = tf.ConfigProto()
        config.gpu_options.allocator_type = 'BFC'
        sess = tf.Session(config=config)
        sess.run(init)
        time_tensorflow_run(sess, predictions, {keep_prob: 1.0}, "Forward")
        objective = tf.nn.l2_loss(fc8)
        grad = tf.gradients(objective, p)
        time_tensorflow_run(sess, grad, {keep_prob: 0.5}, "Forward-backward")

batch_size = 32
num_batches = 100
run_benchmark()

The computational complexity of VGGNet-16 is much higher than that of AlexNet, but it also brings a large improvement in accuracy.

"TensorFlow Combat" tensorflow realization of the classical convolutional neural network vggnet

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.