Learning notes TF031: VGGNet

Source: Internet
Author: User

VGGNet is a deep convolutional neural network developed jointly by the Visual Geometry Group at the University of Oxford and researchers at Google DeepMind. By repeatedly stacking small 3x3 convolution kernels and 2x2 max-pooling layers, it successfully builds networks 16 to 19 layers deep. Compared with earlier state-of-the-art architectures its error rate is substantially lower, and in ILSVRC 2014 it placed 2nd in classification and 1st in localization. VGGNet is highly extensible and generalizes well when transferred to other image data. Its structure is simple: the whole network uses the same convolution kernel size (3x3) and the same max-pooling size (2x2). The trained model parameters have been officially open-sourced, so they can be used as good initialization weights when retraining for a domain-specific image classification task.

ConvNet Configuration

                 A          A-LRN      B          C          D          E
weight layers    11         11         13         16         16         19

input (224x224 RGB image)
                 conv3-64   conv3-64   conv3-64   conv3-64   conv3-64   conv3-64
                            LRN        conv3-64   conv3-64   conv3-64   conv3-64
maxpool
                 conv3-128  conv3-128  conv3-128  conv3-128  conv3-128  conv3-128
                                       conv3-128  conv3-128  conv3-128  conv3-128
maxpool
                 conv3-256  conv3-256  conv3-256  conv3-256  conv3-256  conv3-256
                 conv3-256  conv3-256  conv3-256  conv3-256  conv3-256  conv3-256
                                                  conv1-256  conv3-256  conv3-256
                                                                        conv3-256
maxpool
                 conv3-512  conv3-512  conv3-512  conv3-512  conv3-512  conv3-512
                 conv3-512  conv3-512  conv3-512  conv3-512  conv3-512  conv3-512
                                                  conv1-512  conv3-512  conv3-512
                                                                        conv3-512
maxpool
                 conv3-512  conv3-512  conv3-512  conv3-512  conv3-512  conv3-512
                 conv3-512  conv3-512  conv3-512  conv3-512  conv3-512  conv3-512
                                                  conv1-512  conv3-512  conv3-512
                                                                        conv3-512
maxpool
FC-4096
FC-4096
FC-1000
soft-max

Network                          A, A-LRN   B      C      D      E
Number of parameters (millions)  133        133    134    138    144

The convolution layers hold relatively few parameters; most of the parameters sit in the last three fully connected layers. Training time, however, is dominated by the convolutions, because they account for most of the computation. D is VGGNet-16 and E is VGGNet-19. C has three more layers than B, all of them 1x1 convolutions. A 1x1 convolution is a linear transformation; here the number of input and output channels stays the same, so no dimensionality reduction takes place.
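The split between convolutional and fully connected parameters can be checked with a little arithmetic. The sketch below is only an illustration: it assumes configuration D from the table, counts one bias per output channel, and uses hypothetical helper names (`conv_params`, `fc_params`) that do not appear in the original code.

```python
# Layer widths of configuration D (VGGNet-16): (number of 3x3 layers, channels).
CONV_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
FC_LAYERS = [4096, 4096, 1000]


def conv_params(in_channels=3, kernel=3):
    """Weights plus biases of all convolution layers."""
    total = 0
    for n_layers, out_channels in CONV_BLOCKS:
        for _ in range(n_layers):
            total += kernel * kernel * in_channels * out_channels + out_channels
            in_channels = out_channels
    return total


def fc_params(n_in=7 * 7 * 512):
    """Weights plus biases of the three fully connected layers."""
    total = 0
    for n_out in FC_LAYERS:
        total += n_in * n_out + n_out
        n_in = n_out
    return total


conv_total = conv_params()
fc_total = fc_params()
print(conv_total, fc_total, conv_total + fc_total)
```

Under these assumptions the convolution layers come to roughly 14.7 million parameters and the fully connected layers to roughly 123.6 million, which together match the 138 million listed for D.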

VGGNet has five convolutional segments, each with 2 to 3 convolution layers in VGGNet-16, and each segment ends with a max-pooling layer that shrinks the image size. Within a segment every layer has the same number of convolution kernels, and later segments have more: 64, 128, 256, 512, 512. Several 3x3 convolution layers are stacked together. Two 3x3 convolution layers in series are equivalent to one 5x5 layer, and three in series are equivalent to one 7x7 layer, in the sense that they cover the same receptive field. The stack has fewer parameters and applies more nonlinear transformations, which strengthens its ability to learn features.
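The equivalence and the parameter saving can be worked out directly. This is a minimal sketch, assuming stride-1 convolutions, equal input and output channel counts and no biases; the function names are made up for the illustration.

```python
def stacked_receptive_field(n_layers, kernel=3):
    """Receptive field of n stacked stride-1 convolutions with the same kernel."""
    field = 1
    for _ in range(n_layers):
        field += kernel - 1
    return field


def conv_weights(kernel, channels, n_layers=1):
    """Weights (biases ignored) of n layers with `channels` inputs and outputs."""
    return n_layers * kernel * kernel * channels * channels


channels = 512
for n_layers in (2, 3):
    field = stacked_receptive_field(n_layers)
    stacked = conv_weights(3, channels, n_layers)
    single = conv_weights(field, channels)
    print(f"{n_layers} x 3x3 covers {field}x{field}: {stacked} weights vs {single}")
```

For C channels, three 3x3 layers need 27·C² weights against 49·C² for a single 7x7 layer, and each of the three layers gets its own ReLU.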

Training starts with the simple level-A network, and A's weights are then reused to initialize the more complex models, so training converges faster. Prediction is multi-scale: the image is rescaled to a size Q and fed through the convolutional network. The last convolution layer is classified with a sliding window, the results of the different windows are averaged, and the results for the different sizes Q are averaged again, which improves both the use of the image data and the prediction accuracy. Training also uses multi-scale data augmentation: the original image is rescaled to different sizes and 224x224 patches are cropped from it at random, which increases the amount of data and helps prevent overfitting.
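The training-time augmentation can be sketched as choosing a scale and then a crop position. The example below only computes sizes and crop coordinates. It assumes the shorter image side is resized to a random scale S drawn from [256, 512], a range taken from the VGG paper rather than from these notes, and the function name is hypothetical.

```python
import random


def random_scale_and_crop(height, width, crop=224, s_min=256, s_max=512, rng=random):
    """Pick a training scale S, then a random crop box inside the resized image.

    The shorter image side is resized to S (aspect ratio preserved).
    Returns the resized (height, width) and the box (top, left, bottom, right).
    """
    s = rng.randint(s_min, s_max)  # inclusive on both ends
    ratio = s / min(height, width)
    new_h = max(crop, round(height * ratio))
    new_w = max(crop, round(width * ratio))
    top = rng.randint(0, new_h - crop)
    left = rng.randint(0, new_w - crop)
    return (new_h, new_w), (top, left, top + crop, left + crop)


size, box = random_scale_and_crop(375, 500, rng=random.Random(0))
print(size, box)
```

Because both the scale and the crop position change on every call, the same source image yields many different 224x224 training patches.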

The LRN layer contributes little. The deeper the network, the better the results. 1x1 convolutions are effective, but larger kernels such as 3x3 can learn more spatial features.

Import the system libraries and TensorFlow. The code uses the TensorFlow 1.x API.

The conv_op function creates a convolution layer and stores its parameters in the parameter list. Its arguments are: input_op, the input tensor; name, the layer name; kh and kw, the kernel height and width; n_out, the number of convolution kernels, i.e. the number of output channels; dh and dw, the stride height and width; and p, the parameter list. get_shape()[-1].value gets the number of channels of input_op. tf.name_scope(name) sets the scope. tf.get_variable creates the kernel with shape [kh, kw, n_in, n_out], that is, kernel height, kernel width, input channels and output channels. The kernel is initialized with tf.contrib.layers.xavier_initializer_conv2d().
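To give a feel for what Xavier initialization does to a kernel of shape [kh, kw, n_in, n_out], here is a small sketch. It assumes the uniform variant, which draws weights from [-x, x] with x = sqrt(6 / (fan_in + fan_out)), and that the fans of a convolution kernel include the kh * kw receptive field. The helper name is hypothetical and the code does not call TensorFlow.

```python
import math


def xavier_uniform_limit(kh, kw, n_in, n_out):
    """Bound x of the uniform range [-x, x] for a [kh, kw, n_in, n_out] kernel."""
    fan_in = kh * kw * n_in
    fan_out = kh * kw * n_out
    return math.sqrt(6.0 / (fan_in + fan_out))


first = xavier_uniform_limit(3, 3, 3, 64)     # shape of conv1_1
last = xavier_uniform_limit(3, 3, 512, 512)   # shape of conv5_3
print(first, last)
```

The wider a layer is, the narrower the range of its initial weights, which keeps the scale of the activations roughly stable from layer to layer.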

tf.nn.conv2d convolves input_op with the kernel, using stride dh x dw and SAME padding. The biases are given the value 0 with tf.constant and turned into a trainable parameter with tf.Variable. tf.nn.bias_add adds the biases to the convolution result, and tf.nn.relu applies the nonlinearity to produce activation. Finally, kernel and biases are appended to the parameter list p, and activation is returned as the output of the convolution layer.
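The convolution, bias and ReLU sequence can be reproduced on a tiny example without TensorFlow. This is a simplified sketch, assuming a single channel, stride 1 and an odd kernel size; `conv2d_same` is a made-up name, not a TensorFlow function.

```python
def conv2d_same(image, kernel, bias=0.0):
    """Stride-1, zero-padded ('SAME') 2-D convolution on one channel, then bias and ReLU.

    `image` and `kernel` are lists of lists; the kernel sides must be odd.
    Like tf.nn.conv2d, this is cross-correlation (the kernel is not flipped).
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    pad_h, pad_w = kh // 2, kw // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = bias
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - pad_h, j + dj - pad_w
                    if 0 <= y < h and 0 <= x < w:  # outside the image counts as zero
                        acc += image[y][x] * kernel[di][dj]
            row.append(max(acc, 0.0))  # ReLU
        out.append(row)
    return out


image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
box = [[1.0] * 3 for _ in range(3)]  # sums each 3x3 neighbourhood
activation = conv2d_same(image, box)
print(activation)
```

With SAME padding and stride 1 the output keeps the 3x3 size of the input, which is why the convolution layers in VGGNet do not change the spatial dimensions.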

The fc_op function creates a fully connected layer. It first gets the number of channels of input_op. tf.get_variable creates the layer's weights, whose first dimension is the number of input channels n_in and whose second dimension is the number of output channels n_out, initialized with xavier_initializer. The biases are initialized to 0.1 rather than 0 to avoid dead neurons. tf.nn.relu_layer multiplies input_op by kernel, adds biases and applies the ReLU nonlinearity to produce activation. kernel and biases are appended to the parameter list p, and activation is returned.
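What the fully connected layer computes is relu(x · W + b). The sketch below spells that out in plain Python for a batch of one sample; the values are invented for illustration and the function only mimics the arithmetic, not the TensorFlow op itself.

```python
def relu_layer(x, weights, biases):
    """Compute relu(x @ weights + biases) for a batch of row vectors."""
    n_out = len(biases)
    out = []
    for row in x:
        z = [sum(row[i] * weights[i][j] for i in range(len(row))) + biases[j]
             for j in range(n_out)]
        out.append([max(v, 0.0) for v in z])
    return out


x = [[1.0, -2.0]]                  # one sample, n_in = 2
weights = [[1.0, 0.5, -1.0],       # shape [n_in, n_out] = [2, 3]
           [0.0, 1.0, 1.0]]
biases = [0.1, 0.1, 0.1]           # same starting value as in fc_op
result = relu_layer(x, weights, biases)
print(result)
```

Two of the three units receive a negative pre-activation and are clipped to zero. A small positive bias makes it a little less likely that a unit starts out in that inactive region.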

The mpool_op function creates a max-pooling layer. It calls tf.nn.max_pool with input input_op, pooling size kh x kw, stride dh x dw and SAME padding.
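A 2x2 max pool with stride 2 can be sketched on a single channel as follows. The example assumes that SAME padding yields ceil(size / stride) outputs and that a window hanging over the edge simply takes the maximum of the values that exist; both helpers are hypothetical names.

```python
import math


def same_output_size(size, stride):
    """Spatial output size under SAME padding: ceil(size / stride)."""
    return math.ceil(size / stride)


def max_pool_2x2(image):
    """2x2 max pooling with stride 2 on one channel (list of lists)."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(0, h, 2):
        row = []
        for j in range(0, w, 2):
            window = [image[y][x]
                      for y in range(i, min(i + 2, h))
                      for x in range(j, min(j + 2, w))]
            row.append(max(window))
        out.append(row)
    return out


pooled = max_pool_2x2([[1, 2, 3],
                       [4, 5, 6],
                       [7, 8, 9]])
print(pooled, same_output_size(224, 2))
```

For the even sizes that occur in VGGNet (224, 112, 56, 28, 14) the pooling layer halves the height and width exactly.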

The VGGNet-16 network structure has six parts: the first five are convolutional segments and the last is the fully connected network. The function inference_op defines the network structure. Its inputs are input_op and keep_prob, a placeholder that controls the dropout ratio. It begins by initializing the parameter list p.

The first convolutional segment consists of two conv_op layers and one mpool_op layer. The kernels are 3x3, there are 64 of them (the number of output channels), and the stride is 1x1, so every pixel is scanned. The first convolution layer takes a 224x224x3 input and produces a 224x224x64 output. The input and output of the second convolution layer are both 224x224x64. The max-pooling layer is 2x2 with stride 2x2, so its output is 112x112x64.

The second convolutional segment has two convolution layers and one max-pooling layer. The number of convolution output channels is 128. The output size is 56x56x128.

The third convolutional segment has three convolution layers and one max-pooling layer. The number of convolution output channels is 256. The output size is 28x28x256.

The fourth convolutional segment has three convolution layers and one max-pooling layer. The number of convolution output channels is 512. The output size is 14x14x512.

The fifth convolutional segment has three convolution layers and one max-pooling layer. The number of convolution output channels is 512. The output size is 7x7x512. For each sample, tf.reshape flattens the output into a one-dimensional vector of length 7x7x512 = 25088.
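The sizes quoted for the five segments follow from one rule: the stride-1 SAME convolutions keep the spatial size and each 2x2 max pool halves it. A short sketch (the function name is an assumption made for this illustration) traces the shapes:

```python
def trace_shapes(size=224, channels=(64, 128, 256, 512, 512)):
    """Output shape (height, width, channels) after each convolutional segment."""
    shapes = []
    for n_out in channels:
        # 3x3 convolutions with stride 1 and SAME padding keep the spatial size;
        # the 2x2 max pool with stride 2 halves it (rounding up).
        size = (size + 1) // 2
        shapes.append((size, size, n_out))
    return shapes


shapes = trace_shapes()
height, width, depth = shapes[-1]
flattened = height * width * depth
print(shapes, flattened)
```

The last entry is 7x7x512, and its product gives the 25088 inputs of the first fully connected layer.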

Next comes a fully connected layer with 4096 hidden units and ReLU activation. It is followed by a Dropout layer whose keep probability is 0.5 during training and 1.0 during prediction.
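The role of keep_prob can be illustrated with a plain-Python version of dropout. This sketch assumes the usual "inverted dropout" behaviour, in which kept values are scaled by 1 / keep_prob so that the expected activation stays the same; it is an illustration, not the TensorFlow implementation.

```python
import random


def dropout(values, keep_prob, rng=random):
    """Keep each value with probability keep_prob and scale it by 1 / keep_prob."""
    if keep_prob >= 1.0:
        return list(values)  # prediction: nothing is dropped
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(1)
train_out = dropout([1.0] * 8, keep_prob=0.5, rng=rng)
predict_out = dropout([1.0] * 8, keep_prob=1.0)
print(train_out, predict_out)
```

With keep_prob 0.5 every output is either 0 or twice the input; with keep_prob 1.0 the layer passes its input through unchanged.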

A second fully connected layer with 4096 hidden units follows, again with a Dropout layer.

The last fully connected layer has 1000 output units, and Softmax turns its output into class probabilities. tf.argmax picks the class with the highest probability. The function returns predictions, softmax, fc8 and the parameter list p.
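The final step, turning the outputs of the last layer into probabilities and a predicted class, can be written in a few lines. This is a generic sketch with three invented logits instead of 1000; subtracting the maximum before exponentiating is a standard numerical-stability trick and an addition of this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


probs = softmax([2.0, 1.0, 0.1])
prediction = argmax(probs)
print(probs, prediction)
```

The probabilities sum to 1 and the predicted class is the index of the largest logit.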

This completes the construction of the VGGNet-16 network structure.

The evaluation function is time_tensorflow_run. Its session.run() call takes a feed_dict, which makes it easy to pass in keep_prob and control the keep ratio of the Dropout layers.
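The timing logic itself does not depend on TensorFlow. As a rough sketch, the same burn-in and mean / standard deviation bookkeeping can be applied to any callable; `time_callable` is a hypothetical name, and clamping the variance at zero is a small safety addition that the original code does not have.

```python
import math
import time


def time_callable(fn, num_batches=100, num_steps_burn_in=10):
    """Return (mean, std) of fn()'s wall-clock time, skipping warm-up calls."""
    total = 0.0
    total_squared = 0.0
    for i in range(num_batches + num_steps_burn_in):
        start = time.time()
        fn()
        duration = time.time() - start
        if i >= num_steps_burn_in:  # the first calls only warm things up
            total += duration
            total_squared += duration * duration
    mean = total / num_batches
    variance = total_squared / num_batches - mean * mean
    return mean, math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding


mean, std = time_callable(lambda: sum(range(10000)), num_batches=20, num_steps_burn_in=5)
print('%.6f +/- %.6f sec / call' % (mean, std))
```

Skipping the first iterations matters because early runs often include one-off costs such as memory allocation and cache warm-up.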

The main evaluation function is run_benchmark. It measures the computing performance of the forward (inference) and backward (training) passes. It generates random images of size 224x224 with tf.random_normal, which draws random numbers from a normal distribution with a standard deviation of 0.1.

Create the keep_prob placeholder and call inference_op to build the VGGNet-16 network structure, obtaining predictions, softmax, fc8 and the parameter list p.

Create a Session and initialize the global variables. With keep_prob set to 1.0 for prediction, time_tensorflow_run measures the forward time.

Compute the L2 loss of fc8, the output of the final fully connected layer of VGGNet-16. tf.gradients computes the gradients of this loss with respect to all model parameters. time_tensorflow_run then measures the backward time, with grad, the gradient operation, as the target and keep_prob set to 0.5. batch_size is set to 32.
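The benchmark objective is simple enough to check by hand. The sketch below assumes that the L2 loss is half the sum of squares, as tf.nn.l2_loss computes it, so its gradient with respect to each input is the input itself; the helper names and the finite-difference check are additions for illustration.

```python
def l2_loss(values):
    """Half the sum of squares, the quantity tf.nn.l2_loss computes."""
    return sum(v * v for v in values) / 2.0


def l2_loss_grad(values):
    """Gradient of l2_loss with respect to each input: the input itself."""
    return list(values)


def numeric_grad(values, index, eps=1e-6):
    """Central finite difference of l2_loss along one coordinate."""
    up = list(values)
    down = list(values)
    up[index] += eps
    down[index] -= eps
    return (l2_loss(up) - l2_loss(down)) / (2.0 * eps)


outputs = [3.0, -4.0]
print(l2_loss(outputs), l2_loss_grad(outputs), numeric_grad(outputs, 0))
```

In the benchmark this loss has no meaning as a training target; it just provides a scalar from which gradients for all parameters in p can be derived.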

Run the main evaluation function run_benchmark() to measure the forward and backward time of VGGNet-16 in TensorFlow. The forward pass takes 0.152 s per batch on average. The backward pass, which computes the gradients, takes 0.617 s per batch on average.
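Per-batch times are easier to compare when converted to throughput. A small calculation, using the batch size and the two timings reported above (the function name is made up):

```python
batch_size = 32
forward_s = 0.152    # average forward time per batch, from the text
backward_s = 0.617   # average backward (gradient) time per batch, from the text


def images_per_second(batch_size, seconds_per_batch):
    """Throughput implied by an average per-batch time."""
    return batch_size / seconds_per_batch


forward_rate = images_per_second(batch_size, forward_s)
backward_rate = images_per_second(batch_size, backward_s)
print('forward: %.1f images/s, backward: %.1f images/s' % (forward_rate, backward_rate))
```

That is roughly 210 images per second for inference and about 52 per second when gradients are computed, on whatever hardware produced the timings.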

VGGNet reached a 7.3% top-5 error rate. Its gains come from deeper networks, smaller convolution kernels and the implicit regularization they bring.

    from datetime import datetime
    import math
    import time
    import tensorflow as tf

    def conv_op(input_op, name, kh, kw, n_out, dh, dw, p):
        n_in = input_op.get_shape()[-1].value

        with tf.name_scope(name) as scope:
            kernel = tf.get_variable(scope+"w",
                                     shape=[kh, kw, n_in, n_out],
                                     dtype=tf.float32,
                                     initializer=tf.contrib.layers.xavier_initializer_conv2d())
            conv = tf.nn.conv2d(input_op, kernel, (1, dh, dw, 1), padding='SAME')
            bias_init_val = tf.constant(0.0, shape=[n_out], dtype=tf.float32)
            biases = tf.Variable(bias_init_val, trainable=True, name='b')
            z = tf.nn.bias_add(conv, biases)
            activation = tf.nn.relu(z, name=scope)
            p += [kernel, biases]
            return activation

    def fc_op(input_op, name, n_out, p):
        n_in = input_op.get_shape()[-1].value

        with tf.name_scope(name) as scope:
            kernel = tf.get_variable(scope+"w",
                                     shape=[n_in, n_out],
                                     dtype=tf.float32,
                                     initializer=tf.contrib.layers.xavier_initializer())
            biases = tf.Variable(tf.constant(0.1, shape=[n_out], dtype=tf.float32), name='b')
            activation = tf.nn.relu_layer(input_op, kernel, biases, name=scope)
            p += [kernel, biases]
            return activation

    def mpool_op(input_op, name, kh, kw, dh, dw):
        return tf.nn.max_pool(input_op,
                              ksize=[1, kh, kw, 1],
                              strides=[1, dh, dw, 1],
                              padding='SAME',
                              name=name)

    def inference_op(input_op, keep_prob):
        p = []
        # assume input_op shape is 224x224x3

        # block 1 -- outputs 112x112x64
        conv1_1 = conv_op(input_op, name="conv1_1", kh=3, kw=3, n_out=64, dh=1, dw=1, p=p)
        conv1_2 = conv_op(conv1_1,  name="conv1_2", kh=3, kw=3, n_out=64, dh=1, dw=1, p=p)
        pool1 = mpool_op(conv1_2,   name="pool1",   kh=2, kw=2, dw=2, dh=2)

        # block 2 -- outputs 56x56x128
        conv2_1 = conv_op(pool1,    name="conv2_1", kh=3, kw=3, n_out=128, dh=1, dw=1, p=p)
        conv2_2 = conv_op(conv2_1,  name="conv2_2", kh=3, kw=3, n_out=128, dh=1, dw=1, p=p)
        pool2 = mpool_op(conv2_2,   name="pool2",   kh=2, kw=2, dh=2, dw=2)

        # block 3 -- outputs 28x28x256
        conv3_1 = conv_op(pool2,    name="conv3_1", kh=3, kw=3, n_out=256, dh=1, dw=1, p=p)
        conv3_2 = conv_op(conv3_1,  name="conv3_2", kh=3, kw=3, n_out=256, dh=1, dw=1, p=p)
        conv3_3 = conv_op(conv3_2,  name="conv3_3", kh=3, kw=3, n_out=256, dh=1, dw=1, p=p)
        pool3 = mpool_op(conv3_3,   name="pool3",   kh=2, kw=2, dh=2, dw=2)

        # block 4 -- outputs 14x14x512
        conv4_1 = conv_op(pool3,    name="conv4_1", kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
        conv4_2 = conv_op(conv4_1,  name="conv4_2", kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
        conv4_3 = conv_op(conv4_2,  name="conv4_3", kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
        pool4 = mpool_op(conv4_3,   name="pool4",   kh=2, kw=2, dh=2, dw=2)

        # block 5 -- outputs 7x7x512
        conv5_1 = conv_op(pool4,    name="conv5_1", kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
        conv5_2 = conv_op(conv5_1,  name="conv5_2", kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
        conv5_3 = conv_op(conv5_2,  name="conv5_3", kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
        pool5 = mpool_op(conv5_3,   name="pool5",   kh=2, kw=2, dw=2, dh=2)

        # flatten
        shp = pool5.get_shape()
        flattened_shape = shp[1].value * shp[2].value * shp[3].value
        resh1 = tf.reshape(pool5, [-1, flattened_shape], name="resh1")

        # fully connected
        fc6 = fc_op(resh1, name="fc6", n_out=4096, p=p)
        fc6_drop = tf.nn.dropout(fc6, keep_prob, name="fc6_drop")
        fc7 = fc_op(fc6_drop, name="fc7", n_out=4096, p=p)
        fc7_drop = tf.nn.dropout(fc7, keep_prob, name="fc7_drop")
        fc8 = fc_op(fc7_drop, name="fc8", n_out=1000, p=p)
        softmax = tf.nn.softmax(fc8)
        predictions = tf.argmax(softmax, 1)
        return predictions, softmax, fc8, p

    def time_tensorflow_run(session, target, feed, info_string):
        num_steps_burn_in = 10
        total_duration = 0.0
        total_duration_squared = 0.0
        for i in range(num_batches + num_steps_burn_in):
            start_time = time.time()
            _ = session.run(target, feed_dict=feed)
            duration = time.time() - start_time
            if i >= num_steps_burn_in:
                if not i % 10:
                    print('%s: step %d, duration = %.3f' %
                          (datetime.now(), i - num_steps_burn_in, duration))
                total_duration += duration
                total_duration_squared += duration * duration
        mn = total_duration / num_batches
        vr = total_duration_squared / num_batches - mn * mn
        sd = math.sqrt(vr)
        print('%s: %s across %d steps, %.3f +/- %.3f sec / batch' %
              (datetime.now(), info_string, num_batches, mn, sd))

    def run_benchmark():
        with tf.Graph().as_default():
            image_size = 224
            images = tf.Variable(tf.random_normal([batch_size,
                                                   image_size,
                                                   image_size, 3],
                                                  dtype=tf.float32,
                                                  stddev=1e-1))
            keep_prob = tf.placeholder(tf.float32)
            predictions, softmax, fc8, p = inference_op(images, keep_prob)
            init = tf.global_variables_initializer()
            config = tf.ConfigProto()
            config.gpu_options.allocator_type = 'BFC'
            sess = tf.Session(config=config)
            sess.run(init)
            time_tensorflow_run(sess, predictions, {keep_prob: 1.0}, "Forward")
            objective = tf.nn.l2_loss(fc8)
            grad = tf.gradients(objective, p)
            time_tensorflow_run(sess, grad, {keep_prob: 0.5}, "Forward-backward")

    batch_size = 32
    num_batches = 100
    run_benchmark()

References:
TensorFlow practice
