TensorFlow: Google Deep Learning Framework (V): Image Recognition and Convolutional Neural Networks

Source: Internet
Author: User

Chapter 6: Image recognition and convolutional neural networks
6.1 Image recognition problems and classic datasets
6.2 Introduction to convolutional neural networks
6.3 Common structures of convolutional neural networks
6.3.1 Convolutional layer
6.3.2 Pooling layer
6.4 Classic convolutional neural network models
6.4.1 LeNet-5 model
6.4.2 Inception model
6.5 Transfer learning with convolutional neural networks
6.5.1 Introduction to transfer learning

Chapter 6: Image recognition and convolutional neural networks

This chapter shows how to implement a convolutional neural network (CNN) in TensorFlow and how to use it for image recognition.

6.1 Image recognition problems and classic datasets

1. CIFAR

CIFAR-10: 60,000 color images in 10 classes, each image 32*32 pixels.

CIFAR-100: 20 superclasses subdivided into 100 classes, each class containing 600 images.

The biggest differences from MNIST are that CIFAR images are in color and that each image shows one kind of real-world object, which makes classification harder. For both CIFAR and MNIST, two main problems remain compared with real-world image recognition:

Real-life pictures have a much higher resolution than 32*32, and the resolution is not fixed.

Real life contains a great many kinds of objects, and an image does not necessarily contain only one object.

2. ImageNet

ImageNet is an image database whose construction was led by Stanford professor Fei-Fei Li. It is much closer to real-life conditions. The ImageNet dataset has more than 14 million images covering more than 20,000 categories, and more than one million of the images have clear category labels together with annotations of the object's location in the image.

Related information:

1) Total number of non-empty synsets: 21,841

2) Total number of images: 14,197,122

3) Number of images with bounding box annotations: 1,034,908

4) Number of synsets with SIFT features: 1,000

5) Number of images with SIFT features: 1.2 million

6.2 Introduction to convolutional neural networks

The neural networks described in the earlier chapters are all fully connected. The convolutional neural network introduced in this section is not fully connected. When the two structures are compared, the main difference is the way adjacent layers are connected.

1. Why fully connected networks do not work well with image data

The biggest problem is that a fully connected layer has too many parameters, which slows down computation and easily leads to overfitting.
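To get a feel for the scale, the following sketch counts the weights and biases of a single fully connected layer fed with a flattened image. The hidden layer width of 500 is an assumption chosen only for illustration; the image sizes are those of MNIST (28*28*1) and CIFAR-10 (32*32*3).

```python
def fc_params(height, width, channels, hidden_nodes):
    """Weights plus biases of one fully connected layer on a flattened image."""
    inputs = height * width * channels
    return inputs * hidden_nodes + hidden_nodes


HIDDEN = 500  # assumed hidden layer width, for illustration only

mnist_params = fc_params(28, 28, 1, HIDDEN)
cifar_params = fc_params(32, 32, 3, HIDDEN)

print(mnist_params)  # 392500
print(cifar_params)  # 1536500
```

Moving from MNIST to CIFAR-10 roughly quadruples the parameter count of this one layer, and the count keeps growing with the image resolution.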

2. Advantages of convolutional neural networks

In the first few layers of a convolutional neural network, the nodes of each layer are organized into a three-dimensional matrix, and each node is connected to only some of the nodes of the previous layer. A convolutional neural network is composed of the following five parts:

1. Input layer

The input layer is the input of the entire neural network. In image processing, it generally represents the pixel matrix of a picture. In the architecture figure, the leftmost three-dimensional matrix represents a picture: its length and width are the size of the image, and its depth is the number of color channels. Starting from the input layer, the convolutional neural network transforms the three-dimensional matrix of one layer into the three-dimensional matrix of the next through different network structures, until the final fully connected layers.

2. Convolutional layer

The convolutional layer is the most important part of a convolutional neural network. The input of each node in a convolutional layer is only a small block of the previous layer, usually of size 3*3 or 5*5. The convolutional layer analyzes each small block in more depth in order to obtain features at a higher level of abstraction. In general, the node matrix becomes deeper after passing through a convolutional layer.

3. Pooling layer

A pooling layer does not change the depth of the three-dimensional matrix, but it reduces the matrix's size. Pooling can be thought of as converting a higher-resolution picture into a lower-resolution one. By shrinking the matrix, the pooling layer reduces the number of nodes that reach the final fully connected layers and therefore the number of parameters in the whole network.

4. Fully connected layer

After several rounds of convolutional and pooling layers, the final classification result of a convolutional neural network is usually produced by 1 or 2 fully connected layers. After these rounds of convolution and pooling, the information in the image can be assumed to have been abstracted into features with higher information content. Convolution and pooling can be regarded as automatic feature extraction; once the features are extracted, fully connected layers are still needed to complete the classification task.

5. Softmax layer

The softmax layer is mainly used for classification problems. It yields the probability distribution of the current sample over the different classes.

6.3 Common structures of convolutional neural networks

6.3.1 Convolutional layer

In the TensorFlow documentation, the key structure of a convolutional layer (the part highlighted in the figure) is called a "filter" or "kernel". A filter transforms a sub-node matrix of the current layer into a unit node matrix in the next layer, that is, a node matrix whose length and width are both 1 but whose depth is not restricted.

Filter: commonly used filter sizes are 3*3 and 5*5. The depth of the node matrix that a filter processes is the same as the depth of the current layer's node matrix.

Size: the size of the filter's input node matrix (the small matrix on the left of the figure).

Depth: the depth of the output unit node matrix (the unit matrix on the right of the figure).

Forward propagation: the process of computing the nodes of the unit matrix on the right from the nodes of the small matrix on the left.

Forward propagation of filters

The forward propagation of a convolutional layer moves a filter from the upper-left corner of the current layer to the lower-right corner and computes the corresponding unit matrix at each position along the way.

Order of movement: upper-left corner → upper-right corner → lower-left corner → lower-right corner

Zero padding: to keep the matrix from shrinking, zeros can be added around the border of the current layer's matrix ("zero padding"), so that with a stride of 1 the matrix has the same size before and after forward propagation.

Stride: setting a different stride for the filter also changes the size of the matrix after convolution.
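The effect of padding and stride on the output size can be sketched as a small helper. The function name `conv_output_size` is hypothetical; the two formulas are the ones TensorFlow applies for its 'SAME' and 'VALID' padding modes along one spatial dimension.

```python
import math


def conv_output_size(in_size, filter_size, stride, padding):
    """Output length along one spatial dimension of a convolution or pooling layer."""
    if padding == "SAME":   # zero padding: only the stride shrinks the output
        return math.ceil(in_size / stride)
    if padding == "VALID":  # no padding: the filter must fit inside the input
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


print(conv_output_size(32, 5, 1, "VALID"))  # 28
print(conv_output_size(32, 5, 1, "SAME"))   # 32
print(conv_output_size(3, 2, 2, "SAME"))    # 2
```

The first call matches a 5*5 filter sliding over a 32*32 image without padding, which gives a 28*28 output.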

Parameter sharing: within a convolutional layer, the filter uses the same parameters at every position. This is a very important property, with two consequences:

The result does not depend on where content appears in the image. Because the same filter is applied everywhere, a "1" produces the same filter response no matter where it appears in the picture.

The number of parameters in the neural network is greatly reduced.
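The reduction can be made concrete with a short calculation. The sketch below uses a 5*5 filter over 3 input channels with 16 output channels; the 32*32 image size used for the fully connected comparison is an assumption for illustration, and both function names are hypothetical.

```python
def conv_params(filter_size, in_depth, out_depth):
    """Weights plus biases of a convolutional layer; the image size does not appear."""
    return filter_size * filter_size * in_depth * out_depth + out_depth


def dense_params(in_nodes, out_nodes):
    """Weights plus biases of a fully connected layer."""
    return in_nodes * out_nodes + out_nodes


shared = conv_params(5, 3, 16)
# Fully connected layer mapping an assumed 32*32*3 input to a 32*32*16 output
dense = dense_params(32 * 32 * 3, 32 * 32 * 16)

print(shared)  # 1216
print(dense)   # 50348032
```

Because the filter is shared across positions, the convolutional count stays at 1216 whatever the image resolution is.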

Example: forward propagation of a convolutional layer with zero padding and a stride of 2

The value in the upper-left corner is computed as:

$\mathrm{ReLU}(0\times1 + 0\times(-1) + 0\times0 + 1\times2 + 1) = \mathrm{ReLU}(3) = 3$

Implementing a convolutional layer in TensorFlow

1. Create the filter's weights and biases

import tensorflow as tf  # TensorFlow 1.x API

# Create the filter weights and biases with tf.get_variable.
# The weights form a 4-D tensor: the first two dimensions are the filter size, the third is the
# depth of the current layer, and the fourth is the depth of the filter (the number of kernels).
filter_weight = tf.get_variable('weights', [5, 5, 3, 16],
                                initializer=tf.truncated_normal_initializer(stddev=0.1))
# Biases are also shared across positions of the current layer's matrix, so the number of
# biases equals the depth of the next layer, which is 16 in this case.
biases = tf.get_variable('biases', [16], initializer=tf.constant_initializer(0.1))

2. Forward propagation of the convolutional layer

conv = tf.nn.conv2d(input, filter_weight, strides=[1, 1, 1, 1], padding='SAME')
# tf.nn.conv2d provides a convenient way to implement the forward propagation of a convolutional layer.
# First argument: the node matrix of the current layer, a 4-D tensor whose first dimension indexes
#   the images in a batch (input[0, :, :, :] is the first image, input[1, :, :, :] is the second).
# Second argument: the weights of the convolutional layer.
# Third argument: the stride along each dimension (the first and last entries must be 1,
#   because the stride applies only to the length and width of the matrix).
# Fourth argument: the padding method, 'SAME' (zero padding) or 'VALID' (no padding).

3. Add the bias

bias = tf.nn.bias_add(conv, biases)
# tf.nn.bias_add provides a convenient way to add the bias to every node.
# Plain addition cannot be used here, because nodes at different positions of the matrix
# must all receive the same bias.

4. Apply the activation function

actived_conv = tf.nn.relu(bias)
# Pass the result through the ReLU activation function.

6.3.2 Pooling layer

Purpose: reduce the number of parameters, help prevent overfitting, and provide a degree of translation invariance.

Common pooling types: max pooling and average pooling.

Scope of a pooling filter: a pooling filter affects only the nodes at a single depth, so in addition to moving along the length and width it also has to move along the depth dimension.

Forward propagation of a max pooling layer in TensorFlow

pool = tf.nn.max_pool(actived_conv, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')

# First argument: the node matrix of the current layer.
# Second argument: the size of the filter, given as a one-dimensional array of length 4 whose
#   first and last entries must be 1. This means the pooling filter cannot span different
#   examples or different depths of the node matrix.
# Third argument: the stride. The first and last entries must be 1, that is, the pooling layer
#   cannot reduce the depth of the node matrix or the number of input samples.
# Fourth argument: the padding method, 'SAME' means zero padding, 'VALID' means no padding.

Forward propagation of an average pooling layer in TensorFlow

pool = tf.nn.avg_pool(actived_conv, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')

# First argument: the node matrix of the current layer.
# Second argument: the size of the filter, given as a one-dimensional array of length 4 whose
#   first and last entries must be 1. This means the pooling filter cannot span different
#   examples or different depths of the node matrix.
# Third argument: the stride. The first and last entries must be 1, that is, the pooling layer
#   cannot reduce the depth of the node matrix or the number of input samples.
# Fourth argument: the padding method, 'SAME' means zero padding, 'VALID' means no padding.
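
To show what the two pooling types compute, here is a minimal pure-Python sketch that pools one depth slice with a 2*2 filter and a stride of 2, without padding. The helper names and the 4*4 sample values are made up for illustration; this is not TensorFlow's implementation.

```python
def pool2d(matrix, size, stride, reduce_fn):
    """Pool one depth slice (a list of rows) without padding."""
    rows, cols = len(matrix), len(matrix[0])
    out = []
    for r in range(0, rows - size + 1, stride):
        out_row = []
        for c in range(0, cols - size + 1, stride):
            # Collect the values covered by the filter at this position
            window = [matrix[r + i][c + j] for i in range(size) for j in range(size)]
            out_row.append(reduce_fn(window))
        out.append(out_row)
    return out


def mean(values):
    return sum(values) / len(values)


depth_slice = [
    [1, 3, 2, 0],
    [5, 7, 4, 4],
    [0, 2, 9, 1],
    [6, 4, 3, 3],
]

max_pooled = pool2d(depth_slice, 2, 2, max)
avg_pooled = pool2d(depth_slice, 2, 2, mean)

print(max_pooled)  # [[7, 4], [6, 9]]
print(avg_pooled)  # [[4.0, 2.5], [3.0, 4.0]]
```

Each 2*2 window collapses to a single value, so the 4*4 slice becomes 2*2 while the depth (handled slice by slice) is unchanged.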

Example of a convolutional layer and a pooling layer

# "TensorFlow: Google Deep Learning Framework in Practice", ch. 06: image recognition and convolutional neural networks
# WIN10 Tensorflow1.0.1 python3.5.3
# CUDA v8.0 cudnn-8.0-windows10-x64-v5.1
# filename: ts06.01.py
# Convolutional layer and pooling layer sample
import tensorflow as tf
import numpy as np

# 1. Input matrix
M = np.array([
    [[1], [-1], [0]],
    [[-1], [2], [1]],
    [[0], [2], [-2]]
])
print("Matrix shape is: ", M.shape)
# Matrix shape is: (3, 3, 1)

# 2. Define the convolution filter, with a depth of 1
filter_weight = tf.get_variable('weights', [2, 2, 1, 1],
                                initializer=tf.constant_initializer([[1, -1], [0, 2]]))
biases = tf.get_variable('biases', [1], initializer=tf.constant_initializer(1))

# 3. Adjust the input to the format TensorFlow expects: [batch, height, width, channels]
M = np.asarray(M, dtype='float32')
M = M.reshape(1, 3, 3, 1)

# 4. Compute the results of the convolution filter and of the pooling filter
x = tf.placeholder('float32', [1, None, None, 1])
conv = tf.nn.conv2d(x, filter_weight, strides=[1, 2, 2, 1], padding='SAME')
bias = tf.nn.bias_add(conv, biases)
pool = tf.nn.avg_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    convoluted_M = sess.run(bias, feed_dict={x: M})
    pooled_M = sess.run(pool, feed_dict={x: M})

    print("convoluted_M: \n", convoluted_M)
    print("pooled_M: \n", pooled_M)

Output:

Matrix shape is:  (3, 3, 1)
convoluted_M:
 [[[[ 7.]
   [ 1.]]

  [[-1.]
   [-1.]]]]
pooled_M:
 [[[[ 0.25]
   [ 0.5 ]]

  [[ 1.  ]
   [-2.  ]]]]
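
The convolution values in the output above (7, 1, -1, -1) can be checked by hand. The following pure-Python sketch is not TensorFlow's implementation; it assumes a square single-channel image and a square kernel, and follows TensorFlow's 'SAME' convention of placing any extra zero padding at the bottom and right. The function name `conv2d_same` is hypothetical.

```python
import math


def conv2d_same(image, kernel, bias, stride):
    """Single-channel 2-D convolution with 'SAME'-style zero padding.

    As in tf.nn.conv2d, the kernel is applied without flipping.
    """
    n, k = len(image), len(kernel)
    out = math.ceil(n / stride)
    pad_total = max((out - 1) * stride + k - n, 0)
    pad_before = pad_total // 2  # the remainder goes to the bottom/right
    size = n + pad_total
    padded = [[0] * size for _ in range(size)]
    for r in range(n):
        for c in range(n):
            padded[r + pad_before][c + pad_before] = image[r][c]

    result = []
    for r in range(out):
        row = []
        for c in range(out):
            total = bias
            for i in range(k):
                for j in range(k):
                    total += padded[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(total)
        result.append(row)
    return result


M = [[1, -1, 0], [-1, 2, 1], [0, 2, -2]]
kernel = [[1, -1], [0, 2]]
convoluted = conv2d_same(M, kernel, bias=1, stride=2)
print(convoluted)  # [[7, 1], [-1, -1]]
```

For the upper-left entry, the window is [[1, -1], [-1, 2]], so the result is 1*1 + (-1)*(-1) + (-1)*0 + 2*2 + 1 = 7.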

6.4 Classic convolutional neural network models

6.4.1 LeNet-5 model

The LeNet-5 model, proposed by Yann LeCun in 1998, was the first convolutional neural network successfully applied to digit recognition. It reaches an accuracy of about 99.2% on the MNIST dataset. The model has 7 layers in total, as shown in the figure.

The original input image is 32*32.

1. Convolutional layer

Input: the pixels of the original image (32*32*1)

Filter: size 5*5, depth 6, no zero padding, stride 1

Output: size 32-5+1=28, depth 6

Number of parameters: 5*5*1*6+6=156

Nodes in the next layer's node matrix: 28*28*6=4704, each connected to 5*5=25 nodes of the current layer

Total number of connections in this convolutional layer: 4704*(25+1)=122304

2. Pooling layer

Input: the output of the first layer, a 28*28*6 node matrix

Filter: size 2*2, stride 2 along both length and width

Output: 14*14*6

3. Convolutional layer

Input: 14*14*6

Filter: size 5*5, depth 16, no zero padding, stride 1

Output: 10*10*16

Number of parameters: as a standard convolutional layer, this layer has 5*5*6*16+16=2416 parameters

Total number of connections: 10*10*16*(25+1)=41600

4. Pooling layer

Input: 10*10*16

Filter: size 2*2, stride 2

Output: 5*5*16

5. Fully connected layer

Input: 5*5*16. The original paper describes this layer as a convolutional layer, but because the filter size is 5*5, the same as the input, it is no different from a fully connected layer and is treated as one here. If the 5*5*16 matrix is flattened into a vector, this layer is the same as the fully connected layers introduced in Chapter 4.

Output: 120 nodes

Total parameters: 5*5*16*120+120=48120

6. Fully connected layer

Input: The number of nodes is 120

Output: The number of nodes is 84

Total Parameters: 120*84+84=10164

7. Fully connected layer

Input: 84 nodes

Output: 10 nodes

Total Parameters: 84*10+10=850
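
The arithmetic for the seven layers can be replayed in a few lines of plain Python. This is only a bookkeeping sketch with hypothetical helper names; it assumes unpadded stride-1 convolutions and 2*2 pooling with stride 2, as listed above.

```python
def conv_layer(size, depth, filter_size, out_depth):
    """Unpadded, stride-1 convolution: returns (out_size, out_depth, params)."""
    out_size = size - filter_size + 1
    params = filter_size * filter_size * depth * out_depth + out_depth
    return out_size, out_depth, params


def fc_layer(in_nodes, out_nodes):
    """Parameter count (weights plus biases) of a fully connected layer."""
    return in_nodes * out_nodes + out_nodes


size, depth = 32, 1                                # input image 32*32*1
size, depth, p1 = conv_layer(size, depth, 5, 6)    # layer 1: 28*28*6
connections1 = size * size * depth * (5 * 5 + 1)   # each node: 25 inputs plus 1 bias
size //= 2                                         # layer 2: 14*14*6
size, depth, p3 = conv_layer(size, depth, 5, 16)   # layer 3: 10*10*16
size //= 2                                         # layer 4: 5*5*16
p5 = fc_layer(size * size * depth, 120)            # layer 5
p6 = fc_layer(120, 84)                             # layer 6
p7 = fc_layer(84, 10)                              # layer 7
total = p1 + p3 + p5 + p6 + p7

print(p1, p3, p5, p6, p7)  # 156 2416 48120 10164 850
print(connections1)        # 122304
print(total)               # 61706
```

Most of the 61706 parameters sit in the first fully connected layer, even in a network this small.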

Code example: lenet_inference.py

# "TensorFlow: Google Deep Learning Framework in Practice", ch. 06: image recognition and convolutional neural networks
# WIN10 Tensorflow1.0.1 python3.5.3
# CUDA v8.0 cudnn-8.0-windows10-x64-v5.1
# filename: LeNet5_infernece.py
# LeNet5 forward propagation
import tensorflow as tf

# 1. Set the parameters of the neural network
INPUT_NODE = 784
OUTPUT_NODE = 10      # value missing in the source; taken to equal NUM_LABELS
IMAGE_SIZE = 28       # value missing in the source; 28*28 = INPUT_NODE
NUM_CHANNELS = 1
NUM_LABELS = 10
CONV1_DEEP = None     # value missing in the source
CONV1_SIZE = 5
CONV2_DEEP = None     # value missing in the source
CONV2_SIZE = 5
FC_SIZE = 512

# 2. Define the forward propagation process
def inference(input_tensor, train, regularizer):
    with tf.variable_scope('layer1-conv1'):
        conv1_weights = tf.get_variable(
            "weight", [CONV1_SIZE, CONV1_SIZE, NUM_CHANNELS, CONV1_DEEP],
            initializer=tf.truncated_normal_initializer(stddev=0.1))
        conv1_biases = tf.get_variable(
            "bias", [CONV1_DEEP], initializer=tf.constant_initializer(0.0))
        conv1 = tf.nn.conv2d(input_tensor, conv1_weights, strides=[1, 1, 1, 1], padding='SAME')
        relu1 = tf.nn.relu(tf.nn.bias_add(conv1, conv1_biases))

    with tf.name_scope("layer2-pool1"):
        pool1 = tf.nn.max_pool(relu1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

    with tf.variable_scope("layer3-conv2"):
        conv2_weights = tf.get_variable(
            "weight", [CONV2_SIZE, CONV2_SIZE, CONV1_DEEP, CONV2_DEEP],
            initializer=tf.truncated_normal_initializer(stddev=0.1))
        conv2_biases = tf.get_variable(
            "bias", [CONV2_DEEP], initializer=tf.constant_initializer(0.0))
        conv2 = tf.nn.conv2d(pool1, conv2_weights, strides=[1, 1, 1, 1], padding='SAME')
        relu2 = tf.nn.relu(tf.nn.bias_add(conv2, conv2_biases))

    with tf.name_scope("layer4-pool2"):
        pool2 = tf.nn.max_pool(relu2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
        # Flatten the pooled output into [batch_size, nodes] for the fully connected layers
        pool_shape = pool2.get_shape().as_list()
        nodes = pool_shape[1] * pool_shape[2] * pool_shape[3]
        reshaped = tf.reshape(pool2, [pool_shape[0], nodes])
