A convolutional neural network (CNN) is a feedforward neural network that is widely used in computer vision and other fields. This article briefly introduces its principles and analyzes the example provided in the official TensorFlow documentation.

I. Working principle

Convolution is a basic method in image processing. The convolution kernel is an n x n matrix, where n is usually odd so that the matrix has a well-defined center point and radius.

For each point in the image, take the n x n block of pixels centered on that point, multiply it element by element with the corresponding positions in the convolution kernel, and add up the products. The sum becomes the corresponding point in the result matrix.
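As a minimal sketch of the computation just described, the pure-Python function below slides a kernel over an image stored as a list of lists and computes only the points where the kernel fits entirely inside the image. The name convolve_valid, the 4x4 sample image, and the all-ones kernel are illustrative assumptions, not part of the TensorFlow example. Like the description above, it multiplies corresponding positions without flipping the kernel.

```python
def convolve_valid(image, kernel):
    """Slide an n x n kernel over the image; skip points the kernel cannot cover."""
    n = len(kernel)
    rows, cols = len(image), len(image[0])
    result = []
    for top in range(rows - n + 1):
        out_row = []
        for left in range(cols - n + 1):
            total = 0
            # Multiply the n x n block with the kernel position by position and sum.
            for i in range(n):
                for j in range(n):
                    total += image[top + i][left + j] * kernel[i][j]
            out_row.append(total)
        result.append(out_row)
    return result


# Hypothetical 4x4 grayscale image and a 3x3 kernel of ones.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
box_kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

feature_map = convolve_valid(image, box_kernel)
print(feature_map)  # [[54, 63], [90, 99]]
```

The result is 2x2 rather than 4x4 because the kernel center cannot be placed on the border pixels without padding.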

The following diagram shows the computational process of convolution:

The operation above, which processes an image to produce a new image, is called convolution, and the result matrix of the convolution is called a feature map. A grayscale image is represented by one matrix, while an RGB image requires 3 matrices. In other words, convolving 1 RGB image with one convolution kernel, channel by channel, yields 3 feature maps. (In a CNN layer the per-channel results are instead added up into a single feature map, as in the second convolution layer analyzed later.)

If the elements of the convolution kernel sum to 1, the brightness of the image is unchanged; if the sum is less than 1 the image is darkened, and if it is greater than 1 the image is brightened.
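The brightness rule can be checked on a uniformly bright patch, where the convolution result is simply the pixel value times the sum of the kernel elements. The sketch below is a hypothetical example: the helper name apply_at_center and the three kernels are assumptions chosen so that the sums are exactly 1, 0.5, and 2.

```python
def apply_at_center(patch, kernel):
    """Multiply a patch with a same-sized kernel position by position and sum."""
    return sum(
        pixel * weight
        for patch_row, kernel_row in zip(patch, kernel)
        for pixel, weight in zip(patch_row, kernel_row)
    )


def cross_kernel(weight):
    """3x3 kernel with the given weight on the four direct neighbors."""
    return [[0, weight, 0], [weight, 0, weight], [0, weight, 0]]


# Uniform 3x3 patch with brightness 100 (hypothetical values).
patch = [[100] * 3 for _ in range(3)]

unchanged = apply_at_center(patch, cross_kernel(0.25))   # kernel sum 1
darker = apply_at_center(patch, cross_kernel(0.125))     # kernel sum 0.5
brighter = apply_at_center(patch, cross_kernel(0.5))     # kernel sum 2

print(unchanged, darker, brighter)  # 100.0 50.0 200.0
```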

The center of the convolution kernel cannot be aligned to pixels at the edge of the original image (those whose distance from the edge is less than the kernel radius). To compute these edge points, the missing outer points must be filled in (padding) so that the center of the convolution kernel can be aligned to them. Commonly used padding strategies are:

- Using the value of the center point in place of the missing points

- Using the mean value of the center point's neighborhood in place of the missing points

- Filling with 0

Special convolution kernels can achieve special effects:

- Sharpening

- Edge extraction

- Emboss

The following four pictures are, in order:

- A: Original image

- B: Sharpening

- C: Edge detection

- D: Emboss
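The sketch below combines the zero-filling padding strategy with three kernels that are commonly cited for these effects. The kernel values are textbook choices and an assumption here; they are not necessarily the ones used to produce the pictures above. The helper names pad_zero and convolve_same are likewise hypothetical.

```python
def pad_zero(image, radius):
    """Surround the image with `radius` rows and columns of zeros."""
    width = len(image[0]) + 2 * radius
    padded = [[0] * width for _ in range(radius)]
    for row in image:
        padded.append([0] * radius + list(row) + [0] * radius)
    padded.extend([0] * width for _ in range(radius))
    return padded


def convolve_same(image, kernel):
    """Convolve with zero padding so the output has the same size as the input."""
    n = len(kernel)
    padded = pad_zero(image, n // 2)
    return [
        [
            sum(padded[r + i][c + j] * kernel[i][j]
                for i in range(n) for j in range(n))
            for c in range(len(image[0]))
        ]
        for r in range(len(image))
    ]


# Commonly cited 3x3 kernels (assumed values, see the note above).
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]       # elements sum to 1
EDGE = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]      # elements sum to 0
EMBOSS = [[-2, -1, 0], [-1, 1, 1], [0, 1, 2]]         # elements sum to 1

# A flat 5x5 image: no real edges anywhere.
flat = [[10] * 5 for _ in range(5)]
sharpened = convolve_same(flat, SHARPEN)
edges = convolve_same(flat, EDGE)

print(sharpened[2][2], edges[2][2])  # 10 0  (interior is unaffected)
print(sharpened[0][0], edges[0][0])  # 30 50 (zero padding acts like a dark border)
```

The corner values show one trade-off of filling with 0: the padded border behaves like a dark frame, so edge-sensitive kernels respond at the image boundary even when the image itself is flat.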

II. Local perception

It is generally believed that human visual cognition proceeds from the local to the global. In an image, pixels that are close together are strongly related, while the correlation between distant pixels is weaker.

Accordingly, each neuron does not need to perceive the whole image; it only needs to connect to a local region of it. Deeper in the network, global information is obtained by combining the local information perceived by these neurons.

Local perception reduces the number of weights that need to be trained. In practical applications the amount of image data and the number of training iterations are limited, and under those limits a network with fewer weights is usually easier to train to a high accuracy.

III. Weight sharing

In a convolutional neural network, for a given convolution kernel, all neurons of the convolution layer use the same weight matrix for their connections to the image input layer.

Weight sharing further reduces the number of weights to be trained: for each convolution kernel, the number of weights in the convolution layer becomes the number of elements in the kernel.

The principle underlying weight sharing is that the statistical properties of one part of an image are the same as those of other parts, so features learned in one part of the image can be applied to other parts.

From the description of special convolution kernels above, it can be seen that a convolution kernel usually extracts only one feature of the image, and weight sharing greatly reduces the number of trainable connection weights. To extract features fully, multiple convolution kernels are usually used.
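A small calculation makes the savings from local perception and weight sharing concrete. The figures below are assumptions chosen for illustration: a 28x28 image, one neuron per pixel, a 5x5 receptive field, and 32 kernels. Biases are ignored and the function names are hypothetical.

```python
def fully_connected_weights(pixels, neurons):
    """Every neuron is connected to every pixel."""
    return pixels * neurons


def locally_connected_weights(neurons, kernel_size):
    """Every neuron sees only a kernel_size x kernel_size region, with its own weights."""
    return neurons * kernel_size * kernel_size


def shared_weights(kernel_size, num_kernels=1):
    """All neurons of a feature map share one kernel; one kernel per feature map."""
    return kernel_size * kernel_size * num_kernels


pixels = 28 * 28

full = fully_connected_weights(pixels, pixels)
local = locally_connected_weights(pixels, 5)
shared_one = shared_weights(5)
shared_many = shared_weights(5, num_kernels=32)

print(full)         # 614656
print(local)        # 19600
print(shared_one)   # 25
print(shared_many)  # 800
```

Even with 32 kernels, the shared layer needs far fewer weights than a single locally connected feature map.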

IV. Pooling

The image features learned by convolution are still numerous, and it is inconvenient to classify on them directly. A pooling layer is used to reduce the number of features.

The pooling operation is very simple. For example, suppose that filtering a picture with one convolution kernel yields an 8x8 matrix. We can divide this matrix into 16 2x2 blocks, each of which is called a neighborhood.

Forming a 4x4 matrix from the mean values of the 16 blocks is mean pooling; max pooling is analogous and uses the maximum of each block instead. Mean pooling is better at preserving the background, and max pooling is better at extracting texture.
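The 8x8 to 4x4 example can be sketched in a few lines of pure Python. The function name pool_2x2 and the sample matrix are assumptions for illustration; the sketch also assumes the matrix has an even number of rows and columns.

```python
def pool_2x2(matrix, reduce_fn):
    """Split the matrix into 2x2 neighborhoods and reduce each one to a single value."""
    return [
        [
            reduce_fn([
                matrix[r][c], matrix[r][c + 1],
                matrix[r + 1][c], matrix[r + 1][c + 1],
            ])
            for c in range(0, len(matrix[0]), 2)
        ]
        for r in range(0, len(matrix), 2)
    ]


def mean(values):
    return sum(values) / len(values)


# Hypothetical 8x8 feature map with values 0..63, row by row.
feature_map = [[row * 8 + col for col in range(8)] for row in range(8)]

mean_pooled = pool_2x2(feature_map, mean)
max_pooled = pool_2x2(feature_map, max)

print(mean_pooled[0])  # [4.5, 6.5, 8.5, 10.5]
print(max_pooled[0])   # [9, 11, 13, 15]
```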

Stochastic pooling assigns each pixel in a neighborhood a probability (weight) according to its value and then sums the pixels weighted by those probabilities.
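A minimal sketch of this weighted sum follows, assuming the probabilities are proportional to the pixel values and that the values are non-negative (for example, activations after ReLU). The function name stochastic_pool and the handling of an all-zero neighborhood are assumptions.

```python
def stochastic_pool(neighborhood):
    """Weighted sum of a neighborhood, with weights proportional to the values.

    Assumes non-negative values; an all-zero neighborhood pools to 0.
    """
    total = sum(neighborhood)
    if total == 0:
        return 0.0
    probabilities = [value / total for value in neighborhood]
    return sum(p * value for p, value in zip(probabilities, neighborhood))


# One 2x2 neighborhood flattened to a list (hypothetical values).
pooled = stochastic_pool([1, 2, 3, 4])
print(pooled)
```

For this neighborhood the probabilities are 0.1, 0.2, 0.3, and 0.4, so the result is close to 3.0, which lies between the mean (2.5) and the maximum (4).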

V. TensorFlow implementation

TensorFlow's tutorial Deep MNIST for Experts describes how to recognize handwritten digits in the MNIST dataset using a CNN.

The complete code can be found on GitHub; this article gives a brief analysis of it. The source code is taken from the samples in tensorflow-1.3.0.

The three main imports are:

    import tempfile
    from tensorflow.examples.tutorials.mnist import input_data
    import tensorflow as tf

The main(_) function builds the network:

    def main(_):
        # Import the MNIST dataset.
        # FLAGS.data_dir is the path to the local data; it can be replaced with
        # an empty string to download the dataset automatically.
        mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)

        # x is the input layer; each 28x28 image is flattened into a
        # 784-element vector.
        x = tf.placeholder(tf.float32, [None, 784])

        # y_ holds the known labels of the training set, one-hot encoded
        # over the 10 classes.
        y_ = tf.placeholder(tf.float32, [None, 10])

        # deepnn builds the CNN; y_conv is the CNN's predicted output.
        # keep_prob is the dropout layer parameter, discussed again below.
        y_conv, keep_prob = deepnn(x)

        # Use the cross entropy between y_conv and the labels y_ as the loss.
        with tf.name_scope('loss'):
            cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
                labels=y_, logits=y_conv)
        cross_entropy = tf.reduce_mean(cross_entropy)

        # Use the Adam optimization algorithm to minimize the loss function.
        with tf.name_scope('adam_optimizer'):
            train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

        # Compute the accuracy (the fraction of evaluated samples that are
        # classified correctly) to evaluate the model.
        with tf.name_scope('accuracy'):
            correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
            correct_prediction = tf.cast(correct_prediction, tf.float32)
        accuracy = tf.reduce_mean(correct_prediction)

The main function is no different from that of other TensorFlow neural networks. The key part is the deepnn method, which builds the CNN:

    def deepnn(x):
        # x has shape [N, 784]; reshape it to [N, 28, 28, 1].
        # The fourth dimension is the number of values describing each pixel.
        # The images here are grayscale, so it is 1: each pixel needs one value.
        # Similarly, it would be 3 for an RGB image and 4 for an RGBA image.
        with tf.name_scope('reshape'):
            x_image = tf.reshape(x, [-1, 28, 28, 1])

        # First convolution layer: convolve the 28x28 grayscale image with
        # 32 convolution kernels.
        with tf.name_scope('conv1'):
            # Initialize the connection weights from a normal distribution
            # to avoid vanishing gradients.
            # 32 kernels of size 5x5 extract 32 features (32 feature maps)
            # from the original image.
            W_conv1 = weight_variable([5, 5, 1, 32])
            # Initialize the biases to 0.1.
            b_conv1 = bias_variable([32])
            # conv2d is implemented as:
            #   tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
            # strides is the step size of the kernel movement.
            # padding has two values: with SAME the feature map has the same
            # height and width as x_image; with VALID the edge pixels are
            # ignored and the feature map is smaller than x_image.
            # h_conv1 has shape [N, 28, 28, 32].
            h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)

        # First pooling layer: max-pool each 2x2 block into one value,
        # giving 14x14 matrices.
        with tf.name_scope('pool1'):
            h_pool1 = max_pool_2x2(h_conv1)

        # Second convolution layer: use 64 kernels to extract 64 features
        # from the 32 feature maps of the first convolution layer.
        with tf.name_scope('conv2'):
            # Each kernel here is three-dimensional: it consists of 32
            # two-dimensional 5x5 kernels, each of which is convolved with
            # one 14x14 feature map.
            # The 32 resulting 14x14 matrices are added up to give one new
            # feature map.
            # 64 three-dimensional kernels give 64 new feature maps.
            W_conv2 = weight_variable([5, 5, 32, 64])
            b_conv2 = bias_variable([64])
            # h_conv2 has shape [N, 14, 14, 64].
            h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)

        # Second pooling layer: max-pool each 2x2 block into one value,
        # giving 7x7 matrices.
        with tf.name_scope('pool2'):
            # h_pool2 has shape [N, 7, 7, 64].
            h_pool2 = max_pool_2x2(h_conv2)

        # First fully connected layer: map the [7, 7, 64] feature maps
        # to 1024 features.
        with tf.name_scope('fc1'):
            W_fc1 = weight_variable([7 * 7 * 64, 1024])
            b_fc1 = bias_variable([1024])
            h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
            h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

        # Use a dropout layer to avoid overfitting: in each training
        # iteration, a randomly chosen fraction of neurons does not take part.
        # keep_prob is the probability that a neuron takes part;
        # keep_prob = 1.0 uses the whole network.
        with tf.name_scope('dropout'):
            keep_prob = tf.placeholder(tf.float32)
            h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

        # Second fully connected layer: map the 1024 features to 10 features,
        # the one-hot encoding of the 10 classes.
        # One-hot encoding uses '100' for 1, '010' for 2, '001' for 3, ...
        with tf.name_scope('fc2'):
            W_fc2 = weight_variable([1024, 10])
            b_fc2 = bias_variable([10])
            y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
        return y_conv, keep_prob

Pay particular attention to the implementation of the second convolution layer. The whole network exposes 3 interfaces:

- Input layer x, shape [N, 784]

- Output layer y_conv, shape [N, 10]

- Dropout retention ratio keep_prob, a single value

Now return to the main method. After the network has been built, main first saves the network structure to disk:

        graph_location = tempfile.mkdtemp()
        print('Saving graph to: %s' % graph_location)
        train_writer = tf.summary.FileWriter(graph_location)
        train_writer.add_graph(tf.get_default_graph())

It then initializes a tf.Session() and trains:

        with tf.Session() as sess:
            # Initialize global variables.
            sess.run(tf.global_variables_initializer())
            for i in range(10000):
                # Take 50 samples from the training set each time,
                # 10,000 times in total.
                # batch[0] is the feature set, shape [50, 784]:
                # 50 vectors of 784 elements.
                # batch[1] is the label set, shape [50, 10]:
                # 50 one-hot encoded labels.
                batch = mnist.train.next_batch(50)
                # Evaluate the accuracy once every 100 iterations.
                if i % 100 == 0:
                    train_accuracy = accuracy.eval(feed_dict={
                        x: batch[0], y_: batch[1], keep_prob: 1.0})
                    print('step %d, training accuracy %g' % (i, train_accuracy))
                # For training, dropout keep_prob is set to 0.5.
                train_step.run(feed_dict={
                    x: batch[0], y_: batch[1], keep_prob: 0.5})

            # Evaluate the final accuracy; dropout keep_prob is set to 1.0
            # so that the whole network is used.
            print('test accuracy %g' % accuracy.eval(feed_dict={
                x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

The startup code handles command-line arguments and options. It also uses argparse and sys, which the full script imports alongside the three imports above:

    if __name__ == '__main__':
        parser = argparse.ArgumentParser()
        parser.add_argument('--data_dir', type=str,
                            default='/tmp/tensorflow/mnist/input_data',
                            help='Directory for storing input data')
        FLAGS, unparsed = parser.parse_known_args()
        tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
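The shape comments in deepnn can be double-checked without TensorFlow by tracing the layer shapes and counting trainable parameters. The sketch below is a pure-Python approximation. It assumes that conv2d uses stride 1 with SAME padding, as described in the code comments, and that max_pool_2x2 uses a 2x2 window with stride 2 and SAME padding, as in the full script. The helper names conv_layer, max_pool_2x2_layer, and dense_layer are hypothetical.

```python
def conv_layer(shape, kernel_size, out_channels):
    """SAME padding with stride 1: height and width are unchanged."""
    height, width, in_channels = shape
    params = kernel_size * kernel_size * in_channels * out_channels + out_channels
    return (height, width, out_channels), params


def max_pool_2x2_layer(shape):
    """2x2 window, stride 2, SAME padding: sizes are halved, rounding up."""
    height, width, channels = shape
    return ((height + 1) // 2, (width + 1) // 2, channels), 0


def dense_layer(shape, units):
    """Fully connected layer applied to the flattened input."""
    inputs = 1
    for dim in shape:
        inputs *= dim
    return (units,), inputs * units + units


shape = (28, 28, 1)  # one grayscale MNIST image
trace = []
for name, layer in [
    ("conv1", lambda s: conv_layer(s, 5, 32)),
    ("pool1", max_pool_2x2_layer),
    ("conv2", lambda s: conv_layer(s, 5, 64)),
    ("pool2", max_pool_2x2_layer),
    ("fc1", lambda s: dense_layer(s, 1024)),
    ("fc2", lambda s: dense_layer(s, 10)),
]:
    shape, params = layer(shape)
    trace.append((name, shape, params))
    print(name, shape, params)

total_params = sum(count for _, _, count in trace)
print("total trainable parameters:", total_params)  # 3274634
```

Almost all of the parameters sit in the first fully connected layer, which illustrates how few weights the convolution layers need thanks to weight sharing.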