Convolutional Neural Network Primer (1)
Original address : http://blog.csdn.net/hjimce/article/details/47323463
Author : HJIMCE
The convolutional neural network (CNN) is an algorithm that has been around for many years. It has come back to life only recently, for three reasons: deep learning research has provided new methods for training multi-layer networks, computers are far more powerful than they used to be, and much more training data is available. As neural network algorithms became popular again, so did convolutional neural networks.
Before we start, one thing should be made clear: online tutorials on convolutional neural networks generally cover only the forward propagation process. The networks are trained by gradient descent, and most deep learning libraries already encapsulate backpropagation (the computation of the gradients), so you can learn how backpropagation is derived later, at your own pace.
Since the classic CNN model is LeNet-5, and understanding its forward propagation is basically enough to implement it, the rest of this article mainly explains LeNet-5 and its implementation.
I. Theory
As an introductory article on CNNs, this one will not dwell on theory. Weight sharing, local receptive fields, and the related ideas borrowed from biology are what most tutorials spend their time on, and most beginners get bored reading them. There are also plenty of blog posts on convolutional neural networks, but most are copied from one another. For example, I could not understand how to get from the S2 layer to the C3 layer; I read many online tutorials, and none of them answered the question. In my experience, the step from S2 to C3 is the hardest part of the whole process to understand, so below I explain it in the simplest way I can.
1. Convolution
Anyone who has done image processing understands the concept of convolution, so I will not explain it in detail. Given an image and a convolution kernel, convolution computes a weighted sum of the pixels inside the convolution window at each position.
How does convolution in a CNN differ from the image convolution we learned before? My understanding is this: in classic image processing the convolution kernel is known in advance. Edge detection operators, Gaussian blur, and so on are all known kernels that we simply convolve with the image. In deep learning the kernels are unknown. Training a neural network means training these kernels, which play the same role as the parameters W of a single-layer perceptron, so you can think of the kernels as the trainable parameters W of the network.
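The weighted sum over a sliding window can be sketched in a few lines of NumPy. This is only an illustration under simple assumptions (no padding, stride 1, a single-channel image); the function name `conv2d_valid` and the example kernels are made up for this sketch and are not part of any library.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` and take a weighted sum at every position.

    No padding and stride 1, so an (H, W) image and a (kh, kw) kernel give
    an (H - kh + 1, W - kw + 1) result. The kernel is not flipped here
    (strictly speaking this is cross-correlation); for kernels that are
    learned, the distinction does not matter.
    """
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            window = image[r:r + kh, c:c + kw]
            out[r, c] = np.sum(window * kernel)
    return out


# A known, hand-designed kernel (3x3 mean blur), as in classic image processing.
blur = np.ones((3, 3)) / 9.0

# An "unknown" kernel as in a CNN: it starts from random values,
# and training would then adjust these 9 numbers.
rng = np.random.RandomState(0)
learned = rng.uniform(-0.1, 0.1, size=(3, 3))

image = np.arange(25, dtype=float).reshape(5, 5)
print(conv2d_valid(image, blur).shape)     # (3, 3)
print(conv2d_valid(image, learned).shape)  # (3, 3)
```

The same function serves both cases; the only difference is where the numbers in the kernel come from.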
2. Pooling
When I first started studying CNNs, the word "pooling" sounded very sophisticated, so I looked up a lot of material. It was all theory; none of it covered the actual algorithm, and I still did not know how to implement it. In fact, pooling is just down-sampling of an image. At this point you may notice that building each layer of a CNN resembles building a Gaussian pyramid of an image, so if you already know the image pyramid fusion algorithm, this will be easy to understand. In a Gaussian pyramid, each level is obtained by convolving and then down-sampling, and a CNN follows the same process. Without further ado, here is how pooling works in a CNN:
There are many pooling (image down-sampling) methods in CNNs: mean pooling (average sampling), max pooling (maximum sampling), overlapping pooling (overlapping sampling), L2 pooling (mean-square sampling), local contrast normalization (normalized sampling), stochastic pooling (random sampling), and def-pooling (deformation-constrained sampling). The most classic is max pooling, so I will explain how max pooling is implemented:
Original picture
For simplicity, take the image above as an example. Assume it is 4*4, and the value of each pixel is the number in the corresponding square. I want to pool this 4*4 image with a pool size of (2,2) and a stride of 2. Max pooling then divides the 4*4 image into blocks of size 2*2 and takes the maximum value of each block as the pixel value of the down-sampled image. The calculation is shown in the figure below:
That is, after down-sampling we finally get the following image:
This is max pooling. You will also run into other pooling methods. Mean pooling, for example, takes the average value of each block as the new pixel value of the down-sampled image. There is also overlapping pooling. The example above has no overlap: the blocks do not share any pixels, because the stride of 2 equals the block size. Other common pooling methods will be covered later. For now, just remember max pooling, since it is currently the most commonly used.
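Here is a minimal NumPy sketch of the pooling just described. The function name `pool2d` is hypothetical, and the 4*4 pixel values are invented for the example; they are not the values from the figure.

```python
import numpy as np


def pool2d(image, size=2, stride=2, mode="max"):
    """Down-sample `image` by reducing each size x size block to one value.

    stride == size gives non-overlapping blocks; stride < size gives
    overlapping pooling.
    """
    H, W = image.shape
    out_h = (H - size) // stride + 1
    out_w = (W - size) // stride + 1
    reduce_fn = np.max if mode == "max" else np.mean
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            block = image[r * stride:r * stride + size,
                          c * stride:c * stride + size]
            out[r, c] = reduce_fn(block)
    return out


# Hypothetical 4x4 image (not the values from the article's figure).
image = np.array([[1., 2., 5., 6.],
                  [3., 4., 7., 8.],
                  [9., 10., 13., 14.],
                  [11., 12., 15., 16.]])

max_pooled = pool2d(image, mode="max")          # [[4, 8], [12, 16]]
mean_pooled = pool2d(image, mode="mean")        # [[2.5, 6.5], [10.5, 14.5]]
overlapping = pool2d(image, size=2, stride=1)   # 3x3 result, blocks share pixels
print(max_pooled)
print(mean_pooled)
print(overlapping.shape)
```

With pool size (2,2) and stride 2 the 4*4 image shrinks to 2*2; with stride 1 the blocks overlap and the result is 3*3.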
3. Feature Maps
This term is usually translated literally as "feature map", which makes it sound very technical. So what is a feature map? Every image inside a CNN can be called a feature map. A CNN trains more than one convolution kernel per layer, and these kernels are used to extract features. In theory, the more kernels there are, the more features are extracted and the higher the accuracy, but more kernels also mean more parameters to train. In the classic LeNet-5 structure the first layer uses 6 convolution kernels, while AlexNet uses 96 in its first layer. Choosing a suitable number takes experience.
Back to the concept of the feature map: for each convolution layer of a CNN we have to choose the number of kernels and the kernel size by hand. Convolving one kernel with the image yields one feature map. In the classic LeNet-5 structure, for example, the first layer has 6 kernels, so we get 6 feature maps, and these feature maps are the input of the next layer. The input image can also be regarded as a feature map that serves as the input of the first layer.
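To make "more kernels means more parameters" concrete, here is a small counting sketch. The helper `conv_layer_params` is a made-up name, and it assumes one bias per output feature map. The 96-kernel line reuses the 5*5 kernel size and the single input map purely for comparison; it is not AlexNet's actual configuration.

```python
def conv_layer_params(n_kernels, n_input_maps, kernel_h, kernel_w):
    """Number of trainable values in one convolution layer.

    Each kernel has one (kernel_h x kernel_w) window of weights per input
    feature map, plus one bias for the output feature map it produces
    (an assumption of this sketch).
    """
    weights = n_kernels * n_input_maps * kernel_h * kernel_w
    biases = n_kernels
    return weights + biases


# First layer as described in the text: 6 kernels of 5x5 on 1 input image.
print(conv_layer_params(6, 1, 5, 5))    # 156

# The same layer with 96 kernels (count taken from the text; kernel size
# and input depth are kept at 5x5 and 1 only for the sake of comparison).
print(conv_layer_params(96, 1, 5, 5))   # 2496
```

The parameter count grows linearly with the number of kernels, which is the trade-off mentioned above.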
4. Classic CNN structures
People who are just getting started with CNNs should first know some of the classic structures:
(1) LeNet-5. This is a classic CNN structure from many years ago, mainly used for handwritten character recognition. It is also the network beginners usually learn first, and it is the main subject of this post.
(2) AlexNet.
In the ImageNet image classification challenge, the AlexNet model proposed by Alex won the 2012 competition. The result was inspiring: while others struggled with traditional machine learning algorithms, Alex used a CNN for image classification and achieved accuracy far beyond the traditional methods.
There are also Network in Network, GoogLeNet, deconvolution networks, and others, which we will meet in later study. A deconvolution network, for example, can be used to deblur images, which is an impressive trick.
That is enough theory. Next comes LeNet-5, a classic CNN for handwriting recognition:
LeNet-5 structure
Input: 32*32 images of handwritten characters. The characters are the digits 0~9, which corresponds to 10 classes of images.
Output: the classification result, a digit between 0 and 9.
So this is a multi-class classification problem with 10 classes in total. The final output layer of the network is therefore a softmax layer with 10 neurons. The LeNet-5 structure is:
Input layer: a 32*32 image, which is equivalent to 1024 neurons.
C1 layer: the paper's authors chose 6 convolution kernels of size 5*5, which gives 6 feature maps. Each feature map has side length 32-5+1=28, so the number of neurons is 6*28*28=4704.
S2 layer: this is the down-sampling layer. It uses max pooling with pool size (2,2), which means each 28*28 image from the C1 layer is divided into blocks of size 2*2, giving 14*14 blocks. Taking the maximum value of each block as the new pixel of the down-sampled image gives the S2 result: 6 images of size 14*14.
C3 layer: a convolution layer. The kernel size is still 5*5, so the new image size is 14-5+1=10, and we want 16 feature maps. Here is the problem, and this layer is the hardest to understand: S2 contains 6 images of size 14*14, and we want this layer to produce 16 images of size 10*10. Each of the 16 images is a weighted combination of the 6 images of S2, but how exactly are they combined? The problem is illustrated in the following figure:
To explain this, start with a simple case. Assume the 6 input feature maps each have size 5*5 and are convolved with 6 kernels of size 5*5, one per map, which gives 6 convolution results of size 1*1, as shown in the following figure:
To keep things short, first some notation. Let the pixel values of the i-th input feature map be x_{1i}, x_{2i}, ..., x_{25i}, since each feature map has 25 pixels, and let the weights of the 5*5 kernel applied to it be w_{1i}, w_{2i}, ..., w_{25i}. After convolving the i-th feature map with its 5*5 kernel, the resulting pixel value p_i can be written as:
p_i = \sum_{j=1}^{25} w_{ji} x_{ji}
This is just the convolution formula, so I will not explain it further. The values p_1 ~ p_6 above are computed directly from it. Then we add p_1 ~ p_6 together:
P = p_1 + p_2 + ... + p_6
Substituting the formula for p_i into this equation gives:
P = \sum_{i=1}^{6} \sum_{j=1}^{25} w_{ji} x_{ji} = W X
where X holds the pixel values of the 6 input feature maps of size 5*5, and W holds the parameters we need to learn. W is equivalent to 6 kernels of size 5*5, so it contains 6*(5*5) parameters. The output feature map is therefore:
out = f(P + b)
This is how S2 is mapped to C3, where b is the bias term and f is the activation function.
Back to the original question: the input is 6 feature maps of size 14*14, we want to use 5*5 kernels, and we want to end up with one 10*10 output feature map.
Following the process above, we convolve each input feature map with its own 5*5 kernel. The kernel parameters differ from one input feature map to another, i.e. they are not shared, so we need 6*(5*5) parameters. Convolving each input feature map gives 6 new images of size 10*10. We add these 6 images together, add a bias term b, and apply the activation function, which yields one 10*10 output feature map.
We want 16 output feature maps of size 10*10, so the number of convolution parameters is 16*(6*(5*5))=16*6*(5*5)=2400. In summary, each image of the C3 layer is obtained by convolving each S2 image with its own kernel, adding the results together, adding the bias b, and applying the activation function.
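The S2 to C3 step can be written out directly in NumPy. The sketch below follows the description in this article (every output map uses all 6 input maps) and assumes tanh as the activation function; the function name `conv_layer_forward` and the random inputs are made up for illustration.

```python
import numpy as np


def conv_layer_forward(inputs, W, b, activation=np.tanh):
    """Forward pass of one convolution layer with several input maps.

    inputs: (n_in, H, W_in)   the input feature maps (e.g. the 6 maps of S2)
    W:      (n_out, n_in, k, k) one k x k kernel per (output map, input map)
    b:      (n_out,)          one bias per output feature map
    """
    n_out, n_in, k, _ = W.shape
    _, H, W_in = inputs.shape
    out_h, out_w = H - k + 1, W_in - k + 1
    out = np.zeros((n_out, out_h, out_w))
    for o in range(n_out):
        for r in range(out_h):
            for c in range(out_w):
                window = inputs[:, r:r + k, c:c + k]   # shape (n_in, k, k)
                # Convolve every input map with its own kernel and add the
                # n_in results together: this is P in the text.
                out[o, r, c] = np.sum(window * W[o])
        # out = f(P + b)
        out[o] = activation(out[o] + b[o])
    return out


rng = np.random.RandomState(0)
s2 = rng.rand(6, 14, 14)                          # stand-in for the S2 output
W = rng.uniform(-0.1, 0.1, size=(16, 6, 5, 5))    # 16 * 6 * (5*5) weights
b = np.zeros(16)

c3 = conv_layer_forward(s2, W, b)
print(c3.shape)   # (16, 10, 10)
print(W.size)     # 2400
```

Cropping the input to 5*5, as in `conv_layer_forward(s2[:, :5, :5], W, b)`, reproduces the simple case from the text: each output map collapses to a single 1*1 value.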
S4 layer: a down-sampling layer, which is relatively simple. It applies max pooling with 2*2 blocks to the 16 images of size 10*10 from C3. The S4 layer therefore consists of 16 images of size 5*5, and the number of neurons has been reduced to 16*5*5=400.
C5 layer: we convolve again with 5*5 kernels, and this time we want 120 feature maps. The image size of the C5 layer is 5-5+1=1, so each feature map is a single neuron, and 120 feature maps leave only 120 neurons. The number of neurons is now small enough to feed directly into a fully connected network. How these 120 neurons are processed from here is clear to anyone who understands the multilayer perceptron, so I will not explain it.
The structure above is only a reference. In practice, the number of feature maps in each layer, the kernel sizes, and the pooling sampling rate can all be changed. This is what is called CNN parameter tuning, and we need to learn to apply it flexibly.
For example, we could change the structure above so that the C1 kernel size is 7*7 and the C3 kernel size is 3*3, and choose the number of feature maps ourselves. The handwriting recognition accuracy might turn out higher than with the structure above. In short, learn to be flexible, and learn how to tune a CNN.
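When trying out variants like this, it helps to check the layer sizes before writing any network code. The sketch below traces the feature map size through a stack of layers using the size-kernel+1 rule and non-overlapping pooling from the text. The helper `trace_shapes` and the layer-list format are invented for this example, and the variant keeps 6 and 16 feature maps as an arbitrary choice.

```python
def trace_shapes(input_size, layers):
    """Return (number of maps, side length) after each layer.

    layers is a list of ("conv", kernel_size, n_maps) or ("pool", pool_size)
    tuples. Convolution uses no padding and stride 1; pooling uses
    non-overlapping blocks and ignores any leftover border pixels.
    """
    size, maps = input_size, 1
    shapes = []
    for layer in layers:
        if layer[0] == "conv":
            size, maps = size - layer[1] + 1, layer[2]
        else:
            size = size // layer[1]
        if size < 1:
            raise ValueError("feature map became empty at layer %r" % (layer,))
        shapes.append((maps, size))
    return shapes


# LeNet-5 as described above: C1, S2, C3, S4, C5.
lenet5 = [("conv", 5, 6), ("pool", 2), ("conv", 5, 16), ("pool", 2),
          ("conv", 5, 120)]
print(trace_shapes(32, lenet5))
# [(6, 28), (6, 14), (16, 10), (16, 5), (120, 1)]

# The variant from the text: 7x7 kernels in C1 and 3x3 kernels in C3.
variant = [("conv", 7, 6), ("pool", 2), ("conv", 3, 16), ("pool", 2)]
print(trace_shapes(32, variant))
# [(6, 26), (6, 13), (16, 11), (16, 5)]
```

In the variant, the 11*11 maps do not divide evenly into 2*2 blocks, so one row and one column are dropped by the pooling step; that is the kind of detail this check is meant to surface.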
II. Practice
Site for studying a CNN source code implementation: http://deeplearning.net/tutorial/lenet.html#lenet
1. Training data acquisition
The Theano tutorials use a handwritten digit dataset that can be downloaded from the Internet, named mnist.pkl.gz. It contains three parts: the training set train_set with 50000 training samples, the validation set valid_set, and the test set test_set. We can read the data with the following code and then display one of the images with matplotlib:
# Python 2 code (cPickle, print statement)
import cPickle
import gzip
import numpy as np
import matplotlib.pyplot as plt

f = gzip.open('mnist.pkl.gz', 'rb')
train_set, valid_set, test_set = cPickle.load(f)
f.close()

tx, ty = train_set

# inspect the training samples
print np.shape(tx)  # tx has shape (50000, 28*28): a 2-D matrix, one flattened image per row
print np.shape(ty)  # ty has shape (50000,): one label per sample

# display an image
a = tx[8].reshape(28, 28)  # training sample at index 8
y = ty[8]
print y
plt.imshow(a, cmap='gray')  # show the handwritten digit image
plt.show()
The code above displays the training sample at index 8, and you can see the following result:
The sample at index 8 is the digit 1.
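The loading code above is Python 2. For readers on Python 3, a possible equivalent is sketched below. It assumes the file is a gzip-compressed pickle holding a (train, valid, test) tuple written under Python 2, which is why `encoding="latin1"` is passed. The function names `load_mnist_pickle` and `as_image` are made up for this sketch.

```python
import gzip
import pickle


def load_mnist_pickle(path):
    """Load the (train_set, valid_set, test_set) tuple from a .pkl.gz file.

    encoding="latin1" lets Python 3 read a pickle containing NumPy arrays
    that was written by Python 2 (assumed here to be the case for
    mnist.pkl.gz).
    """
    with gzip.open(path, "rb") as f:
        return pickle.load(f, encoding="latin1")


def as_image(flat_row, height=28, width=28):
    """Turn one flattened sample (a NumPy row of height*width values)
    back into a 2-D image."""
    return flat_row.reshape(height, width)
```

Usage would look like `train_set, valid_set, test_set = load_mnist_pickle('mnist.pkl.gz')`, followed by `tx, ty = train_set` and `as_image(tx[8])` to get the 28*28 array to pass to `plt.imshow`.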
2. LeNet-5 implementation
First, note that the images in mnist.pkl.gz are 28*28, so convolving with a 5*5 kernel gives 24*24, and we choose 20 feature maps for the C1 layer, and so on. The code is as follows:
# Python 2 code, following the Theano tutorial
import os
import sys
import timeit

import numpy

import theano
import theano.tensor as T
from theano.tensor.signal import downsample
from theano.tensor.nnet import conv

from logistic_sgd import LogisticRegression, load_data
from mlp import HiddenLayer


# One layer of the CNN, made of two steps: convolution + down-sampling.
# Order of operations: convolution -> down-sampling -> activation function.
class LeNetConvPoolLayer(object):

    # image_shape describes the input data,
    # filter_shape describes the kernels of this layer
    def __init__(self, rng, input, filter_shape, image_shape, poolsize=(2, 2)):
        """
        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights

        :param input: the input feature map data, i.e. the n input
                      feature images

        :param filter_shape: (number of filters, num input feature maps,
                              filter height, filter width).
                              "number of filters" is the number of convolution
                              kernels; this layer produces as many output
                              feature maps as it has kernels.
                              "num input feature maps" is the number of input
                              feature maps. "filter height" and "filter width"
                              are the kernel size, e.g. 5*5, 9*9, ...
                              filter_shape is a list, so filter_shape[0] gives
                              the number of kernels.

        :param image_shape: (batch size, num input feature maps,
                             image height, image width).
                             "batch size" is the number of samples in one
                             training batch; "image height" and "image width"
                             are the size of the input feature maps.
                             image_shape is a list, so the 4 values can be
                             accessed with indices 0~3, for example
                             image_shape[2] = image height and
                             image_shape[3] = image width.

        :param poolsize: the size of the pooling (down-sampling) block,
                         usually (2, 2)
        """

        # the number of input feature maps must match; otherwise it is an error
        assert image_shape[1] == filter_shape[1]
        self.input = input

        # fan_in = num input feature maps * filter height * filter width
        # numpy.prod(x) computes the product of all elements of x.
        # In other words, fan_in is the number of weights feeding each
        # value of an output feature map.
        fan_in = numpy.prod(filter_shape[1:])
        # fan_out = num output feature maps * filter height * filter width
        #           / pooling size
        fan_out = (filter_shape[0] * numpy.prod(filter_shape[2:]) /
                   numpy.prod(poolsize))

        # Initialize the weights uniformly in [-a, a],
        # where a = sqrt(6. / (fan_in + fan_out)).
        # Number of weights needed: number of kernels * number of input
        # feature maps * kernel width * kernel height. This does not include
        # any weights for the down-sampling layer.
        W_bound = numpy.sqrt(6. / (fan_in + fan_out))
        self.W = theano.shared(
            numpy.asarray(
                rng.uniform(low=-W_bound, high=W_bound, size=filter_shape),
                dtype=theano.config.floatX
            ),
            borrow=True
        )

        # b is the bias, a one-dimensional vector. Each output feature map i
        # has one bias parameter b[i], so the number of biases equals the
        # number of output feature maps, filter_shape[0].
        b_values = numpy.zeros((filter_shape[0],), dtype=theano.config.floatX)
        self.b = theano.shared(value=b_values, borrow=True)

        # Convolution. The first argument of conv.conv2d is the input, the
        # second is the randomly initialized kernel parameters, the third
        # describes the kernels, and the fourth describes the input
        # feature maps.
        conv_out = conv.conv2d(
            input=input,
            filters=self.W,
            filter_shape=filter_shape,
            image_shape=image_shape
        )

        # pooling: max pooling
        pooled_out = downsample.max_pool_2d(
            input=conv_out,
            ds=poolsize,
            ignore_border=True
        )

        # Activation: convolve first, then pool, then add the bias and apply
        # the non-linear mapping. Since the bias is a vector (1D array), we
        # first reshape it to a tensor of shape (1, n_filters, 1, 1). Each
        # bias will thus be broadcast across mini-batches and feature map
        # width & height.
        self.output = T.tanh(pooled_out + self.b.dimshuffle('x', 0, 'x', 'x'))

        # store the parameters of this layer
        self.params = [self.W, self.b]
        self.input = input


# test function
def evaluate_lenet5(learning_rate=0.1, n_epochs=200,
                    dataset='mnist.pkl.gz',
                    nkerns=[20, 50], batch_size=500):
    """ Demonstrates lenet on MNIST dataset

    :param learning_rate: learning rate of gradient descent

    :param n_epochs: maximum number of epochs

    :type dataset: string
    :param dataset: path to the dataset used for training/testing (MNIST here)

    :param nkerns: number of convolution kernels on each convolution layer;
                   the first layer has nkerns[0]=20 kernels and the second
                   layer has 50
    """

    rng = numpy.random.RandomState(23455)

    # load the data, which comes in three parts
    datasets = load_data(dataset)

    train_set_x, train_set_y = datasets[0]  # training data
    valid_set_x, valid_set_y = datasets[1]  # validation data
    test_set_x, test_set_y = datasets[2]    # test data

    # compute how many mini-batches each set is split into
    n_train_batches = train_set_x.get_value(borrow=True).shape[0]  # number of training samples
    n_valid_batches = valid_set_x.get_value(borrow=True).shape[0]
    n_test_batches = test_set_x.get_value(borrow=True).shape[0]
    n_train_batches /= batch_size  # number of batches
    n_valid_batches /= batch_size
    n_test_batches /= batch_size

    # allocate symbolic variables for the data
    index = T.lscalar()  # index to a [mini]batch

    # start-snippet-1
    x = T.matrix('x')   # the data is presented as rasterized images
    y = T.ivector('y')  # the labels are presented as 1D vector of
                        # [int] labels

    # Reshape matrix of rasterized images of shape (batch_size, 28 * 28)
    # to a 4D tensor, compatible with our LeNetConvPoolLayer
    # (28, 28) is the size of MNIST images.
    layer0_input = x.reshape((batch_size, 1, 28, 28))

    # Build the first layer:
    # image_shape: the input is 28*28 feature maps, batch_size training
    #   samples, each with 1 feature map
    # filter_shape: the number of kernels is nkerns[0]=20, so each training
    #   sample produces 20 feature maps in this layer
    # after convolution, the image size becomes (28-5+1, 28-5+1) = (24, 24)
    # after pooling, the image size becomes (24/2, 24/2) = (12, 12)
    # the output of this layer has shape (batch_size, nkerns[0], 12, 12)
    layer0 = LeNetConvPoolLayer(
        rng,
        input=layer0_input,
        image_shape=(batch_size, 1, 28, 28),
        filter_shape=(nkerns[0], 1, 5, 5),
        poolsize=(2, 2)
    )

    # Build the second layer: the input is batch_size training images; after
    # the first layer each image has nkerns[0] feature maps of size 12*12
    # after convolution, the image size becomes (12-5+1, 12-5+1) = (8, 8)
    # after pooling, the image size becomes (8/2, 8/2) = (4, 4)
    # the output of this layer has shape (batch_size, nkerns[1], 4, 4)
    layer1 = LeNetConvPoolLayer(
        rng,
        input=layer0.output,
        image_shape=(batch_size, nkerns[0], 12, 12),
        filter_shape=(nkerns[1], nkerns[0], 5, 5),
        poolsize=(2, 2)
    )

    # the HiddenLayer being fully-connected, it operates on 2D matrices of
    # shape (batch_size, num_pixels) (i.e matrix of rasterized images).
    # This will generate a matrix of shape (batch_size, nkerns[1] * 4 * 4),
    # or (500, 50 * 4 * 4) = (500, 800) with the default values.
    layer2_input = layer1.output.flatten(2)

    # Fully connected layer: layer2_input is a 2-D matrix. The first
    # dimension indexes the samples, the second holds the neurons obtained
    # for each sample after the convolution layers. HiddenLayer is a
    # single-layer network; here layer2 maps 800 neurons down to 500.
    layer2 = HiddenLayer(
        rng,
        input=layer2_input,
        n_in=nkerns[1] * 4 * 4,
        n_out=500,
        activation=T.tanh
    )

    # Last layer: logistic regression for classification. The 500 neurons
    # are mapped to 10 neurons, one for each handwritten digit 0~9.
    layer3 = LogisticRegression(input=layer2.output, n_in=500, n_out=10)

    # the cost we minimize during training is the NLL of the model
    cost = layer3.negative_log_likelihood(y)

    # create a function to compute the mistakes that are made by the model
    test_model = theano.function(
        [index],
        layer3.errors(y),
        givens={
            x: test_set_x[index * batch_size: (index + 1) * batch_size],
            y: test_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    validate_model = theano.function(
        [index],
        layer3.errors(y),
        givens={
            x: valid_set_x[index * batch_size: (index + 1) * batch_size],
            y: valid_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    # put all parameters into one list; Python lists can simply be added
    params = layer3.params + layer2.params + layer1.params + layer0.params

    # compute the gradients
    grads = T.grad(cost, params)

    # train_model is a function that updates the model parameters by
    # SGD. Since this model has many parameters, it would be tedious to
    # manually create an update rule for each model parameter. We thus
    # create the updates list by automatically looping over all
    # (params[i], grads[i]) pairs.
    updates = [
        (param_i, param_i - learning_rate * grad_i)
        for param_i, grad_i in zip(params, grads)
    ]

    train_model = theano.function(
        [index],
        cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )
    # end-snippet-1

    ###############
    # TRAIN MODEL #
    ###############
    print '... training'
    # early-stopping parameters
    patience = 10000  # look at this many examples regardless
    patience_increase = 2  # wait this much longer when a new best is
                           # found
    improvement_threshold = 0.995  # a relative improvement of this much is
                                   # considered significant
    validation_frequency = min(n_train_batches, patience / 2)
                                  # go through this many
                                  # minibatches before checking the network
                                  # on the validation set; in this case we
                                  # check every epoch

    best_validation_loss = numpy.inf
    best_iter = 0
    test_score = 0.
    start_time = timeit.default_timer()

    epoch = 0
    done_looping = False

    while (epoch < n_epochs) and (not done_looping):
        epoch = epoch + 1
        for minibatch_index in xrange(n_train_batches):  # each training batch

            cost_ij = train_model(minibatch_index)
            iter = (epoch - 1) * n_train_batches + minibatch_index

            if (iter + 1) % validation_frequency == 0:

                # compute zero-one loss on validation set
                validation_losses = [validate_model(i) for i
                                     in xrange(n_valid_batches)]
                this_validation_loss = numpy.mean(validation_losses)
                print('epoch %i, minibatch %i/%i, validation error %f %%' %
                      (epoch, minibatch_index + 1, n_train_batches,
                       this_validation_loss * 100.))

                # if we got the best validation score until now
                if this_validation_loss < best_validation_loss:

                    # improve patience if loss improvement is good enough
                    if this_validation_loss < best_validation_loss * \
                       improvement_threshold:
                        patience = max(patience, iter * patience_increase)

                    # save best validation score and iteration number
                    best_validation_loss = this_validation_loss
                    best_iter = iter

                    # test it on the test set
                    test_losses = [
                        test_model(i)
                        for i in xrange(n_test_batches)
                    ]
                    test_score = numpy.mean(test_losses)
                    print(('     epoch %i, minibatch %i/%i, test error of '
                           'best model %f %%') %
                          (epoch, minibatch_index + 1, n_train_batches,
                           test_score * 100.))

            if patience <= iter:
                done_looping = True
                break

    end_time = timeit.default_timer()
    print('Optimization complete.')
    print('Best validation score of %f %% obtained at iteration %i, '
          'with test performance %f %%' %
          (best_validation_loss * 100., best_iter + 1, test_score * 100.))
    print >> sys.stderr, ('The code for file ' +
                          os.path.split(__file__)[1] +
                          ' ran for %.2fm' % ((end_time - start_time) / 60.))


if __name__ == '__main__':
    evaluate_lenet5()


def experiment(state, channel):
    evaluate_lenet5(state.learning_rate, dataset=state.dataset)
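Two pieces of arithmetic in the code above are easy to check by hand without Theano: the layer output shapes and the weight initialization bound. The NumPy sketch below reproduces them for the default settings (batch_size=500, 20 and 50 kernels). It also shows the plain NumPy counterpart of the `dimshuffle('x', 0, 'x', 'x')` bias broadcast. The helper names `conv_pool_output_shape` and `init_bound` are made up for this sketch.

```python
import numpy as np


def conv_pool_output_shape(image_shape, filter_shape, poolsize):
    """Shape after a 'valid' convolution followed by non-overlapping pooling."""
    batch, n_in, h, w = image_shape
    n_out, n_in_filter, kh, kw = filter_shape
    assert n_in == n_in_filter  # same check as in the layer's constructor
    return (batch, n_out,
            (h - kh + 1) // poolsize[0],
            (w - kw + 1) // poolsize[1])


def init_bound(filter_shape, poolsize):
    """The bound a with weights drawn uniformly from [-a, a]."""
    fan_in = np.prod(filter_shape[1:])
    fan_out = filter_shape[0] * np.prod(filter_shape[2:]) / np.prod(poolsize)
    return np.sqrt(6.0 / (fan_in + fan_out))


batch_size, nkerns = 500, (20, 50)
shape0 = conv_pool_output_shape((batch_size, 1, 28, 28),
                                (nkerns[0], 1, 5, 5), (2, 2))
shape1 = conv_pool_output_shape(shape0,
                                (nkerns[1], nkerns[0], 5, 5), (2, 2))
print(shape0)                                  # (500, 20, 12, 12)
print(shape1)                                  # (500, 50, 4, 4)
print(shape1[1] * shape1[2] * shape1[3])       # 800 inputs to the hidden layer

# Layer 0: fan_in = 25, fan_out = 20 * 25 / 4 = 125, so a = sqrt(6 / 150),
# which is approximately 0.2.
print(init_bound((nkerns[0], 1, 5, 5), (2, 2)))

# Bias broadcast: reshaping b to (1, n_filters, 1, 1) adds b[i] to every
# pixel of feature map i, for every sample in the batch.
pooled = np.zeros((2, 3, 4, 4))
b = np.array([0.0, 1.0, 2.0])
biased = pooled + b.reshape(1, -1, 1, 1)
print(biased[:, 2].min(), biased[:, 2].max())  # both 2.0
```

The 800 here is the `nkerns[1] * 4 * 4` passed as `n_in` to the hidden layer.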
Training results:
References:
1. http://blog.csdn.net/zouxy09/article/details/8775360/
2. http://www.deeplearning.net/tutorial/lenet.html#lenet
Author: hjimce. Date: 2015.8.6. Contact QQ: 1393852684. Address: http://blog.csdn.net/hjimce. When reprinting, please keep this information.