Convolutional neural networks (CNNs) have achieved great success in digital image processing, which in turn set off a wave of deep learning in natural language processing (NLP). Since 2015, papers on deep learning for NLP have appeared in large numbers. Some of them are certainly filler written to pad publication counts, but there are also many classic, application-oriented articles. In 2016 I also published a paper on CNNs for text classification. The purpose of this post is to describe the structure of CNNs more clearly, to give a brief summary of the current state of research, and to offer a few modest expectations about future directions. My experience with deep learning is still limited, so if there are errors in the text, please point them out.
I. The Structure of CNN (Taking LeNet-5 as an Example)
The purpose of this section is not to describe CNNs in detail from beginning to end. If you are unsure about the structure of CNNs, I suggest reading LeCun's paper Gradient-Based Learning Applied to Document Recognition. There are also many classic blog posts online that explain the structure and principles of CNNs in depth; I recommend Zouxy's blog. Here I revisit the structure mainly to highlight and discuss a few problems that some readers may run into, organized around the following questions (experts, feel free to skip ahead):
- What is the mathematical implementation of convolution in a CNN? Is it the same as convolution in digital signal processing?
- Which layers in a CNN need an activation function?
- What is the biggest difference between C1 and C3 in LeNet-5?
- How is a CNN trained?
Let us first look at the structure of LeNet-5 (LeNet has 3 C layers and 2 S layers in total, hence the name LeNet-5; if F6 and the output layer are also counted, it could be called LeNet-7):
First, let's go through some CNN concepts. Two are especially important: the filter window (the size of the convolution kernel, which is generally square in digital image processing) and the feature map (in general, each filter window has several feature maps, so that different features can be captured). The C1 layer has 6 feature maps, the C3 layer has 16, and the C5 layer has 120. The final F6 layer is equivalent to the hidden layer of an ordinary neural network and is fully connected to C5. Finally, Gaussian connections turn the whole thing into a 10-class classification problem.
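The layer sizes above can be traced with a small shape calculation. This is only a sketch: the feature-map counts (6, 16, 120) and the 10-way output come from the text, while the 32x32 input, the 5x5 kernels, the 2x2 subsampling windows and the 84 units in F6 are taken from LeCun's paper and are not stated in this post. The names `LAYERS` and `lenet5_shapes` are made up for illustration.

```python
# Shape walk-through for LeNet-5 (a sketch, not a trainable network).
# Assumed from LeCun's paper, not from this post: 32x32 input, 5x5 kernels,
# 2x2 subsampling, 84 units in F6.
LAYERS = [
    # (name, kind, number of feature maps, kernel or window size)
    ("C1", "conv", 6, 5),
    ("S2", "pool", 6, 2),
    ("C3", "conv", 16, 5),
    ("S4", "pool", 16, 2),
    ("C5", "conv", 120, 5),
]


def lenet5_shapes(input_size=32):
    """Return (layer name, number of feature maps, side length) per layer."""
    shapes = []
    size = input_size
    for name, kind, maps, k in LAYERS:
        if kind == "conv":
            size = size - k + 1   # "valid" convolution with stride 1
        else:
            size = size // k      # non-overlapping k x k subsampling
        shapes.append((name, maps, size))
    # F6 and the output layer are plain vectors, recorded here as 1x1 maps.
    shapes.append(("F6", 84, 1))
    shapes.append(("OUTPUT", 10, 1))
    return shapes


for name, maps, size in lenet5_shapes():
    print(f"{name}: {maps} @ {size}x{size}")
```

Under these assumptions each C5 feature map shrinks to a single value, which is why C5 can be followed directly by the fully connected F6 layer.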
On question 1: what is convolution? Convolution is often mentioned in digital signal processing, and its mathematical expression is as follows:
$(f * g)(x) = \int_{-\infty}^{\infty} f(a) \, g(x - a) \, da$
Formulas tend to give people a headache, so here is a vivid metaphor for convolution from digital signal processing:
For example, your boss tells you to get to work, but you go downstairs to play billiards instead. The boss finds out, gets very angry, and slaps you across the face (note: this is the input signal, an impulse). Your face then gradually (and rather cheekily) swells up into a lump. Your face is the system, and the lump is your face's response to the slap. Good, now we have a link to signals and systems. A few assumptions are needed to keep the argument rigorous. Assume your face is linear and time-invariant: whenever the boss slaps you in the same spot (this does require your face to be smooth enough; if you have so much acne that your skin is nowhere differentiable, that is too hard and I have nothing to say, haha), your face always raises a lump of the same height after the same delay. Take the size of the lump as the system output. With that, we can move on to the core topic: convolution!
If you go downstairs to play billiards every day, the boss slaps you every day. But the swelling goes down 5 minutes after each slap, so over time you even get used to this life... Then one day the boss can take it no longer and starts slapping you continuously at 0.5-second intervals. Now there is a problem: the lump from the first slap has not gone down when the second slap lands, so the lump on your face may rise to twice the height. As the boss keeps slapping, the impulses keep acting on your face and their effects keep superimposing, so the effects can be summed. The result is the height of the lump on your face as a function of time (take a moment to digest this). If the boss gets more aggressive and the frequency rises until you can no longer tell the intervals apart, the sum becomes an integral.
Can you see what the height of the lump at some fixed moment depends on? It depends on every previous slap! But the contributions are not equal: the earlier the slap, the smaller its contribution. In other words, the output at a given moment is the superposition of several earlier inputs, each multiplied by its own attenuation coefficient. Putting the output points at different moments together gives a function, and that is convolution. The function after convolution is the size of the lump on your face as it changes over time. Originally the lump would go down in a few minutes, but under continuous slapping it will not go down for hours. Isn't that a smoothing process?
Mapped onto the Cambridge University formula, f(a) is the a-th slap and g(x-a) is how strongly the a-th slap still acts at time x; multiply them and add everything up, and that's it. Isn't that how it works? I think this example is already quite vivid. Do you now have a more concrete and deeper understanding of convolution?
(Reposted from the Gsdzone forum)
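The slap metaphor can be turned into a few lines of code as a rough sketch. The function name `convolve`, the three-step decay `response` and the slap sequences are all invented for illustration; the only thing taken from the story is the rule that the output at each moment is the sum of earlier inputs weighted by how much each has decayed.

```python
def convolve(slaps, response):
    """Discrete convolution: out[t] = sum over a of slaps[a] * response[t - a]."""
    out = [0.0] * (len(slaps) + len(response) - 1)
    for a, strength in enumerate(slaps):        # the a-th slap
        for k, decay in enumerate(response):    # its effect k steps later
            out[a + k] += strength * decay
    return out


# Hypothetical response of the face to one slap: full height at once,
# then half, then a quarter, then gone.
response = [1.0, 0.5, 0.25]

# One slap: the output is just the response itself, followed by zeros.
print(convolve([1, 0, 0, 0], response))

# Four slaps in a row: the lump never gets the chance to go down,
# so the effects of consecutive slaps pile up.
print(convolve([1, 1, 1, 1], response))
```

With these made-up numbers the lump from rapid slapping peaks at 1.75, higher than the 1.0 caused by a single slap, which is the superposition the story describes.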
In digital signal processing, convolution is the inner product of signal A with signal B after one of them is shifted in time, and the length of the shift is the independent variable of the convolution result. In a CNN, however, the role of the convolution operation is to highlight features and extract the more salient ones. So are the two convolutions the same? In a CNN (especially in natural language processing), the convolution operation is expressed by the formula:
$y_i = f\left(\sum \omega \cdot x + bias\right)$
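As a minimal sketch of this formula applied to text, the code below slides a filter window over a sentence of word vectors and, at each position, computes a weighted sum plus a bias and passes it through f. Everything concrete here is an assumption for illustration: the names `text_convolution` and `relu`, the choice of ReLU as f, the two-word window and the toy numbers. Note that the window is applied as is, without the flip used in signal-processing convolution, which is how CNN implementations commonly do it.

```python
def relu(value):
    """One possible choice for the activation f (an assumption, not from the text)."""
    return max(0.0, value)


def text_convolution(sentence, weights, bias, activation=relu):
    """Apply f(sum(w * x) + bias) to every window of consecutive word vectors.

    sentence: list of word vectors (one list of floats per word)
    weights:  filter window, one weight vector per word in the window
    """
    window = len(weights)
    outputs = []
    for i in range(len(sentence) - window + 1):
        total = bias
        for w_row, x_row in zip(weights, sentence[i:i + window]):
            total += sum(w * x for w, x in zip(w_row, x_row))
        outputs.append(activation(total))
    return outputs


# Toy sentence of 4 words with 2-dimensional embeddings (hypothetical values).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# One filter spanning 2 consecutive words.
weights = [[1.0, -1.0], [0.5, 0.5]]

print(text_convolution(sentence, weights, bias=-0.5))
```

Each filter yields one value per window position, so a sentence of 4 words and a window of 2 words gives 3 values; stacking several filters would give several such sequences, one per feature map.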
Applications of Convolutional Neural Networks (CNN) in Natural Language Processing