Visualization of convolutional neural networks using deconvolution (deconvnet)

Visual understanding of convolutional neural networks

Original address: http://blog.csdn.net/hjimce/article/details/50544370

Author: HJIMCE

I. Related theories

This post focuses on a classic paper from ECCV 2014: "Visualizing and Understanding Convolutional Networks", which can fairly be called the pioneering work on visually understanding CNNs. The paper shows what features each layer of a CNN learns, and the authors then use these visualizations to adjust the network and improve its accuracy. Progress in deep convolutional neural networks over the past two years has been astonishing, with breakthrough recognition accuracy in computer vision and a flood of CNN papers at CVPR. Yet many researchers do not understand why a particular tweak or change to the network structure improves accuracy. Perhaps one day, while working on a CNN project, you tune a few parameters and the results soar; but if someone asks you why that parameter raised the accuracy, or what kind of features you designed the CNN to learn, you cannot answer. PS: My supervisor once looked down on me because, when asked to explain what features each layer of a CNN learns, I had no answer. Having been humbled, I finally sat down to study this paper.

The point of the paper is to show that by visualizing the features, you can guide improvements in accuracy: once you can see what the CNN learns, you can design for it. This is a classic, must-read paper; although published only a little over a year ago, its citation count has already reached the hundreds, and studying it matters a great deal for a deeper understanding of CNNs. In short, this paper is seriously impressive.

II. Using deconvolution to achieve feature visualization

To explain why convolutional neural networks work, we need to explain what each layer of the CNN has learned. To understand and visualize the features extracted by each intermediate layer, the paper uses a deconvolution method. A deconvolution network can be regarded as the inverse process of a convolution network. Deconvolution networks were originally proposed in "Adaptive Deconvolutional Networks for Mid and High Level Feature Learning" for unsupervised learning. Here, however, the deconvolution process has no learning capability at all: it is used purely to visualize an already trained convolutional network model, and there is no training involved.

Deconvolution visualization takes the feature maps obtained at each layer as input, runs the deconvolution, and uses the result to verify what features each layer has extracted. For example, to see what AlexNet's conv5 extracts, we attach a deconvolution network to the conv5 feature maps and then apply unpooling, rectification, and deconvolution in turn, so that the original 13×13 feature map (conv5 is 13×13) is progressively enlarged until we finally obtain an image the same size as the original input (227×227).

1. Unpooling

We know that pooling is an irreversible process; however, we can approximately invert it by recording, during pooling, the coordinates of the maximum activation value. Then, during unpooling, we place the value back only at the recorded coordinates of the maximum activation and set all other values to 0. Of course this is only an approximation, because during pooling the values at positions other than the maximum were generally not 0. A paper I read a few days ago, "Stacked What-Where Auto-encoders", has a good schematic of this unpooling process, so I have cropped its figure and use it here to explain:


Take the figure above as an example: the left side shows the pooling process and the right side the unpooling process. Suppose the pooling block size is 3×3 and we use max pooling; we then get one output neuron whose activation value is 9. Pooling is a downsampling process: a 3×3 region becomes a 1×1 output. Unpooling is just the opposite of pooling, an upsampling process. To expand one neuron back into 3×3 neurons, we need the location coordinates of the maximum value recorded during pooling, say (0,1); then, during unpooling, we fill the value in at pixel position (0,1) and set the activations of all other neurons to 0. One more example:

During max pooling, we not only take the maximum value but also record its coordinates, here (-1,-1); then, during unpooling, we place the value directly back at (-1,-1) and set all other activations to 0.
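The recording-and-restoring idea above can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's implementation: the function names and the dictionary of recorded "switch" coordinates are my own choices.

```python
import numpy as np

def max_pool_with_switches(x, size=3):
    """Max-pool a 2D map, recording the coordinates ("switches") of each maximum."""
    h, w = x.shape
    out = np.zeros((h // size, w // size))
    switches = {}
    for i in range(0, h - size + 1, size):
        for j in range(0, w - size + 1, size):
            block = x[i:i + size, j:j + size]
            r, c = np.unravel_index(np.argmax(block), block.shape)
            out[i // size, j // size] = block[r, c]
            switches[(i // size, j // size)] = (i + r, j + c)
    return out, switches

def unpool(pooled, switches, shape):
    """Place each pooled value back at its recorded location; everything else stays 0."""
    x = np.zeros(shape)
    for (pi, pj), (r, c) in switches.items():
        x[r, c] = pooled[pi, pj]
    return x
```

With a 3×3 block whose maximum 9 sits at position (0,1), pooling yields the single value 9 plus the switch (0,1), and unpooling writes 9 back at (0,1) with zeros elsewhere, exactly as in the figure's example.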

2. Rectification (reverse activation)

In AlexNet, the ReLU function ensures that each layer's output activations are non-negative. For the reverse process we likewise need each reconstructed feature map to be non-negative, so the reverse activation is no different from the forward activation: both simply apply ReLU.
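In other words, the same elementwise operation serves both directions. A minimal sketch:

```python
import numpy as np

def relu(x):
    """ReLU serves as both the forward activation and the deconvnet's
    "reverse activation": each direction just zeroes out negatives,
    keeping every (reconstructed) feature map non-negative."""
    return np.maximum(x, 0.0)
```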

3. Deconvolution

For the deconvolution step, the deconvnet uses the transposed version of the filters learned in the forward convolution (the same parameters, but with the filter matrix flipped in both the horizontal and vertical directions). I am not entirely clear on why this works; presumably it can be proved with the relevant mathematical theory.
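The "flipped filter" operation can be made concrete. A single-channel sketch, assuming the forward pass is a plain 'valid' cross-correlation (as in a CNN): the reverse pass zero-pads the feature map and correlates with the filter flipped both ways, which is exactly the transpose of the forward operation and restores the original spatial size. The function names here are my own.

```python
import numpy as np

def conv2d_valid(x, k):
    """Plain 'valid' cross-correlation, as in a CNN forward pass."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def deconv2d(y, k):
    """Reverse pass: zero-pad the feature map by (kh-1, kw-1) on each side
    and cross-correlate with the filter flipped horizontally and vertically.
    This is the transpose of conv2d_valid and restores the input's size."""
    kh, kw = k.shape
    padded = np.pad(y, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    return conv2d_valid(padded, k[::-1, ::-1])
```

For a 4×4 input and a 3×3 filter, the forward pass produces a 2×2 feature map, and `deconv2d` maps that 2×2 map back to a 4×4 array, mirroring how the deconvnet enlarges conv5's 13×13 maps step by step toward 227×227.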

The overall network structure for visualization is as follows:

The whole process, starting from the right: input image → convolution → ReLU → max pooling → resulting feature map → unpooling → ReLU → deconvolution. At this point we can say the algorithm itself is complete; the rest of the paper explains how to interpret CNNs, which you can study or skip as you like.

In general, the algorithm has two key points: (1) unpooling and (2) deconvolution. It is worth understanding how both are implemented in the source code.
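Putting the pieces together, the arrow chain above can be sketched end to end for one toy layer. This is a self-contained illustration under my own simplifying assumptions (single channel, 2×2 pooling, 'valid' convolution), not the paper's code:

```python
import numpy as np

def conv_valid(x, k):
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def forward(x, k, pool=2):
    """Forward pass: convolution -> ReLU -> max pooling (recording switches)."""
    a = np.maximum(conv_valid(x, k), 0.0)
    h, w = a.shape[0] // pool, a.shape[1] // pool
    pooled = np.zeros((h, w))
    switches = np.zeros((h, w, 2), dtype=int)
    for i in range(h):
        for j in range(w):
            block = a[i * pool:(i + 1) * pool, j * pool:(j + 1) * pool]
            r, c = np.unravel_index(np.argmax(block), block.shape)
            pooled[i, j] = block[r, c]
            switches[i, j] = (i * pool + r, j * pool + c)
    return pooled, switches, a.shape

def reverse(pooled, switches, pre_pool_shape, k):
    """Reverse pass: unpooling -> ReLU -> deconvolution (flipped filter)."""
    a = np.zeros(pre_pool_shape)
    for i in range(pooled.shape[0]):
        for j in range(pooled.shape[1]):
            r, c = switches[i, j]
            a[r, c] = pooled[i, j]
    a = np.maximum(a, 0.0)
    kh, kw = k.shape
    padded = np.pad(a, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    return conv_valid(padded, k[::-1, ::-1])
```

Running `forward` on a 5×5 input with a 2×2 filter gives a 2×2 pooled map, and feeding that map (with its switches) through `reverse` projects it back to a 5×5 array, the same shape as the input, which is precisely what the deconvnet does to visualize a feature map in pixel space.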

III. Understanding the visualizations

Feature visualization: once network training is complete, we can visualize what the network has learned. But what to look for, and how to interpret it, is another matter. We use the deconvolution network described above to view each layer's feature maps.

1. Feature visualization results:


Overall, the features a CNN learns are discriminative ones. For instance, to distinguish a human face from a dog's head: after CNN learning, the background receives very little activation, and the visualizations show that the extracted features ignore the background and capture the key information. Layers 1 and 2 learn essentially low-level features such as colors and edges; layer 3 becomes slightly more complex, learning texture features such as the grid-like textures above; layer 4 learns more distinctive features, such as dog heads; and layer 5 learns complete, discriminative key features.

2. Feature evolution during training. The authors show how each layer's features change over the course of network training. Each image above is one feature map of the network, and each row contains eight small pictures showing that feature map at training epochs 1, 2, 5, 10, 20, 30, 40, and 64:


Results: (1) Looking carefully at each layer over the iterations, there are sudden jumps in the features. (2) Comparing across layers, the lower layers barely change during training and converge more easily, while the features learned by the higher layers change dramatically. This is because the gradients vanish by the time they reach the lower layers, so those layers change little from the start of training. (3) From the evolution of the high-level layer conv5, we can see that the first several iterations bring little change, but around epochs 40 to 50 the change is drastic. So when training a network, do not judge the results too early; make sure the network has converged before evaluating it.

3. Image transformations. From the visualization results in Figure 5 of the paper, we can see that for images subjected to scaling, translation, and similar operations, the first layer of the network is affected considerably, while in the later layers the extracted features show little significant change under these transformations.

Personal summary: I personally feel the value of studying this paper lies not in the visualization itself but in learning about deconvolution networks. Once you understand deconvolution networks, you will encounter them again and again in later literature. Most CNN architectures whose output is a whole image need a deconvolution network: image semantic segmentation, image deblurring, visualization, unsupervised learning, image depth estimation, and so on. Many papers on such whole-image-output tasks use deconvolution networks and achieve impressive results. So I think the greater significance of studying this paper is learning the deconvolution network.

References:

1. "Visualizing and Understanding Convolutional Networks"

2. "Adaptive Deconvolutional Networks for Mid and High Level Feature Learning"

3. "Stacked What-Where Auto-encoders"

