Paper notes: Visualizing and Understanding Convolutional Networks

Source: Internet
Author: User
Tags: switches

I already knew that CNNs could be visualized, and that such a thing existed, but it was never clear to me how it is done, what its principles are, or what the results mean. Frankly speaking, my knowledge of "CNN visualization" stopped at the "I've heard of it" level. Now that I actually need to use and understand CNN visualization techniques, it is time to properly read this paper.

Background

1) In many classification tasks (such as handwriting recognition, face recognition, and the very challenging ImageNet classification), CNNs have achieved excellent performance. But how do CNNs do it? What is their internal working mechanism? And how can we further improve their performance? As the paper puts it: "without clear understanding of how and why they work, the development of better models is reduced to trial-and-error."

2) Do the different parts of an image have the same effect on classification accuracy?

3) Do the different layers of a CNN have the same generalization ability?

In this paper, the authors present a visualization technique for analyzing "how and why CNNs work", use it to answer question 2), and give a clear answer to question 3). The key idea of the visualization technique can be understood as follows:

Assume b = f(a), where a is an input image, b is a feature map, and f is the CNN (for simplicity, take a and b to be two-dimensional matrices, i.e. the input is a grayscale image and the output is a feature map with a single channel; this is just to simplify the explanation). Normally, given input a, we compute the feature map b. Now we want to see the "contribution" of each element of a to a single element b_ij of b. We can set all elements of b except b_ij to zero, and then map b_ij back to the input space to obtain a′ (if that is hard to picture, think of a′ as being reconstructed from b_ij). This a′ reflects the "contribution" of each pixel of a to b_ij, or in other words, the pattern in a that activates b_ij.
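The mapping from a single b_ij back to input space can be sketched in a few lines of NumPy (a toy single-channel example with made-up sizes and a random filter, not the paper's actual network): the transpose of a valid cross-correlation simply scatters b_ij times the filter onto b_ij's receptive field.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 8))   # grayscale "input image"
f = rng.standard_normal((3, 3))   # one convolution filter

# Forward pass b = f(a): valid cross-correlation, output is 6x6.
H = a.shape[0] - 2
b = np.zeros((H, H))
for m in range(H):
    for n in range(H):
        b[m, n] = np.sum(a[m:m+3, n:n+3] * f)

# Keep only one activation b_ij (zero out the rest) and project it back:
# the transpose of the correlation scatters b_ij * f onto the 3x3
# receptive field of b_ij in input space. This is the reconstructed a'.
i, j = 2, 3
a_prime = np.zeros_like(a)
a_prime[i:i+3, j:j+3] += b[i, j] * f
```

Note that a′ is zero everywhere except on the 3×3 receptive field of b_ij, which is exactly the "which input pixels contribute to b_ij" picture described above.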

Main Points

1) First, the visualization technique proposed by the authors.

The left side of Fig. 1 shows the visualization network the authors propose, which maps an activation on a feature map back to the input space, corresponding to the a′ described in the background above. The right side is a conventional CNN with one small modification: during max pooling, "switches" record the position of the maximum value. We focus on the network structure on the left:

    • How to "invert" pooling, i.e. the unpooling operation. The approach is very simple: during max pooling, the "switches" record which position attained the maximum; during unpooling, each pooled value is put back at its recorded position according to these switches, and the remaining positions are simply padded with 0. The bottom of Fig. 1 illustrates this process well.
    • If we have the weights of the convolution kernels, we can perform deconvolution. These weights are taken directly from the conventional CNN on the right, so the visualization network itself requires no training.
    • A ReLU is also applied on the left side. The paper explains: "to obtain valid feature reconstructions at each layer (which also should be positive), we pass the reconstructed signal through a relu non-linearity".
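The switch-based unpooling in the first bullet can be sketched as follows (a minimal NumPy version for 2×2 non-overlapping windows; function names are illustrative, not from the paper's code):

```python
import numpy as np

def max_pool_with_switches(x, k=2):
    """k x k max pooling that also records where each maximum came from
    (the "switches")."""
    h, w = x.shape[0] // k, x.shape[1] // k
    pooled = np.zeros((h, w))
    switches = np.zeros((h, w, 2), dtype=int)
    for i in range(h):
        for j in range(w):
            patch = x[i*k:(i+1)*k, j*k:(j+1)*k]
            r, c = np.unravel_index(np.argmax(patch), patch.shape)
            pooled[i, j] = patch[r, c]
            switches[i, j] = (i*k + r, j*k + c)
    return pooled, switches

def unpool(pooled, switches, shape):
    """Put each pooled value back at its recorded position; everything
    else is padded with 0."""
    out = np.zeros(shape)
    h, w = pooled.shape
    for i in range(h):
        for j in range(w):
            r, c = switches[i, j]
            out[r, c] = pooled[i, j]
    return out
```

Running `unpool(*max_pool_with_switches(x), x.shape)` yields an array that keeps each window's maximum at its original location and is zero elsewhere, exactly the behavior described in Fig. 1.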

In a CNN, the feature maps at each layer have multiple channels. What we do is choose one feature map, select one activation on it, set all other activations (across all feature maps) to zero, and then reconstruct a′ as described above. Since each activation's receptive field in the input image is limited and determined, the authors also crop out the corresponding image patch for that activation (when choosing which activations to show, for a given feature map they display the top 9).
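Selecting the strongest activations of a channel is straightforward; here is a simplified sketch within a single feature map (the paper actually picks the top 9 over a held-out set, but the selection logic is the same; the map here is random and its 13×13 size is just an assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
fmap = rng.standard_normal((13, 13))   # one channel of a layer's output

# Indices of the 9 largest activations, strongest first.
flat_idx = np.argsort(fmap.ravel())[::-1][:9]
top9 = [tuple(np.unravel_index(k, fmap.shape)) for k in flat_idx]
```

Each `(row, col)` in `top9` would then be projected back to input space (and its receptive-field patch cropped) as in the earlier description.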

Conclusions about what each layer learns can be drawn directly from the resulting figures.

2) With the above visualization tool, we can track the evolution of features during training in real time. For this part, please refer to the original paper; I will not repeat it here.

3) With the above visualization tool, we can also occlude the patches that cause the strongest response, to see how different patches affect the classification result. The conclusion is that "the model is truly identifying the location of the object in the image."
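The occlusion experiment can be sketched like this (a toy version: `score_fn` stands in for the trained CNN's probability of the true class, which is an assumption here, and the patch/stride values are illustrative):

```python
import numpy as np

def occlusion_map(image, score_fn, patch=4, stride=4, fill=0.0):
    """Slide an occluding square over the image and record the classifier
    score at each occluder position; low scores mark regions the model
    relies on."""
    h, w = image.shape
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            occluded = image.copy()
            occluded[i*stride:i*stride+patch, j*stride:j*stride+patch] = fill
            heat[i, j] = score_fn(occluded)
    return heat
```

With a toy `score_fn` that just sums pixel intensities, the heat map dips exactly where the occluder covers the bright "object", which is the qualitative behavior the paper reports for real CNNs.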

4) With this visualization tool, we can also occlude the patch that caused the strongest response and check whether the maximum response disappears, to verify that the specific pattern is indeed what activates the corresponding activation.

5) Of course, we can also use the above visualization tool to analyze the shortcomings of the AlexNet model and propose improvements; the paper does exactly that.

6) In addition, the authors ran experiments removing some layers of AlexNet or changing the width of certain layers, to see the effect of depth and width on classification accuracy. The conclusion: the depth of the model is an important factor for performance, and increasing the width of the model can also improve network performance.

7) The authors also verified that small datasets are less suitable for training large networks such as AlexNet.

8) The network's high-level features generalize well, but performance on the Pascal dataset is relatively poor. The reason may be data bias (the images in ImageNet and Pascal differ considerably).

Summary

1) A simple visualization technique lets us examine "how and why CNNs work" from every angle. "Small improvement, big wisdom": perhaps this should become a guiding principle of research!

2) The authors analyze the influence of network depth, width, and dataset size on performance, and also analyze the generalization ability of the network's output features and the problems that arise during generalization (when another dataset differs greatly from ImageNet, the generalization ability is relatively weak).
