Visualizing and Understanding Convolutional Networks (reading translation)


Visualizing and Understanding Convolutional Networks

Abstract

Recently, large convolutional neural network models have shown impressive classification results on the ImageNet dataset, but it is still not clear why they perform so well or how they might be improved. In this paper, we address both of these issues. We introduce a novel visualization technique that gives insight into the function of intermediate feature layers and the operation of the classifier. Used as a diagnostic tool, these visualizations allow us to find model architectures that outperform Krizhevsky et al.'s AlexNet.

On the ImageNet classification benchmark we also perform an ablation study to discover how much each layer contributes to the final result. We show that when only the softmax classifier is retrained, our ImageNet-trained model generalizes well to other datasets, convincingly beating the current best results on Caltech-101 and Caltech-256.

1. Introduction

Since LeCun et al. began studying convolutional neural networks (hereafter CNNs) in 1989, CNNs have shown excellent results on a number of image tasks throughout the 1990s, such as handwritten digit recognition and face detection. In the past year, several papers have shown that they can also achieve better classification results on more difficult datasets: Ciresan et al. obtained the best results on the NORB and CIFAR-10 datasets in 2012. Most notably, Krizhevsky et al.'s 2012 paper won the ImageNet 2012 classification challenge by a wide margin, with an error rate of only 16.4%, compared with 26.1% for the second-place entry.

Several factors contributed to this result: (i) the availability of much larger labeled training sets; (ii) powerful GPUs that make training very large models practical; and (iii) better regularization methods such as dropout (Hinton et al., 2012). Nonetheless, we still have little insight into the internal mechanisms of these networks or why they achieve such good results. From a scientific standpoint this is deeply unsatisfactory: without a clear understanding of how and why they work, improving them is reduced to blind trial and error. In this paper, we describe a visualization technique that reveals which input patterns excite individual feature maps at every layer of the model.

This also lets us observe how features evolve during training, and thus diagnose potential problems with the model. The visualization technique we use is a multi-layer deconvolutional network, proposed by Zeiler et al. in 2011, which maps feature activations back to the input pixel space. We also perform an interesting sensitivity study: by occluding portions of the input image, we reveal which parts of the scene matter most for the classification.

Starting from the Krizhevsky et al. model, we explore different architectures and discover ones that give better results. We also explore the generalization ability of the model to other datasets, retraining only the softmax layer on top. This is a form of supervised pre-training, which differs from the unsupervised pre-training of Hinton et al., Bengio et al., and Vincent et al. The generalization ability of convnet features is also discussed in Donahue et al. (2013).

1.1. Related Work

Visualizing features to gain intuition about a trained network is common practice, but in most cases it is limited to the first layer, because the first layer is easily mapped back to pixel space. Higher layers are harder to handle, and only a few, more limited methods exist for interpreting a unit's activity. Erhan et al. (2009) find the maximal-response stimulus for each unit by performing gradient descent in image space to maximize the unit's activation. This requires careful initialization and gives no information about the unit's invariances.

An improved method (Le et al., 2010), building on (Berkes & Wiskott, 2006), studies some of a unit's stable properties by numerically computing the unit's Hessian around its optimal response. The problem is that for higher-layer units the invariances are extremely complex, so they are poorly captured by a simple quadratic approximation.

Our approach, by contrast, does not describe a unit through a set of parameters; instead it shows which patterns in the image activate the feature map (similar in spirit to Donahue et al., 2013, whose visualizations identify the image regions responsible for strong activations in the model). Our visualizations differ in that they are not just crops of input images, but top-down projections that reveal exactly which structures within each patch stimulate a particular feature map.

2. Approach

We use standard fully supervised convolutional neural network models for our experiments, as in (LeCun et al., 1989) and (Krizhevsky et al., 2012). These models take a 2-D input image x_i and, via a series of layers, produce a probability vector ŷ_i over the C output classes. Each layer consists of: (i) convolution of the previous layer's output (or, for the first layer, the input image) with a set of learned filters; (ii) passing the responses through a rectified linear function, relu(x) = max(x, 0); (iii) [optionally] max pooling over local neighborhoods; and (iv) [optionally] a local contrast normalization that normalizes the responses across feature maps. For more details see (Krizhevsky et al., 2012) and (Jarrett et al., 2009).
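To make the layer structure concrete, here is a minimal PyTorch sketch of one such layer. It is an illustration of the description above, not the authors' code; the kernel size and channel counts are placeholders, and LocalResponseNorm is used only as a stand-in for the local contrast normalization the paper mentions.

```python
import torch
import torch.nn as nn

class ConvLayer(nn.Module):
    """One convnet layer: conv -> relu -> (optional) max pool -> (optional) normalization."""
    def __init__(self, in_ch, out_ch, kernel=5, pool=True, normalize=True):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=kernel, padding=kernel // 2)
        self.relu = nn.ReLU(inplace=True)                         # relu(x) = max(x, 0)
        self.pool = nn.MaxPool2d(2, stride=2) if pool else None   # (iii) optional max pooling
        # Stand-in for the paper's local contrast normalization (an assumption, not the paper's op).
        self.norm = nn.LocalResponseNorm(size=5) if normalize else None

    def forward(self, x):
        x = self.relu(self.conv(x))   # (i) convolution with learned filters, (ii) rectification
        if self.pool is not None:
            x = self.pool(x)
        if self.norm is not None:
            x = self.norm(x)
        return x
```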

The top few layers of the network are conventional fully connected layers, and the final layer is a softmax classifier; Fig. 3 shows the model used in most of our experiments. We train the model on a large labeled set of N images {x, y}, where y_i is the true class label. A cross-entropy loss, suitable for image classification, is computed from ŷ_i and y_i. The network parameters (convolutional filters, fully connected weights, and bias terms) are trained by back-propagating the gradient of the loss and updating the parameters with stochastic gradient descent. Section 3 gives the full training details.
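The loss is not written out in the text; for reference, the usual cross-entropy objective over the N labeled images, with ŷ_i the softmax output for image x_i, takes the form

```latex
\mathcal{L} \;=\; -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} \mathbf{1}[y_i = c]\,\log \hat{y}_{i,c}
```

Minimizing this with stochastic gradient descent is the training procedure referred to above.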

2.1. Visualization with a Deconvnet

To understand the operation of a convnet, we first need to understand what input causes the feature activations in the intermediate layers. We introduce a novel way to map these feature responses back to the input pixel space, showing exactly what input pattern caused a given activation; we do this with a deconvolutional network (deconvnet) (Zeiler et al., 2011). A deconvnet can be thought of as a convnet model that uses the same components (filtering, pooling) but in reverse, so instead of mapping pixels to features it maps features back to pixels. In (Zeiler et al., 2011), deconvnets were proposed as a way of performing unsupervised learning; here they are not used in any learning capacity, just as a probe of an already trained convnet.

To examine a convnet, a deconvnet is attached to each of its layers, as illustrated in Fig. 1 (top), providing a continuous path back to image pixels. To start, an input image is presented to the convnet and features are computed throughout the layers. To examine a given convnet activation, we set all other activations in the layer to zero and pass the feature maps as input to the attached deconvnet layer. Then we successively (i) unpool, (ii) rectify and (iii) filter to reconstruct the activity in the layer beneath that gave rise to the chosen activation. This is then repeated until input pixel space is reached.
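The procedure can be summarized in a short Python sketch. The helpers below (`convnet_layers` returning pooling switches, `deconv_steps` applying unpool, relu, and transposed filtering) are hypothetical names used only for illustration; they are not from the paper or any released code.

```python
import torch

def visualize_activation(image, convnet_layers, deconv_steps, layer_idx, channel):
    # Forward pass up to the chosen layer, recording the pooling switches.
    x, switches = image, []
    for layer in convnet_layers[: layer_idx + 1]:
        x, sw = layer(x)              # hypothetical: each layer returns (features, switches)
        switches.append(sw)

    # Zero all other activations in the layer; keep only the chosen feature map.
    feat = torch.zeros_like(x)
    feat[:, channel] = x[:, channel]

    # Descend through the attached deconvnet: (i) unpool, (ii) rectify, (iii) filter,
    # repeated until input pixel space is reached.
    for step, sw in zip(reversed(deconv_steps[: layer_idx + 1]), reversed(switches)):
        feat = step(feat, sw)         # hypothetical deconvnet step
    return feat
```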

Unpooling: In the convnet, the max pooling operation is non-invertible; however, we can obtain an approximate inverse by recording the locations of the maxima within each pooling region in a set of switch variables. In the deconvnet, the unpooling operation uses these switches to place the reconstructions from the layer above into the appropriate locations, preserving the structure of the stimulus. See Fig. 1 (bottom) for an illustration of the procedure.
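PyTorch's built-in pooling operations illustrate the switch mechanism directly; this is a toy example of the idea, not the authors' implementation.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)   # records the switches
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 3, 8, 8)          # toy feature map
pooled, switches = pool(x)           # switches = locations of the maxima in each pooling region
recon = unpool(pooled, switches)     # maxima placed back at their recorded locations
print(recon.shape)                   # torch.Size([1, 3, 8, 8]); all other positions are zero
```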

Rectification: The convnet uses relu non-linearities, which rectify the feature maps, ensuring that they are always positive. To obtain valid feature reconstructions at each layer (which should also be positive), we pass the reconstructed signal through a relu non-linearity.

Filtering: The convnet uses learned filters to convolve the feature maps from the previous layer. To invert this, the deconvnet uses transposed versions of the same filters, applied to the rectified maps rather than to the output of the layer beneath. In practice this means flipping each filter vertically and horizontally.
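The rectification and filtering steps can be sketched with ordinary tensor operations: applying `conv_transpose2d` with the same weight tensor used in the forward convolution is the transposed (flipped-filter) operation described above. The shapes below are placeholders, and this is only an illustration of the mechanism.

```python
import torch
import torch.nn.functional as F

weight = torch.randn(16, 3, 5, 5)                    # forward filters: 3 channels -> 16 feature maps
x = torch.randn(1, 3, 32, 32)
feat = F.conv2d(x, weight, padding=2)                # convnet filtering step

feat = F.relu(feat)                                  # deconvnet rectification step
recon = F.conv_transpose2d(feat, weight, padding=2)  # deconvnet filtering with transposed filters
print(recon.shape)                                   # torch.Size([1, 3, 32, 32]): back to input channels
```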

Projecting down from higher layers uses the switch settings generated by the max pooling in the convnet on the way up. As these switch settings are peculiar to a given input image, the reconstruction obtained from a single activation resembles a small piece of the original input image, with structures weighted according to their contribution to the feature activation. Since the model is trained discriminatively, the reconstructions implicitly show which parts of the input image are discriminative. Note that these projections are not samples from the model, since no generative process is involved.
