CNN in the Eyes of the World: Using Keras to Explain CNN Filters

Source: Internet
Author: User
Tags: theano, keras

Directory

    • Source information
    • Using Keras to explore the filters of convolutional networks
    • Visualize All Filters
    • Deep Dream (Nightmare)
    • Fooling the Neural Network
    • The revolution has not yet succeeded; comrades must keep working
Source information

Original article: http://blog.keras.io/how-convolutional-neural-networks-see-the-world.html

Original author: Francois Chollet

This translation was first published by me in the Keras Chinese documentation; for readers' convenience it has also been reposted to CSDN.

Using Keras to explore the filters of convolutional networks

In this article we will use Keras to see what a CNN learns and how it interprets the training images we feed it. Specifically, we will visualize the activation values of individual filters. The network used here is VGG-16 and the dataset is ImageNet. The code for this article can be found on GitHub.

VGG-16, also known as OxfordNet, is a convolutional neural network architecture developed by the Oxford Visual Geometry Group. The network won the ILSVRC (ImageNet) challenge in 2014. Today VGG is still considered an outstanding vision model, although its performance has since been surpassed by Inception and ResNet.

Lorenzo Baraldi converted the Caffe pre-trained VGG16 and VGG19 models into Keras weight files, so we can simply load the weights and start experimenting. The weight file can be downloaded here; readers in mainland China may need a proxy to reach it. (A mirror of the VGG16 weights is kept at http://files.heuritech.com/weights/vgg16_weights.h5 ; download it soon, since the mirror may go offline at any time.)

First, we define the structure of the VGG network in Keras:

from keras.models import Sequential
from keras.layers import Convolution2D, ZeroPadding2D, MaxPooling2D

img_width, img_height = 128, 128

# build the VGG16 network
model = Sequential()
model.add(ZeroPadding2D((1, 1), batch_input_shape=(1, 3, img_width, img_height)))
first_layer = model.layers[-1]
# this is a placeholder tensor that will contain our generated images
input_img = first_layer.input

# build the rest of the network
model.add(Convolution2D(64, 3, 3, activation='relu', name='conv1_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(64, 3, 3, activation='relu', name='conv1_2'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(128, 3, 3, activation='relu', name='conv2_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(128, 3, 3, activation='relu', name='conv2_2'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_2'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_3'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_2'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_3'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_2'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_3'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

# get the symbolic outputs of each "key" layer (we gave them unique names)
layer_dict = dict([(layer.name, layer) for layer in model.layers])

Note that we do not need the fully-connected layers, so the network is only defined up to the last convolutional layer. Including the fully-connected layers would restrict the input size to 224x224, the size of the original ImageNet pictures, because for any other input size the length of the flattened vector passed from the convolutional part to the fully-connected part would not match the shape the model expects.

Below, we load the pre-trained weights into the model. Normally we would just call model.load_weights(), but here we only want part of the parameters, and that method would fail because the model and the weight file do not match layer for layer. So we load the weights manually:

import h5py

weights_path = 'vgg16_weights.h5'

f = h5py.File(weights_path)
for k in range(f.attrs['nb_layers']):
    if k >= len(model.layers):
        # we don't look at the last (fully-connected) layers in the savefile
        break
    g = f['layer_{}'.format(k)]
    weights = [g['param_{}'.format(p)] for p in range(g.attrs['nb_params'])]
    model.layers[k].set_weights(weights)
f.close()
print('Model loaded.')

Below, we define a loss function that measures the activation of a given filter. By optimizing the input image to maximize this loss, we can actually see what makes the filter active.

Now we implement this loss function using Keras's backend module, so that the code can switch between TensorFlow and Theano without modification. TensorFlow's convolutions are considerably faster on the CPU, while, for now, Theano is faster on the GPU.

from keras import backend as K

layer_name = 'conv5_1'
filter_index = 0  # can be any integer from 0 to 511, as there are 512 filters in that layer

# build a loss function that maximizes the activation
# of the nth filter of the layer considered
layer_output = layer_dict[layer_name].output
loss = K.mean(layer_output[:, filter_index, :, :])

# compute the gradient of the input picture wrt this loss
grads = K.gradients(loss, input_img)[0]

# normalization trick: we normalize the gradient
grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)

# this function returns the loss and grads given the input picture
iterate = K.function([input_img], [loss, grads])

Note the small trick here: the computed gradient is normalized, so that it is neither too small nor too large. This normalization keeps the gradient ascent smooth.

With the function we just defined, we can now run gradient ascent on the filter's activation value.

import numpy as np

# we start from a gray image with some noise
input_img_data = np.random.random((1, 3, img_width, img_height)) * 20 + 128.

# run gradient ascent for 20 steps
step = 1.  # gradient ascent step size
for i in range(20):
    loss_value, grads_value = iterate([input_img_data])
    input_img_data += grads_value * step

When using TensorFlow, this operation takes only a few seconds.

Then we can extract the result and visualize it:

from scipy.misc import imsave

# util function to convert a tensor into a valid image
def deprocess_image(x):
    # normalize tensor: center on 0., ensure std is 0.1
    x -= x.mean()
    x /= (x.std() + 1e-5)
    x *= 0.1

    # clip to [0, 1]
    x += 0.5
    x = np.clip(x, 0, 1)

    # convert to RGB array
    x *= 255
    x = x.transpose((1, 2, 0))
    x = np.clip(x, 0, 255).astype('uint8')
    return x

img = input_img_data[0]
img = deprocess_image(img)
imsave('%s_filter_%d.png' % (layer_name, filter_index), img)

Here is the result for filter 0 of the conv5_1 layer:

Visualize All Filters

Below, we visualize filters from each layer to see how the CNN decomposes its input layer by layer, as sketched in the code that follows.
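The loop that produces these filter grids is not included in this article. The following is a minimal sketch, assuming the layer_dict, input_img and deprocess_image defined above; the layer name and the 4x4 grid size chosen here are arbitrary.

import numpy as np
from keras import backend as K
from scipy.misc import imsave

def visualize_filter(layer_name, filter_index, size=128, steps=20, step=1.):
    # build a loss that maximizes the mean activation of one filter,
    # exactly as in the single-filter example above
    layer_output = layer_dict[layer_name].output
    loss = K.mean(layer_output[:, filter_index, :, :])
    grads = K.gradients(loss, input_img)[0]
    grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)
    iterate = K.function([input_img], [loss, grads])

    # gradient ascent from a noisy gray image (size must match the model input)
    img_data = np.random.random((1, 3, size, size)) * 20 + 128.
    for _ in range(steps):
        loss_value, grads_value = iterate([img_data])
        img_data += grads_value * step
    return deprocess_image(img_data[0])

# stitch the first 16 filters of conv3_1 (an arbitrary choice) into a 4x4 grid
n, size, margin = 4, 128, 5
grid = np.zeros((n * size + (n - 1) * margin,
                 n * size + (n - 1) * margin, 3), dtype='uint8')
for i in range(n):
    for j in range(n):
        img = visualize_filter('conv3_1', i * n + j, size=size)
        grid[i * (size + margin): i * (size + margin) + size,
             j * (size + margin): j * (size + margin) + size, :] = img

imsave('conv3_1_filters.png', grid)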

The filters of the first layer mainly encode direction and color. These direction and color features are then combined into basic textures, which in turn gradually combine into more complex shapes.

The filters of each layer can be thought of as a set of basis vectors, usually an overcomplete one, that encode the layer's input compactly. The filters become more refined and complex as the spatial extent they draw on grows.

It can be observed that many filters have essentially the same content, just rotated by some seemingly random angle (such as 90 degrees). This suggests that the number of filters could be significantly reduced by making convolution filters rotation-invariant, which is an interesting research direction.

Surprisingly, this rotational property can still be observed in fairly high-level filters, such as those of conv4_1.

Deep Dream (Nightmare)

Another interesting experiment: if we replace the random-noise starting image with a meaningful picture, the results become even more interesting. This is the Deep Dream technique Google proposed last year. By picking particular combinations of filters, we can get some very striking results. If you are interested, see the Keras Deep Dream example and the Google blog post (blocked in mainland China).
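The Deep Dream loop itself is not shown in this article. As a minimal sketch, assuming the iterate function and deprocess_image defined above and using a placeholder image path, the only change is the starting point of the gradient ascent:

import numpy as np
from scipy.misc import imread, imresize, imsave

# load a real RGB image instead of random noise (the path is a placeholder)
base_image = imread('base_image.png')
base_image = imresize(base_image, (128, 128)).astype('float64')

# convert to the (1, 3, height, width) layout the model expects
input_img_data = base_image.transpose((2, 0, 1))[np.newaxis, :, :, :]

# run the same gradient ascent as before; the filter activations are now
# amplified on top of the structure already present in the image
step = 1.
for i in range(20):
    loss_value, grads_value = iterate([input_img_data])
    input_img_data += grads_value * step

imsave('deep_dream.png', deprocess_image(input_img_data[0]))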

Fooling the Neural Network

What if we add the fully-connected layers back on top of VGG and try to maximize the activation of a specific output class? Would we get a picture that looks like that class? Let's try it.

In this case our loss function looks like this:

layer_output = model.layers[-1].get_output()
loss = K.mean(layer_output[:, output_index])
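The rest of the procedure is the same gradient ascent as before. A minimal sketch, assuming the VGG model has been rebuilt with batch_input_shape=(1, 3, 224, 224), the fully-connected layers and softmax added on top, the weights loaded, input_img taken from the first layer as before, and reusing deprocess_image from above (the number of ascent steps here is arbitrary):

import numpy as np
from keras import backend as K
from scipy.misc import imsave

output_index = 65  # index of the class to maximize (65 is used in the experiment below)

# mean activation of the chosen output unit, as in the snippet above
layer_output = model.layers[-1].get_output()
loss = K.mean(layer_output[:, output_index])

# gradient of the input picture wrt this loss, normalized as before
grads = K.gradients(loss, input_img)[0]
grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)
iterate = K.function([input_img], [loss, grads])

# gradient ascent starting from a noisy gray image
input_img_data = np.random.random((1, 3, 224, 224)) * 20 + 128.
step = 1.
for i in range(100):
    loss_value, grads_value = iterate([input_img_data])
    input_img_data += grads_value * step

imsave('class_%d.png' % output_index, deprocess_image(input_img_data[0]))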

Let's say we want to maximize the output of the class labeled 65, which in ImageNet is the sea snake class. Very quickly our loss reaches 0.999, meaning the network is 99.9% sure that the generated picture is a sea snake. It looks like this:

It doesn't look much like one. Let's try a different category, this time the magpie class (index 18).

OK, so our network believes that something which looks nothing like a magpie is a magpie. Or rather, the picture shares some local textures with a magpie, such as feathers and a beak. Does this mean convolutional neural networks are a bad tool? Of course not. We train them on a specific task, and they perform well on that task. But we should not harbor the illusion that the network "understands" a concept. We cannot anthropomorphize the network; it is just a tool. It can recognize a dog, for example, only in the sense that it can classify it correctly with very high probability, not in the sense of understanding anything a person associates with "dog".

The revolution has not yet succeeded; comrades must keep working

So what does the neural network really understand? I think it understands two things.

First, the network understands how to decompose its input space into a hierarchical bank of convolution filters. Second, it understands a probabilistic mapping from certain combinations of those filters to a set of specific labels. What the network learns is nothing like "seeing" in the human sense, and scientifically this certainly does not mean we have solved computer vision. Don't get ahead of yourself: we have only climbed the first rung of the computer-vision ladder.

Some people say that the hierarchical decomposition of input space learned by convolutional networks mimics the behavior of the human visual cortex. This may or may not be true; at the moment we have no strong evidence to confirm or deny it. Of course, one could expect the visual cortex to learn something in a similar fashion, to the extent that this is a natural decoupling of our visual world (just as the Fourier transform is a natural decoupling of periodic sound signals, expressing a sound as components of different frequencies). This is a very natural, physically plausible picture: we may well think of our own visual recognition as hierarchical too. A round thing is a wheel, something with four wheels is a car, a sleek car is a sports car, and so on. However, the way humans filter, stratify, and process visual signals is probably not at all the same as our humble convolutional networks. The visual cortex is not convolutional; although it is layered, those layers are organized into cortical columns whose exact purpose is still not well understood, and that kind of structure has not yet appeared in our artificial neural networks (although Geoff Hinton is working on it). Besides, humans have far more visual perception mechanisms than a perceptron for classifying static images: human perception is continuous and active, not static and passive, and it is controlled by complex mechanisms such as eye movement.

Next time you hear a VC or a well-known CEO warn you about the threat of deep learning, think of the pictures above. Today we have better tools for handling complex information, which is cool, but in the end they are just tools, not creatures. Nothing they do qualifies, by any reasonable standard, as "thinking". Drawing a smiley face on a stone does not make the stone "happy", even if your primate neocortex tells you it is.

In short, visualizing convolutional networks is fascinating. Who would have imagined that plain gradient descent with a reasonable loss function, plus a large enough dataset, could yield such a beautiful hierarchical model capable of explaining complex visual information? Deep learning may not be intelligence in any real sense, but it can still achieve results nobody could have reached a few years ago. Now, if only we could understand why it works so well... ahem :)

@fchollet, January 2016
