Using a stochastic feedforward neural network to generate images and observe network complexity

0. Statement

This was a failed piece of work: I underestimated the role of the scale/shift parameters in batch normalization. Details are in Section 4 (Postscript); please take it as a warning.

1. Preface

One explanation of what a neural network does is that it is a universal function approximator: with the BP algorithm adjusting the weights, a neural network can in theory approximate any function.
Of course, the complexity of the function being approximated must not exceed the expressive capacity of the network, otherwise underfitting occurs. How complex a function a network can express is usually related to the number of hidden nodes and the depth of the network.
This article uses a visualization to show a network's expressive power intuitively.

2. Algorithm

It is worth noting that a neural network computes a continuous function of its input. If the input is the 2-D coordinates of an image's pixels and the output is a 3-D RGB color, then the color is a continuous function of the coordinates, which is an important criterion for an image to look pleasing. If we randomly generate the weights of the neural network, the complexity of the generated images gives us a rough idea of how complex a function the network can express.

Here is the code to generate the image (based on DeepLearnToolbox, address: https://github.com/happynear/DeepLearnToolbox):

layers = randi(10,1,10) + 10;            % number of nodes per hidden layer, sampled randomly from [11, 20]

nn = nnsetup([2 layers 3]);              % build the network: 2 inputs, 10 hidden layers, 3 outputs
nn.activation_function   = 'sigm';       % hidden layer activation function
nn.output                = 'sigm';       % output layer activation function
nn.usebatchnormalization = 0;            % whether to use batch normalization

output_h = 600;                          % image height
output_w = 800;                          % image width

[i,j] = ind2sub([output_h,output_w], (1:output_h*output_w)');  % pixel coordinates
i = (i - output_h/2) / output_h * 2;     % normalize to roughly [-1, 1]
j = (j - output_w/2) / output_w * 2;

nn = nnff(nn, [i j], zeros(size(i,1), 3));   % feedforward pass

output = nn.a{length(nn.a)};
output = zscore(output);                 % normalize; this step can be omitted if the output activation is sigm
output = reshape(output, [output_h, output_w, 3]);

imshow(uint8(output*100 + 128));         % display the image

There are three things to configure: the number of hidden nodes, the activation function, and whether batch normalization is used. For batch normalization, please refer to my previous blog post (link).
Batch normalization is added here for the following reason: since we are not using a trained network, even if the input is normalized, the network's weights may not match the scale of the input. Just as gradients can vanish during backpropagation, the activations can saturate prematurely during the feedforward pass. Normalizing the activations at every layer reduces this saturation.
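As a rough reminder of what this per-layer normalization does (this is the standard batch normalization transform, not necessarily the exact code inside the toolbox fork), each layer's pre-activations are normalized to zero mean and unit variance over the batch and then scaled and shifted by the parameters gamma and beta. Everything below is random since nothing is trained; note that, as the postscript explains, my experiments initially left the shift at 0:

N = 1000; D = 16;                               % batch size and layer width (arbitrary)
x     = randn(N, D) * 3 + 5;                    % made-up pre-activations of one layer
gamma = randn(1, D);                            % scale parameter (random, untrained)
beta  = randn(1, D);                            % shift parameter (random, untrained)

mu    = mean(x, 1);                             % per-unit mean over the batch
sigma = std(x, 0, 1);                           % per-unit std over the batch
x_hat = bsxfun(@rdivide, bsxfun(@minus, x, mu), sigma + 1e-8);   % normalize
y     = bsxfun(@plus, bsxfun(@times, x_hat, gamma), beta);       % scale and shift
a     = 1 ./ (1 + exp(-y));                     % the sigmoid now stays away from saturation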

3. Results

ReLU + batch normalization:

ReLU, no batch normalization:

Sigmoid + batch normalization:

Sigmoid, no batch normalization:

The images above were all generated by networks with 10 hidden layers; to see what different depths produce, please run the code yourself and observe (a minimal loop is sketched below).
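For example, a minimal sketch of such a loop, reusing the same toolbox calls as the script above (the list of depths and the output file names are just placeholders):

output_h = 600; output_w = 800;
[i,j] = ind2sub([output_h,output_w], (1:output_h*output_w)');
i = (i - output_h/2) / output_h * 2;
j = (j - output_w/2) / output_w * 2;

for n_layers = [2 5 10 20]                        % several depths to compare
    layers = randi(10, 1, n_layers) + 10;         % 11-20 nodes per hidden layer
    nn = nnsetup([2 layers 3]);
    nn.activation_function   = 'sigm';            % change activation / BN flag to taste
    nn.output                = 'sigm';
    nn.usebatchnormalization = 1;
    nn = nnff(nn, [i j], zeros(size(i,1), 3));    % feedforward pass
    output = reshape(zscore(nn.a{end}), [output_h, output_w, 3]);
    imwrite(uint8(output*100 + 128), sprintf('depth_%02d.png', n_layers));
end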

You can see that:
1. The functions expressed with ReLU + batch normalization are the most complex.
2. The images generated with sigmoid + batch normalization contain more regions of uniform color; the expressive ability is somewhat weaker than with ReLU, and some images are even less complex than those generated by ReLU without BN.
3. With the sigmoid function and no batch normalization, the function images are quite simple. Inspecting the output of each layer shows that the response values of the last layer's nodes are almost identical; at this point the network has effectively degenerated into a simple network with no hidden layer (see the sketch below).
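A quick way to check this third point is to look at the per-layer activations, which the toolbox keeps in nn.a after the nnff call; a small sketch, to be run right after the feedforward line in the script above:

% how much does each layer's response still vary across pixels?
for k = 1:numel(nn.a)
    a_k = nn.a{k};                                % activations of layer k, one row per pixel
    fprintf('layer %2d: mean std over units = %.4f\n', k, mean(std(a_k, 0, 1)));
end
% with sigmoid and no batch normalization this number collapses towards zero
% in the deeper layers, i.e. almost every pixel gets the same response.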

The resulting images can be downloaded from the following link:
http://pan.baidu.com/s/1hqtkoug

4. Postscript

Because a teacher in the office next door had a deep learning call for papers, I dug out this blog post intending to do some more in-depth research. I was thinking: if I kept going and ran structures such as leaky ReLU and maxout, and then extended the idea into a CNN form (this algorithm can be seen as taking a 2-channel image of linear coordinate gradients and passing it through multiple 1x1 convolutions to output a 3-channel image) and ran architectures such as VGG, Inception, and NIN, it would make a nice paper.
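Incidentally, the 1x1-convolution view in that parenthesis is easy to verify: a fully connected layer applied independently to every pixel's 2-vector of coordinates is exactly a 1x1 convolution with the same weights. A tiny self-contained check in plain MATLAB (no toolbox calls; all names here are made up for the illustration):

H = 4; W = 5;                                     % a tiny "image" is enough for the check
[ii, jj] = ndgrid(1:H, 1:W);
img = cat(3, ii, jj);                             % H x W x 2 coordinate image
Wfc = randn(2, 3); b = randn(1, 3);               % a 2-input, 3-output fully connected layer

% (a) fully connected layer applied to each pixel separately
flat = reshape(img, H*W, 2);                      % one row per pixel
fc   = reshape(bsxfun(@plus, flat * Wfc, b), H, W, 3);

% (b) the same thing written as three 1x1 convolutions
conv = zeros(H, W, 3);
for c = 1:3
    conv(:,:,c) = img(:,:,1) * Wfc(1,c) + img(:,:,2) * Wfc(2,c) + b(c);
end

max(abs(fc(:) - conv(:)))                         % prints ~0: the two are identical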
However, after writing a pile of code, I found a major bug: in the batch normalization layer I had only considered the effect of scale and forgot the other key factor, shift, which actually has an even greater impact on the functions the network can express.
Notice that in the sigmoid + batch normalization image above, all the lines appear to point toward the midpoint of the image. This is because with shift = 0, the function expressed after BN must be an odd (point-symmetric) function, since the sigmoid is point-symmetric about (0, 0.5). Nesting several odd functions may give an odd or an even function, but either way some symmetry remains, and this limits what the network can express. So shift must also be given a value in order to break this symmetry.
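The symmetry in question is just the point symmetry of the sigmoid itself, which is easy to confirm numerically (a two-line check, nothing from the toolbox):

u = linspace(-6, 6, 121);
sigm = @(u) 1 ./ (1 + exp(-u));
max(abs(sigm(u) + sigm(-u) - 1))                  % prints ~0, i.e. sigm(-u) = 1 - sigm(u),
                                                  % so sigm is point-symmetric about (0, 0.5)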
After assigning a random value to shift, the image generated by sigmoid + BN looks like this (5 hidden layers):

Looking at this image, it is hard to say it is any less complex than the one generated with ReLU, so in the end this method only produces pretty pictures; it looks nice but is of no real use.

Analyzing further, the images that looked complex before in fact had all of their complexity squeezed into a small region around the center. For example, if you shrink the random range of shift a little, the generated image has a complex middle surrounded by a radial pattern (8 hidden layers):

It looks like this paper is not going to happen; please allow me to make a sad face.
