**0. Statement**
This was a failed experiment: I underestimated the role of the scale/shift parameters in batch normalization. Details are in Section 4; consider this a warning.

**1. Preface**

One explanation of what a neural network does is that it is a universal function approximator: by adjusting the weights with the backpropagation (BP) algorithm, a neural network can in theory approximate any function.

Of course, the complexity of the function to be approximated must not exceed the network's expressive capacity, otherwise underfitting occurs. A network's expressive capacity is usually related to the number of hidden-layer nodes and the depth.

This article uses a visualization method to show a neural network's expressive capacity intuitively.

**2. Algorithm**

It is worth noting that a neural network's output is a continuous function of its input. If the input is a 2-D pixel coordinate and the output is a 3-D RGB color, then the color is a continuous function of the coordinates, and continuity of color is an important criterion for a visually pleasing image. If we randomly generate the network's weights, the complexity of the generated image gives us a rough idea of how complex a function the network can express.

Here is the MATLAB code to generate the image (based on DeepLearnToolbox, address: https://github.com/happynear/DeepLearnToolbox):

```matlab
layers = randi(10,1,10) + 10;      % hidden-layer sizes: 10 layers, each with 11 to 20 nodes
nn = nnsetup([2 layers 3]);        % build the network: 2 inputs (coordinates), 3 outputs (RGB)
nn.activation_function = 'sigm';   % hidden-layer activation function
nn.output = 'sigm';                % output-layer activation function
nn.usebatchnormalization = 0;      % whether to use batch normalization
output_h = 600;                    % image height
output_w = 800;                    % image width
[i, j] = ind2sub([output_h, output_w], (1:output_h*output_w)');   % pixel coordinates
i = (i - output_h/2) / output_h * 2;   % normalize to roughly [-1, 1]
j = (j - output_w/2) / output_w * 2;
nn = nnff(nn, [i j], zeros(size(i,1), 3));   % feed-forward pass
output = nn.a{length(nn.a)};
output = zscore(output);           % normalize; can be omitted if the output activation is sigm
output = reshape(output, [output_h, output_w, 3]);
imshow(uint8(output*100 + 128));   % display the image
```
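For readers without MATLAB, the same idea can be sketched in NumPy. This is an illustrative re-implementation, not the toolbox's code; the function name, layer sizes, and weight initialization here are my own choices:

```python
import numpy as np

def random_mlp_image(h=60, w=80, hidden=(16, 16, 16), seed=0):
    """Map normalized pixel coordinates through a random sigmoid MLP to an RGB image."""
    rng = np.random.default_rng(seed)
    ys, xs = np.mgrid[0:h, 0:w]
    # Normalize coordinates to roughly [-1, 1], as in the MATLAB code above.
    coords = np.stack([(ys - h / 2) / h * 2,
                       (xs - w / 2) / w * 2], axis=-1).reshape(-1, 2)
    a = coords
    sizes = [2, *hidden, 3]
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        W = rng.normal(0, 1, size=(fan_in, fan_out))
        b = rng.normal(0, 1, size=fan_out)
        a = 1.0 / (1.0 + np.exp(-(a @ W + b)))   # sigmoid activation
    # Rescale each channel to [0, 255] for display.
    a = (a - a.min(0)) / (a.max(0) - a.min(0) + 1e-12)
    return (a.reshape(h, w, 3) * 255).astype(np.uint8)

img = random_mlp_image()
print(img.shape)   # (60, 80, 3)
```

Saving `img` with any image library (or `matplotlib.pyplot.imshow`) reproduces the kind of picture discussed below.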

There are three things to configure: the number of hidden nodes, the activation function, and whether batch normalization is used. For batch normalization, please refer to my previous blog post (link).
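The per-layer normalization used in the experiment can be sketched as follows. This is a minimal NumPy illustration of z-scoring each layer's pre-activations over the batch (the learnable scale/shift parameters of full batch normalization are omitted here), not the toolbox's implementation:

```python
import numpy as np

def forward(x, weights, biases, normalize=True):
    """Sigmoid MLP forward pass; optionally z-score each layer's pre-activations."""
    a = x
    for W, b in zip(weights, biases):
        z = a @ W + b
        if normalize:
            # Per-feature normalization over the batch, as batch normalization does,
            # but without the learnable scale/shift parameters.
            z = (z - z.mean(0)) / (z.std(0) + 1e-8)
        a = 1.0 / (1.0 + np.exp(-z))   # sigmoid keeps outputs in (0, 1)
    return a

rng = np.random.default_rng(0)
sizes = [2, 15, 15, 3]
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=n) for n in sizes[1:]]
out = forward(rng.normal(size=(64, 2)), weights, biases)
print(out.shape)   # (64, 3)
```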

Batch normalization is added here for the following reason: since we are not using a trained network, even if the input is normalized, the randomly generated weights may not suit that input distribution. Just as gradients can vanish during backpropagation, a similar phenomenon occurs in the feed-forward pass, causing the network to saturate prematurely. Normalizing at every layer minimizes the chance of this happening.

**3. Results**

ReLU + batch normalization:

ReLU, no batch normalization:

Sigmoid + batch normalization:

Sigmoid, no batch normalization:

The images above were all generated by networks with 10 hidden layers; to see images generated with other depths, please run the code and observe for yourself.

You can see that:

1. The functions expressed by ReLU + batch normalization are the most complex.

2. The images generated by sigmoid + batch normalization contain more regions of uniform color; its expressive capacity is weaker than ReLU's, and some images are even less complex than those generated by ReLU without batch normalization.

3. With sigmoid and no batch normalization, the function images are quite simple. Observing each layer's output shows that the responses of the last layer's nodes are almost identical; at this point the network has effectively degenerated into a simple network with no hidden layers.
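Point 3 can be checked numerically: with random weights and sigmoid activations but no normalization, most pre-activations are pushed deep into the sigmoid's flat region, so the saturated units respond almost identically to every input. A small sketch under my own initialization choices (illustrative, not the toolbox code):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=(512, 2))
sizes = [2] + [15] * 10 + [3]   # 10 hidden layers, as in the experiment above
params = [(rng.normal(0, 2, (m, n)), rng.normal(0, 2, n))
          for m, n in zip(sizes[:-1], sizes[1:])]

def saturation_fraction(normalize):
    """Average fraction of sigmoid inputs in the flat region (|z| > 4), over all layers."""
    a = x
    fracs = []
    for W, b in params:
        z = a @ W + b
        if normalize:
            z = (z - z.mean(0)) / (z.std(0) + 1e-8)   # per-feature z-score
        fracs.append(np.mean(np.abs(z) > 4))
        a = 1.0 / (1.0 + np.exp(-z))
    return float(np.mean(fracs))

# Without normalization a large share of units saturate; with it, almost none do.
print(saturation_fraction(False), saturation_fraction(True))
```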

The resulting images can be downloaded from the following link:

http://pan.baidu.com/s/1hqtkoug

**4. Postscript**

Because a professor in the office next door had a deep-learning call for papers, I dug this blog post out intending to do more in-depth research. I was thinking: if I went on to try leaky ReLU, maxout, and other structures, then extended this to the CNN setting (the algorithm here can be viewed as taking a 2-channel "linear" image, passing it through multiple 1x1 convolutions, and outputting a 3-channel image) and ran VGG, Inception, NIN, and other architectures, it would make a nice article.

However, after writing a pile of code, I found a major bug: in the batch normalization layer I had only considered the effect of scale and had forgotten the other key factor, shift, which actually has an even greater impact on the functions the network expresses.
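For reference, the full batch normalization transform has both parameters: it z-scores each feature over the batch, then applies a learnable scale (conventionally gamma) and shift (conventionally beta). A minimal sketch; with beta = 0, every output is forced to zero mean:

```python
import numpy as np

def batchnorm(z, gamma, beta, eps=1e-5):
    """Batch normalization: z-score each feature over the batch, then scale and shift."""
    z_hat = (z - z.mean(0)) / np.sqrt(z.var(0) + eps)
    return gamma * z_hat + beta   # beta = 0 forces the output mean to zero

rng = np.random.default_rng(0)
z = rng.normal(2.0, 3.0, size=(128, 4))
out = batchnorm(z, gamma=np.full(4, 1.5), beta=np.full(4, 0.7))
print(out.mean(0).round(2))   # ≈ [0.7 0.7 0.7 0.7]: beta moves the mean off zero
print(out.std(0).round(2))    # ≈ [1.5 1.5 1.5 1.5]: gamma sets the spread
```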

Note that in the image generated by sigmoid + batch normalization above, all the lines appear to point toward the midpoint of the image. This is because with shift = 0, the function expressed after BN must be an odd function, since the sigmoid is odd-symmetric about the point (0, 0.5). Nesting multiple odd functions may produce an odd or an even function, but the result always retains some symmetry, which limits what the neural network can express. So shift must also be given a value to break this symmetry.
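The symmetry argument can be checked numerically. In this sketch (my own toy construction, not the toolbox code) I recentre the sigmoid to s(x) = sigmoid(x) - 0.5, which is odd, and feed a symmetric batch so the batch statistics share the symmetry; with shift = 0 the whole network is then an odd function of its input, and a nonzero shift breaks that:

```python
import numpy as np

sig = lambda x: 1.0 / (1.0 + np.exp(-x))
rng = np.random.default_rng(1)
Ws = [rng.normal(size=(2, 8)), rng.normal(size=(8, 8)), rng.normal(size=(8, 3))]

def f(x, shift):
    """Random bias-free net with per-layer z-scoring and recentred sigmoid."""
    a = x
    for W in Ws:
        z = a @ W
        z = (z - z.mean(0)) / (z.std(0) + 1e-8) + shift   # BN-style z-score, then shift
        a = sig(z) - 0.5   # sigmoid is symmetric about (0, 0.5), so this is odd
    return a

x = rng.normal(size=(100, 2))
x = np.vstack([x, -x])   # symmetric batch: for every input, its negation is included
odd_err = np.abs(f(x, 0.0)[:100] + f(x, 0.0)[100:]).max()
brk_err = np.abs(f(x, 1.0)[:100] + f(x, 1.0)[100:]).max()
print(odd_err)   # ≈ 0: with shift = 0, f(-x) = -f(x)
print(brk_err)   # clearly nonzero: the shift breaks the symmetry
```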

After assigning a random value to shift, the image generated by sigmoid + BN becomes the following (5 hidden layers):

Judging from this picture, it is hard to say it is any less complicated than the image generated with ReLU. So this method merely produces good-looking pictures; beyond that it is of little use.

Further analysis shows that the earlier images had actually squeezed a large family of possible complex patterns into a small region at the center. For example, if you reduce shift's random range a little, the generated image is complex in the middle with a radial pattern around it (8 hidden layers):

It seems this paper will not happen after all; allow me to make a sad face.