A note before we start: thanks to @challons for reviewing this article and offering valuable comments. Let's talk a bit about the red-hot topic of neural networks. Deep learning has developed rapidly in recent years and seems by now to occupy half of the machine learning landscape, and the major conferences are likewise dominated by it. The two hottest model families in deep learning are convolutional neural networks (CNN) and recurrent neural networks (RNN), so we will start from these two models.
Of course, these two models involve far too many concepts to cover in one go, so in order to explain things clearly we will start from a few basic concepts. Experts, please bear with me...
Today's topic is the fully connected layer, an important building block of neural networks (how neural networks were invented is not our concern here). A fully connected layer generally consists of two parts. To make the later formulas clearer, the following naming convention is used: the superscript denotes the layer, and the subscript denotes the index within a vector or the column of a matrix.

Linear part: performs a linear transformation; its input is denoted x and its output is denoted z.

Nonlinear part: performs, naturally, a nonlinear transformation; its input is the output z of the linear part, and its output is denoted x (which becomes the input of the next layer).
Linear part
The linear part basically performs a linear weighted sum. If the input is a vector x of length n and the output of the linear part is a vector z of length m, then the parameters of the linear part can be pictured as an m*n matrix W plus a bias vector b, so that:

z = Wx + b
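To make this concrete, here is a minimal numpy sketch of the linear part; the dimensions and input values below are arbitrary and purely illustrative:

import numpy as np

n, m = 4, 3                            # input dimension n, output dimension m (arbitrary example sizes)
x = np.array([1.0, 2.0, 3.0, 4.0])     # an input vector of length n
W = np.random.randn(m, n)              # the m*n weight matrix: one row per output
b = np.zeros(m)                        # the bias vector

z = np.dot(W, x) + b                   # the linear part: z = Wx + b
print(z.shape)                         # (3,) -- one weighted sum per row of W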
What does the linear part actually do? In a nutshell, it looks at the input data from several different angles, and from each angle it produces a judgment about the whole input.
That is still a bit abstract, so let's take a concrete example: CNN's classic introductory case, MNIST. We won't describe MNIST in detail here; it is a handwritten digit recognition task whose input is a 28*28 binary image and whose output is one of the digits 0-9. Suppose we use a fully connected model; then our input consists of 28*28 = 784 pixel values. On screen the data looks roughly like this:
To us these raw pixels are too abstract: we cannot tell how the value of an individual pixel relates to the final recognition result, or whether it is positively or negatively correlated with it.

Clearly there are correlations among the pixels. Exactly what those relations are we will discuss later, but they certainly exist. So giving each pixel a single weight does not solve the problem; what we need is more than one group of weights.
For example, we can:

1. in the first group of weights, give the first pixel a positive weight and the second pixel a positive weight;
2. in the second group of weights, give the first pixel a negative weight and the second pixel a positive weight; ...
In this way we can analyze and summarize the input data from multiple angles and obtain multiple outputs, that is, multiple evaluations of the data, as in the sketch below.
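A rough sketch of this idea on MNIST-sized input (the "image" here is just random noise standing in for a real digit, and the choice of 10 weight groups is an arbitrary assumption for illustration):

import numpy as np

image = np.random.randint(0, 2, size=(28, 28))    # stand-in for a 28*28 binary digit image
x = image.flatten().astype(float)                 # 784 pixel values as a single input vector

num_groups = 10                                   # several groups of weights, one "angle" each
W = np.random.randn(num_groups, 784)              # each row is one full group of pixel weights
b = np.zeros(num_groups)

z = np.dot(W, x) + b                              # ten different evaluations of the same image
print(z)                                          # one score per weight group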
Nonlinear part

The nonlinear part has a few standard "go-to" functions; here we mention only one classic one, the sigmoid. Its form is as follows:

sigmoid(z) = 1 / (1 + e^(-z))
The image looks like this:
The input of this function is exactly the output z of the linear part from the previous step. z can take any value in (-∞, +∞), and after passing through this function it is squashed into (0, 1).
Why does the nonlinear part have to apply such a function? In my (admittedly superficial) understanding, one purpose is normalization. No matter what the preceding linear part does, after the nonlinearity all values are confined to a fixed range, so that subsequent layers that keep computing on top of the previous layer's output remain relatively well behaved. Otherwise, if every layer produced values in a different range, some in (0, 1) and some in (0, 10000), setting step sizes during optimization would become troublesome.
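A tiny sketch of this squashing effect, with made-up linear outputs on very different scales:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-100.0, -5.0, 0.0, 5.0, 100.0])   # linear outputs on wildly different scales
print(sigmoid(z))                                # every value is squashed into (0, 1)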
Another purpose is to break the linear mapping built up so far. If the fully connected layers had only linear parts and no nonlinear parts, stacking multiple layers in a model would be meaningless. Suppose we have a 2-layer fully connected network with no nonlinear layers. For the first layer:

z^1 = W^1 x + b^1

For the second layer:

z^2 = W^2 z^1 + b^2

Merging the two gives:

z^2 = W^2 (W^1 x + b^1) + b^2 = (W^2 W^1) x + (W^2 b^1 + b^2)

So a single-layer network (with weight matrix W^2 W^1 and bias W^2 b^1 + b^2) can represent the previous two-layer network. It is the nonlinear layers that make multilayer neural networks meaningful.
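Here is a small numerical check of this collapse argument, using random matrices of arbitrary sizes:

import numpy as np

x = np.random.randn(4)                                 # an arbitrary input vector
W1, b1 = np.random.randn(5, 4), np.random.randn(5)     # first linear layer
W2, b2 = np.random.randn(3, 5), np.random.randn(3)     # second linear layer

two_layers = np.dot(W2, np.dot(W1, x) + b1) + b2                 # stacking two purely linear layers
one_layer = np.dot(np.dot(W2, W1), x) + np.dot(W2, b1) + b2      # a single equivalent linear layer

print(np.allclose(two_layers, one_layer))              # True: without a nonlinearity, two layers collapse into one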
There is another well-known nonlinear function, the hyperbolic tangent (tanh). Its form is as follows:

tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z))
The range of this somewhat more elaborate function is (-1, 1). As you can see, its range differs from the sigmoid's: it takes both positive and negative values, while the sigmoid is always positive.
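A quick sketch comparing the two output ranges (the input points are arbitrary samples):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5, 5, 5)
print(sigmoid(z))    # strictly positive, inside (0, 1)
print(np.tanh(z))    # both negative and positive, inside (-1, 1)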
The shape of a neural network

In fact, for a neural network with only one layer and a single output, if its nonlinear part uses the sigmoid function, its form is exactly that of logistic regression. So you can imagine that a neural network model is conceptually a more complex version of logistic regression. Where exactly does the extra complexity show up? Below is the code for a fully connected layer; let's start experimenting:
import numpy as np

class FC:
    def __init__(self, in_num, out_num, lr=0.01):
        self._in_num = in_num
        self._out_num = out_num
        # scale factor of 10 is assumed; large random weights give the sharp "steps" seen below
        self.w = np.random.randn(out_num, in_num) * 10
        self.b = np.zeros(out_num)

    def _sigmoid(self, in_data):
        return 1.0 / (1.0 + np.exp(-in_data))

    def forward(self, in_data):
        return self._sigmoid(np.dot(self.w, in_data) + self.b)
There is nothing special in this code. Note that we initialize w randomly, which amounts to letting the Random Emperor hand us a neural network, so we can also see just how mighty randomness is.
For ease of visualization, the experiments here use inputs of dimension 2 and an output of dimension 1. Okay, let's look at the first one:
x = np.linspace(-10, 10, 100)
y = np.linspace(-10, 10, 100)
X, Y = np.meshgrid(x, y)
X_f = X.flatten()
Y_f = Y.flatten()
data = list(zip(X_f, Y_f))      # list of (x, y) input points

fc = FC(2, 1)
Z1 = np.array([fc.forward(d) for d in data])
Z1 = Z1.reshape((100, 100))
draw3d(X, Y, Z1)                # draw3d is the plotting helper from the notebook linked at the end
Look closely: this is in fact a standard logistic regression. Its image is shown below:
After many random trials it basically always has this shape; as the random weights change, the "step" rotates to face different directions, but in the end it is still a step.

This also shows that a 1-layer neural network gets us nowhere: it is essentially no stronger than a linear classifier. So, friends, let's add another layer:
fc = FC(2, 3)
fc.w = np.array([[0.4, 0.6], [0.3, 0.7], [0.2, 0.8]])
fc.b = np.array([0.5, 0.5, 0.5])
fc2 = FC(3, 1)
fc2.w = np.array([0.3, 0.2, 0.1])
fc2.b = np.array([0.5])

Z1 = np.array([fc.forward(d) for d in data])
Z2 = np.array([fc2.forward(d) for d in Z1])
Z2 = Z2.reshape((100, 100))
draw3d(X, Y, Z2)
This time we do not use random weights but set the values ourselves, and you can see the parameters are chosen very carefully: both layers are entirely positive... And the image:

It looks softer than the previous step, but in the end it is still basically a step... Okay, then let's add some negative weights, so that the input data is analyzed from two different aspects:
fc = FC(2, 3)
fc.w = np.array([[-0.4, 1.6], [-0.3, 0.7], [0.2, -0.8]])
fc.b = np.array([-0.5, 0.5, 0.5])
fc2 = FC(3, 1)
fc2.w = np.array([-3, 2, -1])
fc2.b = np.array([0.5])

Z1 = np.array([fc.forward(d) for d in data])
Z2 = np.array([fc2.forward(d) for d in Z1])
Z2 = Z2.reshape((100, 100))
draw3d(X, Y, Z2)
Here is the figure:

After adding negative weights it finally looks less like a step, and the nonlinear ability of the 2-layer neural network starts to show. Now hand the weights back to the Random Emperor:
# the hidden width of 100 is an assumed example value
fc = FC(2, 100)
fc2 = FC(100, 1)

Z1 = np.array([fc.forward(d) for d in data])
Z2 = np.array([fc2.forward(d) for d in Z1])
Z2 = Z2.reshape((100, 100))
draw3d(X, Y, Z2, (75, 80))
Here is the image:

This time the nonlinearity is very obvious, but there still seems to be a small problem: the function appears to divide the plane in a way that is centrally symmetric about the origin, so two points symmetric about the center are bound to fall into different categories. The nonlinearity is still not thorough enough...
In that case, keep adding layers:
# the layer widths of 100 are assumed example values
fc = FC(2, 100)
fc2 = FC(100, 100)
fc3 = FC(100, 100)
fc4 = FC(100, 100)
fc5 = FC(100, 1)

Z1 = np.array([fc.forward(d) for d in data])
Z2 = np.array([fc2.forward(d) for d in Z1])
Z3 = np.array([fc3.forward(d) for d in Z2])
Z4 = np.array([fc4.forward(d) for d in Z3])
Z5 = np.array([fc5.forward(d) for d in Z4])
Z5 = Z5.reshape((100, 100))
draw3d(X, Y, Z5, (75, 80))
This picture is a bit ...
From the experiments above we can see that the more layers there are, the stronger the nonlinear "ability" indeed becomes, and the wilder the shapes it can produce.

Now that we have seen how powerful it is, next time we will talk about it in detail: back propagation.
The code for this article can be found at https://github.com/hsmyy/zhihuzhuanlan/blob/master/fclayer.ipynb
Author: Feng
Link: https://zhuanlan.zhihu.com/p/21525237
Source: Zhihu
Copyright belongs to the author. For commercial reprinting, please contact the author for authorization; for non-commercial reprinting, please credit the source.