Machine learning algorithms: from logistic regression to neural networks


In the first two posts, the logistic regression classification algorithm was introduced and tested experimentally on linear and nonlinear data sets. Logistic regression maps a weighted sum of the input vector through the sigmoid function, so it only handles linear classification problems well (as the experiments showed). Its model is

$$ y = \sigma(w^T x) = \frac{1}{1+e^{-w^T x}} $$

(for the detailed introduction see the two earlier posts, e.g. "linear and nonlinear experiments on logistic classification of machine learning (continued)").
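To make the model concrete, here is a minimal MATLAB sketch of this single-layer logistic mapping (the sample x, weights w and bias b are made-up illustrative values):

x = [0.5; 1.2];              % a 2-D sample (x1, x2)
w = [0.3, -0.8]; b = 0.1;    % weight vector and bias, arbitrary values
y = 1/(1+exp(-(w*x + b)));   % sigmoid of the weighted sum; y lies in (0,1)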

That being the case, can we apply a few more mappings before the final y comes out? The answer is yes, and this leads to the multilayer network: each layer's output is mapped by the sigmoid function into 0-1 and fed into the next layer, which gives us a neural network. For example, adding a couple of layers to the model above can be drawn as:

A sample enters from the leftmost side; the first layer's values y are computed from the weight matrix w, and the results are mapped through the sigmoid function to become the input of the second layer; the weight matrix u then gives z, which is mapped again, and so on until the final output y. So what is the difference between this multilayer network and the single layer above?
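As a hedged sketch, the forward computation just described looks like this in MATLAB (the 3-node, 2-hidden-layer shape matches the figure; the weight values are random placeholders):

sig = @(a) 1./(1+exp(-a));     % sigmoid, mapping to 0-1
x = [0.5, 1.2];                % input sample (x1, x2) as a row
w = rand(3,2);                 % input -> first layer weights
u = rand(3,3);                 % first -> second layer weights
v = rand(1,3);                 % second layer -> output weights
y1 = sig(x*w');                % first-layer outputs y1, y2, y3
z  = sig(y1*u');               % second-layer outputs z1, z2, z3
y  = sig(z*v');                % final scalar output y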

First, you can see that the final output y is no longer just the inputs (x1, x2) multiplied by weights and mapped once: the values are weighted and mapped, then weighted and mapped again, layer after layer. After the network, y is no longer a linear function of (x1, x2), and in general it is hard to say what the relationship is.

One may ask: for a network like this, why add exactly two layers, and why 3 nodes per layer? Can both be increased? Yes. Both the number of layers and the number of nodes per layer can be increased; this is part of neural network design. Every additional layer or node changes the relationship the network represents, whatever that relationship may look like. As for exactly how many layers and how many nodes to use, that is judged by the actual effect, and it touches on the deeper design questions of neural networks.

Consider the network first. Once a network is determined (determined meaning that all the weight coefficients in the network are known), y is obviously a nonlinear function of (x1, x2): you can simply see that y1, y2, y3 each depend on (x1, x2); z1, z2, z3 each depend on y1, y2, y3; and the final y depends on z1, z2, z3. After these repeated compositions, y is a nonlinear composite function of (x1, x2). Generally speaking, such a network can represent an arbitrary nonlinear relation. Once all the parameters are known, computing the output for a given input is very fast: it is always a forward calculation, which computers do easily, so a trained neural network is a very fast classifier. Training the network parameters, however, is not nearly so easy and enjoyable.
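Written as a single expression (with $\sigma$ the sigmoid and w, u, v the three weight matrices of the figure), the whole network computes

$$ y = \sigma\big(v\,\sigma(u\,\sigma(w\,x))\big), $$

which is plainly not a linear function of (x1, x2).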

From the discussion above we already know that neural networks are powerful (they can represent arbitrary nonlinear relations). The next question is how to train the network.

In the logistic classification algorithm, the weight parameters of the single-layer network were adjusted by gradient descent, using the error between the actual and predicted results. So what about this multilayer network? It is trained the same way; the difference is that the computation has to be done layer by layer.

First we fix the number of hidden layers and the number of nodes per layer, then construct a network with random weights. For each input from the training set there is an output value o, and between this o and the actual value t there is an error e (since these are training samples, each has a target classification result t). Looking at the diagram above, this e is directly related to the inputs z' of the last layer (their sigmoid values) and the weights v, so we can use e to update v. We can also distribute e onto the previous layer according to the sizes of the inputs z' and the weights v, giving that layer's error values e1, e2, e3 (which can be computed from e, z' and v). With e1, e2, e3 we can then update the weights u. In the same way, the y layer gets a set of errors e4, e5, e6, expressed in terms of the values of the network behind it, and these are used to update the weights w; if there were more layers, the errors would keep propagating forward. This method of transmitting the error at the output back through the network to update the weights and errors is the backpropagation algorithm of neural networks.

Here is only a brief account of how the error propagates backward and how the weight-update formula is derived. The book
"Machine Learning"
gives a detailed derivation of the process on pp. 74-75, which is worth reading (it is two pages of formulas, and typesetting them here is not easy, so it is omitted). Besides the book, a few blogs cover it (the book is the most detailed):
Neural Network and backpropagation algorithm
BPNN neural network backpropagation

Only the final weight-update result is given here:

$$ \nabla w_{ij} = -\eta\, \dfrac{\partial e_d}{\partial w_{ij}} = \eta\,(t_j - o_j)\,o_j\,(1-o_j)\,x_{ji} $$

$$ w_{ij} = w_{ij} + \nabla w_{ij} $$

Here $\eta$ is the learning-step coefficient, $t_j$ is the target value, $o_j$ is the output of node $j$, and $x_{ji}$ is the input to the node. Each layer of the network uses this formula to update its weights; with the new weights the network output and its error are computed again, the error is propagated backward, the weights are updated, and so on iteratively.
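The formula above is the rule for an output node, where the target $t_j$ is directly visible. For a hidden node $h$ the factor $(t_j - o_j)$ is replaced by the error summed back from the layer behind it; this is the standard backpropagation result, and it is exactly what delta2 and delta1 compute in the code below:

$$ \delta_h = o_h\,(1-o_h)\sum_{k} w_{kh}\,\delta_k, \qquad \nabla w_{hi} = \eta\,\delta_h\,x_{hi} $$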
The pseudocode of the algorithm follows (written just for the two-hidden-layer structure; with more layers another loop has to be added):
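A minimal sketch of that pseudocode, matching the structure of the MATLAB program further below:

initialize the weights W{1}, W{2}, W{3} with random values
for gen = 1 to max_generations
    for each training sample (x, t)
        h1 = sigmoid(W{1}*x')                             % forward pass, layer by layer
        h2 = sigmoid(W{2}*h1)
        o  = sigmoid(W{3}*h2)
        delta3 = (t - o)*o*(1 - o)                        % output-layer error
        delta2(j) = delta3*W{3}(j)*h2(j)*(1 - h2(j))      % for each node j
        delta1(j) = (delta2*W{2}(:,j))*h1(j)*(1 - h1(j))  % for each node j
        update every weight: W = W + eta*delta*input
return the trained weights W{1}, W{2}, W{3}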

This was also discussed earlier, in the post on the BP neural network backpropagation algorithm in MATLAB.

With all that said, we can run the experiment. Before the experiment comes sample selection: two data sets were created, one linear and one nonlinear, drawn below (100 samples per class, two classes):


It can be seen that in both the linear and the nonlinear case there is some overlap at the interface between the two classes (these overlapping samples are generally impossible to separate).
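The data file data_test1.mat is not attached here, so as a hedged sketch, something like the following builds a comparable linear two-class set with overlap at the interface (the Gaussian centers and spread are made-up illustrative choices):

n = 100;                                         % 100 samples per class
class1 = randn(n,2)*0.8 + repmat([1 1],n,1);     % class labeled 1
class2 = randn(n,2)*0.8 + repmat([3 3],n,1);     % class labeled 2, close enough to overlap
data = [class1 ones(n,1); class2 2*ones(n,1)]';  % stored 3 x 200, as the code's load() expects
save('data_test1.mat','data');
gscatter(data(1,:)',data(2,:)',data(3,:)');      % quick look at the two classes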

Now construct the network. It uses the structure described above: 2 hidden layers with 3 nodes each. The input is exactly the two-dimensional data, and the output is one-dimensional (the class label, which also fits exactly).

The code is as follows:

%% Neural network classification design
%  Simple 0-1 classes -- linear and nonlinear classification
clc; clear; close all;

%% Load data and preprocess -- two-class case;
%  labels are reset to 0 and 1, convenient for the sigmoid output
data = load('data_test1.mat');
data = data.data';
% set the labels to 0/1
data(:,3) = data(:,3) - 1;
% number of training samples
num_train = 50;
% construct a random selection sequence
choose = randperm(length(data));
train_data = data(choose(1:num_train),:);
gscatter(train_data(:,1),train_data(:,2),train_data(:,3));
label_train = train_data(:,end);
test_data = data(choose(num_train+1:end),:);
label_test = test_data(:,end);

%% Initial parameters
num_in  = size(train_data,2) - 1;  % input dimension
num_out = 1;                       % output is only the label -- 1-D
m = 2;       % number of hidden layers
n = 3;       % nodes per hidden layer
inta = 0.1;  % learning step

%% Initialize the network weights with random values
for i = 1:m+1
    if i == 1       % input layer
        W{i} = rand(n,num_in);
        continue;
    end
    if i == m+1     % output layer
        W{i} = rand(num_out,n);
        continue;
    end
    W{i} = rand(n,n);
end

%% Train the network
for gen = 1:1000
    for i = 1:length(train_data)
        % forward pass: output value of every node in every layer
        data_simple = train_data(i,1:end-1);
        net1 = data_simple*W{1}';  % inputs of the 1st hidden layer
        h1 = 1./(1+exp(-net1));    % outputs of its nodes
        net2 = h1*W{2}';           % inputs of the 2nd hidden layer
        h2 = 1./(1+exp(-net2));    % outputs of its nodes
        net3 = h2*W{3}';           % input of the output layer
        z = 1./(1+exp(-net3));     % network output
        % backward pass: error (delta) of each layer
        delta3 = (label_train(i)-z)*z*(1-z);  % output-layer error
        for j = 1:3
            delta2(j) = (delta3*W{3}(j))*h2(j)*(1-h2(j));
        end
        for j = 1:3
            delta1(j) = (delta2*W{2}(:,j))*h1(j)*(1-h1(j));
        end
        % update the weights layer by layer
        for j = 1:3  % 2nd hidden layer -> output layer
            W{3}(j) = W{3}(j) + inta*delta3*h2(j);
        end
        for j = 1:3  % 1st hidden layer -> 2nd hidden layer
            for k = 1:3
                W{2}(j,k) = W{2}(j,k) + inta*delta2(j)*h1(k);
            end
        end
        for j = 1:3  % input layer -> 1st hidden layer
            for k = 1:num_in
                W{1}(j,k) = W{1}(j,k) + inta*delta1(j)*data_simple(k);
            end
        end
    end
end

%% Predict the classification results (forward pass only)
predict = zeros(1,length(test_data));
for i = 1:length(test_data)
    data_simple = test_data(i,1:end-1);
    net1 = data_simple*W{1}'; h1 = 1./(1+exp(-net1));
    net2 = h1*W{2}';          h2 = 1./(1+exp(-net2));
    net3 = h2*W{3}';          z  = 1./(1+exp(-net3));
    if z > 0.5
        predict(i) = 1;
    else
        predict(i) = 0;
    end
end

%% Show the results
figure;
index1 = find(predict==0); data1 = (test_data(index1,:))';
plot(data1(1,:),data1(2,:),'or'); hold on;
index2 = find(predict==1); data2 = (test_data(index2,:))';
plot(data2(1,:),data2(2,:),'*'); hold on;
indexw = find(predict' ~= label_test);  % misclassified samples, marked green
dataw = (test_data(indexw,:))';
plot(dataw(1,:),dataw(2,:),'+g','linewidth',3);
accuracy = length(find(predict'==label_test))/length(test_data);
title(['predict the test data and the accuracy is: ',num2str(accuracy)]);

The program is commented; the weight-update part may need to be read against the formula (and pseudocode) above to see what it is doing.

All right. First, the results on the linear data:

The green markers are the wrongly classified samples. You can see that the neural network handles the linear case easily; the 1000 iterations set here are enough. As for why there are errors at all: as said before, I deliberately placed some mixed samples on the interface when constructing the data, and those are generally impossible to separate.

Next, the nonlinear experiment:

This is the result on 150 test samples after training on 50 samples for 10,000 iterations. The quality depends heavily on the number of training samples, the learning step for the weights, and the number of iterations. To be honest, the result is not ideal, but at least the nonlinear effect is there. I tuned the parameters and increased the number of iterations, yet never got the accuracy above 80%.

I think there are several possible reasons. First, this program is the simplest, most basic neural network: the learning step is a fixed value, and the weight updates have no memory. Both can be optimized further. Take the learning step: early in the iterations it can be larger, and later it can be made smaller. The weight update can also be given a memory, that is, a certain proportion of the previous update is carried into the current one.

There is also an issue with the output values of the samples. As you can see, the final output also passes through the sigmoid function into 0-1, so if the target values of your samples are not all between 0 and 1, the network gets only positive feedback or only negative feedback and cannot find the correct parameters. Here my target outputs are exactly the class labels 0 and 1, which is very restrictive; I do not know whether that contributes to the low accuracy. In fact I expected the ideal result to show a roughly circular classification boundary; it is only a basic neural network, but a neural network all the same. As to why the accuracy is so low, do you know the reason?
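As a hedged sketch of the two improvements just mentioned (a decaying learning step, and a momentum term that carries over part of the previous update), here is a minimal self-contained MATLAB fragment. It shows the idea on one weight matrix only; delta3 and h2 are fake stand-ins for the real backprop quantities, and the decay schedule and momentum factor are arbitrary choices:

W3 = rand(1,3);                 % one weight matrix, as in the program above
dW3_prev = zeros(1,3);          % memory of the last update
inta0 = 0.5; alpha = 0.9;       % initial step and momentum factor (illustrative)
for gen = 1:100
    inta = inta0/(1 + gen/50);             % step gets smaller as training proceeds
    delta3 = 0.1; h2 = rand(1,3);          % placeholders for the backprop values
    dW3 = inta*delta3*h2 + alpha*dW3_prev; % new update plus a fraction of the old one
    W3 = W3 + dW3;
    dW3_prev = dW3;                        % remember for the next iteration
end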
