Learning Notes for Machine Learning (II): Neural Networks


Linear regression and logistic regression are sufficient for some simple classification problems, but for more complex ones (such as identifying the type of car in a picture), these linear models may not give the desired results, and as the amount of data grows their computational cost becomes very large. So we need to learn a nonlinear system: neural networks.

While studying, I relied mainly on Professor Andrew Ng's machine learning MOOC, and much of the material below is borrowed from his course.

If you reprint this, please credit the source: http://blog.csdn.net/u010278305

Neural networks have proved to be a better algorithm than linear regression and logistic regression for solving complex nonlinear classification problems. In fact, a neural network can be seen as a combination of logistic regression units (stacked, cascaded, and so on).

The model for a typical neural network is as follows:

The above model consists of three parts: an input layer, a hidden layer, and an output layer. The input layer takes the feature values, and the output layer's output is the basis for our classification. For example, for recognizing 20*20 handwritten digit images, the input layer takes the 20*20=400 pixel values (that is, a1 in the model), and the outputs of the output layer can be read as the probabilities that the picture is each of the digits 0 to 9. In fact, each node in the hidden layer and the output layer can be seen as a logistic regression unit. The logistic regression model looks like this (as shown):
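Concretely, each such unit applies the sigmoid function g to a weighted sum of its inputs; as a sketch in the course's notation:

    g(z) = \frac{1}{1 + e^{-z}}, \qquad a^{(l+1)}_j = g\big(\Theta^{(l)}_{j,:}\, a^{(l)}\big)

where \Theta^{(l)}_{j,:} is the j-th row of the weight matrix feeding layer l+1.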

Given the neural network model, our aim is to solve for the parameters theta inside it, and for that we need the model's cost function and the gradient at each node.

The cost function is defined as follows:
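In the course's notation, and matching the regularized cost computed in the code later in this post, the cost function can be written as:

    J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[-y_k^{(i)}\log\big((h_\Theta(x^{(i)}))_k\big) - \big(1-y_k^{(i)}\big)\log\big(1-(h_\Theta(x^{(i)}))_k\big)\Big] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\big(\Theta_{ji}^{(l)}\big)^2

where K is the number of output classes, L is the number of layers, s_l is the number of units in layer l, and the bias weights are excluded from the regularization term.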


The gradient of the cost function with respect to theta at each node can be computed by the backpropagation algorithm. The idea behind backpropagation is that although we cannot directly observe the outputs of the hidden layers, we do know the output of the output layer; by propagating the error backwards, we can derive the gradients of the parameters.

We use the following model as an example to illustrate the idea and process of backpropagation:

This model differs from the first one given above in that it has two hidden layers.

To become familiar with this model, we first need to understand forward propagation. For this model, the forward propagation process is as follows:

Here, the meaning of a1, z2 and the other symbols can be inferred by analogy with the first neural network model.
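As a minimal MATLAB/Octave sketch of this forward pass for the two-hidden-layer model (the names x, Theta1, Theta2, Theta3 and the sigmoid helper are illustrative, and bias units are added explicitly):

    a1 = [1; x];            % input features x with the bias unit prepended
    z2 = Theta1 * a1;
    a2 = [1; sigmoid(z2)];  % first hidden layer activations
    z3 = Theta2 * a2;
    a3 = [1; sigmoid(z3)];  % second hidden layer activations
    z4 = Theta3 * a3;
    a4 = sigmoid(z4);       % output layer activations, i.e. h_theta(x)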

Next we define the error term delta, with the following meaning (we will then use it to derive the gradients):

The error delta is calculated as follows:
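For this four-layer model, the error terms are computed layer by layer from the output backwards, roughly as follows (a sketch in the course's notation, where \odot denotes element-wise multiplication and g'(z) = g(z)\big(1 - g(z)\big) is the sigmoid gradient; there is no delta for the input layer):

    \delta^{(4)} = a^{(4)} - y, \qquad
    \delta^{(3)} = \big(\Theta^{(3)}\big)^T \delta^{(4)} \odot g'\big(z^{(3)}\big), \qquad
    \delta^{(2)} = \big(\Theta^{(2)}\big)^T \delta^{(3)} \odot g'\big(z^{(2)}\big)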

Then we obtain the gradient at each node by backpropagation; the backpropagation procedure is as follows:
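The gradient accumulation step can be summarized as (again a sketch; the regularization term skips the bias column j = 0, matching the code below):

    \Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} \big(a^{(l)}\big)^T, \qquad
    \frac{\partial J}{\partial \Theta_{ij}^{(l)}} = \frac{1}{m}\Delta_{ij}^{(l)} + \frac{\lambda}{m}\Theta_{ij}^{(l)} \;\;(j \neq 0),
    \qquad \frac{\partial J}{\partial \Theta_{i0}^{(l)}} = \frac{1}{m}\Delta_{i0}^{(l)}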

With the cost function and the gradient function in hand, we can first check our gradients against a numerical approximation, and then call MATLAB's fminunc function, as before, to obtain the optimal theta parameters.
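A minimal sketch of both steps, assuming an illustrative cost-function handle costFunc that returns [J, grad] for an unrolled parameter vector theta (the names theta, initial_theta and costFunc are hypothetical here):

    % numerical gradient check: compare the analytic gradient against finite differences
    epsilon = 1e-4;
    numgrad = zeros(size(theta));
    for p = 1:numel(theta)
        perturb = zeros(size(theta));
        perturb(p) = epsilon;
        numgrad(p) = (costFunc(theta + perturb) - costFunc(theta - perturb)) / (2 * epsilon);
    end

    % once the analytic gradient matches numgrad, minimize the cost with fminunc
    options = optimset('GradObj', 'on', 'MaxIter', 50);
    [theta_opt, cost] = fminunc(costFunc, initial_theta, options);

(The course exercises use fmincg in the same way, which tends to be faster for large unrolled parameter vectors.)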

It is important to note that when initializing the theta parameters, you must give theta random values rather than fixing them to 0 (or any other constant); this avoids every node ending up with the same parameters after training.
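A typical way to do this, as in the course exercises (epsilon_init is a small constant such as 0.12; the layer-size variables match the function signature below):

    epsilon_init = 0.12;   % small range around zero, breaks the symmetry between units
    initial_Theta1 = rand(hidden_layer_size, input_layer_size + 1) * 2 * epsilon_init - epsilon_init;
    initial_Theta2 = rand(num_labels, hidden_layer_size + 1) * 2 * epsilon_init - epsilon_init;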

Here's the code for calculating the cost and the gradient:

function [J, grad] = nnCostFunction(nn_params, ...
                                    input_layer_size, ...
                                    hidden_layer_size, ...
                                    num_labels, ...
                                    X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two-layer
%neural network which performs classification
%   [J, grad] = NNCOSTFUNCTION(nn_params, hidden_layer_size, num_labels, ...
%   X, y, lambda) computes the cost and gradient of the neural network. The
%   parameters for the neural network are "unrolled" into the vector
%   nn_params and need to be converted back into the weight matrices.
%
%   The returned parameter grad is an "unrolled" vector of the partial
%   derivatives of the neural network.

% Reshape nn_params back into the parameters Theta1 and Theta2, the weight
% matrices for our 2-layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

% Setup some useful variables
m = size(X, 1);

J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

% Part 1: feedforward the network and accumulate the cost in J.
% Note: the vector y contains labels 1..K and is mapped to a binary vector
% of 1's and 0's before being used with the cost function.
J_tmp = zeros(m, 1);
for i = 1:m
    y_vec = zeros(num_labels, 1);
    y_vec(y(i)) = 1;
    a1 = [ones(1, 1) X(i, :)]';          % input activation with bias unit
    z2 = Theta1 * a1;
    a2 = sigmoid(z2);
    a2 = [ones(1, size(a2, 2)); a2];     % add bias unit to the hidden layer
    z3 = Theta2 * a2;
    a3 = sigmoid(z3);
    hThetaX = a3;
    J_tmp(i) = sum(-y_vec .* log(hThetaX) - (1 - y_vec) .* log(1 - hThetaX));
end
J = 1/m * sum(J_tmp);

% Regularize the cost (the bias columns are excluded)
J = J + lambda/(2*m) * (sum(sum(Theta1(:, 2:end).^2)) + sum(sum(Theta2(:, 2:end).^2)));

% Part 2: backpropagation, looping over the training examples
Delta1 = zeros(hidden_layer_size, (input_layer_size + 1));
Delta2 = zeros(num_labels, (hidden_layer_size + 1));
for t = 1:m
    y_vec = zeros(num_labels, 1);
    y_vec(y(t)) = 1;
    a1 = [1 X(t, :)]';
    z2 = Theta1 * a1;
    a2 = sigmoid(z2);
    a2 = [ones(1, size(a2, 2)); a2];
    z3 = Theta2 * a2;
    a3 = sigmoid(z3);
    delta_3 = a3 - y_vec;                % output-layer error
    gz2 = [0; sigmoidGradient(z2)];      % pad for the bias unit
    delta_2 = Theta2' * delta_3 .* gz2;
    delta_2 = delta_2(2:end);            % drop the bias component
    Delta2 = Delta2 + delta_3 * a2';
    Delta1 = Delta1 + delta_2 * a1';
end
Theta1_grad = 1/m * Delta1;
Theta2_grad = 1/m * Delta2;

% Part 3: regularize the gradients (the bias columns are not regularized)
Theta1(:, 1) = 0;
Theta1_grad = Theta1_grad + lambda/m * Theta1;
Theta2(:, 1) = 0;
Theta2_grad = Theta2_grad + lambda/m * Theta2;

% Unroll gradients
grad = [Theta1_grad(:); Theta2_grad(:)];

end
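A hypothetical call that evaluates the cost and gradient once, using the randomly initialized weights from above and a regularization parameter lambda of 1 (sigmoid and sigmoidGradient are the helper functions provided in the course exercise):

    % unroll the weight matrices into a single parameter vector
    nn_params = [initial_Theta1(:); initial_Theta2(:)];
    lambda = 1;
    [J, grad] = nnCostFunction(nn_params, input_layer_size, hidden_layer_size, ...
                               num_labels, X, y, lambda);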
Finally, to summarize, the training process for a typical neural network is roughly as follows: randomly initialize the weights; implement forward propagation to compute h_theta(x) for each example; compute the cost function J(theta); implement backpropagation to compute the partial derivatives; verify the gradients with numerical gradient checking (and then disable the check); and finally minimize J(theta) with gradient descent or an advanced optimizer such as fminunc.

Following this procedure, we can obtain the parameter theta of the neural network.

If you reprint this, please credit the source: http://blog.csdn.net/u010278305
