Linear regression and logistic regression are enough for some simple classification problems, but for more complex ones (such as identifying the type of car in a picture) a linear model may not give the desired results, and with the larger amount of data the computational cost of the earlier methods becomes very large. So we need to learn a nonlinear model: the neural network.
I studied this material mainly through Professor Andrew Ng's online course, and much of what follows borrows from his MOOC materials.
If you reprint this article, please cite the source: http://blog.csdn.net/u010278305
Neural networks have proven to work better than linear regression and logistic regression on some complex nonlinear classification problems. In fact, a neural network can also be seen as a combination of logistic regressions (stacked, cascaded, and so on).
The model for a typical neural network is as follows:
The above model consists of 3 parts: an input layer, a hidden layer, and an output layer. The input layer takes the feature values, and the output of the output layer is the basis for our classification. For example, when recognizing 20*20 handwritten digit images, the input layer can take the 20*20=400 pixel values (a1 in the model), and the outputs of the output layer can be read as the probabilities that the picture shows each of the digits 0 to 9. In fact, each node in the hidden layer and the output layer can be seen as a logistic regression. The logistic regression model looks like this:
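The figure is not reproduced here, so as a reminder, a single logistic regression unit can be written as below. This is a sketch in the usual course notation, which I am assuming: x is the input vector with a bias entry of 1 prepended, theta is the unit's weight vector, and g is the sigmoid function.

```latex
h_\theta(x) = g(\theta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}}
```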
With the neural network model in hand, our goal is to solve for the parameters theta inside the model. To do that we need the cost function of the model and its gradient with respect to the theta at each node.
The cost function is defined as follows:
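The original formula image is not reproduced here. Judging from the cost computed in the code later in this post, it is presumably the regularized cross-entropy cost below. The notation is my assumption: m is the number of training examples, K the number of output classes, y_k^{(i)} the k-th entry of the 0/1 label vector for example i, and the regularization term sums over every weight except those in the first (bias) column of each Theta matrix.

```latex
J(\Theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K}
  \left[ -y_k^{(i)} \log\left( h_\Theta(x^{(i)})_k \right)
         - \left( 1 - y_k^{(i)} \right) \log\left( 1 - h_\Theta(x^{(i)})_k \right) \right]
  + \frac{\lambda}{2m} \sum_{l} \sum_{p} \sum_{q \ge 2} \left( \Theta^{(l)}_{pq} \right)^2
```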
The gradient of the cost function with respect to theta at each node can be computed with the backpropagation algorithm. The idea behind backpropagation is that we cannot directly know what the output of a hidden layer should be, but we do know the desired output of the output layer, so we propagate the error backwards from the output layer to work out the gradients for the parameters of the earlier layers.
We use the following model as an example to illustrate the idea and the process of backpropagation:
The model differs from the first model given in that it has two hidden layers.
To get familiar with this model, we first need to understand forward propagation. For this model, the forward propagation process is as follows:
Here the meaning of a1, z2 and the other quantities can be worked out by analogy with the first neural network model.
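Since the figure is missing, here is a sketch of forward propagation for a network with two hidden layers (four layers in total). The superscript notation is my assumption, and a bias entry of 1 is assumed to be prepended to a^{(1)}, a^{(2)} and a^{(3)} before each multiplication, as the code later in this post does for its own layers.

```latex
\begin{aligned}
a^{(1)} &= x \\
z^{(2)} &= \Theta^{(1)} a^{(1)}, & a^{(2)} &= g(z^{(2)}) \\
z^{(3)} &= \Theta^{(2)} a^{(2)}, & a^{(3)} &= g(z^{(3)}) \\
z^{(4)} &= \Theta^{(3)} a^{(3)}, & a^{(4)} &= h_\Theta(x) = g(z^{(4)})
\end{aligned}
```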
Next we define the error delta of each node, with the following meaning (the gradient is then derived from it):
The error delta is calculated as follows:
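The formula image is missing here. For the four-layer example, and consistent with how the code in this post computes delta_3 and delta_2 for its three-layer network, the errors can be sketched as below. The notation is assumed: \odot is the element-wise product, y is the 0/1 label vector, and the component belonging to the bias unit is dropped after each step.

```latex
\begin{aligned}
\delta^{(4)} &= a^{(4)} - y \\
\delta^{(3)} &= \left( \Theta^{(3)} \right)^T \delta^{(4)} \odot g'(z^{(3)}) \\
\delta^{(2)} &= \left( \Theta^{(2)} \right)^T \delta^{(3)} \odot g'(z^{(2)}) \\
g'(z) &= g(z) \odot \left( 1 - g(z) \right)
\end{aligned}
```

There is no \delta^{(1)}, because the input layer holds the observed features and has no error associated with it.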
We then obtain the gradient at each node with the backpropagation algorithm, whose process is as follows:
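As the figure is not available, the accumulation step can be sketched as follows. It mirrors what the code in this post does with Delta1, Delta2 and the regularization terms. The index convention is my assumption: q = 1 is the bias column, which is not regularized.

```latex
\begin{aligned}
\Delta^{(l)} &:= \Delta^{(l)} + \delta^{(l+1)} \left( a^{(l)} \right)^T
  && \text{(accumulated over all } m \text{ training examples)} \\
\frac{\partial J}{\partial \Theta^{(l)}_{pq}} &= \frac{1}{m} \Delta^{(l)}_{pq} + \frac{\lambda}{m} \Theta^{(l)}_{pq}
  && (q \ge 2) \\
\frac{\partial J}{\partial \Theta^{(l)}_{pq}} &= \frac{1}{m} \Delta^{(l)}_{pq}
  && (q = 1)
\end{aligned}
```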
With the cost function and the gradient in hand, we can first check our gradient results with a numerical method. Then we can call MATLAB's fminunc function to obtain the optimal theta parameters, as before.
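The numerical check can be sketched as below, using central differences. To keep the example self-contained it uses a toy cost with a known gradient instead of the network's cost function. The names costFn, analyticGrad and the value of epsilon are my own illustrative choices, not taken from the course code.

```matlab
% Toy cost with a known gradient, standing in for the network's cost
% function (hypothetical example, not from the original post).
costFn = @(theta) sum(theta .^ 2);     % J(theta) = sum of squares
analyticGrad = @(theta) 2 * theta;     % exact gradient of J

theta = [1; -2; 0.5];
epsilon = 1e-4;                        % perturbation size (assumed value)
numGrad = zeros(size(theta));
for p = 1:numel(theta)
    perturb = zeros(size(theta));
    perturb(p) = epsilon;
    % Central difference approximation of dJ/dtheta_p
    numGrad(p) = (costFn(theta + perturb) - costFn(theta - perturb)) / (2 * epsilon);
end

% Relative difference between the two gradients; a very small value
% suggests the analytic gradient is implemented correctly.
relDiff = norm(numGrad - analyticGrad(theta)) / norm(numGrad + analyticGrad(theta));
disp(relDiff);
```

For the real network, costFn would wrap the cost function with the training data fixed, and analyticGrad would be the gradient it returns from backpropagation. Because the check evaluates the cost twice per parameter, it is slow and is usually switched off before training.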
It is important to note that when initializing the theta parameters, you need to give them random values rather than fixing them to 0 or some other constant. Otherwise every node in a layer computes the same thing and ends up with the same parameters after training.
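A minimal sketch of random initialization is shown below. The input size of 400 and the 10 output labels come from the handwritten digit example above. The hidden layer size of 25 and the range epsilon_init = 0.12 are assumptions chosen for illustration.

```matlab
input_layer_size  = 400;   % 20*20 pixel inputs, from the example above
hidden_layer_size = 25;    % assumed size, for illustration only
num_labels        = 10;    % digits 0 to 9
epsilon_init      = 0.12;  % assumed half-width of the initial range

% Each weight is drawn uniformly from [-epsilon_init, epsilon_init].
% The "+ 1" accounts for the bias column of each weight matrix.
initial_Theta1 = rand(hidden_layer_size, input_layer_size + 1) * 2 * epsilon_init - epsilon_init;
initial_Theta2 = rand(num_labels, hidden_layer_size + 1) * 2 * epsilon_init - epsilon_init;

% Unroll both matrices into one vector, the form the cost function expects
initial_nn_params = [initial_Theta1(:); initial_Theta2(:)];
```

Small random values break the symmetry between nodes while keeping the sigmoid inputs near zero, where its gradient is largest.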
Here's the code for calculating the cost and the gradient:
function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
%   [J grad] = NNCOSTFUNCTION(nn_params, hidden_layer_size, num_labels, ...
%   X, y, lambda) computes the cost and gradient of the neural network. The
%   parameters for the neural network are "unrolled" into the vector
%   nn_params and need to be converted back into the weight matrices.
%
%   The returned parameter grad should be a "unrolled" vector of the
%   partial derivatives of the neural network.
%

% Reshape nn_params back into the parameters Theta1 and Theta2, the weight
% matrices for our 2 layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));

Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

% Setup some useful variables
m = size(X, 1);

% You need to return the following variables correctly
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the code by working through the
%               following parts.
%
% Part 1: Feedforward the neural network and return the cost in the
%         variable J. After implementing Part 1, you can verify that your
%         cost function computation is correct by verifying the cost
%         computed in ex4.m
%
% Part 2: Implement the backpropagation algorithm to compute the gradients
%         Theta1_grad and Theta2_grad. You should return the partial
%         derivatives of the cost function with respect to Theta1 and
%         Theta2 in Theta1_grad and Theta2_grad, respectively. After
%         implementing Part 2, you can check that your implementation is
%         correct by running checkNNGradients
%
%         Note: The vector y passed into the function is a vector of labels
%               containing values from 1..K. You need to map this vector
%               into a binary vector of 1's and 0's to be used with the
%               neural network cost function.
%
%         Hint: We recommend implementing backpropagation using a for-loop
%               over the training examples if you are implementing it for
%               the first time.
%
% Part 3: Implement regularization with the cost function and gradients.
%
%         Hint: You can implement this around the code for
%               backpropagation. That is, you can compute the gradients for
%               the regularization separately and then add them to
%               Theta1_grad and Theta2_grad from Part 2.
%

% Part 1: forward propagation and cost, one training example at a time
J_tmp = zeros(m, 1);
for i = 1:m
    y_vec = zeros(num_labels, 1);
    y_vec(y(i)) = 1;                      % label -> 0/1 vector
    a1 = [ones(1, 1) X(i, :)]';           % add bias unit
    z2 = Theta1 * a1;
    a2 = sigmoid(z2);
    a2 = [ones(1, size(a2, 2)); a2];      % add bias unit
    z3 = Theta2 * a2;
    a3 = sigmoid(z3);
    hThetaX = a3;
    J_tmp(i) = sum(-y_vec .* log(hThetaX) - (1 - y_vec) .* log(1 - hThetaX));
end
J = 1 / m * sum(J_tmp);
% Regularization term (bias columns excluded)
J = J + lambda / (2 * m) * (sum(sum(Theta1(:, 2:end) .^ 2)) + sum(sum(Theta2(:, 2:end) .^ 2)));

% Part 2: backpropagation, accumulating the gradients over all examples
Delta1 = zeros(hidden_layer_size, (input_layer_size + 1));
Delta2 = zeros(num_labels, (hidden_layer_size + 1));
for t = 1:m
    y_vec = zeros(num_labels, 1);
    y_vec(y(t)) = 1;
    a1 = [1 X(t, :)]';
    z2 = Theta1 * a1;
    a2 = sigmoid(z2);
    a2 = [ones(1, size(a2, 2)); a2];
    z3 = Theta2 * a2;
    a3 = sigmoid(z3);
    delta_3 = a3 - y_vec;
    gz2 = [0; sigmoidGradient(z2)];
    delta_2 = Theta2' * delta_3 .* gz2;
    delta_2 = delta_2(2:end);             % drop the bias component
    Delta2 = Delta2 + delta_3 * a2';
    Delta1 = Delta1 + delta_2 * a1';
end

Theta1_grad = 1 / m * Delta1;
Theta2_grad = 1 / m * Delta2;

% Part 3: regularize the gradients (bias columns zeroed so they are skipped)
Theta1(:, 1) = 0;
Theta1_grad = Theta1_grad + lambda / m * Theta1;
Theta2(:, 1) = 0;
Theta2_grad = Theta2_grad + lambda / m * Theta2;

% -------------------------------------------------------------

% =========================================================================

% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];

end
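The listing above calls two helpers, sigmoid and sigmoidGradient, that are not shown in this post. A minimal sketch of what they presumably compute is given below, written as anonymous functions so it can be run on its own. This is my assumption of their behaviour, based on how they are used, not the original files.

```matlab
% Sigmoid (logistic) function, applied element-wise to scalars,
% vectors or matrices
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Derivative of the sigmoid: g'(z) = g(z) .* (1 - g(z))
sigmoidGradient = @(z) sigmoid(z) .* (1 - sigmoid(z));

disp(sigmoid(0));          % g(0) = 0.5
disp(sigmoidGradient(0));  % g'(0) = 0.25, the maximum of the derivative
```

To use them with the cost function above they would need to live in their own function files (sigmoid.m and sigmoidGradient.m), since handles defined in a script are not visible inside another function.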
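Finally, to summarize, training a typical neural network goes as follows: initialize theta with random values, run forward propagation to get the network's output for each example, compute the cost, run backpropagation to get the gradients, check the gradients numerically, and then minimize the cost with an optimizer such as fminunc.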
Following this procedure, we can obtain the parameter theta of the neural network.
Learning Notes for machine learning (II): Neural networks