Andrew Ng's machine learning course has a chapter devoted to logistic regression; the detailed course notes are in another article.
Here is a brief summary of logistic regression:
Given a sample x to be classified, using the logistic regression model to determine its class takes the following two steps:
① Compute the value hθ(x) of the logistic regression hypothesis function, where n is the feature dimension of the sample.
② If hθ(x) >= 0.5, then x belongs to the positive class; otherwise x belongs to the negative class.
Alternatively, use the decision boundary directly: if θ'x >= 0, then x belongs to the positive class; otherwise x belongs to the negative class.
Therefore, logistic regression by itself only solves binary classification problems.
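The two-step rule above can be sketched in Python/NumPy (the course itself uses MATLAB/Octave, so this is only an illustrative translation; the parameter values are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, x):
    # Step 1: hypothesis h_theta(x) = sigmoid(theta' * x); x includes the intercept term.
    h = sigmoid(theta @ x)
    # Step 2: positive class iff h >= 0.5, which is equivalent to theta' * x >= 0.
    return 1 if h >= 0.5 else 0

theta = np.array([-1.0, 2.0])                 # hypothetical learned parameters
print(predict(theta, np.array([1.0, 1.0])))   # theta'x = 1 >= 0, so positive class
print(predict(theta, np.array([1.0, 0.0])))   # theta'x = -1 < 0, so negative class
```

Note that the two decision rules agree because sigmoid(z) >= 0.5 exactly when z >= 0.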
************************
A very important point here is that before classifying a new sample, the training samples must be used to solve for the parameter vector θ = [θ1, θ2, ..., θn] of the logistic regression model.
The optimal model parameters are those that minimize the cost function, so an optimization method is used to find the parameters at which the cost function attains its minimum:
(1) Give an initial value for the parameter θ.
(2) Use the function fminunc to optimize the cost function and obtain the optimal θ value.
This function requires the initial value of θ, a function that computes the cost and its gradient, and other settings.
(3) With the optimized θ in hand, for a sample to be classified, compute its hypothesis function value or evaluate the decision boundary to determine its class.
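fminunc is a MATLAB function; as an illustration of steps (1)-(3), here is a Python/NumPy sketch that substitutes plain gradient descent for fminunc on a toy, synthetically generated dataset (all data and hyperparameters here are assumptions for demonstration only):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y):
    # Logistic regression cost J(theta) and its gradient
    # (the two quantities an optimizer like fminunc needs).
    m = X.shape[0]
    h = sigmoid(X @ theta)
    J = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    grad = X.T @ (h - y) / m
    return J, grad

rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.normal(size=100)]   # intercept column + one feature
y = (X[:, 1] > 0).astype(float)                 # toy labels, separable at 0

theta = np.zeros(2)                             # (1) initial value of theta
for _ in range(500):                            # (2) minimize the cost (gradient descent)
    _, g = cost_and_grad(theta, X, y)
    theta -= 0.5 * g

pred = (sigmoid(X @ theta) >= 0.5).astype(float)  # (3) classify by h >= 0.5
print(np.mean(pred == y))
```

On this separable toy data the learned boundary sits near feature value 0, so training accuracy is close to 1.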
************************
The discussion now turns to the Softmax regression problem.
Step 0: Initialize parameters
① feature dimension of a sample, n (inputSize)
② number of sample classes, k (numClasses)
③ weight of the weight decay term (lambda)
Step 1: Load the MNIST dataset
① Load the MNIST images.
② Load the labels of the MNIST dataset.
In the label set, label 0 represents the digit 0; for convenience in later processing, the label of the digit 0 is changed to 10.
Note 1: In the experiment, for debugging convenience, synthetic data can be substituted while debugging.
Note 2: In the program folder, for tidy organization, the MNIST dataset and the functions that operate on it are stored in the mnist folder; before use, add the statement `addpath mnist/`.
Then randomly generate the initial value of the parameter θ.
Its dimension is k*n, where k is the number of classes (numClasses) and n is the feature dimension of the input sample (inputSize).
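The label remapping and parameter initialization above can be sketched in Python/NumPy (the original uses MATLAB; the 0.005 scaling follows the UFLDL starter code, and the sample labels here are made up for illustration):

```python
import numpy as np

input_size = 28 * 28        # n: MNIST images are 28x28 pixels
num_classes = 10            # k: digits 0..9
rng = np.random.default_rng(0)

# MNIST labels are 0..9; the post remaps digit 0 to label 10
# (convenient in MATLAB, where indexing is 1-based).
labels = np.array([0, 3, 7, 0, 9])
labels[labels == 0] = 10
print(labels)               # digit-0 entries become 10

# Random initial theta of dimension k*n (small Gaussian values).
theta = 0.005 * rng.normal(size=num_classes * input_size)
print(theta.shape)
```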
Step 2: Write the softmaxCost function
This function computes the cost function and the gradient of the cost function.
(1) Computing the cost function
① The cost function is computed as follows:
② The vectorized computation in the program is as follows:
(2) Computing the gradient
① The gradient is computed as follows:
② The vectorized computation in the program is as follows:
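As a concrete illustration of the vectorized computation, here is a Python/NumPy sketch of a softmaxCost-style function following the standard UFLDL definitions (indicator matrix 1{y(i)=j}, probability matrix, weight decay term λ); the original exercise implements the same quantities in MATLAB:

```python
import numpy as np

def softmax_cost(theta, num_classes, input_size, lam, X, y):
    # theta: flat (k*n,) parameters; X: (n, m) samples stored as columns;
    # y: (m,) labels in 0..k-1; lam: weight of the weight decay term.
    k, n, m = num_classes, input_size, X.shape[1]
    theta = theta.reshape(k, n)
    M = theta @ X                          # M[j, i] = theta_j' x(i)
    M -= M.max(axis=0, keepdims=True)      # subtract the column max to avoid overflow
    P = np.exp(M)
    P /= P.sum(axis=0, keepdims=True)      # P[j, i] = p(y(i) = j | x(i))
    G = np.zeros((k, m))
    G[y, np.arange(m)] = 1.0               # ground-truth indicator 1{y(i) = j}
    cost = -np.sum(G * np.log(P)) / m + 0.5 * lam * np.sum(theta ** 2)
    grad = -(G - P) @ X.T / m + lam * theta
    return cost, grad.ravel()

# Sanity check: with theta = 0 every class gets probability 1/k, so cost = log(k).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))
y = np.array([0, 1, 2, 0, 1])
cost, grad = softmax_cost(np.zeros(3 * 4), 3, 4, 0.0, X, y)
print(round(cost, 4))                      # log(3) is about 1.0986
```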
Step 3: Gradient checking
After softmaxCost is written for the first time, use the gradient checking algorithm to verify the correctness of the written softmaxCost function, by directly calling the checkNumericalGradient function.
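The check compares the analytic gradient with a central-difference approximation. A Python sketch of the same idea, using the simple quadratic J(θ) = θ0² + 3θ0θ1 at the point [4, 10] as in the UFLDL checkNumericalGradient example:

```python
import numpy as np

def numerical_gradient(J, theta, eps=1e-4):
    # Central differences: (J(theta + eps*e_i) - J(theta - eps*e_i)) / (2*eps)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        grad[i] = (J(theta + e) - J(theta - e)) / (2 * eps)
    return grad

J = lambda t: t[0] ** 2 + 3 * t[0] * t[1]
theta = np.array([4.0, 10.0])
analytic = np.array([2 * theta[0] + 3 * theta[1], 3 * theta[0]])  # hand-derived gradient
numeric = numerical_gradient(J, theta)
rel = np.linalg.norm(numeric - analytic) / np.linalg.norm(numeric + analytic)
print(rel)   # should be tiny (around 1e-9 or less) if the gradient code is correct
```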
Step 4: Learn the Softmax regression parameters using the training sample set.
Step 5: For a sample to be classified, use the trained Softmax regression model to classify it.
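For the classification step, only the largest θ_j'x matters: the softmax normalizer is shared by all classes, so the exponential need not be computed at all. A minimal Python/NumPy sketch with hypothetical learned parameters:

```python
import numpy as np

def softmax_predict(theta, X):
    # theta: (k, n) parameters; X: (n, m) samples as columns.
    # The predicted class is argmax_j theta_j' x; softmax itself is monotone
    # in theta_j' x, so taking exp and normalizing would not change the argmax.
    return np.argmax(theta @ X, axis=0)

theta = np.array([[1.0, 0.0],
                  [0.0, 1.0]])    # hypothetical learned parameters, k = 2, n = 2
X = np.array([[2.0, 0.1],
              [0.5, 3.0]])        # two samples stored as columns
print(softmax_predict(theta, X))  # first sample -> class 0, second -> class 1
```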
On the derivation of the vectorized formulas ******
(1) The form of the known quantities
① the form of the parameter θ
② the form of the input data x
③ the form of the product θx, denoted as matrix M
(2) Cost function of a single sample
① the cost function of a single sample
that is,
② the hypothesis function of a single sample
However, in practice, to avoid numerical overflow in the computation, the following adjustment is made to each hypothesis function:
where
Then:
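The adjustment amounts to subtracting a constant α = max_j θ_j'x from every exponent; multiplying numerator and denominator by e^(-α) leaves the hypothesis value unchanged but keeps the exponentials from overflowing. A minimal Python/NumPy demonstration (the original post works in MATLAB):

```python
import numpy as np

def softmax_stable(z):
    # softmax(z) == softmax(z - c) for any constant c, so subtract the max
    # before exponentiating; the largest exponent is then exactly 0.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

z = np.array([1000.0, 1001.0, 1002.0])   # naive np.exp(z) would overflow to inf
p = softmax_stable(z)
print(np.round(p, 4))                    # a valid probability vector summing to 1
```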
(3) Cost function of m samples
① the cost function of m samples
② the hypothesis function of m samples
where
(4) Computing the gradient
① gradient computation for a single sample
Note 1:
Note 2:
② gradient computation for m samples
Then:
Reference: Softmax Regression, UFLDL tutorial