I recently reviewed some machine learning material and am summarizing it here. There are many summaries online, but most are incomplete or wrong. Below is my brief summary of the key points of regression analysis.
1. Summary of Contents
(1) Linear regression
(2) Logistic regression
(3) Maximum likelihood estimation
(4) Gradient descent
2. Linear regression
(1) In junior high school we all learned the linear function y = a*x + b, whose graph is a straight line. (In machine learning terms, this is the single-variable case.)
(2) If we extend the single variable above to a multidimensional variable, we get the following formula:
y = wᵀx + b, where w is a column vector of weights and x is a column vector of features. Expanded, this reads:
y = w0 + w1*x1 + w2*x2 + w3*x3 + ... + wn*xn = θᵀx
(folding the bias b in as w0 with a constant feature x0 = 1). Textbooks usually write the parameter vector as θ rather than w, so we will follow the book's notation below.
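As a small illustration with assumed numbers, the multivariate model can be evaluated with NumPy, including the trick of folding b in as w0 with a constant feature x0 = 1:

```python
import numpy as np

# Hypothetical example: a 3-feature linear model y = w^T x + b.
w = np.array([0.5, -1.0, 2.0])   # weight column vector
b = 0.1                           # bias term
x = np.array([1.0, 2.0, 3.0])    # one sample's feature vector

y = w @ x + b                     # w^T x + b

# Equivalently, fold b into the weights as w0 with a constant feature x0 = 1:
w_ext = np.concatenate(([b], w))
x_ext = np.concatenate(([1.0], x))
y_alt = w_ext @ x_ext             # theta^T x with the extended vectors
```

Both forms give the same value, which is why the expanded formula above can absorb b into the sum.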
We know that in linear regression, the task is to fit a linear model to the existing samples and then use that model to predict unknown samples. So let's analyze this: the central problem of linear regression is how to construct such a linear model. And to build a linear model, isn't it enough to obtain the parameter vector θ? Yes. So how do we obtain θ?
(3) Deriving least squares from the likelihood function
Suppose we have m training samples. We can assume our model has the form:
y(i) = θᵀx(i) + ε(i)
where i indexes the samples from 1 to m and ε(i) is the error on the i-th sample. We assume the error ε follows a Gaussian distribution with mean 0 and variance σ², so its density can be written as:
p(ε(i)) = 1/(√(2π)·σ) · exp(−(ε(i))² / (2σ²))
Substituting ε(i) = y(i) − θᵀx(i) from our model into this error distribution gives the likelihood of the data:
L(θ) = ∏_{i=1}^{m} 1/(√(2π)·σ) · exp(−(y(i) − θᵀx(i))² / (2σ²))
Taking the logarithm of the distribution above yields the log-likelihood:
ℓ(θ) = m·log(1/(√(2π)·σ)) − (1/σ²)·(1/2)·∑_{i=1}^{m} (y(i) − θᵀx(i))²
By maximum likelihood estimation, maximizing ℓ(θ) is equivalent to minimizing:
J(θ) = (1/2)·∑_{i=1}^{m} (y(i) − θᵀx(i))²
Look at this formula: isn't it exactly our least squares? Yes, that's it. The whole problem is now transformed into how to minimize J(θ).
(4) Methods for solving least squares
1) Direct (closed-form) solution
Setting the gradient of J(θ) to zero yields the normal equation, from which θ can be obtained directly:
θ = (XᵀX)⁻¹ Xᵀ y
where X is the m×(n+1) design matrix of samples and y is the vector of targets.
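As a minimal sketch using hypothetical toy data, the normal equation can be solved with NumPy (solving the linear system is preferred to forming the explicit inverse):

```python
import numpy as np

# Hypothetical toy data: m = 4 samples with an intercept column, generated
# exactly by y = 1 + 2x, so the recovered theta should be [1, 2].
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Normal equation theta = (X^T X)^{-1} X^T y, solved without an explicit inverse.
theta = np.linalg.solve(X.T @ X, X.T @ y)
```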
2) Gradient descent
i) initialize θ (e.g., randomly)
ii) iterate in the direction of the negative gradient, with the following update:
θj := θj − α · ∂J(θ)/∂θj
iii) the gradient can be obtained from the following formula:
∂J(θ)/∂θj = ∑_{i=1}^{m} (θᵀx(i) − y(i)) · xj(i)
Note: this least-squares gradient, (θᵀx − y)·x, has exactly the opposite sign of the logistic-regression gradient we will see later.
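The three steps above can be sketched in NumPy on the same kind of toy data (the data, learning rate, and iteration count here are illustrative assumptions):

```python
import numpy as np

# Hypothetical data: design matrix X (intercept column plus one feature)
# and targets generated exactly by y = 1 + 2x.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

theta = np.zeros(X.shape[1])      # step i): initialize theta
alpha = 0.02                      # learning rate (an assumed value)
for _ in range(20000):            # step ii): iterate along the negative gradient
    grad = X.T @ (X @ theta - y)  # step iii): gradient of (1/2)*sum((theta^T x - y)^2)
    theta -= alpha * grad
```

With this small, well-conditioned problem the iterates converge to the same θ as the closed-form normal equation.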
3. Logistic regression
Logistic regression is essentially still a linear regression; its main difference from linear regression is that the output of the linear model is passed through the sigmoid function. Everything else is the same. So let's first take a look at what the sigmoid function really is:
(1) The sigmoid function
1) Expression:
g(z) = 1 / (1 + e^(−z))
2) Its graph is an S-shaped curve rising from 0 to 1 and passing through (0, 0.5).
As the graph shows, the sigmoid function maps any input to a value in (0, 1). Therefore logistic regression handles only binary classification problems; for multi-class problems we use softmax regression, which will be introduced later.
(2) The logistic regression model
hθ(x) = g(θᵀx) = 1 / (1 + e^(−θᵀx))
As this equation shows, it simply feeds the linear model θᵀx into the sigmoid function.
Differentiating the sigmoid function gives the useful identity:
g′(z) = g(z)·(1 − g(z))
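A quick numerical check of this identity (the test point z = 0.7 and step size are arbitrary choices):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)); maps any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Compare a central finite difference against the closed form g(z)*(1 - g(z)).
z = 0.7
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
analytic = sigmoid(z) * (1 - sigmoid(z))
```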
(3) Parameter estimation for logistic regression. Interpreting hθ(x) as P(y = 1 | x; θ), maximum likelihood estimation gives the following derivation:
L(θ) = ∏_{i=1}^{m} hθ(x(i))^{y(i)} · (1 − hθ(x(i)))^{1−y(i)}
ℓ(θ) = log L(θ) = ∑_{i=1}^{m} [ y(i)·log hθ(x(i)) + (1 − y(i))·log(1 − hθ(x(i))) ]
∂ℓ(θ)/∂θj = ∑_{i=1}^{m} (y(i) − hθ(x(i))) · xj(i)
Note: this gradient, (y − hθ(x))·x, is exactly the opposite in sign of the least-squares gradient of linear regression.
(4) Parameter iteration (logistic regression iterates in the positive direction of the gradient of the log-likelihood, i.e., gradient ascent)
1) The parameter update rule for logistic regression is:
θj := θj + α · ∑_{i=1}^{m} (y(i) − hθ(x(i))) · xj(i)
2) Comparing the update rules for linear regression and logistic regression: the formulas have exactly the same form; the only difference is the definition of hθ(x) (θᵀx for linear regression, g(θᵀx) for logistic regression).
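A minimal gradient-ascent sketch of this update rule, on hypothetical linearly separable data (learning rate and iteration count are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical binary-classification data: intercept column plus one feature.
# Negative feature values are labeled 0, positive values are labeled 1.
X = np.array([[1.0, -2.0],
              [1.0, -1.0],
              [1.0,  1.0],
              [1.0,  2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

theta = np.zeros(X.shape[1])
alpha = 0.1
for _ in range(5000):
    h = sigmoid(X @ theta)           # h_theta(x) for every sample
    theta += alpha * X.T @ (y - h)   # gradient *ascent* on the log-likelihood

pred = (sigmoid(X @ theta) >= 0.5).astype(float)
```

Note the `+=`: the only textual difference from the linear-regression loop is the sign of the step and the sigmoid inside hθ(x).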
4. Comparing linear regression and logistic regression
(1) Differences
i) linear regression iterates in the negative direction of the gradient of its cost function (gradient descent), while logistic regression iterates in the positive direction of the gradient of its log-likelihood (gradient ascent).
ii) linear regression solves regression problems, while logistic regression solves classification problems.
(2) Similarities
i) the parameter update formulas have the same form.
ii) both are essentially linear models; logistic regression simply passes the linear model's output through the sigmoid.
5. Solving multi-class problems: the softmax regression model
(1) The softmax regression model is an extension of the logistic regression model. The model is:
P(y = k | x; θ) = exp(θkᵀx) / ∑_{j=1}^{K} exp(θjᵀx)
where K is the number of classes and P(y = k | x; θ) is the probability, under parameters θ, that sample x belongs to class k.
Looking closely at the softmax formula, each class k effectively has its own model with score exp(θkᵀx): training fits one parameter vector per class, each sample's K scores are normalized into probabilities, and the sample is finally assigned to the class with the greatest probability. This is the essential idea of softmax.
(2) The likelihood and log-likelihood functions of softmax
1) Likelihood function:
L(θ) = ∏_{i=1}^{m} ∏_{k=1}^{K} P(y(i) = k | x(i); θ)^{1{y(i)=k}}
2) Log-likelihood function:
ℓ(θ) = ∑_{i=1}^{m} ∑_{k=1}^{K} 1{y(i)=k} · log( exp(θkᵀx(i)) / ∑_{j=1}^{K} exp(θjᵀx(i)) )
(3) Parameter iteration
1) Compute the gradient of the log-likelihood with respect to each θk: there are as many gradients, and hence as many update formulas, as there are classes:
∇θk ℓ(θ) = ∑_{i=1}^{m} x(i) · ( 1{y(i)=k} − P(y(i) = k | x(i); θ) )
2) Update formula:
θk := θk + α · ∇θk ℓ(θ)
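Putting the model, gradient, and update formula together, here is a minimal softmax-regression sketch on hypothetical three-class data (labels, learning rate, and iteration count are illustrative assumptions):

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax: normalizes exp(theta_k^T x) over the K classes."""
    e = np.exp(scores - scores.max(axis=1, keepdims=True))  # numerically stabilized
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical 3-class toy data: intercept column plus one feature,
# with class 0 on the left, class 1 in the middle, class 2 on the right.
X = np.array([[1.0, -2.0],
              [1.0, -1.5],
              [1.0,  0.0],
              [1.0,  0.5],
              [1.0,  1.5],
              [1.0,  2.0]])
y = np.array([0, 0, 1, 1, 2, 2])   # class labels in {0, ..., K-1}
K = 3
Y = np.eye(K)[y]                   # one-hot matrix: Y[i, k] = 1{y(i) = k}

Theta = np.zeros((X.shape[1], K))  # one parameter column theta_k per class
alpha = 0.1
for _ in range(5000):
    P = softmax(X @ Theta)         # P[i, k] = P(y = k | x(i); theta)
    Theta += alpha * X.T @ (Y - P) # gradient ascent on the log-likelihood

probs = softmax(X @ Theta)
pred = probs.argmax(axis=1)        # assign each sample the most probable class
```

The update line `X.T @ (Y - P)` stacks the K per-class gradients ∑ x(i)·(1{y(i)=k} − P(y=k|x(i))) into one matrix, so all θk are updated at once.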
Question: why does linear regression iterate in the negative direction of the gradient, while logistic regression and softmax regression iterate in the positive direction?
Answer: this comes from using maximum likelihood estimation in all three cases. In linear regression, maximizing the log-likelihood is equivalent to minimizing the least-squares cost; that cost is convex, and its minimum is found by iterating downhill, i.e., gradient descent. In logistic regression and softmax regression, we maximize the log-likelihood directly, so after differentiating it we move in the direction of gradient ascent to find the maximum.