Machine Learning Study Notes (1) -- Linear Regression and Logistic Regression


"Total Catalog" http://www.cnblogs.com/tbcaaa8/p/4415055.html

1. Gradient descent

Gradient descent is an algorithm for finding the minimum of a function. The idea is simple: at each step, move a small distance in the direction opposite to the current gradient, then repeat. An example follows:

[Example] Use gradient descent to find the minimum of $z(x, y) = 0.3x^2 + 0.4y^2 + 2$.

Step 1: derive the iteration rule. Following the idea of "taking a small step in the direction opposite to the current gradient each time":

$$x^{(k+1)} = x^{(k)} - \alpha \frac{\partial z}{\partial x} = (1 - 0.6\alpha)\,x^{(k)}, \qquad y^{(k+1)} = y^{(k)} - \alpha \frac{\partial z}{\partial y} = (1 - 0.8\alpha)\,y^{(k)}$$

Taking the step size $\alpha = 1$ gives $x^{(k+1)} = 0.4\,x^{(k)}$ and $y^{(k+1)} = 0.2\,y^{(k)}$.

Step 2: choose the initial value. In principle the initial value is arbitrary, but a suitable choice improves the convergence rate. In this example, choose $(x^{(0)}, y^{(0)}) = (1, 1)$.

Step 3: iterate from the initial value according to the iteration rule. The process is as follows:

k    x^(k)    y^(k)    z(x^(k), y^(k))
0    1.00     1.00     2.7000
1    0.40     0.20     2.0640
2    0.16     0.04     2.0083
3    0.06     0.01     2.0013
4    0.03     0.00     2.0002
5    0.01     0.00     2.0000
6    0.00     0.00     2.0000

Conclusion: the algorithm converges after the 6th iteration, and the minimum value sought is 2.

How does the gradient descent algorithm decide convergence? A common method is to check whether the absolute change of the objective value between two consecutive iterations is small enough, that is, whether

$$\left|\,z(x^{(k+1)}, y^{(k+1)}) - z(x^{(k)}, y^{(k)})\,\right| < \varepsilon$$

holds. Here $\varepsilon$ is a sufficiently small positive real number chosen according to the desired precision; in this example, $\varepsilon = 10^{-4}$.
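The example translates directly into code. Below is a minimal Python sketch of gradient descent on the example function, using the stopping criterion just described; alpha and eps follow the example above, while the function names and the max_iter safeguard are illustrative additions.

```python
# Minimal sketch: gradient descent on z(x, y) = 0.3x^2 + 0.4y^2 + 2.
# alpha and eps follow the example above; max_iter is an added safeguard.

def z(x, y):
    return 0.3 * x**2 + 0.4 * y**2 + 2

def grad_z(x, y):
    # Partial derivatives dz/dx = 0.6x and dz/dy = 0.8y.
    return 0.6 * x, 0.8 * y

def gradient_descent(x, y, alpha=1.0, eps=1e-4, max_iter=100):
    prev = z(x, y)
    for k in range(1, max_iter + 1):
        gx, gy = grad_z(x, y)
        x, y = x - alpha * gx, y - alpha * gy  # step against the gradient
        cur = z(x, y)
        if abs(cur - prev) < eps:              # convergence criterion from above
            return x, y, cur, k
        prev = cur
    return x, y, prev, max_iter

print(gradient_descent(1.0, 1.0))  # converges at the 6th iteration, z ≈ 2
```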

It is important to note that gradient descent can get stuck in a local optimum. This can be mitigated by restarting from randomly chosen initial values or by adding a momentum term, topics that may be covered in later articles in this series; a sketch of the momentum variant follows.
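For concreteness, the following sketch shows one common form of the momentum update; the exact formulation and the hyperparameters alpha and beta are assumptions rather than details from this article. It reuses z and grad_z from the previous sketch.

```python
# One common momentum variant (a sketch; the formulation and the values of
# alpha and beta are assumptions, not taken from this article).
# Reuses z and grad_z from the previous sketch.

def gradient_descent_momentum(x, y, alpha=0.1, beta=0.9, n_iter=200):
    vx = vy = 0.0
    for _ in range(n_iter):
        gx, gy = grad_z(x, y)
        vx = beta * vx - alpha * gx  # velocity accumulates past gradients,
        vy = beta * vy - alpha * gy  # letting the iterate coast through shallow regions
        x, y = x + vx, y + vy
    return x, y, z(x, y)

print(gradient_descent_momentum(1.0, 1.0))  # also converges near (0, 0), z ≈ 2
```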

2. Linear regression

Linear regression is a form of regression analysis that models the relationship between the independent and dependent variables with a regression function of the following form:

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n = \theta^{T}x$$

where $x_0 = 1$ by convention.

Let $m$ denote the number of data points and $n$ the dimension of the data; let $(x^{(i)}, y^{(i)})$ denote the independent and dependent variables of the $i$-th data point, and let $x_j^{(i)}$ denote the $j$-th component of $x^{(i)}$. The derivation is based on the following assumption:

$$y^{(i)} = \theta^{T}x^{(i)} + \varepsilon^{(i)}, \qquad \varepsilon^{(i)} \sim N(0, \sigma^2) \text{ independently}$$

That is, the error terms of the individual data points are mutually independent and follow a normal distribution with mean 0 and variance $\sigma^2$. From this we obtain the likelihood function:

$$L(\theta) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\left(y^{(i)} - \theta^{T}x^{(i)}\right)^2}{2\sigma^2}\right)$$

Taking the logarithm gives the log-likelihood function:

$$\ell(\theta) = \log L(\theta) = m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\sigma^2} \sum_{i=1}^{m} \left(y^{(i)} - \theta^{T}x^{(i)}\right)^2$$

Since the first term does not depend on $\theta$, maximizing $\ell(\theta)$ is equivalent to minimizing

$$\sum_{i=1}^{m} \left(y^{(i)} - \theta^{T}x^{(i)}\right)^2$$

Define the loss function:

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

To maximize the likelihood function, it suffices to minimize the loss function. We settle for a local minimum of the loss function in place of the global minimum, so we only need the partial derivative with respect to each $\theta_j$:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$

Finally, solve iteratively with gradient descent:

$$\theta_j := \theta_j - \alpha \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}, \qquad j = 0, 1, \ldots, n$$

Here $\alpha$ is the learning rate, a constant greater than 0. The learning rate should be chosen carefully: too large a value keeps the algorithm from converging, while too small a value makes convergence slow. In practice, the learning rate can be adjusted to the situation at hand. Some references give a sufficient condition on $\alpha$ under which the above algorithm is guaranteed to converge; because that bound is difficult to compute efficiently, a conservatively chosen fixed learning rate is often used instead.
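To make the update rule concrete, here is a minimal batch gradient descent sketch for linear regression using NumPy. The helper name, toy data, learning rate, and iteration count are illustrative assumptions; alpha must be small enough for the sum-form update to converge.

```python
import numpy as np

# Batch gradient descent for linear regression, implementing
#   theta_j := theta_j - alpha * sum_i (h(x^(i)) - y^(i)) * x_j^(i).
# fit_linear, the toy data, alpha, and n_iter are illustrative assumptions.

def fit_linear(X, y, alpha=0.01, n_iter=1000):
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])    # prepend x_0 = 1 for the intercept
    theta = np.zeros(n + 1)
    for _ in range(n_iter):
        residual = Xb @ theta - y           # h(x^(i)) - y^(i) for every i
        theta -= alpha * (Xb.T @ residual)  # simultaneous update of all theta_j
    return theta

# Usage: noisy samples of y = 1 + 2x recover theta ≈ [1, 2]
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 1))
y = 1 + 2 * X[:, 0] + rng.normal(0, 0.1, size=100)
print(fit_linear(X, y))
```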

3. Logistic regression

When the dependent variable can only take values in {0, 1}, the linear regression model is no longer suitable, because extreme data points make the choice of a threshold difficult. We can instead model the data with logistic regression, whose regression function has the following form:

$$h_\theta(x) = g(\theta^{T}x)$$

where

$$g(z) = \frac{1}{1 + e^{-z}}$$

is the sigmoid function.

The sigmoid function has the following properties:

$$g(-z) = 1 - g(z), \qquad g'(z) = \frac{e^{-z}}{\left(1 + e^{-z}\right)^2} = g(z)\left(1 - g(z)\right)$$

The derivation is based on the following assumptions:

$$P(y = 1 \mid x; \theta) = h_\theta(x), \qquad P(y = 0 \mid x; \theta) = 1 - h_\theta(x)$$

Since $y$ can only take the values 0 and 1, the assumptions above are equivalent to the single form:

$$P(y \mid x; \theta) = h_\theta(x)^{y} \left(1 - h_\theta(x)\right)^{1 - y}$$

The likelihood function is then:

$$L(\theta) = \prod_{i=1}^{m} h_\theta(x^{(i)})^{y^{(i)}} \left(1 - h_\theta(x^{(i)})\right)^{1 - y^{(i)}}$$

Taking the logarithm gives the log-likelihood function:

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} \log\left[ h_\theta(x^{(i)})^{y^{(i)}} \left(1 - h_\theta(x^{(i)})\right)^{1 - y^{(i)}} \right]$$

Simplifying, we get:

$$\ell(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$

Define the loss function:

$$J(\theta) = -\ell(\theta) = -\sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$

To maximize the likelihood function, it suffices to minimize the loss function. We settle for a local minimum of the loss function in place of the global minimum, so we only need the partial derivative with respect to each $\theta_j$:

$$\frac{\partial J(\theta)}{\partial \theta_j} = -\sum_{i=1}^{m} \left[ \frac{y^{(i)}}{h_\theta(x^{(i)})} - \frac{1 - y^{(i)}}{1 - h_\theta(x^{(i)})} \right] \frac{\partial h_\theta(x^{(i)})}{\partial \theta_j}$$

Applying $g'(z) = g(z)(1 - g(z))$, which gives $\partial h_\theta(x) / \partial \theta_j = h_\theta(x)\left(1 - h_\theta(x)\right) x_j$, this simplifies to:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$

Finally, solve iteratively with gradient descent:

$$\theta_j := \theta_j - \alpha \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}, \qquad j = 0, 1, \ldots, n$$

where $\alpha$ has the same meaning as above. Note that the update rule has exactly the same form as in linear regression; only the hypothesis $h_\theta(x)$ differs.
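As a final illustration, here is a minimal training-loop sketch for logistic regression; apart from the sigmoid hypothesis, it mirrors the linear regression sketch above. The helper names, toy data, and hyperparameters are again assumptions for illustration.

```python
import numpy as np

# Batch gradient descent for logistic regression. The update rule is identical
# in form to the linear case; only the hypothesis h(x) = g(theta^T x) changes.
# fit_logistic, the toy data, alpha, and n_iter are illustrative assumptions.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.01, n_iter=5000):
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])    # prepend x_0 = 1 for the intercept
    theta = np.zeros(n + 1)
    for _ in range(n_iter):
        residual = sigmoid(Xb @ theta) - y  # h(x^(i)) - y^(i) for every i
        theta -= alpha * (Xb.T @ residual)  # simultaneous update of all theta_j
    return theta

# Usage: noisy labels that switch from 0 to 1 around x = 0.5
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 1))
y = (X[:, 0] + rng.normal(0, 0.1, size=200) > 0.5).astype(float)
theta = fit_logistic(X, y)
print(-theta[0] / theta[1])  # decision boundary, approximately 0.5
```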
