Machine Learning Study Notes (1) -- Linear Regression and Logistic Regression


"Total Catalog" http://www.cnblogs.com/tbcaaa8/p/4415055.html

1. Gradient descent

Gradient descent is an algorithm for finding the minimum of a function. The idea is simple: at each step, move a small distance in the direction opposite to the current gradient, then repeat. An example follows:

[Example] Use gradient descent to find the minimum of $z(x, y) = 0.3x^2 + 0.4y^2 + 2$.

Step 1: derive the iteration rule. Following the idea of "taking a small step in the direction opposite to the current gradient each time":

$$x^{(k+1)} = x^{(k)} - \alpha \frac{\partial z}{\partial x} = (1 - 0.6\alpha)\,x^{(k)}, \qquad y^{(k+1)} = y^{(k)} - \alpha \frac{\partial z}{\partial y} = (1 - 0.8\alpha)\,y^{(k)}$$

Taking the step size $\alpha = 1$ gives $x^{(k+1)} = 0.4\,x^{(k)}$ and $y^{(k+1)} = 0.2\,y^{(k)}$.

Step 2: choose the initial value. In principle the initial value is arbitrary, but a suitable choice improves the convergence rate. In this example, choose $(x^{(0)}, y^{(0)}) = (1, 1)$.

Step 3: iterate from the initial value according to the iteration rule. The process is as follows:

k    x^(k)    y^(k)    z(x^(k), y^(k))
0    1.00     1.00     2.7000
1    0.40     0.20     2.0640
2    0.16     0.04     2.0083
3    0.06     0.01     2.0013
4    0.03     0.00     2.0002
5    0.01     0.00     2.0000
6    0.00     0.00     2.0000

Conclusion: the algorithm converges after the 6th iteration, and the minimum value sought is 2.

How does the gradient descent algorithm decide convergence? A common method is to check whether the absolute change of the objective value between two consecutive iterations is small enough, that is, whether

$$\left|\,z(x^{(k+1)}, y^{(k+1)}) - z(x^{(k)}, y^{(k)})\,\right| < \varepsilon$$

holds. Here $\varepsilon$ is a sufficiently small positive real number chosen according to the desired precision; in this example, $\varepsilon = 10^{-4}$.
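The example translates directly into code. Below is a minimal Python sketch of gradient descent on the example function, using the stopping criterion just described; alpha and eps follow the example above, while the function names and the max_iter safeguard are illustrative additions.

```python
# Minimal sketch: gradient descent on z(x, y) = 0.3x^2 + 0.4y^2 + 2.
# alpha and eps follow the example above; max_iter is an added safeguard.

def z(x, y):
    return 0.3 * x**2 + 0.4 * y**2 + 2

def grad_z(x, y):
    # Partial derivatives dz/dx = 0.6x and dz/dy = 0.8y.
    return 0.6 * x, 0.8 * y

def gradient_descent(x, y, alpha=1.0, eps=1e-4, max_iter=100):
    prev = z(x, y)
    for k in range(1, max_iter + 1):
        gx, gy = grad_z(x, y)
        x, y = x - alpha * gx, y - alpha * gy  # step against the gradient
        cur = z(x, y)
        if abs(cur - prev) < eps:              # convergence criterion from above
            return x, y, cur, k
        prev = cur
    return x, y, prev, max_iter

print(gradient_descent(1.0, 1.0))  # converges at the 6th iteration, z ≈ 2
```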

It is important to note that gradient descent can get stuck in a local optimum. This can be mitigated by restarting from randomly chosen initial values or by adding a momentum term, topics that may be covered in later articles in this series; a sketch of the momentum variant follows.
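For concreteness, the following sketch shows one common form of the momentum update; the exact formulation and the hyperparameters alpha and beta are assumptions rather than details from this article. It reuses z and grad_z from the previous sketch.

```python
# One common momentum variant (a sketch; the formulation and the values of
# alpha and beta are assumptions, not taken from this article).
# Reuses z and grad_z from the previous sketch.

def gradient_descent_momentum(x, y, alpha=0.1, beta=0.9, n_iter=200):
    vx = vy = 0.0
    for _ in range(n_iter):
        gx, gy = grad_z(x, y)
        vx = beta * vx - alpha * gx  # velocity accumulates past gradients,
        vy = beta * vy - alpha * gy  # letting the iterate coast through shallow regions
        x, y = x + vx, y + vy
    return x, y, z(x, y)

print(gradient_descent_momentum(1.0, 1.0))  # also converges near (0, 0), z ≈ 2
```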

2. Linear regression

Linear regression is a form of regression analysis that models the relationship between the independent and dependent variables with a regression function of the following form:

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n = \theta^{T}x$$

where $x_0 = 1$ by convention.

Let $m$ denote the number of data points and $n$ the dimension of the data; let $(x^{(i)}, y^{(i)})$ denote the independent and dependent variables of the $i$-th data point, and let $x_j^{(i)}$ denote the $j$-th component of $x^{(i)}$. The derivation is based on the following assumption:

$$y^{(i)} = \theta^{T}x^{(i)} + \varepsilon^{(i)}, \qquad \varepsilon^{(i)} \sim N(0, \sigma^2) \text{ independently}$$

That is, the error terms of the individual data points are mutually independent and follow a normal distribution with mean 0 and variance $\sigma^2$. From this we obtain the likelihood function:

$$L(\theta) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\left(y^{(i)} - \theta^{T}x^{(i)}\right)^2}{2\sigma^2}\right)$$

Taking the logarithm gives the log-likelihood function:

$$\ell(\theta) = \log L(\theta) = m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\sigma^2} \sum_{i=1}^{m} \left(y^{(i)} - \theta^{T}x^{(i)}\right)^2$$

Since the first term does not depend on $\theta$, maximizing $\ell(\theta)$ is equivalent to minimizing

$$\sum_{i=1}^{m} \left(y^{(i)} - \theta^{T}x^{(i)}\right)^2$$

Define the loss function:

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

To maximize the likelihood function, it suffices to minimize the loss function. We settle for a local minimum of the loss function in place of the global minimum, so we only need the partial derivative with respect to each $\theta_j$:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$

Finally, solve iteratively with gradient descent:

$$\theta_j := \theta_j - \alpha \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}, \qquad j = 0, 1, \ldots, n$$

Here $\alpha$ is the learning rate, a constant greater than 0. The learning rate should be chosen carefully: too large a value keeps the algorithm from converging, while too small a value makes convergence slow. In practice, the learning rate can be adjusted to the situation at hand. Some references give a sufficient condition on $\alpha$ under which the above algorithm is guaranteed to converge; because that bound is difficult to compute efficiently, a conservatively chosen fixed learning rate is often used instead.
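To make the update rule concrete, here is a minimal batch gradient descent sketch for linear regression using NumPy. The helper name, toy data, learning rate, and iteration count are illustrative assumptions; alpha must be small enough for the sum-form update to converge.

```python
import numpy as np

# Batch gradient descent for linear regression, implementing
#   theta_j := theta_j - alpha * sum_i (h(x^(i)) - y^(i)) * x_j^(i).
# fit_linear, the toy data, alpha, and n_iter are illustrative assumptions.

def fit_linear(X, y, alpha=0.01, n_iter=1000):
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])    # prepend x_0 = 1 for the intercept
    theta = np.zeros(n + 1)
    for _ in range(n_iter):
        residual = Xb @ theta - y           # h(x^(i)) - y^(i) for every i
        theta -= alpha * (Xb.T @ residual)  # simultaneous update of all theta_j
    return theta

# Usage: noisy samples of y = 1 + 2x recover theta ≈ [1, 2]
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 1))
y = 1 + 2 * X[:, 0] + rng.normal(0, 0.1, size=100)
print(fit_linear(X, y))
```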

3. Logistic regression

When the dependent variable can only take values in {0, 1}, the linear regression model is no longer suitable, because extreme data points make the choice of a threshold difficult. We can instead model the data with logistic regression, whose regression function has the following form:

$$h_\theta(x) = g(\theta^{T}x)$$

where

$$g(z) = \frac{1}{1 + e^{-z}}$$

is the sigmoid function.

The sigmoid function has the following properties:

$$g(-z) = 1 - g(z), \qquad g'(z) = \frac{e^{-z}}{\left(1 + e^{-z}\right)^2} = g(z)\left(1 - g(z)\right)$$

The derivation is based on the following assumptions:

$$P(y = 1 \mid x; \theta) = h_\theta(x), \qquad P(y = 0 \mid x; \theta) = 1 - h_\theta(x)$$

Since $y$ can only take the values 0 and 1, the assumptions above are equivalent to the single form:

$$P(y \mid x; \theta) = h_\theta(x)^{y} \left(1 - h_\theta(x)\right)^{1 - y}$$

The likelihood function is then:

$$L(\theta) = \prod_{i=1}^{m} h_\theta(x^{(i)})^{y^{(i)}} \left(1 - h_\theta(x^{(i)})\right)^{1 - y^{(i)}}$$

Taking the logarithm gives the log-likelihood function:

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} \log\left[ h_\theta(x^{(i)})^{y^{(i)}} \left(1 - h_\theta(x^{(i)})\right)^{1 - y^{(i)}} \right]$$

Simplifying, we get:

$$\ell(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$

Define the loss function:

$$J(\theta) = -\ell(\theta) = -\sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$

To maximize the likelihood function, it suffices to minimize the loss function. We settle for a local minimum of the loss function in place of the global minimum, so we only need the partial derivative with respect to each $\theta_j$:

$$\frac{\partial J(\theta)}{\partial \theta_j} = -\sum_{i=1}^{m} \left[ \frac{y^{(i)}}{h_\theta(x^{(i)})} - \frac{1 - y^{(i)}}{1 - h_\theta(x^{(i)})} \right] \frac{\partial h_\theta(x^{(i)})}{\partial \theta_j}$$

Applying $g'(z) = g(z)(1 - g(z))$, which gives $\partial h_\theta(x) / \partial \theta_j = h_\theta(x)\left(1 - h_\theta(x)\right) x_j$, this simplifies to:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$

Finally, solve iteratively with gradient descent:

$$\theta_j := \theta_j - \alpha \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}, \qquad j = 0, 1, \ldots, n$$

where $\alpha$ has the same meaning as above. Note that the update rule has exactly the same form as in linear regression; only the hypothesis $h_\theta(x)$ differs.
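As a final illustration, here is a minimal training-loop sketch for logistic regression; apart from the sigmoid hypothesis, it mirrors the linear regression sketch above. The helper names, toy data, and hyperparameters are again assumptions for illustration.

```python
import numpy as np

# Batch gradient descent for logistic regression. The update rule is identical
# in form to the linear case; only the hypothesis h(x) = g(theta^T x) changes.
# fit_logistic, the toy data, alpha, and n_iter are illustrative assumptions.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.01, n_iter=5000):
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])    # prepend x_0 = 1 for the intercept
    theta = np.zeros(n + 1)
    for _ in range(n_iter):
        residual = sigmoid(Xb @ theta) - y  # h(x^(i)) - y^(i) for every i
        theta -= alpha * (Xb.T @ residual)  # simultaneous update of all theta_j
    return theta

# Usage: noisy labels that switch from 0 to 1 around x = 0.5
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 1))
y = (X[:, 0] + rng.normal(0, 0.1, size=200) > 0.5).astype(float)
theta = fit_logistic(X, y)
print(-theta[0] / theta[1])  # decision boundary, approximately 0.5
```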
