Logistic regression and gradient descent

Last Update: 2016-04-29 Source: Internet

Author: User


1. Linear regression (applied directly)

Suppose we want to judge from tumor size data whether a patient has a tumor (as shown in the figure). From the data we can see that a linear hypothesis h(x) classifies it effectively: when h(x) > 0.5 the patient is predicted to be a tumor patient, and when h(x) < 0.5 the patient is predicted to be normal. But a linear model can run into the following situation.

Here, after the parameters of the linear model are adjusted to fit the data, the resulting model is the blue line in the figure. Some of the red crosses (tumor patients) are now predicted to be normal, which is clearly unreasonable and has serious consequences (a sick patient who is predicted to be normal may not get treated). In addition, taking binary classification as an example, suppose the label is in {0, 1}: the y predicted by a linear model can be far larger than 1 or far smaller than 0, which is also unreasonable. This motivates logistic regression.
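The failure described above can be reproduced with a small sketch. The tumor sizes and labels below are hypothetical (the original figure's data is not available), and the line is fitted by ordinary least squares, which is an assumption about how the blue line was obtained.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of h(x) = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(x, slope, intercept, threshold=0.5):
    """Predict 1 (tumor patient) when h(x) > threshold, else 0 (normal)."""
    return 1 if slope * x + intercept > threshold else 0


# Hypothetical data: 0 = normal, 1 = tumor patient.
sizes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

slope, intercept = fit_line(sizes, labels)
pred_clean = [predict(x, slope, intercept) for x in sizes]

# Add one very large tumor (correctly labelled 1) and refit the line.
sizes_out = sizes + [30.0]
labels_out = labels + [1]
slope_out, intercept_out = fit_line(sizes_out, labels_out)
pred_outlier = [predict(x, slope_out, intercept_out) for x in sizes_out]

# The raw linear output at the new point is not a valid probability.
h_at_outlier = slope_out * 30.0 + intercept_out
```

On this toy data the first fit classifies every point correctly. After the extra point is added the line flattens, the patient with size 5.0 falls below the 0.5 threshold, and h(x) at size 30.0 exceeds 1.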

2. Logistic regression

Logistic regression actually changes our hypothesis function (as shown in the figure).
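The figure is not reproduced here. Assuming it shows the standard logistic (sigmoid) hypothesis used in Andrew's course, it would read as follows, where g is the sigmoid function and its output always lies strictly between 0 and 1:

```latex
h_\theta(x) = g(\theta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}}
```

Because g(z) >= 0.5 exactly when z >= 0, predicting y = 1 when h(x) >= 0.5 is the same as predicting y = 1 when θ^T x >= 0.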

For example, take the linear term θ0 + θ1x1 + θ2x2 with θ = [-3, 1, 1]^T. Then we have:

Predict y = 1 if -3 + x1 + x2 >= 0

Predict y = 0 if -3 + x1 + x2 < 0

The boundary -3 + x1 + x2 = 0 classifies the dataset shown in the figure nicely.

Non-linear decision boundaries can be obtained in a similar way.
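The decision rule above can be checked with a minimal sketch. It assumes the hypothesis is the standard sigmoid of the linear term and that y = 1 is predicted when h(x) >= 0.5; the names `hypothesis` and `predict` are illustrative only.

```python
import math

THETA = [-3.0, 1.0, 1.0]  # theta from the text: [-3, 1, 1]^T


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def hypothesis(x1, x2, theta=THETA):
    """h(x) = g(theta0 + theta1 * x1 + theta2 * x2)."""
    return sigmoid(theta[0] + theta[1] * x1 + theta[2] * x2)


def predict(x1, x2, theta=THETA):
    """Predict 1 when h(x) >= 0.5, i.e. when -3 + x1 + x2 >= 0."""
    return 1 if hypothesis(x1, x2, theta) >= 0.5 else 0
```

For instance, `predict(1, 1)` falls below the line x1 + x2 = 3 and gives 0, while `predict(2, 2)` lies above it and gives 1.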

The cost function of logistic regression

Recall the cost function of linear regression, shown in the figure.
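The figure is not reproduced here. Assuming it shows the usual squared-error cost from Andrew's course, with m training examples (x^(i), y^(i)), it would be:

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
```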

At this point, we can no longer reuse the cost function of the linear model for logistic regression, because doing so means running gradient descent on a non-convex function, which easily gets stuck in a local minimum. The graph at the bottom left of the figure shows the cost function obtained by plugging the logistic hypothesis directly into the cost function of the linear model. Because the logistic hypothesis is itself nonlinear, the cost function built this way is non-convex. If gradient descent is used to optimize the parameters, it can easily settle in a local minimum, which hurts the final classification results.
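A small numeric check can make the non-convexity concrete. The sketch below assumes a sigmoid hypothesis with a single parameter and a single hypothetical training example (x = 1, y = 0). It compares the squared-error cost with the logarithmic cost on one chord: a convex function never rises above its chord at the midpoint.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def squared_error_cost(theta, x=1.0, y=0):
    """Linear-regression cost applied to the logistic hypothesis (one example)."""
    return 0.5 * (sigmoid(theta * x) - y) ** 2


def log_cost(theta, x=1.0, y=0):
    """Logarithmic cost for the same single example."""
    h = sigmoid(theta * x)
    return -math.log(h) if y == 1 else -math.log(1.0 - h)


def violates_convexity(f, a, b):
    """True if f at the midpoint of [a, b] lies above the chord from a to b."""
    return f((a + b) / 2.0) > (f(a) + f(b)) / 2.0


# Between theta = 2 and theta = 6 the squared-error cost bulges above its chord.
squared_violates = violates_convexity(squared_error_cost, 2.0, 6.0)
log_violates = violates_convexity(log_cost, 2.0, 6.0)
```

Here `squared_violates` is True and `log_violates` is False. A single chord only shows that the squared-error version is not convex; it does not prove that the logarithmic cost is convex everywhere, and it does not by itself exhibit a local minimum.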

So we need to design a new cost function, as shown in the figure.
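The figure is not reproduced here. Assuming it shows the per-example logarithmic cost from Andrew's course, it would be:

```latex
\mathrm{Cost}(h_\theta(x), y) =
\begin{cases}
-\log\left(h_\theta(x)\right) & \text{if } y = 1 \\
-\log\left(1 - h_\theta(x)\right) & \text{if } y = 0
\end{cases}
```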

At first I did not understand why the cost function is designed this way; later I found a detailed explanation in Andrew's video. In the plot, the horizontal axis is h(x) and the vertical axis is the cost. Note that the plot described here is for the case y = 1. When h(x) = 1, our prediction is 1 and the actual label is y = 1, so the prediction is correct and the cost is 0; this is the point (1, 0), where the curve meets the h(x) axis. When h(x) = 0, the prediction is 0 while the actual label is 1, so the prediction is wrong, and the cost tends to infinity as h(x) approaches 0. In other words, we punish this kind of error by giving the cost function a very large value, which gradient descent then corrects by adjusting the parameters. The case y = 0 is analogous, as shown in the figure (the analysis mirrors the y = 1 case).
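The behaviour described above can be checked numerically. This sketch assumes the per-example cost is -log(h(x)) when y = 1 and -log(1 - h(x)) when y = 0, which matches the description of the plot; the function name `cost` is illustrative.

```python
import math


def cost(h, y):
    """Per-example cost for a predicted probability h (0 < h < 1) and label y."""
    return -math.log(h) if y == 1 else -math.log(1.0 - h)


# Case y = 1: the cost shrinks to 0 as h -> 1 and blows up as h -> 0.
cost_confident_right = cost(0.999, 1)
cost_unsure = cost(0.5, 1)
cost_confident_wrong = cost(0.001, 1)

# Case y = 0 mirrors it: a prediction near 0 is cheap, one near 1 is expensive.
cost_y0_right = cost(0.001, 0)
cost_y0_wrong = cost(0.999, 0)
```

The values are ordered as `cost_confident_right < cost_unsure < cost_confident_wrong`, and by symmetry `cost_y0_right` equals `cost_confident_right`.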

The explanation of the logistic cost function above comes from the short Coursera tutorial. Because it is a short course, Professor Andrew does not give a detailed derivation of the formula; readers who want one can find the detailed lectures and derivation on NetEase Open Courses. I took the time to go through the details of Andrew's course, and below I work through the derivation and origin of the logistic cost function.

First, we make the following assumption about our hypothesis: h(x) is the probability that y = 1 given x, where x is a random variable and θ is a parameter. From this we can derive the likelihood function (the likelihood of y given x, with θ as the parameter) as follows.
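The formula from the original figure is not reproduced here. Under the stated assumption, and additionally assuming the m training examples are independent, the standard form of the likelihood would be:

```latex
\begin{aligned}
P(y = 1 \mid x; \theta) &= h_\theta(x), \qquad P(y = 0 \mid x; \theta) = 1 - h_\theta(x) \\
P(y \mid x; \theta) &= h_\theta(x)^{y} \left(1 - h_\theta(x)\right)^{1 - y} \\
L(\theta) &= \prod_{i=1}^{m} h_\theta(x^{(i)})^{y^{(i)}} \left(1 - h_\theta(x^{(i)})\right)^{1 - y^{(i)}}
\end{aligned}
```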

All we have to do is find a θ that maximizes L(θ). Using calculus, first set l(θ) = log(L(θ)), then use gradient ascent (gradient descent run in the opposite direction) to update θ step by step until we find the θ that maximizes l(θ). This is shown below (the l(θ) here looks similar to the cost function above; it is in fact the same function up to sign and a constant factor):
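The derivation from the original figure is not reproduced here. Assuming a sigmoid hypothesis, for which g'(z) = g(z)(1 - g(z)), and writing the log-likelihood as ℓ(θ) with learning rate α (a symbol introduced here), the standard steps would be:

```latex
\begin{aligned}
\ell(\theta) &= \log L(\theta)
  = \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)})
  + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right] \\
\frac{\partial \ell(\theta)}{\partial \theta_j}
  &= \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)} \\
\theta_j &:= \theta_j + \alpha \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}
\end{aligned}
```

Maximizing ℓ(θ) by gradient ascent is equivalent to minimizing J(θ) = -(1/m) ℓ(θ) by gradient descent.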

Notice that the update rule for θj here looks surprisingly similar to the one for the linear model. Does that mean logistic regression is no different from the linear model? It does not: here h(x) is not linear, and this is important.
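The update rule can be sketched in plain Python as batch gradient descent on the averaged logarithmic cost. The data, learning rate and iteration count below are hypothetical choices for illustration, and h(x) is assumed to be the sigmoid of θ^T x. The update line has the same shape as in linear regression; only `hypothesis` differs.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hypothesis(theta, x):
    """h(x) = g(theta^T x); x includes a leading 1 for the intercept."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def gradient_step(theta, X, y, alpha):
    """theta_j := theta_j - alpha * (1/m) * sum_i (h(x_i) - y_i) * x_ij."""
    m = len(X)
    errors = [hypothesis(theta, xi) - yi for xi, yi in zip(X, y)]
    return [
        t - alpha / m * sum(e * xi[j] for e, xi in zip(errors, X))
        for j, t in enumerate(theta)
    ]


def fit(X, y, alpha=0.05, iterations=10000):
    """Run batch gradient descent starting from theta = 0."""
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        theta = gradient_step(theta, X, y, alpha)
    return theta


# Hypothetical tumor sizes, including one very large tumor.
X = [[1.0, s] for s in (1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 30.0)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]

theta = fit(X, y)
predictions = [1 if hypothesis(theta, xi) >= 0.5 else 0 for xi in X]
```

On this toy data the fitted model should place its decision boundary between sizes 4 and 5, so the very large tumor no longer pulls the smaller tumors into the "normal" class.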

OK, that basically covers logistic regression. I will keep adding to this post: there is also a more efficient method that obtains the updated θ directly through matrix operations and avoids so many iterations. To be continued!

References

http://blog.csdn.net/pakko/article/details/37878837

http://blog.csdn.net/abcjennifer/article/details/7716281

http://open.163.com/movie/2008/1/E/B/M6SGF6VB4_M6SGHM4EB.html

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
