Machine learning - Logistic regression


There are many classification problems in real life, such as normal mail versus spam, benign versus malignant tumors, handwriting recognition, and so on, which can be solved with the logistic regression algorithm.

I. Binary classification problems

A so-called binary classification problem is one whose result has only two classes, yes or no, so we use the set {0, 1} to represent the range of values of y.

As mentioned before, the linear regression model is h(x) = θ0 + θ1x1 + θ2x2 + ..., and its value ranges over the whole real line. For a 0/1 problem we must find a way to compress the model's value into the interval between 0 and 1, so we introduce the sigmoid function: g(z) = 1 / (1 + e^(-z)).

So hθ(x) = g(θ^T x), which for a given x gives the probability that y takes the value 1, that is, P(y = 1 | x). Our task is to use the existing sample data set to find a set of parameters θ and thereby obtain an approximation of the distribution P(y = 1 | x).
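As a minimal sketch in Python with NumPy (the names sigmoid and hypothesis are illustrative, and x is assumed to already include the bias term x0 = 1):

import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)); squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    # h_theta(x) = g(theta^T x), interpreted as P(y = 1 | x)
    return sigmoid(theta @ x)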

Decision boundary

For a classification problem there should be a boundary separating the classes, but the value we get from hθ(x) = g(θ^T x) lies in [0, 1]. We therefore say that y = 1 when g(θ^T x) >= 0.5; the probability 0.5 corresponds to the demarcation point, at which θ^T x = 0.
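In code, the 0.5 threshold is a one-liner on top of the hypothesis sketched above; since g is monotone, hθ(x) >= 0.5 exactly when θ^T x >= 0:

def predict(theta, x):
    # the decision boundary is the hyperplane theta^T x = 0
    return 1 if hypothesis(theta, x) >= 0.5 else 0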

Cost function

The point of the cost function is to solve for θ: it measures the gap between the model hθ(x) and the true values, and minimizing it yields θ.

For the linear regression model we used the squared difference to represent the cost function, but that form is not suitable for the logistic regression model: with the sigmoid inside, the squared-error cost is non-convex in θ, and gradient descent can get stuck in local minima. So we introduce the logarithmic function here.

The logarithm is a wonderful function here, since hθ(x) takes values in [0, 1].

If y = 1, the cost is -log(hθ(x)), with values in [0, +∞). If hθ(x) → 0 then -log(hθ(x)) → +∞, that is, the cost → +∞; conversely, as hθ(x) → 1 the cost → 0.

Similarly for the case y = 0, where the cost is -log(1 - hθ(x)). The cost function in logarithmic form therefore captures the difference between the model's prediction and the truth. To simplify the model further, the following single function covers both branches of this piecewise definition at once (when y = 1 the second term vanishes; when y = 0 the first does):

Cost(hθ(x), y) = -y·log(hθ(x)) - (1 - y)·log(1 - hθ(x))

So, for a data set of m samples, we can use the average of this cost, i.e. the empirical risk, as the overall cost function:

J(θ) = -(1/m) Σ (i = 1..m) [ y(i)·log(hθ(x(i))) + (1 - y(i))·log(1 - hθ(x(i))) ]
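A vectorized sketch of J(θ), assuming a design matrix X of shape m x (n+1) whose first column is all ones and a 0/1 label vector y:

def cost(theta, X, y):
    # average cross-entropy over the m samples
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))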

The best model is the set of θ values that makes J(θ) smallest, and gradient descent can be used here as well; remarkably, the gradient here has the same form as in the linear regression model. I have proved this separately; interested readers can follow the link: Machine learning-logic regression gradient descent formula derivation.
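A plain batch gradient-descent sketch; the learning rate alpha and the iteration count below are illustrative values, not ones from the source:

def gradient_descent(X, y, alpha=0.1, iters=1000):
    # update theta := theta - alpha * (1/m) * X^T (h - y);
    # the gradient has the same form as in linear regression,
    # except that h is now the sigmoid of X @ theta
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ theta)
        theta -= (alpha / m) * (X.T @ (h - y))
    return theta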

Andrew Ng's video also introduces advanced optimization algorithms for minimizing the cost function, which are not expanded on here.

II. Multi-classification problems

In fact, besides yes/no binary classification problems, there are many multi-class problems; a typical one is recognizing the Arabic numerals, ten digits from 0 to 9. The solution is similar, but with one more dimension.

For the binary classification problem, θ is a vector, a single set of numbers; the problem contains only one model, and the result is a single probability value.

For a multi-class problem (say with k classes), θ is an (n+1)×k matrix, which amounts to combining k binary classification problems. There are k models, and the final result is a k-dimensional vector of k probability values; whichever is largest indicates the predicted category.

So how do we obtain this matrix θ? We compute it column by column in a loop.

Take the 0-9 handwritten digits from Andrew Ng's class as an example; there are 10 categories. The pixel values serve as the input features. Suppose there are m samples, and each sample's y value is one of 1-10 (here y = 10 is used in place of y = 0).

We build the following loop:

for i = 1 to 10

    relabel the samples: y becomes 1 wherever y = i and 0 everywhere else, turning this into a binary classification problem (the original y values are class labels 1-10, not 0 or 1)

    solve for the corresponding θ vector, the i-th column of the matrix

end

Combining all the θ vectors into a matrix, hθ(x) becomes a 10×1 vector. If, for example, the third value is the largest, the model judges that the handwritten digit is most probably a 3.
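Putting the loop together as a sketch that reuses gradient_descent from above (one_vs_all and predict_class are illustrative names; classes run 1-10, with 10 standing in for the digit 0):

def one_vs_all(X, y, k=10):
    # column i-1 of Theta holds the theta vector of the
    # "class i vs. the rest" binary problem
    m, n = X.shape
    Theta = np.zeros((n, k))
    for i in range(1, k + 1):
        y_i = (y == i).astype(float)   # relabel: 1 for class i, 0 otherwise
        Theta[:, i - 1] = gradient_descent(X, y_i)
    return Theta

def predict_class(Theta, x):
    # 10 probabilities; the index of the largest is the predicted class
    probs = sigmoid(x @ Theta)
    return int(np.argmax(probs)) + 1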
