Machine learning - Logistic regression


There are many classification problems in real life, such as normal mail versus spam, benign versus malignant tumors, handwriting recognition, and so on, which can be solved with the logistic regression algorithm.

I. Binary classification problems

A so-called binary classification problem is one whose result has only two classes, yes or no, so we use the set {0, 1} to represent the range of values of y.

As mentioned before, the linear regression model is h(x) = θ0 + θ1x1 + θ2x2 + ..., and its value ranges over the whole real line. For a 0/1 problem we must find a way to compress the model's value into the interval between 0 and 1, so we introduce the sigmoid function: g(z) = 1 / (1 + e^(-z)).

So hθ(x) = g(θ^T x), which for a given x gives the probability that y takes the value 1, that is, P(y = 1 | x). Our task is to use the existing sample data set to find a set of parameters θ and thereby obtain an approximation of the distribution P(y = 1 | x).
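As a minimal sketch in Python with NumPy (the names sigmoid and hypothesis are illustrative, and x is assumed to already include the bias term x0 = 1):

import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)); squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    # h_theta(x) = g(theta^T x), interpreted as P(y = 1 | x)
    return sigmoid(theta @ x)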

Decision boundary

For a classification problem there should be a boundary separating the classes, but the value we get from hθ(x) = g(θ^T x) lies in [0, 1]. We therefore say that y = 1 when g(θ^T x) >= 0.5; the probability 0.5 corresponds to the demarcation point, at which θ^T x = 0.
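In code, the 0.5 threshold is a one-liner on top of the hypothesis sketched above; since g is monotone, hθ(x) >= 0.5 exactly when θ^T x >= 0:

def predict(theta, x):
    # the decision boundary is the hyperplane theta^T x = 0
    return 1 if hypothesis(theta, x) >= 0.5 else 0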

Cost function

The point of the cost function is to solve for θ: it measures the gap between the model hθ(x) and the true values, and minimizing it yields θ.

For the linear regression model we used the squared difference to represent the cost function, but that form is not suitable for the logistic regression model: with the sigmoid inside, the squared-error cost is non-convex in θ, and gradient descent can get stuck in local minima. So we introduce the logarithmic function here.

The logarithm is a wonderful function here, since hθ(x) takes values in [0, 1].

If y = 1, the cost is -log(hθ(x)), with values in [0, +∞). If hθ(x) → 0 then -log(hθ(x)) → +∞, that is, the cost → +∞; conversely, as hθ(x) → 1 the cost → 0.

Similarly for the case y = 0, where the cost is -log(1 - hθ(x)). The cost function in logarithmic form therefore captures the difference between the model's prediction and the truth. To simplify the model further, the following single function covers both branches of this piecewise definition at once (when y = 1 the second term vanishes; when y = 0 the first does):

Cost(hθ(x), y) = -y·log(hθ(x)) - (1 - y)·log(1 - hθ(x))

So, for a data set of m samples, we can use the average of this cost, i.e. the empirical risk, as the overall cost function:

J(θ) = -(1/m) Σ (i = 1..m) [ y(i)·log(hθ(x(i))) + (1 - y(i))·log(1 - hθ(x(i))) ]
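A vectorized sketch of J(θ), assuming a design matrix X of shape m x (n+1) whose first column is all ones and a 0/1 label vector y:

def cost(theta, X, y):
    # average cross-entropy over the m samples
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))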

The best model is the set of θ values that makes J(θ) smallest, and gradient descent can be used here as well; remarkably, the gradient here has the same form as in the linear regression model. I have proved this separately; interested readers can follow the link: Machine learning-logic regression gradient descent formula derivation.
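A plain batch gradient-descent sketch; the learning rate alpha and the iteration count below are illustrative values, not ones from the source:

def gradient_descent(X, y, alpha=0.1, iters=1000):
    # update theta := theta - alpha * (1/m) * X^T (h - y);
    # the gradient has the same form as in linear regression,
    # except that h is now the sigmoid of X @ theta
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ theta)
        theta -= (alpha / m) * (X.T @ (h - y))
    return theta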

Andrew Ng's video also introduces advanced optimization algorithms for minimizing the cost function, which are not expanded on here.

II. Multi-classification problems

In fact, besides yes/no binary classification problems, there are many multi-class problems; a typical one is recognizing the Arabic numerals, ten digits from 0 to 9. The solution is similar, but with one more dimension.

For the binary classification problem, θ is a vector, a single set of numbers; the problem contains only one model, and the result is a single probability value.

For a multi-class problem (say with k classes), θ is an (n+1)×k matrix, which amounts to combining k binary classification problems. There are k models, and the final result is a k-dimensional vector of k probability values; whichever is largest indicates the predicted category.

So how do we obtain this matrix θ? We compute it column by column in a loop.

Take the 0-9 handwritten digits from Andrew Ng's class as an example; there are 10 categories. The pixel values serve as the input features. Suppose there are m samples, and each sample's y value is one of 1-10 (here y = 10 is used in place of y = 0).

We build the following loop:

for i = 1 to 10

    relabel the samples: y becomes 1 wherever y = i and 0 everywhere else, turning this into a binary classification problem (the original y values are class labels 1-10, not 0 or 1)

    solve for the corresponding θ vector, the i-th column of the matrix

end

Combining all the θ vectors into a matrix, hθ(x) becomes a 10×1 vector. If, for example, the third value is the largest, the model judges that the handwritten digit is most probably a 3.
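Putting the loop together as a sketch that reuses gradient_descent from above (one_vs_all and predict_class are illustrative names; classes run 1-10, with 10 standing in for the digit 0):

def one_vs_all(X, y, k=10):
    # column i-1 of Theta holds the theta vector of the
    # "class i vs. the rest" binary problem
    m, n = X.shape
    Theta = np.zeros((n, k))
    for i in range(1, k + 1):
        y_i = (y == i).astype(float)   # relabel: 1 for class i, 0 otherwise
        Theta[:, i - 1] = gradient_descent(X, y_i)
    return Theta

def predict_class(Theta, x):
    # 10 probabilities; the index of the largest is the predicted class
    probs = sigmoid(x @ Theta)
    return int(np.argmax(probs)) + 1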
