Logistic Regression | Machine Learning

Source: Internet
Author: User

"What we learn from books always feels shallow; true understanding comes only from practice." I recently used logistic regression (LR) for a classification task at work, relying on the existing algorithm package in sklearn. Along the way I ran into some problems that prompted me to read the sklearn source code, which deepened my understanding of logistic regression and shed new light on some pitfalls I had hit before.

This post starts from linear regression, uses it to transition to LR, and finally explains some of the source code behind sklearn's LR implementation. Three points are worth noting:

First: much of the literature states that LR is essentially linear regression, but only states it, without going into depth.

Second: LR looks non-linear in form, so how can it be understood as a linear classifier? This post addresses that as well.

Third: in practice, taking engineering feasibility into account, the algorithm packages actually used differ from the theory; the key focus here is explaining how sklearn implements binary LR.


1. Linear Regression

Linear regression is one of the simpler algorithms in machine learning (ML). We focus first on the simple ideas and intuitive explanations behind it, followed by the mathematical derivation. Put plainly, linear regression is this: given a set of points in the two-dimensional plane, we look for a line that fits (measures) those points; if the line found is straight, the process is linear regression. The concept extends to higher-dimensional spaces.

Take house prices as an example. The price of a house is affected by many factors, such as area, location, orientation, and number of rooms. These are the independent variables, or predictors, generally called features in ML. The price of the house is the dependent variable, or response, commonly called the label in ML. The price here is a continuous value, meaning it can vary continuously over a range; this is where linear regression differs from LR.

Predicting house prices can therefore be viewed as a linear regression process. The simple idea behind it is: the price is affected by many factors, and each factor influences the price differently, so the price should be the weighted sum of these factors (features). The same simple idea underlies LR, except that LR applies a substantial transformation to the label. Expressed as a mathematical formula:


y = w0 + w1*x1 + w2*x2 + ...

where y is the price of the house (the label); x1, x2, etc. are the house's features, such as area and location; w1, w2, etc. are the weights of those features; and w0 is an adjustment parameter. If the specific values of the parameters w0, w1, w2, etc. are known, then for a house about to be sold we can predict its price. The problem thus becomes how to find the values of these parameters, which requires some mathematics.
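The weighted-sum idea above can be sketched in a few lines of Python. The weights and features below are made-up values for illustration, not fitted parameters.

```python
# A minimal sketch of the weighted-sum idea behind linear regression.
# All numbers here are hypothetical, chosen only to illustrate the formula.

def predict_price(features, weights, w0):
    """Price = w0 + sum of (weight * feature), i.e. y = w0 + w1*x1 + w2*x2 + ..."""
    return w0 + sum(w * x for w, x in zip(weights, features))

# Hypothetical house: area = 100 m^2, number of rooms = 3
features = [100.0, 3.0]
weights = [0.5, 2.0]   # hypothetical contribution of each feature
w0 = 10.0              # adjustment parameter (intercept)

price = predict_price(features, weights, w0)
print(price)  # 10 + 0.5*100 + 2*3 = 66.0
```

Once the weights are estimated from past sales data, predicting a new house's price is just this one weighted sum.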

The features of a house are generally represented as a vector:

x(1) = (x1(1), x2(1), ..., xn(1))

where x(1) denotes one sample, the house whose price is to be predicted, and x1(1), x2(1), ... are that sample's features, such as area and location. The price of a house is expressed as a scalar.


y(1) denotes the price corresponding to sample x(1). Estimating the parameters requires using past house-sale records to fit the data. For convenience of exposition, suppose the house has only one feature, its area x1, with corresponding weight w1. The table below lists the sale prices of a number of houses.


As you can see, the price changes as the area of the house changes. Plotting these points on a two-dimensional plane gives the following figure.


Linear regression looks for a straight line that fits these points. The line found takes the weight w1 of the house's area as a parameter, i.e.

y = w0 + w1*x1


The equation for the red line in the figure is



What, then, is the exact value of the parameter w1? Again, start from an intuitive perspective. If, according to the house-sale records, all the points lay exactly on one straight line, the task would simply be to find that line: to predict a house's price, just look up the ordinate on the line corresponding to the house's area, and the line's parameters are easy to solve for (compute the slope, then use any one point). In practice, however, this rarely works, because house prices fluctuate: the points do not lie strictly on a straight line but are distributed near one, as above. At this point the requirement on the line must be relaxed: the points need not lie on the line, as long as all of them are as close to it as possible.


For example, in the figure above, the red line fits better than the black line, because the data points fluctuate closely around the red line. Expressed in mathematical language, the idea is: find a straight line such that the total distance from the data points to the line is smallest. From the figure you can also see that the sum of the distances from the green data points to the red line is clearly smaller than to the black line, so in terms of goodness of fit, the red line is better. Translating the idea further into a mathematical formula:


min Σ |yi - f(xi)|

where f(xi) = w0 + w1*xi. A substitution is then made:


min Σ (yi - f(xi))^2

The main purpose of the substitution is that the absolute value is awkward to handle in calculations; after substituting the square, the search changes from minimizing the sum of absolute distances to minimizing the sum of squared distances. Since distances are non-negative, the two expressions agree on which line fits best.

Solving the above formula involves differentiation, gradient descent, and other optimization algorithms, which are not repeated here; see the related blog post [1] for details.
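As a concrete sketch of the gradient-descent route just mentioned, the snippet below fits w0 and w1 by descending the mean squared error on a tiny made-up dataset (not the table from the post). The learning rate and step count are arbitrary choices.

```python
# Fit y = w0 + w1*x by gradient descent on the mean squared error.
# The dataset below is invented for illustration: points lying exactly
# on y = 2x + 1, so the fit should recover w0 ~ 1 and w1 ~ 2.

def fit_line(xs, ys, lr=0.001, steps=20000):
    w0, w1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of (1/n) * sum((w0 + w1*x - y)^2) with respect to w0, w1
        g0 = sum(2 * (w0 + w1 * x - y) for x, y in zip(xs, ys)) / n
        g1 = sum(2 * (w0 + w1 * x - y) * x for x, y in zip(xs, ys)) / n
        w0 -= lr * g0
        w1 -= lr * g1
    return w0, w1

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w0, w1 = fit_line(xs, ys)
print(w0, w1)  # close to 1.0 and 2.0
```

Real implementations such as sklearn's use more sophisticated solvers, but the principle of iteratively reducing the squared-error objective is the same.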


2. Logistic regression

On the surface, the difference between logistic regression and linear regression is that the label in logistic regression is a discrete value (such as 0 or 1), while the label in linear regression is a continuous real number. This difference leads directly to a difference in the form of the formulas, but the idea behind them is the same: use the weighted sum of the features to predict the value of the label. Logistic regression, however, applies considerable transformation, so that the quantity it directly predicts, and the form of its formula, look very different from linear regression. The reason lies in certain computational requirements of the mathematics, but an intuitive explanation can still be found behind it. The next three subsections explain logistic regression.

1. How the linear regression label changes with the features

The linear regression formula is

y = w0 + w1*x1 + w2*x2

In the formula above, when the coefficients w0, w1, w2 are fixed, the value of y changes continuously as x1 and x2 change continuously.





2. How the training set for logistic regression is obtained

Now consider how the samples used to train a logistic regression model are obtained. Suppose the training set is D = {(x1, y1=1), (x2, y2=1), (x3, y3=0), (x4, y4=1), (x5, y5=0), ...}; the labels are discrete, either 0 or 1. This training set D is data we know, so it can be called a posteriori. Now imagine a classifier: input data xi and it outputs yi, which may be 0 or 1. Inputting x1, x2, x3, x4, x5 yields the labels 1, 1, 0, 1, 0: sometimes 1, sometimes 0, unpredictable, with a certain randomness. This can be understood as follows: each output of the classifier may be 1 or may be 0, each with some probability. The training set is what we get when, unable to predict the outcome, we grab a batch of data, feed it to the classifier, and collect the resulting labels. Based on this result set, we model the classifier's output.

Because the classifier produces each classification result with a certain probability, each output can be interpreted from a probabilistic perspective when modeling it: the classifier's real output is a continuous probability value p, varying within [0, 1], and the discrete 0s and 1s we see are the result of a judgment applied to p. For example, if p > 0.5, judge the input to be 1, otherwise 0. In this way, the classifier's output becomes the 0s and 1s we observe.
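The judgment just described can be sketched in one small function: the underlying output is a continuous p in [0, 1], and the visible 0/1 label is p thresholded at 0.5.

```python
# The 0/1 labels we observe are the continuous probability p thresholded at 0.5.

def observed_label(p, threshold=0.5):
    """Map the classifier's continuous probability p to the discrete label we see."""
    return 1 if p > threshold else 0

print(observed_label(0.8))  # 1
print(observed_label(0.3))  # 0
```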

Going a step further, let "we see output 1" be event A. The continuous probability value p output by the classifier can then be regarded as the probability of event A: if p > 0.5, event A occurs; otherwise the complement of event A occurs, i.e. we see the output 0.



3. How logistic regression realizes the transition from discrete values to continuous values

If we modeled the labels we see directly, using a form similar to linear regression, the left side of the equation would be discrete while the right side is continuous; the two sides cannot be equal at all, so this approach is unworkable.

Next, consider modeling the continuous probability value p that we cannot see, that is, the classifier's underlying output:

p = w0 + w1*x1 + w2*x2 + ...

In this case, although both sides of the equation are continuous, their ranges differ: p varies within [0, 1], while the right-hand expression can vary from negative infinity to positive infinity. Moreover, we cannot observe p or write out its expression analytically, so this approach is not feasible either.

4. Mathematical derivation of logistic regression
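The derivation itself is missing from the source at this point. A standard route, consistent with the two failed attempts above, is: the linear combination z = w0 + w1*x1 + ... ranges over (-inf, inf), so instead of equating it to p directly, equate it to the log-odds log(p / (1 - p)), whose range is also (-inf, inf). Solving for p gives the sigmoid function p = 1 / (1 + e^(-z)), which always lies in (0, 1). A small sketch:

```python
import math

# Standard logistic-regression link: z = w0 + w1*x1 + ... is matched to the
# log-odds log(p / (1 - p)), not to p itself. Inverting gives the sigmoid.

def sigmoid(z):
    """p = 1 / (1 + e^(-z)); maps (-inf, inf) into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """log(p / (1 - p)); maps (0, 1) back onto (-inf, inf)."""
    return math.log(p / (1.0 - p))

p = sigmoid(2.0)
print(0.0 < p < 1.0)                   # True: a valid probability
print(abs(log_odds(p) - 2.0) < 1e-9)   # True: the two functions are inverses
```

This resolves the range mismatch of section 3: both sides of log(p/(1-p)) = w0 + w1*x1 + ... are continuous and span the whole real line.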








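Since the introduction promised to discuss sklearn's binary LR, here is a minimal usage sketch of that API. The data is a tiny made-up one-dimensional separable set, for illustration only.

```python
# Minimal usage of sklearn's binary logistic regression.
# The dataset is invented: small x values are class 0, large ones class 1.

from sklearn.linear_model import LogisticRegression

X = [[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]]
y = [0, 0, 0, 1, 1, 1]

clf = LogisticRegression()
clf.fit(X, y)

print(clf.predict([[1.0], [9.0]]))   # discrete labels, thresholded internally
probs = clf.predict_proba([[9.0]])   # the underlying continuous probabilities
print(probs.shape)                   # one row per sample, one column per class
```

Note that `predict` returns the thresholded 0/1 labels discussed in section 2, while `predict_proba` exposes the continuous probability p that the post argues is the classifier's real output.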
