Logistic Regression Introduction: Prediction

Source: Internet
Author: User

1. The main idea of linear regression is to fit a straight line to historical data and use that line to predict new data. (For example, classes A and B lie on opposite sides of a linear function.)

2. The real world involves many factors, so we need a multivariate linear function to describe an event (its outcome).

3. Multivariate linear function: a multivariable method for analyzing the relationship between a binary (two-class) observed outcome and a set of influencing factors (x1, x2, x3, ..., xn). For example, in medicine, it can be used to decide whether a patient has a disease based on the patient's symptoms.

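4. Multivariate linear regression formula:

$$z = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n = \theta^T x, \quad \text{where } x_0 = 1$$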
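
A minimal plain-Python sketch of this linear part, assuming the coefficients are stored as a list `theta = [θ0, θ1, ..., θn]` with the intercept first and each sample as `x = [x1, ..., xn]` without the constant term. The function name `linear_score` is hypothetical, chosen only for illustration:

```python
def linear_score(theta, x):
    """Return z = theta0 + theta1*x1 + ... + thetan*xn."""
    if len(theta) != len(x) + 1:
        raise ValueError("theta must have one more entry than x (the intercept)")
    # theta[0] is the intercept (it multiplies x0 = 1); the rest pair up with x
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))


# Example: theta0 = 1, theta1 = 2, theta2 = -0.5 and x = (3, 4)
print(linear_score([1.0, 2.0, -0.5], [3.0, 4.0]))  # 1 + 6 - 2 = 5.0
```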

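5. Sigmoid function:

$$g(z) = \frac{1}{1 + e^{-z}}$$

Substituting the multivariate linear function z into the sigmoid function gives the generalized linear regression model (logistic regression):

$$h_\theta(x) = g(z) = \frac{1}{1 + e^{-z}}$$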
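
A sketch of the sigmoid and of the resulting model hθ(x) in plain Python, using a hypothetical list layout (`theta[0]` is the intercept, `x` holds x1..xn). Branching on the sign of z is a design choice, not part of the formula: it keeps `math.exp` from overflowing (Python raises `OverflowError`) when z is a large negative number.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)), computed without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z) so exp() cannot overflow
    ez = math.exp(z)
    return ez / (1.0 + ez)


def h_theta(theta, x):
    """Model output: sigmoid of z = theta0 + theta1*x1 + ... + thetan*xn."""
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return sigmoid(z)


print(sigmoid(0.0))                           # 0.5
print(h_theta([1.0, 2.0, -0.5], [3.0, 4.0]))  # sigmoid(5.0), about 0.9933
```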

6. The sigmoid output lies in (0, 1), and g(0) = 0.5 (the midpoint of the curve), so we can interpret the output as a probability. (Strictly speaking, the sigmoid is the cumulative distribution function of the standard logistic distribution, not a probability density function.)
Because hθ(x) lies in (0, 1), it can be read as the probability that a sample belongs to one of the classes, for example:
hθ(x) < 0.5 indicates that the current sample belongs to Class A
hθ(x) > 0.5 indicates that the current sample belongs to Class B
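
The decision rule can be sketched as follows. The coefficients are made-up values, the class names `"A"`/`"B"` follow the text, and sending a sample with hθ(x) exactly equal to 0.5 to Class A is an arbitrary assumption, since the text leaves that case open.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def classify(theta, x, threshold=0.5):
    """Return "B" if h_theta(x) > threshold, otherwise "A" (ties go to "A")."""
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return "B" if sigmoid(z) > threshold else "A"


theta = [-4.0, 1.0]            # hypothetical coefficients: boundary at x1 = 4
print(classify(theta, [6.0]))  # h = sigmoid(2), about 0.88 -> B
print(classify(theta, [1.0]))  # h = sigmoid(-3), about 0.05 -> A
```

Because the sigmoid is increasing and equals 0.5 at z = 0, thresholding hθ(x) at 0.5 amounts to checking the sign of the linear score z.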

7. How to use the generalized linear regression model

Consider a vector x = (x1, x2, x3, ..., xn) of n independent variables, and let the conditional probability P(y = 1 | x) = p be the probability that the event occurs given the observed values. The logistic regression model can then be expressed as

$$p = P(y = 1 \mid x) = \frac{1}{1 + e^{-(w_0 + w_1 x_1 + \cdots + w_n x_n)}}$$

So the ratio of the probability that the event occurs to the probability that it does not occur is

$$\frac{p}{1 - p} = e^{w_0 + w_1 x_1 + \cdots + w_n x_n}$$

This ratio is called the odds of the event. Taking its logarithm gives

$$\ln\frac{p}{1 - p} = w_0 + w_1 x_1 + \cdots + w_n x_n$$
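
A quick numerical check of the odds and log-odds described above: for the logistic model, the odds p/(1 - p) equal e^z and the log-odds recover the linear score z = w0 + w1x1 + ... + wnxn. The helper names and the test values of z are arbitrary choices for illustration:

```python
import math


def probability(z):
    """p = P(y = 1 | x) = 1 / (1 + e^(-z)) for a linear score z."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Odds of the event: P(occurs) / P(does not occur) = p / (1 - p)."""
    return p / (1.0 - p)


def log_odds(p):
    """Logit ln(p / (1 - p)); for the logistic model this equals z."""
    return math.log(odds(p))


for z in (-2.0, 0.0, 1.5):
    p = probability(z)
    print(f"z={z:+.1f}  p={p:.4f}  odds={odds(p):.4f}  e^z={math.exp(z):.4f}  logit={log_odds(p):+.4f}")
```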

If there are m observed samples with observed values y1, y2, y3, ..., ym, and p_i = P(y_i = 1 | x_i) is the probability that y_i = 1 given x_i, then the probability that y_i = 0 is P(y_i = 0 | x_i) = 1 - p_i. The probability of obtaining the observed value y_i is therefore

$$P(y_i \mid x_i) = p_i^{y_i} (1 - p_i)^{1 - y_i}$$

Because the observations are independent of one another, their joint distribution is the product of the individual (marginal) distributions. This gives the likelihood function

$$L(w) = \prod_{i=1}^{m} p_i^{y_i} (1 - p_i)^{1 - y_i}$$
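
The likelihood can be sketched directly from its definition, the product of p_i^{y_i}(1 - p_i)^{1 - y_i} over independent samples. The probabilities and labels below are made-up values, and the function names are illustrative. Multiplying many probabilities quickly underflows to 0.0 in floating point, which is one practical reason to take the logarithm in the next step.

```python
def observation_probability(p, y):
    """P(y_i | x_i) = p^y * (1 - p)^(1 - y) for a label y in {0, 1}."""
    return p ** y * (1.0 - p) ** (1 - y)


def likelihood(ps, ys):
    """Joint probability of independent observations: product of per-sample terms."""
    result = 1.0
    for p, y in zip(ps, ys):
        result *= observation_probability(p, y)
    return result


# Hypothetical predicted probabilities p_i and observed labels y_i
ps = [0.9, 0.2, 0.7]
ys = [1, 0, 1]
print(likelihood(ps, ys))  # 0.9 * 0.8 * 0.7, about 0.504
```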

Our goal is then to find the maximum likelihood estimate: the parameters w0, w1, w2, w3, ..., wn that maximize L(w). Taking the logarithm of L(w) gives

$$\ln L(w) = \sum_{i=1}^{m} \ln\left[p_i^{y_i} (1 - p_i)^{1 - y_i}\right]$$

Expanding the logarithm, the final form is

$$\ln L(w) = \sum_{i=1}^{m} \left[y_i \ln p_i + (1 - y_i) \ln(1 - p_i)\right]$$

where y_i is the true value and p_i is the predicted value, i.e. the model's estimate of P(y_i = 1 | x_i).
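
A sketch of the log-likelihood in its summed form, with `ys` as the true labels y_i and `ps` as the predicted probabilities p_i (made-up values). The second print computes the logarithm of the product directly; both agree, but the summed form stays in a safe numerical range when m is large.

```python
import math


def log_likelihood(ps, ys):
    """ln L = sum_i [ y_i * ln(p_i) + (1 - y_i) * ln(1 - p_i) ].

    ys are the true labels (0 or 1); ps are the predicted probabilities,
    which must lie strictly between 0 and 1.
    """
    return sum(y * math.log(p) + (1 - y) * math.log(1.0 - p) for p, y in zip(ps, ys))


ps = [0.9, 0.2, 0.7]
ys = [1, 0, 1]
print(log_likelihood(ps, ys))     # ln(0.504), about -0.685
print(math.log(0.9 * 0.8 * 0.7))  # same value, computed from the product
```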

8. Determining the optimal regression coefficients, that is, training on the data set.
The steps to find the best regression coefficients are as follows:
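1. Write down the classification function H(x) = θ0 + θ1x1 + ... + θnxn. A sample is Class A when H(x) < 0 and Class B when H(x) > 0, which matches hθ(x) < 0.5 and hθ(x) > 0.5 in section 6.

(θ denotes the regression coefficients; in practice, H(x) is usually passed through the sigmoid transformation.)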
2. Write down the error (cost) function corresponding to the classification function:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[y_i \ln h_\theta(x_i) + (1 - y_i) \ln\left(1 - h_\theta(x_i)\right)\right]$$

(m is the number of samples.)
A θ vector is the best regression coefficient vector only if it makes the error function J(θ) above reach its minimum.
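
A minimal sketch of J(θ), assuming it is the negative average log-likelihood (the cross-entropy form commonly used for logistic regression, consistent with the log-likelihood in section 7). The clipping constant `eps` is an added assumption that only keeps `math.log` away from 0. With θ = 0 every prediction is 0.5, so J(θ) equals ln 2 (about 0.693) for any data set, which gives a convenient sanity check.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def cost(theta, X, y, eps=1e-12):
    """J(theta) = -(1/m) * sum_i [y_i ln h_i + (1 - y_i) ln(1 - h_i)].

    theta[0] is the intercept; each row of X holds x1..xn; y holds 0/1 labels.
    """
    m = len(X)
    total = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(theta[0] + sum(t * v for t, v in zip(theta[1:], xi)))
        h = min(max(h, eps), 1.0 - eps)  # keep log() arguments strictly positive
        total += yi * math.log(h) + (1 - yi) * math.log(1.0 - h)
    return -total / m


X = [[1.0], [2.0]]
y = [0, 1]
print(cost([0.0, 0.0], X, y))   # ln 2, about 0.693
print(cost([-1.5, 1.0], X, y))  # about 0.474: this theta fits both samples better
```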
3. Use gradient descent (or an iterative least-squares method such as iteratively reweighted least squares) to find the value of θ at which the error function reaches its minimum:

$$\theta_j := \theta_j - \alpha \left(h_\theta(x) - y\right) x_j$$

Here the θ_j on the left is the new (updated) state, the θ_j on the right is the previous state, and α is the learning rate (step size).
For ease of presentation, the formula above covers only a single sample; in practice the update sums over all samples (unless you use stochastic gradient ascent, which updates after each sample). If the error function from step 2 is negated, the problem becomes a maximization, and gradient descent becomes gradient ascent:

$$\theta_j := \theta_j + \alpha \left(y - h_\theta(x)\right) x_j$$
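
A minimal sketch of the training loop in plain Python, assuming the cross-entropy form of the error function, so that negating it gives the log-likelihood and the summed update is θ_j := θ_j + α Σ_i (y_i - hθ(x_i)) x_ij. The learning rates, iteration counts, and toy data are arbitrary illustrative choices, and `batch_gradient_ascent` and `stochastic_gradient_ascent` are hypothetical names. The batch version sums over all samples before each update; the stochastic version updates after every sample, which is the exception mentioned in step 3.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(theta, x):
    """h_theta(x), with theta[0] as the intercept (x0 = 1)."""
    return sigmoid(theta[0] + sum(t * v for t, v in zip(theta[1:], x)))


def batch_gradient_ascent(X, y, alpha=0.05, iterations=5000):
    """Each step: theta_j += alpha * sum_i (y_i - h_theta(x_i)) * x_ij."""
    theta = [0.0] * (len(X[0]) + 1)
    for _ in range(iterations):
        grad = [0.0] * len(theta)
        for xi, yi in zip(X, y):
            error = yi - predict(theta, xi)  # every sample uses the previous theta
            grad[0] += error                 # x0 = 1 for the intercept
            for j, v in enumerate(xi, start=1):
                grad[j] += error * v
        theta = [t + alpha * g for t, g in zip(theta, grad)]
    return theta


def stochastic_gradient_ascent(X, y, alpha=0.01, epochs=100, seed=0):
    """Same update, applied after each sample, visiting samples in random order."""
    rng = random.Random(seed)
    theta = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            error = y[i] - predict(theta, X[i])
            theta[0] += alpha * error
            for j, v in enumerate(X[i], start=1):
                theta[j] += alpha * error * v
    return theta


# Hypothetical 1-D data: larger x1 tends to mean label 1 (not perfectly separable)
X = [[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

theta = batch_gradient_ascent(X, y)
print("batch theta:", theta)
print("decision boundary x1 =", -theta[0] / theta[1])  # close to 2.5 for this symmetric data
print("stochastic theta:", stochastic_gradient_ascent(X, y))
```

Because the toy data are not perfectly separable, the maximum likelihood estimate is finite and the batch loop settles instead of letting the coefficients grow without bound.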
