According to Andrew Ng's course, h(x, theta) = P(y = 1 | x; theta): the hypothesis outputs the estimated probability that y = 1 given x.
Logistic regression is a machine learning method widely used in industry to estimate the *possibility* of something: the possibility that a user buys a product, that a patient has a disease, or that an ad gets clicked by a user. (Note: "possibility", not "probability" in the mathematical sense. The output of logistic regression is not a probability value as defined in mathematics and cannot be used directly as one. In practice this output is often combined with other feature values in a weighted sum, rather than multiplied with them the way true probabilities would be.)
So what exactly does it do, and in which situations is it applicable or not?
I. Official definition
[Figure 1. The logistic function, with z on the horizontal axis and f(z) on the vertical axis.]
Logistic regression is a method for learning a function f: X -> Y, or P(Y | X). Here Y is discrete-valued, and X = <x1, x2, ..., xn> is a vector whose components may each be discrete or continuous.
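Concretely, the model in Figure 1 squashes a linear score z = theta . x through the logistic function. A minimal sketch (function names are mine, not from the course):

```python
import math

def sigmoid(z):
    """The logistic function f(z) = 1 / (1 + e^(-z)); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x):
    """h(x, theta) = sigmoid(theta . x): the model's estimate of P(y = 1 | x; theta)."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

# With all-zero weights the score z is 0, so the output sits at the midpoint 0.5.
p = hypothesis([0.0, 0.0], [1.0, 2.0])
```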
II. My explanation
Staring at the formula is too painful, so let's break it down. Logistic regression has three main components: regression, linear regression, and the logistic equation.
1) Regression
Logistic regression is a kind of linear regression, and linear regression is a kind of regression. So what on earth is regression?

Regression is the estimation of the unknown parameters of a known formula. For example, the formula is y = a*x + b, with unknown parameters a and b. We have lots of real (x, y) data points (training samples), and regression uses that data to automatically estimate the values of a and b. The estimation can be naively understood as: given the training samples and the known formula, the machine enumerates all possible parameter values (for multiple parameters, all combinations of them) until it finds the parameter (or combination) that best fits the distribution of the sample points. (Of course, real implementations use optimization algorithms rather than brute-force enumeration.)
Note: the premise of regression is that the formula is known; without a formula there is nothing to regress. But where do known formulas come from in real life? (What, they just pop out after a slap on the forehead? Haha.) In practice, the formulas used in regression are mostly a data analyst's guess after looking at a lot of data (and honestly, many of them really are slapped together, um...). Depending on the formula, regression splits into linear regression and nonlinear regression. In linear regression the formulas are all first-degree (a linear equation in one variable, in two variables, ...), while nonlinear formulas can take all kinds of forms (nth-degree polynomials in n variables, log equations, and so on). Concrete examples follow in the linear regression section.
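The enumeration intuition above can be sketched as a toy brute-force fit (the candidate grids are made up for illustration; real regression solves this analytically or by gradient methods):

```python
# Naive "enumeration" regression, mirroring the intuition in the text:
# try candidate (a, b) pairs for y = a*x + b and keep the pair with the
# smallest squared error on the training samples.

def enumerate_fit(samples, a_grid, b_grid):
    best = None
    for a in a_grid:
        for b in b_grid:
            err = sum((y - (a * x + b)) ** 2 for x, y in samples)
            if best is None or err < best[0]:
                best = (err, a, b)
    return best[1], best[2]

# Samples generated exactly from y = 2*x + 1.
samples = [(0, 1), (1, 3), (2, 5), (3, 7)]
a, b = enumerate_fit(samples,
                     a_grid=[i * 0.5 for i in range(9)],   # 0.0 .. 4.0
                     b_grid=[i * 0.5 for i in range(9)])   # 0.0 .. 4.0
```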
2) Linear Regression
Start with the simplest case, a single variable. Example: suppose we want to find the relationship between x and y, where x is the price of a shoe and y is its sales volume. (Why look for this relationship? So we can help with pricing and make more money.) We have some sales data from previous years, (x0, y0), (x1, y1), ..., (xn, yn), as the sample set, and we assume they satisfy a linear relationship: y = a*x + b (with the exact values of a and b unknown). Linear regression finds, from the historical data, the optimal a and b, so that y = a*x + b has the smallest error on the sample set.
Maybe you are thinking: ugh, that's easy! Who needs regression! I'll just sketch an xy coordinate system on scratch paper, plot a few points, and draw the line myself! (Well, I admit we were all tortured by such plotting exercises in junior high.) Indeed, the one-variable case really is intuitive, but with multiple variables it is hard to eyeball. For example, besides price, a shoe's quality, advertising spend, and the foot traffic of the block the store sits on all affect sales, and we want a formula like: sales = a*x + b*y + c*z + d*zz + e. In that case it is hard to find the pattern by drawing, so we hand it over to linear regression. (For how linear regression works internally, see the literature; those are mathematical formulas we can treat as a library call.) This is the value of linear regression algorithms.
Note that the premise for linear regression to work well here is that y = a*x + b at least roughly makes sense (we believe the more expensive the shoe, the fewer pairs we sell, and the cheaper, the more; similar monotonic patterns hold for quality, ad spend, and foot traffic). Not every kind of variable suits linear regression: if x were not the shoe's price but its size, the error of y = a*x + b would be enormous (sales drop for both very large and very small sizes). In short: if our formula is wrong, no amount of regression will produce good results.
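A minimal multivariate sketch of the above (the numbers and the two factors are hypothetical, constructed so the true relationship is exactly sales = 2*ads - 1*price + 3; least squares then recovers the coefficients where eyeballing a plot no longer can):

```python
import numpy as np

# Hypothetical training data generated from a known linear rule.
ads   = np.array([1.0, 2.0, 0.0, 3.0, 1.0])
price = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
sales = 2.0 * ads - 1.0 * price + 3.0

# Design matrix with an intercept column; solve min ||X @ coef - sales||^2.
X = np.column_stack([ads, price, np.ones_like(ads)])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
```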
3) Logistic Equation
So far, the value we regress is an arbitrary real number. But in many cases we need a value between 0 and 1 that behaves like a probability (for example: will this pair of shoes sell today, or will this ad be clicked by a user? We want such a value to help us decide whether to shelve the shoes or to show the ad). The value must lie in 0~1, but a line obviously does not respect that range, so the logistic equation is introduced to normalize it. To repeat: this value is not a probability in the mathematical sense. Then, since what we get is not a probability, why bother normalizing it into 0~1 at all? Normalization makes values comparable and bounded, which matters when you keep computing on them (for example, you care not only about the shoe's sales, but want a weighted sum over several factors, such as sales potential, local public safety, and local transport costs, and then use the combined result to decide whether to open a shoe store in that area): normalization guarantees this result will not, for lack of a bound, be so large that it drowns out other features or so small that it is drowned out by them. (For instance, if the shoe's worst-case sales are 100 pairs but the best case is tens of thousands, while the local safety score lies between 0 and 1, then directly summing the two means the safety score is completely ignored.) This is the main reason for using logistic regression rather than plain linear regression. By now you may have started to realize: yes, logistic regression is simply linear regression normalized through the logistic equation.
As for why the logistic equation in particular: among the many possible normalization methods, this one is often more reasonable (as people put it, it is "logistic") and suppresses values that are too large or too small, ensuring the mainstream results are not drowned out. For the concrete formula and figure, see section I, Official definition: z is the real-valued regression output from the example above, and f(z) is the resulting 0~1 value expressing the possibility of a sale. (Again "possibility", not "probability"; thanks to zjtchow for pointing this out in a reply.)
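The shoe-store example above can be sketched as follows (all numbers, weights, and the centering/scale constants are hypothetical, just to show why unbounded scores cannot be summed directly):

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Raw scores on wildly different scales...
raw_sales_score = 350.0   # a linear sales estimate, in pairs of shoes
safety_score = 0.7        # local safety, already in (0, 1)

# Direct summation lets the big value drown out the small one:
naive_total = raw_sales_score + safety_score   # ~350.7, safety is invisible

# Squashing through the logistic first puts both on a comparable (0, 1) scale.
# (A real model folds the centering/scale into its weights; logistic(350)
# would simply saturate at ~1, hence the hypothetical rescaling here.)
normalized_sales = logistic((raw_sales_score - 300.0) / 50.0)
weighted_total = 0.6 * normalized_sales + 0.4 * safety_score
```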
III. Applicability of logistic regression
1) It can be used for possibility prediction or for classification.
Not every machine learning method can predict a possibility score (for example, SVM cannot; it only outputs 1 or -1). The advantage of predicting a possibility is that the results are comparable: after scoring how likely different ads are to be clicked, we can show the N most likely ones. So whether the scores are all high or all low, we still get the optimal top N. For classification, you only need to set a threshold: above it, positive; below it, negative.
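Both uses in one small sketch (the ad names and scores are made up; assume they came out of an LR model):

```python
# Hypothetical predicted click "possibilities" for five ads.
scores = {"ad_a": 0.91, "ad_b": 0.35, "ad_c": 0.77, "ad_d": 0.12, "ad_e": 0.58}

# Top-N: because the scores are comparable, we can rank and take the best N,
# regardless of whether the absolute values are high or low.
top2 = sorted(scores, key=scores.get, reverse=True)[:2]

# Classification: a single threshold turns possibilities into 0/1 decisions.
threshold = 0.5
clicked = {ad: int(p >= threshold) for ad, p in scores.items()}
```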
2) It only works for linear problems (I am not sure about this).
Logistic regression should be used only when the features and the target are linearly related (unlike SVM). This has two practical implications: on one hand, if you know in advance that the model is nonlinear, do not use logistic regression; on the other hand, when you do use it, take care to select features that are linearly related to the target.
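A tiny illustration of that caveat, reusing the shoe-size example from section II with hypothetical numbers: when sales peak at a middle size, the best line is flat and useless.

```python
# Closed-form least squares for y = a*x + b (one feature plus intercept).
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

sizes = [36, 38, 40, 42, 44]
sales = [10, 60, 100, 60, 10]   # hypothetical: symmetric peak at size 40

a, b = fit_line(sizes, sales)
# The symmetric data cancels out: slope 0, so the "model" predicts the mean
# everywhere and misses the peak by a wide margin.
worst_error = max(abs(y - (a * x + b)) for x, y in zip(sizes, sales))
```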
3) Features need not satisfy the conditional independence assumption, but each feature's contribution is computed independently.

Unlike Naive Bayes, logistic regression does not require the conditional independence assumption (because it does not compute the posterior probability the way Naive Bayes does). However, each feature's contribution is computed independently; that is, LR will not automatically combine different features for you to generate new ones (don't hold on to that fantasy: that is a job for decision trees, LSA, pLSA, LDA, or for you to do by hand, depending on the situation). So if you need a feature like TF*IDF, you must supply it explicitly; if you only provide the two dimensions TF and IDF separately, you will only get something like a*TF + b*IDF, never the effect of c*TF*IDF.
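The TF*IDF point, made concrete (all weights and values are hypothetical; only the linear part of the model is shown):

```python
# Logistic regression combines features linearly, so with inputs (tf, idf)
# its score can only ever be a*tf + b*idf + bias. To let the model weight the
# interaction tf*idf, you must hand-craft it as an explicit extra feature.

def score(theta, features):
    """Linear part of an LR model: theta . features (bias folded in as feature 1.0)."""
    return sum(t * f for t, f in zip(theta, features))

tf, idf = 3.0, 2.0

# Two-feature model: expresses only a*tf + b*idf + bias.
linear_only = score([0.5, 0.2, 0.1], [tf, idf, 1.0])

# Hand-crafted cross feature: now a third weight applies to tf*idf directly.
with_cross = score([0.5, 0.2, 0.3, 0.1], [tf, idf, tf * idf, 1.0])
```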
From http://hi.baidu.com/hehehehello/item/40025c33d7d9b7b9633aff87