Linear regression
Regression means estimating the unknown parameters of a formula whose form is known. For example, if the known formula is y = a*x + b, the unknown parameters are a and b; using many true (x, y) training samples, the values of a and b are estimated automatically. Conceptually, given the training samples and the known formula, the machine searches over the possible values of the unknown parameter (or combination of parameters) until it finds the values that best match the distribution of the sample points. That is, given training samples, it fits the parameters of y = a*x + b. This is a fitting problem with one feature x and two parameters a and b. With multiple features it becomes, say, y = a1*x1 + a2*x2 + ..., which in vector form is y = θᵀx: a fitting problem with n features and n parameters (assuming x0 = 1, so the bias term is folded into θ along with the rest).
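A minimal sketch of this fitting process, with hypothetical data and numpy's built-in least-squares solver standing in for the parameter search described above:

```python
import numpy as np

# Hypothetical training data generated from y = 2*x + 1 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2 * x + 1 + rng.normal(0, 0.5, size=50)

# Append x0 = 1 to every sample so the bias b is fitted together with
# the slope a, matching the vectorized form y = theta^T x.
X = np.column_stack([x, np.ones_like(x)])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated a, b:", theta)  # should be close to (2, 1)
```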
Logistic regression
The hypothesis function of logistic regression is h_θ(x) = g(θᵀx) = 1 / (1 + e^(−θᵀx)), whereas the linear regression hypothesis function is simply h_θ(x) = θᵀx.
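A small numpy sketch of the two hypothesis functions side by side (the parameter values and input are made up):

```python
import numpy as np

def h_linear(theta, x):
    # Linear regression hypothesis: theta^T x, any real number.
    return theta @ x

def h_logistic(theta, x):
    # Logistic regression hypothesis: sigmoid of theta^T x, always in (0, 1).
    return 1.0 / (1.0 + np.exp(-(theta @ x)))

theta = np.array([0.5, -1.0, 2.0])  # hypothetical parameters
x = np.array([1.0, 3.0, 0.8])       # x[0] = 1 carries the bias term
print(h_linear(theta, x))           # -0.9
print(h_logistic(theta, x))         # ~0.289
```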
Logistic regression is a machine learning method often used in industry for classification, estimating the likelihood of something. The model can be pictured simply as a probability distribution governed by a parameter θ: given an input vector x, the resulting y value is the value of the distribution function, and the classification category is then decided according to the size of that value.
For example: the likelihood that a user buys a product, the likelihood that a patient suffers from a disease, or the likelihood that an ad is clicked by a user. (Note the word "likelihood" rather than the mathematical "probability": the output of logistic regression is not a probability value in the mathematical sense and cannot be used directly as one. It is typically combined with other feature values by weighted summation rather than by direct multiplication.)
Why is logistic regression called "regression" but used for classification? In many cases we need to return a value between 0 and 1 that behaves like a probability: can this pair of shoes be sold today? Will this ad be clicked by a user? We want such a value to help decide whether to put the shoes on the shelf, or whether to show the ad. The value must lie between 0 and 1, but a raw sales figure obviously does not satisfy that range, so the logistic function is introduced to normalize it. Again, the result is not a probability value in the mathematical sense.

Since we are not getting a true probability anyway, why bother squeezing the value into 0~1? Normalization gives the values comparability and a bounded range, which matters when you keep working with them. Suppose, for example, you care not only about expected shoe sales but also about local public security, local transportation costs, and so on, and you combine these results to decide whether to open a shoe store. Normalization ensures that no feature drowns out, or is drowned out by, another feature simply because its range is too large or too small. As an extreme example: if shoe sales start at 100 and can in principle grow without bound, while the local security situation is expressed as a value between 0 and 1, then directly summing the two ignores the security situation completely. This is the main reason for using logistic regression rather than plain linear regression. At this point you may start to realize: yes, logistic regression is simply linear regression normalized by the logistic function, and that is all.
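A toy sketch of the extreme example above, with all numbers invented; the centering/scaling step before the sigmoid is an extra assumption, added only to keep the sigmoid from saturating:

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

sales = np.array([120.0, 800.0, 15000.0])  # raw sales: 100 and up, unbounded
security = np.array([0.9, 0.2, 0.05])      # security score: already in 0~1

print(sales + security)  # security barely moves the sum at all
# Center and scale sales before squashing so the sigmoid is not saturated.
sales_01 = logistic((sales - sales.mean()) / sales.std())
print(sales_01 + security)  # both terms now live on comparable scales
```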
As for why the logistic function is used rather than some other normalization: this particular normalization is often the more reasonable one (or, as people joke, that is why it is called "logistic"). It suppresses results that are too large or too small (which are often noise), ensuring that the mainstream results are not overlooked. The specific formula and graph are shown in the formal definition section of this article, where f(x) is the raw "sales" value from our example above and y is the 0~1 likelihood that the shoes sell. (Throughout this paragraph it is "likelihood", not "probability"; thanks to zjtchow for pointing this out in the comments.)
Applicability of Logistic regression
1) It can be used both for probability-style prediction and for classification.
Not every machine learning method can make probability-style predictions (SVM, for example, yields only 1 or -1). The benefit of such a score is comparability: when we have the click likelihood of different ads, we can show the N ads most likely to be clicked. Even if every score is high, or every score is very low, we can still take the best top-N. For classification problems, a single threshold suffices: scores above the threshold go to one class, scores below it to the other, as in the sketch below.
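A minimal sketch with scikit-learn on synthetic data (a stand-in for the ad-click example), showing both uses: taking the top-N by score and classifying with a single threshold:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for "ads" with click labels.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression().fit(X, y)

scores = model.predict_proba(X)[:, 1]  # likelihood-style score per ad

# Probability-style use: show the 3 ads most likely to be clicked.
top3 = np.argsort(scores)[::-1][:3]
print("top-3:", top3, scores[top3])

# Classification use: one threshold splits the two classes.
labels = (scores >= 0.5).astype(int)
print("positive count:", labels.sum())
```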
2) It can only be used for linear problems.
Logistic regression can only be used when the features and the target are linearly related (unlike SVM, which can handle nonlinear problems). This gives two pieces of guidance: on the one hand, when the model is known in advance to be nonlinear, decisively do not use logistic regression; on the other hand, when using logistic regression, take care to choose features that have a linear relationship with the target, as the sketch below illustrates.
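A toy sketch of that second guidance point (synthetic data; the quadratic label rule is invented for illustration): the raw feature x is nonlinearly related to the label, so plain logistic regression does poorly, while adding x² as an explicit feature restores a linear relationship:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(400, 1))
y = (x[:, 0] ** 2 > 2).astype(int)  # label depends on x^2, not on x itself

# Raw x is not linearly related to the label: accuracy suffers.
print(LogisticRegression().fit(x, y).score(x, y))
# The hand-crafted feature x^2 IS linearly related: near-perfect accuracy.
x2 = np.hstack([x, x ** 2])
print(LogisticRegression().fit(x2, y).score(x2, y))
```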
3) Features need not satisfy a conditional independence assumption, but each feature's contribution is computed independently.
Logistic regression does not require the conditional independence assumption that naive Bayes needs (because it does not derive its result through a posterior probability). But each feature's contribution is computed independently; that is, LR will not automatically combine different features to generate new ones for you (do not harbor that illusion: feature combination is the job of decision trees, LSA, pLSA, LDA, or of you yourself). For example, if you need a feature like TF*IDF, you must supply it explicitly; merely providing the two dimensions TF and IDF is not enough, as that yields only something like a*TF + b*IDF and never a c*TF*IDF effect, as in the sketch below.
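A toy sketch of that TF*IDF point (all numbers invented): given only the TF and IDF columns, the model can learn a*TF + b*IDF but never c*TF*IDF; supplying the product column explicitly fixes this:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
tf = rng.uniform(0, 1, size=500)
idf = rng.uniform(0, 5, size=500)
y = (tf * idf > 1.2).astype(int)  # label truly depends on the product TF*IDF

X_plain = np.column_stack([tf, idf])            # model can only form a*TF + b*IDF
X_cross = np.column_stack([tf, idf, tf * idf])  # TF*IDF supplied explicitly

print(LogisticRegression().fit(X_plain, y).score(X_plain, y))  # noticeably worse
print(LogisticRegression().fit(X_cross, y).score(X_cross, y))  # near 1.0
```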
Reference:
Preliminary Understanding of Logistic Regression, http://blog.sina.com.cn/s/blog_890c6aa301015mya.html