Machine Learning Algorithm Interview Questions

Source: Internet
Author: User
Tags: svm

What are the similarities and differences between linear regression and logistic regression? What is the difference between SVM and LR (logistic regression)?

In linear regression, both the input and output variables are continuous; in logistic regression, the input variables are continuous but the output variable is categorical (i.e., discrete, enumerated).

SVM and LR are both generally used for classification problems; the difference lies in their underlying principles. SVM optimizes for the maximum distance (margin) from the support vectors to the classification plane, thereby obtaining the optimal separating plane. LR outputs the class in the form of a probability, commonly via the logistic sigmoid function; the optimization objective is then constructed by maximum likelihood (or another method), and the optimal parameters are solved for.

[Addendum: an interviewer at a large company pointed out to me that "sigmoid" is a family of functions, not a single function; the one used in LR is the logistic sigmoid. I am still not sure exactly what the "sigmoid function family" refers to.]
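The logistic sigmoid mentioned above can be sketched in a few lines of pure Python. This is only an illustration of the function itself, not of any particular LR implementation:

```python
import math

def logistic_sigmoid(x):
    """The logistic sigmoid: maps any real score to the interval (0, 1).
    LR interprets this output as P(y = 1 | x)."""
    return 1.0 / (1.0 + math.exp(-x))

print(logistic_sigmoid(0.0))                 # 0.5: a score of 0 means "no preference"
print(round(logistic_sigmoid(2.0), 4))       # closer to 1: positive score favors the positive class
print(round(logistic_sigmoid(-2.0), 4))      # closer to 0: negative score favors the negative class
```

Other S-shaped functions (e.g. tanh) are also sigmoid-shaped, which may be what the interviewer meant by a "family".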

What is the difference between the L1 (first-order) penalty and the L2 (second-order) penalty? When is each appropriate?

The biggest difference between the two is whether feature coefficients can become exactly 0. The L1 penalty not only reduces model complexity but also performs feature selection: it drives the coefficients of some features all the way to 0. The L2 penalty may shrink some coefficients to very small values, but generally never reduces a coefficient exactly to 0.

As for when each is appropriate, I don't have an answer for now.
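Why L1 zeroes coefficients while L2 only shrinks them can be seen from the closed-form single-coefficient updates the two penalties induce (soft-thresholding vs. uniform scaling). A minimal sketch in pure Python, not tied to any library:

```python
def l1_prox(w, lam):
    """Soft-thresholding, the update the L1 penalty induces.
    Any coefficient with |w| <= lam is set exactly to zero,
    which is why L1 performs feature selection."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

def l2_prox(w, lam):
    """Ridge shrinkage, the update the L2 penalty induces.
    Coefficients are scaled toward zero but never reach it
    (unless w was already zero)."""
    return w / (1.0 + lam)

weights = [3.0, 0.4, -0.2]
print([l1_prox(w, 0.5) for w in weights])                 # [2.5, 0.0, 0.0]  small coefficients zeroed
print([round(l2_prox(w, 0.5), 3) for w in weights])       # all shrunk, none exactly zero
```

The geometric intuition is the same: the L1 ball has corners on the axes, so the optimum often lands where some coordinates are exactly zero.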

What are the indicators for evaluating machine learning classification algorithms?

Common metrics include: precision, recall, F-value (F-score), and AUC (area under the ROC curve).

Classification outcomes can be divided into the following four categories based on the predicted and true labels:

                     Predicted positive    Predicted negative
  Actually positive         TP                    FN
  Actually negative         FP                    TN

Explanation:

TP (true positive): the model predicts positive, and the prediction is correct (the sample is actually positive);

FP (false positive): the model predicts positive, but the prediction is wrong (the sample is actually negative);

TN (true negative): the model predicts negative, and the prediction is correct (the sample is actually negative);

FN (false negative): the model predicts negative, but the prediction is wrong (the sample is actually positive).

TP, FP, TN, and FN denote the number of test samples in each of the four categories. From these four values we can compute:

Precision (P) = TP / (TP + FP). Explanation: of all samples predicted positive, the fraction predicted correctly. Summed up in one word: "exact", hence the name precision.

Recall (R) = TP / (TP + FN). Explanation: of all samples that are truly positive, the fraction the model finds. Summed up in one word: "complete", hence the name recall.

F = 2 * P * R / (P + R). Explanation: the harmonic mean of P and R.

In general, P and R cannot both be maximized: the higher P is, the lower R tends to be, and vice versa. If we want both to be as high as possible, we can use the F-value as the evaluation metric. The F-value is essentially the harmonic mean of P and R; if we favor P or R over the other, weights can be added to the formula.
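The formulas above are easy to check on concrete counts. A small sketch with made-up numbers (tp=8, fp=2, fn=4 are just illustrative), including the weighted F-beta variant mentioned above:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and the F-value from the confusion counts."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f = 2 * p * r / (p + r)          # harmonic mean of P and R
    return p, r, f

def f_beta(p, r, beta):
    """Weighted F-score: beta > 1 leans toward recall, beta < 1 toward precision;
    beta = 1 recovers the plain F-value."""
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

p, r, f = precision_recall_f1(tp=8, fp=2, fn=4)
print(round(p, 3), round(r, 3), round(f, 3))   # 0.8 0.667 0.727
```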

In addition, the P-R curve can be used to evaluate model performance. Score all test samples with the model (a higher score means the sample is more likely to be positive), sort the samples by score in descending order, then walk down the list predicting each sample in turn as positive; computing P and R at each step yields the P-R curve.
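The sweep described above can be sketched directly. The scored samples below are invented for illustration; each pair is (score, is_positive):

```python
def pr_curve(scored):
    """scored: list of (score, is_positive). Sort by score descending,
    sweep the threshold down the list, and record (R, P) after each
    sample is declared positive."""
    scored = sorted(scored, key=lambda t: -t[0])
    total_pos = sum(1 for _, y in scored if y)
    tp = fp = 0
    points = []
    for _, y in scored:
        if y:
            tp += 1
        else:
            fp += 1
        points.append((tp / total_pos, tp / (tp + fp)))   # (recall, precision)
    return points

samples = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.3, False)]
for recall, precision in pr_curve(samples):
    print(round(recall, 3), round(precision, 3))
```

Note that the first point has precision 1.0 (only the top-scoring sample is predicted positive) and the last point has recall 1.0 (everything is predicted positive), matching the description of the curve's two ends.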

Quoting a figure from Zhou Zhihua's textbook:

The top-left point corresponds to the highest-scoring test samples: because the scores are high, most of those predictions are correct, so P is very high; and because few samples have been predicted positive so far, R is very low. As more samples are included, R gradually rises and P gradually falls, until the curve reaches the lower-right corner, where all samples are predicted positive, R must be 1, and P is very low. [Supplement: the lower-right end of the curve should therefore not be at (0, 1) but at (p1, 1), where p1 is very small; this is my personal understanding.]

What is the ROC curve? What do the horizontal and vertical axes mean? What does a point on the curve represent?

This is a very important question (it came up at least three times in my interviews), because this is the most commonly used evaluation metric for classification problems.

The construction of the ROC curve is similar to that of the P-R curve; only the axes differ. Again sort the test samples by score (real values or probabilities both work) in descending order, then walk down the list predicting each sample in turn as positive, compute TPR and FPR at each step, and connect the (FPR, TPR) points to obtain the ROC curve.

TPR (true positive rate) = TP / (TP + FN). Explanation: this is exactly recall.

FPR (false positive rate) = FP / (FP + TN). Explanation: of all truly negative samples, the fraction wrongly predicted positive; it plays a role for the negative class analogous to the one recall plays for the positive class.
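The same threshold sweep used for the P-R curve yields the ROC curve when we record (FPR, TPR) instead. A self-contained sketch with invented (score, is_positive) pairs:

```python
def roc_curve(scored):
    """scored: list of (score, is_positive). Sweep the threshold from high
    to low, recording (FPR, TPR) after each sample is declared positive."""
    scored = sorted(scored, key=lambda t: -t[0])
    pos = sum(1 for _, y in scored if y)
    neg = len(scored) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]            # threshold above every score: nothing predicted positive
    for _, y in scored:
        if y:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))   # (FPR, TPR)
    return points

samples = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.3, False)]
print(roc_curve(samples))
```

The curve starts at (0, 0) and ends at (1, 1), as described below: once every sample is predicted positive, both TPR and FPR are 1.0.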

Referring again to a figure from Zhou Zhihua's textbook:

The image on the left is the ideal ROC curve; the one on the right is an ROC curve obtained in practice. The curve starts at the lower left: when few samples have been predicted positive, both TPR and FPR are small. As more samples are predicted positive, both increase. The ideal situation is that TPR rises quickly while FPR rises slowly, i.e., the model finds many positives while admitting few negatives; this corresponds to the upper-left region of the plot, and understanding this helps in understanding AUC. When all samples are predicted positive, the curve reaches the upper-right corner: every positive and every negative has been "found", so both TPR and FPR are 1.0.

As with the P-R curve, if the ROC curve of model A completely encloses that of model B, model A is the better model. If the two ROC curves cross, the models can still be compared using the AUC, the area under the ROC curve: the larger the AUC, the better the model.

[Personal insight] If the ROC curves of two models cross, their AUC values may not differ by much, e.g. 0.900 versus 0.901; in that case, saying the model with the slightly larger AUC is better is not very meaningful.

The ROC curve has fixed start and end points, (0, 0) and (1, 1), so the maximum possible AUC is 1 and the minimum is 0. The straight line connecting the start and end points corresponds to an AUC of 0.5; it arises when positive and negative test samples appear in pairs down the ranking, which is equivalent to random guessing. As long as a model is better than random guessing, its AUC should be greater than 0.5.
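One way to see why AUC = 0.5 means random guessing: the AUC equals the probability that a randomly chosen positive sample outranks a randomly chosen negative one (ties counting half). A minimal sketch of that rank-based computation, with invented (score, is_positive) pairs:

```python
import itertools

def auc(scored):
    """AUC as the probability that a random positive scores higher than a
    random negative (ties count half); equivalent to the area under the ROC."""
    pos = [s for s, y in scored if y]
    neg = [s for s, y in scored if not y]
    wins = sum((p > n) + 0.5 * (p == n)
               for p, n in itertools.product(pos, neg))
    return wins / (len(pos) * len(neg))

samples = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.3, False)]
print(round(auc(samples), 4))    # 5 of the 6 positive/negative pairs are ranked correctly
```

A model that scores by a fair coin flip wins about half the pairs, giving an AUC near 0.5, while a perfect ranking wins every pair and gives 1.0.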

What is the relationship between the ROC curve and the P-R curve? According to an article on Douban, the two are essentially in one-to-one correspondence, but the P-R curve reflects model performance better than the ROC curve when the classes are imbalanced. Imagine that positive samples are very few and negative samples are very many. The main effect on the ROC curve is that TPR is likely to reach 1 early, so the right half of the curve hugs 1.0 (like the darker curves in figures (a) and (c)); the overall impact on its shape is small. The effect on the P-R curve, however, is large: P may stay low or fall quickly (because positives are hard to predict correctly) while R quickly approaches 1, similar to the two curves in figure (d). The P-R curve therefore distinguishes the performance of different models more clearly. There is a reference worth reading: "The Relationship Between Precision-Recall and ROC Curves"; I haven't read it yet, so I'll just note it here.

[My understanding of the relationship between the two curves is still shallow and lacks deeper study and practice; I'm writing it down here first, and welcome corrections later.]

======= To be continued =======

What are the similarities and differences between k-means and DBSCAN?

How to determine the K value in the K-means algorithm?

How do you select features?

If a machine learning model's classification or prediction performance is poor, how can it be improved?

What is the relationship between model complexity and generalization capability? What is the relationship between the number of samples and the ability to generalize?

