Machine learning interview--Algorithm evaluation index

Source: Internet
Author: User

Machine learning consists of three stages:

    • Stage one: learn the model. A learning algorithm inductively learns a classification model from the training set.

    • Stage two: test the model. The learned classification model is applied to the test set to classify instances whose categories are unknown.

    • Stage three: evaluate performance. The model learned from the training set is not necessarily optimal, so it may misclassify instances in the test set. Selecting the best classification model therefore requires evaluating classifier performance; only with sound evaluation criteria can a better-performing classifier be chosen.

Evaluation metrics for different machine learning tasks:

Regression evaluation

Regression predicts a continuous real value: the output is a continuous real number, whereas classification outputs discrete values.

    • (1) Mean absolute error (MAE), also known as the L1-norm loss

    • (2) Mean squared error (MSE), also known as the L2-norm loss
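The two regression losses above can be sketched in a few lines of plain Python; the prediction values below are invented for illustration.

```python
# Minimal sketch: mean absolute error (L1 loss) and mean squared error (L2 loss).
def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, -0.5, 2.0, 7.0]   # hypothetical ground truth
y_pred = [2.5,  0.0, 2.0, 8.0]   # hypothetical predictions
print(mae(y_true, y_pred))  # 0.5
print(mse(y_true, y_pred))  # 0.375
```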

Classification evaluation

Accuracy

Formula: Accuracy = (TP+TN)/(TP+TN+FP+FN)

When positive and negative samples are unbalanced, accuracy is badly flawed as an evaluation metric. In online advertising, for example, clicks are very rare, generally only a handful; if accuracy is used, then even predicting every instance as the negative class (no click) yields an accuracy above 99%, which is meaningless.
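The imbalance defect is easy to demonstrate numerically; the counts below (2 clicks in 1000 impressions) are invented for illustration.

```python
# Sketch: a classifier that always predicts "no click" on imbalanced data
# still scores >99% accuracy, which tells us nothing useful.
labels = [1] * 2 + [0] * 998   # 2 clicks, 998 non-clicks (hypothetical)
preds  = [0] * 1000            # predict the negative class for everything

correct = sum(1 for y, p in zip(labels, preds) if y == p)
accuracy = correct / len(labels)
print(accuracy)  # 0.998
```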

Precision
    • Definition: the proportion of instances classified as positive that are actually positive, also called the precision ratio.

    • Formula: Precision = TP/(TP+FP)

Recall
    • Definition: the proportion of actual positive instances that are correctly classified as positive, also called the recall ratio.

    • Formula: Recall = TP/(TP+FN)

F-value (F-measure)
      • Definition: the F-measure is a weighted harmonic mean of precision and recall, also known as the F-score.

      • Formula: (1) F1 = 2PR/(P+R) = 2TP/(2TP+FP+FN)
        (2) Fα = (α²+1)PR/(α²·P + R)

Intuitively, precision measures how many of the retrieved items (documents, web pages, etc.) are correct, while recall measures how many of the correct items were retrieved. The F-value trades off these two metrics.
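Precision, recall and F1 can be sketched directly from the confusion counts defined above; the TP/FP/FN values here are invented for illustration.

```python
# Sketch: precision, recall and F1 from raw confusion counts.
def prf1(tp, fp, fn):
    precision = tp / (tp + fp)          # TP / (TP + FP)
    recall = tp / (tp + fn)             # TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = prf1(tp=8, fp=2, fn=4)       # hypothetical counts
print(p, r, f1)  # 0.8, 0.666..., 0.727...
```

Note that F1 computed this way agrees with the equivalent closed form 2TP/(2TP+FP+FN) = 16/22.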

All of the above evaluation criteria are sensitive to changes in the class distribution.

MCC (Matthews correlation coefficient)

MCC is a metric used in machine learning to measure binary classification performance [83]. It takes true positives, true negatives, false positives and false negatives into account, is generally considered a balanced metric, and remains usable even when the two classes differ greatly in size. MCC is essentially a correlation coefficient between the actual and predicted classifications; its range is [-1, 1]. A value of 1 means a perfect prediction, 0 means the prediction is no better than random, and -1 means the predicted classification is completely inconsistent with the actual classification.

It is a better-suited metric for measuring performance on unbalanced datasets.
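The standard MCC formula can be sketched from the four confusion counts; the example counts are invented, and the zero-denominator convention (return 0) is a common choice rather than part of the definition.

```python
import math

# Sketch: Matthews correlation coefficient from confusion counts.
# MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
def mcc(tp, tn, fp, fn):
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0    # convention: 0 when undefined

print(mcc(tp=4, tn=4, fp=0, fn=0))  # 1.0 (perfect prediction)
print(mcc(tp=0, tn=0, fp=4, fn=4))  # -1.0 (completely inverted)
```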

Logarithmic loss (log-loss)

When the classifier output is not a hard 0/1 label but a real value, namely the probability of belonging to each category, the classification result can be evaluated using log-loss. The output probability expresses the confidence that the record belongs to the corresponding category. For example, if a sample belongs to category 0 but the classifier outputs a probability of 0.51 that it belongs to category 1, the classifier is considered to have erred; that probability is close to the classifier's decision boundary of 0.5. Log-loss is a "soft" measure of classification accuracy that uses probabilities to express confidence in the predicted class. Its mathematical expression is:

log-loss = -(1/N) · Σᵢ [ yᵢ·log(pᵢ) + (1 - yᵢ)·log(1 - pᵢ) ]

where yᵢ is the true class of the i-th sample (0 or 1) and pᵢ is the predicted probability that the i-th sample belongs to category 1. For each sample only one of the two terms contributes, since the other factor is always 0; when the prediction matches the actual category exactly, both terms are 0 (taking 0·log 0 = 0).
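The expression above can be sketched directly; the labels and probabilities are invented, and clipping the probabilities away from 0 and 1 is a practical safeguard against log(0), not part of the definition.

```python
import math

# Sketch: binary log-loss for true labels y_i in {0,1} and predicted
# probabilities p_i of class 1.
def log_loss(y_true, p_pred, eps=1e-15):
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Confident, mostly-correct predictions give a small loss.
print(log_loss([1, 0, 1], [0.9, 0.1, 0.8]))  # ~0.1446
```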

AUC (area under curve)

Consider the click-through rate (CTR) of online ads. For invisible exposures the CTR is about 1/1000, and for visible exposures it is around 1/100, so the vast majority of exposures receive no click. Showing ads that have a high probability of being clicked is therefore very important: it makes effective use of the exposure while also monetizing the traffic.

Accurate CTR estimation is thus essential when displaying ads.

CTR estimation process:

A model is trained on historical logs and loaded by the online server. When an ad display request arrives, the server computes the estimated click-through rate of each candidate ad from the request's context information and displays the ad with the highest estimated CTR.

Common features include:

    • Advertiser features: e.g. industry category
    • User features: e.g. cookies, gender
    • Ad features: e.g. creative type (text link, image, Flash, etc.) and the ad's core keywords

The role of AUC in feature selection

When optimizing the model, we want to include enough features with good discriminative power; such features help rank ads accurately at the ad-selection stage.

For ad slots on the same page, as the user scrolls from top to bottom, the click-through rate drops sharply by position: even the second slot's CTR is roughly 90% lower than the first slot's.

Suppose the top two ads (call them A and B) are to be selected for display after ranking candidates in descending order of estimated CTR (pCTR). Adding or removing a feature can change the relative ranking of A and B. If A would yield more revenue but the order comes out B, A, then A's CTR drops substantially while B earns little, so overall the exposure is not used optimally. The impact of any single feature must therefore be assessed on actual historical samples, and AUC is precisely the key metric that quantifies this impact; in other words, AUC is an important metric for evaluating a model's ranking ability.

Before introducing the AUC, two prerequisite concepts are needed: the ROC curve and the confusion matrix.

    • ROC stands for Receiver Operating Characteristic. Its main analysis tool is a curve drawn in a two-dimensional plane, the ROC curve: the horizontal axis is the false positive rate (FPR) and the vertical axis is the true positive rate (TPR).

The area under the ROC curve is defined as the AUC (area under curve).

Since the total area is the unit square (1×1), the AUC ranges from 0 to 1. Adding a highly discriminative feature usually increases the model's AUC; the diagonal (random-guess) line has an area of 0.5 beneath it.

Computing the AUC requires introducing the confusion matrix.

A typical binary confusion matrix is as follows:

| Prediction | Actual positive (P) | Actual negative (N) |
| :--------- | :-----------------: | :-----------------: |
| Y          | True Positive (TP)  | False Positive (FP) |
| N          | False Negative (FN) | True Negative (TN)  |

In computing the AUC we take the positive-sample point of view, i.e., we use the two relevant counts, true positives and false positives. In the classification results above, the proportion of positive samples classified correctly is:

TPR = TP / P (where P = TP + FN is the number of actual positive samples)

while the false positive rate is:

FPR = FP / N (where N = FP + TN is the number of actual negative samples)

The AUC can be computed by sorting all ads in descending order of pCTR and then combining them with the actual sample labels.

The discrimination threshold is literally a cutoff value. Each time a split is made at some pCTR value, that pCTR becomes the threshold: samples at or above it are predicted positive, and samples below it are predicted negative. For each such split, compute the TPR and FPR of the current division to obtain a point (FPR, TPR). Every split yields one point; connecting these points produces the ROC curve, from which the AUC is computed.

For a simple worked example, suppose there are 8 samples: 4 positive and 4 negative. (In practice the ratio of negative to positive samples is far larger than this, i.e. the data are imbalanced.) Here 1 means the sample was actually clicked, 0 means no click occurred, and pCTR is the estimated CTR. Our expectation is that the ads with high estimated CTR are the ones actually clicked, showing that the estimates are effective and accurate; of course some will not be clicked, which is understandable, since estimation is never 100% accurate.

The intervals are divided as follows:

    • At 0.9: samples with pCTR >= 0.9 are predicted positive, those below 0.9 negative.
    • Continue at 0.8: samples with pCTR >= 0.8 are predicted positive (i.e. likely to be clicked), those below 0.8 negative.
    • Continue at 0.7: samples with pCTR >= 0.7 are predicted positive, those below 0.7 negative. At this point one prediction is a false positive.
    • ...
    • And so on: compute all the (FPR, TPR) pairs and plot the ROC curve from these points. When the number of samples is large enough, the ROC curve becomes smooth. The AUC is obtained by accumulating the area of each small region under the curve.
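The threshold-sweep procedure above can be sketched in plain Python. The 8-sample labels and pCTR values below are invented for illustration (the original sample table did not survive); the area is accumulated with the trapezoidal rule.

```python
# Sketch: ROC/AUC by sweeping each pCTR as the discrimination threshold.
def roc_auc(labels, scores):
    # Sort samples by score descending and lower the threshold one sample
    # at a time; each step adds one (FPR, TPR) point.
    pairs = sorted(zip(scores, labels), reverse=True)
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, y in pairs:
        if y == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))   # (FPR, TPR)
    # Accumulate the area under the curve with the trapezoidal rule.
    auc = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        auc += (x1 - x0) * (y0 + y1) / 2
    return auc

labels = [1, 1, 0, 1, 0, 1, 0, 0]                        # 1 = clicked
pctr   = [0.9, 0.8, 0.7, 0.6, 0.55, 0.54, 0.53, 0.52]    # hypothetical pCTR
print(roc_auc(labels, pctr))  # 0.8125
```

This agrees with the rank interpretation of AUC: 13 of the 16 positive-negative pairs are ordered correctly, and 13/16 = 0.8125.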

The ideal ranking is one where, after sorting in descending order of pCTR, all positive samples come before all negative samples, i.e. every ad predicted to rank high is actually clicked. In that case the first several splits all give FPR = 0/N = 0.

The goal of model optimization is to move as close as possible to this ideal.

The advantage of the ROC curve and AUC is that they are unaffected by the class distribution, making them suitable for evaluating and comparing classifiers on datasets with imbalanced classes. ROC curves and AUC have therefore been widely used in medical decision making, pattern recognition, and data mining. However, ROC and AUC apply only to binary classification and cannot be used directly for multi-class problems.

