This article walks through the common classification and regression evaluation metrics and shows how to compute them in Python; it may serve as a useful reference.
1. Concept
Performance measurement (evaluation) indicators fall into two main categories:
1) Classification metrics, used when the predicted output is a discrete class label. Specific indicators include accuracy, precision, recall, the F-score, the P-R curve, the ROC curve, and AUC.
2) Regression metrics, used when the predicted output is a continuous, real-valued quantity. Specific indicators include the explained variance score (explained_variance_score), mean absolute error MAE (mean_absolute_error), mean squared error MSE (mean_squared_error), root mean squared error RMSE, cross-entropy loss (log loss), and the R-squared value (coefficient of determination, r2_score).
1.1. Prerequisites
Suppose there are only two classes: the positive class (positive) and the negative class (negative). Usually the class of interest is the positive class and everything else is negative, so many multi-class problems can be reduced to this two-class setting.
The confusion matrix is as follows:
| Actual \ Predicted | Positive | Negative | Total |
| --- | --- | --- | --- |
| Positive | TP | FN | P (actual positives) |
| Negative | FP | TN | N (actual negatives) |
The cell names follow an "AB" pattern: the first letter says whether the prediction is correct (T = true, F = false), and the second letter is the predicted class (P = positive, N = negative). For example, TP (true positive) means a sample correctly predicted as positive; FN (false negative) means a sample wrongly predicted as negative (it is actually positive).
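As a quick illustration (the labels below are invented purely for demonstration), sklearn's confusion_matrix can produce these four counts directly:

from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0]   # actual classes (1 = positive, 0 = negative)
y_pred = [1, 0, 1, 0, 1]   # predicted classes
# confusion_matrix returns [[TN, FP], [FN, TP]] for binary labels {0, 1}
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP =", tp, "FN =", fn, "FP =", fp, "TN =", tn)   # TP = 2, FN = 1, FP = 1, TN = 1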
2. Evaluation index (performance measurement)
2.1. Classification Evaluation Index
2.1.1 Single-value metrics: accuracy, precision, recall, and the F-score
| Measure | Definition | Formula |
| --- | --- | --- |
| Accuracy | The ratio of correctly classified samples to the total number of samples | accuracy = (TP + TN) / (P + N) |
| Precision | The ratio of true positives to all samples predicted as positive (of the messages flagged as spam, the proportion that really are spam) | precision = TP / (TP + FP) |
| Recall | The ratio of true positives to all actual positives (of all real spam messages, the proportion that were found) | recall = TP / (TP + FN) = TP / P |
| F-score | The harmonic mean of precision and recall | F = (1 + β²) · precision · recall / (β² · precision + recall) |
1. Precision is also often called the precision rate (positive predictive value), and recall the recall rate (hit rate, sensitivity).
2. The most commonly used F-score is F1, i.e. the F-score with β = 1: F1 = 2 · precision · recall / (precision + recall).
Python 3.6 code implementation:

# use the metric functions provided by the sklearn library
from sklearn import metrics
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import average_precision_score
from sklearn.metrics import accuracy_score

# classification results to evaluate
y_pred = [0, 1, 0, 0]
y_true = [0, 1, 1, 1]
print("accuracy_score:", accuracy_score(y_true, y_pred))
print("precision_score:", metrics.precision_score(y_true, y_pred))
print("recall_score:", metrics.recall_score(y_true, y_pred))
print("f1_score:", metrics.f1_score(y_true, y_pred))
print("f0.5_score:", metrics.fbeta_score(y_true, y_pred, beta=0.5))
print("f2_score:", metrics.fbeta_score(y_true, y_pred, beta=2.0))
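For this example the confusion-matrix counts can be read off by hand (TP = 1, FN = 2, FP = 0, TN = 1), so the sklearn results above can be verified directly with the formulas from the table; a minimal check:

# manual check of the sklearn results above
tp, fn, fp, tn = 1, 2, 0, 1
accuracy = (tp + tn) / (tp + fn + fp + tn)          # 2 / 4 = 0.5
precision = tp / (tp + fp)                          # 1 / 1 = 1.0
recall = tp / (tp + fn)                             # 1 / 3 ≈ 0.333
f1 = 2 * precision * recall / (precision + recall)  # 0.5
print(accuracy, precision, recall, f1)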
2.1.2 Curve-based metrics: the P-R curve, the ROC curve, and the AUC value
1) P-R curve
Steps:
1. Sort the samples by their predicted "score" from high to low, and use each score in turn as the threshold;
2. For each threshold, every test sample whose score is greater than or equal to the threshold is predicted positive, and the rest are predicted negative. Each threshold therefore yields one set of predictions (and one precision/recall pair).
E.g., with 0.9 as the threshold, the 1st test sample is predicted positive and samples 2, 3, 4, 5 are predicted negative, which gives:
| | Actually positive (expected) | Actually negative (expected) | Total |
| --- | --- | --- | --- |
| Predicted positive (score ≥ threshold) | TP = 0.9 | FP = 0.1 | 1 |
| Predicted negative (score < threshold) | FN = 0.2 + 0.3 + 0.3 + 0.35 = 1.15 | TN = 0.8 + 0.7 + 0.7 + 0.65 = 2.85 | 4 |
precision = TP / (TP + FP) = 0.9 / 1.0 = 0.9, recall = TP / (TP + FN) = 0.9 / 2.05 ≈ 0.44
Here each sample's predicted score is treated as its probability of being positive, so the cells hold expected rather than hard counts: a sample above the threshold contributes its score to TP and 1 − score to FP, while a sample below the threshold contributes its score to FN and 1 − score to TN.
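In practice the threshold sweep does not have to be done by hand. A minimal sketch (labels and scores invented for illustration) of how sklearn's precision_recall_curve returns one precision/recall pair per threshold, which can then be plotted as in the pseudo-code below:

from sklearn.metrics import precision_recall_curve

# true binary labels and classifier scores (illustrative values only)
y_true  = [1, 0, 1, 0, 1]
y_score = [0.9, 0.35, 0.3, 0.3, 0.2]

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
for p, r, t in zip(precision, recall, thresholds):
    print("threshold = %.2f  precision = %.2f  recall = %.2f" % (t, p, r))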
Python pseudo-code for plotting the P-R curves:

# precision and recall are computed as above
import matplotlib.pyplot as plt   # plotting library
import numpy as np                # matrix operations

# import the iris data and train the classifier (see the previous post)
...

# add 800 noise features to make the task harder:
# concatenate the 150 x 4 iris feature matrix with a 150 x 800 noise matrix
X = np.c_[X, np.random.RandomState(0).randn(n_samples, 200 * n_features)]

# compute the precision/recall arrays for each of the three iris classes
# ("_" is a throwaway name for the thresholds, which are not needed here)
for i in range(n_classes):
    precision[i], recall[i], _ = precision_recall_curve(y_test[:, i], y_score[:, i])

# plot one P-R curve per class
plt.clf()
for i in range(n_classes):
    plt.plot(recall[i], precision[i])
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.show()
Completing the code above with the training step produces the P-R curves of the iris data set (one curve per class).
2) ROC curve
Horizontal axis: false positive rate FPR = FP / N (the proportion of actual negatives wrongly predicted positive)
Vertical axis: true positive rate TPR = TP / P (the proportion of actual positives correctly predicted positive)
Steps:
1. Sort the samples by their predicted "score" from high to low, and use each score in turn as the threshold;
2. For each threshold, every test sample whose score is greater than or equal to the threshold is predicted positive, and the rest are predicted negative, yielding one (FPR, TPR) point per threshold.
The calculation is analogous to the P-R curve and is not repeated here.
The resulting ROC curves for the iris data set are shown in the figure below.
AUC (Area Under Curve) is defined as the area under the ROC curve. It summarizes the classifier's performance in a single number in [0, 1]: generally, the larger the AUC, the better the classifier.
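A minimal binary-classification sketch (labels and scores invented for illustration) showing how the ROC points and the AUC value can be obtained with sklearn:

from sklearn.metrics import roc_curve, auc, roc_auc_score

# true binary labels and classifier scores (illustrative values only)
y_true  = [1, 0, 1, 0, 1, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.1]

# one (FPR, TPR) point per threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC via auc(fpr, tpr):", auc(fpr, tpr))
print("AUC via roc_auc_score:", roc_auc_score(y_true, y_score))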
2.2. Regression Evaluation Index
1) Explained variance score (explained_variance_score)
2) Mean absolute error MAE (mean_absolute_error)
3) Mean squared error MSE (mean_squared_error)
4) Logistic regression loss (log loss / cross-entropy loss)
5) Consistency evaluation: the Pearson correlation coefficient (and Cohen's kappa)
Python code implementation:

from sklearn.metrics import log_loss
log_loss(y_true, y_pred)

from scipy.stats import pearsonr
pearsonr(rater1, rater2)

from sklearn.metrics import cohen_kappa_score
cohen_kappa_score(rater1, rater2)
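The remaining metrics from the list above are also available in sklearn.metrics; a minimal sketch (the target values below are invented for illustration):

from sklearn.metrics import explained_variance_score, mean_absolute_error, mean_squared_error, r2_score

# illustrative regression targets and predictions
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5,  0.0, 2.0, 8.0]

print("explained variance:", explained_variance_score(y_true, y_pred))
print("MAE:", mean_absolute_error(y_true, y_pred))
print("MSE:", mean_squared_error(y_true, y_pred))
print("RMSE:", mean_squared_error(y_true, y_pred) ** 0.5)   # root mean squared error
print("R^2:", r2_score(y_true, y_pred))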