First, let's take a look at a set of common concepts used in machine learning evaluation criteria, summarized in the following table:
| Terminology | Abbreviation | Meaning |
| --- | --- | --- |
| True Positive | TP | Positive samples that are predicted as positive by the model |
| True Negative | TN | Negative samples that are predicted as negative by the model |
| False Negative | FN | Positive samples that are predicted as negative by the model |
| False Positive | FP | Negative samples that are predicted as positive by the model |
The above definitions can be interpreted as follows
- True / False indicates whether the model's prediction is correct
- Positive / Negative indicates the class the model predicts (positive example / negative example)
To understand the meaning of a combination, look at the meaning of the second keyword first, then the meaning of the first keyword.
For example: True Negative
- The second keyword, Negative, indicates that the model predicts the sample as a negative example
- The first keyword, True, indicates that the model's prediction is correct, so the sample's true label is also a negative example
That is, the sample is a negative sample that is predicted as negative by the model.
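To make these definitions concrete, here is a minimal Python sketch that counts TP, TN, FN, and FP from a list of true labels and model predictions. The data and the 1/0 label encoding (1 = positive, 0 = negative) are hypothetical, chosen only for illustration.

```python
# Count TP, TN, FN, FP from true labels and model predictions.
# Assumes binary labels: 1 = positive, 0 = negative (hypothetical encoding).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # ground-truth labels (hypothetical data)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # model predictions (hypothetical data)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # positive predicted as positive
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # negative predicted as negative
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # positive predicted as negative
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # negative predicted as positive

print(tp, tn, fn, fp)  # -> 3 3 1 1
```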
Precision, Recall
Precision is defined with respect to the prediction results: it measures how many of the samples predicted as positive are truly positive. There are two ways a sample can end up predicted as positive: a positive sample predicted as positive (TP), or a negative sample predicted as positive (FP). Therefore,
\[\begin{align}\mathrm{Precision} = \frac{TP}{TP+FP} \tag{1}\end{align}\]
Recall is defined with respect to the original samples: it measures how many of the truly positive samples are predicted correctly. There are also two possibilities: a positive sample predicted as positive (TP), or a positive sample predicted as negative (FN). Therefore,
\[\begin{align}\mathrm{Recall} = \frac{TP}{TP+FN} \tag{2}\end{align}\]
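Building on the counting sketch above, formulas (1) and (2) translate directly into code; the counts below are carried over from that hypothetical example.

```python
tp, fp, fn = 3, 1, 1  # counts from the previous hypothetical sketch

# Formula (1): of everything predicted positive, how much is truly positive?
precision = tp / (tp + fp)

# Formula (2): of all truly positive samples, how much did the model find?
recall = tp / (tp + fn)

print(f"precision = {precision:.3f}")  # 0.750
print(f"recall    = {recall:.3f}")     # 0.750
```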
F-score
See the Wikipedia entry on F1_score.
F-measure, also known as F-score, is the weighted harmonic mean of Precision and Recall. It is a common evaluation criterion in the IR (Information Retrieval) field and is often used to evaluate the quality of a classification model. F-measure combines the results of Precision and Recall: the higher the F-measure, the more effective the method under test. F1-measure is defined as follows
\[F_1 = \left({\frac{\mathrm{recall}^{-1} + \mathrm{precision}^{-1}}{2}}\right)^{-1} = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}\]
The general definition of F-measure is as follows:
\[F_\beta = (1 + \beta^2) \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{(\beta^2 \cdot \mathrm{precision}) + \mathrm{recall}}\]
Substituting formula (1) and formula (2) into the expression above gives
\[F_\beta = \frac{(1 + \beta^2) \cdot \mathrm{true\ positive}}{(1 + \beta^2) \cdot \mathrm{true\ positive} + \beta^2 \cdot \mathrm{false\ negative} + \mathrm{false\ positive}}\]
The square of β is used simply to ensure that the weighting factor on Precision is greater than 0.
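As a sanity check on the two forms above, the following sketch computes F1 and the general F-beta from precision and recall, and compares the results against scikit-learn's standard f1_score and fbeta_score; the labels are the same hypothetical data as in the earlier sketches.

```python
from sklearn.metrics import f1_score, fbeta_score

def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """General F-measure: harmonic mean of precision and recall, weighted by beta^2."""
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

# Same hypothetical labels as in the earlier sketches.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

precision, recall = 0.75, 0.75  # computed in the previous sketch

print(f_beta(precision, recall, beta=1.0))    # 0.75, the F1 value
print(f1_score(y_true, y_pred))               # 0.75, matches the manual F1
print(f_beta(precision, recall, beta=2.0))    # 0.75
print(fbeta_score(y_true, y_pred, beta=2.0))  # 0.75, matches the manual F-beta
```

Because precision and recall happen to coincide on these hypothetical data, every F-beta collapses to the same value; with unequal precision and recall, β > 1 weights recall more heavily and β < 1 weights precision more heavily.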
Precision, Recall, F1-score