To introduce machine learning, we first need a few basic concepts:
Definition of machine learning
Tom M. Mitchell defines machine learning as follows:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
Algorithm classification
The following two figures give a good summary of how machine learning algorithms are classified:
Evaluation Metrics
Classification algorithm metrics:
- Accuracy
- Precision
- Recall
- F1 Score
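The outcomes of a binary classification problem can be summarized by four counts (note: True/False indicates whether the prediction is correct, and Positive/Negative indicates the class predicted by the program):
- TP (True Positive): predicted positive, and actually positive
- FP (False Positive): predicted positive, but actually negative
- FN (False Negative): predicted negative, but actually positive
- TN (True Negative): predicted negative, and actually negative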
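As a minimal sketch in plain Python, the four outcomes (TP, FP, FN, TN) can be tallied directly from lists of true and predicted labels. It assumes binary labels with 1 as the positive class; `confusion_counts` and the sample labels are hypothetical, and scikit-learn's `sklearn.metrics.confusion_matrix` provides a library version of the same tally.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary problem.

    `positive` is the label treated as the positive class (assumed to be 1).
    """
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


# Made-up labels for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred))  # (2, 1, 2, 5)
```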
Accuracy
For a given test data set, accuracy is the ratio of the number of samples the classifier labels correctly to the total number of samples. The formula is:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
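A minimal sketch of accuracy as defined above: count the matching labels and divide by the number of samples. The function name and sample labels are illustrative assumptions; scikit-learn's `sklearn.metrics.accuracy_score` computes the same quantity.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("accuracy is undefined for an empty test set")
    correct = sum(1 for actual, pred in zip(y_true, y_pred) if actual == pred)
    return correct / len(y_true)


# Made-up labels: 7 of the 10 predictions are correct.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(accuracy(y_true, y_pred))  # 0.7
```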
Accuracy is subject to the accuracy paradox: on imbalanced data, a model that always predicts the majority class can achieve high accuracy while never finding a single minority-class sample.
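The accuracy paradox can be reproduced with a hypothetical imbalanced test set (95 negatives, 5 positives) and a "model" that always predicts the majority class. The data are invented for illustration; the point is that accuracy looks excellent while recall on the positive class is zero.

```python
# Hypothetical imbalanced test set: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5
# A trivial "model" that ignores its input and always predicts the majority class.
y_pred = [0] * len(y_true)

correct = sum(1 for actual, pred in zip(y_true, y_pred) if actual == pred)
accuracy = correct / len(y_true)

# Recall on the positive class: how many real positives were found.
tp = sum(1 for actual, pred in zip(y_true, y_pred) if actual == 1 and pred == 1)
fn = sum(1 for actual, pred in zip(y_true, y_pred) if actual == 1 and pred == 0)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")  # accuracy=0.95, recall=0.00
```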
Precision
Precision is the proportion of samples predicted as positive that are actually positive; it can be understood as measuring the absence of "false positives". The formula is:
$$P = \frac{TP}{TP + FP}$$
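A minimal sketch of precision, TP / (TP + FP), in plain Python. The function name and sample labels are illustrative; returning 0.0 when there are no positive predictions is an assumed convention (scikit-learn's `precision_score` exposes a `zero_division` parameter for this case).

```python
def precision(y_true, y_pred, positive=1):
    """TP / (TP + FP): the share of predicted positives that are truly positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for actual, pred in pairs if pred == positive and actual == positive)
    fp = sum(1 for actual, pred in pairs if pred == positive and actual != positive)
    if tp + fp == 0:
        # No positive predictions: precision is undefined; 0.0 is an assumed convention.
        return 0.0
    return tp / (tp + fp)


# Made-up labels: 3 positive predictions, 2 of them correct.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(round(precision(y_true, y_pred), 4))  # 0.6667
```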
Recall
Recall is the proportion of samples that "should" be classified as positive (those whose true label is positive) that the classifier actually classifies correctly; as the counterpart of precision, it can be understood as measuring the absence of "false negatives". The formula is:
$$R = \frac{TP}{TP + FN}$$
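A minimal sketch of recall, TP / (TP + FN), mirroring the precision calculation but counting missed positives instead of false alarms. Names and data are illustrative; returning 0.0 when the test set has no positives is an assumed convention. scikit-learn's `sklearn.metrics.recall_score` is the library equivalent.

```python
def recall(y_true, y_pred, positive=1):
    """TP / (TP + FN): the share of actual positives that the classifier found."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for actual, pred in pairs if actual == positive and pred == positive)
    fn = sum(1 for actual, pred in pairs if actual == positive and pred != positive)
    if tp + fn == 0:
        # No actual positives: recall is undefined; 0.0 is an assumed convention.
        return 0.0
    return tp / (tp + fn)


# Made-up labels: 4 actual positives, 2 of them found.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(recall(y_true, y_pred))  # 0.5
```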
F1 Score
The F1 score is the harmonic mean of precision and recall, defined as:
$$\frac{2}{F_1} = \frac{1}{P} + \frac{1}{R}$$
that is,
$$F_1 = \frac{2PR}{P + R}$$
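A minimal sketch of the F1 score computed from a precision and a recall value, with a check that the closed form 2PR / (P + R) agrees with the reciprocal definition. Returning 0.0 when both inputs are zero is an assumed convention; scikit-learn's `sklearn.metrics.f1_score` computes the same value directly from label arrays.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0  # both are zero: define F1 as 0 (an assumed convention)
    return 2 * precision * recall / (precision + recall)


p, r = 2 / 3, 1 / 2
f1 = f1_score(p, r)
# Cross-check against the reciprocal form 2/F1 = 1/P + 1/R.
assert abs(2 / f1 - (1 / p + 1 / r)) < 1e-12
print(round(f1, 4))  # 0.5714
```

The harmonic mean is pulled toward the smaller value: `f1_score(1.0, 0.1)` is about 0.18, far below the arithmetic mean of 0.55, which is why F1 is high only when both inputs are high.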
Application Scenarios:
Precision and recall affect each other. Ideally both would be high, but in general raising precision lowers recall, and raising recall lowers precision; if both are low, something has gone wrong. F1 is high only when precision and recall are both high, so when both matter, F1 serves as a single combined measure.
- Earthquake prediction
In earthquake prediction we want recall to be very high, that is, we want to predict every earthquake, and we can sacrifice precision to get it. It is better to raise 1000 alarms and correctly predict all 10 earthquakes than to raise 100 alarms that correctly predict 8 and miss 2.
- Convicting suspects
Following the principle of not wrongly convicting an innocent person, we want convictions to be highly precise. Even if some criminals go free as a result (low recall), the trade-off is worthwhile.
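One common way this trade-off appears in practice is through a decision threshold on a classifier's scores: a low threshold favors recall (the earthquake case), a high one favors precision (the conviction case). The sketch below assumes a hypothetical classifier that outputs scores; the scores and labels are invented for illustration.

```python
# Hypothetical classifier scores (higher = more likely positive) and true labels.
scores = [0.95, 0.85, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]


def precision_recall_at(threshold):
    """Predict positive when score >= threshold, then compute precision and recall."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if p == 1 and y == 1)
    fp = sum(1 for y, p in zip(labels, preds) if p == 1 and y == 0)
    fn = sum(1 for y, p in zip(labels, preds) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


for t in (0.9, 0.5, 0.25):
    p, r = precision_recall_at(t)
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```

With this data, lowering the threshold from 0.9 to 0.25 raises recall from 0.2 to 1.0 while precision falls from 1.0 to 0.625.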
Regression algorithm metrics:
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- R2 Score
- Explained Variance Score
Mean Absolute Error
Formula:
$$\text{MAE}(y, \hat{y}) = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|$$
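A minimal sketch of MAE: average the absolute differences between true and predicted values. The function name and sample values are illustrative; scikit-learn's `sklearn.metrics.mean_absolute_error` is the library equivalent.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE = (1/N) * sum(|y_i - y_hat_i|)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / len(y_true)


# Made-up values: absolute errors are 0.5, 0.5, 0.0, 1.0.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mean_absolute_error(y_true, y_pred))  # 0.5
```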
Mean Squared Error
Formula:
$$\text{MSE}(y, \hat{y}) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2$$
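A minimal sketch of MSE: square each error before averaging, so large errors weigh more than in MAE. The function name and sample values are illustrative; scikit-learn's `sklearn.metrics.mean_squared_error` is the library equivalent.

```python
def mean_squared_error(y_true, y_pred):
    """MSE = (1/N) * sum((y_i - y_hat_i) ** 2)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)


# Made-up values: squared errors are 0.25, 0.25, 0.0, 1.0.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mean_squared_error(y_true, y_pred))  # 0.375
```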
R2 Score
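Also called the "coefficient of determination", the R2 score measures how well the model's predictions fit the true data. The best possible value is 1, a model that always predicts the mean of the true values scores 0, and worse models can score negative values. It is defined as:
$$R^2(y, \hat{y}) = 1 - \frac{\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2}$$
where
$$\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i$$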
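A minimal sketch of the R2 score as one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The function name and sample values are illustrative; scikit-learn's `sklearn.metrics.r2_score` is the library equivalent. The last two lines show the "mean predictor scores 0" and "can be negative" properties.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    y_bar = sum(y_true) / n  # mean of the true values
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)  # total sum of squares around the mean
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are equal")
    return 1 - ss_res / ss_tot


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(round(r2_score(y_true, y_pred), 4))  # 0.9486

# Always predicting the mean scores exactly 0; a worse constant goes negative.
y_bar = sum(y_true) / len(y_true)
print(r2_score(y_true, [y_bar] * 4))     # 0.0
print(r2_score(y_true, [10.0] * 4) < 0)  # True
```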
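Explained Variance Score
It measures how much of the variance of the true values is captured by the predictions; the best possible value is 1, and lower values are worse. It is defined as:
$$\text{explained variance}(y, \hat{y}) = 1 - \frac{\mathrm{Var}(y - \hat{y})}{\mathrm{Var}(y)}$$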
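A minimal sketch of the explained variance score: compare the variance of the residuals with the variance of the true values. Names and data are illustrative; scikit-learn's `sklearn.metrics.explained_variance_score` is the library equivalent. Because the variance ratio divides both terms by N, using population variance here gives the same result as sample variance.

```python
def variance(values):
    """Population variance (divides by N)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def explained_variance_score(y_true, y_pred):
    """1 - Var(y - y_hat) / Var(y)."""
    residuals = [y - y_hat for y, y_hat in zip(y_true, y_pred)]
    return 1 - variance(residuals) / variance(y_true)


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(round(explained_variance_score(y_true, y_pred), 4))  # 0.9572

# Predictions off by a constant +1 everywhere still "explain" all the variance.
shifted = [y + 1.0 for y in y_true]
print(explained_variance_score(y_true, shifted))  # 1.0
```

The shifted example shows how this metric differs from R2: a constant bias leaves the residual variance at zero, so the explained variance stays at 1, while R2 would penalize the offset.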