R Language ︱ Machine Learning Model Evaluation Metrics + Four Causes of Model Error and How to Correct Them

Source: Internet
Author: User
Tags: square root




The author's note: cross-validation is the main model evaluation approach in machine learning, but which metrics should be used with it?

Cross-validation splits the data into a training set and a test set: the model is trained on the training set and then evaluated on the test set.

The prediction performance of a model is usually measured with metrics such as relative error, mean absolute error, root mean squared error, and relative squared error.

Only for unsupervised models does one turn to the more "impressive-sounding" indicators such as information entropy, complexity, and the Gini value.

In fact, such indicators only look sophisticated; for supervised models in data mining, the traditional metrics remain more reliable, for example mean absolute error (MAE), mean squared error (MSE), and normalized mean squared error (NMSE): they are simple to compute and easy to understand.

Each has its advantages and disadvantages, and no single metric gives the full picture of a model.



——————————————————————————

Related content:

1. R Language ︱ ROC Curve -- Performance Evaluation of Classifiers

2. The Problem of Overfitting in Machine Learning

3. R Language ︱ Machine Learning Model Evaluation Schemes (Using Random Forest as an Example)


——————————————————————————


1. Absolute error and relative error

Absolute error (Absolute Error) = original value - estimated value

Relative error (Relative Error) = (original value - estimated value) / original value
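These two definitions can be sketched in a few lines of Python (the function names are our own, chosen for illustration):

```python
def absolute_error(original, estimate):
    """Absolute error: original value minus estimated value."""
    return original - estimate

def relative_error(original, estimate):
    """Relative error: absolute error as a fraction of the original value."""
    return (original - estimate) / original

print(absolute_error(10.0, 8.0))  # 2.0
print(relative_error(10.0, 8.0))  # 0.2
```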

2. Mean absolute error (Mean Absolute Error, MAE)

Mean absolute error = Σ ︱ original value - estimated value ︱ / n

where n is the number of data points; that is, MAE is the average of the absolute values of the errors.

Because prediction errors can be positive or negative, the absolute values are averaged so that errors of opposite sign do not cancel out; MAE is one of the standard aggregate measures of error.

Advantages and disadvantages: although the mean absolute error yields a single number, that number alone does not tell you whether the model is good or bad; it is only meaningful in comparison with other models.
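As a minimal sketch (plain Python, name ours), MAE is just the average of the absolute errors:

```python
def mae(original, estimated):
    """Mean absolute error: average of |original - estimated| over n points."""
    n = len(original)
    return sum(abs(o - e) for o, e in zip(original, estimated)) / n

print(mae([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # (1 + 0 + 1) / 3
```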

3. Mean squared error (Mean Squared Error, MSE) ≈ variance

Like the variance, the mean squared error is the average of the squared prediction errors, which avoids the problem of positive and negative errors cancelling when summed.

Because the errors are squared, large errors are amplified in the index, which raises its sensitivity and is a major advantage. MSE is another standard aggregate measure of error.

Advantages and disadvantages: MSE shares the same comparison problem as MAE, and because of the squaring its units differ from those of the original values; for example, if the observations are in metres, the MSE is in square metres, which makes it harder to interpret.
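A sketch of MSE in the same style as the MAE function above (name ours); note how the squaring weights large errors more heavily:

```python
def mse(original, estimated):
    """Mean squared error: average of (original - estimated)^2."""
    n = len(original)
    return sum((o - e) ** 2 for o, e in zip(original, estimated)) / n

# One large error of 3 dominates two small errors of 1:
print(mse([1.0, 2.0, 3.0], [2.0, 3.0, 6.0]))  # (1 + 1 + 9) / 3
```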

4. Root mean squared error (Root Mean Squared Error, RMSE) ≈ standard deviation

This is the square root of the mean squared error. It represents the spread of the prediction errors and is also called the standard error; in the best-fitting case it is 0. RMSE is likewise one of the standard aggregate measures of error.
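Taking the square root restores the original units, which is the practical appeal of RMSE over MSE. A minimal sketch (name ours):

```python
import math

def rmse(original, estimated):
    """Root mean squared error: square root of the MSE; same units as the data."""
    n = len(original)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(original, estimated)) / n)

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # sqrt(4/3)
```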


Advantages: the normalized mean squared error (NMSE) improves on the MSE by standardizing it: it is the ratio of the model's squared error to that of a baseline that always predicts the mean. The NMSE usually ranges from 0 to 1, and the smaller the ratio, the more the model outperforms the mean-prediction strategy.

An NMSE greater than 1 means the model predicts worse than simply predicting the mean of all observations.

Disadvantages: it is hard to gauge from this indicator how far the predictions are from the observations, because its units differ from those of the original variable. Weighing the pros and cons of each indicator, we use three of them together to evaluate a model.
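The ratio against the mean-prediction baseline can be sketched as follows (function name ours; this assumes the common definition of NMSE as model squared error over baseline squared error):

```python
def nmse(original, estimated):
    """Normalized MSE: the model's squared error divided by the squared error
    of a baseline that always predicts the mean of the observations."""
    mean_o = sum(original) / len(original)
    model_err = sum((o - e) ** 2 for o, e in zip(original, estimated))
    mean_err = sum((o - mean_o) ** 2 for o in original)
    return model_err / mean_err

print(nmse([1.0, 2.0, 3.0], [1.1, 2.0, 2.9]))  # well below 1: beats the mean
print(nmse([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # exactly 1: no better than the mean
```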


5. Mean absolute percentage error (Mean Absolute Percentage Error, MAPE)

A bit similar to the root mean square error above.
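MAPE expresses the average absolute error as a percentage of the original values. A minimal sketch (name ours; note it is undefined when an original value is 0):

```python
def mape(original, estimated):
    """Mean absolute percentage error, in percent."""
    n = len(original)
    return 100.0 * sum(abs((o - e) / o) for o, e in zip(original, estimated)) / n

print(mape([100.0, 200.0], [90.0, 220.0]))  # (10% + 10%) / 2 = 10.0
```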

6. Confusion matrix (confusion matrix)

Diagonal elements = the proportion the classifier recognizes correctly; off-diagonal elements = the proportion of misclassifications.

Confusion matrix                Predicted class
                             class = 1   class = 0
Actual class    class = 1        A           B
                class = 0        C           D
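The four cells of the 2x2 table above can be counted directly (function name ours; labels assumed to be 1 for the positive class and 0 for the negative class):

```python
def confusion_counts(actual, predicted):
    """Counts A, B, C, D for a binary confusion matrix."""
    pairs = list(zip(actual, predicted))
    a = sum(1 for y, p in pairs if y == 1 and p == 1)  # A: true positives
    b = sum(1 for y, p in pairs if y == 1 and p == 0)  # B: false negatives
    c = sum(1 for y, p in pairs if y == 0 and p == 1)  # C: false positives
    d = sum(1 for y, p in pairs if y == 0 and p == 0)  # D: true negatives
    return a, b, c, d

print(confusion_counts([1, 1, 0, 0], [1, 0, 1, 0]))  # (1, 1, 1, 1)
```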

7. Receiver operating characteristic (Receiver Operating Characteristic, ROC) curve

A very effective model evaluation method, which gives quantitative guidance for choosing a threshold.

The area under this curve is closely tied to the quality of each method: it reflects the probability that the classifier classifies correctly, and the closer it is to 1, the better the algorithm.

This can be implemented with the ROCR package; see the post (R Language ︱ ROC Curve - Classifier Performance Evaluation).

A classifier ultimately yields a prediction accuracy, which can be written out as a confusion matrix: all of the training data fall into the matrix, and the numbers on the diagonal are the correct predictions, i.e. true positives + true negatives.

From it, the TPR (true positive rate, or sensitivity) and TNR (true negative rate, or specificity) can be calculated.

We would naturally like both of these indicators to be as large as possible, but unfortunately they trade off against each other. Besides the training parameters of the classifier, the choice of cut-off point greatly affects the TPR and TNR, so a specific cut-off can sometimes be chosen to suit the specific problem and requirements.
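The trade-off is easy to see by computing both rates at different cut-offs (function name and the example scores are ours, for illustration):

```python
def tpr_tnr(actual, scores, threshold):
    """Sensitivity (TPR) and specificity (TNR) at a given cut-off on the scores."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(actual, predicted) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(actual, predicted) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(actual, predicted) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(actual, predicted) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

actual = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]        # hypothetical classifier scores
print(tpr_tnr(actual, scores, 0.3))  # (1.0, 0.5): low cut-off favours TPR
print(tpr_tnr(actual, scores, 0.7))  # (0.5, 1.0): high cut-off favours TNR
```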

————————————————————————————————————

Four causes of model error and how to correct them



There are many machine learning models to choose from. We can use linear regression to predict a value, use logistic regression to classify different results, and use neural networks to model nonlinear behavior.

When we build a model, we usually have it learn, from historical data, the relationship between a set of input features and the output it should predict. But even if the model accurately predicts values in the historical data, how do we know it will predict new data equally well?

In short, how do you evaluate whether a machine learning model is really "good"?

In this article, we'll look at some common scenarios where seemingly good machine learning models still make mistakes, discuss how to diagnose these problems with metrics such as bias vs. variance and precision vs. recall, and propose some solutions you can use when you encounter these situations.

High bias or high variance

The first thing to check when testing a machine learning model is whether it suffers from "high bias" or "high variance".

High bias means your model is "underfitting" the experimental data. High bias is bad because the model does not accurately capture the relationship between the input values and the predicted output, and it typically produces high error (the difference between the model's predictions and the true values).

High variance is the opposite situation. With high variance, or "overfitting", the machine learning model fits the experimental data too closely. The results look good, but caution is needed, because such models often do not generalize to future data: although the model works well on the existing data, you don't know how it will perform on other data.

How do you know whether your model has high bias or high variance?

A straightforward approach is to split the data into two parts: a training set and a test set. For example, train the model on 70% of the data and measure the error rate on the remaining 30%. If the model has high error on both the training data and the test data, it underfits both sets, i.e. it has high bias. If the model has a low error rate on the training set but a high error rate on the test set, that indicates high variance, i.e. the model does not generalize to the second set of data.
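The split-and-compare logic can be sketched as a crude rule of thumb (the 10% error cut-off is our own arbitrary illustration, not a standard):

```python
def diagnose(train_error, test_error, tol=0.10):
    """Crude bias/variance diagnosis from training- and test-set error rates."""
    if train_error > tol and test_error > tol:
        return "high bias (underfitting)"
    if train_error <= tol and test_error > tol:
        return "high variance (overfitting)"
    return "balanced"

print(diagnose(0.30, 0.32))  # high error on both sets
print(diagnose(0.02, 0.25))  # low train error, high test error
print(diagnose(0.03, 0.05))  # low error on both sets
```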

If the model has a low error rate on both the training set (past data) and the test set (future data), you have found a "just right" model that balances bias and variance.

Low precision or low recall

Even if the machine learning model is highly accurate, other types of errors may occur.

Take classifying e-mail messages as spam (the positive class) and non-spam (the negative class) as an example. Suppose 99% of the messages you receive are not spam and perhaps 1% are. If we train a machine learning model that always predicts a message is non-spam (the negative class), the model is 99% accurate, yet it never captures a single positive case.

In this case, two indicators, precision and recall, help determine exactly what share of positive cases the model captures.

Precision measures how many of the predicted positives are truly positive. It is calculated as the number of true positives (true positive, e.g. predicted as spam and really spam) divided by the total number of positive predictions, i.e. true positives plus false positives (false positive, e.g. predicted as spam when in fact it is not).

Recall measures how many of the actual positives are correctly predicted. It is calculated from true positives and false negatives (false negative, e.g. predicting that a message is not spam when in fact it is): true positives divided by true positives plus false negatives.

Another way to see the difference: precision measures how much of what is predicted positive really is positive, while recall tells you how often the actual positives are captured by the predictions. So when many positive predictions are wrong, precision is low; when few of the actual positives are predicted, recall is low.
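Both quantities follow directly from the confusion counts; a sketch (function name ours, returning 0.0 when a denominator is zero so the degenerate spam example below is well defined):

```python
def precision_recall(actual, predicted):
    """Precision = TP/(TP+FP); recall = TP/(TP+FN)."""
    tp = sum(1 for y, p in zip(actual, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(actual, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(actual, predicted) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# The spam example: 1 spam message in 100, model always predicts non-spam.
actual = [1] + [0] * 99
predicted = [0] * 100
print(precision_recall(actual, predicted))  # (0.0, 0.0): 99% accurate, 0 recall
```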

The goal of a good machine learning model is to balance precision and recall by maximizing the number of true positives while minimizing false negatives and false positives.

5 ways to improve the model

If a model faces high bias vs. high variance, or struggles to balance precision and recall, there are several strategies you can adopt.

For example, when a machine learning model has high bias, you can try increasing the number of input features. As discussed above, high bias occurs when the model underfits the data, showing a high error rate on both the training set and the test set. If we plot model error as a function of the number of input features, we typically find that more features improve the fit.

Similarly, for high variance you can reduce the number of input features. If the model overfits the training data, you may have used too many features, and reducing their number makes the model generalize better to test or future data. Likewise, increasing the number of training samples helps with high variance, since it lets the learning algorithm build a more general model.

To balance low precision and low recall, you can adjust the probability threshold that separates the positive and negative classes. When precision is low, raise the probability threshold so the model is more conservative in assigning the positive class; conversely, when recall is low, lower the probability threshold so the positive class is predicted more often.
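The effect of moving the threshold is easy to demonstrate (function name and example probabilities are ours): a higher cut-off yields fewer, more confident positive predictions, a lower cut-off yields more of them.

```python
def classify(scores, threshold):
    """Label a score as positive (1) when it reaches the probability threshold."""
    return [1 if s >= threshold else 0 for s in scores]

scores = [0.2, 0.4, 0.6, 0.8]      # hypothetical model probabilities
print(sum(classify(scores, 0.7)))  # 1 positive: conservative, aids precision
print(sum(classify(scores, 0.3)))  # 3 positives: liberal, aids recall
```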

With enough iterations, it is often possible to find a machine learning model that balances bias and variance, precision and recall.

