Classification Model Evaluation and Selection Summary


1. Measures for evaluating classifier performance

After a classification model has been built, its performance or accuracy needs to be assessed. The following table lists several classifier evaluation measures:

Assume the classifier is evaluated on a set of labeled tuples, where P denotes the number of positive tuples and N the number of negative tuples.

Measure                                             Formula
Accuracy, recognition rate                          (TP + TN) / (P + N)
Error rate, misclassification rate                  (FP + FN) / (P + N)
Sensitivity, true positive rate, recall             TP / P
Specificity, true negative rate                     TN / N
Precision                                           TP / (TP + FP)
F, F1, F-score (harmonic mean of
  precision and recall)                             2 * precision * recall / (precision + recall)
F_B, where B is a non-negative real number          (1 + B^2) * precision * recall / (B^2 * precision + recall)

Note: some measures go by more than one name. TP, TN, FP, FN, P, and N denote the numbers of true positives, true negatives, false positives, false negatives, positives, and negatives, respectively.

There are four terms to know first. They are the "building blocks" used to compute many evaluation measures, and understanding them makes the meaning of the various measures clear.

True positives (TP): the positive tuples that were correctly classified by the classifier. Let TP be the number of true positives.

True negatives (TN): the negative tuples that were correctly classified by the classifier. Let TN be the number of true negatives.

False positives (FP): the negative tuples that were incorrectly labeled as positive. Let FP be the number of false positives.

False negatives (FN): the positive tuples that were incorrectly labeled as negative. Let FN be the number of false negatives.

These terms are summarized in the confusion matrix:

Actual class \ Predicted class    Yes    No    Total
Yes                               TP     FN    P
No                                FP     TN    N
Total                             P'     N'    P + N
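
To make the measures above concrete, here is a minimal Python sketch (not part of the original text) that computes them from the four counts; the function name and the example counts are purely illustrative:

def classification_measures(tp, tn, fp, fn, beta=1.0):
    """Compute the evaluation measures from confusion-matrix counts."""
    p = tp + fn                      # number of actual positives, P
    n = tn + fp                      # number of actual negatives, N
    accuracy    = (tp + tn) / (p + n)
    error_rate  = (fp + fn) / (p + n)
    sensitivity = tp / p             # true positive rate, recall
    specificity = tn / n             # true negative rate
    precision   = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    f_beta = ((1 + beta ** 2) * precision * sensitivity
              / (beta ** 2 * precision + sensitivity))
    return {"accuracy": accuracy, "error rate": error_rate,
            "sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "F1": f1, "F_beta": f_beta}

# Illustrative counts only.
print(classification_measures(tp=90, tn=9560, fp=140, fn=210))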

In addition to accuracy-based measures, classifiers can also be compared on the following aspects:

Speed: the computational cost of building and using the classifier.

Robustness: the ability of the classifier to make correct predictions when the data are noisy or contain missing values. Robustness is typically assessed on a series of synthetic datasets with increasing levels of noise and missing values.

Scalability: the ability to construct the classifier efficiently as the amount of data grows. Scalability is typically assessed on a series of datasets of increasing size.

Interpretability: the level of understanding and insight the classifier or predictor provides. Interpretability is subjective and therefore hard to assess. Decision trees and classification rules tend to be easy to interpret, but their interpretability diminishes as they become more complex.

Summary: accuracy works best when the data classes are fairly evenly distributed. Other measures, such as sensitivity, specificity, precision, F, and F_B, are better suited to class-imbalance problems, where the class of interest is rare.

2. How to obtain reliable estimates of classifier accuracy

A. Holdout method and random subsampling

Holdout: the given data are randomly partitioned into two independent sets, a training set and a test set. The model is derived from the training set, and its accuracy is estimated on the test set. The estimate is pessimistic because only a portion of the initial data is used to derive the model.

Random subsampling: the holdout method is repeated k times, and the overall accuracy estimate is taken as the average of the accuracies obtained in the individual iterations.
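
As a rough sketch of both procedures, assuming scikit-learn is available and using a decision tree and the breast-cancer sample dataset purely for illustration:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Holdout: one random split into independent training and test sets
# (here, two thirds for training and one third for testing).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)
print("holdout accuracy:", DecisionTreeClassifier().fit(X_tr, y_tr).score(X_te, y_te))

# Random subsampling: repeat the holdout k times and average the accuracies.
k = 10
accs = []
for i in range(k):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=i)
    accs.append(DecisionTreeClassifier().fit(X_tr, y_tr).score(X_te, y_te))
print("random subsampling accuracy:", sum(accs) / k)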

B. Cross-validation

In k-fold cross-validation, the initial data are randomly partitioned into k mutually exclusive subsets or "folds" D1, D2, ..., Dk, each of approximately equal size. Training and testing are performed k times. In the i-th iteration, partition Di is used as the test set, and the remaining partitions are used together to train the model. For classification, the accuracy estimate is the total number of tuples correctly classified over the k iterations, divided by the total number of tuples in the initial data. 10-fold cross-validation is generally recommended for estimating accuracy because it has relatively low bias and variance.
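
A minimal sketch of this estimate, again assuming scikit-learn with a decision tree as the learning scheme and an illustrative dataset:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

k = 10
correct = 0
for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
    # In iteration i, fold Di is the test set; the remaining folds train the model.
    model = DecisionTreeClassifier().fit(X[train_idx], y[train_idx])
    correct += (model.predict(X[test_idx]) == y[test_idx]).sum()

# Correctly classified tuples over the k iterations, divided by the
# total number of tuples in the initial data.
print("10-fold cross-validation accuracy:", correct / len(y))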

C. Bootstrap

The bootstrap method samples the given training tuples uniformly with replacement.

One common variant is the .632 bootstrap, which works as follows: suppose the given dataset contains d tuples. The dataset is sampled d times with replacement, producing a bootstrap sample or training set of d tuples. Some of the original tuples may appear more than once in this sample. The tuples that do not make it into the training set form the test set. If this sampling is repeated many times, then on average 63.2% of the original tuples end up in the bootstrap sample, and the remaining 36.8% form the test set. (Each tuple has a probability of (1 - 1/d)^d ≈ e^-1 ≈ 0.368 of never being selected.)

The sampling procedure can be repeated k times. In each iteration, the accuracy of the model derived from the current bootstrap sample is estimated on the corresponding test set. The overall accuracy of the model is then estimated as Acc(M) = (1/k) * Σ_{i=1..k} (0.632 * Acc(Mi)_test_set + 0.368 * Acc(Mi)_train_set), where Acc(Mi)_test_set is the accuracy of the model derived from bootstrap sample i when applied to test set i, and Acc(Mi)_train_set is the accuracy of that model when applied to the original set of data tuples. The bootstrap method works well with small datasets.
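
The following sketch applies this formula directly; numpy and scikit-learn are assumed, and the decision tree and dataset are placeholders for an arbitrary learning scheme and data:

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
d = len(y)
rng = np.random.default_rng(0)

k = 100
acc_sum = 0.0
for i in range(k):
    boot = rng.integers(0, d, size=d)            # sample d tuples with replacement
    test = np.setdiff1d(np.arange(d), boot)      # tuples never drawn form the test set
    model = DecisionTreeClassifier().fit(X[boot], y[boot])
    acc_test = model.score(X[test], y[test])     # Acc(Mi)_test_set
    acc_train = model.score(X, y)                # Acc(Mi)_train_set, on the original data
    acc_sum += 0.632 * acc_test + 0.368 * acc_train

print(".632 bootstrap accuracy estimate:", acc_sum / k)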

D. Other Methods

Models can also be selected using statistical significance tests, and classifiers can be compared on the basis of cost-benefit analysis and ROC curves.

3. Techniques for Improving Classification Accuracy

A. Introduction to ensemble classification methods

An ensemble combines k learning models (base classifiers) M1, M2, ..., Mk to create an improved composite classification model M*. The given dataset D is used to create k training sets D1, D2, ..., Dk, where Di is used to build classifier Mi. Given a new tuple to classify, each base classifier votes by returning a class prediction, and the ensemble returns a class prediction based on the votes of its base classifiers. Common ensemble methods include bagging, boosting (AdaBoost), and random forests.

B. Bagging

Algorithm: bagging. The bagging algorithm creates an ensemble of classification models for a learning scheme, where each model gives an equally weighted prediction.

Input:

D: a set of d training tuples;

k: the number of models in the ensemble;

a learning scheme (e.g., the decision tree algorithm or backpropagation).

Output: the ensemble, a composite model M*.

Method:

for i = 1 to k do // create k models
    create a bootstrap sample Di by sampling D with replacement;
    use Di and the learning scheme to derive model Mi;
end for

To use the ensemble to classify a tuple X:
    let each of the k models classify X and return the majority vote;
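
A minimal Python sketch of this procedure, with numpy for the bootstrap sampling, a scikit-learn decision tree standing in for the learning scheme, and collections.Counter for the majority vote (function names are illustrative):

import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k=25, seed=0):
    """Build k models, each derived from a bootstrap sample Di of (X, y)."""
    rng = np.random.default_rng(seed)
    d = len(y)
    models = []
    for _ in range(k):
        idx = rng.integers(0, d, size=d)   # bootstrap sample Di (sampling D with replacement)
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, x):
    """Classify one tuple X by the majority vote of the k models."""
    votes = [m.predict(x.reshape(1, -1))[0] for m in models]
    return Counter(votes).most_common(1)[0][0]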

C. Boosting and AdaBoost

Algorithm: AdaBoost. A boosting algorithm that creates an ensemble of classifiers, each of which casts a weighted vote.

Input:

D: a set of d class-labeled training tuples;

k: the number of rounds (one classifier is generated per round);

a classification learning scheme.

Output: a composite model M*.

Method:

initialize the weight of each tuple in D to 1/d;
for i = 1 to k do // for each round
    (3) sample D with replacement according to the tuple weights to obtain Di;
    use training set Di to derive model Mi;
    compute the error rate of Mi: error(Mi) = Σ_{j=1..d} wj * err(Xj);
    if error(Mi) > 0.5 then
        go back to step (3) and try again;
    end if
    for each tuple in Di that was correctly classified do
        multiply the weight of the tuple by error(Mi) / (1 - error(Mi)); // update weights
    normalize the weight of each tuple;
end for

 

To use the ensemble to classify a tuple X:

initialize the weight of each class to 0;
for i = 1 to k do // for each classifier
    wi = log((1 - error(Mi)) / error(Mi)); // the voting weight of classifier Mi
    c = Mi(X); // get the class prediction for X from Mi
    add wi to the weight for class c;
end for
return the class with the largest weight;

 

To compute the error rate of model Mi, sum the weights of the tuples in Di that Mi misclassifies. That is, error(Mi) = Σ_{j=1..d} wj * err(Xj), where err(Xj) is the misclassification error of tuple Xj: err(Xj) is 1 if Xj is misclassified and 0 otherwise. If classifier Mi performs so poorly that its error rate exceeds 0.5, it is discarded and a new training set Di is generated, from which a new Mi is derived.

The error rate of Mi determines how the training tuple weights are updated. If a tuple was correctly classified in round i, its weight is multiplied by error(Mi) / (1 - error(Mi)). Once the weights of all correctly classified tuples have been updated, the weights of all tuples (including the misclassified ones) are normalized so that their sum remains the same as before. To normalize a weight, multiply it by the sum of the old weights and divide by the sum of the new weights. As a result, the weights of misclassified tuples increase while the weights of correctly classified tuples decrease.

  

Once boosting is complete, the ensemble classifies a tuple by giving each classifier a vote weighted by its accuracy: the lower a classifier's error rate, the more accurate it is, and the greater its voting weight. The voting weight of classifier Mi is log((1 - error(Mi)) / error(Mi)). For each class c, the weights of all classifiers that assigned class c to X are summed. The class with the largest total weight wins and is returned as the class prediction for tuple X.
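
Putting the training loop, the weight update, and the weighted voting together, here is a compact sketch in the same spirit as the pseudocode above (numpy and scikit-learn decision stumps are assumptions, and the error rate is computed on the whole set D as a simplification):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, k=20, seed=0):
    """Train k weak models; return them together with their voting weights."""
    rng = np.random.default_rng(seed)
    d = len(y)
    w = np.full(d, 1.0 / d)                               # initialize tuple weights to 1/d
    models, alphas = [], []
    for _ in range(k):
        while True:
            idx = rng.choice(d, size=d, replace=True, p=w)    # sample D by tuple weight
            m = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
            misclassified = (m.predict(X) != y)               # err(Xj) for each tuple
            error = np.sum(w * misclassified)                 # error(Mi)
            if error < 0.5:                                   # otherwise discard Mi and retry
                break
        error = max(error, 1e-10)                         # guard against a perfect classifier
        w[~misclassified] *= error / (1.0 - error)        # shrink weights of correct tuples
        w /= w.sum()                                      # normalize all tuple weights
        models.append(m)
        alphas.append(np.log((1.0 - error) / error))      # voting weight of Mi
    return models, alphas

def adaboost_predict(models, alphas, x):
    """Weighted vote: return the class with the largest accumulated weight."""
    class_weight = {}
    for m, a in zip(models, alphas):
        c = m.predict(x.reshape(1, -1))[0]
        class_weight[c] = class_weight.get(c, 0.0) + a
    return max(class_weight, key=class_weight.get)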

 
