Python algorithm walkthrough: the One Rule (OneR) algorithm

Last Update: 2017-05-17 Source: Internet

Author: User


Suppose a feature takes only the values 0 and 1, and the dataset has three classes. Among the individuals whose value for this feature is 0, say 20 belong to class A, 60 to class B, and 20 to class C. When the feature is 0, class B is therefore the most likely class. However, 40 of these 100 individuals are not in class B, so the rule "feature value 0 means class B" has an error rate of 40%. The same count is made for every value of every feature, the errors are summed per feature, and the feature with the lowest total error is selected as the single classification criterion. This is OneR.
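The arithmetic in the example above can be checked with a few lines of Python. This is only a sketch: the `labels` list is made-up data that mirrors the 20/60/20 split described in the text, and the variable names are illustrative.

```python
from collections import Counter

# Hypothetical class labels of the 100 individuals whose feature value is 0
labels = ["A"] * 20 + ["B"] * 60 + ["C"] * 20

counts = Counter(labels)
# The rule for this feature value predicts the most frequent class
most_frequent_class, hits = counts.most_common(1)[0]
# Every individual outside that class is a misclassification
errors = len(labels) - hits
error_rate = errors / len(labels)

print(most_frequent_class, errors, error_rate)
```

OneR repeats this count for each value of a feature and adds up the errors to score the feature as a whole.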

Now let's implement the algorithm in code.

# OneR algorithm implementation
from collections import defaultdict
from operator import itemgetter

import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Load the iris dataset
dataset = load_iris()
# Feature array of the iris dataset
X = dataset.data
# Target array (class labels) of the iris dataset
y_true = dataset.target
# Mean of each feature
attribute_means = X.mean(axis=0)
# Discretize the continuous features: 1 if the value is greater than or
# equal to the feature mean, otherwise 0
X = np.array(X >= attribute_means, dtype="int")

x_train, x_test, y_train, y_test = train_test_split(X, y_true, random_state=14)


# For one value of one feature, find the most frequent class and the
# number of samples with that value that belong to any other class
def train_feature_class(x, y_true, feature_index, feature_values):
    num_class = defaultdict(int)
    for sample, y in zip(x, y_true):
        if sample[feature_index] == feature_values:
            num_class[y] += 1
    # Sort by count in descending order so the most frequent class comes first
    sorted_num_class = sorted(num_class.items(), key=itemgetter(1), reverse=True)
    most_frequent_class = sorted_num_class[0][0]
    error = sum(value_num for class_num, value_num in sorted_num_class
                if class_num != most_frequent_class)
    return most_frequent_class, error

# print(train_feature_class(x_train, y_train, 0, 1))


# For one feature, find the predicted class of each feature value and the
# total number of errors made by these predictions
def train_feature(x, y_true, feature_index):
    n_sample, n_feature = x.shape
    assert 0 <= feature_index < n_feature
    value = set(x[:, feature_index])
    predictors = {}
    errors = []
    for current_value in value:
        most_frequent_class, error = train_feature_class(
            x, y_true, feature_index, current_value)
        predictors[current_value] = most_frequent_class
        errors.append(error)
    total_error = sum(errors)
    return predictors, total_error


# Train every feature. The result has the form {0: ({0: 0, 1: 2}, 41)}:
# a dictionary keyed by feature index whose values are tuples. Each tuple
# holds a dictionary (feature value -> predicted class) and the total
# error count for that feature.
all_predictors = {feature: train_feature(x_train, y_train, feature)
                  for feature in range(x_train.shape[1])}
# print(all_predictors)

# Extract the error count of each feature
errors = {feature: error for feature, (mapping, error) in all_predictors.items()}
# Sort by error count in ascending order to get the best feature and the
# lowest error. This is the One Rule (OneR) algorithm.
best_feature, best_error = sorted(errors.items(), key=itemgetter(1), reverse=False)[0]
# print("The best model is based on feature {0} and has error {1:.2f}".format(best_feature, best_error))
# print(all_predictors[best_feature][0])

# Create the model
model = {"feature": best_feature, "predictor": all_predictors[best_feature][0]}
# print(model)


# Test: classify each sample by its value for the best feature
def predict(x_test, model):
    feature = model["feature"]
    predictor = model["predictor"]
    y_predictor = np.array([predictor[int(sample[feature])] for sample in x_test])
    return y_predictor


y_predictor = predict(x_test, model)
# print(y_predictor)

# Compare the predictions with the test labels to obtain the accuracy
accuracy = np.mean(y_predictor == y_test) * 100
print("The test accuracy is {0:.2f}%".format(accuracy))

# print(classification_report(y_test, y_predictor))
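To see the whole procedure at a glance, here is a compact sketch of the same idea on a tiny made-up dataset, using only the standard library. The function name `one_r_fit` and the toy `rows` and `labels` are hypothetical and not part of the article's code; features are assumed to be already discrete.

```python
from collections import Counter, defaultdict


def one_r_fit(rows, labels):
    """Return (best_feature, rule, errors) for already-discrete features."""
    best = None
    for feature in range(len(rows[0])):
        # Count the classes separately for every value of this feature
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[feature]][label] += 1
        # Each feature value predicts its most frequent class
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors are the samples that fall outside the majority class
        errors = sum(sum(c.values()) - c.most_common(1)[0][1]
                     for c in counts.values())
        # Keep the feature with the fewest errors (first one wins on ties)
        if best is None or errors < best[2]:
            best = (feature, rule, errors)
    return best


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not
rows = [(0, 1), (0, 0), (1, 1), (1, 0), (1, 1), (0, 0)]
labels = ["a", "a", "b", "b", "b", "a"]

feature, rule, errors = one_r_fit(rows, labels)
print(feature, rule, errors)
```

Here feature 0 makes no errors while feature 1 makes two, so the single rule is built on feature 0.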

Conclusion: When I first looked at the OneR algorithm, I thought the feature with the lowest error rate could be used to determine the classification for all features. In fact, it can only classify samples by the values of that one feature, so it clearly has limitations. Its advantage is that it is quick and simple. Whether to use it still depends on the problem at hand.
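One concrete form of this limitation: with a binary feature, the rule has only two entries, so it can output at most two classes even when the data has three. The sketch below uses the rule format `{0: 0, 1: 2}` quoted in the code comments; the `feature_values` and `y_test` lists are made-up illustration data.

```python
# Rule in the format used by the article: feature value -> predicted class
predictor = {0: 0, 1: 2}

# Hypothetical values of the chosen feature and true classes of five samples
feature_values = [0, 1, 1, 0, 1]
y_test = [0, 1, 2, 0, 1]

y_pred = [predictor[value] for value in feature_values]

# Classes the rule can ever output, and true classes it can never reach
reachable = set(predictor.values())
never_predicted = set(y_test) - reachable

print(y_pred, never_predicted)
```

Every sample of a class in `never_predicted` is misclassified no matter how the rule was trained, which caps the accuracy such a rule can reach.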

Class        precision    recall  f1-score   support

0                 0.94      1.00      0.97        17
1                 0.00      0.00      0.00        13
2                 0.40      1.00      0.57         8

Avg/total         0.51      0.66      0.55        38
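The last row of the report is an average of the per-class scores weighted by support. The sketch below recomputes it. Note that the confusion counts are an assumption: they are not printed in the article, but are inferred from the precision, recall, and support values above (for example, a precision of 0.94 with 17 correct predictions implies 18 samples predicted as class 0).

```python
# Confusion counts inferred from the report (an assumption, not given in
# the article): rows are true classes, columns are predicted classes.
confusion = [
    [17, 0, 0],
    [1, 0, 12],
    [0, 0, 8],
]

n_classes = len(confusion)
support = [sum(row) for row in confusion]
predicted = [sum(confusion[r][c] for r in range(n_classes))
             for c in range(n_classes)]
total = sum(support)

weighted_precision = weighted_recall = weighted_f1 = 0.0
for k in range(n_classes):
    tp = confusion[k][k]
    # Guard against division by zero for classes that are never predicted
    precision = tp / predicted[k] if predicted[k] else 0.0
    recall = tp / support[k] if support[k] else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    weight = support[k] / total
    weighted_precision += weight * precision
    weighted_recall += weight * recall
    weighted_f1 += weight * f1

accuracy = sum(confusion[k][k] for k in range(n_classes)) / total

print(round(weighted_precision, 2), round(weighted_recall, 2),
      round(weighted_f1, 2), round(accuracy * 100, 2))
```

Under these assumed counts, class 1 is never predicted, which is why its precision, recall, and f1-score are all 0.00.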

Note:

# In the code above:
for sample in x_test:
    print(sample[0])
# The loop prints the first column of x_test, one value per sample.
# The following line prints the first row of x_test instead.
print(x_test[0])
# Note the difference between the two.
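The difference is easier to see on a small array. This sketch uses a made-up 3x4 array in place of the real `x_test`; the column slice `x_test[:, 0]` is added for comparison and gives the same values as the loop.

```python
import numpy as np

# Hypothetical stand-in for x_test: 3 samples (rows), 4 features (columns)
x_test = np.array([
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
])

# Iterating over a 2-D array yields rows, so sample[0] is the first
# feature of each sample, i.e. the first column
first_column_loop = [int(sample[0]) for sample in x_test]

# The same column taken with a slice
first_column_slice = x_test[:, 0]

# Indexing with a single integer returns the first row (one whole sample)
first_row = x_test[0]

print(first_column_loop, first_column_slice, first_row)
```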

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
