Python algorithm walkthrough -- the One Rule (OneR) algorithm

Source: Internet
Author: User

Suppose a (discretized) feature takes only the values 0 and 1, and the dataset has three categories. When the feature is 0, suppose 20 of those samples belong to category A, 60 to category B, and 20 to category C. A sample whose feature is 0 is therefore most likely in category B, but 40 of those samples are still not in category B, so the error rate of predicting category B for the value 0 is 40%. OneR counts this for every value of every feature, computes each feature's total error rate, and selects the feature with the lowest error rate as the sole classification criterion -- that is the One Rule (OneR) algorithm.
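The counting in this example can be sketched in a few lines. The category labels 'A', 'B', 'C' and the counts are just the numbers from the example above:

```python
# Hypothetical counts from the example: when the feature is 0,
# category A has 20 samples, B has 60, and C has 20.
counts = {'A': 20, 'B': 60, 'C': 20}

# The prediction for this feature value is the most frequent category...
most_frequent = max(counts, key=counts.get)

# ...and the error is every sample not in that category.
error = sum(n for cat, n in counts.items() if cat != most_frequent)
error_rate = error / sum(counts.values())

print(most_frequent, error, error_rate)  # B 40 0.4
```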

Now let's implement the algorithm in code.

# OneR algorithm implementation
import numpy as np
from sklearn.datasets import load_iris

# Load the Iris dataset
dataset = load_iris()
# Feature array (the characteristics of the dataset)
X = dataset.data
# Target array (the categories of the dataset)
y_true = dataset.target

# Compute the mean of each feature
attribute_means = X.mean(axis=0)
# Compare each value with its feature's mean: values >= mean become 1,
# values below become 0. This turns the continuous features into
# discrete, categorical ones.
X = np.array(X >= attribute_means, dtype='int')

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y_true, random_state=14)

from operator import itemgetter
from collections import defaultdict

# For one value of one feature, count which categories the matching
# samples belong to, and return the most frequent category together
# with the number of samples it gets wrong.
def train_feature_class(X, y_true, feature_index, feature_value):
    num_class = defaultdict(int)
    for sample, y in zip(X, y_true):
        if sample[feature_index] == feature_value:
            num_class[y] += 1
    # Sort to find the most frequent category, from largest to smallest
    sorted_num_class = sorted(num_class.items(), key=itemgetter(1), reverse=True)
    most_frequent_class = sorted_num_class[0][0]
    error = sum(value_num for class_num, value_num in sorted_num_class
                if class_num != most_frequent_class)
    return most_frequent_class, error

# print(train_feature_class(X_train, y_train, 0, 1))

# Next, define a function that takes a feature index and returns the
# predicted category for each of its values, plus the feature's total error.
def train_feature(X, y_true, feature_index):
    n_samples, n_features = X.shape
    assert 0 <= feature_index < n_features
    values = set(X[:, feature_index])
    predictors = {}
    errors = []
    for current_value in values:
        most_frequent_class, error = train_feature_class(
            X, y_true, feature_index, current_value)
        predictors[current_value] = most_frequent_class
        errors.append(error)
    total_error = sum(errors)
    return predictors, total_error

# Find the predicted category for each value of every feature, e.g.
# {0: ({0: 0, 1: 2}, 41)}: the outer key is the feature index, and the
# value is a tuple of a {feature value: category} dictionary and the
# feature's total error.
all_predictors = {feature: train_feature(X_train, y_train, feature)
                  for feature in range(X_train.shape[1])}
# print(all_predictors)

# Extract the error of each feature
errors = {feature: error for feature, (mapping, error) in all_predictors.items()}
# Sort by error and take the feature with the lowest error as the model
# and rule. This is the One Rule (OneR) algorithm.
best_feature, best_error = sorted(errors.items(), key=itemgetter(1))[0]
# print("The best model is based on feature {0} and has error {1:.2f}".format(best_feature, best_error))
# print(all_predictors[best_feature][0])

# Build the model
model = {'feature': best_feature,
         'predictor': all_predictors[best_feature][0]}
# print(model)

# Start testing: classify the test samples using only the best feature
def predict(X_test, model):
    feature = model['feature']
    predictor = model['predictor']
    y_predicted = np.array([predictor[int(sample[feature])] for sample in X_test])
    return y_predicted

y_predicted = predict(X_test, model)
# print(y_predicted)

# Compare the predictions under this best feature with the test labels
# to get the accuracy
accuracy = np.mean(y_predicted == y_test) * 100
print("The test accuracy is {0:.2f}%".format(accuracy))

from sklearn.metrics import classification_report
# print(classification_report(y_test, y_predicted))

Summary: at first I thought the OneR algorithm found a single lowest-error feature that could classify everything. In fact it classifies only by the values of that one feature, so it clearly has limitations. Its advantages are that it is fast, simple, and easy to interpret; whether to use it depends on the situation.

    class   precision   recall   f1-score   support

        0        0.94     1.00       0.97        17
        1        0.00     0.00       0.00        13
        2        0.40     1.00       0.57         8

avg/total        0.51     0.66       0.55        38
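As a side note, the precision and recall columns in such a report can be reproduced directly from the predictions. Here is a minimal sketch using made-up labels (not the actual Iris split):

```python
import numpy as np

# Made-up true and predicted labels for illustration only
y_test_demo = np.array([0, 0, 0, 1, 2, 2])
y_pred_demo = np.array([0, 0, 2, 0, 2, 2])

cls = 0
# True positives: predicted cls and actually cls
tp = np.sum((y_pred_demo == cls) & (y_test_demo == cls))
precision = tp / np.sum(y_pred_demo == cls)  # fraction of cls predictions that are right
recall = tp / np.sum(y_test_demo == cls)     # fraction of actual cls samples found
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, f1)
```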

Note:

In the code above,

    for sample in X_test:
        print(sample[0])

gets the first column of X_test, while the following gets the first row of X_test:

    print(X_test[0])

Note the difference between the two.
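The difference is easy to see on a small array (the values here are arbitrary):

```python
import numpy as np

# A small 3x2 array standing in for X_test
X = np.array([[1, 2],
              [3, 4],
              [5, 6]])

# Iterating over rows and taking sample[0] collects the first column
first_column = [sample[0] for sample in X]   # [1, 3, 5]

# Indexing the array itself gives the first row
first_row = X[0]                             # array([1, 2])

print(first_column, first_row)
```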
