Python algorithm walkthrough -- the One Rule (OneR) algorithm

Source: Internet
Author: User

Suppose a (discretized) feature takes only the values 0 and 1, and the dataset has three categories. When the feature is 0, suppose 20 of those samples belong to category A, 60 to category B, and 20 to category C. A sample whose feature is 0 is therefore most likely in category B, but 40 of those samples are still not in category B, so the error rate of predicting category B for the value 0 is 40%. OneR counts this for every value of every feature, computes each feature's total error rate, and selects the feature with the lowest error rate as the sole classification criterion -- that is the One Rule (OneR) algorithm.
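The counting in this example can be sketched in a few lines. The category labels 'A', 'B', 'C' and the counts are just the numbers from the example above:

```python
# Hypothetical counts from the example: when the feature is 0,
# category A has 20 samples, B has 60, and C has 20.
counts = {'A': 20, 'B': 60, 'C': 20}

# The prediction for this feature value is the most frequent category...
most_frequent = max(counts, key=counts.get)

# ...and the error is every sample not in that category.
error = sum(n for cat, n in counts.items() if cat != most_frequent)
error_rate = error / sum(counts.values())

print(most_frequent, error, error_rate)  # B 40 0.4
```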

Now let's implement the algorithm in code.

# OneR algorithm implementation
import numpy as np
from sklearn.datasets import load_iris

# Load the Iris dataset
dataset = load_iris()
# Feature array (the characteristics of the dataset)
X = dataset.data
# Target array (the categories of the dataset)
y_true = dataset.target

# Compute the mean of each feature
attribute_means = X.mean(axis=0)
# Compare each value with its feature's mean: values >= mean become 1,
# values below become 0. This turns the continuous features into
# discrete, categorical ones.
X = np.array(X >= attribute_means, dtype='int')

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y_true, random_state=14)

from operator import itemgetter
from collections import defaultdict

# For one value of one feature, count which categories the matching
# samples belong to, and return the most frequent category together
# with the number of samples it gets wrong.
def train_feature_class(X, y_true, feature_index, feature_value):
    num_class = defaultdict(int)
    for sample, y in zip(X, y_true):
        if sample[feature_index] == feature_value:
            num_class[y] += 1
    # Sort to find the most frequent category, from largest to smallest
    sorted_num_class = sorted(num_class.items(), key=itemgetter(1), reverse=True)
    most_frequent_class = sorted_num_class[0][0]
    error = sum(value_num for class_num, value_num in sorted_num_class
                if class_num != most_frequent_class)
    return most_frequent_class, error

# print(train_feature_class(X_train, y_train, 0, 1))

# Next, define a function that takes a feature index and returns the
# predicted category for each of its values, plus the feature's total error.
def train_feature(X, y_true, feature_index):
    n_samples, n_features = X.shape
    assert 0 <= feature_index < n_features
    values = set(X[:, feature_index])
    predictors = {}
    errors = []
    for current_value in values:
        most_frequent_class, error = train_feature_class(
            X, y_true, feature_index, current_value)
        predictors[current_value] = most_frequent_class
        errors.append(error)
    total_error = sum(errors)
    return predictors, total_error

# Find the predicted category for each value of every feature, e.g.
# {0: ({0: 0, 1: 2}, 41)}: the outer key is the feature index, and the
# value is a tuple of a {feature value: category} dictionary and the
# feature's total error.
all_predictors = {feature: train_feature(X_train, y_train, feature)
                  for feature in range(X_train.shape[1])}
# print(all_predictors)

# Extract the error of each feature
errors = {feature: error for feature, (mapping, error) in all_predictors.items()}
# Sort by error and take the feature with the lowest error as the model
# and rule. This is the One Rule (OneR) algorithm.
best_feature, best_error = sorted(errors.items(), key=itemgetter(1))[0]
# print("The best model is based on feature {0} and has error {1:.2f}".format(best_feature, best_error))
# print(all_predictors[best_feature][0])

# Build the model
model = {'feature': best_feature,
         'predictor': all_predictors[best_feature][0]}
# print(model)

# Start testing: classify the test samples using only the best feature
def predict(X_test, model):
    feature = model['feature']
    predictor = model['predictor']
    y_predicted = np.array([predictor[int(sample[feature])] for sample in X_test])
    return y_predicted

y_predicted = predict(X_test, model)
# print(y_predicted)

# Compare the predictions under this best feature with the test labels
# to get the accuracy
accuracy = np.mean(y_predicted == y_test) * 100
print("The test accuracy is {0:.2f}%".format(accuracy))

from sklearn.metrics import classification_report
# print(classification_report(y_test, y_predicted))

Summary: at first I thought the OneR algorithm found a single lowest-error feature that could classify everything. In fact it classifies only by the values of that one feature, so it clearly has limitations. Its advantages are that it is fast, simple, and easy to interpret; whether to use it depends on the situation.

    class   precision   recall   f1-score   support

        0        0.94     1.00       0.97        17
        1        0.00     0.00       0.00        13
        2        0.40     1.00       0.57         8

avg/total        0.51     0.66       0.55        38
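As a side note, the precision and recall columns in such a report can be reproduced directly from the predictions. Here is a minimal sketch using made-up labels (not the actual Iris split):

```python
import numpy as np

# Made-up true and predicted labels for illustration only
y_test_demo = np.array([0, 0, 0, 1, 2, 2])
y_pred_demo = np.array([0, 0, 2, 0, 2, 2])

cls = 0
# True positives: predicted cls and actually cls
tp = np.sum((y_pred_demo == cls) & (y_test_demo == cls))
precision = tp / np.sum(y_pred_demo == cls)  # fraction of cls predictions that are right
recall = tp / np.sum(y_test_demo == cls)     # fraction of actual cls samples found
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, f1)
```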

Note:

In the code above,

    for sample in X_test:
        print(sample[0])

gets the first column of X_test, while the following gets the first row of X_test:

    print(X_test[0])

Note the difference between the two.
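The difference is easy to see on a small array (the values here are arbitrary):

```python
import numpy as np

# A small 3x2 array standing in for X_test
X = np.array([[1, 2],
              [3, 4],
              [5, 6]])

# Iterating over rows and taking sample[0] collects the first column
first_column = [sample[0] for sample in X]   # [1, 3, 5]

# Indexing the array itself gives the first row
first_row = X[0]                             # array([1, 2])

print(first_column, first_row)
```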
