OneR Algorithm Introduction
OneR, also known as 1R, is an extremely simple classification algorithm proposed in 1993. It produces a one-level decision tree, that is, a set of rules that all test the same single attribute.
OneR is simple and cheap to run, yet it often yields rules that describe the structure of the data surprisingly well.
It is widely used to get a first, general understanding of a dataset, and sometimes its rules are good enough to be used directly as the final result.
OneR Algorithm Implementation
The idea behind OneR is very simple. It builds a rule set that tests a single attribute and branches on it, with one branch for each value of that attribute.
The class assigned to a branch is the class that occurs most often among the training records that fall into that branch.
Each attribute produces its own rule set, with one rule per value of the attribute. The error rate of each attribute's rule set is evaluated, and the attribute whose rule set performs best is selected.
Pseudocode:
For each attribute
    For each value of this attribute, create a rule as follows
        Count how often each class occurs
        Find the most frequent class
        Create a rule that assigns this class to this attribute value
    Calculate the error rate of the rules for this attribute
Select the attribute whose rules have the smallest error rate
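The pseudocode above can be sketched in plain Java, without WEKA. This is only a minimal sketch for nominal attributes: the class name OneRSketch, its Rule type, and the methods ruleFor and oneR are names made up for illustration. The rows are assumed to be the outlook, windy, and play columns of the weather data used in the example that follows, typed in by hand, and ties between attributes are resolved by simply keeping the earlier one.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class OneRSketch {
    /** A one-attribute rule set: which attribute, value -> class, and training errors. */
    static final class Rule {
        final int attribute;
        final Map<String, String> branches;
        final int errors;

        Rule(int attribute, Map<String, String> branches, int errors) {
            this.attribute = attribute;
            this.branches = branches;
            this.errors = errors;
        }
    }

    /** Builds the rules for one attribute: each value predicts its majority class. */
    static Rule ruleFor(String[][] rows, int attribute, int classIndex) {
        // attribute value -> (class -> count)
        Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
        for (String[] row : rows) {
            counts.computeIfAbsent(row[attribute], v -> new TreeMap<>())
                  .merge(row[classIndex], 1, Integer::sum);
        }
        Map<String, String> branches = new LinkedHashMap<>();
        int errors = 0;
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            String best = null;
            int bestCount = 0;
            int total = 0;
            for (Map.Entry<String, Integer> c : e.getValue().entrySet()) {
                total += c.getValue();
                if (c.getValue() > bestCount) {
                    best = c.getKey();
                    bestCount = c.getValue();
                }
            }
            branches.put(e.getKey(), best);
            errors += total - bestCount; // records that are not in the majority class
        }
        return new Rule(attribute, branches, errors);
    }

    /** Tries every non-class attribute and keeps the one with the fewest errors. */
    static Rule oneR(String[][] rows, int classIndex) {
        Rule best = null;
        for (int a = 0; a < rows[0].length; a++) {
            if (a == classIndex) continue;
            Rule candidate = ruleFor(rows, a, classIndex);
            // Strict comparison: on a tie the earlier attribute is kept (an arbitrary choice).
            if (best == null || candidate.errors < best.errors) best = candidate;
        }
        return best;
    }

    // Assumed data: columns are outlook, windy, play.
    static final String[][] WEATHER = {
        {"sunny", "FALSE", "no"},     {"sunny", "TRUE", "no"},
        {"overcast", "FALSE", "yes"}, {"rainy", "FALSE", "yes"},
        {"rainy", "FALSE", "yes"},    {"rainy", "TRUE", "no"},
        {"overcast", "TRUE", "yes"},  {"sunny", "FALSE", "no"},
        {"sunny", "FALSE", "yes"},    {"rainy", "FALSE", "yes"},
        {"sunny", "TRUE", "yes"},     {"overcast", "TRUE", "yes"},
        {"overcast", "FALSE", "yes"}, {"rainy", "TRUE", "no"}
    };

    public static void main(String[] args) {
        Rule r = oneR(WEATHER, 2);
        System.out.println("attribute " + r.attribute + ": " + r.branches
                + ", errors " + r.errors + "/" + WEATHER.length);
    }
}
```

On these rows the sketch selects attribute 0 (outlook) with 4 errors out of 14, against 5 for windy. Numeric attributes are left out on purpose, since they would have to be discretized first.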
As a simple example, take the weather dataset that ships with WEKA.
The dataset has five attributes. The last one, play, is the class we want to predict, so only four candidate attributes remain: outlook, temperature, humidity, and windy.
Start with the outlook attribute. It has three values: sunny, rainy, and overcast.
For the value sunny, there are five records in total.
In 3 of them play is no and in 2 of them play is yes, so the most frequent class is no. Therefore sunny is assigned no.
Similarly, for rainy there are five records.
In 3 of them play is yes and in 2 of them play is no, so the most frequent class is yes. Therefore rainy is assigned yes.
The same calculation for overcast assigns it yes.
Then calculate the error rate of each rule:
sunny -> no: three records are classified correctly and two wrongly, an error rate of 2/5 = 0.4.
rainy -> yes: the error rate is also 2/5 = 0.4.
overcast -> yes: the error rate is 0.
Total error for outlook: (2 + 2 + 0) / 14 = 4/14.
The temperature, humidity, and windy attributes are processed in the same way, computing the error of each rule and the total error of each attribute. The attribute with the smallest total error is then selected (if two attributes tie, choose one at random or prefer the one that seems more stable).
The final result is:
sunny -> no
rainy -> yes
overcast -> yes
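Once selected, the rule set is nothing more than a lookup table, so applying it takes one line. The following sketch, with made-up names (OutlookRule, predict, correct) and hand-typed outlook and play columns assumed to match the weather data, applies the three rules to the training records and counts how many are classified correctly.

```java
import java.util.Map;

class OutlookRule {
    // The three rules from the walkthrough (Map.of needs Java 9 or later).
    static final Map<String, String> RULE =
            Map.of("sunny", "no", "rainy", "yes", "overcast", "yes");

    /** Predicts play from the outlook value alone. */
    static String predict(String outlook) {
        return RULE.get(outlook);
    }

    /** Counts the records whose predicted class equals the actual class. */
    static int correct(String[] outlook, String[] play) {
        int hits = 0;
        for (int i = 0; i < outlook.length; i++) {
            if (predict(outlook[i]).equals(play[i])) hits++;
        }
        return hits;
    }

    // Assumed data: the outlook and play columns, in dataset order.
    static final String[] OUTLOOK = {
        "sunny", "sunny", "overcast", "rainy", "rainy", "rainy", "overcast",
        "sunny", "sunny", "rainy", "sunny", "overcast", "overcast", "rainy"
    };
    static final String[] PLAY = {
        "no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"
    };

    public static void main(String[] args) {
        System.out.println(correct(OUTLOOK, PLAY) + "/" + OUTLOOK.length + " correct");
    }
}
```

Ten of the fourteen records come out right, which is the complement of the 4/14 error computed for outlook.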
Using WEKA's OneR Implementation
WEKA already implements the OneR algorithm, in the weka.classifiers.rules package.
OneR accepts one parameter. If an attribute is continuous and has to be discretized, you can specify the minimum bucket size used for the discretization.
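import weka.classifiers.rules.OneR;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class OneRDemo {
    public static void main(String[] args) throws Exception {
        Instances instances = DataSource.read("data/weather.arff");
        instances.setClassIndex(instances.numAttributes() - 1); // last attribute is the class
        System.out.println(instances.toSummaryString());
        OneR oner = new OneR();
        oner.setDebug(false);
        oner.setMinBucketSize(6); // minimum bucket size for discretizing numeric attributes
        oner.buildClassifier(instances);
        System.out.println(oner.toString()); // prints the learned rule
    }
}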
Result:
If you set the bucket size to 1 instead, the result is very different.
The previous result is clearly the more useful one.
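One way to see why a bucket size of 1 behaves so differently is to look at how a numeric attribute gets cut into intervals. The sketch below is a simplified 1R-style discretizer, not WEKA's actual code: it sorts the records by value, closes an interval once its majority class has at least minBucket records and the next record has a different value and class, then merges neighbouring intervals that predict the same class. The names BucketSketch, majority, and cutPoints are invented here, and the temperature and play columns are assumed to match the weather data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BucketSketch {
    /** Most frequent class; ties go to the alphabetically first class (an arbitrary choice). */
    static String majority(Map<String, Integer> counts) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    /** Returns the cut points that split one numeric attribute into intervals. */
    static List<Double> cutPoints(double[] values, String[] classes, int minBucket) {
        // Sort record indices by attribute value (stable sort).
        Integer[] order = new Integer[values.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> values[i]));

        List<Double> cuts = new ArrayList<>();
        List<String> labels = new ArrayList<>();
        Map<String, Integer> counts = new TreeMap<>();
        for (int k = 0; k < order.length; k++) {
            int cur = order[k];
            counts.merge(classes[cur], 1, Integer::sum);
            if (k == order.length - 1) break;
            int next = order[k + 1];
            String maj = majority(counts);
            boolean full = counts.get(maj) >= minBucket;
            boolean valueChanges = values[next] != values[cur];
            boolean classChanges = !classes[next].equals(maj);
            if (full && valueChanges && classChanges) {
                cuts.add((values[cur] + values[next]) / 2); // cut halfway between the two values
                labels.add(maj);
                counts.clear();
            }
        }
        labels.add(majority(counts)); // label of the last interval

        // Drop cuts between neighbouring intervals that predict the same class.
        List<Double> merged = new ArrayList<>();
        for (int i = 0; i < cuts.size(); i++) {
            if (!labels.get(i).equals(labels.get(i + 1))) merged.add(cuts.get(i));
        }
        return merged;
    }

    // Assumed data: temperature and play columns, in dataset order.
    static final double[] TEMPERATURE = {85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71};
    static final String[] PLAY = {
        "no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"
    };

    public static void main(String[] args) {
        System.out.println("minBucket=1: " + cutPoints(TEMPERATURE, PLAY, 1));
        System.out.println("minBucket=6: " + cutPoints(TEMPERATURE, PLAY, 6));
    }
}
```

With a minimum bucket size of 1 the sketch cuts temperature into eight small intervals that mostly follow individual records, while a size of 6 leaves a single cut at 77.5. Rules built on many tiny intervals fit the training data closely but generalize poorly, which is the usual argument for a larger bucket size.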
References
For more information about OneR, see R. C. Holte (1993), "Very simple classification rules perform well on most commonly used datasets", Machine Learning 11, pp. 63-91.
Attached: http://www.ctdisk.com/file/6000694