OneR Algorithm and OneR Usage in WEKA (Data Mining and WEKA Usage, Part 4)

Source: Internet
Author: User
OneR Algorithm Introduction

OneR (also written 1R) is an extremely simple classification algorithm, introduced in 1993, that generates a one-level decision tree.

OneR is simple and cheap to compute, yet it often produces rules that describe the structure of the data surprisingly well.

It is widely used to get a first overview of a dataset, and sometimes its rules are good enough to use directly.

OneR Algorithm Implementation

The idea behind OneR is very simple: it builds a rule that tests a single attribute, with one branch for each value of that attribute.

Each branch predicts the class that occurs most often among the training (raw) data that reaches that branch.
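The branch rule is just a majority vote. Below is a minimal Java sketch of it, assuming the class labels of the training records that reach one branch have already been collected into a list. The class and method names are hypothetical, and ties go to the label seen first, which is an arbitrary choice rather than necessarily what WEKA does.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Picks the class that a single OneR branch should predict (hypothetical helper). */
final class BranchMajority {

    /** Returns the most frequent label; on a tie, the label seen first wins. */
    static String majorityClass(List<String> labels) {
        // LinkedHashMap keeps first-seen order, which makes tie-breaking deterministic
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            // Strict '>' keeps the earlier label when two counts are equal
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Class labels (play) of the five sunny records in the weather data
        List<String> sunny = List.of("no", "no", "no", "yes", "yes");
        System.out.println("sunny -> " + majorityClass(sunny));
    }
}
```

For the five sunny records of the weather data (three no, two yes) it predicts no.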

 

Each attribute yields its own rule set, with one rule for each value of that attribute. OneR computes the error rate of each attribute's rule set on the training data and selects the attribute whose rule set has the lowest error rate.

Pseudocode:

For each attribute:
    For each value of that attribute, make a rule as follows:
        Count how often each class appears
        Find the most frequent class
        Make the rule assign that class to this attribute value
    Calculate the error rate of this attribute's rules
Choose the attribute whose rules have the smallest error rate
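The pseudocode translates almost line for line into Java. This is a minimal sketch for nominal attributes only (no discretization of numeric values, no handling of missing values), assuming each row is a String array whose last element is the class label. OneRSketch and Rule are hypothetical names; this is not WEKA's implementation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Minimal OneR for nominal data: each row holds attribute values followed by the class label. */
final class OneRSketch {

    /** The chosen attribute, its value -> class rules, and its number of training errors. */
    static final class Rule {
        final int attribute;
        final Map<String, String> valueToClass;
        final int errors;

        Rule(int attribute, Map<String, String> valueToClass, int errors) {
            this.attribute = attribute;
            this.valueToClass = valueToClass;
            this.errors = errors;
        }
    }

    static Rule learn(List<String[]> rows) {
        int classIdx = rows.get(0).length - 1;             // the last column is the class
        Rule best = null;
        for (int a = 0; a < classIdx; a++) {                // for each attribute
            // For each value of attribute a, count how often each class appears
            Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
            for (String[] row : rows) {
                counts.computeIfAbsent(row[a], v -> new LinkedHashMap<>())
                      .merge(row[classIdx], 1, Integer::sum);
            }
            Map<String, String> rules = new LinkedHashMap<>();
            int errors = 0;
            for (Map.Entry<String, Map<String, Integer>> byValue : counts.entrySet()) {
                // Find the most frequent class (the first one seen wins a tie)
                String majority = null;
                int majorityCount = 0;
                int total = 0;
                for (Map.Entry<String, Integer> c : byValue.getValue().entrySet()) {
                    total += c.getValue();
                    if (c.getValue() > majorityCount) {
                        majority = c.getKey();
                        majorityCount = c.getValue();
                    }
                }
                rules.put(byValue.getKey(), majority);      // rule: value -> majority class
                errors += total - majorityCount;             // records this rule gets wrong
            }
            // Keep the attribute with the fewest errors (the earlier attribute wins a tie)
            if (best == null || errors < best.errors) {
                best = new Rule(a, rules, errors);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: outlook, humidity, play
        List<String[]> rows = List.of(
                new String[] {"sunny", "high", "no"},
                new String[] {"sunny", "normal", "yes"},
                new String[] {"overcast", "high", "yes"},
                new String[] {"rainy", "high", "no"},
                new String[] {"rainy", "normal", "yes"});
        Rule r = learn(rows);
        System.out.println("attribute " + r.attribute + ": " + r.valueToClass + ", errors = " + r.errors);
    }
}
```

On the hypothetical five-row toy data in main, the first attribute makes 2 errors and the second makes 1, so the sketch picks attribute 1 with the rules high -> no and normal -> yes.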

 

As a simple example, take the weather dataset that comes with WEKA.

It has five attributes. The last one, play, is the class we want to predict, so only four attributes are candidates for the rule: outlook, temperature, humidity, and windy.

Start with outlook. It has three values: sunny, rainy, and overcast.

For the value sunny, there are five records in total.

Three of them have play = no and two have play = yes, so the most frequent class is no and the rule assigns sunny -> no.

Similarly, there are five records for rainy.

Three of them have play = yes and two have play = no, so the most frequent class is yes and the rule assigns rainy -> yes.

Doing the same for overcast, all four overcast records have play = yes, so the rule assigns overcast -> yes.

 

Next, calculate the error rates:

sunny -> no: three records are classified correctly and two incorrectly, so the error is 2/5 = 0.4.

rainy -> yes: the error is also 2/5 = 0.4.

overcast -> yes: the error is 0.

The total error for outlook is therefore (2 + 2 + 0)/14 = 4/14.
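The arithmetic can be checked in a few lines of Java. This sketch hard-codes the play = no / play = yes counts for each outlook value (sunny 3/2, rainy 2/3, overcast 0/4, the overcast counts taken from the standard weather data). The class name is hypothetical and the code only handles two classes.

```java
/** Rules and error rates for outlook, computed from hard-coded class counts (hypothetical helper). */
final class OutlookError {

    /** Errors made by the majority-class rule for one value, given its no/yes counts. */
    static int errors(int no, int yes) {
        return Math.min(no, yes);   // the minority records are the misclassified ones
    }

    public static void main(String[] args) {
        String[] values = {"sunny", "rainy", "overcast"};
        // {play = no, play = yes} counts for each outlook value in the weather data
        int[][] counts = {{3, 2}, {2, 3}, {0, 4}};
        int totalErrors = 0;
        int totalRecords = 0;
        for (int i = 0; i < values.length; i++) {
            int no = counts[i][0];
            int yes = counts[i][1];
            String predicted = no > yes ? "no" : "yes";   // majority class (a tie would go to yes)
            int e = errors(no, yes);
            System.out.printf("%s -> %s, error %d/%d%n", values[i], predicted, e, no + yes);
            totalErrors += e;
            totalRecords += no + yes;
        }
        System.out.printf("outlook total error %d/%d%n", totalErrors, totalRecords);
    }
}
```

It prints each rule with its error (2/5, 2/5, 0/4) and a total of 4/14 for outlook.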

 

Repeat the same calculation for temperature, humidity, and windy, computing the error of each rule and the total error of each attribute. Then select the attribute with the smallest total error; if several attributes tie, the choice is arbitrary (or one can prefer the more stable rule).
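To see the comparison across all four attributes, the following Java sketch counts the total training error of each attribute on the 14 records of WEKA's nominal weather data (weather.nominal.arff), typed in by hand. WeatherOneR and totalErrors are hypothetical names. Outlook and humidity both come out at 4/14 while temperature and windy come out at 5/14, so this is a genuine tie; the sketch keeps the earlier attribute, outlook, which agrees with the outlook rule chosen in this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Total OneR training error of each attribute on the nominal weather data (hypothetical helper). */
final class WeatherOneR {

    static final String[] ATTRIBUTES = {"outlook", "temperature", "humidity", "windy"};

    // Columns: outlook, temperature, humidity, windy, play (the class)
    static final String[][] DATA = {
        {"sunny", "hot", "high", "FALSE", "no"},
        {"sunny", "hot", "high", "TRUE", "no"},
        {"overcast", "hot", "high", "FALSE", "yes"},
        {"rainy", "mild", "high", "FALSE", "yes"},
        {"rainy", "cool", "normal", "FALSE", "yes"},
        {"rainy", "cool", "normal", "TRUE", "no"},
        {"overcast", "cool", "normal", "TRUE", "yes"},
        {"sunny", "mild", "high", "FALSE", "no"},
        {"sunny", "cool", "normal", "FALSE", "yes"},
        {"rainy", "mild", "normal", "FALSE", "yes"},
        {"sunny", "mild", "normal", "TRUE", "yes"},
        {"overcast", "mild", "high", "TRUE", "yes"},
        {"overcast", "hot", "normal", "FALSE", "yes"},
        {"rainy", "mild", "high", "TRUE", "no"},
    };

    /** Records misclassified when each value of attribute a predicts its majority class. */
    static int totalErrors(int a) {
        Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
        for (String[] row : DATA) {
            counts.computeIfAbsent(row[a], v -> new LinkedHashMap<>())
                  .merge(row[4], 1, Integer::sum);
        }
        int errors = 0;
        for (Map<String, Integer> byClass : counts.values()) {
            int total = 0;
            int max = 0;
            for (int c : byClass.values()) {
                total += c;
                max = Math.max(max, c);
            }
            errors += total - max;   // every record outside the majority class is an error
        }
        return errors;
    }

    public static void main(String[] args) {
        int best = 0;
        int bestErrors = Integer.MAX_VALUE;
        for (int a = 0; a < ATTRIBUTES.length; a++) {
            int e = totalErrors(a);
            System.out.println(ATTRIBUTES[a] + ": " + e + "/" + DATA.length);
            if (e < bestErrors) {   // strict '<': on a tie the earlier attribute is kept
                best = a;
                bestErrors = e;
            }
        }
        System.out.println("chosen: " + ATTRIBUTES[best]);
    }
}
```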

The final result is:

outlook:
    sunny -> no
    rainy -> yes
    overcast -> yes

 

Using OneR in WEKA

WEKA already implements OneR as the class weka.classifiers.rules.OneR.

OneR takes one main parameter. Numeric (continuous) attributes have to be discretized into intervals before rules can be built, and the minimum bucket size (minBucketSize, default 6) sets the minimum number of instances in each interval.
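To illustrate what the bucket size does, here is a simplified Java sketch of 1R-style discretization, loosely in the spirit of Holte's 1R rather than a copy of Holte's or WEKA's exact procedure. The sorted values are grouped into intervals; an interval is closed only once its majority class has at least minBucketSize members and the next instance has a different class; equal values are never split; and neighbouring intervals that predict the same class are merged. BucketDiscretizer and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified 1R-style discretization of one numeric attribute (a sketch, not WEKA's source). */
final class BucketDiscretizer {

    /** An interval starting at lower that predicts the majority class. */
    static final class Interval {
        final double lower;
        final String majority;

        Interval(double lower, String majority) {
            this.lower = lower;
            this.majority = majority;
        }

        @Override
        public String toString() {
            return ">= " + lower + " -> " + majority;
        }
    }

    /** values[i] has class labels[i]; the arrays must already be sorted by value. */
    static List<Interval> discretize(double[] values, String[] labels, int minBucketSize) {
        List<Interval> raw = new ArrayList<>();
        int i = 0;
        while (i < values.length) {
            double lower = values[i];
            Map<String, Integer> counts = new LinkedHashMap<>();
            String majority = null;
            int majorityCount = 0;
            do {
                // Take all instances sharing the current value, so equal values are never split
                double v = values[i];
                while (i < values.length && values[i] == v) {
                    int c = counts.merge(labels[i], 1, Integer::sum);
                    if (c > majorityCount) {
                        majority = labels[i];
                        majorityCount = c;
                    }
                    i++;
                }
                // Keep growing until the majority class has minBucketSize members
                // and the next instance belongs to a different class
            } while (i < values.length
                    && (majorityCount < minBucketSize || labels[i].equals(majority)));
            raw.add(new Interval(lower, majority));
        }
        // Merge neighbouring intervals that predict the same class
        List<Interval> merged = new ArrayList<>();
        for (Interval interval : raw) {
            if (merged.isEmpty() || !merged.get(merged.size() - 1).majority.equals(interval.majority)) {
                merged.add(interval);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical sorted values and their classes
        double[] values = {1, 2, 3, 4, 5, 6};
        String[] labels = {"a", "a", "b", "a", "b", "b"};
        System.out.println("bucket 1: " + discretize(values, labels, 1));
        System.out.println("bucket 2: " + discretize(values, labels, 2));
    }
}
```

On the six hypothetical values in main, a bucket size of 1 yields four intervals, while a bucket size of 2 yields only two (>= 1.0 -> a and >= 3.0 -> b): larger buckets give coarser intervals that are less likely to overfit the training data.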

 
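import weka.classifiers.rules.OneR;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class OneRDemo {
    public static void main(String[] args) throws Exception {
        // Load the weather data and use the last attribute (play) as the class
        Instances instances = DataSource.read("data/weather.arff");
        instances.setClassIndex(instances.numAttributes() - 1);
        System.out.println(instances.toSummaryString());

        OneR oneR = new OneR();
        oneR.setDebug(false);
        oneR.setMinBucketSize(6);  // minimum bucket size used when discretizing numeric attributes
        oneR.buildClassifier(instances);
        System.out.println(oneR.toString());  // prints the learned rule
    }
}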

[Figure: OneR output with minBucketSize = 6]

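If you set the bucket size to 1 instead, the result is very different, because numeric attributes can then be cut into many tiny intervals that fit the training data closely.

The rule learned with bucket size 6 is clearly much more useful.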

References

For more information about OneR, see R. C. Holte (1993), "Very simple classification rules perform well on most commonly used datasets", Machine Learning 11, pp. 63-91.

Attached: http://www.ctdisk.com/file/6000694
