The OneR Algorithm and Using OneR in WEKA - Data Mining and WEKA Usage (4)

Last Update: 2018-12-06 Source: Internet

Author: User


OneR Algorithm Introduction

OneR, short for "one rule" (also written 1R), is an extremely simple classification algorithm published in 1993. It generates a one-level decision tree, that is, a set of rules that all test the same single attribute.

OneR is simple and computationally cheap, yet it often produces rules that describe the structure of the data very well.

It is widely used to get a first, rough understanding of a dataset, and sometimes its rules are good enough to be used directly as the result.

OneR Algorithm Implementation

The idea behind OneR is very simple. It builds a rule that tests a single attribute and branches on it, with one branch for each value of that attribute.

Each branch is assigned the class that occurs most often among the training instances that fall into that branch.

Each attribute therefore yields its own rule set, with one rule per attribute value. OneR evaluates the error rate of each attribute's rule set on the training data and selects the attribute whose rule set has the lowest error.
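The selection criterion can be written compactly. In the notation below (introduced here for illustration, not taken from Holte's paper), N is the number of training instances, n_v is the number whose attribute A has value v, and n_{v,c} is the number of those that also have class c. Each value's rule misclassifies every instance outside its majority class, and OneR keeps the attribute with the smallest total.

```latex
\mathrm{err}(A) = \frac{1}{N} \sum_{v \in \mathrm{values}(A)} \Bigl( n_v - \max_{c}\, n_{v,c} \Bigr),
\qquad
A^{*} = \operatorname*{arg\,min}_{A}\ \mathrm{err}(A)
```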

Pseudocode:

    For each attribute:
        For each value of that attribute, make a rule as follows:
            Count how often each class appears
            Find the most frequent class
            Make the rule assign that class to this attribute value
        Calculate the error rate of the rules
    Choose the attribute whose rules have the smallest error rate
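The pseudocode translates fairly directly into code. Below is a minimal Java sketch for nominal attributes only (numeric attributes would have to be discretized first). The names OneRSketch, Rule, train, and WEATHER are hypothetical and are not part of WEKA's API. Breaking ties by keeping the earlier attribute is an assumption of this sketch. The data is the nominal version of WEKA's weather dataset, typed in by hand.

```java
import java.util.Map;
import java.util.TreeMap;

public class OneRSketch {

    // Nominal weather data: outlook, temperature, humidity, windy, play (class last).
    static final String[][] WEATHER = {
        {"sunny", "hot", "high", "false", "no"},
        {"sunny", "hot", "high", "true", "no"},
        {"overcast", "hot", "high", "false", "yes"},
        {"rainy", "mild", "high", "false", "yes"},
        {"rainy", "cool", "normal", "false", "yes"},
        {"rainy", "cool", "normal", "true", "no"},
        {"overcast", "cool", "normal", "true", "yes"},
        {"sunny", "mild", "high", "false", "no"},
        {"sunny", "cool", "normal", "false", "yes"},
        {"rainy", "mild", "normal", "false", "yes"},
        {"sunny", "mild", "normal", "true", "yes"},
        {"overcast", "mild", "high", "true", "yes"},
        {"overcast", "hot", "normal", "false", "yes"},
        {"rainy", "mild", "high", "true", "no"},
    };

    // Hypothetical result type: chosen attribute, value -> class rules, training errors.
    static final class Rule {
        final int attribute;
        final Map<String, String> valueToClass;
        final int errors;

        Rule(int attribute, Map<String, String> valueToClass, int errors) {
            this.attribute = attribute;
            this.valueToClass = valueToClass;
            this.errors = errors;
        }
    }

    // OneR for nominal attributes; on a tie the earlier attribute is kept.
    static Rule train(String[][] rows) {
        int classCol = rows[0].length - 1;
        Rule best = null;
        for (int a = 0; a < classCol; a++) {
            // For each value of attribute a, count how often each class appears.
            Map<String, Map<String, Integer>> counts = new TreeMap<>();
            for (String[] row : rows) {
                counts.computeIfAbsent(row[a], v -> new TreeMap<>())
                      .merge(row[classCol], 1, Integer::sum);
            }
            Map<String, String> rules = new TreeMap<>();
            int errors = 0;
            for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
                // Find the most frequent class and make the rule value -> class.
                String majority = null;
                int majorityCount = 0;
                int total = 0;
                for (Map.Entry<String, Integer> c : e.getValue().entrySet()) {
                    total += c.getValue();
                    if (c.getValue() > majorityCount) {
                        majority = c.getKey();
                        majorityCount = c.getValue();
                    }
                }
                rules.put(e.getKey(), majority);
                errors += total - majorityCount; // instances this rule misclassifies
            }
            // Keep the attribute whose rules make the fewest errors.
            if (best == null || errors < best.errors) {
                best = new Rule(a, rules, errors);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Rule r = train(WEATHER);
        System.out.println("attribute " + r.attribute + ": " + r.valueToClass
                + ", errors " + r.errors + "/" + WEATHER.length);
    }
}
```

On this data the sketch selects attribute 0 (outlook) with 4 errors out of 14. Humidity also makes 4 errors, but the strict `<` comparison keeps the attribute that was seen first.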

As a simple example, take the weather dataset that comes with WEKA.

Each instance has five attributes. The last one, play, is the class we want to predict, so there are only four input attributes: outlook, temperature, humidity, and windy.

Start with the outlook attribute. It has three values: sunny, rainy, and overcast.

For the value sunny, there are five instances in total.

Three of them have play = no and two have play = yes. The most frequent class is no, so sunny is assigned no.

Similarly, there are five instances with the value rainy.

Three of them have play = yes and two have play = no. The most frequent class is yes, so rainy is assigned yes.

Doing the same for overcast gives yes, because all four overcast instances have play = yes.

Next, calculate the error rates:

    sunny -> no: 3 instances correct, 2 wrong, error 2/5 = 0.4
    rainy -> yes: error 2/5 = 0.4
    overcast -> yes: error 0/4 = 0

The total error for outlook is (2 + 2 + 0)/14 = 4/14.

The temperature, humidity, and windy attributes are handled the same way: compute the error of each rule and the total error of each attribute. Then select the attribute with the smallest total error. If several attributes tie, pick one of them arbitrarily (for example, at random) or prefer the one expected to be more stable.
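Assuming the nominal version of the weather data, the per-attribute totals work out as follows. Rule errors are listed in the order of the values in parentheses. For temperature = hot and windy = true, the instances split evenly between yes and no, so the error is the same whichever class is assigned. Outlook and humidity tie at 4/14, which is exactly the situation where a tie-break is needed; choosing outlook gives the rules sunny -> no, rainy -> yes, overcast -> yes.

```latex
\begin{aligned}
\text{outlook (sunny, overcast, rainy)} &: \tfrac{2 + 0 + 2}{14} = \tfrac{4}{14} \approx 0.29 \\
\text{temperature (hot, mild, cool)} &: \tfrac{2 + 2 + 1}{14} = \tfrac{5}{14} \approx 0.36 \\
\text{humidity (high, normal)} &: \tfrac{3 + 1}{14} = \tfrac{4}{14} \approx 0.29 \\
\text{windy (false, true)} &: \tfrac{2 + 3}{14} = \tfrac{5}{14} \approx 0.36
\end{aligned}
```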

The final result is:

    outlook:
        sunny -> no
        rainy -> yes
        overcast -> yes

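Using the OneR Algorithm in WEKA

WEKA already implements OneR as the class OneR in the weka.classifiers.rules package.

OneR takes one parameter that matters for numeric (continuous) attributes, which it discretizes into intervals before building rules. This parameter is the minimum bucket size, set with setMinBucketSize (the default is 6), and it controls the minimum number of instances per interval. Larger values produce fewer, coarser intervals.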

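```java
import weka.classifiers.rules.OneR;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class OneRExample {
    public static void main(String[] args) throws Exception {
        // Load the weather data; the last attribute (play) is the class
        Instances instances = DataSource.read("data/weather.arff");
        instances.setClassIndex(instances.numAttributes() - 1);
        System.out.println(instances.toSummaryString());

        // Build OneR; numeric attributes are discretized with a minimum bucket size of 6
        OneR oner = new OneR();
        oner.setDebug(false);
        oner.setMinBucketSize(6);
        oner.buildClassifier(instances);

        // Print the learned rule set
        System.out.println(oner.toString());
    }
}
```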

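Output: the program prints a summary of the dataset followed by the rule set learned by OneR.

If you set the minimum bucket size to 1, the result is very different.

The result with a bucket size of 6 is clearly the more useful one. With buckets of size 1, numeric attributes are cut into many tiny intervals that fit the training data closely but generalize poorly.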
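To illustrate what the bucket size controls, here is a simplified Java sketch in the spirit of Holte's 1R discretization. It is an illustrative assumption, not WEKA's actual implementation. Values are sorted, and a bucket is closed once its majority class has at least minBucketSize members and the next (larger) value belongs to a different class. Adjacent buckets that predict the same class are then merged. The names BucketSketch and cutPoints are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BucketSketch {

    // Majority class of a bucket; TreeMap order resolves ties alphabetically.
    static String majority(Map<String, Integer> counts) {
        String best = null;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (best == null || e.getValue() > counts.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }

    // Simplified 1R-style discretization (an assumption, not WEKA's code):
    // returns the cut points between intervals of one numeric attribute.
    static List<Double> cutPoints(double[] values, String[] labels, int minBucketSize) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (x, y) -> Double.compare(values[x], values[y]));

        List<Double> cuts = new ArrayList<>();
        Map<String, Integer> counts = new TreeMap<>();
        String prevMajority = null; // majority class of the previous closed bucket
        for (int k = 0; k < n; k++) {
            int i = order[k];
            counts.merge(labels[i], 1, Integer::sum);
            if (k == n - 1) break; // the last bucket is handled after the loop
            String maj = majority(counts);
            int next = order[k + 1];
            boolean bigEnough = counts.get(maj) >= minBucketSize;
            boolean canCut = values[next] > values[i] && !labels[next].equals(maj);
            if (bigEnough && canCut) {
                // Same class as the previous bucket: merge by dropping the old cut.
                if (maj.equals(prevMajority)) cuts.remove(cuts.size() - 1);
                cuts.add((values[i] + values[next]) / 2.0);
                prevMajority = maj;
                counts.clear();
            }
        }
        // Merge the final bucket into the previous one if they predict the same class.
        if (!counts.isEmpty() && majority(counts).equals(prevMajority)) {
            cuts.remove(cuts.size() - 1);
        }
        return cuts;
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 3, 4, 5, 6};
        String[] labels = {"a", "a", "b", "a", "a", "a"};
        System.out.println(cutPoints(values, labels, 1)); // [2.5, 3.5]
        System.out.println(cutPoints(values, labels, 2)); // []
    }
}
```

With the toy data in main, a minimum bucket size of 1 places cuts at 2.5 and 3.5 around the single b instance. A size of 2 absorbs that instance into one interval and returns no cuts. This mirrors why a tiny bucket size yields more fragmented rules.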

References

For more information about OneR, see R. C. Holte (1993), "Very simple classification rules perform well on most commonly used datasets," Machine Learning, 11, pp. 63-91.

Attached: http://www.ctdisk.com/file/6000694

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
