[Orange] Using Orange

Source: Internet
Author: User

http://blog.csdn.net/yiweis/article/category/1315006

I. Orange Data Format

In addition to C4.5 and other formats, the data mining tool Orange also has its own native data format.

Native Data Format

Unlike C4.5, whose data is spread across multiple files, Orange's native format uses a single file whose name ends with .tab.

The first line lists the attribute names, including the class attribute, separated by tabs.

The second line gives the data type of each attribute: c for continuous attributes and d for discrete ones.

The third line holds optional flags for each column: class marks the class attribute, and i (ignore) excludes a column from the mining process.

The following is the well-known Iris data set as an example:

sepal length	sepal width	petal length	petal width	iris
c	c	c	c	d
				class
5.1 3.5 1.4 0.2 iris-setosa
4.9 3.0 1.4 0.2 iris-setosa
4.7 3.2 1.3 0.2 iris-setosa
4.6 3.1 1.5 0.2 iris-setosa
5.0 3.6 1.4 0.2 iris-setosa
5.4 3.9 1.7 0.4 iris-setosa
4.6 3.4 1.4 0.3 iris-setosa
5.0 3.4 1.5 0.2 iris-setosa
4.4 2.9 1.4 0.2 iris-setosa
4.9 3.1 1.5 0.1 iris-setosa
5.4 3.7 1.5 0.2 iris-setosa
4.8 3.4 1.6 0.2 iris-setosa
4.8 3.0 1.4 0.1 iris-setosa
4.3 3.0 1.1 0.1 iris-setosa
5.8 4.0 1.2 0.2 iris-setosa
5.7 4.4 1.5 0.4 iris-setosa
5.4 3.9 1.3 0.4 iris-setosa
5.1 3.5 1.4 0.3 iris-setosa
5.7 3.8 1.7 0.3 iris-setosa

......
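The three header lines described above can be read with a few lines of standard-library Python. This is only a minimal sketch in plain Python 3, not Orange's own loader: the function name read_tab and the SAMPLE text are assumptions made for illustration, and the sketch ignores everything in the format beyond the three header lines.

```python
import csv
import io


def read_tab(text):
    """Parse Orange-style .tab text into (columns, rows).

    columns is a list of dicts with the keys name, type and flag,
    taken from the three header lines; rows holds the data lines.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    names = next(reader)   # line 1: attribute names
    types = next(reader)   # line 2: c (continuous) or d (discrete)
    flags = next(reader)   # line 3: optional flags such as class or i
    columns = []
    for idx, name in enumerate(names):
        columns.append({
            "name": name,
            "type": types[idx] if idx < len(types) else "",
            "flag": flags[idx] if idx < len(flags) else "",
        })
    rows = [row for row in reader if row]
    return columns, rows


# Hypothetical sample: the first two data rows of the Iris table above.
SAMPLE = (
    "sepal length\tsepal width\tpetal length\tpetal width\tiris\n"
    "c\tc\tc\tc\td\n"
    "\t\t\t\tclass\n"
    "5.1\t3.5\t1.4\t0.2\tiris-setosa\n"
    "4.9\t3.0\t1.4\t0.2\tiris-setosa\n"
)

columns, rows = read_tab(SAMPLE)
class_columns = [c["name"] for c in columns if c["flag"] == "class"]
print(class_columns)
print(len(rows))
```

Here class_columns picks out the column flagged as class in the third header line, which is how the class attribute is told apart from the ordinary attributes.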

For the C4.5 data format, see:

http://www.cs.washington.edu/dm/vfml/appendixes/c45.htm

 

II. Clustering

import Orange
# Load the data
data = Orange.data.Table("iris")
# Hierarchical clustering. By default, the distance between clusters is computed with average linkage.
root = Orange.clustering.hierarchical.clustering(data)
labels = [str(d.get_class()) for d in data]
# Write the generated dendrogram to hclust-dendrogram.png
Orange.clustering.hierarchical.dendrogram_draw("hclust-dendrogram.png", root, labels=labels)
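To make the average-linkage idea concrete, here is a minimal sketch in plain Python 3. It is not Orange's implementation: the function names are assumptions, the sketch stops at a fixed number of flat clusters instead of building a full tree, and the last three points are made-up values for a second group.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, points):
    """Mean distance over all pairs with one point in each cluster."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# (petal length, petal width): three setosa-like rows and three
# made-up rows standing in for a second group.
points = [(1.4, 0.2), (1.4, 0.2), (1.3, 0.2),
          (4.7, 1.4), (4.5, 1.5), (4.9, 1.5)]
print(agglomerate(points, 2))
```

Each cluster is a list of row indices, so the result shows which rows ended up together.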
A k-nearest-neighbor classifier (k = 10) can be trained on the same data:

import Orange
# Load the data
iris = Orange.data.Table("iris")
knn = Orange.classification.knn.kNNLearner(iris, k=10)
for i in iris:
    # Print only the instances whose prediction differs from the actual class
    if i.getclass() != knn(i):
        print i.getclass(), knn(i)
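The idea behind the k-nearest-neighbor prediction can be sketched in plain Python 3 as a majority vote among the closest training rows. This is an illustration under simple assumptions (Euclidean distance, unweighted votes), not a description of how Orange's learner works internally; knn_predict and the tiny train list are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Return the majority label among the k training rows nearest to query.

    train is a list of (features, label) pairs.
    """
    by_distance = sorted(
        train,
        key=lambda row: math.sqrt(
            sum((a - b) ** 2 for a, b in zip(row[0], query))
        ),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training rows: (petal length, petal width) and a label.
train = [
    ((1.4, 0.2), "setosa"), ((1.3, 0.2), "setosa"), ((1.5, 0.2), "setosa"),
    ((4.7, 1.4), "versicolor"), ((4.5, 1.5), "versicolor"),
    ((4.9, 1.5), "versicolor"),
]

# As in the Orange example, report only the rows whose prediction
# differs from the actual label.
mismatches = [
    (label, knn_predict(train, feats, 3))
    for feats, label in train
    if knn_predict(train, feats, 3) != label
]
print(mismatches)
```

Like the Orange example, this checks predictions on the training rows themselves, so it says little about accuracy on unseen data.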

 

III. C4.5 decision tree

Installing C4.5 for Orange

  1. Download http://www.rulequest.com/personal/c4.5r8.tar.gz and decompress it.
  2. Download ensemble.c and buildC45.py into the Src subfolder of the folder decompressed in the previous step.
  3. Run buildC45.py.
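Steps 1 and 2 above can be scripted once the archive and the two extra files have been downloaded. The following is a minimal sketch in plain Python 3; the function name prepare_c45_sources and the paths in the usage comment are assumptions, and the source subfolder is located case-insensitively because its exact spelling inside the archive is not confirmed here. Step 3 is still run by hand.

```python
import os
import shutil
import tarfile


def prepare_c45_sources(archive_path, extra_files, dest_dir):
    """Unpack the C4.5 archive and copy the extra files next to its sources.

    Returns the path of the source subfolder, or raises IOError if the
    archive contains no folder named src (in any letter case).
    Only use this on an archive you trust, since it extracts everything.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest_dir)
    for root, dirs, _ in os.walk(dest_dir):
        for name in dirs:
            if name.lower() == "src":
                src_dir = os.path.join(root, name)
                for path in extra_files:
                    shutil.copy(path, src_dir)
                return src_dir
    raise IOError("no src subfolder found under " + dest_dir)


# Hypothetical usage, assuming all three files are in the current folder:
# src = prepare_c45_sources("c4.5r8.tar.gz",
#                           ["ensemble.c", "buildC45.py"], "c45")
# Then change into src and run buildC45.py (step 3).
```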
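After the build succeeds, the C4.5 learner can be used as follows:

import Orange
iris = Orange.data.Table("iris")
tree = Orange.classification.tree.C45Learner(iris)
print "\n\nC4.5 with default arguments"
# Print the predicted and the actual class for the first five instances
for i in iris[:5]:
    print tree(i), i.getclass()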

 
