Decision Tree in Accord.net


Decision Tree Introduction

A decision tree is a machine learning algorithm that can be used to classify data and make predictions. For background, please see my other post: http://www.cnblogs.com/twocold/p/5424517.html.

Accord.net

Accord.net (http://accord-framework.net/) is an open-source machine learning library for the .NET environment. It also includes computer vision, image processing, data analysis, and many other algorithms, and it is written almost entirely in C#, which makes it very friendly to .NET programmers. The code is hosted on GitHub and is still actively maintained (https://github.com/accord-net/framework). I will not go into detail here; interested readers can download the documentation and code from the official site or GitHub. This post only briefly describes the implementation and usage of the decision tree part.
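Before any of the examples below will compile, the framework needs to be referenced. As a minimal sketch (based on the Accord.NET 3.x-era layout; namespace names may differ slightly between versions, so check the documentation for yours), the namespaces used in this post are roughly:

    // Namespaces used by the decision tree examples below (Accord.NET 3.x era;
    // names may differ slightly between versions).
    using System;
    using System.Data;                                    // DataTable
    using System.Linq;                                    // LINQ helpers used in the sketches
    using Accord.MachineLearning.DecisionTrees;           // DecisionTree, DecisionNode, DecisionVariable
    using Accord.MachineLearning.DecisionTrees.Learning;  // ID3Learning, C45Learning
    using Accord.Math;                                    // Matrix extension methods
    using Accord.Statistics.Filters;                      // Codification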

Decision tree structure

As the name implies, a decision tree is a tree structure. The tree is one of the most fundamental data structures, and we know how flexible it is. So how does Accord.net implement this structure?

First look at the most important piece of the tree, the node class DecisionNode. (The original post shows its class diagram; only the main properties are summarized here.)

Property     Meaning
IsLeaf       Whether this node is a leaf node
IsRoot       Whether this node is the root node
Output       The output (class) value stored at a leaf
Value        The attribute value this node's branch matches
Branches     The collection of child nodes under this node
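To make these members concrete, here is a minimal sketch (my own illustration, not code from Accord) that recursively prints a trained tree using the properties listed above:

    // Walk a trained tree and print its decision structure.
    static void PrintNode(DecisionNode node, int depth)
    {
        string indent = new string(' ', depth * 2);

        if (node.IsLeaf)
        {
            // Leaf nodes carry the predicted class in Output.
            Console.WriteLine("{0}-> class {1}", indent, node.Output);
            return;
        }

        // Internal nodes keep their children in Branches; each child stores
        // the attribute value it matches in Value.
        foreach (DecisionNode child in node.Branches)
        {
            Console.WriteLine("{0}if feature value == {1}:", indent, child.Value);
            PrintNode(child, depth + 1);
        }
    }

    // Usage, given the tree built at the end of this post:
    // PrintNode(tree.Root, 0);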

The tree class DecisionTree itself also has:

Property / Method   Meaning
Root                Root node
Attributes          Identifying information for each feature (continuous, discrete, range)
InputCount          Number of features
OutputClasses       Number of output categories
Compute()           Compute the class of a given sample
Load(), Save()      Save the decision tree to a file, or load it back
ToAssembly()        Compile the tree into a DLL assembly
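As a quick hedged illustration of Compute(), Save(), and Load() (the file name and variable names are mine, and the exact overloads can vary between Accord versions):

    // Persist a trained tree and load it back later.
    tree.Save("tennis-tree.bin");                    // hypothetical file name
    DecisionTree restored = DecisionTree.Load("tennis-tree.bin");

    // Compute() maps an encoded sample (one integer per feature) to a class
    // index; the codebook introduced below translates strings to those
    // integers and back.
    int predicted = restored.Compute(codebook.Translate("Sunny", "Hot", "High", "Strong"));
    Console.WriteLine(codebook.Translate("PlayTennis", predicted));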

There are other dependent classes that will not be introduced here; they are explained more clearly in Accord's official documentation.

The classes I mainly want to talk about are ID3Learning and C45Learning. These are Accord.net's implementations of two decision tree learning (training) algorithms: the ID3 algorithm and the C4.5 algorithm (ID3 is short for Iterative Dichotomiser 3; the C in C4.5 stands for classifier, i.e. the 4.5th-generation classifier). The differences between the two are described later.
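As a brief hedged sketch of how the two learners differ in use (the variable names are mine; the complete, working ID3 example appears at the end of this post): ID3Learning expects integer-coded discrete features, while C45Learning also accepts real-valued inputs:

    // ID3: discrete features only, encoded as integers by the codebook.
    var id3 = new ID3Learning(tree);
    id3.Run(discreteInputs, outputs);      // discreteInputs: int[][]

    // C4.5: also handles continuous features, passed as doubles.
    var c45 = new C45Learning(tree);
    c45.Run(continuousInputs, outputs);    // continuousInputs: double[][]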

Decision Tree Learning algorithm:

Below, the classic play-tennis example is used to introduce the learning process of the ID3 algorithm. Understanding the code that follows requires a basic grasp of how a decision tree is learned; you can refer to the post linked at the beginning for the basic concepts.

Mitchell's Tennis Example

Day   Outlook    Temperature   Humidity   Wind     PlayTennis
D1    Sunny      Hot           High       Weak     No
D2    Sunny      Hot           High       Strong   No
D3    Overcast   Hot           High       Weak     Yes
D4    Rain       Mild          High       Weak     Yes
D5    Rain       Cool          Normal     Weak     Yes
D6    Rain       Cool          Normal     Strong   No
D7    Overcast   Cool          Normal     Strong   Yes
D8    Sunny      Mild          High       Weak     No
D9    Sunny      Cool          Normal     Weak     Yes
D10   Rain       Mild          Normal     Weak     Yes
D11   Sunny      Mild          Normal     Strong   Yes
D12   Overcast   Mild          High       Strong   Yes
D13   Overcast   Hot           Normal     Weak     Yes
D14   Rain       Mild          High       Strong   No

First of all, in order to construct the decision tree, we need to simplify the data above: storing and comparing strings would waste memory and reduce efficiency. Since all the features here are discrete, the simplest integer representation is enough, as long as the mapping between the numbers and the strings is kept. Accord.net implements this with a Codification codebook, which is not described in detail here. Some properties of the tree also need to be initialized, such as the number of features (InputCount), the number of categories (OutputClasses), and the number of possible values of each feature. The tree can then be trained on the codebook-encoded sample data.
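As a small hedged sketch of what the codebook does (the column and value strings come from the table above; the Translate overloads follow the usage in the full example at the end of this post):

    // Codification maps every distinct string in a column to an integer code.
    Codification codebook = new Codification(data,
        "Outlook", "Temperature", "Humidity", "Wind", "PlayTennis");

    int sunny = codebook.Translate("Outlook", "Sunny");    // e.g. 0
    string name = codebook.Translate("Outlook", sunny);    // back to "Sunny"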

The following is pseudo-code for the recursive method at the heart of the ID3 algorithm, roughly explaining its logic. (Note: many details have been removed, so this code cannot run as-is; it only outlines the implementation.)

    /// <summary>
    ///   Decision tree learning, recursive method
    /// </summary>
    /// <param name="root">Current recursion node</param>
    /// <param name="input">Input sample features</param>
    /// <param name="output">Class labels of the samples</param>
    /// <param name="height">Depth of the current node</param>
    private void split(DecisionNode root, int[][] input, int[] output, int height)
    {
        // Stopping condition 1: if all values in output[] are equal, i.e. all
        // remaining samples belong to the same class, recursion ends. The node
        // becomes a leaf whose output is that class value.
        double entropy = Statistics.Tools.Entropy(output, outputClasses);

        if (entropy == 0)
        {
            if (output.Length > 0)
                root.Output = output[0];
            return;
        }

        // Stopping condition 2: if every feature on the current path has
        // already been used once, the samples can no longer be divided;
        // recursion ends. The node becomes a leaf whose output is the most
        // frequent class among the samples.

        // Number of features that have not been used yet.
        int candidateCount = attributeUsageCount.Count(x => x < 1);

        if (candidateCount == 0)
        {
            root.Output = Statistics.Tools.Mode(output);
            return;
        }

        // Otherwise, look for the best feature to split on.
        // scores stores the information gain of each remaining feature.
        double[] scores = new double[candidateCount];

        // Compute, in parallel, the information gain of splitting on each feature.
        Parallel.For(0, scores.Length, i =>
        {
            scores[i] = computeGainRatio(input, output, candidates[i],
                entropy, out partitions[i], out outputSubs[i]);
        });

        // Pick the feature with the maximum information gain. (The original
        // snippet wrote scores.Max() here; what is needed is the index of the
        // maximum.)
        int maxGainIndex = Array.IndexOf(scores, scores.Max());

        // Split the current data set by the values of that feature and pass
        // each subset to a child node recursively.
        DecisionNode[] children = new DecisionNode[maxGainPartition.Length];

        for (int i = 0; i < children.Length; i++)
        {
            int[][] inputSubset = input.Submatrix(maxGainPartition[i]);
            split(children[i], inputSubset, outputSubset, height + 1); // recurse into each child
        }

        root.Branches.AddRange(children);
    }

This code is only meant to aid understanding; for the actual implementation details, please download and read the Accord source code. I believe you will get a lot out of it.

The implementation of C4.5 is basically the same as the ID3 algorithm, with a few differences:

1) When selecting the optimal splitting feature, the ID3 algorithm uses the information gain, while C4.5 uses the gain ratio (a sketch follows this list).

2) C4.5 supports continuous features. Before recursing, it sorts the values of a continuous feature and uses dichotomy to compute the n-1 candidate split points between adjacent values; each candidate point is then treated like a discrete variable, and the rest of the process matches ID3 (also illustrated in the sketch after this list). It is also because of this that a continuous feature can be used multiple times along a single path, while a discrete feature is used only once.

3) C4.5 supports handling missing values, but unfortunately Accord does not implement this feature.
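To make differences 1) and 2) concrete, here is a minimal hedged sketch (my own helper code, not Accord's implementation): GainRatio contrasts ID3's information gain with C4.5's gain ratio on a discrete feature, and CandidateThresholds shows the midpoint rule for continuous features.

    // Entropy of a label vector: -sum over classes of p * log2(p).
    // For the 14 PlayTennis samples above (9 Yes, 5 No) this is about 0.940 bits.
    static double Entropy(int[] labels, int classCount)
    {
        double e = 0;
        for (int c = 0; c < classCount; c++)
        {
            double p = labels.Count(y => y == c) / (double)labels.Length;
            if (p > 0) e -= p * Math.Log(p, 2);
        }
        return e;
    }

    // Difference 1): ID3 scores a split by information gain; C4.5 divides that
    // gain by the split information, penalizing features with many distinct
    // values (such as "Day").
    static double GainRatio(int[][] x, int[] y, int feature, int classCount)
    {
        double gain = Entropy(y, classCount);  // start from the class entropy
        double splitInfo = 0;                  // "intrinsic value" of the split

        // Group the sample indices by their value of the chosen feature.
        foreach (var group in Enumerable.Range(0, y.Length)
                                        .GroupBy(i => x[i][feature]))
        {
            double w = group.Count() / (double)y.Length;
            gain -= w * Entropy(group.Select(i => y[i]).ToArray(), classCount);
            splitInfo -= w * Math.Log(w, 2);
        }

        return splitInfo > 0 ? gain / splitInfo : 0;  // ID3 would return 'gain'
    }

    // Difference 2): for a continuous feature, the n distinct sorted values
    // yield n-1 candidate thresholds, the midpoints between adjacent values.
    // Example: { 64, 65, 68, 69 } -> { 64.5, 66.5, 68.5 }
    static double[] CandidateThresholds(double[] values)
    {
        double[] sorted = values.Distinct().OrderBy(v => v).ToArray();
        double[] thresholds = new double[Math.Max(sorted.Length - 1, 0)];
        for (int i = 0; i < thresholds.Length; i++)
            thresholds[i] = (sorted[i] + sorted[i + 1]) / 2.0;
        return thresholds;
    }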

Accord.net also includes a simple pruning algorithm; interested readers can study it on their own.

Continuing the example above, here is the code for constructing and training a decision tree in Accord.net.

    // The input data is stored as a DataTable.
    DataTable data = new DataTable("Mitchell's Tennis Example");
    data.Columns.Add("Day");
    data.Columns.Add("Outlook");
    data.Columns.Add("Temperature");
    data.Columns.Add("Humidity");
    data.Columns.Add("Wind");
    data.Columns.Add("PlayTennis");

    data.Rows.Add("D1",  "Sunny",    "Hot",  "High",   "Weak",   "No");
    data.Rows.Add("D2",  "Sunny",    "Hot",  "High",   "Strong", "No");
    data.Rows.Add("D3",  "Overcast", "Hot",  "High",   "Weak",   "Yes");
    data.Rows.Add("D4",  "Rain",     "Mild", "High",   "Weak",   "Yes");
    data.Rows.Add("D5",  "Rain",     "Cool", "Normal", "Weak",   "Yes");
    data.Rows.Add("D6",  "Rain",     "Cool", "Normal", "Strong", "No");
    data.Rows.Add("D7",  "Overcast", "Cool", "Normal", "Strong", "Yes");
    data.Rows.Add("D8",  "Sunny",    "Mild", "High",   "Weak",   "No");
    data.Rows.Add("D9",  "Sunny",    "Cool", "Normal", "Weak",   "Yes");
    data.Rows.Add("D10", "Rain",     "Mild", "Normal", "Weak",   "Yes");
    data.Rows.Add("D11", "Sunny",    "Mild", "Normal", "Strong", "Yes");
    data.Rows.Add("D12", "Overcast", "Mild", "High",   "Strong", "Yes");
    data.Rows.Add("D13", "Overcast", "Hot",  "Normal", "Weak",   "Yes");
    data.Rows.Add("D14", "Rain",     "Mild", "High",   "Strong", "No");

    // Create a codebook that "translates" the strings in data into integers.
    Codification codebook = new Codification(data,
        "Outlook", "Temperature", "Humidity", "Wind", "PlayTennis");

    // Convert the feature columns and the class column of data into arrays.
    DataTable symbols = codebook.Apply(data);
    int[][] inputs = Matrix.ToArray<int>(symbols,
        "Outlook", "Temperature", "Humidity", "Wind");
    int[] outputs = Matrix.ToArray<int>(symbols, "PlayTennis");

    // Gather information about each feature, such as how many values it can take.
    DecisionVariable[] attributes = DecisionVariable.FromCodebook(codebook,
        "Outlook", "Temperature", "Humidity", "Wind");
    int classCount = 2; // two possible outputs: playing tennis or not

    // Initialize the tree structure from these parameters.
    DecisionTree tree = new DecisionTree(attributes, classCount);

    // Create the ID3 training method and train the decision tree.
    ID3Learning id3learning = new ID3Learning(tree);
    id3learning.Run(inputs, outputs);

    // The finished tree can now classify a sample; the codebook "translates"
    // the answer back into a string (for this query the tree answers "No").
    string answer = codebook.Translate("PlayTennis",
        tree.Compute(codebook.Translate("Sunny", "Hot", "High", "Strong")));

This concludes a small example of using a decision tree.
