Decision Tree in Accord.net
Decision Tree Introduction
A decision tree is a kind of machine learning algorithm that can classify and predict data sets. Please read my other blog post (http://www.cnblogs.com/twocold/p/5424517.html) for details.
Accord.net
Accord.NET (http://accord-framework.net/) is an open-source library that implements machine learning algorithms for the .NET environment. It also includes computer vision, image processing, data analysis and many other algorithms, and is mostly written in C#, which makes it very friendly to .NET programmers. The code is hosted on GitHub and is still maintained (https://github.com/accord-net/framework). No details are given here; interested readers can go to the official website or GitHub to download the documentation and code for an in-depth look. Only the implementation and usage of the decision tree part are briefly described here.
Decision tree structure
The decision tree, as the name implies, is a tree structure. As one of the most basic data structures, trees are known for their flexibility. So how does Accord.NET implement this structure? See the class diagrams below.
First, look at the most important structure in the tree: the node class, whose class diagram is as follows.

A brief introduction to its main properties and methods:

| Property | Meaning |
|----------|---------|
| IsLeaf | whether the node is a leaf |
| IsRoot | whether the node is the root node |
| Output | the class value stored at the node (set when it is a leaf) |
| Value | the value of the parent's split feature that leads to this node |
| Branches | the collection of child nodes |
The tree structure itself has the following members:

| Property / Method | Meaning |
|-------------------|---------|
| Root | the root node |
| Attributes | identifying information for each feature (continuous, discrete, range) |
| InputCount | number of features |
| OutputClasses | number of output classes |
| Compute() | computes the class of a given sample |
| Load(), Save() | writes the decision tree to a file, or reads it back |
| ToAssembly() | stores the tree to a DLL assembly |
Other dependent classes are not introduced here; they are explained more clearly in Accord's official documentation.
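To make the node structure concrete, here is a minimal sketch (not code taken from the library itself) of how a trained tree could be walked recursively, assuming only the DecisionNode members listed in the tables above:

```csharp
using System;
using Accord.MachineLearning.DecisionTrees;

static class TreePrinter
{
    // Recursively print a tree: inner nodes list the branch values,
    // leaves print the class stored in Output.
    public static void Print(DecisionNode node, int depth = 0)
    {
        string indent = new string(' ', depth * 2);
        if (node.IsLeaf)
        {
            Console.WriteLine(indent + "=> class " + node.Output);
            return;
        }
        foreach (DecisionNode child in node.Branches)
        {
            Console.WriteLine(indent + "value = " + child.Value);
            Print(child, depth + 1);
        }
    }
}
```

Calling `TreePrinter.Print(tree.Root)` after training would dump the learned rules as an indented list.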
The classes I mainly want to talk about are ID3Learning and C45Learning. These are Accord.NET's implementations of two decision tree learning (training) algorithms: the ID3 algorithm and the C4.5 algorithm (ID3 is short for Iterative Dichotomiser 3; the C in C4.5 is short for classifier, making it the 4.5th-generation classifier). The differences between the two are described later.
Decision Tree Learning Algorithms
Below, the classic play-tennis example is used to introduce the learning process of the ID3 algorithm. Understanding the following code may require a basic grasp of how decision trees are learned; you can refer to the blog post linked at the beginning for the basic concepts.
Mitchell's Tennis Example:

| Day | Outlook | Temperature | Humidity | Wind | PlayTennis |
|-----|----------|-------------|----------|--------|------------|
| D1  | Sunny    | Hot  | High   | Weak   | No  |
| D2  | Sunny    | Hot  | High   | Strong | No  |
| D3  | Overcast | Hot  | High   | Weak   | Yes |
| D4  | Rain     | Mild | High   | Weak   | Yes |
| D5  | Rain     | Cool | Normal | Weak   | Yes |
| D6  | Rain     | Cool | Normal | Strong | No  |
| D7  | Overcast | Cool | Normal | Strong | Yes |
| D8  | Sunny    | Mild | High   | Weak   | No  |
| D9  | Sunny    | Cool | Normal | Weak   | Yes |
| D10 | Rain     | Mild | Normal | Weak   | Yes |
| D11 | Sunny    | Mild | Normal | Strong | Yes |
| D12 | Overcast | Mild | High   | Strong | Yes |
| D13 | Overcast | Hot  | Normal | Weak   | Yes |
| D14 | Rain     | Mild | High   | Strong | No  |
First of all, in order to construct the decision tree, we need to simplify the data above: storing and comparing strings consumes a lot of memory and reduces efficiency. Since all features here are discrete, we can simply represent them as integers, as long as the correspondence between each number and its string is saved. Accord.NET implements this with a codebook, which is not described in detail here. Some properties of the tree also need to be initialized, such as the number of features (InputCount) and the number of classes (OutputClasses), as well as the number of possible values for each feature. The sample data can then be built from the codebook-translated values above.
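As a rough illustration of the codebook idea (the complete, working usage appears in the full example at the end of this post), the codebook learns a per-column mapping between strings and integers and can translate in both directions:

```csharp
using Accord.Statistics.Filters;

// "data" is assumed to be the DataTable holding the tennis samples,
// as built in the full example below.
Codification codebook = new Codification(data,
    "Outlook", "Temperature", "Humidity", "Wind", "PlayTennis");

int code = codebook.Translate("Outlook", "Sunny");   // string -> integer code
string name = codebook.Translate("Outlook", code);   // integer code -> string
```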
The following is pseudo-code for the recursive method in the ID3 algorithm, roughly explaining its implementation logic. (Note: this code omits many details, so it cannot run; it only sketches the logic.)
```csharp
/// <summary>
/// Recursive method of decision tree learning
/// </summary>
/// <param name="root">current recursion node</param>
/// <param name="input">input sample features</param>
/// <param name="output">classes corresponding to the samples</param>
/// <param name="height">depth of the current node</param>
private void split(DecisionNode root, int[][] input, int[] output, int height)
{
    // Stopping condition 1: all values in output[] are equal, i.e. every
    // remaining sample belongs to the same class. The recursion ends, the node
    // is marked as a leaf, and its Output is set to that class value.
    double entropy = Statistics.Tools.Entropy(output, outputClasses);
    if (entropy == 0)
    {
        if (output.Length > 0)
            root.Output = output[0];
        return;
    }

    // Stopping condition 2: every feature on the current path has already been
    // used once, i.e. the remaining samples have identical values on all
    // features and cannot be divided further. The recursion ends, the node is
    // marked as a leaf, and its Output is set to the most frequent class value.
    // This variable holds the number of features not yet used:
    int candidateCount = attributeUsageCount.Count(x => x < 1);
    if (candidateCount == 0)
    {
        root.Output = Statistics.Tools.Mode(output);
        return;
    }

    // If splitting must continue, first look for the optimal split feature.
    // Store the information gain of every remaining feature:
    double[] scores = new double[candidateCount];

    // Compute in parallel the information gain of splitting on each feature:
    Parallel.For(0, scores.Length, i =>
    {
        scores[i] = computeGainRatio(input, output, candidates[i],
            entropy, out partitions[i], out outputSubs[i]);
    });

    // Select the feature with the largest information gain:
    int maxGainIndex; scores.Max(out maxGainIndex);

    // Split the current data set by the values of that feature and recurse
    // into each child node:
    DecisionNode[] children = new DecisionNode[maxGainPartition.Length];
    for (int i = 0; i < children.Length; i++)
    {
        int[][] inputSubset = input.Submatrix(maxGainPartition[i]);
        split(children[i], inputSubset, outputSubs[i], height + 1); // recurse into each child
    }
    root.Branches.AddRange(children);
}
```
This code is only meant to aid understanding; for the actual implementation details, please download the Accord source code and read it. I believe you will gain a lot from it.
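For intuition about the entropy test at the top of the recursion, here is a tiny standalone computation (plain C#, no Accord calls) of the entropy of the 14 tennis samples above, which contain 9 "Yes" and 5 "No" labels:

```csharp
using System;

class EntropyDemo
{
    static void Main()
    {
        // H = -sum over classes of p_i * log2(p_i), using class proportions.
        double pYes = 9.0 / 14, pNo = 5.0 / 14;
        double entropy = -pYes * Math.Log(pYes, 2) - pNo * Math.Log(pNo, 2);
        Console.WriteLine(entropy); // ≈ 0.940; it would be 0 if all labels agreed
    }
}
```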
The implementation of C4.5 is basically the same as that of the ID3 algorithm, with a few differences:

1) When selecting the optimal split feature, the ID3 algorithm uses the information gain, while C4.5 uses the gain ratio.

2) C4.5 supports continuous features: before recursing, it uses binary splitting to compute the n-1 candidate split points, and then treats these split points as if they were discrete values, after which the process is the same as in ID3. Because of this, a continuous feature can be used multiple times along a single path, whereas a discrete feature is used only once (see the sketch after this list).

3) C4.5 supports handling missing values, but unfortunately Accord does not include this feature.
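Here is a minimal sketch of how the C4.5 trainer is swapped in for ID3, assuming the same Accord API as the full ID3 example below; the continuous "Temperature" feature (measured in degrees rather than as a category) and the toy samples are hypothetical:

```csharp
using Accord.MachineLearning.DecisionTrees;
using Accord.MachineLearning.DecisionTrees.Learning;

class C45Sketch
{
    static void Main()
    {
        // "Outlook" stays discrete (3 symbols); "Temperature" is continuous,
        // which ID3Learning cannot handle but C45Learning can.
        DecisionVariable[] attributes =
        {
            new DecisionVariable("Outlook", 3),
            new DecisionVariable("Temperature", DecisionVariableKind.Continuous)
        };
        DecisionTree tree = new DecisionTree(attributes, 2);

        // Hypothetical toy samples: { outlook code, temperature in degrees }.
        double[][] inputs =
        {
            new double[] { 0, 30.5 },
            new double[] { 0, 18.0 },
            new double[] { 1, 22.3 },
            new double[] { 2, 15.8 }
        };
        int[] outputs = { 0, 1, 1, 0 };

        // Same training pattern as the ID3 example below, but C45Learning
        // chooses thresholds for the continuous feature via binary splits.
        C45Learning c45 = new C45Learning(tree);
        c45.Run(inputs, outputs);

        int predicted = tree.Compute(new double[] { 1, 20.0 });
    }
}
```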
Accord.NET also provides a simple pruning algorithm; interested readers can study it on their own.
Following the example above, here is a code sample for constructing and training a decision tree in Accord.NET.
```csharp
// The training data is stored in a DataTable:
DataTable data = new DataTable("Mitchell's Tennis Example");
data.Columns.Add("Day");
data.Columns.Add("Outlook");
data.Columns.Add("Temperature");
data.Columns.Add("Humidity");
data.Columns.Add("Wind");
data.Columns.Add("PlayTennis");

data.Rows.Add("D1", "Sunny", "Hot", "High", "Weak", "No");
data.Rows.Add("D2", "Sunny", "Hot", "High", "Strong", "No");
data.Rows.Add("D3", "Overcast", "Hot", "High", "Weak", "Yes");
data.Rows.Add("D4", "Rain", "Mild", "High", "Weak", "Yes");
data.Rows.Add("D5", "Rain", "Cool", "Normal", "Weak", "Yes");
data.Rows.Add("D6", "Rain", "Cool", "Normal", "Strong", "No");
data.Rows.Add("D7", "Overcast", "Cool", "Normal", "Strong", "Yes");
data.Rows.Add("D8", "Sunny", "Mild", "High", "Weak", "No");
data.Rows.Add("D9", "Sunny", "Cool", "Normal", "Weak", "Yes");
data.Rows.Add("D10", "Rain", "Mild", "Normal", "Weak", "Yes");
data.Rows.Add("D11", "Sunny", "Mild", "Normal", "Strong", "Yes");
data.Rows.Add("D12", "Overcast", "Mild", "High", "Strong", "Yes");
data.Rows.Add("D13", "Overcast", "Hot", "Normal", "Weak", "Yes");
data.Rows.Add("D14", "Rain", "Mild", "High", "Strong", "No");

// Create a codebook object used to "translate" the strings in data
// into integers:
Codification codebook = new Codification(data,
    "Outlook", "Temperature", "Humidity", "Wind", "PlayTennis");

// Convert the feature columns and the class column of data into arrays:
DataTable symbols = codebook.Apply(data);
int[][] inputs = symbols.ToArray<int>("Outlook", "Temperature", "Humidity", "Wind");
int[] outputs = symbols.ToArray<int>("PlayTennis");

// Gather information about each feature, e.g. how many values it can take:
DecisionVariable[] attributes = DecisionVariable.FromCodebook(codebook,
    "Outlook", "Temperature", "Humidity", "Wind");
int classCount = 2; // two possible outputs: playing tennis or not

// Initialize the tree structure from these parameters:
DecisionTree tree = new DecisionTree(attributes, classCount);

// Create the ID3 training method:
ID3Learning id3learning = new ID3Learning(tree);

// Train the decision tree:
id3learning.Run(inputs, outputs);

// The finished tree can now be used to predict a sample; use the codebook
// to "translate" the answer back into a string:
string answer = codebook.Translate("PlayTennis",
    tree.Compute(codebook.Translate("Sunny", "Hot", "High", "Strong")));
```
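For this query (a sunny, hot, humid day with strong wind, the same attribute values as training day D2), the learned tree should answer "No".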
This concludes the small example of using a decision tree.
Tags: machine learning, .NET