ML: Decision Tree Algorithm Implementation (Train + Test, MATLAB)


Author: Huadian North Wind Blows
Key Laboratory of Cognitive Computing and Application, Tianjin University
Last modified: 2015/8/15

A decision tree is a very simple machine learning classification algorithm. The idea comes from the human decision-making process. Take the simplest example: when people judge whether it is going to rain, they tend to look at whether an east wind is blowing and whether the sky is darkening. In the corresponding decision tree model, the wind and the darkness of the sky are the features we collect, and whether it rains is the class label. The corresponding decision tree is constructed as shown in the illustration below.

A decision tree is built by recursively choosing, from the remaining feature set (without replacement), the feature whose information gain or gain ratio is largest at the current node; each value of that feature then becomes an outgoing edge of the node (in fact it is mainly these edges that are being selected, which can be seen from the information gain formula). To explain this intuitively:
Consider an extreme case: if some feature is such that each of its values corresponds to a pure set of class labels, a decision maker will certainly choose that feature as the criterion for classifying unknown data. Using the information gain formula below, one can verify that the information gain is largest in this case.
g(D,A) = H(D) - H(D|A)
g(D,A): the information gain of feature A on the training data set D
H(D): the empirical entropy of data set D
H(D|A): the conditional entropy of data set D given feature A
Conversely, when the class labels corresponding to each value of a feature are evenly distributed, H(D|A) is at its largest, and since H(D) is the same for every feature, g(D,A) is then at its smallest.
In short, we must pick the feature whose values carry the most information about the classification of the current data.
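As a concrete illustration, here is a minimal MATLAB sketch of this computation for a toy feature. The variables a and d are made up for the example, and CEntropy is the entropy function defined later in this post:

% Toy data: feature a takes two values; the labels d are pure for the first value.
a  = [1;1;1;2;2;2];                 % feature values
d  = [0;0;0;1;1;0];                 % class labels
HD  = CEntropy(d);                  % empirical entropy H(D), about 0.918
HDA = 0;                            % conditional entropy H(D|A)
vals = unique(a);
for v = 1:length(vals)
    idx = find(a == vals(v));
    HDA = HDA + length(idx)/length(a) * CEntropy(d(idx));
end
gain = HD - HDA                     % information gain g(D,A), about 0.459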
Now let us look at a decision tree algorithm written in MATLAB to help understand this.
The tree stops growing when any of the following conditions holds:
1. the feature set is empty;
2. the node is pure (all samples at the node share the same label);
3. the information gain (or gain ratio) is below the threshold value.

I. Model Training
The main function for training the model:

function decisionTreeModel = decisionTree(data, label, propertyName, delta)
% Train a decision tree on data (one row per sample, one column per feature)
% with class labels label, feature names propertyName, and gain threshold delta.

global Node;

Node = struct('level', -1, 'fatherNodeName', [], 'EdgeProperty', [], 'NodeName', []);
BuildTree(-1, 'root', 'Stem', data, label, propertyName, delta);
Node(1) = [];
model.Node = Node;
decisionTreeModel = model;
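The returned model is just this flat Node array. As a small sketch, using only the field names from the code above, one way to inspect a trained model is:

% Each entry of decisionTreeModel.Node is one tree node:
%   level           depth in the tree (the root has level 0)
%   fatherNodeName  NodeName of the parent node ('root' for the top node)
%   EdgeProperty    value of the parent feature on the edge leading here
%   NodeName        a feature name (internal node) or a numeric class label (leaf)
nodes = decisionTreeModel.Node;
for k = 1:length(nodes)
    fprintf('level %d, parent %s, edge %s, node %s\n', nodes(k).level, ...
        num2str(nodes(k).fatherNodeName), num2str(nodes(k).EdgeProperty), ...
        num2str(nodes(k).NodeName));
end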

The recursive construction of the decision tree:

function BuildTree(fatherLevel, fatherNodeName, edge, data, label, propertyName, delta)
% Recursively grow the tree; every new node is appended to the global Node array.

global Node;

sonNode = struct('level', 0, 'fatherNodeName', [], 'EdgeProperty', [], 'NodeName', []);
sonNode.level = fatherLevel + 1;
sonNode.fatherNodeName = fatherNodeName;
sonNode.EdgeProperty = edge;

% Stop condition 2: the node is pure, so it becomes a leaf with that label.
if length(unique(label)) == 1
    sonNode.NodeName = label(1);
    Node = [Node sonNode];
    return;
end

% Stop condition 1: no features left, so the leaf takes the majority label.
if length(propertyName) < 1
    labelSet = unique(label);
    k = length(labelSet);
    labelNum = zeros(k, 1);
    for i = 1:k
        labelNum(i) = length(find(label == labelSet(i)));
    end
    [~, labelIndex] = max(labelNum);
    sonNode.NodeName = labelSet(labelIndex);
    Node = [Node sonNode];
    return;
end

% Choose the feature with the largest information gain.
[sonIndex, buildNode] = CalcuteNode(data, label, delta);
if buildNode
    dataRowIndex = setdiff(1:length(propertyName), sonIndex);
    sonNode.NodeName = propertyName{sonIndex};
    Node = [Node sonNode];
    propertyName(sonIndex) = [];
    sonData = data(:, sonIndex);
    sonEdge = unique(sonData);
    % Recurse on each value (edge) of the chosen feature.
    for i = 1:length(sonEdge)
        edgeDataIndex = find(sonData == sonEdge(i));
        BuildTree(sonNode.level, sonNode.NodeName, sonEdge(i), ...
            data(edgeDataIndex, dataRowIndex), label(edgeDataIndex, :), ...
            propertyName, delta);
    end
else
    % Stop condition 3: best gain is below delta, so take the majority label.
    labelSet = unique(label);
    k = length(labelSet);
    labelNum = zeros(k, 1);
    for i = 1:k
        labelNum(i) = length(find(label == labelSet(i)));
    end
    [~, labelIndex] = max(labelNum);
    sonNode.NodeName = labelSet(labelIndex);
    Node = [Node sonNode];
    return;
end

Computing the splitting feature for a node of the decision tree:

function [nodeIndex, buildNode] = CalcuteNode(data, label, delta)
% Return the index of the feature with the largest information gain, and a flag
% that is false when even that gain falls below the threshold delta.

largeEntropy = CEntropy(label);
[m, n] = size(data);
entropyGain = largeEntropy * ones(1, n);
buildNode = true;
for i = 1:n
    pData = data(:, i);
    itemList = unique(pData);
    for j = 1:length(itemList)
        itemIndex = find(pData == itemList(j));
        entropyGain(i) = entropyGain(i) - length(itemIndex)/m * CEntropy(label(itemIndex));
    end
    % Uncomment the next line to use the gain ratio instead of the information gain.
    % entropyGain(i) = entropyGain(i) / CEntropy(pData);
end
[maxGainEntropy, nodeIndex] = max(entropyGain);
if maxGainEntropy < delta
    buildNode = false;
end
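As a usage sketch, assuming the golf data, labels, and feature names from the experiment in section III are already in the workspace:

[nodeIndex, buildNode] = CalcuteNode(data, PlayGolf, 0.1);
propertyName{nodeIndex}     % for the golf data this should be 'Outlook'
buildNode                   % true here, since the best gain exceeds delta = 0.1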

Computing the entropy:

function result = CEntropy(propertyList)
% Empirical entropy (in bits) of the values in propertyList.

result = 0;
totalLength = length(propertyList);
itemList = unique(propertyList);
pNum = length(itemList);
for i = 1:pNum
    itemLength = length(find(propertyList == itemList(i)));
    pItem = itemLength / totalLength;
    result = result - pItem * log2(pItem);
end
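A couple of quick checks of CEntropy; the expected values follow directly from the entropy formula above:

CEntropy([0;0;1;1])         % evenly split labels: 1 bit
CEntropy([1;1;1;1])         % pure labels: 0
CEntropy([0;0;0;1;1;0])     % 4:2 split: about 0.918 bits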

II. Model Prediction
The following function takes a trained decision tree model, a test sample set, and the feature names, and predicts a label for each test sample.

function label = decisionTreeTest(decisionTreeModel, sampleSet, propertyName)
% Predict a class label for each row of sampleSet by walking the flat Node array.
lengthSample = size(sampleSet, 1);
label = zeros(lengthSample, 1);
for sampleIndex = 1:lengthSample
    sample = sampleSet(sampleIndex, :);
    Nodes = decisionTreeModel.Node;
    rootNode = Nodes(1);
    head = rootNode.NodeName;
    index = GetFeatureNum(propertyName, head);
    edge = sample(index);
    k = 1;
    level = 1;
    while k < length(Nodes)
        k = k + 1;
        if Nodes(k).level == level
            if strcmp(Nodes(k).fatherNodeName, head)
                if Nodes(k).EdgeProperty == edge
                    if Nodes(k).NodeName < 10   % numeric NodeName: a leaf label
                        label(sampleIndex) = Nodes(k).NodeName;
                        break;
                    else                        % a feature name: descend one level
                        head = Nodes(k).NodeName;
                        index = GetFeatureNum(propertyName, head);
                        edge = sample(index);
                        level = level + 1;
                    end
                end
            end
        end
    end
end

Because the trained decision tree model stores node names, prediction needs to map a feature name back to its column index. The following helper returns the ordinal number of a feature dimension.

function result = GetFeatureNum(propertyName, str)
% Return the column index of feature str in the cell array propertyName (0 if absent).
result = 0;
for i = 1:length(propertyName)
    if strcmp(propertyName{i}, str) == 1
        result = i;
        break;
    end
end

III. Decision Tree Experiment
This is an example found in many textbooks (the play-golf weather data); on it, the prediction accuracy is 100%.

clear; clc;
% OutlookType     = struct('Sunny', 1, 'Rainy', 2, 'Overcast', 3);
% TemperatureType = struct('hot', 1, 'warm', 2, 'cool', 3);
% HumidityType    = struct('high', 1, 'norm', 2);
% WindyType       = {'True', 1, 'False', 0};
% PlayGolf        = {'Yes', 1, 'No', 0};
% data = struct('Outlook', [], 'Temperature', [], 'Humidity', [], 'Windy', [], 'PlayGolf', []);

Outlook     = [1,1,3,2,2,2,3,1,1,2,1,3,3,2]';
Temperature = [1,1,1,2,3,3,3,2,3,3,2,2,1,2]';
Humidity    = [1,1,1,1,2,2,2,1,2,2,2,1,2,1]';
Windy       = [0,1,0,0,0,1,1,0,0,0,1,1,0,1]';
data        = [Outlook Temperature Humidity Windy];
PlayGolf    = [0,0,1,1,1,0,1,0,1,1,1,1,1,0]';
propertyName = {'Outlook', 'Temperature', 'Humidity', 'Windy'};
delta = 0.1;

decisionTreeModel = decisionTree(data, PlayGolf, propertyName, delta);
label = decisionTreeTest(decisionTreeModel, data, propertyName);
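To check the 100% accuracy claim on this training data, one can append:

accuracy = sum(label == PlayGolf) / length(PlayGolf)   % should print 1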
