ML: Decision Tree Algorithm Implementation (Train + Test, MATLAB)


Author: Huadian North Wind Blows
Key Laboratory of Cognitive Computing and Application, Tianjin University
Last modified: 2015/8/15

A decision tree is a very simple machine learning classification algorithm. The idea comes from the human decision-making process. Take the simplest example: when people judge whether it is going to rain, they tend to look at whether an east wind is blowing and whether the sky is darkening. In the corresponding decision tree model, the wind and the darkness of the sky are the features we collect, and whether it rains is the class label. The corresponding decision tree is constructed as shown in the illustration below.

A decision tree is built by recursively choosing, from the remaining feature set (without replacement), the feature whose information gain or gain ratio is largest at the current node; each value of that feature then becomes an outgoing edge of the node (in fact it is mainly these edges that are being selected, which can be seen from the information gain formula). To explain this intuitively:
Consider an extreme case: if some feature is such that each of its values corresponds to a pure set of class labels, a decision maker will certainly choose that feature as the criterion for classifying unknown data. Using the information gain formula below, one can verify that the information gain is largest in this case.
g(D,A) = H(D) - H(D|A)
g(D,A): the information gain of feature A on the training data set D
H(D): the empirical entropy of data set D
H(D|A): the conditional entropy of data set D given feature A
Conversely, when the class labels corresponding to each value of a feature are evenly distributed, H(D|A) is at its largest, and since H(D) is the same for every feature, g(D,A) is then at its smallest.
In short, we must pick the feature whose values carry the most information about the classification of the current data.
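As a concrete illustration, here is a minimal MATLAB sketch of this computation for a toy feature. The variables a and d are made up for the example, and CEntropy is the entropy function defined later in this post:

% Toy data: feature a takes two values; the labels d are pure for the first value.
a  = [1;1;1;2;2;2];                 % feature values
d  = [0;0;0;1;1;0];                 % class labels
HD  = CEntropy(d);                  % empirical entropy H(D), about 0.918
HDA = 0;                            % conditional entropy H(D|A)
vals = unique(a);
for v = 1:length(vals)
    idx = find(a == vals(v));
    HDA = HDA + length(idx)/length(a) * CEntropy(d(idx));
end
gain = HD - HDA                     % information gain g(D,A), about 0.459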
Now let us look at a decision tree algorithm written in MATLAB to help understand this.
The tree stops growing when any of the following conditions holds:
1. the feature set is empty;
2. the node is pure (all samples at the node share the same label);
3. the information gain (or gain ratio) is below the threshold value.

I. Model Training
The main function for training the model:

function decisionTreeModel = decisionTree(data, label, propertyName, delta)
% Train a decision tree on data (one row per sample, one column per feature)
% with class labels label, feature names propertyName, and gain threshold delta.

global Node;

Node = struct('level', -1, 'fatherNodeName', [], 'EdgeProperty', [], 'NodeName', []);
BuildTree(-1, 'root', 'Stem', data, label, propertyName, delta);
Node(1) = [];
model.Node = Node;
decisionTreeModel = model;
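The returned model is just this flat Node array. As a small sketch, using only the field names from the code above, one way to inspect a trained model is:

% Each entry of decisionTreeModel.Node is one tree node:
%   level           depth in the tree (the root has level 0)
%   fatherNodeName  NodeName of the parent node ('root' for the top node)
%   EdgeProperty    value of the parent feature on the edge leading here
%   NodeName        a feature name (internal node) or a numeric class label (leaf)
nodes = decisionTreeModel.Node;
for k = 1:length(nodes)
    fprintf('level %d, parent %s, edge %s, node %s\n', nodes(k).level, ...
        num2str(nodes(k).fatherNodeName), num2str(nodes(k).EdgeProperty), ...
        num2str(nodes(k).NodeName));
end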

The recursive construction of the decision tree:

function BuildTree(fatherLevel, fatherNodeName, edge, data, label, propertyName, delta)
% Recursively grow the tree; every new node is appended to the global Node array.

global Node;

sonNode = struct('level', 0, 'fatherNodeName', [], 'EdgeProperty', [], 'NodeName', []);
sonNode.level = fatherLevel + 1;
sonNode.fatherNodeName = fatherNodeName;
sonNode.EdgeProperty = edge;

% Stop condition 2: the node is pure, so it becomes a leaf with that label.
if length(unique(label)) == 1
    sonNode.NodeName = label(1);
    Node = [Node sonNode];
    return;
end

% Stop condition 1: no features left, so the leaf takes the majority label.
if length(propertyName) < 1
    labelSet = unique(label);
    k = length(labelSet);
    labelNum = zeros(k, 1);
    for i = 1:k
        labelNum(i) = length(find(label == labelSet(i)));
    end
    [~, labelIndex] = max(labelNum);
    sonNode.NodeName = labelSet(labelIndex);
    Node = [Node sonNode];
    return;
end

% Choose the feature with the largest information gain.
[sonIndex, buildNode] = CalcuteNode(data, label, delta);
if buildNode
    dataRowIndex = setdiff(1:length(propertyName), sonIndex);
    sonNode.NodeName = propertyName{sonIndex};
    Node = [Node sonNode];
    propertyName(sonIndex) = [];
    sonData = data(:, sonIndex);
    sonEdge = unique(sonData);
    % Recurse on each value (edge) of the chosen feature.
    for i = 1:length(sonEdge)
        edgeDataIndex = find(sonData == sonEdge(i));
        BuildTree(sonNode.level, sonNode.NodeName, sonEdge(i), ...
            data(edgeDataIndex, dataRowIndex), label(edgeDataIndex, :), ...
            propertyName, delta);
    end
else
    % Stop condition 3: best gain is below delta, so take the majority label.
    labelSet = unique(label);
    k = length(labelSet);
    labelNum = zeros(k, 1);
    for i = 1:k
        labelNum(i) = length(find(label == labelSet(i)));
    end
    [~, labelIndex] = max(labelNum);
    sonNode.NodeName = labelSet(labelIndex);
    Node = [Node sonNode];
    return;
end

Computing the splitting feature for a node of the decision tree:

function [nodeIndex, buildNode] = CalcuteNode(data, label, delta)
% Return the index of the feature with the largest information gain, and a flag
% that is false when even that gain falls below the threshold delta.

largeEntropy = CEntropy(label);
[m, n] = size(data);
entropyGain = largeEntropy * ones(1, n);
buildNode = true;
for i = 1:n
    pData = data(:, i);
    itemList = unique(pData);
    for j = 1:length(itemList)
        itemIndex = find(pData == itemList(j));
        entropyGain(i) = entropyGain(i) - length(itemIndex)/m * CEntropy(label(itemIndex));
    end
    % Uncomment the next line to use the gain ratio instead of the information gain.
    % entropyGain(i) = entropyGain(i) / CEntropy(pData);
end
[maxGainEntropy, nodeIndex] = max(entropyGain);
if maxGainEntropy < delta
    buildNode = false;
end
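As a usage sketch, assuming the golf data, labels, and feature names from the experiment in section III are already in the workspace:

[nodeIndex, buildNode] = CalcuteNode(data, PlayGolf, 0.1);
propertyName{nodeIndex}     % for the golf data this should be 'Outlook'
buildNode                   % true here, since the best gain exceeds delta = 0.1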

Computing the entropy:

function result = CEntropy(propertyList)
% Empirical entropy (in bits) of the values in propertyList.

result = 0;
totalLength = length(propertyList);
itemList = unique(propertyList);
pNum = length(itemList);
for i = 1:pNum
    itemLength = length(find(propertyList == itemList(i)));
    pItem = itemLength / totalLength;
    result = result - pItem * log2(pItem);
end
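A couple of quick checks of CEntropy; the expected values follow directly from the entropy formula above:

CEntropy([0;0;1;1])         % evenly split labels: 1 bit
CEntropy([1;1;1;1])         % pure labels: 0
CEntropy([0;0;0;1;1;0])     % 4:2 split: about 0.918 bits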

II. Model Prediction
The following function takes a trained decision tree model, a test sample set, and the feature names, and predicts a label for each test sample.

function label = decisionTreeTest(decisionTreeModel, sampleSet, propertyName)
% Predict a class label for each row of sampleSet by walking the flat Node array.
lengthSample = size(sampleSet, 1);
label = zeros(lengthSample, 1);
for sampleIndex = 1:lengthSample
    sample = sampleSet(sampleIndex, :);
    Nodes = decisionTreeModel.Node;
    rootNode = Nodes(1);
    head = rootNode.NodeName;
    index = GetFeatureNum(propertyName, head);
    edge = sample(index);
    k = 1;
    level = 1;
    while k < length(Nodes)
        k = k + 1;
        if Nodes(k).level == level
            if strcmp(Nodes(k).fatherNodeName, head)
                if Nodes(k).EdgeProperty == edge
                    if Nodes(k).NodeName < 10   % numeric NodeName: a leaf label
                        label(sampleIndex) = Nodes(k).NodeName;
                        break;
                    else                        % a feature name: descend one level
                        head = Nodes(k).NodeName;
                        index = GetFeatureNum(propertyName, head);
                        edge = sample(index);
                        level = level + 1;
                    end
                end
            end
        end
    end
end

Because the trained decision tree model stores node names, prediction needs to map a feature name back to its column index. The following helper returns the ordinal number of a feature dimension.

function result = GetFeatureNum(propertyName, str)
% Return the column index of feature str in the cell array propertyName (0 if absent).
result = 0;
for i = 1:length(propertyName)
    if strcmp(propertyName{i}, str) == 1
        result = i;
        break;
    end
end

III. Decision Tree Experiment
This is an example found in many textbooks (the play-golf weather data); on it, the prediction accuracy is 100%.

clear; clc;
% OutlookType     = struct('Sunny', 1, 'Rainy', 2, 'Overcast', 3);
% TemperatureType = struct('hot', 1, 'warm', 2, 'cool', 3);
% HumidityType    = struct('high', 1, 'norm', 2);
% WindyType       = {'True', 1, 'False', 0};
% PlayGolf        = {'Yes', 1, 'No', 0};
% data = struct('Outlook', [], 'Temperature', [], 'Humidity', [], 'Windy', [], 'PlayGolf', []);

Outlook     = [1,1,3,2,2,2,3,1,1,2,1,3,3,2]';
Temperature = [1,1,1,2,3,3,3,2,3,3,2,2,1,2]';
Humidity    = [1,1,1,1,2,2,2,1,2,2,2,1,2,1]';
Windy       = [0,1,0,0,0,1,1,0,0,0,1,1,0,1]';
data        = [Outlook Temperature Humidity Windy];
PlayGolf    = [0,0,1,1,1,0,1,0,1,1,1,1,1,0]';
propertyName = {'Outlook', 'Temperature', 'Humidity', 'Windy'};
delta = 0.1;

decisionTreeModel = decisionTree(data, PlayGolf, propertyName, delta);
label = decisionTreeTest(decisionTreeModel, data, propertyName);
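To check the 100% accuracy claim on this training data, one can append:

accuracy = sum(label == PlayGolf) / length(PlayGolf)   % should print 1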
