ML: Decision Tree (Training, MATLAB)


Author: Huadian North Wind Blows

Key Laboratory of Cognitive Computing and Application, Tianjin University

Date Modified: 2015/8/11


A decision tree is a very simple machine learning classification algorithm. The idea originates in the human decision-making process: in the simplest case, people observe that when it is about to rain, the east wind tends to blow and the sky darkens. In decision tree terms, the wind and the darkening sky are the features we collect, and whether it rains is the class label.


The decision tree is built by recursively selecting features as nodes, drawing each feature from the feature set without replacement: at each node, the feature with the largest information gain (or gain ratio) on the current data subset is chosen, and each distinct value of that feature becomes an outgoing edge to a child node. (In effect, it is these value branches that are being selected, as can be seen from the information gain formula.) The intuition is as follows.


Consider the extreme case: if there is a feature such that, for each of its values, the corresponding class labels are pure, a decision maker will certainly choose this feature as the criterion for classifying unknown data. The information gain formula below shows that in this case the information gain is the largest.

g(D,A) = H(D) - H(D|A)

g(D,A): the information gain of feature A on the training data set D

H(D): the empirical entropy of data set D

H(D|A): the conditional entropy of data set D given feature A


Conversely, when the class labels are evenly distributed under each value of a feature, H(D|A) is the largest; since H(D) is the same for all features, g(D,A) is then the smallest.

In a word, the feature we want to pick is the one whose values carry the most explicit classification information; a worked numerical check follows.
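To make the formula concrete, here is a small worked check in MATLAB using the numbers from the golf data set that appears below (9 positive and 5 negative labels; outlook splits them into sunny 2+/3-, rainy 3+/2-, and overcast 4+/0-). This is only an illustrative sketch of the arithmetic, not part of the original code:

% Worked example of g(D,A) = H(D) - H(D|A) on the golf data
p  = [9 5] / 14;                        % class proportions of playGolf
HD = -sum(p .* log2(p));                % H(D), about 0.940 bits
H23 = -(2/5)*log2(2/5) - (3/5)*log2(3/5);     % entropy of a 2-vs-3 split
HDA = (5/14)*H23 + (5/14)*H23 + (4/14)*0;     % H(D|outlook), about 0.694
gain = HD - HDA                         % g(D,outlook), about 0.247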


Let's look at a decision tree algorithm written in MATLAB to help understand.

The tree construction terminates when:

1. the feature set is empty;

2. the node is pure (all remaining samples share one label);

3. the information gain (or gain ratio) is below the threshold delta.

Main script:

clear; clc;

% outlookType     = struct('Sunny',1,'Rainy',2,'Overcast',3);
% temperatureType = struct('Hot',1,'Warm',2,'Cool',3);
% humidityType    = struct('High',1,'Norm',2);
% windyType       = {'True',1,'False',0};
% playGolf        = {'Yes',1,'No',0};
% data = struct('outlook',[],'temperature',[],'humidity',[],'windy',[],'playGolf',[]);

outlook     = [1,1,3,2,2,2,3,1,1,2,1,3,3,2]';
temperature = [1,1,1,2,3,3,3,2,3,3,2,2,1,2]';
humidity    = [1,1,1,1,2,2,2,1,2,2,2,1,2,1]';
windy       = [0,1,0,0,0,1,1,0,0,0,1,1,0,1]';

data = [outlook temperature humidity windy];
playGolf = [0,0,1,1,1,0,1,0,1,1,1,1,1,0]';
propertyName = {'outlook','temperature','humidity','windy'};
delta = 0.1;
decisionTreeModel = decisionTree(data, playGolf, propertyName, delta);

The model-building function:

function decisionTreeModel = decisionTree(data, label, propertyName, delta)

global Node;

Node = struct('fatherNodeName',[],'EdgeProperty',[],'NodeName',[]);
BuildTree('root', 'Stem', data, label, propertyName, delta);
Node(1) = [];          % drop the empty placeholder created by struct() above
model.Node = Node;
decisionTreeModel = model;

Recursively building the tree:

function BuildTree(fatherNodeName, edge, data, label, propertyName, delta)

global Node;
sonNode = struct('fatherNodeName',[],'EdgeProperty',[],'NodeName',[]);
sonNode.fatherNodeName = fatherNodeName;
sonNode.EdgeProperty = edge;

% Termination condition 2: the node is pure, store the label as a leaf
if length(unique(label)) == 1
    sonNode.NodeName = label(1);
    Node = [Node sonNode];
    return;
end
% Termination condition 1: no features left, store the majority label
if length(propertyName) < 1
    labelSet = unique(label);
    k = length(labelSet);
    labelNum = zeros(k, 1);
    for i = 1:k
        labelNum(i) = length(find(label == labelSet(i)));
    end
    [~, labelIndex] = max(labelNum);
    sonNode.NodeName = labelSet(labelIndex);
    Node = [Node sonNode];
    return;
end
[sonIndex, BuildNode] = CalcuteNode(data, label, delta);
if BuildNode
    dataRowIndex = setdiff(1:length(propertyName), sonIndex);
    sonNode.NodeName = propertyName{sonIndex};
    Node = [Node sonNode];
    propertyName(sonIndex) = [];
    sonData = data(:, sonIndex);
    sonEdge = unique(sonData);
    % Recurse once per value of the chosen feature
    for i = 1:length(sonEdge)
        edgeDataIndex = find(sonData == sonEdge(i));
        BuildTree(sonNode.NodeName, sonEdge(i), data(edgeDataIndex, dataRowIndex), ...
            label(edgeDataIndex, :), propertyName, delta);
    end
else
    % Termination condition 3: gain below delta, store the majority label
    labelSet = unique(label);
    k = length(labelSet);
    labelNum = zeros(k, 1);
    for i = 1:k
        labelNum(i) = length(find(label == labelSet(i)));
    end
    [~, labelIndex] = max(labelNum);
    sonNode.NodeName = labelSet(labelIndex);
    Node = [Node sonNode];
    return;
end

Selecting the feature for the next tree node:

function [nodeIndex, BuildNode] = CalcuteNode(data, label, delta)

LargeEntropy = CEntropy(label);
[m, n] = size(data);
EntropyGain = LargeEntropy * ones(1, n);
BuildNode = true;
for i = 1:n
    pData = data(:, i);
    itemList = unique(pData);
    for j = 1:length(itemList)
        itemIndex = find(pData == itemList(j));
        EntropyGain(i) = EntropyGain(i) - length(itemIndex)/m * CEntropy(label(itemIndex));
    end
    % Uncomment the next line to use the gain ratio; as is, the plain gain is used
    % EntropyGain(i) = EntropyGain(i) / CEntropy(pData);
end
[maxGainEntropy, nodeIndex] = max(EntropyGain);
if maxGainEntropy < delta
    BuildNode = false;
end

Calculating the entropy:

function result = CEntropy(propertyList)

result = 0;
totalLength = length(propertyList);
itemList = unique(propertyList);
pNum = length(itemList);
for i = 1:pNum
    itemLength = length(find(propertyList == itemList(i)));
    pItem = itemLength / totalLength;
    result = result - pItem * log2(pItem);
end
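As a quick sanity check of CEntropy (these example calls are mine, not from the original post): a pure vector has zero entropy, a 50/50 split has one bit, and a 2-vs-3 split gives about 0.971 bits.

CEntropy([1 1 1 1])      % 0: a pure node
CEntropy([0 1 0 1])      % 1: maximal uncertainty for two classes
CEntropy([0 0 1 1 1])    % about 0.971, as in the sunny branch above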

The output data structure is an array of nodes of the form

struct('fatherNodeName',[],'EdgeProperty',[],'NodeName',[])

Because MATLAB lacks convenient pointer or reference types for linking nodes, the tree is stored flat in this array; in Python, Java, or C#, writing the tree as linked node objects would be more convenient.
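To classify a new sample with the trained model, walk the Node array from the root: at each step, find the entry whose fatherNodeName matches the current node name and whose EdgeProperty matches the sample's value for that feature. The helper below is my own illustrative sketch, not part of the original post; it assumes each feature name appears at most once in the tree, which holds for the golf data.

function label = predictTree(model, sample)
% sample is a struct of feature values, e.g. sample.outlook = 1
nodes = model.Node;
current = 'root';
edge = 'Stem';
while true
    % find the child reached from the current node along this edge
    for k = 1:length(nodes)
        if isequal(nodes(k).fatherNodeName, current) && ...
                isequal(nodes(k).EdgeProperty, edge)
            next = nodes(k).NodeName;
            break;
        end
    end
    if isnumeric(next)              % leaf: NodeName stores a class label
        label = next;
        return;
    end
    current = next;                 % internal node: NodeName is a feature
    edge = sample.(current);        % follow the edge for the sample's value
end

For example, predictTree(decisionTreeModel, struct('outlook',3,'temperature',1,'humidity',1,'windy',0)) returns 1, since overcast days are always labeled "play" in the training data.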

