Decision tree algorithm and corresponding test functions

Huadian North Wind Blows
Key Laboratory of Cognitive Computing and Application, Tianjin University
Date Modified: 2015/8/15

A decision tree is a very simple machine learning classification algorithm. The idea originates from the human decision-making process: in the simplest case, people notice that when the east wind blows and the sky darkens, it tends to rain. In decision tree terms, the observed wind and darkening sky are the features we collect, and whether it rains is the class label. A decision tree is built from such features and labels as described below.

During construction of the decision tree model, features are selected recursively, one after another and without replacement, from the feature set to serve as the nodes of the tree: at the current node, the feature with the largest information gain (or gain ratio) is chosen, and each value of that feature becomes an outgoing branch of the node (in fact, what is mainly being chosen are these branches, which follows from the information gain formula). The intuitive explanation is as follows.
In the extreme case, if there is a feature such that, for each of its values, the corresponding class labels are pure, a decision maker will certainly choose that feature as the criterion for classifying unknown data. From the formula for information gain below, it can be verified that the information gain is largest in exactly this case.
g(D,A) = H(D) - H(D|A)
g(D,A): the information gain of feature A on training data set D
H(D): the empirical entropy of data set D
H(D|A): the conditional entropy of data set D given feature A
Conversely, when the class labels are evenly distributed over the values of a feature, H(D|A) reaches its maximum, and since H(D) is the same for every feature, g(D,A) is then at its smallest.
In short, the feature we want to pick is the one whose values carry the most explicit classification information.
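As a quick illustration, here is a small MATLAB sketch (not part of the original post; the names InfoGainDemo and DemoEntropy are introduced here purely for illustration) that computes g(D,A) = H(D) - H(D|A) for one discrete feature column and a label vector:

function gain=InfoGainDemo(featureColumn,label)
% Illustrative sketch: information gain g(D,A)=H(D)-H(D|A) of one discrete
% feature column with respect to a discrete label vector.
HD=DemoEntropy(label);              % empirical entropy H(D)
HDA=0;                              % conditional entropy H(D|A)
values=unique(featureColumn);
for v=1:length(values)
    idx=(featureColumn==values(v));
    HDA=HDA+sum(idx)/length(label)*DemoEntropy(label(idx));
end
gain=HD-HDA;
end

function h=DemoEntropy(x)
% Empirical entropy of a discrete vector (same idea as CEntropy further down).
h=0;
vals=unique(x);
for i=1:length(vals)
    p=sum(x==vals(i))/length(x);
    h=h-p*log2(p);
end
end

If a feature splits the samples into groups whose labels are pure, every DemoEntropy(label(idx)) term is zero, so the gain equals H(D), its maximum, which is exactly the extreme case described above.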
Let's look at a decision tree algorithm written in MATLAB to help understand this.
The tree-building recursion terminates when:
1. The feature set is empty
2. The node is pure (all samples at the node share one label)
3. The information gain (or gain ratio) is less than the threshold

Part One: model training
Main function for training the model:

function decisionTreeModel=decisionTree(data,label,propertyName,delta)
% Train a decision tree; the tree is accumulated in the global array Node.
global Node;
Node=struct('level',-1,'fatherNodeName',[],'EdgeProperty',[],'NodeName',[]);
BuildTree(-1,'root','Stem',data,label,propertyName,delta);
Node(1)=[];            % drop the placeholder root entry
model.Node=Node;
decisionTreeModel=model;

The recursive tree-building part:

function BuildTree(fatherLevel,fatherNodeName,edge,data,label,propertyName,delta)
% Recursively grow the tree: stop when the node is pure, when no features
% remain, or when the best information gain falls below delta (see CalcuteNode).
global Node;
sonNode=struct('level',0,'fatherNodeName',[],'EdgeProperty',[],'NodeName',[]);
sonNode.level=fatherLevel+1;
sonNode.fatherNodeName=fatherNodeName;
sonNode.EdgeProperty=edge;
if length(unique(label))==1
    sonNode.NodeName=label(1);
    Node=[Node sonNode];
    return;
end
if length(propertyName)<1
    labelSet=unique(label);
    k=length(labelSet);
    labelNum=zeros(k,1);
    for i=1:k
        labelNum(i)=length(find(label==labelSet(i)));
    end
    [~,labelIndex]=max(labelNum);
    sonNode.NodeName=labelSet(labelIndex);
    Node=[Node sonNode];
    return;
end
[sonIndex,BuildNode]=CalcuteNode(data,label,delta);
if BuildNode
    dataRowIndex=setdiff(1:length(propertyName),sonIndex);
    sonNode.NodeName=propertyName{sonIndex};
    Node=[Node sonNode];
    propertyName(sonIndex)=[];
    sonData=data(:,sonIndex);
    sonEdge=unique(sonData);
    for i=1:length(sonEdge)
        edgeDataIndex=find(sonData==sonEdge(i));
        BuildTree(sonNode.level,sonNode.NodeName,sonEdge(i),data(edgeDataIndex,dataRowIndex),label(edgeDataIndex,:),propertyName,delta);
    end
else
    labelSet=unique(label);
    k=length(labelSet);
    labelNum=zeros(k,1);
    for i=1:k
        labelNum(i)=length(find(label==labelSet(i)));
    end
    [~,labelIndex]=max(labelNum);
    sonNode.NodeName=labelSet(labelIndex);
    Node=[Node sonNode];
    return;
end

Computing the feature for the next node of the decision tree:

function [NodeIndex,BuildNode]=CalcuteNode(data,label,delta)
% Pick the feature with the largest information gain; if the best gain is
% below the threshold delta, signal that no further node should be built.
LargeEntropy=CEntropy(label);
[m,n]=size(data);
EntropyGain=LargeEntropy*ones(1,n);
BuildNode=true;
for i=1:n
    pData=data(:,i);
    itemList=unique(pData);
    for j=1:length(itemList)
        itemIndex=find(pData==itemList(j));
        EntropyGain(i)=EntropyGain(i)-length(itemIndex)/m*CEntropy(label(itemIndex));
    end
    % The next line converts the gain into the gain ratio; it is commented
    % out here, so the plain gain is used. Uncomment it to use the gain ratio.
    % EntropyGain(i)=EntropyGain(i)/CEntropy(pData);
end
[maxGainEntropy,NodeIndex]=max(EntropyGain);
if maxGainEntropy<delta
    BuildNode=false;
end

Calculate entropy

function result=CEntropy(propertyList)
result=0;
totalLength=length(propertyList);
itemList=unique(propertyList);
pNum=length(itemList);
for i=1:pNum
    itemLength=length(find(propertyList==itemList(i)));
    pItem=itemLength/totalLength;
    result=result-pItem*log2(pItem);
end

Part Two: model prediction
The following function takes a trained decision tree model, a test sample set, and the feature names, and predicts the label of each test sample.

function label=decisionTreeTest(decisionTreeModel,sampleSet,propertyName)
% Predict a label for each row of sampleSet by walking the trained tree.
lengthSample=size(sampleSet,1);
label=zeros(lengthSample,1);
for sampleIndex=1:lengthSample
    sample=sampleSet(sampleIndex,:);
    Nodes=decisionTreeModel.Node;
    rootNode=Nodes(1);
    head=rootNode.NodeName;
    index=GetFeatureNum(propertyName,head);
    edge=sample(index);
    k=1;
    level=1;
    while k<length(Nodes)
        k=k+1;
        if Nodes(k).level==level
            if strcmp(Nodes(k).fatherNodeName,head)
                if Nodes(k).EdgeProperty==edge
                    if Nodes(k).NodeName<10   % a numeric class label (<10) marks a leaf node
                        label(sampleIndex)=Nodes(k).NodeName;
                        break;
                    else
                        head=Nodes(k).NodeName;
                        index=GetFeatureNum(propertyName,head);
                        edge=sample(index);
                        level=level+1;
                    end
                end
            end
        end
    end
end

Because the trained decision tree model stores node names, prediction needs to map a node name back to the corresponding feature. The following helper function returns the ordinal (column index) of a feature for convenience.

function result=GetFeatureNum(propertyName,str)
result=0;
for i=1:length(propertyName)
    if strcmp(propertyName{i},str)==1
        result=i;
        break;
    end
end

Part Three: decision tree experiment
This is an example that appears in many textbooks; as you can see, the prediction accuracy on the training data is 100%.

clear;clc;

% OutlookType=struct('Sunny',1,'Rainy',2,'Overcast',3);
% TemperatureType=struct('hot',1,'warm',2,'cool',3);
% HumidityType=struct('high',1,'norm',2);
% WindyType={'True',1,'False',0};
% PlayGolf={'Yes',1,'No',0};
% data=struct('Outlook',[],'Temperature',[],'Humidity',[],'Windy',[],'PlayGolf',[]);
Outlook=[1,1,3,2,2,2,3,1,1,2,1,3,3,2]';
Temperature=[1,1,1,2,3,3,3,2,3,3,2,2,1,2]';
Humidity=[1,1,1,1,2,2,2,1,2,2,2,1,2,1]';
Windy=[0,1,0,0,0,1,1,0,0,0,1,1,0,1]';
data=[Outlook Temperature Humidity Windy];
PlayGolf=[0,0,1,1,1,0,1,0,1,1,1,1,1,0]';
propertyName={'Outlook','Temperature','Humidity','Windy'};
delta=0.1;
decisionTreeModel=decisionTree(data,PlayGolf,propertyName,delta);
label=decisionTreeTest(decisionTreeModel,data,propertyName);
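To make the accuracy claim explicit, one extra line can be appended (not in the original script; accuracy is a name introduced here) to compare the predictions with the true labels:

% Fraction of training samples whose predicted label matches PlayGolf;
% the post reports 100% on this training data.
accuracy=mean(label==PlayGolf)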

Copyright notice: This is an original article by the blogger; it may not be reproduced without the blogger's permission.
