AI decision tree: ID3 code (C++)

Source: Internet
Author: User
Tags id3

Source code project file (vs2005) http://d.download.csdn.net/down/1018461/cctt_1

I had found some ID3 code on the Internet before, but it needed changes before it could be used, so I decided to write my own and practice my C++ along the way.

First, I wanted to build a real tree structure, so I reused my earlier generaltree.h (with some modifications, such as adding a few functions and making some private functions public).

The training text has the following format. By default the file is named decision2.txt and is stored in the project directory; you can change this in the source code.

 

Concept     Color   Shape      Weight
Apple       Red     Ball       Average
Apple       Green   Ball       Average
Banana      Yellow  Crescent   Average
Strawberry  Red     Ball       Light
Strawberry  Green   Ball       Light
Watermelon  Green   Ellipsoid  Heavy
Watermelon  Green   Ball       Heavy
Orange      Orange  Ellipsoid  Light

 

The test text has the following format. Name the file test.txt and put it in the project directory (or modify the source code):

Color  Shape  Weight
Red    Ball   Average
Green  Ball   Average

 

The classes should really be split into separate files, but they are combined here so that the code is easier to read.

The code is as follows:

/* Created by Chico Chen
 * Date 2009/02/02
 * Please credit the source when reposting
 */
#include "stdafx.h"
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <vector>
#include <map>
#include <cmath>
#include <cstring> // strcmp
#include "D://tools//not finished//treetest//generaltree.h" // adjust to where your generaltree.h lives
using namespace std;

// This class computes the entropy of an attribute
class attribdiff
{
public:
    string attribname;                         // attribute name
    map<string, int> attribnum;                // attribute value -> number of rows
    map<string, map<string, int> > typenumber;
    // The first string is the specific attribute value, the second is the type,
    // and the int is the number of rows of that type with that attribute value.
    // example: visible  type        shape
    //          1        watermelon  circle
    //          1        melon       flat
    //          0        orange      circle
    // For the specific attribute value "circle", typenumber["circle"]["watermelon"]
    // is the number of visible rows whose type is "watermelon" and whose shape is "circle".
    attribdiff(string& attribname)
    {
        this->attribname = attribname;
    }

    // Compute the weighted entropy (the G value) of an attribute
    double attribdiffercomputer(const vector<vector<string> >& infos, int i_attrib, int i_types, vector<int>& visible)
    {
        double probability = 0;
        double entropy = 0;
        double attribg = 0;
        map<string, int> temp;
        int tempnum = 0;
        for (int i = 0; i < infos.size(); i++)
        {
            if (visible[i] != 0)
            {
                tempnum = attribnum[infos[i][i_attrib]];
                attribnum[infos[i][i_attrib]] = ++tempnum;
                temp = typenumber[infos[i][i_attrib]];
                tempnum = temp[infos[i][i_types]];
                temp[infos[i][i_types]] = ++tempnum;
                typenumber[infos[i][i_attrib]] = temp;
            }
        }
        map<string, int>::iterator i_number;
        map<string, int>::iterator i_type;

        for (i_number = attribnum.begin(); i_number != attribnum.end(); i_number++)
        {
            // note: the divisor is the total number of rows, not only the visible ones
            probability = (*i_number).second / (double)infos.size();
            cout << (*i_number).first << " probability: " << probability << endl;
            entropy = 0;

            for (i_type = typenumber[(*i_number).first].begin(); i_type != typenumber[(*i_number).first].end(); i_type++)
            {
                entropy += computerentropyhelp((*i_type).second / (double)(*i_number).second);
            }

            attribg += (-1) * probability * entropy;
        }

        return attribg;
    }

    // compute one term of the entropy: pi * log2(pi)
    double computerentropyhelp(double pi)
    {
        return pi * log(pi) / log((double)2);
    }
};

// This class is the data structure stored in a general tree node
class nodeinfo
{
public:
    // color
    //   red
    //   blue
    string attribname;           // the attribute name, such as color
    vector<string> detailattrib; // all detail attributes under one attribute name, for example red

    nodeinfo()
    {
        attribname = "";
    }
    nodeinfo(string& attribname)
    {
        this->attribname = attribname;
    }
    nodeinfo(const nodeinfo& ni)
    {
        this->attribname = ni.attribname;
        this->detailattrib = ni.detailattrib;
    }
    // add a detail attribute to nodeinfo
    void nodedetailinfoadd(string& detaila)
    {
        if (!checkishas(detaila))
        {
            this->detailattrib.push_back(detaila);
        }
    }
    // if the detail attribute is in the detailattrib list, return true;
    // else return false
    bool checkishas(string& name)
    {
        for (int i = 0; i < detailattrib.size(); i++)
        {
            if (strcmp(name.c_str(), detailattrib[i].c_str()) == 0)
            {
                // the same attribute
                return true;
            }
        }
        return false;
    }
    // print callback for printing a nodeinfo
    static void print(nodeinfo& info)
    {
        cout << info.attribname << "(";

        for (int i = 0; i < info.detailattrib.size(); i++)
        {
            cout << info.detailattrib[i] << " ";
        }
        cout << ")\n";
    }
};

// This class is the decision tree
class DT
{
protected:
    const string filename;         // the data file name
    vector<vector<string> > infos; // the array for storing information
    vector<string> attribs;        // the array for storing the attributes
    generaltree<nodeinfo> gt;      // the general tree for storing the decision tree
    const int start;               // the column of the first attribute, excluding the type column
    const int i_type;              // the column index of the type
    const int max_entropy;         // a maximum entropy used when searching for the minimal entropy
private:
    // to help print
    void printhelp(int helpprint)
    {
        for (int i = 0; i < helpprint; i++)
        {
            cout << "..";
        }
    }
    // find the index of attribname in the attribs array
    int find(string& attribname, vector<string>& attribs)
    {
        for (int i = 0; i < attribs.size(); i++)
        {
            if (strcmp(attribname.c_str(), attribs[i].c_str()) == 0)
            {
                // the same
                return i;
            }
        }
        return -1;
    }
    // this function detects whether the algorithm is over for the current branch
    bool checkover(vector<int>& visible, string& type)
    {
        map<string, int> types;
        int temp = 0;
        for (int i = 0; i < infos.size(); i++)
        {
            if (visible[i] != 0)
            {
                type = infos[i][i_type];
                temp = types[infos[i][i_type]];
                if (temp == 0)
                {
                    types[infos[i][i_type]] = 1;
                }
            }
        }
        if (types.size() > 1)
        {
            return false; // there is more than one type
        }
        return true; // there is only one type
    }
    // create the decision tree
    void dtcreate(generaltreenode<nodeinfo>* parent, vector<int> visible, vector<int> visiblea, int i_attrib, string& detaila, int helpprint)
    {
        if (i_attrib >= start)
        {
            for (int i = 0; i < infos.size(); i++)
            {
                if (strcmp(infos[i][i_attrib].c_str(), detaila.c_str()) != 0)
                {
                    // not the same detail attribute
                    visible[i] = 0;
                }
            }
        }
        string type = "";
        if (checkover(visible, type))
        {
            // the algorithm is over; add the type node into the general tree
            nodeinfo n(type);
            generaltreenode<nodeinfo>* node = new generaltreenode<nodeinfo>(n);
            gt.insert(node, parent);
            printhelp(helpprint);
            cout << "Decision type: " << n.attribname << endl;
            return;
        }

        map<string, double> attribgs; // used to decide which attribute should be chosen

        for (int i = start; i < attribs.size(); i++)
        {
            // iterate attribs
            if (visiblea[i] != 0)
            {
                attribdiff ad(attribs[i]);
                attribgs[attribs[i]] = ad.attribdiffercomputer(infos, i, i_type, visible);
                cout << attribs[i] << " G: " << attribgs[attribs[i]] << endl;
            }
        }
        // find the decision attribute
        double min = max_entropy;
        string attributename;
        for (map<string, double>::iterator i = attribgs.begin(); i != attribgs.end(); i++)
        {
            if (min >= (*i).second)
            {
                attributename = (*i).first;
                min = (*i).second;
            }
        }
        nodeinfo n(attributename);
        int i_max = find(attributename, attribs);
        for (int i = 0; i < infos.size(); i++)
        {
            n.nodedetailinfoadd(infos[i][i_max]);
        }
        generaltreenode<nodeinfo>* node = new generaltreenode<nodeinfo>(n);
        gt.insert(node, parent);
        visiblea[i_max] = 0;
        printhelp(helpprint);
        cout << "Choose attribute: " << attributename << endl;
        for (int i = 0; i < node->data.detailattrib.size(); i++)
        {
            printhelp(helpprint);
            cout << "Go into the branch: " << node->data.detailattrib[i] << endl;
            // go into every branch to decide
            dtcreate(node, visible, visiblea, i_max, node->data.detailattrib[i], helpprint + 1);
        }
    }
public:
    // decision2.txt must be stored in the project directory; you can also write an absolute path.
    // Note the file format:
    // the first column is the type (category), followed by the attributes.
    // For example: type        shape
    //              watermelon  circle
    //              melon       flat
    //              orange      circle
    DT() : filename("decision2.txt"), start(1), i_type(0), max_entropy(10000)
    {
        getinfo(attribs, infos, filename);
        dtcreate();
    }

    // this function reads data from the file
    // and creates the attribute array and the information array
    // post: attribs has at least one element
    //       infos has at least one element
    // pre:  filename is not empty and the file exists
    void getinfo(vector<string>& attribs, vector<vector<string> >& infos, const string& filename)
    {
        ifstream read(filename.c_str());

        string info = "";
        getline(read, info);
        istringstream iss(info);
        string attrib;

        while (iss >> attrib)
        {
            attribs.push_back(attrib);
        }
        while (true)
        {
            info = "";
            getline(read, info);
            if (info == "" || info.length() <= 1)
            {
                break;
            }
            vector<string> infoline;
            istringstream stream(info);

            while (stream >> attrib)
            {
                infoline.push_back(attrib);
            }
            infos.push_back(infoline);
        }
        read.close();
    }
    // create the decision tree
    void dtcreate()
    {
        vector<int> visible(infos.size(), 1);
        vector<int> visiblea(attribs.size(), 1); // marks which attributes are still usable
        string temp = "";
        dtcreate(NULL, visible, visiblea, start - 1, temp, 0);
    }
    // print the decision tree
    void print()
    {
        gt.levelprint(nodeinfo::print);
    }
    void judge(const string& testfilename, vector<string>& types, const string& testresultfilename)
    {
        vector<string> attribs_test;
        vector<vector<string> > infos_test;
        getinfo(attribs_test, infos_test, testfilename);

        if (!checkfileformat(attribs_test))
        {
            throw "File format error";
        }
        generaltreenode<nodeinfo>* root = gt.getroot();
        for (int i = 0; i < infos_test.size(); i++)
        {
            types.push_back(judgetype(root, infos_test[i], attribs_test));
        }
        writetesttypesinfo(testresultfilename, types);
    }
    void writetesttypesinfo(const string& filename, vector<string>& types)
    {
        ofstream out(filename.c_str());
        out << "category" << endl;
        for (int i = 0; i < types.size(); i++)
        {
            out << types[i] << endl;
        }
        out.close();
    }
    string judgetype(generaltreenode<nodeinfo>* node, vector<string>& info, vector<string>& attribs_test)
    {
        if (gt.getchildnodenum(node) == 0)
        {
            return node->getdata().attribname;
        }
        int index = find(node->getdata().attribname, attribs_test);
        int branch_index = find(info[index], node->getdata().detailattrib);
        if (branch_index == -1)
        {
            // this detail attribute was not found in this node's detailattrib
            // There are two ways to deal with this situation:
            // 1. every branch is a possible choice
            // 2. there is no such type and it cannot be judged
            // The first solution lowers the accuracy.
            // The second solution has no fault tolerance.
            // Here I choose the second solution.
            // If I have more free time later, I will write the first solution.
            throw "No such type";
        }
        generaltreenode<nodeinfo>* childnode = gt.getachild(node, branch_index);
        return judgetype(childnode, info, attribs_test);
    }
    bool checkfileformat(vector<string>& attribs_test)
    {
        bool iscorrect = true;
        for (int j = 0; j < attribs_test.size(); j++)
        {
            if (find(attribs_test[j], attribs) == -1)
            {
                iscorrect = false;
            }
        }
        if (attribs_test.size() == attribs.size() - 1)
        {
            iscorrect = iscorrect && true;
        }
        else
        {
            iscorrect = false;
        }
        return iscorrect;
    }
};

The main function is written as follows (I used VS2005):

int _tmain(int argc, _TCHAR* argv[])
{
    DT dt;
    // dt.print();
    string testfile = "test.txt";
    string testresult = "testresult.txt";
    vector<string> types;
    dt.judge(testfile, types, testresult);
    return 0;
}

I think the comments in the DT class are fairly detailed, so I will not explain much more in this post. Note that the code writes the test results to testresult.txt in the project directory.

While the tree is being built, information about the ID3 decision process is also printed to the console, for example:

Red probability: 0.25
Yellow probability: 0.125
Orange probability: 0.125
Green probability: 0.5
Color G: 1
Ball probability: 0.625
Ellipsoid probability: 0.25
Crescent probability: 0.125
Shape G: 1.20121
Light probability: 0.375
Average probability: 0.375
Heavy probability: 0.25
Weight G: 0.688722
Choose attribute: Weight
Go into the branch: Average
Red probability: 0.125
Yellow probability: 0.125
Green probability: 0.125
Color G: 0
Ball probability: 0.25
Crescent probability: 0.125
Shape G: 0
..Choose attribute: Color
..Go into the branch: Red
....Decision type: Apple
..Go into the branch: Green
....Decision type: Apple
..Go into the branch: Yellow
....Decision type: Banana
..Go into the branch: Orange
....Decision type:
Go into the branch: Light
Red probability: 0.125
Orange probability: 0.125
Green probability: 0.125
Color G: 0
Ball probability: 0.25
Ellipsoid probability: 0.125
Shape G: 0
..Choose attribute: Color
..Go into the branch: Red
....Decision type: Strawberry
..Go into the branch: Green
....Decision type: Strawberry
..Go into the branch: Yellow
....Decision type:
..Go into the branch: Orange
....Decision type: Orange
Go into the branch: Heavy
..Decision type: Watermelon

 

What does this piece of information mean?

 

Red probability: 0.25
Yellow probability: 0.125
Orange probability: 0.125
Green probability: 0.5
Color G: 1

Red, yellow, orange, and green are the specific values of the Color attribute, and the number after each one is the number of rows with that value divided by the total number of training rows. The entropy of each value is not printed here. If junior students at the Chinese Academy of Sciences are lucky enough to come across this code, you can add a few lines to the attribdiffercomputer() function to print each entropy. The teacher will ask you to read code anyway, so treat this as a homework exercise. (Also, the decision tree example in Chapter 10 of the machine learning slides has incorrect results, as you will see if you work through it carefully.) "Color G" is the decision value of the Color attribute. The smaller the decision value, the more likely the attribute is to be selected.


So what does a decision tree look like?

Choose attribute: Weight
Go into the branch: Average

..Choose attribute: Color
..Go into the branch: Red

......................

Look at the output above. It shows that the root node is "Weight" and that the algorithm then enters the "Average" branch. The node under the "Average" branch is "Color", after which it enters the "Red" branch. Pay attention to the ".." prefixes: lines with the same number of ".." belong to the same level of the tree.


I wrote this ID3 decision tree code mainly so that junior students can use it to check their work and prepare for the exam. I only tested it on the example from the teacher's slides, so I cannot guarantee that it is correct for every input. Also, the teacher's exam questions are brutal (about 10 attributes), and working one out by hand can take about an hour.

I regret not writing a program first. Good luck with your exam. (I expect this code can be found before the exam.)


One more reminder: ID3 is not a great algorithm. When two attributes have the same G value, it offers no further criterion for choosing between them. In addition, depending on the order in which attributes are selected, it may produce a decision tree that is not minimal. This is worth studying further.
