Data Mining Algorithm Learning (6): CART

Source: Internet
Author: User

Classification and regression tree: The CART (Classification And Regression Tree) algorithm uses binary recursive partitioning. It divides the current sample set into two subsets, so every non-leaf node it generates has exactly two branches. The decision tree generated by the CART algorithm is therefore a binary tree.

A classification tree rests on two basic ideas: the first is to build the tree by recursively partitioning the space of the independent variables using the training samples, and the second is to prune the tree using validation data.
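The second idea, pruning with validation data, can be sketched as follows. This is only an illustration under stated assumptions: it uses reduced-error pruning as a simple stand-in (classical CART uses cost-complexity pruning), and the nested-dict tree layout, the names predict, errors and prune, the target column "play", and the toy rows are all hypothetical, not taken from the original.

```python
def predict(node, row):
    """Walk down the tree: go left when the row's value matches the split value."""
    while "feature" in node:
        node = node["left"] if row[node["feature"]] == node["value"] else node["right"]
    return node["label"]

def errors(node, rows, target):
    """Number of rows the (sub)tree misclassifies."""
    return sum(predict(node, r) != r[target] for r in rows)

def prune(node, val_rows, target="play"):
    """Reduced-error pruning, bottom-up. Modifies the tree in place and returns
    the (possibly replaced) node. A subtree becomes a leaf when the leaf makes
    no more validation errors than the subtree does; note that a subtree no
    validation row reaches is therefore always collapsed."""
    if "feature" not in node:
        return node
    f, v = node["feature"], node["value"]
    node["left"] = prune(node["left"], [r for r in val_rows if r[f] == v], target)
    node["right"] = prune(node["right"], [r for r in val_rows if r[f] != v], target)
    leaf = {"label": node["label"]}
    if errors(leaf, val_rows, target) <= errors(node, val_rows, target):
        return leaf
    return node

# Hand-built toy tree (hypothetical): every node stores its majority label.
tree = {"label": "yes", "feature": "outlook", "value": "sunny",
        "left": {"label": "no"},
        "right": {"label": "yes", "feature": "windy", "value": "yes",
                  "left": {"label": "no"}, "right": {"label": "yes"}}}

# Made-up validation rows.
validation = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
]

pruned = prune(tree, validation)
print(pruned)
```

On this toy data the split on windy hurts validation accuracy and is collapsed into a leaf, while the root split on outlook is kept.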


One difference between CART and C4.5 is that CART uses the Gini index to choose node splits, whereas C4.5 uses the information gain ratio. The Gini index measures the impurity of a data partition or of the training dataset D. The smaller the Gini value, the higher the purity of the samples (that is, the more likely the samples are to belong to the same class). For each test attribute, the binary split of its values that yields the smallest Gini value is selected as that attribute's split.
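As a minimal sketch of this measure, assume the usual definition Gini(D) = 1 - sum of p_k^2, where p_k is the fraction of samples in D that belong to class k, and assume that a binary split is scored by the size-weighted average of the Gini values of its two subsets. The function names gini and gini_split and the sample labels are illustrative, not from the original.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(left_labels, right_labels):
    """Size-weighted Gini impurity of a binary split into two subsets."""
    n = len(left_labels) + len(right_labels)
    if n == 0:
        return 0.0
    return (len(left_labels) * gini(left_labels)
            + len(right_labels) * gini(right_labels)) / n

pure = ["yes", "yes", "yes"]          # one class only: impurity 0
mixed = ["yes", "no", "yes", "no"]    # evenly mixed: impurity 0.5

print(gini(pure))   # 0.0
print(gini(mixed))  # 0.5
print(gini_split(pure, mixed))
```

A pure subset scores 0, and for two classes the value peaks at 0.5 when the classes are evenly mixed, which is why the split with the smallest value is preferred.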

Algorithm steps:

cart_classification(dataset, featurelist, alpha):

Create the root node R.

If all the data in the current dataset belongs to the same category, mark R with that category and stop.

If the height of the decision tree is greater than alpha, do not split further; mark R with the category classify(dataset).

Recursion:

Mark R with the category classify(dataset).

Select attribute F from featurelist (the attribute whose split gives the smallest Gini(dataset, F)). For continuous attributes, refer to the discretization process of C4.5, using the minimum Gini value as the criterion.

Based on F, split the dataset in two, into ds_l and ds_r:

If ds_l or ds_r is empty, do not split further.

If neither ds_l nor ds_r is empty, recurse on each part:

c_l = cart_classification(ds_l, featurelist, alpha);

c_r = cart_classification(ds_r, featurelist, alpha)

Add nodes c_l and c_r as the left and right child nodes of R.
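The steps above can be sketched in Python as follows. This is a rough illustration, not the article's implementation, and it makes several assumptions: classify(dataset) is taken to be the majority class, attributes are categorical and split one value against the rest, alpha is treated as the maximum depth, and the dict-based node layout, the helper names, the target column "play" and the toy rows are all hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_k ** 2); 0.0 for an empty list."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def classify(rows, target):
    """Assumed meaning of classify(dataset): the most common class label."""
    return Counter(r[target] for r in rows).most_common(1)[0][0]

def best_split(rows, features, target):
    """Return (weighted_gini, feature, value) of the best 'value vs. rest'
    split, or None when every candidate leaves one side empty."""
    best = None
    for f in features:
        for v in sorted({r[f] for r in rows}):
            left = [r[target] for r in rows if r[f] == v]
            right = [r[target] for r in rows if r[f] != v]
            if not left or not right:
                continue  # ds_l or ds_r empty: not a usable split
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, v)
    return best

def cart_classification(rows, features, alpha, target="play", depth=0):
    node = {"label": classify(rows, target)}           # mark R with classify(dataset)
    if len({r[target] for r in rows}) == 1 or depth >= alpha:
        return node                                    # pure node or depth limit reached
    split = best_split(rows, features, target)
    if split is None:
        return node
    _, f, v = split
    ds_l = [r for r in rows if r[f] == v]
    ds_r = [r for r in rows if r[f] != v]
    node.update(feature=f, value=v,
                left=cart_classification(ds_l, features, alpha, target, depth + 1),
                right=cart_classification(ds_r, features, alpha, target, depth + 1))
    return node

def predict(node, row):
    """Follow the splits down to a leaf and return its label."""
    while "feature" in node:
        node = node["left"] if row[node["feature"]] == node["value"] else node["right"]
    return node["label"]

# Made-up toy rows, for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]

tree = cart_classification(rows, ["outlook", "windy"], alpha=3)
print(tree["feature"], tree["value"])
print(predict(tree, {"outlook": "overcast", "windy": "yes"}))
```

Splitting one value against the rest keeps the search short; a fuller version would try every two-way grouping of an attribute's values, and would add a threshold search for continuous attributes.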


The core code, implemented in SQL (MySQL stored-procedure syntax):
 
-- Main loop: repeat until no row of weather is left unassigned (class = 0).
rr: while (1=1) do
  set @weather = (select id from weather where class = 0 limit 0,1);
  set @feature = (select parent from finalgini where statetemp = 1 limit 0,1);
  if (@weather is null) then
    leave rr;
  else
    if (@feature is null) then
      -- No candidate attribute is left: restore statetemp from state.
      update finalgini set statetemp = state;
    end if;
  end if;
  if (@weather is not null) then
    b: begin
      -- Pick the candidate attribute with the smallest Gini value.
      set current_gini = (select min(gini) from finalgini where statetemp = 1);
      set current_class = (select parent from finalgini where gini = current_gini);
      -- aa holds the two value sets (class, class2) of that attribute's split.
      drop table if exists aa;
      create temporary table aa (namee varchar(100));
      insert into aa select class from finalgini where parent = current_class;
      insert into aa select class2 from finalgini where parent = current_class;
      tt: while (1=1) do
        set @x = (select namee from aa limit 0,1);
        if (@x is not null) then
          a0: begin
            -- bb: ids of the unassigned rows whose attribute value matches @x.
            drop table if exists bb;
            set @b = concat('create temporary table bb as \(select id from ', current_table, ' where ', current_class, ' regexp \'', @x, '\' and class = 0 \)');
            prepare stmt2 from @b;
            execute stmt2;
            set @count = (select count(distinct play) from bb left join weather on bb.id = weather.id);
            if (@count = 1) then
              -- Pure subset (a single play value): number it as one leaf.
              a1: begin
                update bb left join weather on bb.id = weather.id set class = current_num;
                set current_num = current_num + 1;
                if (current_table = 'cc') then
                  delete from cc where id in (select id from bb);
                end if;
                set @f = (select play from cc limit 0,1);
                if (@f is null) then
                  set current_table = 'weather';
                  update finalgini set statetemp = state;
                end if;
                delete from aa where namee = @x;
              end a1;
            end if;
            if (@count > 1) then
              set @id = (select count(id) from bb);
              if (@id = 2) then
                -- Two rows with different play values: one leaf number each.
                w: begin
                  update bb left join weather on bb.id = weather.id set class = current_num where play = 'yes';
                  set current_num = current_num + 1;
                  update bb left join weather on bb.id = weather.id set class = current_num where play = 'no';
                  set current_num = current_num + 1;
                  if (current_table = 'cc') then
                    delete from cc where id in (select id from bb);
                  end if;
                  set @f = (select play from cc limit 0,1);
                  if (@f is null) then
                    set current_table = 'weather';
                    update finalgini set statetemp = state;
                  end if;
                  delete from aa where namee = @x;
                end w;
              end if;
              if (@id > 2) then
                -- Mixed subset with more than two rows: make it the current table (cc) and split it further.
                drop table if exists cc;
                create temporary table cc select * from weather inner join bb using(id);
                set current_table = 'cc';
                leave tt;
              end if;
            end if;
            if (@count = 0) then
              -- Empty subset: discard this value set.
              delete from aa where namee = @x;
            end if;
          end a0;
        else
          -- Both value sets have been handled: retire this attribute.
          update finalgini set state = 0 where parent = current_class;
          leave tt;
        end if;
      end while;
      update finalgini set statetemp = 0 where parent = current_class;
    end b;
  end if;
end while;
end |
delimiter ;

Description of tables in the program:

Table 2, classgini, holds the Gini values of the different classification sets of each attribute.

Table 3, finalgini, stores the optimal classification of each attribute and the corresponding Gini value.

