Copyright notice: This article is the author's original work; please credit the source when reprinting @http://blog.csdn.net/gamer_gyt
======================================================================
This series of blog posts mainly follows the Scikit-learn official documentation for each algorithm, with some translation; if there are errors, please point them out
======================================================================
For an analysis of the decision tree algorithm and a Python implementation, please refer to an earlier blog post: click to read. Here I mainly demonstrate how to call the decision tree algorithm with Scikit-learn.
Iris Data Set Description:
Click to view
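Since the Iris description is only linked, here is a minimal sketch (not part of the original post; it uses scikit-learn's built-in `load_iris` loader) of fitting a decision tree to the Iris data:

```python
# Hedged sketch: fit a decision tree on the built-in Iris data set.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()                     # 150 samples, 4 features, 3 classes
clf = DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)  # grow the tree on all samples

# An unpruned tree fits its own training data almost perfectly.
train_accuracy = clf.score(iris.data, iris.target)
print(train_accuracy)
```

`score` here reports training accuracy only; a real evaluation would hold out a test split.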
Advantages of the decision tree algorithm:
1: Simple to understand and interpret, and the tree model can be visualized
2: Requires little data preparation; other techniques often need large datasets, normalization, or dummy variables to handle incomplete data. Note, however, that this algorithm does not itself handle missing values
3: The cost of prediction is logarithmic in the number of data points used to train the tree
4: Able to handle both numerical and categorical data (categorical data may need a corresponding encoding), whereas other techniques usually analyze datasets with only one type of variable
5: Able to handle multi-output problems
6: Uses a white-box model: if a given situation is observable in the model, the condition is easily explained by Boolean logic; in contrast, in a black-box model (such as an artificial neural network), results may be harder to interpret
7: Performs well on real data, even if its assumptions are somewhat violated by the true model that generated the data
Disadvantages of the decision tree algorithm:
1: Decision tree learners can create overly complex trees that do not generalize well; this is called overfitting. To avoid it there is the concept of pruning, i.e. setting the minimum number of samples required at a leaf node or setting the maximum depth of the tree
2: Decision tree learners tend to create biased trees when some classes dominate, so it is recommended to train the decision tree on balanced data
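The pruning idea mentioned above can be sketched with scikit-learn's real `max_depth` and `min_samples_leaf` parameters (the toy data below is illustrative, not from the original post):

```python
# Cap tree complexity so the learner cannot overfit with an
# arbitrarily deep tree: limit depth and leaf size.
from sklearn.tree import DecisionTreeClassifier

X = [[0, 0], [1, 1], [0, 1], [1, 0], [2, 2], [2, 0]]
y = [0, 1, 0, 1, 1, 0]

clf = DecisionTreeClassifier(max_depth=2,         # at most 2 levels of splits
                             min_samples_leaf=2)  # each leaf keeps >= 2 samples
clf = clf.fit(X, y)
print(clf.get_depth())  # bounded by max_depth
```

With these constraints the tree stops growing early instead of memorizing every training point.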
A simple classification example:
>>> from sklearn import tree
>>> X = [[0, 0], [1, 1]]
>>> y = [0, 1]
>>> clf = tree.DecisionTreeClassifier()
>>> clf = clf.fit(X, y)
>>> clf.predict([[2., 2.]])
array([1])
A simple regression example:
>>> from sklearn import tree
>>> X = [[0, 0], [2, 2]]
>>> y = [0.5, 2.5]
>>> clf = tree.DecisionTreeRegressor()
>>> clf = clf.fit(X, y)
>>> clf.predict([[1, 1]])
array([ 0.5])
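The two interactive snippets above, collected into one runnable script (same toy inputs and outputs):

```python
# Classification and regression share the same fit/predict API.
from sklearn import tree

# Classification: two points, two classes
X = [[0, 0], [1, 1]]
y = [0, 1]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, y)
pred_class = clf.predict([[2., 2.]])
print(pred_class)   # the point (2, 2) falls on the class-1 side

# Regression: the same API with real-valued targets
X = [[0, 0], [2, 2]]
y = [0.5, 2.5]
reg = tree.DecisionTreeRegressor()
reg = reg.fit(X, y)
pred_value = reg.predict([[1, 1]])
print(pred_value)   # (1, 1) lands in the leaf holding the 0.5 target
```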
The following decision tree example uses a sample dataset (Example.csv). The program is as follows:
# -*- coding: utf-8 -*-
'''
Created on 2016/4/23
@author: Administrator
'''
from sklearn.feature_extraction import DictVectorizer
import csv
from sklearn import preprocessing
from sklearn import tree
from sklearn.externals.six import StringIO

# Read in the CSV file, putting the features into a list of dicts
# and the class labels into a list
allElectronicsData = open(r"Example.csv", "rb")
reader = csv.reader(allElectronicsData)
headers = reader.next()
# print headers

featureList = []
labelList = []
# features and labels are collected into the two lists above
for row in reader:
    labelList.append(row[len(row) - 1])
    rowDict = {}
    for i in range(1, len(row) - 1):
        rowDict[headers[i]] = row[i]
    featureList.append(rowDict)
# print featureList
# print labelList

# Vectorize the features
vec = DictVectorizer()
dummyX = vec.fit_transform(featureList).toarray()
# print "dummyX: " + str(dummyX)
# print vec.get_feature_names()
# print "labelList: " + str(labelList)

# Vectorize the class labels
lb = preprocessing.LabelBinarizer()
dummyY = lb.fit_transform(labelList)
# print "dummyY: " + str(dummyY)

# Use a decision tree for classification; criterion="entropy"
# creates a classifier that splits on information gain, as in ID3
clf = tree.DecisionTreeClassifier(criterion="entropy")
clf = clf.fit(dummyX, dummyY)
print "clf: " + str(clf)

# Visualize the model
with open("AllEallElectronicInfomationGainori.txt", "w") as f:
    f = tree.export_graphviz(clf, feature_names=vec.get_feature_names(), out_file=f)

# Predict on a modified copy of the first row
oneRowX = dummyX[0, :]
# print "oneRowX: " + str(oneRowX)
newRowX = oneRowX
newRowX[0] = 1
newRowX[2] = 0
print "newRowX: " + str(newRowX)
predictedY = clf.predict([newRowX])
print "predictedY: " + str(predictedY)
Convert the exported graph to a PDF with the Graphviz dot command: dot -T pdf ex.txt -o output.pdf
Reprinted from: Scikit-learn Learning: the decision tree algorithm