Copyright notice: This article is the author's original work; please credit the source when reprinting @http://blog.csdn.net/gamer_gyt
======================================================================
This series of blog posts mainly follows the Scikit-learn official documentation for each algorithm, with some translation; if there are errors, please point them out
======================================================================
For an analysis of the decision tree algorithm and a Python implementation, please refer to an earlier blog post: click to read. Here I mainly demonstrate how to call the decision tree algorithm with Scikit-learn.
Iris Data Set Description:
Click to view
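Since the Iris description is only linked, here is a minimal sketch (not part of the original post; it uses scikit-learn's built-in `load_iris` loader) of fitting a decision tree to the Iris data:

```python
# Hedged sketch: fit a decision tree on the built-in Iris data set.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()                     # 150 samples, 4 features, 3 classes
clf = DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)  # grow the tree on all samples

# An unpruned tree fits its own training data almost perfectly.
train_accuracy = clf.score(iris.data, iris.target)
print(train_accuracy)
```

`score` here reports training accuracy only; a real evaluation would hold out a test split.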
Advantages of the decision tree algorithm:
1: Simple to understand and interpret, and the tree model can be visualized
2: Requires little data preparation; other techniques often need large datasets, normalization, or dummy variables to handle incomplete data. Note, however, that this algorithm does not itself handle missing values
3: The cost of prediction is logarithmic in the number of data points used to train the tree
4: Able to handle both numerical and categorical data (categorical data may need a corresponding encoding), whereas other techniques usually analyze datasets with only one type of variable
5: Able to handle multi-output problems
6: Uses a white-box model: if a given situation is observable in the model, the condition is easily explained by Boolean logic; in contrast, in a black-box model (such as an artificial neural network), results may be harder to interpret
7: Performs well on real data, even if its assumptions are somewhat violated by the true model that generated the data
Disadvantages of the decision tree algorithm:
1: Decision tree learners can create overly complex trees that do not generalize well; this is called overfitting. To avoid it there is the concept of pruning, i.e. setting the minimum number of samples required at a leaf node or setting the maximum depth of the tree
2: Decision tree learners tend to create biased trees when some classes dominate, so it is recommended to train the decision tree on balanced data
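The pruning idea mentioned above can be sketched with scikit-learn's real `max_depth` and `min_samples_leaf` parameters (the toy data below is illustrative, not from the original post):

```python
# Cap tree complexity so the learner cannot overfit with an
# arbitrarily deep tree: limit depth and leaf size.
from sklearn.tree import DecisionTreeClassifier

X = [[0, 0], [1, 1], [0, 1], [1, 0], [2, 2], [2, 0]]
y = [0, 1, 0, 1, 1, 0]

clf = DecisionTreeClassifier(max_depth=2,         # at most 2 levels of splits
                             min_samples_leaf=2)  # each leaf keeps >= 2 samples
clf = clf.fit(X, y)
print(clf.get_depth())  # bounded by max_depth
```

With these constraints the tree stops growing early instead of memorizing every training point.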
A simple classification example:
>>> from sklearn import tree
>>> X = [[0, 0], [1, 1]]
>>> y = [0, 1]
>>> clf = tree.DecisionTreeClassifier()
>>> clf = clf.fit(X, y)
>>> clf.predict([[2., 2.]])
array([1])
A simple regression example:
>>> from sklearn import tree
>>> X = [[0, 0], [2, 2]]
>>> y = [0.5, 2.5]
>>> clf = tree.DecisionTreeRegressor()
>>> clf = clf.fit(X, y)
>>> clf.predict([[1, 1]])
array([ 0.5])
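The two interactive snippets above, collected into one runnable script (same toy inputs and outputs):

```python
# Classification and regression share the same fit/predict API.
from sklearn import tree

# Classification: two points, two classes
X = [[0, 0], [1, 1]]
y = [0, 1]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, y)
pred_class = clf.predict([[2., 2.]])
print(pred_class)   # the point (2, 2) falls on the class-1 side

# Regression: the same API with real-valued targets
X = [[0, 0], [2, 2]]
y = [0.5, 2.5]
reg = tree.DecisionTreeRegressor()
reg = reg.fit(X, y)
pred_value = reg.predict([[1, 1]])
print(pred_value)   # (1, 1) lands in the leaf holding the 0.5 target
```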
The following decision tree example uses a sample dataset (Example.csv). The program is as follows:
# -*- coding: utf-8 -*-
'''
Created on 2016/4/23
@author: Administrator
'''
from sklearn.feature_extraction import DictVectorizer
import csv
from sklearn import preprocessing
from sklearn import tree
from sklearn.externals.six import StringIO

# Read in the CSV file, putting the features into a list of dicts
# and the class labels into a list
allElectronicsData = open(r"Example.csv", "rb")
reader = csv.reader(allElectronicsData)
headers = reader.next()
# print headers

featureList = []
labelList = []
# features and labels are collected into the two lists above
for row in reader:
    labelList.append(row[len(row) - 1])
    rowDict = {}
    for i in range(1, len(row) - 1):
        rowDict[headers[i]] = row[i]
    featureList.append(rowDict)
# print featureList
# print labelList

# Vectorize the features
vec = DictVectorizer()
dummyX = vec.fit_transform(featureList).toarray()
# print "dummyX: " + str(dummyX)
# print vec.get_feature_names()
# print "labelList: " + str(labelList)

# Vectorize the class labels
lb = preprocessing.LabelBinarizer()
dummyY = lb.fit_transform(labelList)
# print "dummyY: " + str(dummyY)

# Use a decision tree for classification; criterion="entropy"
# creates a classifier that splits on information gain, as in ID3
clf = tree.DecisionTreeClassifier(criterion="entropy")
clf = clf.fit(dummyX, dummyY)
print "clf: " + str(clf)

# Visualize the model
with open("AllEallElectronicInfomationGainori.txt", "w") as f:
    f = tree.export_graphviz(clf, feature_names=vec.get_feature_names(), out_file=f)

# Predict on a modified copy of the first row
oneRowX = dummyX[0, :]
# print "oneRowX: " + str(oneRowX)
newRowX = oneRowX
newRowX[0] = 1
newRowX[2] = 0
print "newRowX: " + str(newRowX)
predictedY = clf.predict([newRowX])
print "predictedY: " + str(predictedY)
Convert the exported graph to a PDF with the Graphviz dot command: dot -T pdf ex.txt -o output.pdf
Reprinted from: Scikit-learn Learning: the decision tree algorithm