Reprint: Scikit-learn Learning Decision Tree algorithm

Source: Internet
Author: User

Copyright notice: This article is the author's original work. If you reprint it, please credit the source: http://blog.csdn.net/gamer_gyt


======================================================================
This series of blog posts mainly refers to the scikit-learn official documentation for each algorithm and translates parts of it; if there are errors, please point them out
======================================================================

For an analysis of the decision tree algorithm and a plain-Python implementation, please refer to an earlier blog post. Here I mainly demonstrate how to run the decision tree algorithm through scikit-learn.


Iris data set description: see the scikit-learn documentation.



Advantages of the decision tree algorithm:
1: Simple to understand and interpret, and the decision tree model can be visualized
2: Requires little data preparation, while other techniques often require data normalization, the creation of dummy variables, and the removal of blank values; note, however, that this algorithm does not support missing values
3: The cost of using the tree (i.e., predicting data) is logarithmic in the number of data points used to train the tree
4: Able to handle both numerical and categorical data (categorical data requires a corresponding encoding), while other techniques are usually specialized for datasets with only one type of variable
5: Able to handle multi-output problems
6: Uses a white-box model: if a given situation is observable in the model, the condition is easily explained by Boolean logic; by contrast, in a black-box model (such as an artificial neural network), results may be more difficult to interpret
7: The model can be validated using statistical tests, which makes it possible to account for its reliability
8: Performs well on real data, even if its assumptions are somewhat violated by the true model that generated the data


Disadvantages of the decision tree algorithm:
1: Decision tree learners can create over-complex trees that do not generalize the data well; this is called overfitting. To avoid it, pruning mechanisms are needed, such as setting the minimum number of samples required at a leaf node or setting the maximum depth of the tree (see the sketch after this list)
2: Decision trees can be unstable: small variations in the data may result in a completely different tree being generated
3: Learning an optimal decision tree is known to be NP-complete, so practical algorithms rely on greedy heuristics that cannot guarantee a globally optimal tree
4: There are concepts that decision trees express only with difficulty, such as XOR or parity problems
5: Decision tree learners are likely to create biased trees when certain classes dominate, so it is recommended to train the decision tree with balanced data
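
A minimal sketch of the pruning parameters mentioned in item 1 (the parameter values here are illustrative, not taken from the original post):

>>> from sklearn import tree
>>> # a shallower tree with larger leaves is less prone to overfitting
>>> clf = tree.DecisionTreeClassifier(max_depth=3, min_samples_leaf=5)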


Classification simple example:
>>> from sklearn import tree
>>> X = [[0, 0], [1, 1]]
>>> y = [0, 1]
>>> clf = tree.DecisionTreeClassifier()
>>> clf = clf.fit(X, y)
>>> clf.predict([[2., 2.]])
array([1])
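
After fitting, the classifier can also report per-class probabilities, as in the scikit-learn documentation (output shown for the toy data above):

>>> clf.predict_proba([[2., 2.]])
array([[ 0.,  1.]])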

Regression simple example:
>>> from sklearn import tree
>>> X = [[0, 0], [2, 2]]
>>> y = [0.5, 2.5]
>>> clf = tree.DecisionTreeRegressor()
>>> clf = clf.fit(X, y)
>>> clf.predict([[1, 1]])
array([ 0.5])

The decision tree program below works on a sample data set such as the one sketched here:
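
The data table in the original post is an image and is not reproduced here. A plausible layout for Example.csv, assuming the classic AllElectronics data set this tutorial is commonly based on (column names and values are my assumption, not taken from the original post):

RID,age,income,student,credit_rating,class_buys_computer
1,youth,high,no,fair,no
2,youth,high,no,excellent,no
3,middle_aged,high,no,fair,yes
...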

The procedure is as follows: [Python]

# -*- coding: utf-8 -*-
'''
Created on 2016/4/23
@author: Administrator
'''
from sklearn.feature_extraction import DictVectorizer
import csv
from sklearn import preprocessing
from sklearn import tree
from sklearn.externals.six import StringIO

# Read in the CSV file and collect the features and class labels
allElectronicsData = open(r"Example.csv", "rb")
reader = csv.reader(allElectronicsData)
headers = reader.next()
# print headers

featureList = []
labelList = []
# Store the feature dicts and the class labels in two lists
for row in reader:
    labelList.append(row[len(row) - 1])
    rowDic = {}
    # skip the first column (the record ID) and the last (the class label)
    for i in range(1, len(row) - 1):
        rowDic[headers[i]] = row[i]
    featureList.append(rowDic)
# print featureList
# print labelList

# Vectorize the features
vec = DictVectorizer()
dummyX = vec.fit_transform(featureList).toarray()
# print "dummyX: ", dummyX
# print vec.get_feature_names()
# print "labelList: " + str(labelList)

lb = preprocessing.LabelBinarizer()
dummyY = lb.fit_transform(labelList)
# print "dummyY: " + str(dummyY)

# Use a decision tree for classification
# create a classifier; the entropy criterion corresponds to ID3-style information gain
clf = tree.DecisionTreeClassifier(criterion="entropy")
clf = clf.fit(dummyX, dummyY)
print "clf: " + str(clf)

# Visualize the model
with open("AllEallElectronicInfomationGainori.txt", "w") as f:
    f = tree.export_graphviz(clf, feature_names=vec.get_feature_names(), out_file=f)

# Prediction
oneRowX = dummyX[0, :]
# print "oneRowX: " + str(oneRowX)
newRowX = oneRowX
newRowX[0] = 1
newRowX[2] = 0
print "newRowX: " + str(newRowX)
predictedY = clf.predict(newRowX)
print "predictedY: " + str(predictedY)
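
A note on newer environments: the code above is Python 2 and predicts on a 1-D row, which recent scikit-learn versions reject. A minimal sketch of the prediction step for Python 3 and a current scikit-learn, assuming the same clf and newRowX as above:

# predict() expects a 2-D array of shape (n_samples, n_features),
# so reshape the single sample before predicting
predictedY = clf.predict(newRowX.reshape(1, -1))
print("predictedY: " + str(predictedY))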


To convert the exported Graphviz file into a PDF, use the dot command, for example: dot -Tpdf AllEallElectronicInfomationGainori.txt -o output.pdf
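
If you prefer to render the PDF from Python instead of the command line, one option is the pydotplus package (this is my addition, not from the original post):

import pydotplus

# render the exported Graphviz dot file to a PDF
graph = pydotplus.graph_from_dot_file("AllEallElectronicInfomationGainori.txt")
graph.write_pdf("output.pdf")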
