Python Implementation of the Decision Tree Algorithm

This article describes a Python implementation of the decision tree algorithm, shared for your reference. The details are as follows:

from sklearn.feature_extraction import DictVectorizer
import csv
from sklearn import tree
from sklearn import preprocessing
from sklearn.externals.six import StringIO  # imported in the original but not used below

# Read the CSV data and store the feature values and class labels
# in a list of dictionaries and a label list
allElectronicsData = open(r'AllElectronics.csv', 'rt')
reader = csv.reader(allElectronicsData)
headers = next(reader)
# The original code was:
# headers = reader.next()
# That form worked in Python 2; it has been updated because in Python 3
# the built-in next() function must be used instead
# print(headers)

featureList = []
labelList = []
for row in reader:
    labelList.append(row[len(row) - 1])
    rowDict = {}
    for i in range(1, len(row) - 1):
        rowDict[headers[i]] = row[i]
    featureList.append(rowDict)
# print(featureList)

# Vectorize the feature values: each categorical attribute is expanded
# into one-hot columns
vec = DictVectorizer()
dummyX = vec.fit_transform(featureList).toarray()
# print("dummyX: " + str(dummyX))
# print(vec.get_feature_names())
# print("labelList: " + str(labelList))

# Vectorize the class label list, i.e. the final results
lb = preprocessing.LabelBinarizer()
dummyY = lb.fit_transform(labelList)
# print("dummyY: " + str(dummyY))

# Classify with a decision tree
clf = tree.DecisionTreeClassifier()
# clf = tree.DecisionTreeClassifier(criterion='entropy')
clf = clf.fit(dummyX, dummyY)
# print("clf: " + str(clf))

# Visualize the model
with open("allElectrionicInformationOri.dot", 'w') as f:
    f = tree.export_graphviz(clf, feature_names=vec.get_feature_names(), out_file=f)

oneRowX = dummyX[0, :]
# print("oneRowX: " + str(oneRowX))

# Next, change some data to build a sample for prediction
newRowX = oneRowX
newRowX[0] = 0
newRowX[1] = 1
print("newRowX: " + str(newRowX))
predictedY = clf.predict(newRowX.reshape(1, -1))
# The sample passed to predict() must be reshaped with reshape(1, -1),
# otherwise scikit-learn raises:
# ValueError: Expected 2D array, got 1D array instead:
# array=[0. 1. 1. 0. 1. 1. 0. 0. 1. 0.].
# Reshape your data either using array.reshape(-1, 1) if your data has
# a single feature or array.reshape(1, -1) if it contains a single sample.
print("predicted result: " + str(predictedY))
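The article does not include AllElectronics.csv itself, but the loop above implies its layout: column 0 is a record ID (skipped, since the inner loop starts at index 1), the middle columns are categorical features, and the last column is the class label. For orientation only, an assumed file in the style of the classic AllElectronics example might begin like this (the column names and rows are illustrative, not the article's actual data):

RID,age,income,student,credit_rating,buys_computer
1,youth,high,no,fair,no
2,youth,high,no,excellent,no
3,middle_aged,high,no,fair,yes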


The code above classifies people according to their purchasing power and, at the end, predicts the result for a new sample. The decision tree algorithm has both advantages and disadvantages, summarized below.

Advantages of the decision tree algorithm:

1) Simple and intuitive: the resulting decision tree is easy to visualize and understand.

2) Requires little preprocessing: no normalization is needed beforehand, and missing values can be handled.

3) The cost of making a prediction with a trained decision tree is O(log2(m)), where m is the number of samples.

4) Can handle both discrete and continuous values, whereas many algorithms focus on only one of the two.

5) Can handle classification problems with multi-dimensional output.

6) Compared with black-box classification models such as neural networks, a decision tree can be explained logically.

7) Cross-validated pruning can be used to select the model and improve its generalization ability (a concrete example follows this list).

8) Good tolerance of outliers; high robustness.
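Point 7 deserves a concrete illustration. The following is a minimal sketch, not from the original article, of selecting a pruned tree by cross-validation using scikit-learn's cost-complexity pruning parameter ccp_alpha (available in scikit-learn 0.22+); the Iris dataset and the candidate alpha values are assumptions chosen for the demo:

# Select the pruning strength of a decision tree by cross-validation.
# (Illustrative sketch: dataset and alpha grid are arbitrary choices.)
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Larger ccp_alpha values prune the tree more aggressively.
param_grid = {'ccp_alpha': [0.0, 0.001, 0.01, 0.05, 0.1]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print("best ccp_alpha:", search.best_params_['ccp_alpha'])
print("cross-validated accuracy:", search.best_score_)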

Let's look at the disadvantages of the decision tree algorithm:

1) Decision trees overfit very easily, which weakens their generalization ability. This can be mitigated by setting a minimum number of samples per node and limiting the depth of the tree (see the sketch after this list).

2) A small change in the training samples can cause a drastic change in the tree structure. This can be addressed with ensemble learning.

3) Finding the optimal decision tree is an NP-hard problem, so in practice heuristic methods are used and usually end up in a local optimum. This too can be improved with ensemble learning.

4) Some complex relationships, such as XOR, are hard for a decision tree to learn. There is no good fix within the decision tree itself; such relationships are generally better handled by a neural network classifier.

5) If the sample proportions of some features are too large, the resulting tree tends to be biased toward those features. This can be improved by adjusting the sample weights (the sketch below includes a class-weighted example).
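Three of the mitigations named in this list — restricting tree growth (point 1), ensemble learning (points 2 and 3), and reweighting (point 5) — can be sketched in a few lines. The example below is not from the original article: the Iris dataset and every parameter value are assumptions made for illustration, and class_weight='balanced' is one common form of reweighting (it reweights classes rather than individual samples):

# Illustrative sketch of the mitigations above; all values are arbitrary.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 1) Curb overfitting by capping the depth and requiring a minimum leaf size.
restricted = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5)
print("restricted tree:", cross_val_score(restricted, X, y, cv=5).mean())

# 2)/3) Ensemble learning: a random forest averages many randomized trees,
# which stabilizes the unstable structure of any single tree.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
print("random forest:", cross_val_score(forest, X, y, cv=5).mean())

# 5) Counter biased trees on imbalanced data by reweighting the classes.
weighted = DecisionTreeClassifier(class_weight='balanced')
print("weighted tree:", cross_val_score(weighted, X, y, cv=5).mean())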


