This article describes a Python implementation of the decision tree algorithm, shared here for your reference:
from sklearn.feature_extraction import DictVectorizer
import csv
from sklearn import tree
from sklearn import preprocessing
from sklearn.externals.six import StringIO

# Read the CSV data and store the feature values and class labels
# in a dictionary list and a label list
allElectronicsData = open(r'AllElectronics.csv', 'rt')
reader = csv.reader(allElectronicsData)
headers = next(reader)
# The original code was:
# headers = reader.next()
# That worked in an earlier Python version; it has since been replaced
# by the built-in next() function.
# print(headers)

featureList = []
labelList = []
for row in reader:
    labelList.append(row[len(row) - 1])
    rowDict = {}
    for i in range(1, len(row) - 1):
        rowDict[headers[i]] = row[i]
    featureList.append(rowDict)
# print(featureList)

# Vectorize the feature values, i.e. one-hot encode the categorical parameters
vec = DictVectorizer()
dummyX = vec.fit_transform(featureList).toarray()
# print("dummyX: " + str(dummyX))
# print(vec.get_feature_names())
# print("labelList: " + str(labelList))

# Vectorize the class label list (the target values)
lb = preprocessing.LabelBinarizer()
dummyY = lb.fit_transform(labelList)
# print("dummyY: " + str(dummyY))

# Use a decision tree to classify
clf = tree.DecisionTreeClassifier()
# clf = tree.DecisionTreeClassifier(criterion='entropy')
clf = clf.fit(dummyX, dummyY)
# print("clf: " + str(clf))

# Visualize the model
with open("allElectronicInformationOri.dot", 'w') as f:
    f = tree.export_graphviz(clf, feature_names=vec.get_feature_names(),
                             out_file=f)

oneRowX = dummyX[0, :]
# print("oneRowX: " + str(oneRowX))

# Next, change some data for prediction
newRowX = oneRowX
newRowX[0] = 0
newRowX[1] = 1
print("newRowX: " + str(newRowX))
predictedY = clf.predict(newRowX.reshape(1, -1))
# The input to predict() must be reshaped with reshape(1, -1),
# otherwise sklearn raises:
# ValueError: Expected 2D array, got 1D array instead:
# array=[0. 1. 1. 0. 1. 1. 0. 0. 1. 0.].
# Reshape your data either using array.reshape(-1, 1) if your data has
# a single feature or array.reshape(1, -1) if it contains a single sample.
print("predicted result: " + str(predictedY))
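Since the AllElectronics.csv file is not included here, the same pipeline can be sketched on a small inline dataset. The feature names and values below are illustrative assumptions, not the actual CSV columns:

```python
# A minimal runnable sketch of the pipeline above, using an inline toy
# dataset instead of AllElectronics.csv (feature names are assumptions).
from sklearn.feature_extraction import DictVectorizer
from sklearn import preprocessing, tree

featureList = [
    {'age': 'youth', 'income': 'high'},
    {'age': 'youth', 'income': 'low'},
    {'age': 'senior', 'income': 'high'},
    {'age': 'senior', 'income': 'low'},
]
labelList = ['no', 'no', 'yes', 'yes']

vec = DictVectorizer()
dummyX = vec.fit_transform(featureList).toarray()  # one-hot encoded features
lb = preprocessing.LabelBinarizer()
dummyY = lb.fit_transform(labelList)               # labels encoded as 0/1

clf = tree.DecisionTreeClassifier(criterion='entropy')
clf = clf.fit(dummyX, dummyY)

# predict() expects a 2D array, hence reshape(1, -1) for a single sample
oneRowX = dummyX[0, :]
print(clf.predict(oneRowX.reshape(1, -1)))
```

With only four samples the tree separates the two classes on the `age` feature, so the first sample is predicted as class 0 ('no').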
The code above classifies items according to people's purchasing power and predicts a result in the final step. The decision tree algorithm has some advantages and disadvantages.
Advantages of the decision tree algorithm:
1) It is simple and intuitive, and the resulting tree is easy to visualize.
2) It requires little preprocessing: no prior normalization is needed, and missing values can be handled.
3) The cost of predicting with a decision tree is O(log₂ m), where m is the number of samples.
4) It can handle both discrete and continuous values, whereas many algorithms focus on only one of the two.
5) It can handle multi-output classification problems.
6) Compared with black-box models such as neural networks, a decision tree can be interpreted logically.
7) The model can be pruned via cross-validation to improve its generalization ability.
8) It is tolerant of outliers and highly robust.
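Point 7 can be made concrete with scikit-learn's cost-complexity pruning: the `ccp_alpha` parameter controls pruning strength, and cross-validation can select the best value. This sketch uses the iris dataset as a stand-in, since the article's own data is not available here:

```python
# A sketch of selecting a pruned tree by cross-validation, using
# scikit-learn's cost-complexity pruning (ccp_alpha); iris is a
# stand-in dataset, not the article's data.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate pruning strengths from the cost-complexity pruning path
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
alphas = path.ccp_alphas[:-1]  # drop the last alpha (prunes to a single node)

# Pick the alpha with the best cross-validated accuracy
scores = [cross_val_score(DecisionTreeClassifier(random_state=0, ccp_alpha=a),
                          X, y, cv=5).mean() for a in alphas]
best_alpha = alphas[int(np.argmax(scores))]
clf = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X, y)
print("best ccp_alpha:", best_alpha)
```

The pruned tree chosen this way is usually smaller than the unpruned one while generalizing at least as well.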
Now let's look at the disadvantages of the decision tree algorithm:
1) Decision trees overfit very easily, resulting in poor generalization. This can be mitigated by setting a minimum number of samples per node and limiting the depth of the tree.
2) A small change in the samples can cause a drastic change in the tree structure. This can be addressed with ensemble learning.
3) Finding the optimal decision tree is an NP-hard problem, so heuristic methods are used in practice, which tend to get stuck in local optima. This can also be improved with ensemble learning.
4) Some more complex relationships, such as XOR, are hard for decision trees to learn. There is no real fix for this within decision trees themselves; such relationships are generally better handled by methods like neural networks.
5) If the sample proportions of certain features are too large, the generated tree tends to be biased toward those features. This can be improved by adjusting the sample weights.
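The remedies for points 1 to 3 can be sketched in scikit-learn: constraining tree depth and leaf size curbs overfitting, and an ensemble such as a random forest averages out the instability of any single tree. Again, iris stands in for the article's data:

```python
# A sketch of the remedies above: a constrained tree against overfitting,
# and a random forest (ensemble learning) against tree instability.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# A constrained tree: limited depth and a minimum number of samples per leaf
pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5,
                                random_state=0).fit(X_train, y_train)

# An ensemble of trees stabilizes the model against small sample changes
forest = RandomForestClassifier(n_estimators=100,
                                random_state=0).fit(X_train, y_train)

print("pruned tree accuracy:", pruned.score(X_test, y_test))
print("random forest accuracy:", forest.score(X_test, y_test))
```

Which constraints to use (and how strict to make them) depends on the dataset; the values here are illustrative.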