This article describes a Python implementation of the decision tree algorithm, shared here for your reference:
from sklearn.feature_extraction import DictVectorizer
import csv
from sklearn import tree
from sklearn import preprocessing
from sklearn.externals.six import StringIO

# Read the CSV data and store the feature values and class labels
# in a dictionary list and a label list
allElectronicsData = open(r'AllElectronics.csv', 'rt')
reader = csv.reader(allElectronicsData)
headers = next(reader)
# The original code was:
# headers = reader.next()
# That worked in an earlier Python version; it has since been replaced
# by the built-in next() function.
# print(headers)

featureList = []
labelList = []
for row in reader:
    labelList.append(row[len(row) - 1])
    rowDict = {}
    for i in range(1, len(row) - 1):
        rowDict[headers[i]] = row[i]
    featureList.append(rowDict)
# print(featureList)

# Vectorize the feature values, i.e. one-hot encode the categorical parameters
vec = DictVectorizer()
dummyX = vec.fit_transform(featureList).toarray()
# print("dummyX: " + str(dummyX))
# print(vec.get_feature_names())
# print("labelList: " + str(labelList))

# Vectorize the class label list (the target values)
lb = preprocessing.LabelBinarizer()
dummyY = lb.fit_transform(labelList)
# print("dummyY: " + str(dummyY))

# Use a decision tree to classify
clf = tree.DecisionTreeClassifier()
# clf = tree.DecisionTreeClassifier(criterion='entropy')
clf = clf.fit(dummyX, dummyY)
# print("clf: " + str(clf))

# Visualize the model
with open("allElectronicInformationOri.dot", 'w') as f:
    f = tree.export_graphviz(clf, feature_names=vec.get_feature_names(),
                             out_file=f)

oneRowX = dummyX[0, :]
# print("oneRowX: " + str(oneRowX))

# Next, change some data for prediction
newRowX = oneRowX
newRowX[0] = 0
newRowX[1] = 1
print("newRowX: " + str(newRowX))
predictedY = clf.predict(newRowX.reshape(1, -1))
# The input to predict() must be reshaped with reshape(1, -1),
# otherwise sklearn raises:
# ValueError: Expected 2D array, got 1D array instead:
# array=[0. 1. 1. 0. 1. 1. 0. 0. 1. 0.].
# Reshape your data either using array.reshape(-1, 1) if your data has
# a single feature or array.reshape(1, -1) if it contains a single sample.
print("predicted result: " + str(predictedY))
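Since the AllElectronics.csv file is not included here, the same pipeline can be sketched on a small inline dataset. The feature names and values below are illustrative assumptions, not the actual CSV columns:

```python
# A minimal runnable sketch of the pipeline above, using an inline toy
# dataset instead of AllElectronics.csv (feature names are assumptions).
from sklearn.feature_extraction import DictVectorizer
from sklearn import preprocessing, tree

featureList = [
    {'age': 'youth', 'income': 'high'},
    {'age': 'youth', 'income': 'low'},
    {'age': 'senior', 'income': 'high'},
    {'age': 'senior', 'income': 'low'},
]
labelList = ['no', 'no', 'yes', 'yes']

vec = DictVectorizer()
dummyX = vec.fit_transform(featureList).toarray()  # one-hot encoded features
lb = preprocessing.LabelBinarizer()
dummyY = lb.fit_transform(labelList)               # labels encoded as 0/1

clf = tree.DecisionTreeClassifier(criterion='entropy')
clf = clf.fit(dummyX, dummyY)

# predict() expects a 2D array, hence reshape(1, -1) for a single sample
oneRowX = dummyX[0, :]
print(clf.predict(oneRowX.reshape(1, -1)))
```

With only four samples the tree separates the two classes on the `age` feature, so the first sample is predicted as class 0 ('no').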
The code above classifies items according to people's purchasing power and predicts a result in the final step. The decision tree algorithm has some advantages and disadvantages.
Advantages of the decision tree algorithm:
1) It is simple and intuitive, and the resulting tree is easy to visualize.
2) It requires little preprocessing: no prior normalization is needed, and missing values can be handled.
3) The cost of predicting with a decision tree is O(log₂ m), where m is the number of samples.
4) It can handle both discrete and continuous values, whereas many algorithms focus on only one of the two.
5) It can handle multi-output classification problems.
6) Compared with black-box models such as neural networks, a decision tree can be interpreted logically.
7) The model can be pruned via cross-validation to improve its generalization ability.
8) It is tolerant of outliers and highly robust.
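Point 7 can be made concrete with scikit-learn's cost-complexity pruning: the `ccp_alpha` parameter controls pruning strength, and cross-validation can select the best value. This sketch uses the iris dataset as a stand-in, since the article's own data is not available here:

```python
# A sketch of selecting a pruned tree by cross-validation, using
# scikit-learn's cost-complexity pruning (ccp_alpha); iris is a
# stand-in dataset, not the article's data.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate pruning strengths from the cost-complexity pruning path
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
alphas = path.ccp_alphas[:-1]  # drop the last alpha (prunes to a single node)

# Pick the alpha with the best cross-validated accuracy
scores = [cross_val_score(DecisionTreeClassifier(random_state=0, ccp_alpha=a),
                          X, y, cv=5).mean() for a in alphas]
best_alpha = alphas[int(np.argmax(scores))]
clf = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X, y)
print("best ccp_alpha:", best_alpha)
```

The pruned tree chosen this way is usually smaller than the unpruned one while generalizing at least as well.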
Now let's look at the disadvantages of the decision tree algorithm:
1) Decision trees overfit very easily, resulting in poor generalization. This can be mitigated by setting a minimum number of samples per node and limiting the depth of the tree.
2) A small change in the samples can cause a drastic change in the tree structure. This can be addressed with ensemble learning.
3) Finding the optimal decision tree is an NP-hard problem, so heuristic methods are used in practice, which tend to get stuck in local optima. This can also be improved with ensemble learning.
4) Some more complex relationships, such as XOR, are hard for decision trees to learn. There is no real fix for this within decision trees themselves; such relationships are generally better handled by methods like neural networks.
5) If the sample proportions of certain features are too large, the generated tree tends to be biased toward those features. This can be improved by adjusting the sample weights.
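The remedies for points 1 to 3 can be sketched in scikit-learn: constraining tree depth and leaf size curbs overfitting, and an ensemble such as a random forest averages out the instability of any single tree. Again, iris stands in for the article's data:

```python
# A sketch of the remedies above: a constrained tree against overfitting,
# and a random forest (ensemble learning) against tree instability.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# A constrained tree: limited depth and a minimum number of samples per leaf
pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5,
                                random_state=0).fit(X_train, y_train)

# An ensemble of trees stabilizes the model against small sample changes
forest = RandomForestClassifier(n_estimators=100,
                                random_state=0).fit(X_train, y_train)

print("pruned tree accuracy:", pruned.score(X_test, y_test))
print("random forest accuracy:", forest.score(X_test, y_test))
```

Which constraints to use (and how strict to make them) depends on the dataset; the values here are illustrative.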