Python Machine Learning: Decision Trees

This article describes the Python machine learning decision tree in detail. Decision trees (DTs) are a supervised learning method used for classification and regression.

Advantages: low computational complexity, easy-to-interpret output, tolerant of missing values, and able to deal with irrelevant features.
Disadvantage: prone to overfitting.
Applicable data types: numeric and nominal.
Source code download: https://www.manning.com/books/machine-learning-in-action

Run demo

Key algorithm

Check whether every item in the dataset belongs to the same class:
    If so, return the class label
    Else
        Find the best feature for dividing the dataset
        Divide the dataset
        Create a branch node
        For each subset produced by the split
            Call the createBranch function and add the result to the branch node
        Return the branch node

Corresponding code

def createTree(dataSet, labels):
    classList = [example[-1] for example in dataSet]  # the last element of each record is its class label
    if classList.count(classList[0]) == len(classList):
        return classList[0]  # stop splitting when all of the classes are equal
    if len(dataSet[0]) == 1:  # only the class label is left, no more features to split on
        return majorityCnt(classList)
    bestFeat = chooseBestFeatureToSplit(dataSet)  # index of the best feature
    bestFeatLabel = labels[bestFeat]  # e.g. 'flippers' or 'no surfacing'
    myTree = {bestFeatLabel: {}}  # the subtree rooted at the best feature
    del(labels[bestFeat])  # the chosen feature is consumed at this level
    featValues = [example[bestFeat] for example in dataSet]
    uniqueVals = set(featValues)  # the distinct values of the best feature define the branches
    for value in uniqueVals:
        subLabels = labels[:]  # copy labels so recursive calls don't mutate the caller's list
        myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value), subLabels)
    return myTree
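createTree relies on two helpers from the same book, majorityCnt and splitDataSet. A minimal sketch of both, following Machine Learning in Action's conventions (the exact book versions may differ slightly in style):

import operator

def majorityCnt(classList):
    # return the class label that occurs most often in classList
    classCount = {}
    for vote in classList:
        classCount[vote] = classCount.get(vote, 0) + 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

def splitDataSet(dataSet, axis, value):
    # keep the records whose feature at index `axis` equals `value`,
    # with that feature column removed
    retDataSet = []
    for featVec in dataSet:
        if featVec[axis] == value:
            retDataSet.append(featVec[:axis] + featVec[axis + 1:])
    return retDataSet

With these in place, the book's toy fish dataset reproduces the expected tree:

dataSet = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]
labels = ['no surfacing', 'flippers']
print(createTree(dataSet, labels))
# {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}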

Information gain measures how much dividing a dataset reduces its disorder. The guiding principle of choosing a split is to make the resulting subsets more ordered than the whole. The following lines show the heart of the calculation:

newEntropy += prob * calcShannonEnt(subDataSet)  # e.g. 0.5509775004326937 in the fish example: after splitting, each subset's Shannon entropy is weighted by its probability and summed to give the entropy of the divided dataset

# The more uniform the data, the smaller the Shannon entropy, approaching 0; the more mixed the data, the larger the entropy. calcShannonEnt(dataSet) computes the entropy of a dataset from the class labels alone, i.e. featVec[-1] of each record.
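A minimal sketch of calcShannonEnt in the book's style (assuming, as above, that each record ends with its class label):

from math import log

def calcShannonEnt(dataSet):
    numEntries = len(dataSet)
    labelCounts = {}
    for featVec in dataSet:
        currentLabel = featVec[-1]  # only the class label contributes to the entropy
        labelCounts[currentLabel] = labelCounts.get(currentLabel, 0) + 1
    shannonEnt = 0.0
    for key in labelCounts:
        prob = float(labelCounts[key]) / numEntries
        shannonEnt -= prob * log(prob, 2)  # H = -sum(p * log2(p))
    return shannonEnt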


infoGain = baseEntropy - newEntropy  # e.g. 0.4199730940219749: information gain is the drop in entropy from before the split to after it
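Putting the two lines above together, a minimal sketch of chooseBestFeatureToSplit following the book: it tries every feature and keeps the one with the largest information gain.

def chooseBestFeatureToSplit(dataSet):
    numFeatures = len(dataSet[0]) - 1  # the last column is the class label
    baseEntropy = calcShannonEnt(dataSet)  # entropy before any split
    bestInfoGain = 0.0
    bestFeature = -1
    for i in range(numFeatures):
        featList = [example[i] for example in dataSet]
        uniqueVals = set(featList)
        newEntropy = 0.0
        for value in uniqueVals:
            subDataSet = splitDataSet(dataSet, i, value)
            prob = len(subDataSet) / float(len(dataSet))
            newEntropy += prob * calcShannonEnt(subDataSet)  # weighted entropy after splitting on feature i
        infoGain = baseEntropy - newEntropy  # reduction in disorder
        if infoGain > bestInfoGain:
            bestInfoGain = infoGain
            bestFeature = i
    return bestFeature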

Summary:

At first I couldn't make sense of the code because I lost sight of the goal: classification. We want to take a pile of data and attach a label to each item.
For example, in k-nearest neighbors, classify([0, 0], group, labels, 3) means: classify the new point [0, 0] against the training set group, whose rows are labeled by labels, using the k = 3 nearest neighbors. Each row of group corresponds to one label.
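For comparison, a minimal sketch of that kNN classifier in the book's style (the book names it classify0; NumPy assumed):

import numpy as np
import operator

def classify0(inX, dataSet, labels, k):
    # Euclidean distance from inX to every row of dataSet
    diffMat = np.tile(inX, (dataSet.shape[0], 1)) - dataSet
    distances = ((diffMat ** 2).sum(axis=1)) ** 0.5
    sortedDistIndices = distances.argsort()
    classCount = {}
    for i in range(k):  # vote among the k nearest neighbors
        voteLabel = labels[sortedDistIndices[i]]
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

With the book's toy data, group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]]) and labels = ['A', 'A', 'B', 'B'], the call classify0([0, 0], group, labels, 3) returns 'B'.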

Finally, a few thoughts on decision trees that really belong in another article:

A decision tree speeds up classification: split first on the most discriminative feature, and whenever a branch already determines the label, return that leaf immediately. The remaining dimensions along that branch never need to be examined.

In theory, without a decision tree, every query would have to scan every dimension of every stored record to find the matching label, for a cost of roughly (number of dimensions) × (number of records). That is memorized lookup: well suited to expert systems, poor at predicting unseen situations, but fast even on large datasets, and it can still feel intelligent because it replays past experience. Nor is the tree frozen: it can be relearned and restructured as the data changes, so it stays dynamic. When the data is incomplete, the tree may be incomplete too; when one test settles the answer, no further tests are made; and new dimensions can be added over time.
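That "one test can settle the answer" behavior is visible in the book's tree classifier: only the features along a single root-to-leaf path are ever tested. A minimal sketch, assuming the nested-dict tree produced by createTree above:

def classify(inputTree, featLabels, testVec):
    # walk from the root, testing one feature per level, until a leaf is reached
    firstStr = list(inputTree.keys())[0]
    secondDict = inputTree[firstStr]
    featIndex = featLabels.index(firstStr)  # which column of testVec this node tests
    for key in secondDict:
        if testVec[featIndex] == key:
            if isinstance(secondDict[key], dict):
                return classify(secondDict[key], featLabels, testVec)  # descend one level
            return secondDict[key]  # leaf: the class label

On the fish tree, classify(myTree, ['no surfacing', 'flippers'], [0, 1]) returns 'no' after a single test of 'no surfacing'; the 'flippers' feature is never consulted on that path.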

The above is a detailed description of the Python machine learning decision tree. For more information, see other related articles in the community.
