Python Machine Learning: Decision Trees

This article describes the Python machine learning decision tree in detail. Decision trees (DTs) are a supervised learning method used for classification and regression.

Advantages: low computational complexity, easy-to-interpret output, tolerant of missing values, and able to deal with irrelevant features.
Disadvantage: prone to overfitting.
Applicable data types: numeric and nominal.
Source code download: https://www.manning.com/books/machine-learning-in-action

Run demo

Key algorithm

Check whether every item in the dataset belongs to the same class:
    If so, return the class label
    Else
        Find the best feature for dividing the dataset
        Divide the dataset
        Create a branch node
        For each subset produced by the split
            Call the createBranch function and add the result to the branch node
        Return the branch node

Corresponding code

def createTree(dataSet, labels):
    classList = [example[-1] for example in dataSet]  # the last element of each record is its class label
    if classList.count(classList[0]) == len(classList):
        return classList[0]  # stop splitting when all of the classes are equal
    if len(dataSet[0]) == 1:  # only the class label is left, no more features to split on
        return majorityCnt(classList)
    bestFeat = chooseBestFeatureToSplit(dataSet)  # index of the best feature
    bestFeatLabel = labels[bestFeat]  # e.g. 'flippers' or 'no surfacing'
    myTree = {bestFeatLabel: {}}  # the subtree rooted at the best feature
    del(labels[bestFeat])  # the chosen feature is consumed at this level
    featValues = [example[bestFeat] for example in dataSet]
    uniqueVals = set(featValues)  # the distinct values of the best feature define the branches
    for value in uniqueVals:
        subLabels = labels[:]  # copy labels so recursive calls don't mutate the caller's list
        myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value), subLabels)
    return myTree
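createTree relies on two helpers from the same book, majorityCnt and splitDataSet. A minimal sketch of both, following Machine Learning in Action's conventions (the exact book versions may differ slightly in style):

import operator

def majorityCnt(classList):
    # return the class label that occurs most often in classList
    classCount = {}
    for vote in classList:
        classCount[vote] = classCount.get(vote, 0) + 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

def splitDataSet(dataSet, axis, value):
    # keep the records whose feature at index `axis` equals `value`,
    # with that feature column removed
    retDataSet = []
    for featVec in dataSet:
        if featVec[axis] == value:
            retDataSet.append(featVec[:axis] + featVec[axis + 1:])
    return retDataSet

With these in place, the book's toy fish dataset reproduces the expected tree:

dataSet = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]
labels = ['no surfacing', 'flippers']
print(createTree(dataSet, labels))
# {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}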

Information gain measures how much dividing a dataset reduces its disorder. The guiding principle of choosing a split is to make the resulting subsets more ordered than the whole. The following lines show the heart of the calculation:

newEntropy += prob * calcShannonEnt(subDataSet)  # e.g. 0.5509775004326937 in the fish example: after splitting, each subset's Shannon entropy is weighted by its probability and summed to give the entropy of the divided dataset

# The more uniform the data, the smaller the Shannon entropy, approaching 0; the more mixed the data, the larger the entropy. calcShannonEnt(dataSet) computes the entropy of a dataset from the class labels alone, i.e. featVec[-1] of each record.
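A minimal sketch of calcShannonEnt in the book's style (assuming, as above, that each record ends with its class label):

from math import log

def calcShannonEnt(dataSet):
    numEntries = len(dataSet)
    labelCounts = {}
    for featVec in dataSet:
        currentLabel = featVec[-1]  # only the class label contributes to the entropy
        labelCounts[currentLabel] = labelCounts.get(currentLabel, 0) + 1
    shannonEnt = 0.0
    for key in labelCounts:
        prob = float(labelCounts[key]) / numEntries
        shannonEnt -= prob * log(prob, 2)  # H = -sum(p * log2(p))
    return shannonEnt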


infoGain = baseEntropy - newEntropy  # e.g. 0.4199730940219749: information gain is the drop in entropy from before the split to after it
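Putting the two lines above together, a minimal sketch of chooseBestFeatureToSplit following the book: it tries every feature and keeps the one with the largest information gain.

def chooseBestFeatureToSplit(dataSet):
    numFeatures = len(dataSet[0]) - 1  # the last column is the class label
    baseEntropy = calcShannonEnt(dataSet)  # entropy before any split
    bestInfoGain = 0.0
    bestFeature = -1
    for i in range(numFeatures):
        featList = [example[i] for example in dataSet]
        uniqueVals = set(featList)
        newEntropy = 0.0
        for value in uniqueVals:
            subDataSet = splitDataSet(dataSet, i, value)
            prob = len(subDataSet) / float(len(dataSet))
            newEntropy += prob * calcShannonEnt(subDataSet)  # weighted entropy after splitting on feature i
        infoGain = baseEntropy - newEntropy  # reduction in disorder
        if infoGain > bestInfoGain:
            bestInfoGain = infoGain
            bestFeature = i
    return bestFeature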

Summary:

At first I couldn't make sense of the code because I lost sight of the goal: classification. We want to take a pile of data and attach a label to each item.
For example, in k-nearest neighbors, classify([0, 0], group, labels, 3) means: classify the new point [0, 0] against the training set group, whose rows are labeled by labels, using the k = 3 nearest neighbors. Each row of group corresponds to one label.
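For comparison, a minimal sketch of that kNN classifier in the book's style (the book names it classify0; NumPy assumed):

import numpy as np
import operator

def classify0(inX, dataSet, labels, k):
    # Euclidean distance from inX to every row of dataSet
    diffMat = np.tile(inX, (dataSet.shape[0], 1)) - dataSet
    distances = ((diffMat ** 2).sum(axis=1)) ** 0.5
    sortedDistIndices = distances.argsort()
    classCount = {}
    for i in range(k):  # vote among the k nearest neighbors
        voteLabel = labels[sortedDistIndices[i]]
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

With the book's toy data, group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]]) and labels = ['A', 'A', 'B', 'B'], the call classify0([0, 0], group, labels, 3) returns 'B'.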

Finally, a few thoughts on decision trees that really belong in another article:

A decision tree speeds up classification: split first on the most discriminative feature, and whenever a branch already determines the label, return that leaf immediately. The remaining dimensions along that branch never need to be examined.

In theory, without a decision tree, every query would have to scan every dimension of every stored record to find the matching label, for a cost of roughly (number of dimensions) × (number of records). That is memorized lookup: well suited to expert systems, poor at predicting unseen situations, but fast even on large datasets, and it can still feel intelligent because it replays past experience. Nor is the tree frozen: it can be relearned and restructured as the data changes, so it stays dynamic. When the data is incomplete, the tree may be incomplete too; when one test settles the answer, no further tests are made; and new dimensions can be added over time.
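That "one test can settle the answer" behavior is visible in the book's tree classifier: only the features along a single root-to-leaf path are ever tested. A minimal sketch, assuming the nested-dict tree produced by createTree above:

def classify(inputTree, featLabels, testVec):
    # walk from the root, testing one feature per level, until a leaf is reached
    firstStr = list(inputTree.keys())[0]
    secondDict = inputTree[firstStr]
    featIndex = featLabels.index(firstStr)  # which column of testVec this node tests
    for key in secondDict:
        if testVec[featIndex] == key:
            if isinstance(secondDict[key], dict):
                return classify(secondDict[key], featLabels, testVec)  # descend one level
            return secondDict[key]  # leaf: the class label

On the fish tree, classify(myTree, ['no surfacing', 'flippers'], [0, 1]) returns 'no' after a single test of 'no surfacing'; the 'flippers' feature is never consulted on that path.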

The above is a detailed description of the Python machine learning decision tree. For more information, see other related articles in the community.
