Data analysis algorithms
Decision Tree
Decision trees are used to classify the records in a dataset.
Each record is assumed to have several attributes, and the decision tree sorts the records into categories according to the values of those attributes.
ID3 algorithm
How do we decide which attribute to split on? ID3 picks the attribute whose split reduces the information entropy of the dataset the most. That is the attribute with the largest difference between the entropy before the split and the size-weighted entropy of the subsets after it (the information gain). Lower entropy means purer data, i.e. more of the records share the same class.
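The selection criterion can be made concrete with a small sketch. It assumes records are Python dicts mapping attribute names to discrete values, with the class labels in a separate list. The helper names `entropy` and `information_gain` and the toy weather data are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Base-2 Shannon entropy: 0 for a pure set, larger for a more mixed one."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(records, labels, attr):
    """Entropy before the split minus the size-weighted entropy of each branch."""
    total = len(labels)
    groups = {}
    for record, label in zip(records, labels):
        groups.setdefault(record[attr], []).append(label)
    after = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - after

# Hypothetical toy data: the class depends only on "outlook".
records = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
best = max(["outlook", "windy"], key=lambda a: information_gain(records, labels, a))
print(best)  # outlook
```

Splitting on `outlook` yields two pure branches (gain 1.0 bit), while `windy` leaves each branch evenly mixed (gain 0.0), so `outlook` is chosen.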
Suppose attribute A has n distinct values. Splitting on A produces n branches, and A is removed from the sub-dataset in each branch.
The sub-dataset in each branch is then split recursively, until it cannot be split any further (no attributes remain) or all of its records belong to the same class.
The result is a decision tree. A record is classified by following the branches that match its attribute values down to a leaf node, whose class is the prediction.
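The recursive construction and leaf-based classification might be sketched as follows. This is a minimal implementation of ID3 for discrete attributes, assuming dict-based records. The names `split`, `gain`, `id3` and `classify` are hypothetical, and labeling a leaf with the majority class when no attributes remain is an assumption about how the "cannot be split further" case is handled. The tree is a nested dict of the form `{attribute: {value: subtree_or_class}}`.

```python
import math
from collections import Counter

def entropy(labels):
    """Base-2 Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def split(records, labels, attr):
    """Group (records, labels) by the value of attr, removing attr from each sub-record."""
    branches = {}
    for record, label in zip(records, labels):
        recs, labs = branches.setdefault(record[attr], ([], []))
        recs.append({k: v for k, v in record.items() if k != attr})
        labs.append(label)
    return branches

def gain(records, labels, attr):
    """Information gain of splitting on attr."""
    total = len(labels)
    after = sum(len(labs) / total * entropy(labs)
                for _, labs in split(records, labels, attr).values())
    return entropy(labels) - after

def id3(records, labels):
    """Build the tree recursively as nested dicts."""
    if len(set(labels)) == 1:   # every record has the same class -> leaf
        return labels[0]
    if not records[0]:          # no attributes left -> majority-class leaf (assumption)
        return Counter(labels).most_common(1)[0][0]
    best = max(records[0], key=lambda a: gain(records, labels, a))
    return {best: {value: id3(recs, labs)
                   for value, (recs, labs) in split(records, labels, best).items()}}

def classify(tree, record):
    """Follow matching branches down to a leaf; returns None for unseen values."""
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches.get(record[attr])
    return tree

# Hypothetical toy data for illustration only.
records = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
]
labels = ["play", "stay", "stay", "stay", "stay"]
tree = id3(records, labels)
print(tree)  # {'outlook': {'sunny': {'windy': {'no': 'play', 'yes': 'stay'}}, 'rainy': 'stay'}}
print(classify(tree, {"outlook": "sunny", "windy": "no"}))  # play
```

On this data the root splits on `outlook`, and only the mixed `sunny` branch needs a second split on `windy`. `classify` returns `None` for an attribute value never seen during training; a fuller implementation would fall back to a default class there.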
C4.5 algorithm
C4.5 is an extension of ID3. The differences are:
1. When choosing the attribute to split on, C4.5 compares the gain ratio (the information gain divided by the split information) instead of the raw information gain.
2. C4.5 prunes the tree to reduce the overfitting caused by noisy data.
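To see why dividing by the split information matters, consider a hypothetical `id` attribute that is unique for every record. It splits the data into pure single-record branches, so its information gain is as high as possible even though it is useless for classifying new records. The sketch below computes the gain ratio under the same dict-based record assumption; the name `gain_ratio` and the toy data are hypothetical, and pruning is not shown.

```python
import math
from collections import Counter

def entropy(values):
    """Base-2 Shannon entropy of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())

def gain_ratio(records, labels, attr):
    """Information gain divided by split information."""
    total = len(labels)
    groups = {}
    for record, label in zip(records, labels):
        groups.setdefault(record[attr], []).append(label)
    info_gain = entropy(labels) - sum(len(g) / total * entropy(g) for g in groups.values())
    # Split information is the entropy of the attribute's own value distribution;
    # it grows with the number of distinct values, penalizing many-valued attributes.
    split_info = entropy([record[attr] for record in records])
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical toy data: "id" is unique per record.
records = [
    {"id": 1, "outlook": "sunny"},
    {"id": 2, "outlook": "sunny"},
    {"id": 3, "outlook": "rainy"},
    {"id": 4, "outlook": "rainy"},
]
labels = ["play", "play", "stay", "stay"]
print(gain_ratio(records, labels, "id"))       # 0.5
print(gain_ratio(records, labels, "outlook"))  # 1.0
```

Both attributes have an information gain of 1.0 here, so plain information gain cannot tell them apart. The split information of `id` is 2 bits versus 1 bit for `outlook`, so under the gain ratio `outlook` is preferred.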
http://blog.csdn.net/xuxurui007/article/details/18045943
http://blog.csdn.net/zjd950131/article/details/8027081
KNN (k-nearest neighbors)
KNN predicts the class of a new data point from a dataset in which every record has known attribute values and a known class.
It calculates the distance from the new point to every point in the dataset and takes the most common class among the k nearest points as the prediction.
Distance is calculated as the Euclidean distance, $d = \sqrt{(\Delta x)^2 + (\Delta y)^2 + (\Delta z)^2 + \cdots}$, where $\Delta x$ is the difference between the two points' values of attribute x, $\Delta y$ of attribute y, and so on.
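A minimal sketch of the prediction step, assuming each point is a tuple of numeric attribute values. The names `euclidean` and `knn_predict`, the default `k=3`, and the sample points are hypothetical, chosen for illustration.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Square root of the sum of squared per-attribute differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the class of query by majority vote among its k nearest training points."""
    # Sort training indices by distance to the query and keep the k closest.
    nearest = sorted(range(len(train_points)),
                     key=lambda i: euclidean(train_points[i], query))[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical sample data: two well-separated clusters.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(points, labels, (1.2, 1.4)))  # A
print(knn_predict(points, labels, (8.2, 8.5)))  # B
```

This sketch does not rescale the attributes, so an attribute with a large numeric range dominates the distance; an odd k also avoids tied votes when there are two classes.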
Naive Bayes
(To be continued)