10 classic algorithms for data mining (1): C4.5

Source: Internet
Author: User
Tags: id3

==========================================================

This series is reprinted from aladdina

The aim is to give a general introduction to the top ten classic algorithms of data mining. If you need to study any of these algorithms in depth, please look up more detailed material online.

==========================================================

 

In machine learning, a decision tree is a prediction model that represents a mapping between object attributes and object values. Each internal node in the tree represents a test on an attribute, each branch represents a possible value of that attribute, and each leaf node corresponds to the value represented by the path from the root node to that leaf. A decision tree has only one output; to produce multiple outputs, you can build an independent decision tree for each output.

The machine learning technology that generates decision trees from data is called Decision Tree Learning.

Decision tree learning is also a common method in data mining. Each decision tree is a tree structure whose branches split the objects to be classified according to their attribute values. A tree is built by splitting the source data set into subsets based on an attribute-value test, and this process is repeated recursively on each derived subset. The recursion ends when splitting a subset no longer improves the classification, or when all examples in a branch belong to a single class. In addition, the random forest classifier combines many decision trees to improve the classification accuracy.

Decision trees can also be constructed by computing conditional probabilities. When a decision tree is built on such mathematical calculation, it can achieve better results.

 

How decision trees work
Decision trees are generally built from the top down.
There are several ways to choose the split at each node, but the purpose is the same: find the split that best separates the target classes.
Each path from the root node to a leaf node corresponds to a "rule".
The decision tree can be binary or multi-way.
Measurements at each node:
1) the number of records reaching this node;
2) the classification path leading to the leaf node;
3) the percentage of records correctly classified at the leaf node.
Some rules perform better than others; a minimal example follows.
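
As a quick illustration of these ideas, here is a minimal sketch using scikit-learn. This is an assumption of convenience: the original article names no library, and scikit-learn's DecisionTreeClassifier implements an optimized CART rather than C4.5, but the top-down, entropy-based splitting and the path-as-rule view are the same. The toy data is invented.

from sklearn.tree import DecisionTreeClassifier, export_text

# Toy, invented data: columns are outlook (0=sunny, 1=overcast, 2=rain) and humidity (0=normal, 1=high).
X = [[0, 1], [0, 0], [1, 1], [2, 1], [2, 0], [1, 0]]
y = ["no", "yes", "yes", "no", "yes", "yes"]   # class: play tennis?

# criterion="entropy" uses the same information measure as ID3/C4.5.
tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

# Each printed path from the root to a leaf is one classification "rule".
print(export_text(tree, feature_names=["outlook", "humidity"]))
print(tree.predict([[0, 0]]))   # classify a new record: sunny, normal humidity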

Because of some problems encountered when the ID3 algorithm is applied in practice, Quinlan proposed the C4.5 algorithm. Strictly speaking, C4.5 is only an improvement of ID3. I assume everyone is familiar with the ID3 algorithm, so I will not introduce it here.
The C4.5 algorithm inherits the advantages of the ID3 algorithm and improves on ID3 in the following respects:
1) it uses the information gain ratio to select attributes, which overcomes the bias of information gain toward attributes with many values;
2) it prunes during tree construction;
3) it can discretize continuous attributes;
4) it can handle incomplete data.
The C4.5 algorithm has the following advantages: the generated classification rules are easy to understand, and the accuracy is high. Its disadvantage is that the data set must be scanned and sorted repeatedly during tree construction, which makes the algorithm inefficient. In addition, C4.5 is only suitable for data sets that fit in memory; when the training set is too large to fit in memory, the program cannot run. A small sketch of how a continuous attribute can be discretized (point 3 above) follows.
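
For example, here is a minimal, pure-Python sketch of the usual way a C4.5-style algorithm discretizes a continuous attribute: sort the values, take the midpoints between adjacent distinct values as candidate thresholds, and keep the binary split with the lowest weighted entropy. The data and helper names are invented for illustration.

import math
from collections import Counter

def entropy(labels):
    # Info(S) = -sum p_j * log2(p_j), where p_j = freq(Cj, S) / |S|
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def candidate_thresholds(values):
    # Midpoints between adjacent distinct sorted values.
    v = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(v, v[1:])]

def best_threshold(values, labels):
    # Choose the threshold whose binary split has the lowest weighted entropy.
    best, best_score = None, float("inf")
    for t in candidate_thresholds(values):
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if score < best_score:
            best, best_score = t, score
    return best

# Invented toy data: temperature readings and a yes/no class.
temps = [64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85]
play  = ["yes", "no", "yes", "yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no"]
print(best_threshold(temps, play))  # threshold that minimizes the split entropy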

 

 

Additional material collected from other sources:

 

C4.5 is a classification decision tree algorithm in machine learning; its core is the ID3 algorithm.
A classification decision tree algorithm builds a decision tree top-down to extract classification rules from a large set of cases.
The parts of a decision tree are (a minimal sketch follows):
Root: the set of learning cases.
Branch: a condition used to make a classification decision.
Leaf: a well-separated class.
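
A minimal sketch of this structure, with class and field names of my own choosing rather than the article's:

from dataclasses import dataclass, field

@dataclass
class Node:
    # An internal node tests one attribute; each branch corresponds to one value of that attribute.
    attribute: str = None                             # None on a leaf
    branches: dict = field(default_factory=dict)      # attribute value -> child Node
    label: str = None                                 # class label, set only on leaves

# Root: the whole learning case set is routed through here first.
# Branch: a condition (attribute = value) that routes a case to a subtree.
# Leaf: a fully separated class.
root = Node(attribute="outlook", branches={
    "overcast": Node(label="yes"),
    "sunny": Node(attribute="humidity", branches={
        "high": Node(label="no"),
        "normal": Node(label="yes"),
    }),
})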
§4.3.2 The ID3 algorithm
1. The concept learning algorithm CLS
1) Initialize C = {E}, where E contains all the training examples; this set forms the root.
2) IF all examples in C belong to the same decision class, create a leaf node and terminate;
ELSE, according to the heuristic criterion, select a feature F = {V1, V2, V3, ..., Vn}, create a decision node,
and divide C into n non-overlapping subsets C1, C2, C3, ..., Cn.
3) Recurse on each Ci.
2. The ID3 algorithm
1) Randomly select a subset W (the "window") of C.
2) Call CLS to generate a classification tree DT for W (the heuristic criterion is described below).
3) Scan C sequentially and collect the exceptions to DT (that is, the examples that DT cannot classify correctly).
4) Combine W with the discovered exceptions to form a new W.
5) Repeat steps 2) to 4) until there are no more exceptions. A rough sketch of this loop is given below.
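
Here is a rough, pure-Python sketch of this windowing loop. Since the original text does not specify them, the tree-building and classification routines are passed in as callables, and the examples are assumed to be objects with attributes and label fields; all names here are illustrative assumptions.

import random

def id3_with_window(examples, build_tree, classify, window_size):
    # 1) Randomly select a subset W (the window) of the full example set C.
    window = random.sample(examples, window_size)
    while True:
        # 2) Build a classification tree DT from the current window.
        tree = build_tree(window)
        # 3) Scan all of C and collect exceptions (examples DT misclassifies).
        exceptions = [e for e in examples
                      if classify(tree, e.attributes) != e.label]
        # 5) Stop when there are no exceptions.
        if not exceptions:
            return tree
        # 4) Merge the exceptions into the window and repeat.
        window = window + [e for e in exceptions if e not in window]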

Heuristic criterion:
It depends only on the subtree being built and is measured with entropy from information theory.
Entropy measures the uncertainty involved in selecting among events. It is computed as follows:
p_j = freq(Cj, S) / |S|;
Info(S) = -sum(p_j * log2(p_j)), where the sum runs over the classes j = 1 to n;
Gain(X) = Info(T) - Infox(X);
Infox(X) = sum((|Ti| / |T|) * Info(Ti)), where splitting on attribute X partitions T into the subsets Ti.
To keep the generated decision tree small, the ID3 algorithm selects the feature with the largest information gain Gain(X) (equivalently, the smallest Infox(X)) to grow the subtree. The functions below implement these formulas on a toy example.
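
Here is a small, pure-Python illustration of these formulas; the toy data set is invented, and the function names (info, info_x, gain) are mine rather than the article's.

import math
from collections import Counter, defaultdict

def info(labels):
    # Info(S) = -sum_j p_j * log2(p_j), with p_j = freq(Cj, S) / |S|.
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def info_x(values, labels):
    # Infox(X) = sum_i (|Ti| / |T|) * Info(Ti), where Ti holds the examples with the i-th value of X.
    groups = defaultdict(list)
    for v, l in zip(values, labels):
        groups[v].append(l)
    total = len(labels)
    return sum((len(g) / total) * info(g) for g in groups.values())

def gain(values, labels):
    # Gain(X) = Info(T) - Infox(X).
    return info(labels) - info_x(values, labels)

# Invented toy data: outlook attribute vs. play / don't-play class.
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "overcast"]
play    = ["no",    "yes",   "yes",      "no",   "yes",  "yes"]
print(gain(outlook, play))   # information gain of splitting on "outlook"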

§4.3.3 Data requirements of the ID3 algorithm
1. All attributes must be discrete.
2. Every attribute of every training instance must have a definite value.
3. Instances with the same attribute values must lead to the same conclusion, and training instances must be unique.

§4.3.4 C4.5's improvements to the ID3 algorithm:
1. It improves the entropy criterion by adding the split information of the subtree:
Split_infox(X) = -sum((|Ti| / |T|) * log2(|Ti| / |T|));
Gain_ratio(X) = Gain(X) / Split_infox(X).
2. Improvements to the input data:
1) A factor (predictive) attribute may take continuous values. C4.5 sorts the values and partitions them into intervals, which are then handled as discrete values by the ID3 procedure. The conclusion (class) attribute, however, must still be discrete.
2) The factor attribute values of a training instance may be unknown, but its conclusion must be definite.
3. It prunes the generated decision tree to reduce its size. A small sketch of the gain-ratio computation is given below.
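
The following is a minimal, pure-Python sketch of the gain-ratio criterion, again on invented toy data; the helper names (partition, split_info, gain_ratio) are mine rather than the article's, and the entropy helpers simply repeat the formulas above.

import math
from collections import Counter, defaultdict

def info(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def partition(values, labels):
    groups = defaultdict(list)
    for v, l in zip(values, labels):
        groups[v].append(l)
    return groups

def gain(values, labels):
    total = len(labels)
    groups = partition(values, labels)
    cond = sum((len(g) / total) * info(g) for g in groups.values())
    return info(labels) - cond

def split_info(values, labels):
    # Split_infox(X) = -sum (|Ti| / |T|) * log2(|Ti| / |T|)
    total = len(labels)
    groups = partition(values, labels)
    return -sum((len(g) / total) * math.log2(len(g) / total) for g in groups.values())

def gain_ratio(values, labels):
    # Gain_ratio(X) = Gain(X) / Split_infox(X)
    si = split_info(values, labels)
    return gain(values, labels) / si if si > 0 else 0.0

# Invented toy data: an attribute with many distinct values gets a high split info,
# so its gain ratio is penalized relative to its raw information gain.
day_id  = ["d1", "d2", "d3", "d4", "d5", "d6"]       # unique per record
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "overcast"]
play    = ["no", "yes", "yes", "no", "yes", "yes"]
print(gain_ratio(day_id, play), gain_ratio(outlook, play))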
