ID3 algorithm of "machine learning" decision Tree (2)


The ID3 Algorithm for Decision Trees

Contents

1. ID3 Concept
2. Information Entropy
3. Information Gain
4. ID3 Bias
5. Python Implementation (TBD)

First, ID3 Concept

The ID3 algorithm is a classification prediction algorithm first proposed by J. Ross Quinlan of the University of Sydney in 1975; the core of the algorithm is "information entropy". ID3 computes the information gain of each attribute, regards high information gain as the mark of a good attribute, takes the attribute with the highest information gain as the splitting criterion, and repeats this process until it has created a decision tree that can perfectly classify the training samples.

A decision tree classifies the data in order to make predictions. The method first builds a decision tree from the training set; if the tree cannot correctly classify all objects, some exceptions are added to the training data and the process is repeated until a correct decision set is formed. The decision tree is the tree-structured representation of this decision set.

The ID3 algorithm is a greedy algorithm used to construct decision trees. It originates from the Concept Learning System (CLS) and uses the rate of decrease of information entropy as the criterion for selecting the test attribute: at each node, select the not-yet-used attribute with the highest information gain for partitioning, and continue this process until the resulting decision tree can perfectly classify the training samples.

Preferred approach to classification: reduce randomness, i.e. drive the data toward low entropy ("red with red, green with green"). Go back to the essence of classification: separate what has been mixed together! This amounts to increasing information gain: the larger the difference produced by each split, the better.
P.S. Greedy algorithm:

A greedy algorithm (also called a greedy method) always makes the choice that looks best at the moment. In other words, without considering global optimality, it produces a solution that is only locally optimal in some sense.

The principle of Occam's Razor: do not multiply entities beyond necessity.

The ID3 algorithm is one kind of decision tree algorithm, built on the Occam's Razor principle of doing more with less. ID3, namely Iterative Dichotomiser 3, is a decision tree algorithm invented by Ross Quinlan. Following the Occam's Razor principle above, a smaller decision tree is preferred over a larger one; however, the algorithm does not always generate the smallest possible tree structure, it is a heuristic algorithm.

In information theory, the smaller the expected (conditional) entropy after a split, the greater the information gain and the higher the purity. The core idea of the ID3 algorithm is to use information gain to measure attribute selection: at each split, choose the attribute that yields the greatest information gain. The algorithm performs a top-down greedy search through the space of possible decision trees.

Second, Information Entropy (Entropy)

Entropy is a way of measuring randomness.

The concept of entropy originated in physics, where it measures the degree of disorder of a thermodynamic system; in information science, entropy is a measure of uncertainty. In 1948, Shannon defined information entropy in terms of the probabilities of discrete random events: the more orderly a system is, the lower its information entropy, and conversely, the more chaotic a system, the higher its information entropy. Information entropy can therefore be regarded as a measure of how ordered a system is:

\[H(X) = -\sum_{i=1}^{n} p_{i} \log_{2} p_{i}\]
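A minimal Python sketch of this formula (the helper name `entropy` and its input, a list of class counts, are choices made here for illustration and are not from the original article):

```python
import math

def entropy(counts):
    """Information entropy H(X) = -sum(p_i * log2(p_i)), from a list of class counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:                 # treat 0 * log2(0) as 0
            p = c / total
            h -= p * math.log2(p)
    return h
```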

Third, Information Gain

Information gain is defined with respect to a single feature: it compares the amount of information in the system with and without that feature. The difference between the two is the amount of information the feature brings to the system, i.e. its information gain.

Take the weather forecast as an example. The weather data table describes the samples; the learning target is whether to play or not to play.

There are 14 samples in total, of which 9 are positive examples and 5 are negative examples. The current information entropy is therefore calculated as follows:

\[Entropy(S) = -\frac{9}{14} \log_2 \frac{9}{14} - \frac{5}{14} \log_2 \frac{5}{14} = 0.940286\]
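Using the `entropy` helper sketched above, this value can be checked directly:

```python
print(entropy([9, 5]))   # ≈ 0.940286, matching Entropy(S) above
```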

In decision tree classification, the information gain is the difference in entropy before and after the tree splits on the selected attribute. Suppose the attribute Outlook is used for the split:

Figure: classification of the samples by the Outlook attribute

After the split, the data is divided into three parts, and the information entropy of each branch is calculated as follows:

\[Entropy(Sunny) = -\frac{2}{5} \log_2 \frac{2}{5} - \frac{3}{5} \log_2 \frac{3}{5} = 0.970951\]

\[Entropy(Overcast) = 0\]

\[Entropy(Rainy) = -\frac{3}{5} \log_2 \frac{3}{5} - \frac{2}{5} \log_2 \frac{2}{5} = 0.970951\]

The information entropy after the split is then:

\[Entropy(S|T) = \frac{5}{14} Entropy(Sunny) + \frac{4}{14} Entropy(Overcast) + \frac{5}{14} Entropy(Rainy) = 0.693536\]

Entropy(S|T) is the conditional entropy of the samples given the attribute T. The information gain brought by attribute T is therefore:

\[IG(T) = Entropy(S) - Entropy(S|T) = 0.24675\]
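The same arithmetic can be reproduced with the `entropy` helper above, using the per-branch class counts from the example (Sunny: 2 positive / 3 negative, Overcast: 4 / 0, Rainy: 3 / 2); the variable names here are illustrative only:

```python
branches = {"Sunny": [2, 3], "Overcast": [4, 0], "Rainy": [3, 2]}
n_total = 14

cond_entropy = sum(sum(counts) / n_total * entropy(counts)
                   for counts in branches.values())
gain = entropy([9, 5]) - cond_entropy
print(cond_entropy)   # ≈ 0.693536
print(gain)           # ≈ 0.24675
```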

The general formula for calculating information gain is:

\[IG(S, T) = Entropy(S) - \sum_{v \in Values(T)} \frac{|S_v|}{|S|} Entropy(S_v)\]

where \(S\) is the entire sample set, \(Values(T)\) is the set of all values of attribute \(T\), \(v\) is one of those values, \(S_v\) is the subset of \(S\) whose attribute \(T\) takes the value \(v\), and \(|S_v|\) is the number of samples contained in \(S_v\).
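A minimal sketch of this formula over a concrete data representation (each sample is assumed to be a dict of attribute values plus a "label" key; this representation is an assumption made here for illustration, not something specified in the article):

```python
from collections import Counter

def information_gain(samples, attribute, label_key="label"):
    """IG(S, T) = Entropy(S) - sum over v of |S_v|/|S| * Entropy(S_v)."""
    # Entropy of the whole sample set S (uses the entropy helper sketched earlier).
    before = entropy(list(Counter(s[label_key] for s in samples).values()))

    # Weighted entropy of the subsets S_v, one per value v of attribute T.
    remainder = 0.0
    for value in set(s[attribute] for s in samples):
        subset = [s for s in samples if s[attribute] == value]
        counts = list(Counter(s[label_key] for s in subset).values())
        remainder += len(subset) / len(samples) * entropy(counts)

    return before - remainder
```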

Before each non-leaf node of the decision tree is split, the information gain of every candidate attribute is calculated, and the attribute with the maximum information gain is chosen for the split, because the greater the information gain, the stronger its ability to distinguish the samples and the more representative it is. This is clearly a top-down greedy strategy, and it is the core idea of the ID3 algorithm.
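Putting the pieces together, below is a minimal recursive ID3 sketch built on the `entropy` and `information_gain` helpers above. The nested-dict tree representation and the data format are assumptions made here for illustration; the article's own Python implementation is marked as TBD.

```python
from collections import Counter

def id3(samples, attributes, label_key="label"):
    """Build a decision tree as nested dicts: {attribute: {value: subtree_or_leaf_label}}."""
    labels = [s[label_key] for s in samples]

    # All samples share one label: return a leaf.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left to split on: return the majority label.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    # Greedy step: split on the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(samples, a, label_key))
    tree = {best: {}}
    remaining = [a for a in attributes if a != best]
    for value in set(s[best] for s in samples):
        subset = [s for s in samples if s[best] == value]
        tree[best][value] = id3(subset, remaining, label_key)
    return tree
```

On the weather example, such a sketch would split first on Outlook provided its information gain (0.24675 above) is the highest among the candidate attributes.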

Fourth, ID3 Bias

Review: the bias of an algorithm over its search space

1. Restriction bias

Restricts the set of hypotheses that will be considered (the hypothesis set).

2. Preference bias

Tells us which hypotheses within the hypothesis set are preferred.

    • ID3's bias is an inductive bias:

1. Good splits near the top
2. Correct over incorrect (prefers a decision tree that gives correct answers over one that gives wrong answers). If a decision tree has a very good split at the top but produces wrong answers, it will not be selected.

P.S. This may look simplistic, but that is engineering thinking: the criteria must be quantifiable and executable, so that the machine can carry them out.

3. Shorter trees (a natural consequence of the first preference)

Reference:
http://blog.csdn.net/acdreamers/article/details/44661149

Takeaway: studying communication theory and information theory is very helpful for understanding machine learning algorithms.

ID3 algorithm of "machine learning" decision Tree (2)
