Some basic knowledge of information theory

Source: Internet
Author: User
Tags: id3

Basic Concepts

First of all: in information theory, log by default means log base 2. Many books and materials jump straight to Shannon's self-information formula, shown as formula (1), without giving a basic reason why it takes that form. Here is a more intuitive example. Suppose one integer is drawn uniformly at random from 0 to 63 (64 integers in total), and we want to identify the selected number in binary; how many bits do we need at least? As for why bits: each bit is simply a yes/no binary decision, so no further justification is needed. The answer is 6, meaning we need 6 bits to identify each of the numbers.
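Working it out: with 64 equally likely outcomes, each number has probability 1/64, and

$$ \log_2 64 = -\log_2 \frac{1}{64} = 6, $$

which is exactly the form of formula (1) applied to p(x) = 1/64.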

Here are some common calculation formulas

Self-Information
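For an event x with probability p(x), this is the formula (1) mentioned above:

$$ I(x) = -\log_2 p(x) \qquad (1) $$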

Joint Self-Information
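For a pair of events x and y with joint probability p(x, y):

$$ I(x, y) = -\log_2 p(x, y) $$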

Conditional Self-Information
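The self-information of x once y is known:

$$ I(x \mid y) = -\log_2 p(x \mid y) $$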

Information entropy
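For a random variable X:

$$ H(X) = E[I(x)] = -\sum_x p(x) \log_2 p(x) $$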

Entropy is the expected self-information over all possible values (all categories). It is very similar to entropy in physics and chemistry, a concept many people find vague or hard to grasp. Entropy is an indicator of how disordered things are, and things always tend to evolve toward disorder, which is why entropy in an isolated system can only increase. There is no need to define it strictly by the assumptions of physics; we can simply say that the higher the entropy, the more disordered things are, and the lower the entropy, the more regular. It is like throwing a neat stack of paper into the air: the sheets will only become more and more disordered.

Conditional entropy
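The uncertainty remaining about X once Y is known:

$$ H(X \mid Y) = \sum_y p(y)\, H(X \mid Y = y) = -\sum_{x, y} p(x, y) \log_2 p(x \mid y) $$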

Joint entropy
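The uncertainty of the pair (X, Y) taken together:

$$ H(X, Y) = -\sum_{x, y} p(x, y) \log_2 p(x, y) $$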

According to the chain rule, we have
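$$ H(X, Y) = H(X) + H(Y \mid X) = H(Y) + H(X \mid Y) $$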

From this we can derive
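$$ H(X \mid Y) = H(X, Y) - H(Y), \qquad H(Y \mid X) = H(X, Y) - H(X) $$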

Information Gain

The original entropy of the system is H(X), and the entropy of the system once condition Y is known (the conditional entropy) is H(X|Y). The information gain is the difference between these two entropies, as in formula (7):
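$$ IG(Y) = H(X) - H(X \mid Y) \qquad (7) $$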

Entropy represents the uncertainty of the system, so the greater the information gain, the greater the contribution of condition Y to determining the system. To put it bluntly: the initial entropy is large, and whichever condition makes it drop the fastest, i.e., makes the system as regular as possible, is the best one; so for formula (7), the larger the difference, the better.

Application of information gain in feature selection

The information gain of a term w can be derived directly from formula (7): X in formula (7) represents the set of categories, and Y covers the two cases "w present" and "w absent".
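In its usual text-categorization form (writing w̄ for "w absent"):

$$ IG(w) = -\sum_i P(c_i)\log_2 P(c_i) + P(w)\sum_i P(c_i \mid w)\log_2 P(c_i \mid w) + P(\bar{w})\sum_i P(c_i \mid \bar{w})\log_2 P(c_i \mid \bar{w}) $$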

P(ci) is the probability that a document of category ci appears, P(w) is the proportion of documents in the whole training set that contain w, P(ci|w) is the proportion of documents containing w that belong to category ci, and P(ci|w̄) is the proportion of documents not containing w that belong to category ci.

The application of information gain in decision trees

The following example is worked through in detail, because information gain is the core idea of the ID3 algorithm.

Outlook Temperature Humidity Windy Play
Sunny Hot High FALSE No
Sunny Hot High TRUE No
Overcast Hot High FALSE Yes
Rainy Mild High FALSE Yes
Rainy Cool Normal FALSE Yes
Rainy Cool Normal TRUE No
Overcast Cool Normal TRUE Yes
Sunny Mild High FALSE No
Sunny Cool Normal FALSE Yes
Rainy Mild Normal FALSE Yes
Sunny Mild Normal TRUE Yes
Overcast Mild High TRUE Yes
Overcast Hot Normal FALSE Yes
Rainy Mild High TRUE No

In formula (7), X here stands for the two cases: playing and not playing.

Looking only at the last column: the probability of playing is 9/14, and the probability of not playing is 5/14. Therefore, in the absence of any other information, the entropy (uncertainty) of the system is
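$$ H(X) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 0.94 $$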

Outlook    yes  no   Temperature  yes  no   Humidity  yes  no   Windy  yes  no   Play  yes  no
sunny        2   3   hot            2   2   high        3   4   false    6   2           9   5
overcast     4   0   mild           4   2   normal      6   1   true     3   3
rainy        3   2   cool           3   1

If Outlook is selected as the root node of the decision tree, Y in formula (7) is the set {sunny, overcast, rainy}, and the conditional entropy at this point is
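$$ H(X \mid \text{Outlook}) = \frac{5}{14}\left(-\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5}\right) + \frac{4}{14}\cdot 0 + \frac{5}{14}\left(-\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5}\right) \approx 0.693 $$

(the overcast branch is pure, 4 yes / 0 no, so its entropy is 0).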

That is, when you select Outlook as the root node of the decision tree, the information gain is 0.94-0.693=0.247.

The same method is used to calculate the information gain when Temperature, Humidity, or Windy is chosen as the root node, and the attribute with the largest IG value is selected as the final root node; a short sketch of this procedure in code is shown below.
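As a minimal sketch (the variable names, data literals, and helper functions are illustrative, not taken from any particular library), entropy, conditional entropy, and information gain over the table above can be computed like this:

```python
from collections import Counter
from math import log2

# The weather table: (Outlook, Temperature, Humidity, Windy, Play)
data = [
    ("Sunny", "Hot", "High", False, "No"),
    ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Rainy", "Mild", "High", True, "No"),
]
attributes = ["Outlook", "Temperature", "Humidity", "Windy"]

def entropy(labels):
    """H(X) = -sum_x p(x) log2 p(x) over the class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr_index):
    """IG = H(X) - H(X | attribute), i.e. formula (7)."""
    labels = [row[-1] for row in rows]
    n = len(rows)
    conditional = 0.0
    for value in set(row[attr_index] for row in rows):
        subset = [row[-1] for row in rows if row[attr_index] == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional

gains = {a: information_gain(data, i) for i, a in enumerate(attributes)}
print(gains)                      # Outlook has the largest gain (about 0.247)
print(max(gains, key=gains.get))  # -> 'Outlook'
```

Running it picks Outlook as the root attribute with a gain of about 0.247, matching the hand calculation above.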

Mutual Information

The mutual information of y_j about x_i is defined as the logarithm of the ratio of the posterior probability to the prior probability:
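$$ I(x_i; y_j) = \log_2 \frac{p(x_i \mid y_j)}{p(x_i)} $$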

The greater the mutual information, the greater the contribution of y_j to determining the value of x_i.

Average mutual information of the system
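Averaging over all pairs of values:

$$ I(X; Y) = \sum_{i, j} p(x_i, y_j)\log_2 \frac{p(x_i \mid y_j)}{p(x_i)} = H(X) - H(X \mid Y) $$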

As can be seen, the average mutual information is exactly the information gain!

The application of mutual information in feature selection

The mutual information of term w with category c_i is
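$$ I(w; c_i) = \log_2 \frac{P(w \mid c_i)}{P(w)} $$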

P(w) represents the proportion of all documents that contain w, and P(w|c_i) represents the proportion of documents in category c_i that contain w.

For the whole system, the mutual information of term w is
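commonly taken as the category-weighted average (some variants instead take the maximum over categories):

$$ I(w) = \sum_i P(c_i)\log_2 \frac{P(w \mid c_i)}{P(w)} $$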

Finally, the k terms with the largest mutual information are selected as feature items.

One more note: it is actually rare to use the ID3 algorithm for classification these days; the C4.5 algorithm, an optimization built on ID3, is far more popular. C4.5 uses not information gain but the concept of information gain ratio. The information gain ratio and the C4.5 story will be written up later.
