Entropy in information theory
If a discrete random variable X has distribution P(X), then the entropy (amount of information) that X carries is:
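H(X) = -\sum_{k=1}^{K} p(x_k)\,\log_2 p(x_k)

where x_1, \dots, x_K are the K possible values of X.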
The base-2 logarithm is used so that the information is measured in bits, i.e. how many bits are needed to represent it, since one bit is either 0 or 1 and so encodes exactly two states. It can be deduced from the formula above that the entropy carried by the random variable X is greatest when all K states are equally likely. Taking the Bernoulli distribution as an example, the entropy varies with the success probability p as follows:
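H(X) = -p\,\log_2 p - (1 - p)\,\log_2 (1 - p)

This binary entropy is 0 when p = 0 or p = 1 (the outcome is certain) and reaches its maximum of 1 bit at p = 1/2, consistent with the equal-probability case above.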
KL divergence
KL divergence (full name: Kullback-Leibler divergence) is used to measure how much two distributions differ. The formula is as follows:
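D_{KL}(p \,\|\, q) = \sum_{x} p(x)\,\log_2 \frac{p(x)}{q(x)} = H(p, q) - H(p)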
where H(p, q) = -\sum_x p(x)\,\log_2 q(x) is the cross entropy and H(p) is the entropy of p.
KL divergence can be understood as the number of extra bits needed, on average, to encode data drawn from the distribution p when the code is designed for the distribution q instead of p.
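As a minimal sketch of this interpretation (the distributions p and q below are hypothetical examples, not taken from the text), the following NumPy snippet computes D_KL(p || q) in bits and verifies that it equals H(p, q) - H(p):

import numpy as np

# Hypothetical example distributions over 3 states (for illustration only).
p = np.array([0.5, 0.25, 0.25])
q = np.array([0.25, 0.25, 0.5])

# KL divergence in bits: sum_x p(x) * log2(p(x) / q(x)).
kl = np.sum(p * np.log2(p / q))

# Cross entropy H(p, q) and entropy H(p), also in bits.
cross_entropy = -np.sum(p * np.log2(q))
entropy = -np.sum(p * np.log2(p))

print(kl)                       # 0.25 extra bits per symbol
print(cross_entropy - entropy)  # same value: H(p, q) - H(p)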
Mutual information
Mutual information measures the KL divergence between P(X, Y) and P(X)P(Y); the expression is given below. The larger this divergence, the stronger the dependence between X and Y. In particular, when the divergence is 0, X and Y are exactly independent: P(X, Y) = P(X)P(Y).
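I(X; Y) = D_{KL}\big(P(X, Y) \,\|\, P(X)\,P(Y)\big) = \sum_{x}\sum_{y} p(x, y)\,\log_2 \frac{p(x, y)}{p(x)\,p(y)}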
Expressed in another form, in terms of entropy and conditional entropy:
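I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)

That is, mutual information is the reduction in uncertainty about X obtained by observing Y (and vice versa).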
References
PRML (Bishop, Pattern Recognition and Machine Learning)
MLAPP (Murphy, Machine Learning: A Probabilistic Perspective)
CS281: Advanced Machine Learning, Section II: Information Theory