Several important concepts in Information Theory

Information Theory is an applied mathematical discipline that uses probability theory and mathematical statistics to study information, information entropy, communication systems, data transmission, cryptography, data compression, and related problems [from Wikipedia]. The main concepts in information theory include entropy, conditional entropy, relative entropy (KL divergence), and mutual information.

Entropy: In information theory, entropy represents the uncertainty of a random event. Equivalently, entropy measures the amount of information needed to describe the event. For example, suppose a student applies for an internship at Microsoft Research Asia, and we want to know whether the student will be hired. Let X be the variable recording whether he is hired. If we know nothing about the student, the uncertainty of the event is large; in other words, the entropy of the variable X is large. But if we know that the student is an MIT Ph.D., we can be quite sure he will be hired, and the uncertainty becomes much smaller. In this case, the entropy of X is also smaller.
For a discrete variable (such as X above), entropy is largest when X is uniformly distributed, that is, when every value of X occurs with the same probability. In the example above, if you know nothing about the student, the probability of his being hired or not hired is 1/2 each (assuming the institute's admission rate is 50%), and the entropy is at its maximum. If the distribution of X is uneven and concentrated on a few values, the entropy is smaller.
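As a quick illustration (a minimal sketch with made-up distributions; only Python's standard library is used), the entropy of a discrete distribution, H(X) = -Σ p(x) log2 p(x), can be computed directly, and the uniform distribution indeed gives the largest value:

```python
import math

def entropy(p):
    """Shannon entropy H(X) = -sum_x p(x) * log2 p(x), in bits."""
    return -sum(px * math.log2(px) for px in p if px > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # evenly distributed over 4 values
skewed  = [0.85, 0.05, 0.05, 0.05]   # concentrated on one value

print(entropy(uniform))  # 2.0 bits -- the maximum for 4 outcomes
print(entropy(skewed))   # about 0.85 bits -- much less uncertainty
```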
For a continuous variable with a given mean and variance, entropy is largest when the variable is Gaussian distributed. For a detailed proof of this point, see Pattern Recognition and Machine Learning, Section 1.6 (around page 54).

Conditional Entropy: Conditional entropy is the amount of information still required to determine the value of one variable (X) when the value of another variable (Y) is already known. If Y denotes a person's height and X denotes his weight, then the amount of information needed to determine someone's weight once we know his height is the conditional entropy of X given Y.
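As an illustration (a minimal sketch; the joint distribution over height and weight below is entirely made up), the conditional entropy H(X|Y) = -Σ p(x,y) log2( p(x,y) / p(y) ) can be computed from a joint distribution table:

```python
import math

# Hypothetical joint distribution P(Y=height, X=weight); rows = y, cols = x.
joint = [[0.30, 0.10],   # y = "short": x = "light", x = "heavy"
         [0.05, 0.55]]   # y = "tall"

def conditional_entropy(joint):
    """H(X|Y) = -sum_{x,y} p(x,y) * log2( p(x,y) / p(y) ), in bits."""
    h = 0.0
    for row in joint:
        py = sum(row)               # marginal p(y)
        for pxy in row:
            if pxy > 0:
                h -= pxy * math.log2(pxy / py)
    return h

# Uncertainty about weight that remains after the height is known.
print(conditional_entropy(joint))
```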

Relative Entropy (KL divergence): Relative entropy measures how similar two positive functions are. For two identical functions, the relative entropy is zero [from The Beauty of Mathematics, Series 7]. In pattern recognition, the functions in question are usually the distribution functions of a variable: one is the variable's true distribution, say p(x), and the other is our estimated distribution, say q(x). The less similar p(x) and q(x) are, the larger KL(p||q) becomes. Therefore, we can minimize the relative entropy to make q(x) approximate p(x), that is, to bring our estimated distribution close to the true distribution.
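A minimal sketch (with made-up distributions) that computes the discrete KL divergence KL(p||q) = Σ p(x) log2( p(x) / q(x) ); it is zero when the two distributions are identical and grows as the estimate q drifts away from p:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_x p(x) * log2( p(x) / q(x) ), in bits.
    Note the asymmetry: KL(p || q) != KL(q || p) in general."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

p      = [0.50, 0.30, 0.20]   # "true" distribution (made-up)
q_good = [0.45, 0.35, 0.20]   # estimate close to p
q_bad  = [0.10, 0.10, 0.80]   # estimate far from p

print(kl_divergence(p, p))       # 0.0 -- identical distributions
print(kl_divergence(p, q_good))  # small (about 0.01 bits)
print(kl_divergence(p, q_bad))   # large (about 1.24 bits)
```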

Mutual Information: In The Beauty of Mathematics, Series 7, Dr. Wu Jun of Google explains mutual information vividly: mutual information is a measure of the correlation between two random events. For example, the random event "it rains in Beijing" is highly correlated with the random variable "air humidity", but it has nothing to do with whether Yao Ming's Houston Rockets can beat the Bulls. That is to say, the mutual information between the rain event and the air-humidity variable is large, while the mutual information between the rain event and the result of the basketball match is small.
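As a final sketch (both joint distributions below are invented for illustration), mutual information can be computed as the KL divergence between the joint distribution and the product of its marginals; a correlated pair gives a large value, while an independent pair gives zero:

```python
import math

def mutual_information(joint):
    """I(X;Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), in bits.
    This is KL divergence between the joint and the product of marginals."""
    px = [sum(col) for col in zip(*joint)]   # marginal p(x), per column
    py = [sum(row) for row in joint]         # marginal p(y), per row
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[j] * py[i]))
    return mi

# Hypothetical joint of (rain, humidity): strongly correlated -> large MI.
rain_humidity = [[0.40, 0.05],
                 [0.05, 0.50]]
# Hypothetical joint of (rain, match result): independent -> MI is 0.
rain_match = [[0.20, 0.20],
              [0.30, 0.30]]

print(mutual_information(rain_humidity))  # clearly positive (about 0.52 bits)
print(mutual_information(rain_match))     # 0.0 -- joint equals product of marginals
```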
This is my first technical blog post; please forgive me if anything is not explained clearly.
 
Related information:
1. Pattern Recognition and Machine Learning, Section 1.6. An electronic copy of this book can be found online; if you cannot find it, leave a message with your email address and I will send it to you.
2. The Beauty of Mathematics, Series 7: Applications of Information Theory in Information Processing. http://www.googlechinablog.com/2006/05/blog-post_25.html
3. The Beauty of Mathematics, Series 4: How to Measure Information? http://googlechinablog.com/2006/04/4.html
4. Wikipedia: Information Theory. http://en.wikipedia.org/wiki/Information_theory
