Understanding problems from the Information Entropy perspective

Information is a very abstract concept. People often say that there is a lot of information, or little information, but it is hard to say exactly how much information there is; for example, how much information is contained in a 500,000-word document? In 1948, Shannon proposed the concept of "information entropy", which solved the problem of measuring information quantitatively.

The amount of information in a message is directly related to its uncertainty. For example, to understand something highly uncertain, or something we know nothing about, we need a lot of information. Conversely, if we already understand something well, very little information is needed to clarify it. From this perspective, the amount of information can be measured by the amount of uncertainty it removes.

The formula for information entropy given by Shannon is as follows (all logarithms in this article are base 2):

H(X) = -Σ p(xᵢ) log p(xᵢ), for i = 1, …, n (where p(xᵢ) is the probability that event xᵢ occurs). Unit: bit.
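
To make the formula concrete, here is a minimal Python sketch (the entropy helper below is my own, not something from the original text) that computes H(X) in bits for a discrete distribution:

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a distribution given as a list of probabilities."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    # terms with p = 0 contribute nothing, so they are skipped
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 32 equally likely champions: H = log2(32) = 5 bits
print(entropy([1 / 32] * 32))            # 5.0
# A heavily skewed distribution carries much less uncertainty
print(entropy([0.9] + [0.1 / 31] * 31))  # roughly 0.96 bits
```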

In The Beauty of Mathematics, the concept of information entropy is explained with the question of how to find out which of 32 teams won the championship after the tournament.

When all teams are equally likely to win, each question follows the binary-search principle (for example, "is the champion among teams 1 to 16?"), halving the number of candidate teams every time, so five questions are enough to know the result. This is log 32 = 5.
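
Below is a small sketch of this halving strategy (the questions_needed helper is hypothetical, written for this note); it counts the yes/no questions asked for every possible champion and confirms that 5 questions always suffice:

```python
def questions_needed(champion, teams):
    """Count yes/no questions needed to locate the champion by repeatedly halving."""
    low, high = 0, len(teams) - 1
    target = teams.index(champion)
    count = 0
    while low < high:
        mid = (low + high) // 2
        count += 1                 # one question: "is the champion among teams[low..mid]?"
        if target <= mid:
            high = mid
        else:
            low = mid + 1
    return count

teams = list(range(1, 33))         # 32 teams numbered 1..32
print(max(questions_needed(t, teams) for t in teams))   # 5: every champion is found in 5 questions
```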

Calculating the amount of information with the entropy formula indeed gives 5. But why does the information entropy formula represent the amount of information?

In my understanding, for equally likely events, 1/p(x) is the number of possible outcomes. In the team problem, there are 32 possibilities.

For equally likely events, because Σ p(xᵢ) = 1, the information entropy can be written as:

H(X) = -Σ p(xᵢ) log p(xᵢ) = -log p(x) = -(-log(1/p(x))) = log(1/p(x))

That is to say, the amount of information in an equiprobable event can be written as:

H(X) = log(number of possible outcomes)
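
A quick numerical check of this identity, assuming nothing beyond the formula above: for uniform distributions of several sizes, the entropy works out to exactly log2 of the number of outcomes.

```python
import math

# For n equally likely outcomes, -sum(p * log2(p)) reduces to log2(n).
for n in (2, 8, 32, 1024):
    h = -sum((1 / n) * math.log2(1 / n) for _ in range(n))
    print(n, h, math.log2(n))      # the two columns match: 1.0, 3.0, 5.0, 10.0 bits
```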

To deepen our understanding of the amount of information, let us return to the 32-team problem above, where we already know the amount of information is 5 bits.

After one question, we know which group of 16 teams the champion belongs to. That is, after obtaining 1 bit of information, the uncertainty is reduced: the information entropy becomes log 16 = 4 bits = 5 bits - 1 bit.

What about the maximum entropy model? Its principle is to retain all uncertainty and thereby minimize risk.

The maximum entropy principle states that when we need to predict the probability distribution of a random event, our prediction should satisfy all known conditions and make no subjective assumptions about the unknown. (Not making subjective assumptions is important.) In this case the probability distribution is the most uniform and the risk of prediction is the smallest. Because the entropy of such a distribution is the largest, this model is called the "maximum entropy model".

The saying "don't put all your eggs in one basket" is actually a colloquial statement of the maximum entropy principle: when we face uncertainty, we need to keep all possibilities open.

That is to say, when the information is uncertain, do not make any subjective assumptions about the unknown outcomes; keep their probability distribution uniform so that the result is as objective as possible. At this point the risk is minimized, and we can use this result to make the most objective decision. In mathematical terms, this gives the optimal lower bound.
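
As a small illustration (the three distributions below are made up for this note), among distributions over the same four outcomes, the uniform one has the largest entropy, while the "all eggs in one basket" one has almost no entropy left:

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Three ways to spread belief over the same four outcomes (numbers made up for illustration).
candidates = {
    "uniform":       [0.25, 0.25, 0.25, 0.25],
    "mildly skewed": [0.40, 0.30, 0.20, 0.10],
    "one basket":    [0.97, 0.01, 0.01, 0.01],
}
for name, dist in candidates.items():
    print(f"{name:13s}  H = {entropy(dist):.3f} bits")
# uniform ~2.000, mildly skewed ~1.846, one basket ~0.242:
# the uniform (maximum entropy) distribution keeps every possibility open.
```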

The essence of this strategy is to leave the unknown no opening to exploit: it has no "weak spot", because every branch of the answer has equal probability. Conversely, once one branch contains more possibilities, you are in trouble whenever the answer falls into that branch. The reason binary search is good is that it removes half of the possibilities with every question, no matter what the answer is (it gives the best performance in the worst case).
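
A tiny sketch contrasting the two questioning strategies on the 32-team problem (the function names are my own): the balanced split needs 5 questions even in the worst case, while asking about one team at a time can take 31.

```python
def worst_case_balanced(n):
    """Questions needed when every question splits the remaining candidates in half."""
    count = 0
    while n > 1:
        n = (n + 1) // 2
        count += 1
    return count

def worst_case_one_at_a_time(n):
    """Questions needed when each question only asks about a single candidate."""
    return n - 1          # worst case: the champion is the last team we ask about

print(worst_case_balanced(32))        # 5
print(worst_case_one_at_a_time(32))   # 31
```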

I will describe the maximum entropy principle using the time complexity of sorting algorithms. The time complexity of sorting n items with the mainstream comparison-based algorithms ranges roughly from O(n log n) to O(n²). Why is O(n log n) optimal in general? (Correspondingly, the average time complexity of quicksort is O(n log n).) Because the order of the n items is random, every ordering is equally likely, so we can apply the maximum entropy principle to obtain the optimal (most stable) result. The information entropy is:

H(X) = log(number of possibilities) = log(n!); as n → ∞, log(n!) ≈ log(nⁿ) = n log n (by Stirling's approximation).
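
A quick numerical check of this approximation in plain Python: log2(n!) compared with n·log2(n) for a few values of n; the ratio climbs toward 1 as n grows.

```python
import math

# log2(n!) is the exact entropy of a random ordering of n items;
# n * log2(n) is the familiar n log n expression. Their ratio approaches 1 as n grows.
for n in (10, 100, 1000, 10000):
    exact = math.log2(math.factorial(n))
    approx = n * math.log2(n)
    print(f"n={n:<6d} log2(n!)={exact:>10.0f}  n*log2(n)={approx:>10.0f}  ratio={exact / approx:.3f}")
```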

Suppose each comparison yields 1 bit of information; then we need at least n log n bits to eliminate the uncertainty, that is, we need about n log n comparisons. However, because different sorting algorithms use different strategies, a comparison does not always yield a full bit of information. Therefore, by the definition of information entropy, n log n is the theoretical best result. An optimal sorting algorithm would obtain 1 bit of information from every comparison; the closer it gets to 1 bit per comparison, the more efficient it is.
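
As a rough experiment (the count_comparisons helper is mine, using functools.cmp_to_key to count the comparisons Python's built-in sort performs), we can compare an actual comparison count against the log2(n!) lower bound:

```python
import functools
import math
import random

def count_comparisons(data):
    """Sort a copy of data with Python's built-in sort and count the comparisons it makes."""
    count = 0
    def cmp(a, b):
        nonlocal count
        count += 1
        return (a > b) - (a < b)
    sorted(data, key=functools.cmp_to_key(cmp))
    return count

n = 1000
data = random.sample(range(10 * n), n)                   # n distinct items in random order
comparisons = count_comparisons(data)
lower_bound = math.ceil(math.log2(math.factorial(n)))    # ~8530 bits of uncertainty to remove
print(comparisons, lower_bound, lower_bound / comparisons)
# On random data the built-in sort needs somewhat more comparisons than the bound,
# so it extracts a bit less than 1 bit of information per comparison.
```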

Although quicksort and heapsort both have O(n log n) time complexity, quicksort is generally faster than heapsort, because the average amount of information obtained per comparison in heapsort is lower than in quicksort.
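
To illustrate this claim, here is a sketch with simple textbook implementations (written for this note, not taken from the article) that counts comparisons for quicksort and heapsort on the same random data and reports the average number of bits obtained per comparison:

```python
import math
import random

def quicksort_comparisons(data):
    """Simple quicksort (middle pivot) that counts element-vs-pivot comparisons."""
    count = 0
    def qsort(lst):
        nonlocal count
        if len(lst) <= 1:
            return lst
        pivot = lst[len(lst) // 2]
        less, equal, greater = [], [], []
        for x in lst:
            count += 1                         # one comparison against the pivot
            if x < pivot:
                less.append(x)
            elif x > pivot:
                greater.append(x)
            else:
                equal.append(x)
        return qsort(less) + equal + qsort(greater)
    qsort(list(data))
    return count

def heapsort_comparisons(data):
    """Textbook heapsort (max-heap, sift-down) that counts comparisons."""
    arr = list(data)
    count = 0
    def sift_down(root, end):
        nonlocal count
        while 2 * root + 1 <= end:
            child = 2 * root + 1
            if child + 1 <= end:
                count += 1
                if arr[child] < arr[child + 1]:
                    child += 1                 # pick the larger child
            count += 1
            if arr[root] < arr[child]:
                arr[root], arr[child] = arr[child], arr[root]
                root = child
            else:
                return
    n = len(arr)
    for start in range(n // 2 - 1, -1, -1):    # build the max-heap
        sift_down(start, n - 1)
    for end in range(n - 1, 0, -1):            # repeatedly move the maximum to the back
        arr[0], arr[end] = arr[end], arr[0]
        sift_down(0, end - 1)
    return count

n = 4096
data = random.sample(range(10 * n), n)
bits = math.log2(math.factorial(n))            # total uncertainty to remove, in bits
for name, comps in (("quicksort", quicksort_comparisons(data)),
                    ("heapsort", heapsort_comparisons(data))):
    print(f"{name:9s} {comps:6d} comparisons, {bits / comps:.2f} bits per comparison")
# On typical runs quicksort needs fewer comparisons, i.e. each of its comparisons
# yields more information on average, which matches the argument above.
```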

We did not name a specific algorithm above, only the best achievable time complexity. Similarly, in real life, even if we cannot yet think of a concrete strategy, we can at least know where the limit is and whether there is room for improvement. Any sorting or guessing algorithm can be understood as reducing the original entropy by obtaining information.
