Information is a very abstract concept. People often say that something carries a lot of information or very little information, but it is hard to say exactly how much. For example, how much information is contained in a 500,000-word document? In 1948, Shannon proposed the concept of "information entropy", which solved the problem of measuring information quantitatively.
The amount of information in a message is directly related to its uncertainty. To understand something highly uncertain, or something we know nothing about, we need a great deal of information. Conversely, if we already understand something well, little additional information is needed to clarify it. From this perspective, the amount of information can be equated with the amount of uncertainty.
Shannon's formula for information entropy is as follows (all logarithms in this article are base 2):
H(X) = -Σ p(x_i) · log p(x_i),  i = 1, ..., n, where p(x_i) is the probability that event x_i occurs. Unit: bit.
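As a quick illustration, here is a minimal Python sketch of this formula (the function name `entropy` and the example distributions are my own, not from the original text):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H(X) = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin carries 1 bit of information per toss.
print(entropy([0.5, 0.5]))   # 1.0
# A biased coin is more predictable, so each toss carries less information.
print(entropy([0.9, 0.1]))   # roughly 0.469
```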
In "The Beauty of Mathematics", the concept of information entropy is explained with the example of finding out which of 32 teams won the championship after a tournament.
When all teams are equally likely to win, each question follows the idea of binary search (for example, "Is the champion among teams 1 to 16?"). Each answer cuts the number of candidate teams in half, so five questions are enough to know the result. That is log 32 = 5.
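A small sketch of this questioning strategy, assuming the teams are simply numbered 1 to 32 (the function `find_champion` and the numbering are illustrative):

```python
def find_champion(champion, low=1, high=32):
    """Locate the champion with yes/no questions of the form
    'is the champion between low and mid?', halving the range each time."""
    questions = 0
    while low < high:
        mid = (low + high) // 2
        questions += 1
        if champion <= mid:   # answer "yes": keep the lower half
            high = mid
        else:                 # answer "no": keep the upper half
            low = mid + 1
    return low, questions

print(find_champion(23))  # (23, 5) -- five questions are always enough for 32 teams
```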
Computing it with information entropy indeed gives 5 bits. But why does the information entropy formula measure the amount of information?
In my understanding, for equally likely events, 1/p(x) is the number of possible outcomes. In the team problem, there are 32 possibilities.
For equally likely events, every p(x_i) equals the same value p(x), and Σ p(x_i) = 1, so the information entropy reduces to:

H(X) = -Σ p(x_i) · log p(x_i) = -log p(x) = log(1/p(x))
That is to say, the amount of information in an equally likely event can be written as:

H(X) = log(number of possible outcomes)
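A quick numerical check of this special case, using a uniform distribution over 32 outcomes:

```python
import math

n = 32
uniform = [1 / n] * n
print(-sum(p * math.log2(p) for p in uniform))  # 5.0, the general formula
print(math.log2(n))                             # 5.0, i.e. log(number of possibilities)
```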
To deepen our understanding of the amount of information, let us return to the 32-team problem above, where we already know the amount of information is 5 bits.
After one question, we know which group of 16 teams the champion belongs to. In other words, after obtaining 1 bit of information, the uncertainty is reduced and the information entropy becomes log 16 = 4 bits = 5 bits - 1 bit.
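The same idea, step by step: each equally informative yes/no answer halves the candidate set and removes exactly 1 bit of entropy (a small sketch, with the loop structure chosen for illustration):

```python
import math

candidates = 32
while candidates > 1:
    print(f"{candidates:2d} candidates left -> {math.log2(candidates):.0f} bits of uncertainty")
    candidates //= 2   # each yes/no answer rules out half of the remaining teams
print(" 1 candidate left ->  0 bits of uncertainty: the champion is known")
```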
What about the maximum entropy model? Its principle is to retain all uncertainty and thereby minimize risk.
The maximum entropy principle states that when we need to predict the probability distribution of a random event, the prediction should satisfy all known conditions and make no subjective assumptions about unknown situations. (Making no subjective assumptions is crucial.) In this case, the probability distribution is the most even and the risk of the prediction is the smallest. Because the entropy of this distribution is the largest, the model is called the "Maximum Entropy Model".
The common saying "don't put all your eggs in one basket" is in fact a plain-language version of the maximum entropy principle: when facing uncertainty, keep all possibilities open.
In other words, when faced with uncertainty, make no subjective assumptions about the unknown outcomes, so that their probability distribution stays even and the result is as objective as possible. The risk is then minimized, and we can use this result to make the most objective decision. Mathematically, this gives the optimal lower bound.
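A rough sketch of why refusing subjective assumptions maximizes entropy: compare a uniform guess about a four-outcome event with two biased guesses (the specific distributions are invented for illustration):

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Three guesses about an event with four possible outcomes we know nothing about.
guesses = {
    "uniform (no assumption)": [0.25, 0.25, 0.25, 0.25],
    "mild assumption":         [0.40, 0.30, 0.20, 0.10],
    "strong assumption":       [0.97, 0.01, 0.01, 0.01],
}
for name, dist in guesses.items():
    print(f"{name:25s} {entropy(dist):.3f} bits")
# The uniform guess has the highest entropy (2 bits): it assumes the least.
```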
The essence of this strategy can be summarized as "leaving the unknown world no opening to exploit". It has no "weak spot": every branch of the answer is equally probable. Conversely, once one branch contains more possibilities than the others, you will suffer whenever the situation falls onto that branch. The reason binary search is good is that it eliminates half of the possibilities with every question, no matter how the answer turns out (it has the best performance in the worst case).
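To make the "no weak branch" point concrete, here is a toy model (my own simplification, not from the original) that counts worst-case yes/no questions when every question keeps a fixed fraction of the remaining candidates:

```python
def worst_case_questions(n, fraction):
    """Worst-case number of yes/no questions when each question asks whether the
    answer lies in a block containing `fraction` of the remaining candidates."""
    questions = 0
    while n > 1:
        block = max(1, min(n - 1, int(n * fraction)))
        n = max(block, n - block)   # the unlucky branch keeps the larger part
        questions += 1
    return questions

print(worst_case_questions(32, 0.5))    # 5  -- balanced questions (binary search)
print(worst_case_questions(32, 0.75))   # 9  -- lopsided questions hurt the worst case
```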
Let me illustrate the maximum entropy principle with algorithmic time complexity. The time complexity of sorting n items with the mainstream comparison-based algorithms ranges roughly from O(n log n) to O(n²). Why is O(n log n) optimal in general? (For example, the average time complexity of quicksort is O(n log n).) Because the initial order of the n items is random, every ordering is equally uncertain, so we can apply the maximum entropy principle to obtain the optimal (most stable) result. The information entropy is:
H(X) = log(number of possible orderings) = log(n!); and as n → ∞, log(n!) is approximately log(nⁿ) = n log n
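A quick numerical check of this approximation; the ratio log2(n!) / (n·log2 n) creeps toward 1 as n grows, though rather slowly:

```python
import math

# Compare log2(n!) with the n*log2(n) approximation used above.
for n in (10, 100, 1000, 10000):
    exact = math.log2(math.factorial(n))   # bits needed to pin down one of n! orderings
    approx = n * math.log2(n)
    print(f"n={n:5d}  log2(n!)={exact:9.0f}  n*log2(n)={approx:9.0f}  ratio={exact/approx:.2f}")
```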
Suppose each comparison yields 1 bit of information; then we need at least n log n bits to eliminate the uncertainty, that is, about n log n comparisons. However, because different sorting algorithms use different strategies, they cannot always extract a full bit from each comparison. So, by the definition of information entropy, n log n is only the theoretical best. The optimal sorting algorithm is one that obtains 1 bit per comparison; the closer to 1 bit, the more efficient the algorithm.
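As a rough experiment (my own sketch, not a claim from the original text), one can count the comparisons Python's built-in sort makes on random data and divide the information-theoretic minimum log2(n!) by that count; on random input this typically lands somewhat below 1 bit per comparison:

```python
import functools
import math
import random

def comparisons_used(data):
    """Sort `data` with Python's built-in sort while counting comparisons."""
    count = 0
    def cmp(a, b):
        nonlocal count
        count += 1
        return (a > b) - (a < b)
    sorted(data, key=functools.cmp_to_key(cmp))
    return count

n = 1024
data = random.sample(range(10 * n), n)
used = comparisons_used(data)
lower_bound = math.log2(math.factorial(n))   # information-theoretic minimum, log2(n!)
print(f"comparisons used: {used}")
print(f"theoretical minimum: {lower_bound:.0f}")
print(f"average bits per comparison: {lower_bound / used:.2f}")
```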
Although quicksort and heapsort both run in O(n log n) time, quicksort is generally faster than heapsort, because on average each comparison in heapsort yields less information than a comparison in quicksort.
Note that we derived the best possible time complexity above without mentioning any specific algorithm. Likewise, in real life, even when we cannot think of a concrete strategy, we can at least know where the limit is and whether there is room for improvement. Any sorting or guessing algorithm can be understood as reducing the original entropy by acquiring information.
Understanding problems from the Information Entropy perspective