The Mysterious Benford's Law

Source: Internet
Author: User

Take the populations of the world's 237 countries. What share of those numbers do you think begin with the digit 1, and what share begin with 9? If your answer is 1/9 for each, congratulations, you think like most people, but it is not true: a staggering 27% of the figures begin with 1, while only 5% begin with 9. Plotting the share of population figures that begin with each digit makes the imbalance easy to see. Why is the difference so large? This is the mysterious Benford's law at work.

Benford's law, also known as the first-digit law, states that in many sets of real-world data, numbers whose first digit is 1 make up about 30% of the total, nearly three times the 1/9 one would naively expect. More generally, the larger the digit, the less likely it is to appear as the leading digit. In base $b$, the probability that a number starts with the digit $n$ is $P(n) = \log_b(n+1) - \log_b(n)$.
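A short derivation shows that these probabilities form a proper distribution and how lopsided they are in base 10: the terms telescope, so they sum to 1, and the probability for the digit 1 comes out at roughly 2.7 times the naive 1/9.

```latex
P(n) = \log_b(n+1) - \log_b(n) = \log_b\!\left(1 + \frac{1}{n}\right), \qquad n = 1, 2, \dots, b-1

\sum_{n=1}^{b-1} P(n) = \sum_{n=1}^{b-1} \bigl[\log_b(n+1) - \log_b(n)\bigr] = \log_b b - \log_b 1 = 1

b = 10:\quad P(1) = \log_{10} 2 \approx 0.301 \approx 2.7 \times \tfrac{1}{9}, \qquad P(9) = \log_{10}\tfrac{10}{9} \approx 0.046
```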

In decimal (base 10), the probability P(d) that the leading digit is d is:

d       1      2      3      4      5      6      7      8      9
P(d)    30.1%  17.6%  12.5%  9.7%   7.9%   6.7%   5.8%   5.1%   4.6%
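The table can be regenerated directly from the formula P(n) = log_b(n + 1) - log_b(n). The sketch below uses a helper name, `benford_probabilities`, chosen here for illustration; it works for any base, not just 10.

```python
import math


def benford_probabilities(base=10):
    """Return {digit: probability} for leading digits 1..base-1 under Benford's law."""
    # P(n) = log_b(n + 1) - log_b(n) = log_b(1 + 1/n)
    return {n: math.log(1 + 1 / n, base) for n in range(1, base)}


if __name__ == "__main__":
    # Print the base-10 table as percentages rounded to one decimal place.
    for digit, p in benford_probabilities(10).items():
        print(f"{digit}: {p:.1%}")
```

Rounded to one decimal place, the printed values match the table row above.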

The law's discovery is said to have come about when Benford, leafing through a book of logarithm tables, noticed that the first pages were worn, dirty and ragged, while the later pages grew progressively cleaner. He reasoned that numbers beginning with 1 must be looked up more often than others, and then collected data that confirmed it. Whether this story is true hardly matters; like the tale of Newton discovering gravitation after an apple fell on his head, what counts is that the law itself turned out to be useful.

First, let us look at where Benford's law applies.

The law is remarkable for its unusually wide range of application: almost all everyday statistical data that are not artificially constructed obey it. Examples include the populations of the world's countries, national land areas, account ledgers, physical and chemical constants, mathematical and physical data, the answers at the back of textbooks, and radioactive half-lives. Notably, scientists have also found that the three key distributions of statistical physics, the Boltzmann-Gibbs, Bose-Einstein and Fermi-Dirac distributions, essentially satisfy Benford's law as well!

That said, the law does have limits; it holds only under certain conditions.

First, the data set must be large enough and must span several orders of magnitude for this pattern to appear.
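To see why spanning several orders of magnitude matters, the sketch below compares two hypothetical data sets: the integers 100 to 999, which stay within a single order of magnitude, and values spread evenly on a logarithmic scale from 1 to 1000 (three orders of magnitude). The log-even spacing is an assumption chosen to make the effect easy to see, not a model of any particular real data.

```python
import math
from collections import Counter


def leading_digit(x):
    """First significant digit of a positive number, read from scientific notation."""
    return int(f"{x:e}"[0])


def digit_shares(values):
    """Fraction of values whose leading digit is 1..9."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


# Hypothetical data set 1: one order of magnitude, evenly spaced integers.
narrow = range(100, 1000)

# Hypothetical data set 2: three orders of magnitude, evenly spaced in log10.
steps_per_decade = 1000
wide = [10 ** (k / steps_per_decade) for k in range(3 * steps_per_decade)]

benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
for name, data in [("narrow", narrow), ("wide", wide)]:
    shares = digit_shares(data)
    print(name, {d: round(s, 3) for d, s in shares.items()})
print("Benford", {d: round(p, 3) for d, p in benford.items()})
```

The narrow set gives every digit the same 1/9 share, while the wide set tracks Benford's percentages closely.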

Second, data generated according to rules generally do not obey the law. Mobile phone numbers, ID numbers, invoice numbers and similar data clearly do not follow this logarithmic distribution. In other words, Benford's law only emerges when the data are generated without constraints; the more restricted the process that generates the data, the less the data conform to the law.

Third, the data must not have been artificially altered. Arbitrarily fabricated or manipulated data generally do not conform to Benford's law. In the well-known Enron fraud case, for example, the company's books did not conform to Benford's law, so this mysterious law can even be used to help detect financial fraud.
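One common way to check a data set against Benford's law is a chi-square goodness-of-fit test on the leading digits, sketched below. This is only an illustration: it is not necessarily the analysis used in the Enron case, both data sets are hypothetical, and 15.507 is the standard 5% critical value of the chi-square distribution with 8 degrees of freedom (nine digit categories minus one).

```python
import math
from collections import Counter

# Critical value of the chi-square distribution with 8 degrees of freedom at the 5% level.
CHI2_CRIT_8DF_5PCT = 15.507


def leading_digit(x):
    """First significant digit of a positive number."""
    return int(f"{x:e}"[0])


def benford_chi_square(values):
    """Chi-square statistic of the observed leading digits against Benford's law."""
    counts = Counter(leading_digit(v) for v in values)
    n = sum(counts.values())
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts[d] - expected) ** 2 / expected
    return stat


def looks_benford(values):
    """True if the chi-square test does not reject Benford's law at the 5% level."""
    return benford_chi_square(values) < CHI2_CRIT_8DF_5PCT


# Hypothetical rule-generated data: sequential invoice numbers 1000..1999.
invoices = range(1000, 2000)
# Hypothetical unconstrained data: values spread evenly in log10 over four decades.
spread = [10 ** (k / 500) for k in range(2000)]

print("invoices:", round(benford_chi_square(invoices), 1), looks_benford(invoices))
print("spread:  ", round(benford_chi_square(spread), 1), looks_benford(spread))
```

Sequential invoice numbers all start with 1 and fail badly, while the log-spread values pass. With very large data sets the chi-square test flags even tiny deviations, so a failed test is a prompt for closer inspection rather than proof of fraud.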

So how can this mysterious law be understood? Why should naturally generated data follow such a peculiar distribution rather than a uniform one?

The root of Benford's law lies in exponential growth. This can be shown with a graph: plot a variable that grows exponentially over time (horizontal axis: time; vertical axis: the variable) and mark how long it spends with each leading digit.

Clearly, the variable spends much more time with a leading digit of 1 than with a leading digit of 9. Because exponential growth multiplies the variable by the same factor in equal time intervals, growing from 1 to 2 takes as long as growing from 5 to 10, and far longer than growing from 9 to 10. So at a randomly chosen moment, the variable is more likely to start with 1 than with 9. That is for a single value; with a large collection of such data, at any given moment more of the values will start with 1 than with 9. Exponential growth is very common in nature: whenever a variable's growth rate is proportional to its current size, the result is exponential growth. For example, the pace of scientific and technological progress is roughly proportional to the existing body of scientific and technological achievement, so technology develops exponentially; likewise, a population's growth rate is proportional to its current size, so a population without resource constraints also grows exponentially. Exponential growth is thus a very common pattern of change in nature, and this pattern leads directly to Benford's law.
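The exponential-growth argument can be checked with a quantity that doubles at every step. The sketch below uses an arbitrarily chosen growth factor of 2 (an illustration, not data from the article) and tallies the leading digits of 2^0 through 2^999 with exact integer arithmetic, so floating-point rounding plays no role.

```python
import math
from collections import Counter


def leading_digits_of_growth(factor, steps):
    """Leading digits of factor**0, factor**1, ..., factor**(steps - 1).

    Uses exact integers, so there is no floating-point rounding.
    """
    value = 1
    digits = []
    for _ in range(steps):
        digits.append(int(str(value)[0]))
        value *= factor  # one step of exponential growth
    return digits


counts = Counter(leading_digits_of_growth(2, 1000))
for d in range(1, 10):
    observed = counts[d] / 1000
    expected = math.log10(1 + 1 / d)
    print(f"{d}: observed {observed:.3f}, Benford {expected:.3f}")
```

Exactly 301 of these 1000 powers begin with 1, matching the predicted 30.1%, and the other digits land within about a percentage point of Benford's values.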

Another intuitive explanation (from Wikipedia) goes as follows:

When counting upward, the numbers run 1, 2, 3, ..., 9, and from this point of view every digit seems equally likely to come first. But after 9 come the two-digit numbers 10 to 19, where numbers beginning with 1 far outnumber those beginning with any other digit. And before the next batch of numbers beginning with 9 appears, the count must pass through batches beginning with 2, 3, 4, ..., 8. So if such a count stops at some end point, numbers beginning with 1 will generally be more common than numbers beginning with 9.
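The counting argument can be made concrete. The hypothetical helper `count_leading` below counts how many of the integers 1 to N begin with a given digit. The share beginning with 1 swings with the end point N, but for every N, numbers beginning with 1 are at least as common as numbers beginning with 9, because in each order of magnitude the block starting with 1 comes first.

```python
def count_leading(n_max, digit):
    """How many integers in 1..n_max have `digit` as their first digit."""
    total = 0
    block_start = digit  # first number with this leading digit: d, d*10, d*100, ...
    while block_start <= n_max:
        # Last number in the block, e.g. 19 for block_start 10, 399 for 300.
        block_end = (block_start // digit) * (digit + 1) - 1
        total += min(block_end, n_max) - block_start + 1
        block_start *= 10
    return total


for n_max in (9, 19, 99, 199, 999, 1999, 5000):
    ones = count_leading(n_max, 1)
    nines = count_leading(n_max, 9)
    print(f"N={n_max}: starts with 1: {ones / n_max:.1%}, starts with 9: {nines / n_max:.1%}")
```

Stopping at N = 199 gives 111 numbers starting with 1 (over 55%), while stopping at N = 999 gives the two digits equal shares of 111 each.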

Take all the house numbers in a city as an example. Some streets' numbers may stop somewhere past 100, some past 500, and some past 900. A street whose numbers stop past 500 must contain the house numbers 1, 10 to 19 and 100 to 199, all beginning with 1, but none of the nine-hundreds; its only numbers beginning with 9 are 9 and 90 to 99. So on such a street, numbers starting with 1 outnumber those starting with 9. Combining all the streets across the city, the pooled house numbers end up following Benford's law.
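The street example can be simulated in the same spirit. The sketch below assumes a hypothetical city with one street for every length from 1 to 999 houses, each numbered from 1 upward, and pools all the house numbers. The street lengths are an arbitrary assumption; the point is only that pooling streets with different end points tilts the leading digits toward 1, not that this particular mix reproduces Benford's percentages exactly.

```python
import math
from collections import Counter


def pooled_house_digits(street_lengths):
    """Leading-digit counts over house numbers 1..L for every street length L."""
    counts = Counter()
    for length in street_lengths:
        for house in range(1, length + 1):
            counts[int(str(house)[0])] += 1
    return counts


# Hypothetical city: one street of each length from 1 to 999 houses.
counts = pooled_house_digits(range(1, 1000))
total = sum(counts.values())
for d in range(1, 10):
    print(f"{d}: {counts[d] / total:.1%} (Benford {math.log10(1 + 1 / d):.1%})")
```

In this made-up city about 19% of house numbers start with 1 and only about 3% with 9: strongly skewed in Benford's direction, though how close the pooled shares come to Benford's exact figures depends on the mix of street lengths.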

The above is only an intuitive picture. For the underlying theory, see the proof in:

Hill, T. P. "A Statistical Derivation of the Significant-Digit Law." Statistical Science 10(4), 354-363, 1995.

It is also worth noting that Benford's law is scale-invariant: if we change the unit of measurement, the law still holds. This in itself offers another explanation for why naturally generated data obey the law. If a data set originally measured in meters is converted to another unit, such as feet, the distribution of the data's leading digits should not change, because nature does not depend on the units we happen to choose. The only distribution with this scale invariance is a logarithmic one, namely Benford's law, the subject of this article.
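Scale invariance can also be checked numerically. The sketch below converts two hypothetical data sets from meters to feet by multiplying by 3.28084 (the approximate number of feet in a meter) and measures the largest change in any leading-digit share. Data spread evenly on a log scale, which follow Benford's law, barely move; evenly spaced integers from 100 to 999 change a lot.

```python
from collections import Counter

FEET_PER_METER = 3.28084  # approximate conversion factor


def leading_digit(x):
    """First significant digit of a positive number."""
    return int(f"{x:e}"[0])


def digit_shares(values):
    """Shares of leading digits 1..9, as a list indexed 0..8."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return [counts[d] / total for d in range(1, 10)]


def max_shift(values, factor):
    """Largest change in any leading-digit share after multiplying every value by factor."""
    before = digit_shares(values)
    after = digit_shares([v * factor for v in values])
    return max(abs(a - b) for a, b in zip(before, after))


log_spread = [10 ** (k / 1000) for k in range(3000)]  # hypothetical Benford-like data
even_spread = range(100, 1000)  # hypothetical uniformly spaced data

print("log-spread data:", round(max_shift(log_spread, FEET_PER_METER), 4))
print("evenly spaced data:", round(max_shift(even_spread, FEET_PER_METER), 4))
```

For the evenly spaced integers, the share starting with 1 jumps from about 11% to about 34% after conversion, while the log-spread data shift by only a fraction of a percentage point.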

(Reproduced from China Index net)
