Luo Chaohui (http://kesalin.github.io/), CC license; please keep the author's name and the source when reprinting.

Introduction
Bayes' theorem is an important result in probability theory, proposed by Thomas Bayes, an 18th-century British mathematician. Here is an overview from Wikipedia:
The so-called Bayes' theorem originates from a paper he wrote during his lifetime to solve an "inverse probability" problem; the paper was published by a friend after his death. Before Bayes wrote it, people already knew how to compute "forward probability": for example, "Suppose there are n white balls and m black balls in a bag; if you reach in and draw one, how likely is it to be black?" A natural question runs the other way: "If we do not know the ratio of black to white balls in the bag beforehand, but we draw one or more balls with our eyes closed and observe their colors, what can we infer about the proportion of black and white balls in the bag?" This is called the inverse probability problem.
The idea behind Bayes' theorem arose in the 18th century, but its practical use had to wait for the advent of computers, because the theorem shows its power only with large-scale data and inference. It is widely used in computing: natural-language spelling correction, machine learning, recommendation systems, image recognition, game theory, and more.
Definition
Bayes' theorem relates the conditional probabilities of random events A and B:

P(A|B) = P(B|A) × P(A) / P(B)

where P(A|B) is the probability that A occurs given that B has occurred.
In Bayes' theorem, each term has a conventional name:

- P(A) is the prior probability of A. It is "prior" because it does not take into account any information about B.
- P(A|B) is the conditional probability of A given B. It is also called the posterior probability of A, because it depends on the value of B.
- P(B|A) is the conditional probability of B given A. It is also called the posterior probability of B, because it depends on the value of A.
- P(B) is the prior probability of B. It also acts as the normalizing constant.
In these terms, Bayes' theorem can be read as:

posterior probability = (likelihood × prior probability) / normalizing constant

That is, the posterior probability is proportional to the product of the prior probability and the likelihood.

In addition, the ratio P(B|A)/P(B) is sometimes called the standardized likelihood, so Bayes' theorem can also be read as:

posterior probability = standardized likelihood × prior probability
Conditional probability is the probability of event A occurring given that another event B has already occurred. It is written P(A|B) and read "the probability of A given B".

Joint probability is the probability that two events occur together (their intersection, in set terms). The joint probability of A and B is written P(A ∩ B).
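As a quick concrete illustration (my own example, not from the original article): roll one fair die, and let A = "the roll is even" and B = "the roll is greater than 3". Then P(A ∩ B) = P({4, 6}) = 1/3, and P(A|B) = P(A ∩ B) / P(B) = (1/3) / (1/2) = 2/3.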
Derivation
Bayes' theorem can be derived from the definition of conditional probability.
By that definition, the probability of event A occurring given event B is:

P(A|B) = P(A ∩ B) / P(B)

Similarly, the probability of event B occurring given event A is:

P(B|A) = P(A ∩ B) / P(A)

Rearranging and combining these two equations, we get:

P(A|B) × P(B) = P(A ∩ B) = P(B|A) × P(A)

This identity is sometimes called the probability multiplication rule. Dividing both sides by P(B), provided P(B) is non-zero, yields Bayes' theorem:

P(A|B) = P(B|A) × P(A) / P(B)
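To make the derivation concrete, here is a minimal Python check using an invented joint distribution (the numbers are my assumptions, not from the article):

# Numeric check of the derivation with made-up values.
p_joint = 0.12  # P(A and B)
p_a = 0.3       # P(A)
p_b = 0.4       # P(B)

p_a_given_b = p_joint / p_b  # definition: P(A|B) = P(A and B) / P(B)
p_b_given_a = p_joint / p_a  # definition: P(B|A) = P(A and B) / P(A)

# Bayes' theorem recovers P(A|B) from the other three quantities:
assert abs(p_a_given_b - p_b_given_a * p_a / p_b) < 1e-12
print(p_a_given_b)  # 0.3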
Explanation
In general, the probability of event A given event B is not the same as the probability of event B given event A; however, there is a definite relationship between the two, and Bayes' theorem is the statement of that relationship.
The Bayes formula computes a fourth probability from three known ones: given that B has occurred, the probability that A occurs equals the probability of B given A, multiplied by the probability of A, divided by the probability of B. By linking A and B, it lets us compute the probability of one event given the other, tracing from an observed result back to its cause (hence "inverse probability").
Generally speaking, when you cannot determine the probability of an event directly, you can estimate it from the probabilities of related events tied to its essential attributes: the more often the events supporting a certain attribute occur, the more likely that event becomes. This reasoning process is also called Bayesian inference.
Example 1: updating the prior probability with new information

Chapter 12 of Decision Making and Judgment discusses conservatism: even when new information appears, people are reluctant to update their prior probabilities accordingly. In the terms above, the new information is the occurrence of event B; people ought to update the probability of event A in light of it, but they prefer to stick with their original estimate.
Here is a case from the book:

Suppose there are two boxes, each holding 100 balls. Box A contains 70 red balls and 30 green balls; Box B contains 30 red balls and 70 green balls. One box is chosen at random, a ball is drawn from it, its color is recorded, and it is returned to the same box. This is repeated 12 times, yielding 8 red balls and 4 green balls. The question: how likely is it that the chosen box is Box A?

Surveys show that most people underestimate this probability. According to Bayes' theorem, the correct answer is 96.7%. The detailed analysis follows.
At the start, the prior probability of having selected Box A or Box B is 50% each, because the choice was random (a uniform prior, the special case in which no other information is available). That is:

P(A) = 0.5, P(B) = 1 − P(A) = 0.5;
Now suppose the first ball drawn is red. We update the prior probability using this information:

P(A | red ball 1) = P(red ball | A) × P(A) / (P(red ball | A) × P(A) + P(red ball | B) × P(B))

P(red ball | A): the probability of drawing a red ball from Box A
P(red ball | B): the probability of drawing a red ball from Box B
Therefore, given one red ball, the probability that Box A was chosen is corrected to:

P(A | red ball 1) = 0.7 × 0.5 / (0.7 × 0.5 + 0.3 × 0.5) = 0.7

That is, after one red ball appears, the probabilities of Box A and Box B having been selected become:

P(A) = 0.7, P(B) = 1 − P(A) = 0.3;

Repeating this update for all 8 red balls (each raising the probability) and all 4 green balls (each lowering it) gives a final probability of 96.7% that Box A was selected.
I wrote a piece of Python code to solve this problem:
Calculate the probability that Box A was selected:

def bayesFunc(pIsBox1, pBox1, pBox2):
    return (pIsBox1 * pBox1) / ((pIsBox1 * pBox1) + (1 - pIsBox1) * pBox2)

def redGreenBallProblem():
    pIsBox1 = 0.5
    # update for the 8 red balls
    for i in range(1, 9):
        pIsBox1 = bayesFunc(pIsBox1, 0.7, 0.3)
        print("after red %d > in Box A: %f" % (i, pIsBox1))
    # update for the 4 green balls
    for i in range(1, 5):
        pIsBox1 = bayesFunc(pIsBox1, 0.3, 0.7)
        print("after green %d > in Box A: %f" % (i, pIsBox1))

redGreenBallProblem()
Note that the order of the 8 red balls and 4 green balls does not matter: each red ball always raises the probability of Box A and each green ball always lowers it, by the same factors regardless of position. To simplify the program, I therefore grouped the red draws together and the green draws together.
The program running result is as follows:
The successively corrected probability of Box A:

after red 1 > in Box A: 0.700000
after red 2 > in Box A: 0.844828
after red 3 > in Box A: 0.927027
after red 4 > in Box A: 0.967365
after red 5 > in Box A: 0.985748
after red 6 > in Box A: 0.993842
after red 7 > in Box A: 0.997351
after red 8 > in Box A: 0.998863
after green 1 > in Box A: 0.997351
after green 2 > in Box A: 0.993842
after green 3 > in Box A: 0.985748
after green 4 > in Box A: 0.967365
The output confirms that each red ball increases the probability that Box A was selected, while each green ball decreases it.
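As a cross-check on the claim that order does not matter, the same posterior can be computed in one shot from the counts alone. This is my own sketch, not part of the original article:

# One-shot computation: the posterior after 8 reds and 4 greens depends only
# on the counts, because the per-draw likelihoods simply multiply together.
likeA = (0.7 ** 8) * (0.3 ** 4)  # P(8 reds, 4 greens in a fixed order | Box A)
likeB = (0.3 ** 8) * (0.7 ** 4)  # P(the same sequence | Box B)
posteriorA = 0.5 * likeA / (0.5 * likeA + 0.5 * likeB)
print("%f" % posteriorA)  # 0.967365, matching the step-by-step result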
Example 2: frequency is better suited than probability to human reasoning

Chapter 1 of Evolutionary Psychology (p. 13) describes how evolution shaped the human mind to prefer frequencies ("I succeeded in 8 of my last 10 hunts") over probabilities ("my recent hunting success rate is 80%").

The book presents the same problem in two formulations; the first makes it very hard:

Statement 1: A disease has an incidence of 1‰, and a hospital has a laboratory test for it that always shows positive when the disease is present. However, the test has a 5% error rate: 5% of people who are not ill nevertheless test positive (false positives). Now suppose a person's test result shows that he is ill. Based on the test result alone, how likely is it that he actually has the disease?
This problem can also be solved with Bayes' theorem. Before reading the analysis, though, try estimating the answer yourself and then compare it with the correct one.
The problem is analyzed as follows:
Known prior probabilities: P(diseased) = 0.001, P(healthy) = 0.999;

The sensitivity of the test (the probability that a diseased person tests positive): P(positive | diseased) = 1.00;

The false-positive rate of the test (the probability that a healthy person tests positive): P(positive | healthy) = 0.05.
From these values we can compute the probability that a person who tests positive actually has the disease:

P(diseased | positive) = P(diseased) × P(positive | diseased) / (P(diseased) × P(positive | diseased) + P(healthy) × P(positive | healthy))
= 0.001 × 1.00 / (0.001 × 1.00 + 0.999 × 0.05)
= 0.0196
≈ 2%
This result may surprise you: if no other symptoms raise the prior probability of illness, a positive test result alone implies an actual probability of illness of only about 2%.
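For readers who want to reproduce the arithmetic, here is a minimal Python sketch (mine, not from the book):

# Posterior probability of disease given a positive test result.
pDiseased = 0.001         # prior: incidence of 1 per 1000
pHealthy = 1 - pDiseased
pPosGivenDiseased = 1.00  # the test always detects the disease
pPosGivenHealthy = 0.05   # 5% false-positive rate

pDiseasedGivenPos = (pDiseased * pPosGivenDiseased) / (
    pDiseased * pPosGivenDiseased + pHealthy * pPosGivenHealthy)
print("%f" % pDiseasedGivenPos)  # 0.019627, i.e. about 2%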
Frequencies are more vivid as a way to store and recall information: you can picture what the first hunt was like, then the second. Because frequency preserves the imagery of events and makes memories easier to retrieve, the human mind evolved to prioritize frequencies over abstract probabilities; the civilized era in which the concept of probability has existed is far too short a stretch of human evolution for us to have adapted to abstract probability.
So if the question is instead expressed in frequencies, I believe your answer will come much closer to the correct one.
Statement 2: One out of every one thousand people suffers from disease X (an incidence of 1‰), and a laboratory test can detect it. If a person actually has the disease, the test result is always positive. However, there are also misdiagnoses: about fifty out of every one thousand completely healthy people also test positive (a false-positive rate of 5%).
Expressed in frequencies, the answer becomes easy to see: among those who test positive, one person is truly ill and about fifty are healthy, so

P(diseased | positive) = 1 / (1 + 50) = 1/51 = 0.0196 ≈ 2%
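In frequency form the computation is just counting; a one-line sketch (my own, for illustration):

# Out of 1000 people, 1 is ill and tests positive; 5% of the 999 healthy
# people (about 50) also test positive.
print(1 / (1 + 0.05 * 999))  # ~0.0196, about 2%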
This example shows that if we can restate a probability problem in terms of frequencies, it becomes easy to solve, even when the direct calculation would require a theorem as involved as Bayes'. This is the "restate the problem" technique described in Are Your Lights On?.
Example 3: Application in Game Theory
Challenger B does not know whether the incumbent monopolist A has a high blocking cost (the cost of preventing B from entering the market) or a low one, but B does know this: if A is the high-cost type, the probability that A blocks B's entry is 20% (blocking is expensive, so blocking is unlikely); if A is the low-cost type, the probability that A blocks B's entry is 100%.

At the start of the game, B believes with probability 70% that A is the high-cost type. B therefore estimates the probability of being blocked upon entering the market as:
P(blocked) = 0.7 × 0.2 + 0.3 × 1.0 = 0.44

0.44 is the probability that A blocks, given the prior over A's type.

Suppose B enters the market and A does block. By Bayes' theorem, B corrects the probability that A is a high-blocking-cost enterprise:

P(high cost | blocked) = 0.7 × 0.2 / 0.44 ≈ 0.32
Based on this new prior, B estimates that the probability of being blocked when entering again is:

P(blocked) = 0.32 × 0.2 + 0.68 × 1.0 = 0.744

If B enters the market once more and A blocks again, B applies Bayes' theorem again to correct the probability that A is the high-cost type:

P(high cost | blocked) = 0.32 × 0.2 / 0.744 ≈ 0.086

In this way, with each act of blocking by A, B keeps revising downward the probability that A has a high blocking cost, and thus comes increasingly to regard A as a low-blocking-cost enterprise.
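The repeated update is mechanical enough to script. Here is a small sketch of my own, using the probabilities from the example (it carries full precision between rounds, so the later figures differ slightly from the rounded ones above):

# B's belief that A is the high-blocking-cost type, updated each time A blocks.
pBlockIfHigh = 0.2  # high-cost A blocks with probability 20%
pBlockIfLow = 1.0   # low-cost A always blocks
pHigh = 0.7         # B's initial belief

for roundNo in range(1, 4):
    pBlock = pHigh * pBlockIfHigh + (1 - pHigh) * pBlockIfLow
    pHigh = pHigh * pBlockIfHigh / pBlock  # Bayes' update after a block
    print("after block %d: P(high cost) = %f" % (roundNo, pHigh))
# after block 1: 0.318182, after block 2: 0.085366, after block 3: 0.018325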
Example 4: Application in the computer field
Bayes' theorem has countless applications in computing: machine learning, natural language processing, image recognition, recommendation algorithms, search algorithms, spam filtering, and so on. These applications share one feature: given some input, find the item in a large existing data set that matches it best, that is, the item with the highest probability. These topics are too big to cover here.
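The common shape of these applications is choosing the candidate c that maximizes P(c) × P(input | c). Here is a schematic sketch; the function, the priors, and the likelihood values are all invented placeholders, just to show the pattern:

# Pick the most probable candidate; P(input) is the same for every candidate,
# so the normalizing constant can be omitted from the argmax.
def most_probable(candidates, prior, likelihood, observed):
    return max(candidates, key=lambda c: prior(c) * likelihood(observed, c))

priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {("free money", "spam"): 0.05, ("free money", "ham"): 0.0001}
best = most_probable(["spam", "ham"],
                     lambda c: priors[c],
                     lambda o, c: likelihoods[(o, c)],
                     "free money")
print(best)  # "spam": 0.4 * 0.05 beats 0.6 * 0.0001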
Xu Yu translated an article by Peter Norvig on how to write a spelling corrector. Using only about 20 lines of Python, Norvig builds a working spelling checker/corrector, which is quite impressive, and the article explains the underlying reasoning in depth. The translated version (see the references) is recommended reading.
Reference
- Wikipedia: Bayes' theorem
- MBAlib: Bayes' rule
- Decision Making and Judgment
- Evolutionary Psychology
- Liu Weipeng, The Beauty of Mathematics: An Ordinary yet Magical Bayesian Method
- Peter Norvig (translated by Xu Yu), How to Write a Spelling Corrector
Reprinted from: http://www.cnblogs.com/kesalin/p/bayes_rule.html