Geometric, binomial, and Poisson distributions: statistics for "data analysis"


Statistics: the geometric, binomial, and Poisson distributions

Author Bai Ningsu
August 4, 2015 13:08:28

Abstract: This article covers the discrete-distribution chapters of a statistics course. While studying discrete mathematics and probability theory, the author always assumed that software development had little connection to mathematics and that learning it was therefore of little use; only after moving into data analysis and big-data processing did its importance become clear. How should probability distributions be calculated and used? Probability trees make the computation more complex as the number of trials grows, so is there a better method? This article introduces several special probability distributions that have a fixed form; once these patterns are known, problems involving probability, expectation, and variance can be solved quickly. The article analyzes each distribution individually, organizes the material, implements the formulas in code, and adds extensions. First, a practical problem motivates the basic concepts, formulas, and their meaning. Then each distribution is analyzed in turn according to its advantages, disadvantages, and applicable conditions. Finally, the differences between the three distributions are summarized, with practical cases and current applications. This article is original; please credit the source when reproducing it.

Contents:
  • Review: a motivating problem
  • Geometric distribution
  • Binomial distribution
  • Poisson distribution
  • Chapter summary
  • Content extensions
  • References
I. Review: a motivating problem

Xiao Ming goes skiing: each trial run is an independent event, with success probability 0.2 and failure probability 0.8.

Success  Failure
0.2      0.8

1. What is the probability that the first success comes on the second trial run?
2. What is the probability of succeeding within one or two trial runs?
3. What is the probability that the first success comes on the 10,000th trial run?
4. What is the probability that more than 10,000 trial runs are needed for the first success?

[Figure: probability tree]

Answer: 1. Finding the probabilities with a probability tree

Let X be the number of the trial run on which the first success occurs. Then:
P(X = 1) = P(1st run succeeds) = 0.2   (note: the probability of succeeding on the first run)
P(X = 2) = P(1st run fails and 2nd run succeeds) = 0.8 * 0.2 = 0.16   (note: the probability that the first success comes on the second run)
P(X <= 2) = P(X = 1) + P(X = 2) = 0.36   (note: the probability of succeeding within one or two runs)

2. What is the probability that the first success comes on the 10,000th trial run?

$$
P(X = 10000) = q^{10000-1} p = 0.8^{9999} \times 0.2
$$

3. What is the probability that more than 10,000 trial runs are needed for the first success?

$$
P(X > 10000) = q^{10000} = 0.8^{10000}
$$
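
The answers above can be checked numerically. The following is a minimal sketch, not the author's code: the class and method names are made up for illustration. One practical point it shows is that 0.8^{9999} is far smaller than the smallest positive double, so direct evaluation returns 0.0; working with base-10 logarithms keeps the magnitude visible.

```java
// Illustrative sketch; the class and method names are assumptions, not the author's code.
class SkiTrialCheck {
    // P(X = r) = q^(r-1) * p: the first success occurs on trial r
    static double firstSuccessOn(double p, int r) {
        return Math.pow(1.0 - p, r - 1) * p;
    }

    // log10 of P(X = r), usable when the probability itself underflows
    static double log10FirstSuccessOn(double p, int r) {
        return (r - 1) * Math.log10(1.0 - p) + Math.log10(p);
    }

    // log10 of P(X > r) = q^r
    static double log10MoreThan(double p, int r) {
        return r * Math.log10(1.0 - p);
    }

    public static void main(String[] args) {
        double p = 0.2;
        System.out.println("P(X=2)  = " + firstSuccessOn(p, 2));
        System.out.println("P(X<=2) = " + (firstSuccessOn(p, 1) + firstSuccessOn(p, 2)));
        // Direct evaluation underflows to 0.0 in double precision
        System.out.println("direct P(X=10000) = " + firstSuccessOn(p, 10000));
        System.out.println("log10 P(X=10000)  = " + log10FirstSuccessOn(p, 10000));
        System.out.println("log10 P(X>10000)  = " + log10MoreThan(p, 10000));
    }
}
```

Both large-r probabilities are on the order of 10^{-969}: effectively zero, but not exactly zero.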

Geometric distribution

1. Concept

What is the geometric distribution?

"Baidu Encyclopedia": The geometric distribution is a discrete probability distribution. In a sequence of Bernoulli trials, it gives the probability that the first success occurs on the k-th trial; that is, the first k-1 trials all fail and the k-th trial succeeds.
"Textbook": If p is the probability of success and q = 1-p is the probability of failure, then

$$ P(X = r) = p q^{r-1} $$

is called the geometric distribution.

2. Conditions, mode, formulas, variance, expectation
    • Conditions for the geometric distribution:
      1. A series of independent trials is carried out.
      2. Each trial either succeeds or fails, and the probability of success is the same in every trial.
      3. The quantity of interest is the number of trials needed to obtain the first success.
    • Mode:
      The mode of any geometric distribution is 1, because P(X = r) is largest when r = 1.
    • Expression (X follows a geometric distribution with success probability p):
      X ~ G(p) or X ~ Geo(p)
    • Probability formulas (success probability p, failure probability q, number of trials r):
      1. The first success occurs on trial r: P(X = r) = pq^{r-1}
      2. More than r trials are needed for the first success: P(X > r) = q^r
      3. The first success occurs within r trials: P(X <= r) = 1 - q^r
    • Expectation and variance:
      Expectation: E(X) = 1/p
      Feature of the expectation: as r grows, the running total of r P(X = r) gets closer and closer to a fixed value, 1/p.
      Variance: Var(X) = q/p^2
      Feature of the variance: as r grows, the running total of (r - 1/p)^2 P(X = r) gets closer and closer to a fixed value, q/p^2.
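
The convergence of the expectation and variance can be seen by adding up the terms directly. The following is a small sketch under the definitions above (P(X = r) = pq^{r-1}); the class and method names are hypothetical and chosen only for this illustration.

```java
// Illustrative sketch; names are assumptions, not the author's code.
class GeometricMoments {
    // Partial sum of r * P(X = r) for r = 1..maxR, with P(X = r) = p * q^(r-1)
    static double partialMean(double p, int maxR) {
        double q = 1.0 - p;
        double sum = 0.0;
        double pmf = p; // P(X = 1)
        for (int r = 1; r <= maxR; r++) {
            sum += r * pmf;
            pmf *= q; // P(X = r+1) = P(X = r) * q
        }
        return sum;
    }

    // Partial sum of (r - 1/p)^2 * P(X = r) for r = 1..maxR
    static double partialVariance(double p, int maxR) {
        double q = 1.0 - p;
        double mean = 1.0 / p;
        double sum = 0.0;
        double pmf = p; // P(X = 1)
        for (int r = 1; r <= maxR; r++) {
            sum += (r - mean) * (r - mean) * pmf;
            pmf *= q;
        }
        return sum;
    }

    public static void main(String[] args) {
        double p = 0.2;
        for (int maxR : new int[] {5, 20, 100, 500}) {
            System.out.printf("maxR=%d  mean~%.6f  variance~%.6f%n",
                    maxR, partialMean(p, maxR), partialVariance(p, maxR));
        }
        System.out.printf("closed form: E(X)=%.6f  Var(X)=%.6f%n", 1.0 / p, (1.0 - p) / (p * p));
    }
}
```

With p = 0.2 the partial sums climb toward 1/p = 5 and q/p^2 = 20 as more terms are included.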
3. Advantages and disadvantages
  • Advantage:
    It simplifies the calculation of probabilities, expectation, and variance.
  • Disadvantage: It does not apply when the number of trials is fixed and the quantity of interest is the number of successes, or when the trials are not independent.
4. Examples
    • Applicable disciplines: mathematics and related fields
    • Scope of application: natural mathematics, applied mathematics, advanced mathematics, probability theory
    • Shooting competitions, etc.
5. Core code
    /**
     * Geometric distribution: probability that the first success occurs on trial r,
     * P(X = r) = p * q^(r-1).
     * @param p probability of success
     * @param q probability of failure, i.e. 1 - p
     * @param r number of trials
     * @return P(X = r)
     */
    public static double firstsuccess(double p, double q, int r) {
        return p * Math.pow(q, r - 1);
    }

    /**
     * More than r trials are needed for the first success: P(X > r) = q^r.
     * @param q probability of failure, i.e. 1 - p
     * @param r number of trials
     * @return P(X > r)
     */
    public static double moresuccess(double q, int r) {
        return Math.pow(q, r);
    }

    /**
     * The first success occurs within r trials: P(X <= r) = 1 - q^r.
     * @param q probability of failure, i.e. 1 - p
     * @param r number of trials
     * @return P(X <= r)
     */
    public static double lesssuccess(double q, int r) {
        return 1.0 - Math.pow(q, r);
    }

    /**
     * Expectation of the geometric distribution: E(X) = 1/p.
     * @param p probability of success
     * @return E(X)
     */
    public static double expectation(double p) {
        return 1.0 / p;
    }

    /**
     * Variance of the geometric distribution: Var(X) = q/p^2.
     * @param p probability of success
     * @param q probability of failure, i.e. 1 - p
     * @return Var(X)
     */
    public static double Variance(double p, double q) {
        return q / Math.pow(p, 2);
    }
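
As a cross-check on the closed-form expectation, the number of trials until the first success can also be simulated. The following is a rough sketch only; the class name, method names, run count, and seed are all arbitrary choices for illustration.

```java
import java.util.Random;

// Illustrative sketch; names, seed, and run count are assumptions, not the author's code.
class GeometricSimulation {
    // Runs Bernoulli trials until the first success and returns the trial number
    static int trialsUntilFirstSuccess(double p, Random rng) {
        int r = 1;
        while (rng.nextDouble() >= p) { // failure happens with probability 1 - p
            r++;
        }
        return r;
    }

    // Average number of trials to the first success over many simulated runs
    static double empiricalMean(double p, int runs, long seed) {
        Random rng = new Random(seed);
        long total = 0;
        for (int i = 0; i < runs; i++) {
            total += trialsUntilFirstSuccess(p, rng);
        }
        return (double) total / runs;
    }

    public static void main(String[] args) {
        double p = 0.2;
        System.out.println("empirical mean = " + empiricalMean(p, 200_000, 42L));
        System.out.println("E(X) = 1/p     = " + 1.0 / p);
    }
}
```

With enough runs the empirical mean should land close to 1/p, which is 5 for the skiing example.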
Binomial distribution

1. Concept

What is the binomial distribution?
"Baidu Encyclopedia": The binomial distribution describes n repeated independent Bernoulli trials. Each trial has only two possible outcomes, which are mutually exclusive; each trial is independent of the results of the others, and the probability that the event occurs stays the same in every trial.

"Textbook": For mutually independent questions, the probability of answering each one correctly is p and the probability of answering it incorrectly is q. The probability of answering r of the n questions correctly is:

$$ P(X = r) = \binom{n}{r} p^r q^{n-r} $$

Problems of this type follow the binomial distribution.
"Statistical definition": In probability theory and statistics, the binomial distribution is the discrete probability distribution of the number of successes in n independent yes/no trials, where the probability of success in each trial is p. A single success/failure trial is also called a Bernoulli trial. When n = 1, the binomial distribution is the Bernoulli distribution. The binomial distribution is also the basis of the binomial test of statistical significance.

2. Conditions, expression, two-point distribution, formulas, variance, expectation
    • Conditions:
      1. A series of independent trials is carried out;
      2. Each trial can either succeed or fail, and the probability of success is the same in every trial;
      3. The number of trials is finite.

    • Expression (number of trials n, success probability p):
      X ~ B(n, p)

    • Two-point distribution:
      When n = 1, X ~ B(1, p) is the two-point distribution.
    • Shape of the binomial distribution:
      When p < 0.5 the distribution is skewed to the right; when p > 0.5 it is skewed to the left.
    • Probability formula:
      $$ P(X = r) = \binom{n}{r} p^r q^{n-r} $$
      where
      $$ \binom{n}{r} = {n! \over r!(n-r)!} $$
    • Expectation: E(X) = np
    • Variance: Var(X) = npq (where q = 1-p)
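
The probability formula, expectation, and variance can be checked against each other numerically. The following is a minimal sketch with hypothetical names; it builds the binomial coefficient multiplicatively as a double rather than through factorials, an implementation choice made here because n! overflows integer types quickly.

```java
// Illustrative sketch; names are assumptions, not the author's code.
class BinomialSketch {
    // Binomial coefficient C(n, r), built up multiplicatively as a double
    // so that n! is never computed directly
    static double choose(int n, int r) {
        if (r < 0 || r > n) {
            return 0.0;
        }
        int k = Math.min(r, n - r); // C(n, r) = C(n, n - r)
        double c = 1.0;
        for (int i = 1; i <= k; i++) {
            c = c * (n - k + i) / i;
        }
        return c;
    }

    // P(X = r) = C(n, r) * p^r * q^(n-r)
    static double pmf(int n, int r, double p) {
        return choose(n, r) * Math.pow(p, r) * Math.pow(1.0 - p, n - r);
    }

    public static void main(String[] args) {
        int n = 5;
        double p = 0.2;
        double total = 0.0;
        double mean = 0.0;
        for (int r = 0; r <= n; r++) {
            double px = pmf(n, r, p);
            total += px;
            mean += r * px;
            System.out.printf("P(X=%d) = %.5f%n", r, px);
        }
        System.out.printf("sum = %.5f, mean = %.5f (np = %.5f)%n", total, mean, n * p);
    }
}
```

The probabilities over r = 0..n should add up to 1, and the weighted sum of r should match np.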
3. Advantages and disadvantages

Advantage: When the number of trials is fixed and the quantity of interest is the number of successes, the geometric distribution does not apply, but the binomial distribution handles such problems well.
Disadvantage: When the number of trials is not fixed and we want the probability of a given number of occurrences of an event, neither the geometric nor the binomial distribution can be used; this is where the Poisson distribution shows its advantage.

4. Examples
    • In a certain place, 35 infants were born during a given period, of whom 19 were female (sex=0) and 16 were male (sex=1). Question: does the sex ratio of infants born in this place differ from the usual male-to-female ratio (overall probability about 0.5)? The data is shown in table 10-2, a binomial test of the sex of 35 infants (see the SPSS demo).
    • When n trials are carried out under the same conditions, the result for each observation unit is independent of the others, and each trial yields exactly one of two mutually exclusive outcomes, the binomial distribution applies; it is commonly used in medicine.
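
The infant example can be worked by hand as an exact two-sided binomial test. The sketch below is one way to do it and is not necessarily how SPSS computes its result; the names are hypothetical, and the symmetric two-tail sum is only appropriate here because the null probability is 0.5.

```java
// Illustrative sketch; names are assumptions, and this is not SPSS's procedure.
class InfantSexRatioCheck {
    // P(X = r) for X ~ B(n, p), with C(n, r) built up multiplicatively
    static double binomialPmf(int n, int r, double p) {
        int k = Math.min(r, n - r);
        double c = 1.0; // running value of the binomial coefficient
        for (int i = 1; i <= k; i++) {
            c = c * (n - k + i) / i;
        }
        return c * Math.pow(p, r) * Math.pow(1.0 - p, n - r);
    }

    // Two-sided exact p-value under H0: p = 0.5. Sums every outcome at least
    // as far from n/2 as the observed count (valid because B(n, 0.5) is symmetric).
    static double twoSidedPValue(int n, int observed) {
        int hi = Math.max(observed, n - observed);
        int lo = n - hi;
        double pValue = 0.0;
        for (int r = 0; r <= n; r++) {
            if (r <= lo || r >= hi) {
                pValue += binomialPmf(n, r, 0.5);
            }
        }
        return pValue;
    }

    public static void main(String[] args) {
        int n = 35;
        int females = 19;
        System.out.printf("two-sided p-value = %.4f%n", twoSidedPValue(n, females));
    }
}
```

The p-value comes out at roughly 0.74, far above the usual 0.05 threshold, so a 19 to 16 split gives no evidence that the ratio differs from 0.5.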
5. Core code implementation
    /**
     * Probability of exactly r successes in n independent Bernoulli trials:
     * P(X = r) = C(n, r) * p^r * q^(n-r), where C(n, r) = n! / (r! * (n-r)!).
     * NumFormat.factorial is the author's helper and is not shown here.
     * @param n total number of independent trials
     * @param r number of successes
     * @param p probability of success
     * @param q probability of failure, i.e. 1 - p
     * @return P(X = r)
     */
    public static double rsucess(int n, int r, double p, double q) {
        int kk = n - r;
        // ncr is C(n, r) = n! / (r! * (n-r)!)
        double ncr = NumFormat.factorial(n) / (NumFormat.factorial(r) * NumFormat.factorial(kk));
        return ncr * Math.pow(p, r) * Math.pow(q, kk);
    }

    /**
     * Expectation of the binomial distribution: E(X) = np.
     * @param n number of trials
     * @param p probability of success
     * @return E(X)
     */
    public static double expectation(int n, double p) {
        return n * p;
    }

    /**
     * Variance of the binomial distribution: Var(X) = npq.
     * @param n number of trials
     * @param p probability of success
     * @param q probability of failure, i.e. 1 - p
     * @return Var(X)
     */
    public static double Variance(int n, double p, double q) {
        return n * p * q;
    }
Poisson distribution

1. Concept

"Textbook": Individual events occur randomly within a given interval, and the mean number of occurrences in the interval is known and finite. Events whose probabilities are given by

$$ P(X = r) = {e^{-λ}λ^r \over r!} $$

follow the Poisson distribution.
Characteristics
1. No series of trials is needed; it describes the number of occurrences of an event in a particular interval.
2. The sum of two independent Poisson variables also follows a Poisson distribution.
3. Under certain conditions it can be used to approximate the binomial distribution (i.e. n > 50 and p < 0.1, or np approximately equal to npq).

2. Conditions, expression, shape, formula, mode, variance, expectation
    • Conditions:
      1. Individual events occur randomly and independently within a given interval, which can be an interval of time or space (a week, a mile).
      2. The mean number of occurrences (the rate) of the event within the interval is known and finite. It is denoted by λ.
    • Expression (mean number of occurrences in the interval is λ):
      X ~ Po(λ)
    • Shape of the Poisson distribution: when λ is small the distribution is skewed to the right; as λ grows it gradually becomes symmetric.
    • Probability formula (e is the constant 2.718..., the mean number of occurrences is λ, r events occur in the interval):
      $$ P(X = r) = {e^{-λ}λ^r \over r!} $$
    • Mode:
      If λ is an integer there are two modes, λ and λ-1; if λ is not an integer, the mode is the integer part of λ.
    • Expectation: E(X) = λ
    • Variance: Var(X) = λ
    • Combining independent random variables:
      If X ~ Po(λ_x) and Y ~ Po(λ_y) are independent, then X + Y ~ Po(λ_x + λ_y).
    • What is the relationship between the Poisson distribution and the binomial distribution?
      When n in the binomial distribution X ~ B(n, p) is large and p is very small, the binomial distribution can be approximated by the Poisson distribution with λ = np. As a rule of thumb, when n ≥ 10, p ≤ 0.1, and np ≤ 5, the Poisson formula can be used for the approximate calculation, and X can be treated as approximately X ~ Po(np).
Question: Why must n be large and p small?

Because the approximation divides the interval into n small time windows and assumes that at most one passenger arrives in each window (the passenger-arrival problem); that assumption is only reasonable when the windows are many and the probability p of an arrival in each one is small.
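
How much n and p matter can be seen by comparing the two probability functions directly while holding np fixed. The sketch below uses hypothetical names and three arbitrary (n, p) pairs with np = 5; it reports the largest pointwise gap between B(n, p) and Po(np).

```java
// Illustrative sketch; names and the chosen (n, p) pairs are assumptions.
class PoissonVsBinomial {
    // P(X = r) for X ~ B(n, p), with C(n, r) built up multiplicatively
    static double binomialPmf(int n, int r, double p) {
        int k = Math.min(r, n - r);
        double c = 1.0;
        for (int i = 1; i <= k; i++) {
            c = c * (n - k + i) / i;
        }
        return c * Math.pow(p, r) * Math.pow(1.0 - p, n - r);
    }

    // P(X = r) for X ~ Po(lambda), computed as e^(-lambda) * lambda^r / r!
    // by repeated multiplication so that r! is never formed
    static double poissonPmf(double lambda, int r) {
        double term = Math.exp(-lambda); // P(X = 0)
        for (int i = 1; i <= r; i++) {
            term *= lambda / i;
        }
        return term;
    }

    // Largest absolute difference between the two pmfs over r = 0..n
    static double maxAbsDifference(int n, double p) {
        double lambda = n * p;
        double worst = 0.0;
        for (int r = 0; r <= n; r++) {
            worst = Math.max(worst, Math.abs(binomialPmf(n, r, p) - poissonPmf(lambda, r)));
        }
        return worst;
    }

    public static void main(String[] args) {
        System.out.println("n=10,   p=0.5:   " + maxAbsDifference(10, 0.5));
        System.out.println("n=100,  p=0.05:  " + maxAbsDifference(100, 0.05));
        System.out.println("n=1000, p=0.005: " + maxAbsDifference(1000, 0.005));
    }
}
```

The gap shrinks as n grows and p falls, which is the behavior the rule of thumb relies on.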

3. Advantages and disadvantages

Advantage: No series of trials is needed; the distribution directly describes the number of occurrences of an event in a particular interval. In addition, substituting it for the binomial distribution under certain conditions simplifies the calculation.

4. Examples
    • Applicable discipline: probability theory
    • The number of arrivals at a service facility in a given period: calls received by a telephone exchange, passengers waiting at a bus stop, machine failures, natural disasters, defects in a product, or bacteria within a unit area under a microscope.
    • Applications in traffic engineering; the prevalence and spread of SARS has also been described with the Poisson distribution.
    • The Poisson distribution is widespread in natural phenomena; it mainly describes the number of rare events occurring in a large number of repeated trials.
5. Core code implementation
    /**
     * Poisson probability P(X = r) = e^{-λ} λ^r / r!
     * (mean number of occurrences λ, r events in the interval).
     * NumFormat.factorial is the author's helper and is not shown here.
     * @param λ mean number of occurrences
     * @param r number of events in the interval
     * @return P(X = r)
     */
    public static double BosongSuccess(double λ, int r) {
        // Math.exp(-λ) evaluates e^{-λ} in full double precision
        return Math.exp(-λ) * Math.pow(λ, r) / NumFormat.factorial(r);
    }

    /**
     * Expectation of the Poisson distribution: E(X) = λ.
     * @param λ mean number of occurrences
     * @return E(X)
     */
    public static double Expectation(double λ) {
        return λ;
    }

    /**
     * Variance of the Poisson distribution: Var(X) = λ.
     * @param λ mean number of occurrences
     * @return Var(X)
     */
    public static double Variance(double λ) {
        return λ;
    }
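
For experimenting without the NumFormat helper, the Poisson probability can be computed by repeated multiplication. The following is a self-contained sketch with hypothetical names. It also shows how much error a truncated constant such as e = 2.718 introduces compared with Math.exp, since the relative error grows with λ.

```java
// Illustrative sketch; names are assumptions, not the author's code.
class PoissonSketch {
    // P(X = r) = e^(-lambda) * lambda^r / r!, built term by term
    static double pmf(double lambda, int r) {
        double term = Math.exp(-lambda); // P(X = 0)
        for (int i = 1; i <= r; i++) {
            term *= lambda / i; // P(X = i) = P(X = i-1) * lambda / i
        }
        return term;
    }

    // P(X <= r), summing the pmf from 0 to r
    static double cdf(double lambda, int r) {
        double sum = 0.0;
        for (int i = 0; i <= r; i++) {
            sum += pmf(lambda, i);
        }
        return sum;
    }

    // Relative error of using the truncated constant 2.718 for e^(-lambda)
    static double truncatedConstantError(double lambda) {
        return Math.pow(2.718, -lambda) / Math.exp(-lambda) - 1.0;
    }

    public static void main(String[] args) {
        double lambda = 2.5;
        for (int r = 0; r <= 6; r++) {
            System.out.printf("P(X=%d) = %.5f%n", r, pmf(lambda, r));
        }
        System.out.printf("P(X<=6) = %.5f%n", cdf(lambda, 6));
        System.out.println("relative error with e=2.718, lambda=10: " + truncatedConstantError(10.0));
    }
}
```

For a non-integer λ such as 2.5, the largest probability falls at r = 2, the integer part of λ.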
II. Chapter summary

Geometric distribution

Application conditions:
A series of independent trials is carried out, each of which either succeeds or fails, with the same probability of success every time. Objective: to determine how many trials are needed to obtain the first success.
Expression (X follows a geometric distribution with success probability p):
X ~ Geo(p)
Probability formulas of the geometric distribution:
1. The first success occurs on trial r: P(X = r) = pq^{r-1}
2. More than r trials are needed for the first success: P(X > r) = q^r
3. The first success occurs within r trials: P(X <= r) = 1 - q^r
Expectation and variance:
E(X) = 1/p and Var(X) = q/p^2

Binomial distribution

Application conditions:
A finite series of independent trials is carried out, each of which either succeeds or fails, with the same probability of success every time. Objective: to determine the number of successes in n trials.
Expression (X follows a binomial distribution, n is the number of trials, p is the success probability):
X ~ B(n, p)
Two-point distribution:
When n = 1, X ~ B(1, p) is the two-point distribution.
Probability formula of the binomial distribution:
$$ P(X = r) = \binom{n}{r} p^r q^{n-r} $$
where
$$ \binom{n}{r} = {n! \over r!(n-r)!} $$
Expectation and variance:
E(X) = np and Var(X) = npq

Poisson distribution

Application conditions:
Individual events occur randomly and independently within a given interval, and the mean number of occurrences in the interval is known and finite. Objective: to determine the number of events occurring within a given interval.
Expression (X follows a Poisson distribution, where λ is the mean number of occurrences):
X ~ Po(λ)
Probability formula of the Poisson distribution:

$$ P(X = r) = {e^{-λ}λ^r \over r!} $$

Expectation and variance: E(X) = λ and Var(X) = λ
If X ~ Po(λ_x) and Y ~ Po(λ_y) are independent, then X + Y ~ Po(λ_x + λ_y).
If X ~ B(n, p) with n large and p very small, X can be approximated by X ~ Po(np).
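
The additivity of independent Poisson variables can be verified numerically by convolving the two probability functions. The following is a small sketch with hypothetical names and arbitrarily chosen rates 2 and 3.

```java
// Illustrative sketch; names and the rates 2 and 3 are assumptions.
class PoissonAdditivity {
    // P(X = r) = e^(-lambda) * lambda^r / r!, built term by term
    static double pmf(double lambda, int r) {
        double term = Math.exp(-lambda);
        for (int i = 1; i <= r; i++) {
            term *= lambda / i;
        }
        return term;
    }

    // P(X + Y = s) for independent X ~ Po(lx) and Y ~ Po(ly), by direct convolution:
    // sum over k of P(X = k) * P(Y = s - k)
    static double convolvedPmf(double lx, double ly, int s) {
        double sum = 0.0;
        for (int k = 0; k <= s; k++) {
            sum += pmf(lx, k) * pmf(ly, s - k);
        }
        return sum;
    }

    public static void main(String[] args) {
        double lx = 2.0;
        double ly = 3.0;
        for (int s = 0; s <= 8; s++) {
            System.out.printf("s=%d  convolution=%.6f  Po(%.1f)=%.6f%n",
                    s, convolvedPmf(lx, ly, s), lx + ly, pmf(lx + ly, s));
        }
    }
}
```

The two columns should agree up to rounding, which is what X + Y ~ Po(λ_x + λ_y) predicts.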

Relationship between the Poisson, binomial, and normal distributions
    • Poisson distribution in place of the binomial distribution
      When n is large and p is very small, X ~ Po(np) can approximate X ~ B(n, p). (n > 50 and p < 0.1, or q approximately 1 and n large, so that np is approximately equal to npq)
    • Normal distribution in place of the Poisson distribution
      If X ~ Po(λ) and λ > 15, it can be approximated by X ~ N(λ, λ).
    • Normal distribution in place of the binomial distribution
      For X ~ B(n, p), when np > 5 and nq > 5 the normal distribution can replace the binomial distribution. (A continuity correction is required.)
      Continuity correction
      Less than or equal: P(X <= a) becomes P(X < a + 0.5) on the continuous scale
      Greater than or equal: P(X >= b) becomes P(X > b - 0.5) on the continuous scale
      Between: P(a <= X <= b) becomes P(a - 0.5 < X < b + 0.5) on the continuous scale

Summary: for "less than or equal" add 0.5; for "greater than or equal" subtract 0.5.
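
The effect of the continuity correction can be seen by comparing an exact binomial tail with its normal approximation. The sketch below uses hypothetical names and an arbitrary case (n = 100, p = 0.5, a = 45). Java's standard library has no error function, so the normal CDF here relies on the Abramowitz and Stegun polynomial approximation 7.1.26, which is accurate to about 1.5e-7; that is a substitution made for this illustration.

```java
// Illustrative sketch; names and the example values are assumptions.
class ContinuityCorrection {
    // Error function via the Abramowitz and Stegun approximation 7.1.26
    static double erf(double x) {
        double sign = x < 0 ? -1.0 : 1.0;
        double ax = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * ax);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        return sign * (1.0 - poly * Math.exp(-ax * ax));
    }

    // P(Z <= x) for Z ~ N(mean, variance)
    static double normalCdf(double x, double mean, double variance) {
        return 0.5 * (1.0 + erf((x - mean) / Math.sqrt(2.0 * variance)));
    }

    // P(X = r) for X ~ B(n, p), with C(n, r) built up multiplicatively
    static double binomialPmf(int n, int r, double p) {
        int k = Math.min(r, n - r);
        double c = 1.0;
        for (int i = 1; i <= k; i++) {
            c = c * (n - k + i) / i;
        }
        return c * Math.pow(p, r) * Math.pow(1.0 - p, n - r);
    }

    // Exact P(X <= a) for X ~ B(n, p)
    static double binomialCdf(int n, int a, double p) {
        double sum = 0.0;
        for (int r = 0; r <= a; r++) {
            sum += binomialPmf(n, r, p);
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 100;
        double p = 0.5;
        int a = 45;
        double mean = n * p;               // np
        double variance = n * p * (1 - p); // npq
        System.out.printf("exact P(X<=%d)         = %.5f%n", a, binomialCdf(n, a, p));
        System.out.printf("normal, with +0.5      = %.5f%n", normalCdf(a + 0.5, mean, variance));
        System.out.printf("normal, no correction  = %.5f%n", normalCdf(a, mean, variance));
    }
}
```

In this case the corrected approximation sits much closer to the exact value than the uncorrected one.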

III. Content extensions
    • Bernoulli trial: a trial with only two possible outcomes, where in a series of repeated independent trials one outcome always has probability p and the other always has probability q.
    • n-fold Bernoulli trials: a Bernoulli trial repeated n times independently under the same conditions.
    • Two-point distribution: the random variable X takes only the values 0 and 1, with P(X = 1) = p and P(X = 0) = 1 - p, where 0 < p < 1; X is then said to follow the two-point distribution with parameter p, written X ~ B(1, p).
    • Classification of distributions
      Continuous distributions: normal, uniform, exponential, log-normal, Cauchy, gamma, Rayleigh, Weibull
      Discrete distributions: binomial, geometric, hypergeometric, Poisson
      The three major sampling distributions: chi-square, F, t
IV. References

1. Geometric distribution random function
2. MATLAB functions for generating random numbers
3. Probability theory 05: discrete distributions
4. Eight commonly used non-parametric tests in SPSS, part two: the binomial test
5. Principles of programs that generate random values from the exponential and Poisson distributions
6. A few common distributions
7. Statistics in layman's terms
