Probability Theory and Mathematical Statistics

1. Random Events

Deterministic phenomenon: a phenomenon that inevitably occurs under certain conditions. Feature: the conditions completely determine the result.

Random phenomenon: a phenomenon that may or may not occur under certain conditions. Feature: the conditions do not completely determine the result.

Random phenomena are investigated through random experiments. An experiment with the following three characteristics is called a random experiment:

(1) It can be repeated under the same conditions;

(2) Each experiment may produce more than one result, and all possible results can be specified in advance;

(3) Before performing the experiment, one cannot determine which result will appear.

Sample space and sample points: the set of all possible results of a random experiment E is called the sample space of E, denoted $\Omega$. The elements of the sample space, i.e. the individual results of E, are called sample points, denoted $\omega$.

**Random event: a subset of the sample space of random experiment E is called a random event of E.**

**For a die-rolling experiment: the sample space is {1, 2, 3, 4, 5, 6}, each element is a sample point, and "the outcome is greater than 3" is a random event. Accordingly, $\Omega \supset A \ni \omega$.**
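
A minimal sketch of this example in Python, modeling the sample space and the event as sets:

```python
# The die-rolling experiment: the sample space Omega, and the random
# event A = "the outcome is greater than 3", which is a subset of Omega.
omega_space = {1, 2, 3, 4, 5, 6}        # sample space Omega
A = {x for x in omega_space if x > 3}   # random event: outcome > 3

print(A)                  # the subset {4, 5, 6}
print(A <= omega_space)   # True: every event is a subset of Omega
```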

2. Relationship between random events

Intersection of events: the event that A and B occur simultaneously is called the intersection (or product) of A and B, denoted $A \cap B$ or $AB$;

Union of events: the event that at least one of A and B occurs, i.e. the set of all sample points of A and B, is called the union (or sum), denoted $A \cup B$;

Event inclusion: event A contains event B, denoted $A \supset B$;

Event equality: event A equals event B, denoted $A = B$;

Mutually exclusive events: if the intersection of A and B is empty ($AB = \varnothing$), then A and B are mutually exclusive;

Event difference: the event that A occurs while B does not, denoted $A - B$;

Complementary (opposite) events: exactly one of A and B occurs, and their union is the whole sample space ($A \cup B = \Omega$ and $A \cap B = \varnothing$).
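
These relations map directly onto set operations; a short Python sketch using two events from the die-rolling sample space (the event choices are illustrative):

```python
# Event relations as set operations on the die-rolling sample space.
Omega = {1, 2, 3, 4, 5, 6}
A = {4, 5, 6}           # "outcome greater than 3"
B = {2, 4, 6}           # "outcome is even"

union = A | B           # A ∪ B: at least one of A, B occurs
product = A & B         # A ∩ B (AB): both occur
difference = A - B      # A − B: A occurs but B does not
complement = Omega - A  # the complementary (opposite) event of A

print(union, product, difference, complement)
```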

The independence of random events is the basic premise of many mathematical models.

3. Regularity of Random Events: Probability

**Frequency definition**: suppose n experiments are conducted under the same conditions, and event A occurs $n_A$ times; $n_A$ is called the occurrence count of A, and the ratio $\frac{n_A}{n}$ is called the frequency of A, denoted $f_n(A)$.

**Frequency is not Probability**

Probability of a random event A: if, in a large number of repeated experiments, the frequency m/n of event A stabilizes around a constant p, then the constant p is called the probability of A, denoted $P(A) = p$.
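
A quick simulation illustrating how the frequency stabilizes around the probability as n grows (a coin toss with p = 0.5; the seed and trial counts are arbitrary choices):

```python
import random

# Simulate n tosses of a fair coin and compute the frequency f_n(A)
# of the event A = "heads"; it stabilizes near p = 0.5 as n grows.
random.seed(0)

def heads_frequency(n):
    n_A = sum(random.random() < 0.5 for _ in range(n))  # occurrence count
    return n_A / n                                      # f_n(A) = n_A / n

for n in (10, 1000, 100_000):
    print(n, heads_frequency(n))
```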

Probability properties:

(1) For any event A: $0 \le P(A) \le 1$

(2) For the certain event $\Omega$ and the impossible event $\varnothing$: $P(\Omega) = 1$, $P(\varnothing) = 0$

(3) For pairwise mutually exclusive events $A_1, A_2, \dots, A_n$: $P(A_1 \cup A_2 \cup \dots \cup A_n) = P(A_1) + P(A_2) + \dots + P(A_n)$

(4) $P(\overline{A}) = 1 - P(A)$

(5) If $A \subset B$, then $P(A) \le P(B)$

Event independence and conditional probability:

Let A and B be two events with $P(A) > 0$. Then $P(B|A) = \frac{P(AB)}{P(A)}$ is the conditional probability of B given that A has occurred;

If events A and B satisfy $P(AB) = P(A)P(B)$, then A and B are independent.

Let $A_1, A_2, \dots, A_n$ be n events. If they are mutually independent, then $P(A_1 A_2 \cdots A_n) = P(A_1) P(A_2) \cdots P(A_n)$
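
The conditional-probability formula and the independence criterion can be checked by exhaustive enumeration; a sketch with two dice, where A = "first die is even" and B = "the sum is 7" (a classic pair of independent events):

```python
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely outcomes of rolling two dice and check
# P(B|A) = P(AB)/P(A) and the independence criterion P(AB) = P(A)P(B).
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

A = lambda w: w[0] % 2 == 0        # first die even
B = lambda w: w[0] + w[1] == 7     # sum equals 7
AB = lambda w: A(w) and B(w)       # both occur

p_B_given_A = prob(AB) / prob(A)   # P(B|A) = P(AB)/P(A)
print(p_B_given_A, prob(B))        # equal, so A and B are independent
```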

The following formulas (extremely important):

(1) Addition formula:

$P(A \cup B) = P(A) + P(B) - P(AB)$

$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(AB) - P(AC) - P(BC) + P(ABC)$

(2) subtraction formula:

$P(A - B) = P(A) - P(AB)$

(3) multiplication formula:

When $P(A) > 0$: $P(AB) = P(A)P(B|A)$

When $P(A_1 A_2 \cdots A_{n-1}) > 0$: $P(A_1 A_2 \cdots A_n) = P(A_1) P(A_2|A_1) \cdots P(A_n|A_1 A_2 \cdots A_{n-1})$

(4) Total probability formula [prior probability formula]:

Let $B_1, B_2, \dots, B_n$ satisfy $\cup_{i=1}^{n} B_i = \Omega$, $B_i B_j = \varnothing$ ($i \neq j$), and $P(B_i) > 0$. Then for any event A:

$P(A) = \sum_{i=1}^{n} P(B_i) P(A|B_i)$

(5) Bayesian formula [posterior probability formula]:

Let $B_1, B_2, \dots, B_n$ satisfy $\cup_{i=1}^{n} B_i = \Omega$, $B_i B_j = \varnothing$ ($i \neq j$), and $P(B_i) > 0$. Then for any event A with $P(A) > 0$:

$P(B_j|A) = \frac{P(B_j) P(A|B_j)}{\sum_{i=1}^{n} P(B_i) P(A|B_i)}$
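
A numeric sketch of the total-probability and Bayes formulas. The prior and the two likelihoods below are made-up numbers for illustration only:

```python
# Partition: B1 = "has condition" (assumed prior 0.01), B2 = its complement.
# Event A = "test is positive", with assumed P(A|B1) = 0.99, P(A|B2) = 0.05.
priors = [0.01, 0.99]        # P(B_i), a partition of Omega
likelihoods = [0.99, 0.05]   # P(A|B_i)

# Total probability formula: P(A) = sum_i P(B_i) P(A|B_i)
p_A = sum(p * l for p, l in zip(priors, likelihoods))

# Bayes formula: posterior P(B_1|A)
posterior = priors[0] * likelihoods[0] / p_A
print(p_A, posterior)  # the posterior is far below the likelihood 0.99
```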

II. Random Variables and Their Probability Distribution

1. Random Variables

Definition: a real-valued function $X = X(\omega)$, $\omega \in \Omega$, defined on the sample space $\Omega$ is called a random variable, denoted $X$.

2. Distribution Functions

Definition: for any real number x, the function $F(x) = P\{X \le x\}$, $-\infty < x < +\infty$, is called the distribution function of the random variable X. The value of $F(x)$ equals the probability that X falls in the interval $(-\infty, x]$, i.e. the probability of the event $\{X \le x\}$.

Obviously, F (x) has the following properties:

(1) $0 \le F(x) \le 1$

(2) $F(x)$ is monotonically non-decreasing, i.e. when $x_1 < x_2$, $F(x_1) \le F(x_2)$

(3) $F(x)$ is right-continuous, i.e. $F(x+0) = F(x)$

(4) For any $x_1 < x_2$: $P\{x_1 < X \le x_2\} = F(x_2) - F(x_1)$

(5) For any x: $P\{X = x\} = F(x) - F(x-0)$
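
Property (4) can be checked concretely on the fair die, whose distribution function is a step function:

```python
from fractions import Fraction

# Distribution function of the fair die, F(x) = P{X <= x}, and a check of
# property (4): P{x1 < X <= x2} = F(x2) - F(x1).
def F(x):
    return Fraction(sum(1 for k in range(1, 7) if k <= x), 6)

p_2_to_5 = F(5) - F(2)   # P{2 < X <= 5} = P{X in {3, 4, 5}}
print(F(3), p_2_to_5)
```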

3. Probability Distribution of discrete random variable X

Suppose the possible values of random variable X are $x_1, x_2, \dots, x_n, \dots$, taken with probabilities $P\{X = x_k\} = p_k$, $k = 1, 2, \dots$; this is called the probability distribution (distribution law) of the discrete random variable X.

4. Continuous Random Variables and their probability distribution

If for the distribution function $F(x)$ of random variable X there exists a non-negative integrable function $f(x)$ such that for any real number x, $F(x) = \int_{-\infty}^{x} f(t)\,dt$, $-\infty < x < +\infty$, then X is called a continuous random variable, and the function $f(x)$ is called the probability density of X.

The probability density function $f(x)$ has the following properties:

(1) $ f (x) \ ge 0 $

(2) $\int_{-\infty}^{+\infty} f(x)\,dx = 1$

(3) For any real numbers $x_1 < x_2$: $P\{x_1 < X \le x_2\} = \int_{x_1}^{x_2} f(t)\,dt$

(4) $F'(x) = f(x)$ at every continuity point of $f(x)$. If X is a continuous random variable, then clearly $P\{x_1 < X \le x_2\} = P\{x_1 \le X < x_2\} = P\{x_1 < X < x_2\} = P\{x_1 \le X \le x_2\}$
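
A numeric check of the relation $F(x) = \int_{-\infty}^{x} f(t)\,dt$, using the exponential density $f(x) = e^{-x}$ for $x \ge 0$ as an example (the integrator is a simple midpoint rule, not a library call):

```python
import math

# For the exponential density f(x) = e^{-x} (x >= 0, and 0 below),
# the distribution function is F(x) = 1 - e^{-x}; check numerically.
f = lambda t: math.exp(-t)
F = lambda x: 1 - math.exp(-x)

def integrate(g, a, b, steps=100_000):
    # simple midpoint rule
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

x = 2.0
print(integrate(f, 0.0, x), F(x))  # the two agree to several decimals
```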

III. Numerical Characteristics of Random Variables

1. Mathematical Expectation

Mathematical expectation of discrete random variables:

Given the probability distribution of random variable X, $P\{X = x_k\} = p_k$, $k = 1, 2, \dots$, then $E(X) = \sum_{k=1}^{+\infty} x_k p_k$

Mathematical expectation of continuous random variables:

Given the probability density $f(x)$ of random variable X, with distribution function $F(x) = \int_{-\infty}^{x} f(t)\,dt$, then $E(X) = \int_{-\infty}^{+\infty} x f(x)\,dx$

Properties of mathematical expectation:

If X is a random variable and C a constant: $E(CX) = C\,E(X)$

If X and Y are any two random variables: $E(X \pm Y) = E(X) \pm E(Y)$

If the random variables X and Y are mutually independent: $E(XY) = E(X)E(Y)$

2. Variance:

Let X be a random variable. If the mathematical expectation $E\{[X - E(X)]^2\}$ exists, it is called the variance of X, denoted $D(X)$, i.e. $D(X) = E\{[X - E(X)]^2\}$. $\sqrt{D(X)}$ is called the standard deviation (mean square deviation) of X, denoted $\sigma(X)$.

Shortcut formula for the variance: $D(X) = E(X^2) - [E(X)]^2$
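
Both the definition of the variance and the shortcut formula can be verified exactly on the fair die using rational arithmetic:

```python
from fractions import Fraction

# For the fair die: E(X) = sum x_k p_k, then check the shortcut
# D(X) = E(X^2) - [E(X)]^2 against the definitional form.
xs = range(1, 7)
p = Fraction(1, 6)

E = sum(x * p for x in xs)             # E(X) = 7/2
E2 = sum(x * x * p for x in xs)        # E(X^2)
D = sum((x - E) ** 2 * p for x in xs)  # D(X) by definition

print(E, D, E2 - E ** 2)
```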

3. Moment, covariance, correlation coefficient

Moment:

Origin moment: let X be a random variable. If $E(X^k)$, $k = 1, 2, \dots$ exists, it is called the k-th order origin moment of X.

Central moment: let X be a random variable. If $E\{[X - E(X)]^k\}$ exists, it is called the k-th order central moment of X.

Covariance:

For random variables X and Y, if $E\{[X - E(X)][Y - E(Y)]\}$ exists, it is called the covariance of X and Y, denoted $\mathrm{cov}(X, Y)$, that is:

$\mathrm{cov}(X, Y) = E\{[X - E(X)][Y - E(Y)]\}$

Viewed as vectors, $X - E(X)$ and $Y - E(Y)$ are the centered variables, and the covariance is their inner product; intuitively it reflects both the angle between the two vectors and their magnitudes.

Correlation coefficient:

For random variables X and Y, if $D(X)D(Y) \neq 0$, then $\frac{\mathrm{cov}(X, Y)}{\sqrt{D(X)}\sqrt{D(Y)}}$ is called the correlation coefficient of X and Y, denoted $\rho_{XY}$, that is:

$\rho_{XY} = \frac{\mathrm{cov}(X, Y)}{\sqrt{D(X)}\sqrt{D(Y)}}$

For the relationship between them and their derivation formula, see: https://blog.csdn.net/dcrmg/article/details/52416832

IV. Basic Concepts of Mathematical Statistics

1. Basic Concepts

Population: the totality of values of the index X of the objects studied in mathematical statistics is called the population.

Sample: if $X_1, X_2, \dots, X_n$ are mutually independent and identically distributed with the population X, then $X_1, X_2, \dots, X_n$ is called a simple random sample from the population, and n is the sample size. The specific observed values $x_1, x_2, \dots, x_n$ are called the sample values, or n independent observations of the population X.

Statistic: a function $T = T(X_1, X_2, \dots, X_n)$ of the sample $X_1, X_2, \dots, X_n$ that contains no unknown parameters is called a statistic.

Sample numerical characteristics: if $X_1, X_2, \dots, X_n$ is a sample from population X, define:

(1) sample mean:

$\overline{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

(2) sample variance:

$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \overline{X})^2$; the sample standard deviation is its square root;

(3) Sample k-th order origin moment:

$A_k = \frac{1}{n} \sum_{i=1}^{n} X_i^k$, $k = 1, 2, \dots$; note $A_1 = \overline{X}$

(4) Sample k-th order central moment:

$B_k = \frac{1}{n} \sum_{i=1}^{n} (X_i - \overline{X})^k$, $k = 1, 2, \dots$; note $B_2 = \frac{n-1}{n} S^2 \neq S^2$
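
A sketch of these sample statistics on a small made-up sample, showing the divisor $n-1$ in $S^2$ versus the divisor $n$ in the second central moment $B_2$:

```python
# Sample mean, sample variance (divisor n-1), and 2nd central moment
# (divisor n) on a small made-up sample; B_2 = (n-1)/n * S^2.
xs = [4.2, 3.8, 5.1, 4.9, 4.0]
n = len(xs)

mean = sum(xs) / n                               # sample mean
S2 = sum((x - mean) ** 2 for x in xs) / (n - 1)  # sample variance
B2 = sum((x - mean) ** 2 for x in xs) / n        # 2nd central moment

print(mean, S2, B2)
```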

Properties of the sample statistics:

(1) If the population X has mathematical expectation $E(X) = \mu$, then:

$E(\overline{X}) = E(X) = \mu$

Note: if the mathematical expectation of population X exists, the expectation of the sample mean equals it, i.e. the sample mean is an unbiased estimator of the population mean.

(2) If the population X has variance $D(X) = \sigma^2$, then:

$D(\overline{X}) = \frac{\sigma^2}{n}$, and $E(S^2) = D(X) = \sigma^2$

Note: if the variance of population X exists, the variance of the sample mean equals the population variance divided by the sample size, and the sample variance is an unbiased estimator of the population variance.

(3) Mean absolute deviation: $\frac{1}{n} \sum_{i=1}^{n} |X_i - \overline{X}|$

(4) Coefficient of variation: the ratio of the standard deviation to the corresponding mean, expressed as a percentage; used to compare the degree of dispersion [variation] of two data sets.

V. Parameter [Sampling] Estimation

1. Theoretical Basis

Sample estimation is the process of inferring the population mean, variance, proportion, and other parameters from a sample drawn from the population.

Theoretical Basis of sampling inference:

1. Law of large numbers: the frequency, and the arithmetic mean of a large number of measurements, are stable and do not depend on individual measured values.

2. Central limit theorem: the distribution of the sum (or mean) of a large number of independent, identically distributed random variables is approximately normal; the various limit theorems for i.i.d. variables derive from here.

2. Parameter Estimation Method

**Point estimation**:

Constructing a statistic $\hat\theta(X_1, X_2, \dots, X_n)$ from the sample $X_1, X_2, \dots, X_n$ to estimate the unknown parameter $\theta$ is called point estimation; the statistic $\hat\theta(X_1, X_2, \dots, X_n)$ is called an estimator.

Unbiased estimator:

Let $\hat\theta$ be an estimator of $\theta$. If $E(\hat\theta) = \theta$, then $\hat\theta = \hat\theta(X_1, X_2, \dots, X_n)$ is an unbiased estimator of the unknown parameter $\theta$.

Consistent Estimator:

Let $\hat\theta(X_1, X_2, \dots, X_n)$ be an estimator of $\theta$. If $\hat\theta$ converges to $\theta$ in probability, then $\hat\theta(X_1, X_2, \dots, X_n)$ is a consistent estimator of $\theta$.

**Proof that the sample mean is an unbiased estimator of the population expectation:**

Claim: $E(\overline{X}) = E(X) = \mu$

Derivation: $E(\overline{X}) = E(\frac{1}{n} \sum_{i=1}^{n} X_i) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \frac{1}{n} \sum_{i=1}^{n} \mu = \mu$

**Proof that the sample variance is an unbiased estimator of the population variance:**

Claim: $E(S^2) = D(X) = \sigma^2$

Derivation: $E(S^2) = \frac{1}{n-1} E\{\sum_{i=1}^{n} [(X_i - \mu) - (\overline{X} - \mu)]^2\} = \frac{1}{n-1} E\{\sum_{i=1}^{n} [(X_i - \mu)^2 - 2(X_i - \mu)(\overline{X} - \mu) + (\overline{X} - \mu)^2]\} = \frac{1}{n-1} E[\sum_{i=1}^{n} (X_i - \mu)^2 - n(\overline{X} - \mu)^2] = \frac{1}{n-1} [\sum_{i=1}^{n} E(X_i - \mu)^2 - nE(\overline{X} - \mu)^2] = \frac{1}{n-1} [n\sigma^2 - nD(\overline{X})] = \frac{1}{n-1} [n\sigma^2 - \sigma^2] = \sigma^2$
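
The unbiasedness result can also be seen by Monte Carlo simulation: averaging many sample variances computed with divisor $n-1$ recovers $\sigma^2$, while divisor $n$ underestimates it (seed and sizes are arbitrary choices):

```python
import random
import statistics

# Monte Carlo check: the divisor-(n-1) sample variance is approximately
# unbiased for sigma^2; the divisor-n version is biased low by (n-1)/n.
random.seed(1)
mu, sigma, n, trials = 0.0, 2.0, 5, 20_000

unbiased, biased = [], []
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    unbiased.append(statistics.variance(xs))   # divisor n-1
    biased.append(statistics.pvariance(xs))    # divisor n

print(sum(unbiased) / trials, sum(biased) / trials, sigma ** 2)
```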

Standard error of the sample mean: $\mu_{\overline{x}} = \frac{\sigma(X)}{\sqrt{n}}$

**Interval estimation**: a statistical inference method that, to guarantee a given probability level, chooses a margin $\Delta$ and estimates the range in which the population index may lie from the sample index value and $\Delta$.

(1) Confidence interval: let $\theta$ be an unknown parameter of population X and $X_1, X_2, \dots, X_n$ a sample from X. For given $\alpha$ ($0 < \alpha < 1$), if two statistics $\theta_1$ and $\theta_2$ satisfy:

$P\{\theta_1 < \theta < \theta_2\} = 1 - \alpha$

then the random interval $(\theta_1, \theta_2)$ is called the confidence interval (or interval estimate) of the parameter $\theta$ at confidence level $1 - \alpha$, referred to as the $1-\alpha$ confidence interval of $\theta$ for short; $\theta_1$ and $\theta_2$ are called the lower confidence limit and the upper confidence limit respectively.

(2) Summary:

Margin of error (half-width of the estimated range): $\Delta_{\overline{x}}$, equal to $\frac{\sigma}{\sqrt{n}} z_{\frac{\alpha}{2}}$ in the first row of the second table below

Confidence interval: $[\overline{x} \pm \Delta_{\overline{x}}]$

Confidence level: $F(t) = P(|\overline{x} - \overline{X}| \le t\,\mu_{\overline{x}})$

t is called the probability degree; it corresponds one-to-one with the confidence level via a lookup table. $\mu_{\overline{x}}$ here is equivalent to $\frac{\sigma}{\sqrt{n}}$ in the first row of the second table below, i.e. the standard error derived from the population standard deviation.

(3) Procedure for interval estimation:

Assume the prerequisites of the first row of the table below hold (population standard deviation $\sigma$ known).

Compute $\overline{x}$ and $\frac{\sigma}{\sqrt{n}}$ from the sample data;

Look up the normal distribution table at the given confidence level to obtain the critical value;

Compute the estimated interval from the formula above.
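
The steps above can be sketched for a 95% confidence interval with known $\sigma$ ($z_{\alpha/2} \approx 1.96$; the data are simulated for illustration):

```python
import math
import random

# 95% confidence interval for the population mean with known sigma
# (first-row case): [xbar - delta, xbar + delta], delta = z * sigma/sqrt(n).
random.seed(2)
sigma, n = 3.0, 100
xs = [random.gauss(10.0, sigma) for _ in range(n)]  # simulated sample

xbar = sum(xs) / n                   # step 1: sample mean
delta = 1.96 * sigma / math.sqrt(n)  # step 2: critical value * std. error
print((xbar - delta, xbar + delta))  # step 3: interval [xbar +/- delta]
```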

Note: by the central limit theorem, the distribution of the mean of a large sample is close to normal, and the various statistics are constructed on the normal distribution to compute confidence intervals for the mean and variance at a given confidence level.

3. Common Sampling Distributions and the Sampling Distribution of Normal Populations

Chi-square distribution:

Let random variables $X_1, X_2, \dots, X_n$ be mutually independent, each following the standard normal distribution N(0, 1). Then the random variable $\chi^2 = X_1^2 + X_2^2 + \dots + X_n^2$ follows the chi-square distribution with n degrees of freedom, denoted $\chi^2 \sim \chi^2(n)$.

Properties:

$E(\chi^2) = n$, $D(\chi^2) = 2n$

If $\chi_1^2 \sim \chi^2(n_1)$, $\chi_2^2 \sim \chi^2(n_2)$, and $\chi_1^2$ and $\chi_2^2$ are mutually independent, then $\chi_1^2 + \chi_2^2 \sim \chi^2(n_1 + n_2)$.

t-distribution:

Let random variables X and Y be mutually independent with $X \sim N(0, 1)$ and $Y \sim \chi^2(n)$. Then the random variable $T = \frac{X}{\sqrt{Y/n}}$ follows the t-distribution with n degrees of freedom, denoted $T \sim t(n)$.

Properties:

The probability density of the t-distribution is an even function, very similar to the probability density of the normal distribution. When n is sufficiently large, the t-distribution is approximately the standard normal distribution.

F distribution:

Let random variables X and Y be mutually independent with $X \sim \chi^2(n_1)$ and $Y \sim \chi^2(n_2)$. Then the random variable $F = \frac{X/n_1}{Y/n_2}$ follows the F distribution with degrees of freedom $(n_1, n_2)$, denoted $F \sim F(n_1, n_2)$; $n_1$ and $n_2$ are called the first and second degrees of freedom respectively.

Property: its reciprocal also follows an F distribution: if $F \sim F(n_1, n_2)$, then $1/F \sim F(n_2, n_1)$.

Roles of the three distributions:

Clearly, new statistics can be constructed from the mean and variance so as to follow the distributions above, enabling interval estimation and subsequent significance tests.

The normal distribution is generally used to test the distribution of continuous data under large sample sizes.

The chi-square distribution is used for chi-square tests of categorical variables; the F distribution is mostly used for tests of homogeneity of variance; the t-distribution is used to test the population mean with small samples.

VI. Hypothesis Testing

The statistical principle underlying hypothesis testing is that a small-probability event should not occur in a single experiment, also known as the small-probability principle.

Two types of errors: a Type I error rejects a hypothesis that is actually true; a Type II error accepts a hypothesis that is actually false.

Significance level: the probability of a Type I error allowed in a hypothesis test, denoted $\alpha$ ($0 < \alpha < 1$); $\alpha$ is called the significance level, and it expresses the degree of control over the hypothesis $H_0$. Typically $\alpha$ takes values such as 0.1, 0.05, 0.01, 0.001.

Significance test: a statistical test that controls only the Type I error probability $\alpha$ is called a significance test.

General steps for significance test:

1) State the null hypothesis $H_0$ as required

2) Choose the significance level $\alpha$

3) Determine the test statistic and the form of the rejection region

4) Derive the rejection region W from the allowed Type I error probability $\alpha$

5) Compute the observed value t of the test statistic T from the sample. If $t \in W$, reject the null hypothesis $H_0$; otherwise, accept $H_0$.
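
The five steps above can be sketched as a two-sided z-test for the population mean with known $\sigma$; all numbers are made up for illustration:

```python
import math

# Two-sided z-test for H0: mu = mu0 with known sigma.
# Reject H0 when |z| exceeds the critical value z_{alpha/2} ~= 1.96.
mu0, sigma, n, xbar, alpha = 50.0, 8.0, 64, 52.6, 0.05

z = (xbar - mu0) / (sigma / math.sqrt(n))  # test statistic
critical = 1.96                            # z_{alpha/2} for alpha = 0.05
reject = abs(z) > critical                 # rejection region W: |z| > 1.96
print(z, reject)
```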

Difference between hypothesis testing and interval estimation:

Hypothesis testing is the reverse of the interval estimation process and can be regarded as its inverse operation.

Interval estimation infers the confidence interval of the population mean (or variance) from known sample statistics. In the first row of the table above: given the sample mean $\overline{x}$, the sample size n, and the population variance $\sigma^2$ (so the variance of the sample mean is $\frac{\sigma^2}{n}$), and a confidence level $1 - \alpha$, the constructed statistic Z follows the standard normal distribution, from which the confidence interval for the population mean in the first row of the table follows.

Conversely, hypothesis testing starts from the assumed population parameters and the given significance level $\alpha$, and the mean, variance, and sample size can in turn be substituted into the same formula to check the observed $\overline{x}$.

Because $F(t) = P(|\overline{x} - \mu| < t \cdot z_{\alpha/2})$

the two amount to nothing more than computing with $\overline{x}$ and $\mu$ in opposite directions. The table for hypothesis testing is the same as the table above.

**P value:**

Simply put, the p value is a probability: under $H_0$, the probability of a result at least as extreme as the observed one, and it is compared with the significance level $\alpha$. It is usually converted into a quantile: for example, for a two-sided test with p = 0.05, the quantile is $1 - 0.05/2 = 0.975$, whose standard normal critical value is 1.96. In practice we usually compute the z value directly and check whether it lies within the critical values corresponding to the given p value.

Z value: $\frac{\overline{x} - \mu}{\sigma/\sqrt{n}}$, used to locate the endpoint of the confidence interval and to compare against the p value / significance level. Other statistical distributions are used similarly.
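
For completeness, a z value can also be converted into a two-sided p value with the standard normal CDF $\Phi(z) = (1 + \mathrm{erf}(z/\sqrt{2}))/2$; the decision "p < α" agrees with comparing $|z|$ against the critical value:

```python
import math

# Standard normal CDF via the error function, and the two-sided p value
# p = 2 * (1 - Phi(|z|)); comparing p with alpha reproduces the z-test.
def phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_sided_p(z):
    return 2.0 * (1.0 - phi(abs(z)))

print(two_sided_p(1.96))  # close to 0.05, matching the 5% level
```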