Guidance:
Chapter 6 Sampling Inference
I. Parameters and statistics
A parameter is a number that describes the distribution of the population;
A statistic is a number computed from a sample.
For example, if the average age of a whole class is 22 years, that average is a parameter of the population. If 10 students are drawn from the class and their average age is 21.5 years, then 21.5 is a statistic computed from the sample.
Sampling inference is the process of using sample statistics to infer population parameters.
II. Error calculation for simple random sampling with replacement
1. Distribution of the sample mean
n units are drawn from a population to form a sample, and the sample mean can be calculated.
Repeating the draw many times yields many sample means, which have their own distribution. According to the central limit theorem, when the sample size is large (a common rule of thumb is more than 30), the distribution of the sample mean is approximately normal.
2. Basic Formula
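The sampling error (the standard error of the sample mean) for simple random sampling with replacement is calculated as follows:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
where $\sigma$ is the population standard deviation and $n$ is the sample size.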
3. Statistical Inference
The properties of the normal distribution can be used to calculate the probability that the sample mean falls within a given interval. The half-width of the interval is usually expressed as a multiple t of the sampling error; t is called the probability degree.
The standard normal distribution table gives the following correspondence between the probability degree t and the probability P.
Probability degree (t)   Probability (P)   Probability degree (t)   Probability (P)
1.28                     80%               1                        68.27%
1.64                     90%               2                        95.45%
1.96                     95%               3                        99.73%
2.58                     99%
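The table values can be checked without a printed table. The sketch below assumes that P is the two-sided probability P(|Z| <= t) for a standard normal variable Z, which is what the values in the table correspond to; the function name coverage is only illustrative.

```python
import math

def coverage(t: float) -> float:
    """Two-sided probability P(|Z| <= t) for a standard normal Z."""
    return math.erf(t / math.sqrt(2))

# Print the probability for each probability degree listed in the table.
for t in (1, 1.28, 1.64, 1.96, 2, 2.58, 3):
    print(f"t = {t:<4}  P = {coverage(t):.2%}")
```

The values for t = 1.28 and t = 1.64 come out slightly below 80% and 90%, because the table rounds them.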
Consider the following example:
A population has a standard deviation of 100 and a mean of 40. A sample of 36 units is drawn from it. Estimate the range within which the sample mean falls at a confidence level of 95%.
The example can also be reversed:
A population has a standard deviation of 100, and a sample of 36 units has a mean of 40. Estimate the range of the population mean at a confidence level of 95%.
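Both versions of the example lead to the same arithmetic. The sketch below assumes the with-replacement standard error sigma / sqrt(n) and the probability degree t = 1.96 for 95% confidence; the variable names are illustrative.

```python
import math

sigma = 100.0   # population standard deviation (from the example)
n = 36          # sample size (from the example)
center = 40.0   # population mean, or the observed sample mean in the reversed case
t = 1.96        # probability degree for a 95% confidence level

standard_error = sigma / math.sqrt(n)   # 100 / 6
margin = t * standard_error             # half-width of the interval
low, high = center - margin, center + margin

print(f"standard error = {standard_error:.2f}, margin = {margin:.2f}")
print(f"95% interval: ({low:.2f}, {high:.2f})")
```

With only 36 units and a standard deviation of 100, the interval is wide: roughly 40 plus or minus 32.67.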
4. Using the sample standard deviation to estimate the population standard deviation
In practice, the population standard deviation is usually unknown. The sample standard deviation, computed with n - 1 in the denominator, is used as its estimate (the corresponding sample variance is an unbiased estimator of the population variance).
Consider the following example:
A spot check of 20 pieces from a batch of materials gave the following measured weights (unit: kg):
110 111 111 112 113 114 114 114
116 116 117 118 119 119 119 119
121 124
Estimate the average weight of this batch of materials and give the confidence interval at a confidence level of 95%.
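A possible way to work through this example is sketched below. Note two assumptions: the text mentions 20 pieces but only 18 values appear in the list, so the sketch uses the 18 available values, and it uses the normal-table value t = 1.96 from this chapter even though the sample is small.

```python
import math
import statistics

# The 18 weights (kg) that appear in the text; the text mentions 20 pieces,
# so two values are not available here.
weights = [110, 111, 111, 112, 113, 114, 114, 114,
           116, 116, 117, 118, 119, 119, 119, 119,
           121, 124]

n = len(weights)
mean = statistics.mean(weights)
s = statistics.stdev(weights)          # sample standard deviation (n - 1 denominator)
standard_error = s / math.sqrt(n)      # s stands in for the unknown population value
t = 1.96                               # normal-table value; an assumption for a small sample
low, high = mean - t * standard_error, mean + t * standard_error

print(f"n = {n}, mean = {mean:.2f} kg, s = {s:.2f} kg")
print(f"95% interval: ({low:.2f}, {high:.2f}) kg")
```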
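5. Error of simple random sampling without replacement
For sampling without replacement from a population of N units, the sampling error is
$$\sigma_{\bar{x}} = \sqrt{\frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}}$$
where $\sigma$ is the population standard deviation and $n$ is the sample size. When N is large, this can be simplified to
$$\sigma_{\bar{x}} = \sqrt{\frac{\sigma^2}{n}\left(1-\frac{n}{N}\right)}$$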
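To see how much the finite population correction changes the result, the sketch below compares the two error calculations. It assumes the standard forms sigma / sqrt(n) (with replacement) and sqrt(sigma^2 / n * (N - n) / (N - 1)) (without replacement); the population size N = 1000 is a hypothetical value chosen for illustration.

```python
import math

def se_with_replacement(sigma: float, n: int) -> float:
    """Sampling error of the mean when units are drawn with replacement."""
    return sigma / math.sqrt(n)

def se_without_replacement(sigma: float, n: int, N: int) -> float:
    """Sampling error of the mean when n of N units are drawn without replacement."""
    return math.sqrt(sigma ** 2 / n * (N - n) / (N - 1))

sigma, n, N = 100.0, 36, 1000   # N is hypothetical
print(f"with replacement:    {se_with_replacement(sigma, n):.2f}")
print(f"without replacement: {se_without_replacement(sigma, n, N):.2f}")
```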
III. Factors affecting sampling error
According to the formula for calculating the sampling error, the main factors that affect the sampling error are as follows:
1. Variation of the population
The degree of variation in the population (σ) is one of the most important factors affecting the sampling error. The larger the population variation, the larger the sampling error for a given sample size.
2. Sample size
For a given population, the main way to reduce the sampling error is to increase the sample size. The formula shows that the sampling error is inversely proportional to the square root of the sample size. To cut the sampling error in half, the sample size must be increased to four times the original.
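The square-root relationship can be illustrated with a few numbers. The sketch assumes the with-replacement error sigma / sqrt(n); the value sigma = 100 and the sample sizes are hypothetical.

```python
import math

def sampling_error(sigma: float, n: int) -> float:
    """Sampling error of the mean for simple random sampling with replacement."""
    return sigma / math.sqrt(n)

sigma = 100.0                  # hypothetical population standard deviation
for n in (100, 400, 1600):     # each sample is four times the previous one
    print(f"n = {n:>4}  error = {sampling_error(sigma, n):.2f}")
```

Each fourfold increase in n halves the error.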
3. Sampling method
The formulas for sampling with replacement and sampling without replacement differ slightly. Sampling without replacement gives a slightly smaller sampling error.
Note that when the sampling ratio (n/N) is very small, the errors of sampling with and without replacement are nearly the same, so the formula for sampling with replacement can be used in place of the formula for sampling without replacement. That formula does not contain the total number of units N. In other words, when the sampling ratio is very small, the population size has almost no effect on the sampling error.
This explains why, for the same precision, nearly the same sample size is needed for a large city as for a small one.
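The large-city versus small-city point can be checked numerically. The sketch assumes the without-replacement error sqrt(sigma^2 / n * (N - n) / (N - 1)); the sample size, standard deviation, and the two population sizes are hypothetical.

```python
import math

def se_without_replacement(sigma: float, n: int, N: int) -> float:
    """Sampling error of the mean when n of N units are drawn without replacement."""
    return math.sqrt(sigma ** 2 / n * (N - n) / (N - 1))

sigma, n = 100.0, 400                   # hypothetical values
small_city, large_city = 100_000, 10_000_000

se_small = se_without_replacement(sigma, n, small_city)
se_large = se_without_replacement(sigma, n, large_city)
print(f"N = {small_city:>10,}: error = {se_small:.4f}")
print(f"N = {large_city:>10,}: error = {se_large:.4f}")
```

The population differs by a factor of 100, yet the two errors differ by well under 1%.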
4. Sample organization
Samples may be organized as simple random sampling, stratified sampling, cluster sampling, or multi-stage sampling. Each method has its own error formula, and the sampling error can differ greatly among them.
IV. Sample Size Calculation
1. Formula for calculating the sample size under simple random sampling
The sample size is derived from the error formula for simple random sampling without replacement.
To keep the margin of error of the sample estimate below a specified value, the sample size n must satisfy the following formula:
The resulting n is the sample size under simple random sampling.
Consider the following example:
The standard deviation of a population is known to be 100. Find the sample size needed to keep the margin of error of the sample estimate below 5 at a confidence level of 95%.
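One way to work this example is sketched below. Because the example gives no population size N, the sketch assumes the population is large enough to ignore the finite population correction, so the requirement t * sigma / sqrt(n) <= margin becomes n >= (t * sigma / margin)^2. The variable names are illustrative.

```python
import math

sigma = 100.0       # population standard deviation (from the example)
max_margin = 5.0    # required margin of error (from the example)
t = 1.96            # probability degree for a 95% confidence level

# Assumption: N is large, so the with-replacement error formula applies.
n_exact = (t * sigma / max_margin) ** 2
n_required = math.ceil(n_exact)     # round up so the margin is not exceeded

print(f"n >= {n_exact:.2f}, so take n = {n_required}")
```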
2. Methods for estimating the population standard deviation
(1) Based on past experience
For continuing surveys, data from earlier rounds can be used for the estimate.
(2) Estimation through a pilot survey
In a pilot survey, first collect data from a small sample to estimate the variation, then calculate the final sample size from these data, and finally survey the additional units needed to reach it.
(3) Maximum-value method for estimating proportions
When a proportion is being estimated, the variance p(1 - p) is at most 0.25 (reached at p = 0.5), so this maximum variance can be used to derive the largest sample size that could be required.
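The maximum-value method can be sketched as follows. The code assumes a large population (no finite population correction) and a hypothetical margin of error of 0.05 (5 percentage points) at 95% confidence; neither number comes from the text.

```python
import math

t = 1.96                 # probability degree for a 95% confidence level
max_margin = 0.05        # hypothetical margin of error for the proportion
max_variance = 0.25      # largest possible p * (1 - p), reached at p = 0.5

# Worst-case sample size: n >= t^2 * p(1 - p) / margin^2 with p(1 - p) = 0.25.
n_exact = t ** 2 * max_variance / max_margin ** 2
n_max = math.ceil(n_exact)

print(f"n >= {n_exact:.2f}, so at most n = {n_max} units are needed")
```

Any other value of p gives a smaller variance, so a sample of this size is sufficient whatever the true proportion is.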
(4) Sequential sampling
In sequential sampling, units are drawn in successive batches. After each batch the error is recalculated, and sampling stops once the required precision is reached.
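The stopping rule can be illustrated with a small simulation. Everything numeric here is hypothetical: the simulated population (normal, mean 40, standard deviation 100), the batch size, and the target margin. The function draw_unit stands in for surveying one real unit.

```python
import math
import random
import statistics

random.seed(0)            # fixed seed so the run is repeatable
target_margin = 5.0       # hypothetical precision target
t = 1.96                  # probability degree for a 95% confidence level
batch_size = 50           # hypothetical number of units added per round

def draw_unit() -> float:
    """Stand-in for surveying one unit from a hypothetical population."""
    return random.gauss(40, 100)

sample = []
while True:
    # Draw one more batch, then recompute the margin from all data so far.
    sample.extend(draw_unit() for _ in range(batch_size))
    margin = t * statistics.stdev(sample) / math.sqrt(len(sample))
    if margin <= target_margin:
        break

print(f"stopped at n = {len(sample)} with margin = {margin:.2f}")
```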
V. Calculation of sampling errors for other sampling methods
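1. Stratified sampling
The error formula for stratified sampling shows that the variation between strata does not affect the final sampling error; only the variation within strata does. Stratification should therefore make the differences between strata large and the differences within strata small.
When the survey cost per unit is the same in every stratum, the optimal allocation of the sample is
$$n_h = n \cdot \frac{N_h \sigma_h}{\sum_i N_i \sigma_i}$$
where $n$ is the total sample size, and $N_h$ and $\sigma_h$ are the number of units and the standard deviation in stratum $h$.
This allocation formula is called Neyman allocation.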
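A minimal sketch of Neyman allocation follows. It assumes the standard form in which each stratum receives a share of the sample proportional to N_h * sigma_h; the stratum sizes and standard deviations are hypothetical.

```python
def neyman_allocation(n_total, stratum_sizes, stratum_sds):
    """Split n_total across strata in proportion to N_h * sigma_h."""
    weights = [N_h * s_h for N_h, s_h in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [n_total * w / total for w in weights]

# Hypothetical strata: unit counts and within-stratum standard deviations.
sizes = [500, 300, 200]
sds = [10, 20, 40]
allocation = neyman_allocation(190, sizes, sds)
print(allocation)
```

The smallest stratum receives the largest share here because its internal variation is the greatest. In practice the results are rarely whole numbers and need to be rounded.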
2. Cluster sampling
3. Multi-stage sampling
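The error calculation for multi-stage sampling depends on the sampling method used at each stage. Taking the simplest case, two-stage sampling, as an example: if simple random sampling is used at each stage and all first-stage units are the same size, the variance of the sample mean is
$$V(\bar{x}) = \frac{1-f_1}{n} S_1^2 + \frac{1-f_2}{nm} S_2^2$$
where $n$ is the number of first-stage units sampled and $m$ is the number of second-stage units sampled within each of them;
$f_1$ is the sampling ratio of the first stage and $f_2$ is the sampling ratio of the second stage;
$S_1^2$ is the variance between the first-stage units of the population;
$S_2^2$ is the variance between second-stage units within the first-stage units.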
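4. Calculation of the design effect
The design effect is the ratio of the sampling variance under the actual sampling design to the sampling variance of a simple random sample (SRS) of the same size. When the design effect is less than 1, the sampling design is more efficient than SRS.
If the design effect of a complex sampling design can be estimated, the sample size of a simple random sample with the same precision is: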
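The relationship can be sketched as below, assuming the usual definition of the design effect (variance under the complex design divided by the variance under SRS of the same size), under which a complex sample of n units is as precise as an SRS of n / deff units. The variances and the sample size are hypothetical.

```python
def design_effect(var_complex: float, var_srs: float) -> float:
    """Ratio of the variance under the actual design to the variance under SRS."""
    return var_complex / var_srs

def equivalent_srs_size(n_complex: int, deff: float) -> float:
    """Size of a simple random sample with the same precision as the complex sample."""
    return n_complex / deff

# Hypothetical variance estimates for the same sample size.
deff = design_effect(var_complex=6.0, var_srs=4.0)
print(f"deff = {deff:.2f}")
print(f"a complex sample of 1200 matches an SRS of {equivalent_srs_size(1200, deff):.0f}")
```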