Introduction
"A more official introduction" Mathematical statistics is a subject based on probability theory and highly applied. It studies how to collect, collate and analyze the data with randomness in an effective way so as to make the correct inference and prediction for the problems examined, and provide the basis and suggestion for the correct decision-making and action. Mathematical statistics is different from the general data statistics, it is more focused on the use of random phenomena of the regularity of the data collection, collation and analysis.
"Simply speaking" is the inference of the whole through sample analysis.
"Meaning or importance" in this big data age, data is very important. How to excavate the internal laws of the data or the implicit information becomes particularly important. We were not able to get the whole data at that time, so we could only deduce the whole law by sampling the sample.
Contents

Chapter 1. Samples and statistics
    I. Introduction
    II. Population and sample
    III. Statistics
    IV. Common distributions
Chapter 2. Parameter estimation
    I. Introduction
    II. Point estimation: the method of moments
    III. Point estimation: maximum likelihood estimation
    IV. Criteria for a good estimator
    V. Interval estimation: the normal distribution
        1. Introduction
        2. Interval estimation for the parameters of a single normal population
        3. Interval estimation for two normal populations
    VI. Interval estimation: non-normal distributions
        1. Large-sample normal approximation
        2. Binomial distribution
        3. Poisson distribution
Chapter 3. Hypothesis testing
    I. Introduction
    II. Hypothesis tests for the means of normal populations
        1. Test of the mean μ of a single normal population N(μ, σ²)
            (1) Two-sided test H0: μ = μ₀; H1: μ ≠ μ₀
            (2) One-sided test H0: μ = μ₀; H1: μ > μ₀
        2. Comparison of the means of two normal populations N(μ₁, σ₁²) and N(μ₂, σ₂²)
            (1) Two-sided test H0: μ₁ = μ₂; H1: μ₁ ≠ μ₂
            (2) One-sided test H0: μ₁ ≥ μ₂; H1: μ₁ < μ₂
            (3) One-sided test H0: μ₁ ≤ μ₂; H1: μ₁ > μ₂
    III. Tests of the variances of normal populations
        1. χ² test of the variance of a single normal population
            (1) H0: σ² = σ₀²; H1: σ² ≠ σ₀²
            (2) H0: σ² = σ₀²; H1: σ² > σ₀²
            (3) H0: σ² ≤ σ₀²; H1: σ² > σ₀² (same test as (2))
        2. F-test of the ratio of two normal population variances
            (1) H0: σ₁² = σ₂²; H1: σ₁² ≠ σ₂²
            (2) H0: σ₁² = σ₂²; H1: σ₁² > σ₂²
            (3) H0: σ₁² ≤ σ₂²; H1: σ₁² > σ₂²
Chapter 4. Regression analysis
    I. Introduction
Chapter 4. Regression analysis
I. Introduction
Two types of relationships between variables: in the 19th century, the British biologist and statistician Galton, studying heredity, found the relationship
where x denotes the height of the father and Y the height of his adult son (in inches; 1 inch = 2.54 cm). This shows that the average height of offspring "regresses" toward the population mean, which keeps human heights stable from generation to generation. The idea of regression later spread to other branches of mathematical statistics.
Regression analysis deals with the relationships between variables. There are two common types: deterministic (functional) relationships and correlative relationships. A correlative relationship between variables cannot be expressed as an exact function; it can only be expressed as a quantitative relation in the sense of the mean, and finding this quantitative relation is the main task of regression analysis. Regression analysis is the subject that studies correlative relationships between variables: using data obtained from a large number of observations or experiments, it looks for the correlative relationship hidden behind the data and gives its expression, that is, an estimate of the regression function.
II. Simple linear regression
1. The simple linear regression model
Consider the relationship between Y and x, where x is called the independent variable (predictor) and Y the dependent variable (response). For a given value of x, Y has a conditional distribution P(y|x), and what we care about is the conditional mean E(Y|x):
This conditional expectation is the theoretical regression function of Y on x, and it is the expression of the correlative relationship we are looking for. In general, a correlative relationship can be written as y = f(x) + ε, where ε is a random error, usually assumed to satisfy ε ~ N(0, σ²).
The first problem of regression analysis is to choose the form of the regression function. When there is only one independent variable, a scatter plot can usually be used to make this choice.
"Example 1" the strength of the alloy Y (X107PA) and the content of carbon in the alloy x (%) About. To study the relationship between two variables. The first is to collect data, we record the collected data as (Xi,yi), i=1,2, ..., N. In this example, we collect 12 sets of data, listed in table 1
To discover the form of the regression function between the two quantities, we can draw a picture: each pair (xᵢ, yᵢ) is regarded as a point in the Cartesian coordinate system, and the n points plotted this way form a scatter plot; see Figure 1.
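As a numerical companion to the scatter plot, the sample correlation coefficient r = Sxy / √(Sxx·Syy) measures how close the points lie to a straight line (|r| near 1 suggests a linear relationship). A minimal Python sketch — the (x, y) pairs below are invented to resemble the alloy example, not the actual values of Table 1:

```python
import math

# Hypothetical (x, y) pairs shaped like the alloy example (carbon content in
# percent, strength in units of 10^7 Pa); NOT the actual values of Table 1.
xs = [0.10, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.20, 0.21, 0.23]
ys = [42.0, 43.5, 45.0, 45.5, 45.0, 47.5, 49.0, 53.0, 50.0, 55.0, 55.0, 60.0]

def correlation(x, y):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = correlation(xs, ys)
print(f"r = {r:.3f}")  # |r| close to 1 suggests a linear relationship
```

For these made-up values r comes out well above 0.9, consistent with what a scatter plot of near-collinear points would show.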
From the scatter plot we find that the 12 points lie roughly near a straight line, which indicates a linear correlation between the two variables. This relationship can be expressed as
Y = β₀ + β₁x + ε    (2)
This is the data structure of the simple linear regression of Y on x. It is usually assumed that
E(ε) = 0,  Var(ε) = σ²    (3)
When making interval estimates or hypothesis tests about the unknown parameters, we also need to assume that the error is normally distributed, i.e.
Y ~ N(β₀ + β₁x, σ²)    (4)
Obviously, assumption (4) is stronger than assumption (3).
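To make assumptions (2)-(4) concrete, here is a minimal simulation sketch in Python. The parameter values β₀ = 1, β₁ = 2, σ = 0.5 and the x grid are made up purely for illustration:

```python
import random

random.seed(42)  # fixed seed so the draws are reproducible

# Hypothetical "true" parameters, chosen only for illustration.
beta0, beta1, sigma = 1.0, 2.0, 0.5

# Model (2)-(4): each observation is Y_i = beta0 + beta1 * x_i + eps_i,
# where the errors eps_i ~ N(0, sigma^2) are drawn independently, so the
# responses Y_1, ..., Y_n are mutually independent.
xs = [0.5 + 0.1 * i for i in range(12)]
ys = [beta0 + beta1 * x + random.gauss(0.0, sigma) for x in xs]

print(len(ys), "simulated observations")
```

In practice β₀ and β₁ are of course unknown; simulating from known values like this is only a way to see what data obeying the model look like.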
Since β₀ and β₁ are unknown, they must be estimated from the collected data (xᵢ, yᵢ), i = 1, 2, ..., n. When collecting the data, we generally require the observations to be made independently, that is, we assume Y₁, Y₂, ..., Yₙ are mutually independent. Combining the above assumptions, we obtain the simplest and most commonly used mathematical model of linear regression:
From the data (xᵢ, yᵢ), i = 1, 2, ..., n, we can obtain estimates β̂₀, β̂₁ of β₀, β₁. The function ŷ = β̂₀ + β̂₁x is called the empirical regression function of Y on x, or simply the regression equation; its graph is called the regression line. For a given x = x₀, ŷ₀ = β̂₀ + β̂₁x₀ is called the regression value at x₀ (in different contexts it is also called the fitted value or the predicted value).
2. Least squares estimation of regression coefficients:
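The least squares estimates minimize the sum of squared residuals Σ(yᵢ − β₀ − β₁xᵢ)² and have the closed form β̂₁ = Sxy / Sxx, β̂₀ = ȳ − β̂₁x̄. A minimal Python sketch; the four data points at the end are invented purely as a sanity check, not taken from Table 1:

```python
def least_squares(x, y):
    """Closed-form least squares estimates for y = b0 + b1 * x + eps."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)                    # Sxx
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # Sxy
    b1 = sxy / sxx      # slope estimate:     beta1_hat = Sxy / Sxx
    b0 = my - b1 * mx   # intercept estimate: beta0_hat = ybar - b1 * xbar
    return b0, b1

# Sanity check on noiseless data y = 3 + 2x: the line is recovered exactly.
b0, b1 = least_squares([0.0, 1.0, 2.0, 3.0], [3.0, 5.0, 7.0, 9.0])
print(b0, b1)  # → 3.0 2.0
```

On noisy data the same formulas give the line that best fits the scatter in the least squares sense, which is the estimate discussed in this section.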