In 2008, while taking part in the modeling competition organized by the National Bureau of Statistics, we noticed a book on the bookstore shelves: "The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century". Unlike the dull, heavy titles of earlier statistics books, this one felt fresh. After a quick flip through its pages, we bought it. "The Lady Tasting Tea" is not a book for women, nor a book about tea; as its subtitle makes clear, it is a popular science book on the history of statistics in the 20th century. Why did the author choose such a title? The idea is ingenious: the "lady tasting tea" is a well-known experiment in the history of statistics, conducted by the famous Fisher. The book opens with this early experiment, recounts the birth and development of statistics over more than a century, and, through a series of engaging stories, briefly introduces readers to many fascinating figures from various fields of statistics. What makes the book a classic is not academic analysis but the uniqueness and breadth of its vision.
The translator, Mr. Chu Dong, has in mind two groups of readers: first, students, postgraduates, teachers and researchers in statistics; second, readers from all walks of life who are interested in the cultural background of scientific development. Why is the intended readership so wide? It seems that the author and the translator write from different circumstances. As the author puts it: "The statistical perspective is so widely used that its basic assumptions have become part of the popular culture of the Western world, standing there like a clay Buddha, complacent." This is why the book is classified as popular science.
I. The "lady tasting tea" experiment
On a summer afternoon in the late 1920s in Cambridge, England, a group of gentlemen and ladies were enjoying a leisurely cup of tea when one lady claimed that tea poured into milk tastes different from milk poured into tea. How could that be? The scientific elite present scoffed at the idea. Only one short gentleman with thick-rimmed glasses, Fisher, proposed testing the lady's hypothesis with an experiment.
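The logic of such an experiment can be sketched with a short calculation. The essay does not say how many cups were served, so the design below (eight cups, four of each kind, with the taster asked to pick out the four milk-first cups) is an assumption borrowed from the way the experiment is usually described. The function name is purely illustrative.

```python
from math import comb


def chance_all_correct(cups_per_kind: int) -> float:
    """Probability of labelling every cup correctly by pure guessing.

    Assumed design: `cups_per_kind` milk-first cups and the same number of
    tea-first cups. The taster chooses which cups are milk-first, and only
    one of the equally likely selections is entirely right.
    """
    return 1 / comb(2 * cups_per_kind, cups_per_kind)


# Hypothetical setup: 8 cups, 4 of each kind.
p = chance_all_correct(4)
print(f"Chance of a perfect score by guessing: {p:.4f}")
```

Under this assumed design a perfect score by luck alone is unlikely (1 in 70), which is what makes a perfect score informative about the lady's claim.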
"Tea for Women" takes the tea-drinking British lady as the starting point, leading to the modern mathematical statistics of the Pathfinder-Fisher, and Fisher to solve similar problems and invented the experimental design method. On the basis of reviewing the development process and application of several important theories of statistics, the readers are guided to appreciate the most widely used science of statistics and what kind of changes have been brought to the present world.
II. About "The Lady Tasting Tea"
Salsburg, the author of the book, says: "The main line I have chosen through the complex theory of 20th-century statistics differs from that of others. I hope that readers of this book will be inspired to understand further the meaning of the statistical revolution." As a statistics book, it is full of concepts and terms: means, standard deviations, estimates, probability distributions, random variables, confidence intervals, the law of large numbers, the central limit theorem, normally distributed random variables and so on. Unlike other books, however, behind these concepts and terms stand vivid portraits of the masters of statistics and the stories of their innovations and their eventful lives. The stories are interspersed with the masters' wise words, their friendships, humorous details and personal misfortunes. Reading about these masters, one finds their experience rich, their learning deep and their research so wide-ranging as to be almost all-encompassing. No wonder the subtitle of the book speaks of how statistics revolutionized science in the 20th century.
The book takes stock of the representative figures who took part in this scientific transformation in the 20th century, and of their achievements. Some studied mathematics, some skiing, some genetics ... Among them are entomologists, cryptanalysts, physicists, pharmacists, computer engineers ... Their backgrounds are varied, their research topics broad and their ideas diverse. They include Kolmogorov, the "Mozart of mathematics", a young genius who laid the foundations of probability theory, and Tukey, the "Picasso of statistics", known for his versatile style ... The book lets us see how statistics changed our understanding of nature, humanity and society over the hundred years of the 20th century, and it expresses deep respect for those who dared to discover and innovate.
The book also describes Fisher's work on the farm, where he analysed the relationship between crops and climate, rainfall, pesticides and fertilizers, helping to improve agricultural production while publishing a series of world-famous works, including "Statistical Methods for Research Workers". Gosset, at the Guinness brewing company, applied the Poisson distribution to a real problem, measuring the amount of yeast used in malt fermentation, and in doing so arrived at new ideas about statistical distributions. Tippett, at the Cotton Industry Research Association, studied the strength of the weakest fibers, characterized the extreme values of that strength, and helped raise cotton output. The book also mentions the use of statistics in medicine to determine how the human body responds to different doses of the same drug. During the Second World War, statistics was used to assess the effect of different gases on the enemy, to break codes, to work out with logic and mathematical models the best use of long-range bombers against submarines, and to solve the army's food supply problems. After the war, operational research, which grew out of mathematical statistics, was applied in business to problems such as finding the optimal relationship between warehouses and sales departments, balancing limited resources, improving production and increasing output.
Reading "The Lady Tasting Tea" is like walking through a Milky Way of knowledge studded with stars. Throughout the book, the legends and stories of the masters of mathematical statistics are introduced one after another. Comparing these "big names", we find that, besides their lifelong excellence and their great contributions to the world, they have one thing in common: they pursued their own interests all their lives and achieved extraordinary things in doing so. Some showed exceptional talent for mathematics from childhood. The Russian mathematician Andrei Kolmogorov, for example, is said to have made his first mathematical discovery at the age of 5 and, by the age of 10, to have worked out that the square root of 2 is irrational and to have found a solution to the Pell equation. Others became interested in the field later in life. George Box, Fisher's son-in-law, first came into contact with statistics out of necessity while working in a chemical defence laboratory during the Second World War, studied it, came to like it, and made it his lifelong career. Bliss was originally interested in biology; his pesticide experiments were troubled by many uncontrollable variables, and after studying Fisher's "Statistical Methods for Research Workers" he developed a strong interest in mathematical statistics and went on to invent the method of "probit" analysis.
III. Statistical tests of the model
The chapter on hypothesis testing in "The Lady Tasting Tea" notes: "Pearson often used his chi-square goodness-of-fit test to 'prove' that certain data followed certain distributions. After Fisher introduced more rigorous methods into mathematical statistics, Pearson's approach was no longer acceptable. But the problem remains. In order to know which parameters should be estimated, and to determine how these parameters relate to the scientific problem under study, we must assume that the data follow a particular distribution. Statisticians often use significance tests to establish the distribution of the data."
1. Overview of statistical tests
The statistical tests we need to carry out fall into two parts. The first tests how well the regression equation fits the sample data, by examining the regression results. The second tests the significance of the regression: hypothesis tests are used to infer whether the linear relationship between the explained variable and the explanatory variables holds significantly in the population. This second part includes a test of the overall linear relationship of the regression equation and tests of the significance of the individual regression coefficients.
In practice, because of limits on manpower, resources and time, we usually draw a representative sample, study the sample data, and make statistical inferences about the characteristics of the population. Two questions then arise. First, do the characteristics of the sample reflect those of the population? Second, do the characteristic parameters of two different samples really differ? Only by answering these two questions can we infer the characteristics of the population correctly and then identify how demand differs between groups with different characteristics. Both questions are answered by statistical significance tests.
As "The Lady Tasting Tea" explains, a significance test is a method of statistical inference used to judge whether a difference between two samples, or between a sample and the population, is caused by sampling error or reflects a real difference. Some estimates of population indicators obtained from sample indicators are fully reliable, while others are reliable only to varying degrees, so they need to be tested and confirmed. We examine whether the sample indicator differs from the hypothesized population indicator, decide whether to accept the null hypothesis, and thereby judge whether the difference between the two is significant. In this sense, hypothesis testing is also called significance testing. Hypothesis testing is the other major problem of statistical inference besides parameter estimation. Its basic idea can be explained by the small-probability principle: an event with a small probability is almost impossible in a single trial. In other words, if a hypothesis about the population is true, then an event A that contradicts or fails to support it should almost never occur in a single trial. If event A does occur in a single trial, we have reason to doubt the hypothesis and reject it.
2. Statistical tests for the multiple linear regression model
Reading "The Lady Tasting Tea" leads naturally to the question of how to interpret statistical test results: does the equation hold, and does the model pass the tests? When we feed the dependent variable and several independent variables into statistical analysis software, the output includes R values, t values, the F statistic and so on. How do we decide whether the tests are passed and whether the regression equation can be used to analyse data or make predictions? Drawing on what we learned from the book, on our practical work in statistical analysis and on our interest in mathematical statistics, we summarize the main statistical tests below in plainer terms:
1. Goodness-of-fit statistic: R²
Goodness of fit is measured by the coefficient of determination R², where 0 ≤ R² ≤ 1. The closer R² is to 1, the better the model fits the sample data.
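As an illustration, R² can be computed directly from observed and fitted values using the standard definition, one minus the ratio of the residual sum of squares to the total sum of squares. The data and the function name below are made up for the sketch and do not come from the essay.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1 - ss_res / ss_tot


# Hypothetical observations and the values fitted by some regression line.
observed = [2.0, 4.1, 6.2, 7.9, 10.1]
fitted = [2.0, 4.0, 6.0, 8.0, 10.0]

print(f"R^2 = {r_squared(observed, fitted):.4f}")
```

Because the fitted values track the observations closely here, the result is close to 1. A perfect fit gives exactly 1.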
2. Probability: the p-value
Fisher used the significance test to produce a number that he called the p-value. The p-value is often loosely described as the probability that a hypothesis is false. More precisely, it is a calculated probability associated with the observed data under the assumption that the null hypothesis is true. In many cases, the purpose of a hypothesis test is to overturn the null hypothesis. For example, suppose we want to test a new drug intended to prevent the recurrence of breast cancer in women who have undergone mastectomy. We compare the effect of this drug with a placebo. The null hypothesis is that the new drug is no better than the placebo. Now suppose that after 5 years half of the women given the placebo have relapsed, while none of the women given the new drug have. If the p-value computed from these observations is small, close to 0, we can reject the null hypothesis that the new drug is no better than the placebo and take this as evidence that the new drug is effective.
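To make the drug example concrete, here is a minimal sketch of a one-sided exact test on a 2x2 table, in the spirit of Fisher's exact test. The essay gives no group sizes, so 10 women per group is a hypothetical figure chosen only for illustration, and the function name is likewise an assumption.

```python
from math import comb


def one_sided_p_value(relapse_drug, n_drug, relapse_placebo, n_placebo):
    """P(drug group has <= relapse_drug relapses | null hypothesis).

    Under the null hypothesis the drug is no better than the placebo, so
    the relapses are spread over both groups as if at random
    (a hypergeometric distribution).
    """
    total_relapse = relapse_drug + relapse_placebo
    n_total = n_drug + n_placebo
    denominator = comb(n_total, total_relapse)
    p = 0.0
    for k in range(relapse_drug + 1):
        # k relapses fall in the drug group, the rest in the placebo group.
        p += comb(n_drug, k) * comb(n_placebo, total_relapse - k) / denominator
    return p


# Hypothetical sizes: 10 women per group; half the placebo group relapsed,
# nobody on the new drug did.
p = one_sided_p_value(relapse_drug=0, n_drug=10, relapse_placebo=5, n_placebo=10)
print(f"p-value = {p:.4f}")
```

With these assumed sizes the p-value is about 0.016. Larger groups showing the same proportions would give a much smaller value, so the strength of the evidence depends on sample size as well as on the observed rates.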
3. Significance of the equation: the F test
The overall significance of the equation is judged from the F statistic. F follows an F distribution with (k, n-k-1) degrees of freedom, where k is the number of explanatory variables and n is the number of observations. Given a significance level α, look up the critical value Fα(k, n-k-1) in the table and compare it with the F value reported by the statistical analysis software. If F > Fα(k, n-k-1), reject the null hypothesis: the linear relationship of the equation is significant overall, and the model passes the F test. If F ≤ Fα(k, n-k-1), the null hypothesis is not rejected: the overall linearity of the equation is not significant, and the model does not pass the F test.
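The decision rule can be sketched as follows. The F statistic is computed from R² using the standard identity F = (R²/k) / ((1-R²)/(n-k-1)), which the essay does not state explicitly. The numbers are hypothetical, and the critical value is typed in by hand as if read from an F table (approximately 2.98 for α = 0.05 with (3, 26) degrees of freedom).

```python
def f_statistic(r2, n, k):
    """Overall F statistic of a linear regression with an intercept.

    r2: coefficient of determination
    n:  number of observations
    k:  number of explanatory variables
    """
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def passes_f_test(f_value, f_critical):
    """Reject the null hypothesis (no linear relationship) when F > F_alpha."""
    return f_value > f_critical


# Hypothetical regression: R^2 = 0.8 with n = 30 observations, k = 3 variables.
f_value = f_statistic(r2=0.8, n=30, k=3)
f_critical = 2.98  # assumed table value for alpha = 0.05, df = (3, 26)

print(f"F = {f_value:.2f}, passes F test: {passes_f_test(f_value, f_critical)}")
```

Statistical software normally reports the p-value of the F statistic directly, in which case comparing that p-value with α is equivalent to the table lookup.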
4. Significance of the variables: the t test
In a multiple linear regression model, a significant overall linear relationship does not mean that every explanatory variable has a significant effect on the explained variable. Each explanatory variable must therefore be tested for significance to decide whether it should be kept in the model. If the effect of a variable on the explained variable is not significant, it should be removed to give a simpler model. The t test serves this purpose.
Given a significance level α, look up the critical value tα/2(n-k-1) and compare it with the t value reported by the statistical analysis software. If |t| > tα/2(n-k-1), reject the null hypothesis and keep the variable. If |t| ≤ tα/2(n-k-1), the null hypothesis is not rejected and the variable is a candidate for removal. Of course, in practice there is no absolute significance level; the key is still whether the variable affects the explained variable in economic terms.
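A small sketch of this rule, assuming the usual t statistic (estimated coefficient divided by its standard error). The variable names, coefficients and standard errors are invented for illustration, and the critical value is entered by hand as if read from a t table (approximately 2.056 for α = 0.05, two-sided, with 26 degrees of freedom).

```python
def t_statistic(coefficient, standard_error):
    """t value of a regression coefficient under H0: coefficient = 0."""
    return coefficient / standard_error


def significant_variables(estimates, t_critical):
    """Map each variable name to True if |t| exceeds the critical value.

    estimates: dict of name -> (coefficient, standard_error)
    """
    return {
        name: abs(t_statistic(coef, se)) > t_critical
        for name, (coef, se) in estimates.items()
    }


# Hypothetical regression output: (coefficient, standard error) per variable.
estimates = {"income": (0.52, 0.10), "price": (-0.08, 0.12)}
t_critical = 2.056  # assumed table value for alpha = 0.05, df = n - k - 1 = 26

result = significant_variables(estimates, t_critical)
print(result)
```

In this made-up example "income" would be kept and "price" would be a candidate for removal, subject to the caveat above that economic reasoning, not the threshold alone, should decide.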
In short, the key in applied statistical analysis is to use statistical tests flexibly. A statistical model has to pass many tests, and preprocessing the data (for example by standardization or normalization) or analysing it from a different angle can lead to different test results and allow the model to pass different tests.
Seen through "The Lady Tasting Tea", the history of statistics is a history of scientists emerging one after another, exploring the subject tirelessly through rises and falls, successes and failures. The author strings their stories together like pearls on a necklace, so that the laurels of statistical science sparkle with life and inspiration. From the early "lady tasting tea" experiment to today's statistical tests, every step is permeated with the efforts and wisdom of statisticians.
"As we entered the 21st century, the statistical revolution had triumphed in science ... And in some hidden corner of the future, another scientific revolution is being nurtured, and the men and women who will launch it may be living among us." We close by quoting these concluding remarks of "The Lady Tasting Tea", in the hope that they give each of us who works in statistics some excitement, a sense of responsibility and new hope.
References:
1. Salsburg, "The Lady Tasting Tea", translated by Chu Dong, China Statistics Press, 2004.
2. Han Tepping, "A book that makes one's eyes light up": reading notes on "The Lady Tasting Tea", China Statistics, 2005, No. 5.
"The Lady Tasting Tea" and statistical testing