Difference between the t-Test and the F-Test

Source: Internet
Author: User

1. The origin of the t-test and F-test

In general, to determine the probability of being wrong when we draw a conclusion from sample statistics, we use methods developed by statisticians to carry out a statistical test. By comparing the computed test statistic with the probability distribution of a random variable established by statisticians, we can see how likely it is to obtain the current result by chance. If the comparison shows that such a result would arise only very rarely, we can say with confidence that it is not a coincidence and that it is statistically significant (in statistical terms, we can reject the null hypothesis, H0). Conversely, if the comparison shows that such a result has a high probability of occurring, we cannot be confident that it is not a coincidence: it may or may not be, and we cannot tell. The F value and the t value are such test statistics, and their corresponding probability distributions are the F distribution and the t distribution. Statistical significance (Sig.) is the probability of obtaining a result at least as extreme as the one in the current sample when the null hypothesis is true.

2. Statistical significance (p-value or Sig. value)

The statistical significance of a result is an estimate of how far the result is true (in the sense of being representative of the population). Technically, the p-value is a decreasing index of the reliability of a result: the larger the p-value, the less we can believe that the relation between variables observed in the sample is a reliable indicator of the relation between those variables in the population. The p-value is the probability of error involved in accepting the observed result as valid, that is, as representative of the population. For example, p = 0.05 indicates a 5% probability that the relation between the variables found in the sample is due to chance.
That is, assuming that there is no relation between those variables in the population, if we repeated similar experiments we would expect that in about 1 of every 20 experiments the relation between the variables would be equal to or stronger than in ours. (This is not to say that if a relation does exist in the population we will obtain the same result 5% or 95% of the time; when the variables in the population are related, the probability of repeating the study and finding that relation depends on the statistical power of the design.) In many areas of research, a p-value of 0.05 is customarily treated as the borderline acceptable error level.

3. The t-test and the F-test

What exactly is being tested depends on which statistical procedure you are running. For example, to test whether the difference between the means of two independent samples can be generalized to the population, use a t-test. Suppose the means of a variable (such as height) differ between two samples (such as the boys and the girls in one class). Can this difference be generalized to the population, so that the populations differ too? Or is there no difference between boys and girls in the population, and you just happened to draw two samples with different values? To answer this, we carry out a t-test and compute a t value, then compare it with the t distribution that statisticians have derived under the assumption of "no difference in the population" to see how likely the current result is (that is, the Sig. value). If the Sig. value is very small, say < 0.05 (a probability of less than 5%), then "if" there "really" were no difference in the population, the current situation would arise only on rare occasions (less than 5% of the time).
Although there is still up to a 5% chance of being wrong, we can say with some confidence that the situation in the current sample (boys and girls differ) is not a coincidence and is statistically significant. The null hypothesis that "there is no difference between the sexes in the population" should be rejected; in short, there should be a difference in the population.

What is tested differs from one statistical method to another. Even with the t-test, the test may be about whether two populations differ, as above, or about whether a single population value equals 0 or some specified value. As for the F-test used in analysis of variance (ANOVA), the principle is essentially the same as described above, but it works by examining the variances of the variables. It is mainly used for: significance tests of differences between means, separating the relevant factors and estimating their contribution to the total variation, analyzing interactions between factors, tests of homogeneity of variance (equality of variances), and so on.

4. Relationship between the t-test and the F-test

The t-test is a test of the significance of the difference between two means. The t-test needs to know whether the variances of the two populations are equal, because the t value is computed differently depending on whether they are. In other words, the t-test depends on the result of the test for equality of variances. This is why SPSS, alongside the t-test for Equality of Means, also runs Levene's Test for Equality of Variances.

1. In the Levene's Test for Equality of Variances columns, F = 2.36 and Sig. = .128, which means the test shows "no significant difference" between the variances, that is, the two variances are homogeneous (equal variances). Therefore, in the t-test results table, read the first row, which gives the t-test results for the equal-variances case.

2. In the t-test for Equality of Means, the first row (equal variances assumed) gives t = 8.892, df = 84, Sig. (2-tailed) = .000, mean difference = 22.99. Since Sig. = .000 (that is, p < .001), the difference between the two sample means is significant.

3. Should you look at the Sig. in the Levene's Test for Equality of Variances columns, or the Sig. (2-tailed) in the t-test for Equality of Means? The answer is: both. First look at Levene's Test for Equality of Variances. If it shows "no significant difference", the two variances are homogeneous (equal variances), so read the first row of the t-test table, which is the t-test result for the equal-variances case. Conversely, if it shows "a significant difference", the two variances are not homogeneous (unequal variances), so read the second row of the t-test table, which is the t-test result for the unequal-variances case.

4. You ran a t-test, so why is there an F value? Because the procedure has to assess whether the variances of the two populations are equal, it runs Levene's Test for Equality of Variances, and that test of the variances produces an F value.

Another explanation of the relationship between the t-test and the F-test:

The t-test comes in three forms: the one-sample t-test, the paired t-test, and the two-sample t-test.

One-sample t-test: the sample mean stands in for the unknown mean of the population the sample came from and is compared with a known population mean, to see whether this sample differs from that population.

Paired t-test: used when observations come from a paired design, in the following cases: 1) two homogeneous subjects each receive one of two different treatments; 2) the same subject receives two different treatments; 3) the same subject is measured before and after a treatment.

The F-test is also called the test of homogeneity of variance. It is used with the two-sample t-test: to compare two randomly drawn samples, we should first determine whether the two population variances are the same, that is, whether the variances are homogeneous.
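
The two-step reading of the SPSS table described above (test the variances first, then read the matching t-test row) can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not a reproduction of the SPSS output quoted in the text: the height data are made up, the function names are hypothetical, and Levene's statistic is assumed to be the mean-centred form. The sketch only computes the statistics; turning them into Sig. values needs the F and t distributions from a statistics package.

```python
from statistics import mean, variance

def levene_w(a, b):
    """Levene's W statistic (mean-centred) for two groups.

    Under equal variances W is referred to an F distribution with
    (k - 1, N - k) degrees of freedom, which is why an F value shows
    up next to a t-test.
    """
    groups = [a, b]
    # Absolute deviation of each observation from its own group mean.
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_bar = mean(v for zi in z for v in zi)
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    between = sum(len(zi) * (m - z_bar) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within

def pooled_t(a, b):
    """t statistic and df for the 'equal variances' row."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5
    return t, na + nb - 2

def welch_t(a, b):
    """t statistic and approximate df for the 'unequal variances' row."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Made-up heights (cm) for two small groups; illustrative only.
boys = [172.0, 168.5, 175.2, 170.1, 169.4, 174.8]
girls = [160.3, 163.9, 158.7, 162.2, 165.0, 161.1]

print("Levene W:", levene_w(boys, girls))
print("Equal variances row (t, df):", pooled_t(boys, girls))
print("Unequal variances row (t, df):", welch_t(boys, girls))
```

When the two groups have the same size and the same sample variance, the two rows coincide, which is a quick way to sanity-check the functions.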
If the two population variances are equal, the t-test can be used directly; if they are not, an approximate t-test (t' test), a variable transformation, or a rank-sum test can be used instead. To determine whether the two population variances are equal, the F-test can be used.

For a one-sample design, a standard value or a population mean must be given along with a set of quantitative observations, and the precondition for applying the t-test is that this set of data follows a normal distribution. For a paired design, the differences within each pair must follow a normal distribution. For a two-group design, the individuals must be independent of each other, both groups must come from normally distributed populations, and the variances must be homogeneous. These preconditions are required because the t statistic has to be computed under them, and the t-test is a test method whose theoretical basis is the t distribution.

In short, the t-test can be applied only under certain conditions, one of which is homogeneity of variance, and checking that condition requires an F-test.

1. Q: What are degrees of freedom, and how are they determined?
A: (Definition) The number of independent sample observations, or the number of sample observations that are free to vary, that make up a sample statistic. It is denoted df. The reason for introducing degrees of freedom is this: when the population mean is unknown, computing the dispersion from the sample mean is subject to a constraint. To compute the standard deviation (lowercase s), the sample mean must first be known, and once the sample mean and n are known, the sum of the data is a constant. The "last" sample value therefore cannot vary, because if it did the sum would change, and that is not allowed. Degrees of freedom such as n-2 arise for the same reason. Whenever a statistic is used as an estimate in a calculation, each statistic introduced costs one degree of freedom.
Put simply: a class has 50 students and we know their average Chinese score is 80. We then only need the scores of 49 students to work out the score of the remaining one. You can report the 49 scores however you like, but you cannot make up the last one, because the average is fixed; one degree of freedom has been lost. Even more simply: you have 100 yuan, which is fixed and known, and you are going to buy five things. The first four you can buy however you want, as long as you have money left: you can eat at KFC, buy a pen, buy clothes, and each costs a different amount. When you have only 2 yuan left, perhaps you can only buy a bottle of Coke, or of course a meat floss omelet, but however you spend it you have only 2 yuan, and that was already settled the moment you had spent 98. (This example is really good!!)

2. Q: Degrees of freedom in the chi-square test
A: In a test for normality, M (the number of statistics used, three) consists of N (the total count), the mean, and the standard deviation. Because we are testing normality, we need the mean and the standard deviation to fix the shape of the normal distribution, and to compute the theoretical frequency of each interval we also need N. Therefore, in a test for normality the degrees of freedom are K-3. (This one is special, remember it!) In a goodness-of-fit test against a population distribution, the degrees of freedom are K-1. In tests of independence and of homogeneity on a contingency table, the degrees of freedom are (r-1) × (c-1).

3. Q: What is the difference between the t-test and analysis of variance?
A: The t-test is suited to testing the difference between the means of two groups, while analysis of variance is used to compare the means of more than two groups.
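
One concrete way to see how the two methods relate: with exactly two groups, the one-way ANOVA F statistic equals the square of the equal-variance t statistic, so the two tests agree. The sketch below checks this numerically with the Python standard library. The data and the function names `anova_f` and `pooled_t` are made up for illustration.

```python
from statistics import mean, variance

def anova_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    # df between = k - 1, df within = N - k
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

# Made-up scores for two groups; illustrative only.
group_a = [78.0, 82.5, 80.0, 85.5, 79.0]
group_b = [74.0, 77.5, 81.0, 73.5, 76.0]

t = pooled_t(group_a, group_b)
f = anova_f(group_a, group_b)
print("t squared:", t ** 2)
print("F        :", f)
```

With three or more groups there is no single t value to square, which is where analysis of variance takes over from the t-test.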
The t-tests used to compare means fall into three categories: the first is for quantitative data from a one-sample design, the second is for quantitative data from a paired design, and the third is for quantitative data from a two-group design. The difference between the latter two designs is whether the subjects in the two groups have been paired on the basis of similarity in one or more characteristics. Whatever the type of t-test, it is only reasonable to apply it when certain preconditions hold. For a one-sample design, a standard value or a population mean must be given along with a set of quantitative observations, and the precondition for applying the t-test is that this set of data follows a normal distribution. For a paired design, the differences within each pair must follow a normal distribution. For a two-group design, the individuals must be independent of each other, both groups must come from normally distributed populations, and the variances must be homogeneous. These preconditions are required because the t statistic has to be computed under them, and the t-test is a test method whose theoretical basis is the t distribution. It is worth noting that analysis of variance has the same preconditions as the t-test for a two-group design, namely normality and homogeneity of variance.

The t-test is the most frequently used method in medical research and the most common hypothesis test for quantitative data in medical papers. It is so widely used for the following reasons: medical journals now make greater statistical demands, so research conclusions need statistical support; traditional medical statistics teaching introduces the t-test as the entry-level hypothesis test, which makes it the method most familiar to medical researchers in general; and the t-test is simple to carry out and its results are easy to interpret.
Simplicity, familiarity, and external requirements have made the t-test popular. However, because some people do not fully understand the method, many problems arise in its application, some of them very serious errors that directly affect the reliability of the conclusions. These problems fall broadly into two cases: comparing two groups with a t-test without considering the preconditions of the t-test; and treating all kinds of experimental designs as multiple single-factor, two-level designs, then using t-tests to make multiple pairwise comparisons between means. In both cases the risk of reaching a false conclusion increases to some degree. Moreover, when the number of experimental factors is 2 or more, this approach makes it impossible to study the size of the interaction between the factors.

Q: Statistical significance (p-value)
A: The statistical significance of a result is an estimate of how far the result is true (in the sense of being representative of the population). Technically, the p-value is a decreasing index of the reliability of a result: the larger the p-value, the less we can believe that the relation between variables observed in the sample is a reliable indicator of the relation between those variables in the population. The p-value is the probability of error involved in accepting the observed result as valid, that is, as representative of the population. For example, p = 0.05 indicates a 5% probability that the relation between the variables found in the sample is due to chance. That is, assuming that there is no relation between those variables in the population, if we repeated similar experiments we would expect that in about 1 of every 20 experiments the relation between the variables would be equal to or stronger than in ours.
(This is not to say that if a relation does exist in the population we will obtain the same result 5% or 95% of the time; when the variables in the population are related, the probability of repeating the study and finding that relation depends on the statistical power of the design.) In many areas of research, a p-value of 0.05 is customarily treated as the borderline acceptable error level.

4. Q: How do we decide that a result is truly significant?
A: In the final conclusion, deciding what level of significance counts as statistically significant is inevitably arbitrary. In other words, the choice of the level at which a result is rejected as invalid is arbitrary. In practice, the final decision usually depends on whether the result was predicted a priori or only found afterwards through many pairwise comparisons between means in the data set, on the total amount of consistent supporting evidence in the whole data set, and on established practice in the research field. In many sciences, results with p ≤ 0.05 are treated as borderline statistically significant, but this level of significance still carries a fairly high probability of error. Results with 0.05 ≥ p > 0.01 are commonly considered statistically significant, and results with 0.01 ≥ p ≥ 0.001 highly significant. Note, however, that this classification is only an informal rule of thumb based on research experience.

5. Q: Are all test statistics normally distributed?
A: Not all, but most tests are either based directly on the normal distribution or on distributions that are related to it and can be derived from it, such as the t-test, the F-test, and the chi-square test. These tests typically require that the variables analyzed are normally distributed in the population, that is, that they satisfy the so-called normality assumption. Many observed variables are in fact approximately normally distributed, which is why the normal distribution is regarded as a basic feature of the real world.
A problem arises when a test built on the normal distribution is used to analyze data from variables that are not normally distributed (see tests of normality under nonparametric statistics and analysis of variance). There are two options in this situation. One is to use an alternative nonparametric test (a distribution-free test), but this is often inconvenient because such tests are typically less powerful and less flexible in the conclusions they can provide. The other is to use the normal-based test anyway, provided the sample size is large enough. This second option rests on a very important principle, one that largely accounts for the wide use of tests based on the normal distribution: as the sample size increases, the shape of the sampling distribution of the mean approaches normal, even if the distribution of the variable being studied is not normal.

6. Q: What hypothesis testing involves, and its steps
A: In hypothesis testing, randomness means we can make two kinds of mistakes in our decision. In the first, the hypothesis is correct but we reject it; this is the error of "rejecting the true", known as a Type I error. In the second, the hypothesis is incorrect but we fail to reject it; this is the error of "accepting the false", known as a Type II error. In general, for a fixed sample, no decision rule can reduce both kinds of error at once: lowering the probability of a Type I error raises the probability of a Type II error, and lowering the probability of a Type II error raises the probability of a Type I error. People usually choose which type of error to control according to their needs, so as to reduce the chance of that error. In most cases, it is the probability of a Type I error that is controlled.
The probability of a Type I error is called the significance level, usually denoted α. When carrying out a hypothesis test, the probability of a Type I error is controlled by fixing the value of the significance level α in advance. On this basis, a hypothesis test proceeds as follows:

1) State the hypothesis.
2) Draw a sample and obtain the data.
3) Construct the test statistic according to the hypothesis, and compute its value for this sample from the sampled data.
4) Determine the rejection region and its critical value from the sampling distribution of the test statistic and the given significance level.
5) Compare the value of the test statistic for this sample with the critical value; if the value falls inside the rejection region, reject the hypothesis.

At this point the hypothesis test is essentially complete. However, because the test controls the probability of error through a significance level fixed in advance, when two data sets lead to similar test decisions we cannot tell for which of them the decision is more likely to be wrong. That is, we only know the maximum probability of a Type I error for this sample (the given significance level), not how strongly the data actually speak against the hypothesis. Computing the p-value solves this problem. The p-value is a probability computed from the test statistic using its sampling distribution. By comparing the p-value directly with the given significance level α, we can decide whether to reject the hypothesis, which replaces the comparison of the test statistic with the critical value.
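
The five steps, and the equivalence between the critical-value rule and the p-value rule, can be sketched as follows. To stay within the Python standard library, this illustration swaps in a two-sided one-sample z-test with a known population standard deviation, since `statistics.NormalDist` provides the normal distribution while the t distribution is not available there. The function name `z_test`, the data, and the values of `mu0` and `sigma` are made up for the example.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test of H0: population mean == mu0.

    Assumes the population standard deviation `sigma` is known.
    """
    # Step 3: test statistic computed from the sample.
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    std_normal = NormalDist()  # sampling distribution of z under H0
    # Step 4: critical value for the chosen significance level.
    critical = std_normal.inv_cdf(1 - alpha / 2)
    # Alternative to step 5: p-value from the same sampling distribution.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return {
        "z": z,
        "critical": critical,
        "p_value": p_value,
        "reject_by_critical": abs(z) > critical,  # step 5
        "reject_by_p": p_value < alpha,           # same decision via p
    }

# Step 1: H0 is "the population mean is 100". Step 2: a made-up sample.
scores = [104.1, 98.7, 107.3, 101.9, 105.6, 99.8, 103.2, 106.4]
result = z_test(scores, mu0=100.0, sigma=5.0)
for name, value in result.items():
    print(name, "=", value)
```

The two decision flags always agree, but the p-value additionally shows how far inside or outside the rejection region the statistic lies.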
With this method we also learn, when the p-value is less than α, how strong the evidence for rejection is. For example, if p = 0.03 < α = 0.05, the hypothesis is rejected, and 0.03 is the smallest significance level at which it would be rejected: if the hypothesis were true, a result at least this extreme would occur with probability 0.03. Note that if p > α the hypothesis is not rejected, and in that case a Type I error cannot occur.

7. Q: In a chi-square test, is a larger value better, or a smaller one?
A: As with other tests, the larger the computed statistic, the further into the tail of the distribution it lies and the smaller the probability value. If the study design is sound and the data are correct, then a significant or non-significant result is simply an objective reflection of the data. Neither is good or bad.

8. Q: What is the difference between the t-test for paired samples and the test for related samples?
A: Paired samples include homologous pairing (such as twins in an animal experiment), conditional pairing (such as the same environment), self-pairing (such as before and after a drug in a medical experiment), and so on. (This does not seem to explain it clearly; what difference is the question asking about?)

9. Q: When comparing the rates of two groups of data, what is the difference between the binomial distribution and the chi-square test?
A: The chi-square test is mainly used for comparing multiple groups. It examines whether the observed frequency in each category differs significantly from the expected frequency, and it requires the expected frequency in each cell to be no less than 5; if it is less than 5, adjacent groups are merged. The binomial distribution has no such requirement. If the classification has only two classes, the binomial test is the better choice. For a 2×2 table, Fisher's exact test can be used, and it performs better with small samples.

10. Q: How do you compare the difference between two groups of data?
A: Consider the following aspects.
1) Design type: a completely randomized design comparing two groups of data. Is the data a continuous variable?
2) Comparison method: if the data are continuous and both groups follow a normal distribution with homogeneous variances (test for homogeneity of variance), the t-test can be used; if these conditions are not met, a rank-sum test can be used.
3) Do you want to know whether there is a significant difference between the two groups? What does "significant difference" mean here: whether the difference is statistically significant (that is, the probability associated with the difference), or the range within which the difference between the two population means lies? If the former, step 2 gives the p-value; if the latter, use the confidence interval for the difference in means. Both results can of course be obtained in SPSS.

11. Q: The relationship and difference between regression analysis and correlation analysis
A: The main connections are: regression analysis and correlation analysis complement each other and are closely related. Correlation analysis needs regression analysis to express the specific form of the quantitative relationship between phenomena, and regression analysis should be built on correlation analysis.

The main differences are:

First, in regression analysis the variables are divided into independent and dependent variables according to their different positions and roles. The dependent variable is placed in the special position of being explained and is a random variable, while the independent variable is generally assumed to be a non-random, controllable variable. In correlation analysis the variables are on a completely equal footing: there is no distinction between independent and dependent variables, and all the variables involved are random variables.

Second, correlation analysis is limited to describing how closely the variables are related; it cannot express the specific quantitative relationship between them. Regression analysis not only reveals quantitatively how much the independent variable affects the dependent variable, but also allows variable values to be predicted and controlled through the regression equation.

Correlation analysis and regression analysis are both methods for studying the relationship between two or more variables, but the two methods differ in essence, in that they serve different research purposes.

The purpose of correlation analysis is to test the extent to which two random variables vary together (their covariation), while the purpose of regression analysis is to predict the value of the dependent variable from the independent variable.

In correlation analysis, both variables must be random variables. If one of them is not a random variable, correlation analysis cannot be carried out; this follows from the method itself.

In regression analysis, the dependent variable must be a random variable (this follows from the regression method itself), whereas the independent variable can be either an ordinary variable (with fixed values) or a random variable.
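
The asymmetry described above can be checked numerically: the correlation coefficient is the same whichever variable comes first, while the regression line changes when the roles of independent and dependent variable are swapped. The sketch below uses only the Python standard library and ordinary least squares; the data and the function names `pearson_r` and `ols_slope_intercept` are made up for illustration.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient; symmetric in x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ols_slope_intercept(x, y):
    """Least-squares line for predicting y (dependent) from x (independent)."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

# Made-up paired measurements; illustrative only.
height = [150.0, 160.0, 165.0, 172.0, 180.0]
weight = [48.0, 55.0, 63.0, 66.0, 78.0]

print("r(height, weight):", pearson_r(height, weight))
print("r(weight, height):", pearson_r(weight, height))
print("weight on height (slope, intercept):", ols_slope_intercept(height, weight))
print("height on weight (slope, intercept):", ols_slope_intercept(weight, height))
```

The two slopes are not reciprocals of each other unless the points lie exactly on a line; their product equals the squared correlation coefficient.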

On the difference between the two, I think the following analogy makes it easy to understand. Take a couple: correlation can only tell you that the two are in a relationship. As for who leads, who has the final say, who follows, and how one reacts when the other sneezes, correlation cannot say; regression analysis handles this kind of question well.

Regression does not necessarily imply a causal relationship. Regression has two main uses: explanation and prediction, that is, predicting the unknown dependent variable from known independent variables. The correlation coefficient is mainly for understanding how two variables vary together. If a causal relationship is of interest, path analysis or a linear structural relations model is usually used.

I think we should look at it this way: we do regression analysis on the basis of some theory and intuition, and use the quantitative relationship between the independent and dependent variables to explore whether a causal relationship exists. The previous poster's statement that "regression does not necessarily imply a causal relationship ... if a causal relationship is of interest, path analysis or a linear structural relations model is usually used" is debatable; in fact, regression analysis can be seen as a special case of the linear structural relations model. I think it is correct to say that regression explores causal relationships, because in the end we never determine causality purely from statistical results. Only when the statistical results agree with theory and with reality do we affirm the causal relationship. Any statistical method is just a tool, and we cannot rely on the tool alone. Even with SEM we cannot say its results are fully accurate, because however good the method, complex relationships among variables show up in many ways. Statistics may only point to the optimal solution in one direction, which may not be the most realistic one, not to mention that poor-quality sample data can make the results depart from the facts, which leads people to doubt the accuracy of statistical methods.

Statistics only describes statistical association; it does not prove a causal relationship.

Regression presupposes a causal direction; correlation does not necessarily.

Regression analysis is a statistical method for handling linear dependence between two or more variables. Such problems are common: for example, the content of a metal element in human hair and the content of that element in the blood, or the relation of body surface area to height and weight. Regression analysis is used to describe the mathematical relationship behind this kind of dependence.

Nothing exists in isolation; things are interrelated and constrain one another. Height and weight, body temperature and pulse, age and blood pressure are all connected in some way. Describing the relationship between objective things, and the closeness of that relationship, with appropriate statistical indicators is the process of correlation analysis.

Transferred from: http://www.cdadata.com/9116
