T-tests in R (data analysis)

Last Update:2018-08-22 Source: Internet

Author: User


These are my own templates for commonly used data analysis models in R. The original file was in R Markdown (Rmd) format; I pasted it here directly to keep and share as personal study notes.

I. One-sample t-test

Example 1. t-test with raw data

The true calcium carbonate content of a water sample is known to be 20.7 mg/L. The content was measured 12 times with a certain method. Does the mean obtained with this method differ significantly from the true value?

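# 12 repeated measurements of calcium carbonate content (mg/L)
x <- c(20.99, 20.41, 20.10, 20.00, 20.91, 22.60, 20.99, 20.42, 20.90, 22.99, 23.12, 20.89)
# One-sample t-test against the true value; "greater" makes it one-sided (H1: mean > 20.7)
t.test(x, alternative = "greater", mu = 20.7)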
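As a cross-check on what t.test reports for this example, the statistic can be rebuilt from its definition, t = (mean(x) - mu) / (sd(x) / sqrt(n)). The sketch below is only illustrative; the names t_manual, p_greater and res are not part of the original notes.

```r
# Data and reference value from the example above
x <- c(20.99, 20.41, 20.10, 20.00, 20.91, 22.60,
       20.99, 20.42, 20.90, 22.99, 23.12, 20.89)
mu <- 20.7
n <- length(x)

# t statistic from its definition
t_manual <- (mean(x) - mu) / (sd(x) / sqrt(n))

# Upper-tail p-value, matching alternative = "greater"
p_greater <- pt(t_manual, df = n - 1, lower.tail = FALSE)

# Both values should agree with the built-in test
res <- t.test(x, alternative = "greater", mu = mu)
all.equal(unname(res$statistic), t_manual)
all.equal(res$p.value, p_greater)
```

Working through the formula once makes it easier to apply the same steps later when only summary statistics are available.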

The test gives t = 1.5665 with p = 0.07276 > 0.05, so the null hypothesis is not rejected: the mean calcium carbonate content measured by this method does not differ significantly from the true value.

Example 2. t-test without raw data

The mean pulse rate of healthy adult men is 72 beats/min. A doctor randomly sampled 25 healthy adult men in a mountainous area and found a mean pulse rate of 74.2 beats/min with a standard deviation of 6.5 beats/min. Based on this information, does the pulse rate of healthy adult men in the mountainous area differ from that of healthy adult men in general?

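# Compute the t value from the formula
x <- 74.2
mu <- 72
s <- 6.5
n <- 25
t <- (x - mu) / (s / sqrt(n))   # or use n-1 in place of n
t
# pt() takes the t value and the degrees of freedom (n-1) and returns
# the lower-tail probability P(T <= t)
p_lower <- pt(t, df = 24)
p_lower
# Two-sided p-value
p <- 2 * (1 - p_lower)
p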
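Because pt() returns a cumulative (lower-tail) probability, it helps to spell out which tail is being used when only summary statistics are available. The helper below is a hypothetical sketch, not part of the original notes; the function name t_test_summary and its argument names are assumptions made for illustration.

```r
# Hypothetical helper: one-sample t-test from summary statistics only
t_test_summary <- function(xbar, mu, s, n) {
  t_stat <- (xbar - mu) / (s / sqrt(n))
  df <- n - 1
  list(
    t = t_stat,
    df = df,
    p_lower = pt(t_stat, df),                              # P(T <= t)
    p_upper = pt(t_stat, df, lower.tail = FALSE),          # P(T >= t)
    p_two_sided = 2 * pt(abs(t_stat), df, lower.tail = FALSE)
  )
}

# Pulse example: sample mean 74.2, reference mean 72, sd 6.5, n = 25
res <- t_test_summary(xbar = 74.2, mu = 72, s = 6.5, n = 25)
str(res)
```

Returning all three probabilities side by side makes it clear that a value close to 1 from pt() is a lower-tail probability, and that the p-value for a "different from" question is the two-sided one.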

The result is t = 1.692308. The value 0.9482341 returned by pt() is the lower-tail probability P(T <= t), not a p-value; the two-sided p-value is 2 * (1 - 0.9482341), about 0.1035, which is greater than 0.05. The null hypothesis is therefore not rejected: the pulse rate of healthy adult men in the mountainous area does not differ significantly from that of healthy adult men in general.

II. Paired-sample t-test

The tests above compare a sample with a known population mean. Often the population mean is unknown, and medical data commonly come in pairs. For example, if a measurement is recorded for a group of patients before treatment and taken again after treatment to assess efficacy, then n cases yield n pairs of observations; these are paired data. If two treatments are to be compared, each specimen can be split into two parts, each receiving one treatment, so the resulting data are also paired. In medical research it is sometimes impossible to observe the same subject before and after. In that case subjects are matched so that the two members of each pair are as alike as possible in sex, age, and other conditions that may affect the treatment effect, and each member then receives one treatment; the resulting observations can also be treated as paired data. Because pairing controls individual differences, the test is more efficient.

Paired designs are common in medical research. There are four main types: measurements on the same subject before and after treatment; measurements on two parts of the same subject; the same sample measured by two methods; and two matched subjects each receiving one of two treatments.

Example 1. Paired t-test with raw data

To compare a simple method with the conventional method for determining lead in urine, 12 urine samples were each measured with both methods. The results are given below. Do the two methods give different results?

# Enter the two sets of measurements
x <- c(2.41, 2.90, 2.75, 2.23, 3.67, 4.49, 5.16, 5.45, 2.06, 1.64, 1.06, 0.77)
y <- c(2.80, 3.04, 1.88, 3.43, 3.81, 4.00, 4.44, 5.41, 1.24, 1.83, 1.45, 0.92)
# Paired-sample t-test
t.test(x, y, paired = TRUE)
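A paired t-test is equivalent to a one-sample t-test on the within-pair differences against a mean of 0. The sketch below illustrates this with the same data; the names d, paired_res and diff_res are introduced here for illustration only.

```r
# Same measurements as in the example above
x <- c(2.41, 2.90, 2.75, 2.23, 3.67, 4.49, 5.16, 5.45, 2.06, 1.64, 1.06, 0.77)
y <- c(2.80, 3.04, 1.88, 3.43, 3.81, 4.00, 4.44, 5.41, 1.24, 1.83, 1.45, 0.92)

# Within-pair differences
d <- x - y

# Paired test on (x, y) and one-sample test on the differences
paired_res <- t.test(x, y, paired = TRUE)
diff_res <- t.test(d, mu = 0)

# The statistic and p-value should coincide
all.equal(unname(paired_res$statistic), unname(diff_res$statistic))
all.equal(paired_res$p.value, diff_res$p.value)
```

This equivalence is why a paired test can still be run when only the mean and standard deviation of the differences are known.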

The paired t-test gives t = 0.16232 with p = 0.874 > 0.05, so the null hypothesis H0 cannot be rejected: there is no evidence that the two methods give different results for lead content in urine.

Example 2. Paired t-test without raw data

Cholinesterase activity in the blood of patients with chronic bronchitis is often elevated. The pharmacology department of a school formed 8 pairs, each consisting of a patient and a healthy person of the same sex and age, and measured this value for comparison. The mean of the paired differences was 0.625 with a standard deviation of 0.78. Can a clear conclusion be drawn from this information?

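# Enter the mean paired difference d, its standard deviation s, and the number of pairs n
d <- 0.625; s <- 0.78; n <- 8
# Compute the t value
t <- d / (s / sqrt(n))
# With n-1 degrees of freedom, pt() returns the lower-tail probability P(T <= t)
df <- n - 1
p_lower <- pt(t, df)
# Two-sided p-value
p <- 2 * (1 - p_lower)
t; p_lower; p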
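Another way to read the same summary statistics is a 95% confidence interval for the mean paired difference, d ± t_crit * s / sqrt(n), where t_crit is the 97.5% quantile of the t distribution with n - 1 degrees of freedom. This is a sketch for illustration; the names se, t_crit and ci are not from the original notes, and the 95% level is an assumption.

```r
# Summary statistics from the example above
d <- 0.625   # mean of the paired differences
s <- 0.78    # standard deviation of the differences
n <- 8       # number of pairs

se <- s / sqrt(n)            # standard error of the mean difference
df <- n - 1
t_crit <- qt(0.975, df)      # two-sided 5% critical value

# 95% confidence interval for the mean difference
ci <- c(lower = d - t_crit * se, upper = d + t_crit * se)
ci

# Equivalent decision rule: reject H0 at the 5% level only if |t| > t_crit
t_stat <- d / se
abs(t_stat) > t_crit
```

If the interval contains 0, the two-sided test at the 5% level does not reject the hypothesis of no difference, which is the same decision as comparing |t| with the critical value.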

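pt() returns 0.9711069, which is the lower-tail probability P(T <= t) rather than a p-value; the two-sided p-value is 2 * (1 - 0.9711069), about 0.0578, which is greater than 0.05. The null hypothesis cannot be rejected: at the 5% level there is not enough evidence that blood cholinesterase activity differs between patients with chronic bronchitis and healthy people.

III. Independent two-sample t-test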

In everyday work we often need to test whether the means of two groups of measurements differ significantly, for example when comparing the antihypertensive effect of two treatments or the effectiveness of two preparations in killing worms in rats. When it is hard to find people (or animals) matched on age, sex, and so on, differences within pairs cannot be computed; instead the mean of each group is calculated and the two means are compared. The two group sizes may be equal or somewhat different. The test again starts from the hypothesis that the two population means are equal and asks whether the observed difference between the sample means is compatible with it. If the difference is small, it is attributed to sampling error; if it exceeds a certain limit, the probability that sampling error alone would produce so large a difference is too small, so the hypothesis is rejected, H1 is accepted, and the two population means are concluded to be unequal.

The t-test for a grouped (two-group) design differs from the one-sample and paired t-tests. In both of those, the quantity analysed can be reduced to a single variable belonging to a single group, so no grouping variable is involved. In a grouped design the same variable is measured, but it is compared between groups: the t-test compares the means of the two sets of data.

The t-test for comparing the means of two small samples has the following conditions:

1. Both samples come from normally distributed populations, and the two populations have equal variances (homogeneity of variance). A variance homogeneity test should therefore be used to judge whether the two population variances are equal. The F test does this by checking whether the ratio of the larger sample variance to the smaller one is close to 1; if it is, the two populations can be regarded as having equal variances. A normality test is used to judge whether each population follows a normal distribution.

2. If the populations are not normally distributed or their variances are not equal: data that follow a log-normal distribution can still be analysed with a t-test (after a log transformation); other data can be analysed with an adjusted t-test or a rank-sum test.

Example 1. Independent two-sample t-test with raw data

Two groups of female mice were fed a high-protein and a low-protein diet respectively, and the weight gain of each mouse was recorded after 8 weeks. Is the difference in mean weight gain between the two groups significant?

High-protein group: 134 146 104 119 124 161 107 83 113 129 97 123

Low-protein group: 70 118 101 85 107 132 94

high <- c(134, 146, 104, 119, 124, 161, 107, 83, 113, 129, 97, 123)
low <- c(70, 118, 101, 85, 107, 132, 94)
# Variance homogeneity tests: all values in one vector plus a group factor
x <- c(134, 146, 104, 119, 124, 161, 107, 83, 113, 129, 97, 123,
       70, 118, 101, 85, 107, 132, 94)
a <- factor(c(rep(1, 12), rep(2, 7)))
# bartlett.test: variance homogeneity test
bartlett.test(x ~ a)
# var.test: variance homogeneity test (F test)
var.test(x ~ a)
# leveneTest: variance homogeneity test (Levene's test is also the SPSS default)
library(car)
leveneTest(x ~ a)
# The first two test the variances of the raw data; Levene's test checks homogeneity
# across groups using the residuals of an analysis-of-variance model. Since the
# assumption is usually stated for the residual variance, general statistical
# software typically runs Levene's test.

# t-test; var.equal = TRUE requests the equal-variance (pooled) version
t.test(high, low, paired = FALSE, var.equal = TRUE)
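The conditions listed earlier also call for approximate normality and name alternatives when the assumptions are doubtful. A minimal sketch of those checks on the same data follows, using shapiro.test for normality, Welch's t-test (var.equal = FALSE, which is the default of t.test) and the Wilcoxon rank-sum test. The choice of the Shapiro-Wilk test is an assumption; the notes do not say which normality test the author had in mind.

```r
# Weight gains from the example above
high <- c(134, 146, 104, 119, 124, 161, 107, 83, 113, 129, 97, 123)
low <- c(70, 118, 101, 85, 107, 132, 94)

# Shapiro-Wilk normality test for each group
sw_high <- shapiro.test(high)
sw_low <- shapiro.test(low)

# Welch's t-test: does not assume equal variances
welch <- t.test(high, low, var.equal = FALSE)

# Wilcoxon rank-sum test; exact = FALSE because the pooled data contain a tie (107)
rank_sum <- wilcox.test(high, low, exact = FALSE)

# Collect the p-values for a quick overview
c(shapiro_high = sw_high$p.value,
  shapiro_low = sw_low$p.value,
  welch = welch$p.value,
  rank_sum = rank_sum$p.value)
```

With samples this small, normality tests have little power, so they are best read alongside a plot of the data rather than on their own.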

1. Variance homogeneity: taking the var.test result, F = 1.0755, p-value = 0.9788 > 0.05, so the variances of the two independent samples can be regarded as homogeneous.

2. We therefore read the t value for the "equal variances" case: t = 1.89, p = 0.0757 > 0.05. The null hypothesis is not rejected; the mean weight gains of the two groups of female mice cannot be considered different.

Example 2. Two-sample t-test without raw data

To compare the calcium carbonate content of water in two regions, 20 samples were randomly drawn from each region and tested, giving the mean and standard deviation of calcium carbonate content for each region, as listed below. Determine whether the calcium carbonate content of water differs between the two regions.

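# Enter the means of the control and experimental groups x1, x2,
# the group sizes n1, n2, and the standard deviations s1, s2
x1 <- 20.95; x2 <- 21.79; n1 <- 20; n2 <- 20; s1 <- 5.89; s2 <- 3.43
# Standard error of the difference, based on the pooled variance of the two samples
sc <- sqrt((1/n1 + 1/n2) * ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
# t value, degrees of freedom df, p-value
t <- (x2 - x1) / sc
df <- n1 + n2 - 2
# pt() returns the lower-tail probability P(T <= t)
p_lower <- pt(t, df)
# Two-sided p-value
p <- 2 * (1 - p_lower)
t; p_lower; p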
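The two standard deviations in this example are rather different (5.89 and 3.43), so it may be worth comparing the pooled calculation with Welch's version, which does not assume equal variances and uses the Welch-Satterthwaite degrees of freedom. The function below is a hypothetical sketch; welch_from_summary and its argument names are assumptions introduced for illustration.

```r
# Hypothetical helper: Welch's two-sample t-test from summary statistics
welch_from_summary <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1                      # squared standard error, group 1
  v2 <- s2^2 / n2                      # squared standard error, group 2
  se <- sqrt(v1 + v2)                  # standard error of the difference
  t_stat <- (m2 - m1) / se
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  p <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)
  list(t = t_stat, df = df, p_two_sided = p)
}

# Calcium carbonate example: means, standard deviations and group sizes
res <- welch_from_summary(m1 = 20.95, s1 = 5.89, n1 = 20,
                          m2 = 21.79, s2 = 3.43, n2 = 20)
str(res)
```

With equal group sizes the Welch and pooled standard errors coincide, so the t value is unchanged; only the degrees of freedom, and hence the p-value, differ slightly.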

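t = 0.5511486. pt() returns 0.7076209, which is the lower-tail probability P(T <= t); the two-sided p-value is 2 * (1 - 0.7076209), about 0.5848, which is greater than 0.05. The null hypothesis is not rejected: the calcium carbonate content of water in the two regions cannot be considered different.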

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
