R Language and significance test study notes

Last Update:2017-10-26 Source: Internet

Author: User


First, what is a significance test

The idea behind the significance test is simple: a small-probability event is assumed not to occur in a single trial. Probability theory tells us that a small-probability event will eventually occur if we repeat the trial often enough, but the significance test still assumes that it did not occur in the one experiment we actually ran.

A significance test is used to decide whether there is a difference between an experimental treatment group and a control group, or between the effects of two different treatments, and whether that difference is significant.

The hypothesis to be tested is usually written H0 and called the null hypothesis. The hypothesis that contradicts H0 is written H1 and called the alternative hypothesis.

⑴ Rejecting the null hypothesis when it is actually true is called a Type I error; its probability is usually written α.

⑵ Accepting the null hypothesis when it is actually false is called a Type II error; its probability is usually written β.

Usually only the maximum probability α of a Type I error is fixed, and the probability β of a Type II error is not considered. A hypothesis test of this kind is called a significance test, and the probability α is called the significance level.
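The two error types can be illustrated with a small simulation. This is only a sketch under assumed settings that are not from the original notes (sample size 30, a true mean of 0.5 under the alternative, 2000 repetitions): it estimates how often a one-sample t-test rejects a true null hypothesis and how often it fails to reject a false one.

```r
# Sketch: estimate Type I and Type II error rates by simulation.
# All settings below (n, true_mean_alt, reps) are illustrative assumptions.
set.seed(1)
alpha <- 0.05          # significance level
n <- 30                # observations per simulated experiment
reps <- 2000           # number of simulated experiments
true_mean_alt <- 0.5   # true mean used when H0 (mean = 0) is false

# p-values when H0 is true (data really have mean 0)
p_null <- replicate(reps, t.test(rnorm(n, mean = 0), mu = 0)$p.value)

# p-values when H0 is false (data have mean true_mean_alt)
p_alt <- replicate(reps, t.test(rnorm(n, mean = true_mean_alt), mu = 0)$p.value)

type1_rate <- mean(p_null < alpha)    # rejecting a true H0
type2_rate <- mean(p_alt >= alpha)    # failing to reject a false H0

cat("Estimated Type I error rate: ", type1_rate, "\n")
cat("Estimated Type II error rate:", type2_rate, "\n")
```

The estimated Type I rate should land close to α, while the Type II rate depends on the sample size and on how far the true mean is from the null value.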

The significance tests we use most often are the t-test, the chi-square test, the correlation test, and so on. What do we need to pay attention to when running them?

Second, normality and p-values

The t-test and the Pearson correlation test assume that the samples are (approximately) normally distributed, so a normality check is usually done before the hypothesis test itself. In R you can use shapiro.test() for this. Many other normality tests are available in the nortest package.
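A minimal sketch of such a normality check with base R is shown here. The two samples are simulated purely for illustration and are not data from the original notes.

```r
# Sketch: Shapiro-Wilk normality test on two simulated samples.
# x_normal and x_skewed are hypothetical example data.
set.seed(42)
x_normal <- rnorm(100)   # drawn from a normal distribution
x_skewed <- rexp(100)    # drawn from a strongly skewed distribution

res_normal <- shapiro.test(x_normal)
res_skewed <- shapiro.test(x_skewed)

# H0 of the Shapiro-Wilk test: the sample comes from a normal distribution.
# A small p-value is evidence against normality.
cat("p-value, normal sample:", res_normal$p.value, "\n")
cat("p-value, skewed sample:", res_skewed$p.value, "\n")
```

A small p-value for a sample suggests that a test relying on normality may not be appropriate for it without a transformation or a nonparametric alternative.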

The p-value is the smallest significance level at which the null hypothesis can be rejected.

Third, four important quantities

Putting the previous sections together, there are four very important quantities in the study of significance tests: sample size, significance level, power, and effect size.

Sample size: Obviously, the more observations we have, the better we know the population. But since we cannot collect an unlimited sample, how many observations are enough? R can help us find the answer, as shown later in these notes.

Significance level: The probability of committing a Type I error. We fix it before running the test and then decide whether to reject the null hypothesis by comparing it with the p-value.

Power: A quantity that is often left unmentioned in significance testing but is actually very useful. It is the probability of rejecting the null hypothesis when it is in fact false, that is, 1 − β. The greater the power, the less likely a Type II error. Although the significance test itself does not mention it, a good hypothesis test should keep both types of error as small as possible.

Effect size: The magnitude of the effect under the alternative hypothesis.
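The way these four quantities depend on each other can be sketched with base R's power.t.test(), which needs no extra package. The effect size (0.2), significance level (0.10) and the list of sample sizes are illustrative assumptions; the code fixes the effect size and significance level and shows how power changes with sample size.

```r
# Sketch: power as a function of sample size for a one-sample t-test.
# delta / sd plays the role of the effect size; the values are assumptions.
sample_sizes <- c(20, 60, 120, 240)

power_by_n <- sapply(sample_sizes, function(n) {
  power.t.test(n = n, delta = 0.2, sd = 1, sig.level = 0.10,
               type = "one.sample", alternative = "two.sided")$power
})

print(data.frame(n = sample_sizes, power = round(power_by_n, 3)))
```

With the effect size and significance level held fixed, power increases with the sample size, which is why a target power can be translated into a required number of observations.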

Fourth, using the pwr package for power analysis

The following functions are available in the pwr package:

Function	Purpose
pwr.r.test	Correlation
pwr.t.test	t-test
pwr.t2n.test	t-test with two samples of different sizes
pwr.chisq.test	Chi-square test
pwr.p.test	Proportion

Let's describe the usage of some of these functions.

1. t-test

Call format:

pwr.t.test(n = NULL, d = NULL, sig.level = 0.05, power = NULL, type = c("two.sample", "one.sample", "paired"), alternative = c("two.sided", "less", "greater"))

Parameter description:

n: sample size

d: effect size of the t-test (the standardized mean difference, Cohen's d)

sig.level: significance level

power: power of the test

type: the type of test; the default is two samples of equal size

alternative: whether the test is two-sided or one-sided; the default is two-sided

Exactly one of n, d, sig.level and power must be left as NULL, and the function solves for it.

For example: given a sample size of 60, an effect size of 0.2 for a one-sample t-test (estimated, for instance, as (mean(data) - mu0) / sd(data); note that this is not the t statistic returned by t.test(data)$statistic), and a significance level of α = 0.1, what is the power?

Enter the command in R:

pwr.t.test(d = 0.2, n = 60, sig.level = 0.10, type = "one.sample", alternative = "two.sided")

Get results:

     One-sample t test power calculation

              n = 60
              d = 0.2
      sig.level = 0.1
          power = 0.4555818
    alternative = two.sided
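The power reported above can be cross-checked by hand with the noncentral t distribution in base R. This is a sketch of the standard calculation for a two-sided one-sample t-test, not code taken from the pwr package; the variable names are illustrative.

```r
# Sketch: power of a two-sided one-sample t-test from the noncentral t distribution.
n <- 60        # sample size
d <- 0.2       # effect size
alpha <- 0.10  # significance level

df <- n - 1                        # degrees of freedom
ncp <- sqrt(n) * d                 # noncentrality parameter under H1
t_crit <- qt(1 - alpha / 2, df)    # two-sided critical value under H0

# Probability that the test statistic falls in either rejection region under H1
power <- pt(t_crit, df, ncp = ncp, lower.tail = FALSE) +
  pt(-t_crit, df, ncp = ncp)
beta <- 1 - power                  # Type II error probability

cat("power:", power, "\n")
cat("beta: ", beta, "\n")
```

The value of power computed this way should agree with the pwr.t.test() output to several decimal places.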

We can see that the probability of a Type II error is above 50% (1 − 0.456 ≈ 0.544). Should we trust the result of such a test, whether the p-value leads us to reject or to accept? Obviously not. How many observations would it take to bring the Type II error probability down to 10%?

In R, enter:

pwr.t.test(d = 0.2, power = 0.9, sig.level = 0.10, type = "one.sample", alternative = "two.sided")

Get results:

     One-sample t test power calculation

              n = 215.4542
              d = 0.2
      sig.level = 0.1
          power = 0.9
    alternative = two.sided

That is, 216 observations are enough to keep the Type II error probability at or below 0.1.
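The same sample size can be obtained without the pwr package. The sketch below uses base R's power.t.test(), where the effect size is expressed as delta / sd; setting sd = 1 so that delta equals d is an assumption made here to match the example.

```r
# Sketch: required sample size for a one-sample, two-sided t-test using base R.
res <- power.t.test(delta = 0.2, sd = 1, sig.level = 0.10, power = 0.9,
                    type = "one.sample", alternative = "two.sided")

# Round up, because a fraction of an observation cannot be collected
n_required <- ceiling(res$n)

cat("solved n:  ", res$n, "\n")
cat("required n:", n_required, "\n")
```

Rounding up rather than to the nearest integer guarantees that the achieved power is not below the target.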

The two-sample case with equal sample sizes is similar, so we do not repeat it. Instead we look at another t-test case: two samples of unequal size.

Call format:

pwr.t2n.test(n1 = NULL, n2 = NULL, d = NULL, sig.level = 0.05, power = NULL, alternative = c("two.sided", "less", "greater"))

Parameter description:

n1: number of observations in the first sample

n2: number of observations in the second sample

d: effect size

sig.level: significance level (Type I error probability)

power: power of the test (1 minus the Type II error probability)

alternative: a character string specifying the alternative hypothesis; must be one of "two.sided" (default), "greater" or "less"

For example: two samples of sizes 90 and 60, an effect size of 0.6, and a one-sided t-test with α = 0.05 whose alternative is "greater".

Command in R:

pwr.t2n.test(d = 0.6, n1 = 90, n2 = 60, alternative = "greater")

Output:

     t test power calculation

             n1 = 90
             n2 = 60
              d = 0.6
      sig.level = 0.05
          power = 0.9737262
    alternative = greater

The power is very high (about 0.97) at α = 0.05, so we can consider the conclusion of this test quite credible.
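For unequal sample sizes the calculation behind this result can be sketched by hand as well. The code assumes the usual pooled two-sample t-test, with n1 + n2 − 2 degrees of freedom and noncentrality d / sqrt(1/n1 + 1/n2); it is a cross-check, not code taken from the pwr package.

```r
# Sketch: power of a one-sided ("greater") two-sample t-test with unequal n.
n1 <- 90
n2 <- 60
d <- 0.6
alpha <- 0.05

df <- n1 + n2 - 2                  # pooled degrees of freedom
ncp <- d / sqrt(1 / n1 + 1 / n2)   # noncentrality parameter under H1
t_crit <- qt(1 - alpha, df)        # one-sided critical value under H0

# Probability of exceeding the critical value when H1 is true
power <- pt(t_crit, df, ncp = ncp, lower.tail = FALSE)

cat("noncentrality:", ncp, "\n")
cat("power:        ", power, "\n")
```

Here the noncentrality parameter works out to 3.6, which is far enough above the critical value to explain the high power.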

2. Correlation

The pwr.r.test() function performs power analysis for a correlation test. The format is as follows:

pwr.r.test(n = NULL, r = NULL, sig.level = 0.05, power = NULL, alternative = c("two.sided", "less", "greater"))

Here, unlike the t-test, the effect size r is the linear (Pearson) correlation coefficient, which can be obtained with cor(data1, data2). Take care not to pass in a Spearman or Kendall coefficient: these are rank-based measures of association, not linear correlation coefficients.
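The difference between the coefficients can be seen directly with cor(). The data below are simulated and the variable names are hypothetical; the sketch only shows which coefficient cor() returns by default and how the rank-based one is defined.

```r
# Sketch: Pearson vs. Spearman coefficients on simulated data.
# loneliness and depression are hypothetical example variables.
set.seed(7)
loneliness <- rnorm(50)
depression <- 0.3 * loneliness + rnorm(50)

# Pearson is the default method of cor(); this is the r that pwr.r.test() expects
r_pearson <- cor(loneliness, depression, method = "pearson")

# Spearman is the Pearson correlation of the ranks, so it should not be used as r
r_spearman <- cor(loneliness, depression, method = "spearman")

cat("Pearson r:   ", r_pearson, "\n")
cat("Spearman rho:", r_spearman, "\n")
```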

Suppose we are studying the relationship between depression and loneliness, with the following null and alternative hypotheses:

H0: r ≤ 0.25 vs. H1: r > 0.25

With a significance level of 0.05, and assuming the null hypothesis is false, we would like a 90% chance of rejecting H0. How many observations do we need?

The following code gives the answer:

pwr.r.test(r = 0.25, sig.level = 0.05, power = 0.9, alternative = "greater")

     approximate correlation power calculation (arctangh transformation)

              n = 133.8325
              r = 0.25
      sig.level = 0.05
          power = 0.9
    alternative = greater

It is easy to see that a sample of 134 observations is needed.

3. Chi-square test

The null hypothesis is that the variables are independent, and the alternative hypothesis is that they are not. The command is pwr.chisq.test(), with the call format:

pwr.chisq.test(w = NULL, N = NULL, df = NULL, sig.level = 0.05, power = NULL)

Here w is the effect size, which can be computed with ES.w2(); N is the total number of observations; and df is the degrees of freedom of the contingency table.

Example:

prob <- matrix(c(0.225, 0.125, 0.125, 0.125, 0.16, 0.16, 0.04, 0.04), nrow = 2, byrow = TRUE)
prob
ES.w2(prob)
pwr.chisq.test(w = ES.w2(prob), df = (2 - 1) * (4 - 1), N = 200)

Output:

     Chi squared power calculation

              w = 0.2558646
              N = 200
             df = 3
      sig.level = 0.05
          power = 0.8733222

NOTE: N is the number of observations

In other words, the probability of a Type II error is about 13%, so the result is fairly credible.
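To make the effect size w less of a black box, the sketch below recomputes it and the power with base R only. It assumes the usual definition of w for a contingency table, the square root of the sum of (p − p0)² / p0 over all cells, where p0 are the cell probabilities expected under independence, and a noncentral chi-square distribution with noncentrality N · w².

```r
# Sketch: effect size w and power of the chi-square test of independence by hand.
prob <- matrix(c(0.225, 0.125, 0.125, 0.125,
                 0.16,  0.16,  0.04,  0.04), nrow = 2, byrow = TRUE)

# Cell probabilities expected under independence (product of the margins)
p0 <- outer(rowSums(prob), colSums(prob))

# Effect size w
w <- sqrt(sum((prob - p0)^2 / p0))

N <- 200                                      # total number of observations
df <- (nrow(prob) - 1) * (ncol(prob) - 1)     # degrees of freedom
alpha <- 0.05

crit <- qchisq(1 - alpha, df)                 # critical value under H0
power <- pchisq(crit, df, ncp = N * w^2, lower.tail = FALSE)

cat("w:    ", w, "\n")
cat("power:", power, "\n")
```

Both values should agree with the ES.w2() and pwr.chisq.test() results shown above.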

R has many other packages related to power analysis that are not introduced here.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
