The Stat2.3x (Statistical Inference) course was offered on the edX platform by the University of California, Berkeley, in 2014.
Summary
- Test of hypotheses: $$\text{null}: H_0$$ $$\text{alternative}: H_a$$ Assuming the null is true, the chance of getting data like the data in the sample, or even more like the alternative, is the p-value $P$. If $P$ is small (i.e., below the cutoff), choose the alternative; otherwise, stay with the null.
- Significance level and Power
- The significance level is the probability, under $H_0$, that the test concludes $H_a$. It is an error probability, so it should be small.
- The power is the probability, under $H_a$, that the test concludes $H_a$. It is the probability of a correct conclusion, so it should be large.
Additional Practice Problems for Exercise Set 2
Problem 1
To test whether "red" comes up 18/38 of the time in spins of a roulette wheel, the wheel is spun 3,800 times, and "red" comes up 1,720 times. Is the wheel biased against "red"? Answer the question in the following steps:
a) Formulate null and alternative hypotheses about p, the chance with which the wheel shows "red".
b) Calculate an exact p-value or its normal approximation.
c) State the conclusion of the test.
Solution
a) $$\text{null}: p=\frac{18}{38}$$ $$\text{alternative}: p<\frac{18}{38}$$
b) Binomial distribution (exact), with $X$ the number of reds, $n=3800$, $p=\frac{18}{38}$, $k=0,\dots,1720$: $$P(X\le1720)=\sum_{k=0}^{1720}C_{3800}^{k}\cdot p^k\cdot(1-p)^{3800-k}=0.004871166$$ R code:
```r
sum(dbinom(0:1720, 3800, 18/38))
# [1] 0.004871166
```
Normal approximation: $\mu=n\cdot p=1800$, $\sigma=\sqrt{n\cdot p\cdot(1-p)}$, with a continuity correction at 1720.5: $$z=\frac{1720.5-\mu}{\sigma}\rightarrow p\text{-value}\approx\Phi(z)=0.004898679$$ R code:
```r
n <- 3800; p <- 18/38
mu <- n * p; sigma <- sqrt(n * p * (1 - p))
z <- (1720.5 - mu) / sigma  # continuity correction: 1720 -> 1720.5
pnorm(z)
# [1] 0.004898679
```
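As an independent cross-check of the two R results above, here is a minimal Python sketch using only the standard library. The helper names `binom_pmf` and `binom_cdf` are my own (hypothetical, not from the course). Each pmf term is evaluated in log space because $C_{3800}^{k}$ overflows an ordinary float and $(18/38)^{1720}$ underflows to 0.0.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)


def binom_cdf(k, n, p):
    """P(X <= k); plays the role of sum(dbinom(0:k, n, p)) in R."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


n, p, observed = 3800, 18 / 38, 1720

# Exact p-value: chance of 1720 or fewer reds if p really is 18/38.
exact = binom_cdf(observed, n, p)

# Normal approximation with the continuity correction 1720 -> 1720.5.
mu, sigma = n * p, sqrt(n * p * (1 - p))
approx = NormalDist().cdf((observed + 0.5 - mu) / sigma)

print(f"exact:  {exact:.9f}")
print(f"normal: {approx:.9f}")
```

Both printed values should agree with the R output to the digits shown there.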
c) The p-value is very small (about 0.5%, well below the usual 5% cutoff), so reject the null: the wheel is biased against "red".
Problem 2
In a "blind taste test" during a nationally televised football game, each of 100 "loyal Budweiser drinkers" was given two unmarked beer containers and asked to say which one they liked better. One of the containers had Budweiser and the other had Schlitz. Of the participants, 46 said they liked the Schlitz better. Schlitz said this is an impressive showing. But maybe the subjects just couldn't tell one beer from another. Test whether the results are or aren't like tossing a coin, by providing:
a) The null and alternative hypotheses
b) The p-value
c) The conclusion of the test
Solution
a) $$\text{null}: p=0.5$$ $$\text{alternative}: p\neq0.5$$
b) Binomial distribution: $$\sum_{i=0}^{46}C_{100}^{i}\cdot p^i\cdot(1-p)^{100-i}+\sum_{j=54}^{100}C_{100}^{j}\cdot p^j\cdot(1-p)^{100-j}=0.4841184$$ R code:
```r
sum(dbinom(0:46, 100, 0.5)) + sum(dbinom(54:100, 100, 0.5))
# [1] 0.4841184
```
Normal approximation: $$\mu=100\times0.5=50,\ \sigma=\sqrt{100\times0.5\times0.5}=5$$ $$z=\frac{46.5-\mu}{\sigma}=-0.7\rightarrow P\approx2\Phi(z)=0.4839273$$ R code:
```r
n <- 100; p <- 0.5
mu <- n * p; sigma <- sqrt(n * p * (1 - p))
z <- (46.5 - mu) / sigma  # continuity-corrected lower tail
2 * pnorm(z)              # doubled for the two-sided test
# [1] 0.4839273
```
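The same two-sided calculation can be sketched in Python with only the standard library (the helper `pmf` is a hypothetical name). With $n=100$ the binomial terms stay well within floating-point range, so no log-space trick is needed, and the normal approximation doubles the continuity-corrected lower tail by symmetry.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5


def pmf(k):
    """Binomial(100, 0.5) probability of exactly k."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Two-sided: 46 or fewer, or 54 or more (each 4 away from the expected 50).
exact = sum(pmf(k) for k in range(0, 47)) + sum(pmf(k) for k in range(54, n + 1))

# Normal approximation: continuity-corrected lower tail, doubled by symmetry.
mu, sigma = n * p, sqrt(n * p * (1 - p))
z = (46.5 - mu) / sigma
approx = 2 * NormalDist().cdf(z)

print(round(exact, 7), round(approx, 7))
```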
c) The p-value is large (about 48%, far above the cutoff), so stay with the null: the results look like tossing a coin.
Problem 3
I have a bag of 100 M&M's (known as Smarties in some countries; students who recognize neither should think of them as colored pieces of candy). I think 20 of them are red, and my friend thinks that more than 20 are red. In order to decide between these two hypotheses, we are going to take a simple random sample of 40 of the M&M's from the bag. If more than 10 M&M's in the sample are red, we'll choose my friend's hypothesis; otherwise we'll choose mine.
a) State the null hypothesis being tested.
b) The significance level of the test is exactly (pick one option and fill in the blanks):
(i) binomial n = _____, p = ___, k in the range _______.
(ii) hypergeometric n = ____, G = _____, N = _______, g in the range _____.
c) Suppose that in fact there are 30 red M&M's in the bag. The power of the test against this alternative is exactly (pick one option and fill in the blanks):
(i) binomial n = _____, p = ___, k in the range _______.
(ii) hypergeometric n = ___, G = _______, N = _________, g in the range ___.
Solution
a) $H_0$: there are 20 red M&M's in the bag.
b) The significance level is the probability, under $H_0$, that the test concludes $H_a$ (rejects $H_0$); this is a Type I error, so $P$ should be small. Hypergeometric distribution with $N=100$, $G=20$, $n=40$, $g=11,\dots,20$: $$P=\frac{\sum_{g=11}^{20}C_{20}^{g}\cdot C_{80}^{40-g}}{C_{100}^{40}}=0.1017439$$ R code:
```r
# dhyper(x, m, n, k): m red and n non-red in the bag, k drawn
sum(dhyper(11:20, 20, 80, 40))
# [1] 0.1017439
```
c) The power is the probability, under $H_a$, that the test concludes $H_a$; this is a correct conclusion, so $P$ should be large. Hypergeometric distribution with $N=100$, $G=30$, $n=40$, $g=11,\dots,30$: $$P=\frac{\sum_{g=11}^{30}C_{30}^{g}\cdot C_{70}^{40-g}}{C_{100}^{40}}=0.7466689$$ R code:
```r
sum(dhyper(11:30, 30, 70, 40))
# [1] 0.7466689
```
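Both hypergeometric tail sums can be checked with exact integer arithmetic in Python. `hyper_upper_tail` is a hypothetical helper written for this note; it mirrors `sum(dhyper(...))` in R, with $N$, $G$, $n$ named as in the solution above.

```python
from math import comb


def hyper_upper_tail(g_min, N, G, n):
    """P(at least g_min good items in a simple random sample of n drawn from
    N items, G of them good); like sum(dhyper(g_min:G, G, N - G, n)) in R."""
    favorable = sum(comb(G, g) * comb(N - G, n - g)
                    for g in range(g_min, min(G, n) + 1))
    return favorable / comb(N, n)  # exact integers until this one division


N, n, g_min = 100, 40, 11  # choose the friend's hypothesis if more than 10 red

significance = hyper_upper_tail(g_min, N, G=20, n=n)  # under H0: 20 red
power = hyper_upper_tail(g_min, N, G=30, n=n)         # under the alternative: 30 red

print(f"significance level: {significance:.7f}")
print(f"power:              {power:.7f}")
```

Keeping every count as a Python integer until the final division avoids any rounding in the binomial coefficients themselves.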
Problem 4
In order to test whether or not a random number generator is producing the digit "0" in the correct proportion (1/10), the generator will be run 5,000 times. You can assume the runs are mutually independent and that each has the same probability p of producing "0". Construct a test that has a significance level of approximately 1%. [Note: "Construct a test" means "come up with a decision rule." In the context of this problem, this means you have to say how you would use the number of 0's among your 5,000 results to decide between your hypotheses.]
Solution
$$H_0: p=0.1,\ H_a: p\ne0.1$$ A significance level of 1% means that the chance of concluding $H_a$ when $H_0$ is true is 1%. Under $H_0$, by the normal approximation: $$n=5000,\ p=0.1\rightarrow\mu=n\cdot p=500,\ \sigma=\sqrt{n\cdot p\cdot(1-p)}=\sqrt{450}\approx21.21$$ This is a two-tailed test, so each tail gets probability 0.005, giving $z=\pm2.575829$. R code:
```r
qnorm(1 - 0.005)
# [1] 2.575829
```
Thus, the cutoffs are $\mu\pm z\cdot\sigma\approx[445.3584,\ 554.6416]$. R code:
```r
n <- 5000; p <- 0.1
z <- qnorm(1 - 0.005)
mu <- n * p; sigma <- sqrt(n * p * (1 - p))
mu - z * sigma
# [1] 445.3584
mu + z * sigma
# [1] 554.6416
```
The test: choose $H_a$ if the number of 0's is 445 or less, or 555 or more; otherwise, stay with $H_0$.
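A small Python sketch (standard library only; `binom_pmf` is a hypothetical helper) can reproduce the cutoffs and also check, by an exact binomial sum, what significance level the rounded rule "445 or less, or 555 or more" actually has. Because the count is discrete and slightly skewed, it should come out close to, but not exactly, 1%.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

n, p = 5000, 0.1
mu, sigma = n * p, sqrt(n * p * (1 - p))

# Two tails of 0.5% each; inv_cdf(0.995) corresponds to qnorm(0.995) in R.
z = NormalDist().inv_cdf(1 - 0.005)
low, high = mu - z * sigma, mu + z * sigma


def binom_pmf(k):
    """P(X = k) for X ~ Binomial(5000, 0.1), computed in log space."""
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))


# Exact chance under H0 that the rule "445 or less, or 555 or more" picks H_a.
level = (sum(binom_pmf(k) for k in range(0, 446))
         + sum(binom_pmf(k) for k in range(555, n + 1)))

print(f"cutoffs: {low:.4f}, {high:.4f}")
print(f"exact significance level of the rounded rule: {level:.4f}")
```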
Exercise Set 2
If a problem asks for an approximation, please use the methods described in the video lecture segments. Unless the problem says otherwise, please give answers correct to one decimal place according to those methods. Some of the problems below are about simple random samples. If the population size isn't given, you can assume that the correction factor for standard errors is close enough to 1 that it doesn't need to be computed. Please use the 5% cutoff for p-values unless otherwise instructed in the problem.
Problem 1
A die is rolled 600 times. The face with six spots appears 112 times. Is the die biased towards the face with six spots, or is this just chance variation? Answer the question in the steps outlined in Problems 1a-1f.
1a The null hypothesis is
A. The die is biased towards the face with six spots.
B. The chance the face with six spots appears is greater than 1/6, and the face appeared 112 times in the sample just by chance.
C. The chance the face with six spots appears is equal to 1/6, and the face appeared 112 times in the sample just by chance.
D. The die is biased towards the faces that don't show six spots.
E. The chance the face with six spots appears is equal to 112/600.
F. The proportion of times the face with six spots appears is equal to 112/600.
1b The alternative hypothesis is
A. The die is biased.
B. The chance the face with six spots appears is greater than 1/6.
C. The chance the face with six spots appears is equal to 112/600.
D. The proportion of times the face with six spots appears is equal to 112/600.
1c If the null hypothesis were true, the expected number of times the face with six spots appeared would be (choose one): 112, 100
1d If the null hypothesis were true, the standard error of the number of times the face with six spots appeared would be _____.
1e The p-value of the test is _____%. [Careful to enter your answer as a percent; if your answer is 50%, you should enter "50" in the blank, not 50%, nor 0.5, etc.]
1f "The test concludes that the die is biased towards the face with six spots." True / False
Solution
1A) C is correct. $$H_0: p=\frac{1}{6}$$
1B) B is correct. $$H_a: p>\frac{1}{6}$$
1C) $$n\cdot p=600\times\frac{1}{6}=100$$
1D) $$\sigma=\sqrt{n\cdot p\cdot(1-p)}=\sqrt{600\times\frac{1}{6}\times\frac{5}{6}}=9.128709$$
1E) Binomial distribution: $$\sum_{k=112}^{600}C_{600}^{k}\cdot p^k\cdot(1-p)^{600-k}=10.50586\%$$ R code:
```r
sum(dbinom(112:600, 600, 1/6))
# [1] 0.1050586
```
1F) The p-value (about 10.5%) is above the 5% cutoff, so we stay with $H_0$ rather than choosing $H_a$. The answer is False.
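Parts 1c-1f can be reproduced together in a short Python sketch (standard library only; `binom_pmf` is a hypothetical helper evaluated in log space). The 5% cutoff is the one the exercise set asks for.

```python
from math import exp, lgamma, log, sqrt

n, p, observed = 600, 1 / 6, 112

expected = n * p               # 1c
se = sqrt(n * p * (1 - p))     # 1d


def binom_pmf(k):
    """P(X = k) for X ~ Binomial(600, 1/6), computed in log space."""
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))


# 1e: one-sided p-value, the chance of 112 or more sixes from a fair die.
p_value = sum(binom_pmf(k) for k in range(observed, n + 1))

# 1f: compare with the 5% cutoff used throughout the exercise set.
decision = "choose H_a" if p_value < 0.05 else "stay with H_0"

print(expected, round(se, 6), round(100 * p_value, 5), decision)
```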
Problem 2
A statistics student hands each of 300 classmates two cookies side by side on a plate. Of the students, 171 choose the cookie on their right-hand side, and the remaining 129 choose the cookie on their left. The student says, "That's just like tossing a coin." The student's friend says, "No, it's not." Help them settle their argument by performing a one-sample z test in Problems 2a-2c.
2a The test should be: one-tailed / two-tailed.
2b The p-value of the test is _____%.
2c The test concludes: "That's just like tossing a coin." / "No, it's not."
Solution
2A) $$H_0: p=0.5,\ H_a: p\neq0.5$$ The friend only claims the choices are not like a coin, in either direction, so the test is two-tailed.
2B) Binomial distribution, $n=300$, $p=0.5$, $k=0,\dots,129$ and $k=171,\dots,300$: $$\sum_{i=0}^{129}C_{300}^{i}\cdot p^i\cdot(1-p)^{300-i}+\sum_{j=171}^{300}C_{300}^{j}\cdot p^j\cdot(1-p)^{300-j}=1.777934\%$$ R code:
```r
sum(dbinom(0:129, 300, 0.5)) + sum(dbinom(171:300, 300, 0.5))
# [1] 0.01777934
```
2C) The p-value (about 1.8%) is below the 5% cutoff, so reject $H_0$: "No, it's not" like tossing a coin.
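The problem asks for a one-sample z test while the solution uses the exact binomial sum, so the Python sketch below (standard library only; `pmf` is a hypothetical helper) computes both. The z statistic is taken without a continuity correction, which is my assumption rather than something stated in the problem; both p-values fall below 5%, so the conclusion does not depend on that choice.

```python
from math import comb, sqrt
from statistics import NormalDist

n, right = 300, 171
left = n - right  # 129


def pmf(k):
    """Binomial(300, 0.5) probability of exactly k, as an exact ratio of integers."""
    return comb(n, k) / 2**n


# Exact two-sided p-value: 129 or fewer, or 171 or more, pick the right-hand cookie.
exact = sum(pmf(k) for k in range(0, left + 1)) + sum(pmf(k) for k in range(right, n + 1))

# One-sample z test on the count; no continuity correction (an assumption).
mu, se = n * 0.5, sqrt(n * 0.5 * 0.5)
z = (right - mu) / se
z_p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(exact, 8), round(z, 4), round(z_p_value, 4))
```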
Problem 3
There are two boxes, each with several million tickets marked "1" or "0". The boxes have the same number of tickets, but in one of the boxes 49% of the tickets are marked "1", and in the other box 50.5% of the tickets are marked "1". Someone hands me one of the boxes but doesn't tell me which box it is. Consider the following hypotheses: null: p = 0.49; alternative: p = 0.505. Here is my proposed test: I would draw a simple random sample of 10,000 tickets, and if 5,000 or more of them are marked "1", then I'll choose the alternative; otherwise I'll stay with the null.
3a The significance level of my test is _____%.
3b The power of my test is _____%.
Solution
3A) The significance level is the probability, under $H_0$, that the test concludes $H_a$. Under $H_0$, $p=0.49$, so by the binomial distribution with $n=10000$: $$\sum_{k=5000}^{10000}C_{10000}^{k}\cdot0.49^k\cdot0.51^{10000-k}=2.328171\%$$ R code:
```r
sum(dbinom(5000:10000, 10000, 0.49))
# [1] 0.02328171
```
3B) The power is the probability, under $H_a$, that the test concludes $H_a$. Under $H_a$, $p=0.505$, so by the binomial distribution: $$\sum_{k=5000}^{10000}C_{10000}^{k}\cdot0.505^k\cdot0.495^{10000-k}=84.37643\%$$ R code: `sum(dbinom(5000:10000, 10000, 0.505))`, which returns 0.8437643.
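A Python sketch (standard library only; `upper_tail` is a hypothetical helper) that evaluates both tail sums term by term in log space, since $C_{10000}^{5000}$ is far too large for a float and $0.49^{5000}$ underflows to 0.0:

```python
from math import exp, lgamma, log

n, cutoff = 10_000, 5_000


def upper_tail(p):
    """P(X >= 5000) for X ~ Binomial(10000, p); each pmf term is built in log
    space because C(10000, k) overflows a float and 0.49**5000 underflows."""
    def log_pmf(k):
        return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                + k * log(p) + (n - k) * log(1 - p))
    return sum(exp(log_pmf(k)) for k in range(cutoff, n + 1))


significance = upper_tail(0.49)  # H0 true (p = 0.49), test chooses H_a
power = upper_tail(0.505)        # H_a true (p = 0.505), test chooses H_a

print(f"significance: {100 * significance:.5f}%  power: {100 * power:.5f}%")
```

The same decision rule feeds both calls; only the assumed value of p changes, which is exactly the difference between significance level and power.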
Problem 4
There are 21,000 students in a Statistics MOOC. Each student tests the fairness of a coin (yes, the same coin; the instructor somehow gets Tyche's help in getting the coin to all the students in turn). Specifically, each student tests: null: p is equal to 0.5; alternative: p is not equal to 0.5, using the 5% cutoff. Suppose that, unknown to the students, the coin is in fact fair. The expected number of students whose test would conclude that the coin is unfair is __________. [This answer is actually an integer, cleanly calculated; but I'll allow you an error of ±5.]
Solution
The chance that a student makes the wrong conclusion under the null is 5%; that is what the cutoff represents. So the number of students who make the wrong conclusion under the null is binomial with $n=21000$, $p=0.05$, and its expected value is $21000\times0.05=1050$.
University of California, Berkeley Stat2.3x Statistical Inference Study Note: Section 2, Testing Statistical Hypotheses