University of California, Berkeley Stat2.3x Inference (Statistical Inference) Study Note: Section 5, Window to a Wider World

The Stat2.3x Inference (statistical inference) course was taught on the edX platform by the University of California, Berkeley in 2014.

Summary

Chi-Square Test

  • Goodness of fit: does the sample look like a random sample from the model (is the model good or not)?
    • $$H_0: \text{good model}$$ $$H_a: \text{not a good model}$$
    • Calculate the expected counts from the expected proportions under the model.
    • The $\chi^2$ statistic is $$\chi^2=\sum{\frac{(O-E)^2}{E}}$$ where $O$ are the observed counts and $E$ are the expected counts.
    • The degrees of freedom are the number of categories minus one.
    • Under $H_0$ the statistic approximately follows the $\chi^2$ distribution, so we can calculate its p-value with the R function:
      1 - pchisq(chi, df)
  • Independent or not
    • $$H_0: \text{independent}$$ $$H_a: \text{not independent}$$
    • Arrange the counts in a contingency table.
    • Under $H_0$, in each cell of the table $$\text{expected count}=\frac{\text{row total}\times\text{column total}}{\text{grand total}}$$ that is, $P(A\cap B)=P(A)\cdot P(B)$ under the independence assumption.
    • The $\chi^2$ statistic is $$\chi^2=\sum{\frac{(O-E)^2}{E}}$$ where $O$ are the observed counts and $E$ are the expected counts.
    • The degrees of freedom are $(\text{rows}-1)\times(\text{columns}-1)$.
    • Under $H_0$ the statistic approximately follows the $\chi^2$ distribution, so we can calculate its p-value with the R function:
      1 - pchisq(chi, df)
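
The goodness-of-fit steps in the first bullet can be sketched in R as a small helper. The function name gof_chisq and its arguments are illustrative assumptions, not part of the course material; the counts and the 9:3:3:1 ratio are borrowed from the plant-genetics exercise later in these notes.

```r
# Hypothetical helper: chi-square goodness-of-fit test computed by hand.
# observed: vector of observed counts; ratio: model proportions (any scale).
gof_chisq <- function(observed, ratio) {
  expected <- sum(observed) * ratio / sum(ratio)   # expected counts under H0
  stat <- sum((observed - expected)^2 / expected)  # chi-square statistic
  df <- length(observed) - 1                       # categories minus one
  p_value <- 1 - pchisq(stat, df)                  # upper-tail probability
  list(expected = expected, statistic = stat, df = df, p_value = p_value)
}

res <- gof_chisq(observed = c(218, 69, 84, 29), ratio = c(9, 3, 3, 1))
res$statistic  # about 2.42
res$p_value    # about 0.49
```

Returning the expected counts alongside the statistic makes it easy to check that no expected count is too small for the chi-square approximation to be reasonable.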

Additional Practice Problems for Week 5

The population is all patients at a large system of hospitals; each sampled patient is classified by the type of room he/she is in, and by his/her level of satisfaction with the care received. The question is whether type of room is independent of level of satisfaction.

1. What are the null and alternative hypotheses?

2. Under the null, what's the estimated expected number of patients in the "shared, somewhat satisfied" cell?

3. Degrees of freedom = ()

4. The chi-square statistic is about 13.8. Roughly what's the p-value, and what is the conclusion of the test?

Solution

1. Null: the variables are independent; Alternative: the variables are not independent.

2. We need to expand the original table:

Thus the estimated expected number of patients in the "shared, somewhat satisfied" cell is $$784\times\frac{322}{784}\times\frac{255}{784}=104.7321$$
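
As a quick check of the arithmetic, the expected count reduces to the product of the cell's two marginal totals divided by the grand total. The variable names below are illustrative; because the original table is not reproduced here, the snippet does not assume which of 322 and 255 is the row total and which is the column total.

```r
grand_total <- 784
margin_a <- 322   # one marginal total of the cell
margin_b <- 255   # the other marginal total of the cell
expected <- margin_a * margin_b / grand_total
expected          # about 104.73
```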

3. The degrees of freedom are $(3-1)\times(3-1)=4$.

4. The p-value is 0.007961505, which is smaller than 0.05, so we reject $H_0$. That is, the conclusion is that the variables are not independent. R code:

1 - pchisq(13.8, 4)
[1] 0.007961505
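
An equivalent way to reach the same decision, sketched here as an alternative rather than the method used in the course, is to compare the statistic with the 5% critical value from qchisq, or to ask pchisq for the upper tail directly. The variable names are illustrative.

```r
chi <- 13.8
df <- 4
critical <- qchisq(0.95, df)                    # 5% critical value, about 9.49
p_value <- pchisq(chi, df, lower.tail = FALSE)  # same as 1 - pchisq(chi, df)
reject <- chi > critical                        # TRUE means reject H0 at the 5% level
c(critical = critical, p_value = p_value)
reject
```

Using lower.tail = FALSE avoids the subtraction from 1, which matters numerically only when the p-value is extremely small.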

Ungraded Exercise Set A

Problem 1

According to a genetics model, plants of a particular species occur in the categories A, B, C, and D in the ratio 9:3:3:1. The categories of different plants are mutually independent. At a lab that grows these plants, 218 are in category A, 69 in category B, 84 in category C, and 29 in category D. Does the model look good? Follow the steps in problems 1A-1F.

1A The null hypothesis is:

A. The model is good.

B. The model isn't good.

C. Too many of the plants are in Category C.

D. The proportion of plants in Category A was expected to be 9/16; the difference in the sample is due to chance.

1B The alternative hypothesis is:

A. The model is good.

B. The model isn't good.

C. Too many of the plants are in Category C.

D. The proportion of plants in Category A was expected to be 9/16; the difference in the sample is due to chance.

1C Under the null, the expected number of plants in Category D is ().

1D The chi-square statistic is closest to

A. 1 B. 1.5 C. 2 D. 2.5 E. 3 F. 3.5 G. 4 H. 4.5

1E Degrees of freedom = ().

1F Based on this test, does the model look good? Yes / No

Solution

1A) The null hypothesis is "The model is good". (A) is correct.

1B) The alternative hypothesis is "The model isn't good". (B) is correct.

1C) The expected number of plants in Category D is $$ (218+69+84+29) \times\frac{1}{9+3+3+1}=25$$

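1D) (D) is correct. With observed counts 218, 69, 84, 29 and expected counts 225, 75, 75, 25 under the 9:3:3:1 model, the statistic is about 2.42, which is closest to 2.5.

R code:

o = c(218, 69, 84, 29)
e = c(225, 75, 75, 25)
chi = sum((o - e)^2 / e); chi
[1] 2.417778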

1E) The degrees of freedom are $4-1=3$.

1F) The p-value is 0.4903339, which is larger than 0.05, so we fail to reject $H_0$. The conclusion is that the model looks good. R code:

1 - pchisq(chi, 3)
[1] 0.4903339
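
The hand calculation can be cross-checked with R's built-in chisq.test, passing the model proportions through its p argument. This is a sketch for verification only; the variable names are illustrative.

```r
o <- c(218, 69, 84, 29)        # observed counts for categories A to D
model_p <- c(9, 3, 3, 1) / 16  # proportions under the 9:3:3:1 model
fit <- chisq.test(o, p = model_p)
fit$expected                   # 225 75 75 25
unname(fit$statistic)          # about 2.42
unname(fit$parameter)          # degrees of freedom: 3
fit$p.value                    # about 0.49
```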

Problem 2

A simple random sample of cars in a city is categorized according to fuel type and place of manufacture.

Are place of manufacture and fuel type independent? Follow the steps in problems 2A-2D.

2A If the variables were independent, the chance that a sampled car is a domestic gasoline-fueled car would be estimated to be about

0.0362 0.0499 0.2775 0.3820 0.5

2B If the variables were independent, the expected number of foreign gas/electric hybrids would be estimated to be (). (Please keep the decimal places; by now you should understand why you should not round off to an integer.)

2C Degrees of freedom = ()

1 2 3 4

2D The chi-square statistic is 0.6716. The test therefore concludes that the variables are: independent / not independent

Solution

2A) Expand the table:

If the variables were independent, then $$P(\text{domestic gasoline})=P(\text{domestic})\cdot P(\text{gasoline})=\frac{215}{511}\times\frac{337}{511}=0.2774767\doteq 0.2775$$

2B) If the variables were independent, then $$511\times P(\text{foreign gas/electric hybrid})=511\times\frac{296}{511}\times\frac{130}{511}=75.30333$$

2C) The degrees of freedom are $(2-1)\times(3-1)=2$.
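
The expected counts for every cell can be obtained at once from the marginal totals. The sketch below assumes the totals used in the solution above (domestic 215, foreign 296, gasoline 337, hybrid 130, grand total 511); the total for the remaining fuel type is not stated in these notes and is inferred here as 511 - 337 - 130. All names are illustrative.

```r
row_totals <- c(domestic = 215, foreign = 296)
# "other" is the third fuel type; its total is inferred, not given in the notes
col_totals <- c(gasoline = 337, other = 511 - 337 - 130, hybrid = 130)
n <- sum(row_totals)                           # grand total, 511
expected <- outer(row_totals, col_totals) / n  # row total * column total / n
round(expected, 4)
expected[1, 1] / n   # domestic gasoline probability, about 0.2775
expected[2, 3]       # foreign hybrid expected count, about 75.30
df <- (length(row_totals) - 1) * (length(col_totals) - 1)
df                   # 2
```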

2D) The p-value is 0.714766, which is larger than 0.05, so we fail to reject $H_0$. That is, the data are consistent with the variables being independent. R code:

1 - pchisq(0.6716, 2)
[1] 0.714766

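We can also calculate the $\chi^2$ statistic with R's built-in function chisq.test():

# Columns: domestic, foreign; first row: gasoline.
# Counts other than 146 and 191 follow from the totals above and the reported statistic.
data = matrix(c(146, 18, 51, 191, 26, 79), ncol = 2)
chisq.test(data)

        Pearson's Chi-squared test

data:  data
X-squared = 0.6716, df = 2, p-value = 0.7148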
