4.4 R1 In which of the following problems is case/control sampling LEAST likely to make a positive impact?
A. Predicting a shopper's gender based on the products they buy
B. Finding predictors for a certain type of cancer
C. Predicting if an e-mail is Spam or not Spam
Correct answer: A
Explanation: Case/control sampling is most effective when the prior probabilities of the classes are very unequal. We expect this to be the case for the cancer and spam problems, but not for the gender problem.
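As a minimal sketch of the idea (not from the course materials; the simulated data and coefficients are illustrative assumptions): when one class is rare, case/control sampling keeps all the cases and only a subsample of controls, producing a balanced training set. A logistic regression fit on that balanced sample recovers the slope, and only the intercept needs the standard correction involving the true and sampled priors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated population: a rare outcome (low prior), one predictor x.
n = 100_000
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(-4.5 + 1.0 * x)))   # true P(y=1|x); intercept makes y=1 rare
y = rng.binomial(1, p)

prior = y.mean()                           # true prior P(y=1): far below 0.5

# Case/control sampling: keep all cases, an equal number of random controls.
cases = np.flatnonzero(y == 1)
controls = rng.choice(np.flatnonzero(y == 0), size=cases.size, replace=False)
sample = np.concatenate([cases, controls])

sample_prior = y[sample].mean()            # exactly 0.5 by construction

# After fitting logistic regression on the balanced sample, the intercept
# correction is:
#   beta0_corrected = beta0_sampled
#                     + np.log(prior / (1 - prior))
#                     - np.log(sample_prior / (1 - sample_prior))
print(prior, sample_prior)
```

For the gender problem the two classes are already roughly balanced, so this subsampling step buys essentially nothing.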
4.5 R1 Suppose in Ad Clicks (a problem where we try to model whether a user will click on a particular ad) it is well known that the majority of the time an ad is shown it will not be clicked. What is another way of saying that?
A. Ad Clicks has a low prior probability.
B. Ad Clicks has a high prior probability.
C. Ad Clicks has a low density.
D. Ad Clicks has a high density.
Correct answer: A
Explanation: Whether or not an ad gets clicked is a qualitative variable, so it does not have a density. The prior probability of a click is low because most ads are not clicked.
4.6 R1 Which of the following is not a linear function in x:
A. f(x) = a + b^2 x
B. The discriminant function from LDA.
C. \delta_k(x) = x\frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log(\pi_k)
D. \text{logit}(P(y = 1 | x)) where P(y = 1 | x) is as in logistic regression
E. P(y = 1 | x) from logistic regression
Correct answer: E
Explanation: P(y = 1 | x) from logistic regression is not linear in x because it involves both an exponential function of x and a ratio. (Note that A is linear in x: b^2 is just a constant coefficient. C and D are linear in x by construction.)
5.1 R2 Which of the following is a reason why test error could be less than training error?
A. By chance, the test set has easier cases than the training set.
B. The model is highly complex, so training error systematically overestimates test error.
C. The model is not very complex, so training error systematically overestimates test error.
Correct answer: A
Explanation: Training error usually underestimates test error when the model is very complex (compared to the training set size), and is a pretty good estimate when the model is not very complex. However, it is always possible that, by chance, the test set contains too few hard-to-predict points, or the training set too many.
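A small simulation sketch of this chance effect (my own illustration, not from the course): even for a very low-complexity model (predicting every observation by the training mean), the test MSE lands below the training MSE on a substantial fraction of random splits, purely because of which points fall on which side of the split.

```python
import numpy as np

rng = np.random.default_rng(1)

# Repeatedly: draw data, split into train/test, fit a mean-only model,
# and count how often test error comes out BELOW training error.
n_reps = 500
times_test_lower = 0
for _ in range(n_reps):
    y = rng.normal(size=60)
    train, test = y[:40], y[40:]
    mu = train.mean()                         # the entire "model"
    train_mse = ((train - mu) ** 2).mean()
    test_mse = ((test - mu) ** 2).mean()
    if test_mse < train_mse:
        times_test_lower += 1

frac = times_test_lower / n_reps
print(frac)  # a substantial fraction of splits, not a rare fluke
```

On average test error exceeds training error here (the mean is fit to the training set), but the split-to-split variability makes the reversal common, which is exactly choice A.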
Stanford Open Class: Statistical Learning