Question 1
Consider the problem of predicting how well a student does in their second year of college/university, given how well they did in their first year. Specifically, let X be equal to the number of "A" grades (including A−, A, and A+ grades) that a student receives in their first year of college (freshman year). We would like to predict the value of Y, which we define as the number of "A" grades they get in their second year (sophomore year).
Questions 1 through 4 use the following training set of a small sample of different students' performances. Here, each row is one training example. Recall that in linear regression, our hypothesis is $h_\theta(x) = \theta_0 + \theta_1 x$, and we use m to denote the number of training examples.
For the training set given above, what is the value of m? In the box below, please enter your answer (which should be a number between 0 and 10).
Answer
m is the number of training examples. In this example, we have m = 4 examples.
4
Question 2
Many substances that can burn (such as gasoline and alcohol) have a chemical structure based on carbon atoms; for this reason they are called hydrocarbons. A chemist wants to understand how the number of carbon atoms in a molecule affects how much energy is released when that molecule combusts (meaning, it is burned). The chemist obtains the dataset below. In the column on the right, "kJ/mol" is the unit measuring the amount of energy released.
You would like to use linear regression ($h_\theta(x) = \theta_0 + \theta_1 x$) to estimate the amount of energy released (y) as a function of the number of carbon atoms (x). Which of the following do you think will be the values you obtain for θ_0 and θ_1? You should be able to select the right answer without actually implementing linear regression.
Answer
Since the released heat (y) decreases as the number of carbon atoms (x) increases, θ_1 has to be negative. θ_0 functions as the offset. Looking at the table, θ_0 should be higher than −1000.
- θ_0 = −1780.0, θ_1 = −530.9
- θ_0 = −569.6, θ_1 = −530.9 (correct)
- θ_0 = −1780.0, θ_1 = 530.9
- θ_0 = −569.6, θ_1 = 530.9
Question explanation
We can give an approximate estimate of the θ_0 and θ_1 values by observing the trend of the data in the training set. We see that the y values decrease quite regularly as the x values increase, so θ_1 must be negative. θ_0 is the value the hypothesis takes when x equals zero, therefore it must be greater than y(1) in order to satisfy the decreasing trend of the data. Among the proposed answers, the one that meets both conditions is $h_\theta(x) = -569.6 - 530.9x$. We can better appreciate these considerations by plotting the training data together with the fitted regression line; the sketch below shows one way to check the fit numerically.
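Since the quiz's data table did not survive extraction, the sketch below uses hypothetical points generated from the answer line $h_\theta(x) = -569.6 - 530.9x$ plus a little noise, just to show how a least-squares fit recovers θ_0 and θ_1 (the x values 3 through 6 and the noise level are assumptions, not the quiz's data):

```python
import numpy as np

# Hypothetical data: sampled from the answer line theta_0 = -569.6,
# theta_1 = -530.9 with small Gaussian noise (the quiz's real table is not shown here).
x = np.array([3.0, 4.0, 5.0, 6.0])                            # number of carbon atoms
rng = np.random.default_rng(0)
y = -569.6 - 530.9 * x + rng.normal(0.0, 5.0, size=x.shape)   # energy released, kJ/mol

# Least-squares fit of h_theta(x) = theta_0 + theta_1 * x;
# np.polyfit returns coefficients from highest degree down.
theta_1, theta_0 = np.polyfit(x, y, deg=1)
print(f"theta_0 ~ {theta_0:.1f}, theta_1 ~ {theta_1:.1f}")
# theta_1 comes out negative, matching the decreasing trend in the table.
```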
Question 3
Suppose we set θ_0 = −1, θ_1 = 0.5. What is h_θ(4)?
Answer
$h_\theta(x) = \theta_0 + \theta_1 x$
$h_\theta(x) = -1 + 0.5x$
$h_\theta(4) = -1 + 0.5 \cdot 4$
$h_\theta(4) = 1$
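The same evaluation as a minimal Python sketch (the function name `h` is just a stand-in for the hypothesis):

```python
def h(x, theta_0=-1.0, theta_1=0.5):
    """Univariate linear regression hypothesis: h_theta(x) = theta_0 + theta_1 * x."""
    return theta_0 + theta_1 * x

print(h(4))  # 1.0
```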
Question 4
Let f be some function so that f(θ_0, θ_1) outputs a number. For this problem, f is some arbitrary/unknown smooth function (not necessarily the cost function of linear regression, so f may have local optima). Suppose we use gradient descent to try to minimize f(θ_0, θ_1) as a function of θ_0 and θ_1. Which of the following statements are true? (Check all that apply.)
Answer
- If θ_0 and θ_1 are initialized so that θ_0 = θ_1, then by symmetry (because we do simultaneous updates to the two parameters), after one iteration of gradient descent we will still have θ_0 = θ_1. (False)
  - The updates to θ_0 and θ_1 are different (even though we're doing simultaneous updates), so there's no particular reason to expect them to be the same after one iteration of gradient descent.
- Setting the learning rate α to be very small is not harmful, and can only speed up the convergence of gradient descent. (False)
  - If the learning rate is small, gradient descent ends up taking an extremely small step on each iteration, so this would actually slow down (rather than speed up) the convergence of the algorithm.
- If the first few iterations of gradient descent cause f(θ_0, θ_1) to increase rather than decrease, then the most likely cause is that we have set the learning rate α to too large a value. (True)
  - If α were small enough, then gradient descent should always successfully take a tiny step downhill and decrease f(θ_0, θ_1) at least a little bit. If gradient descent instead increases the objective value, that means α is too large (or you have a bug in your code!).
- If the learning rate is too small, then gradient descent may take a very long time to converge. (True)
  - If the learning rate is small, gradient descent ends up taking an extremely small step on each iteration, and can therefore take a long time to converge (both learning-rate effects are illustrated in the sketch below).
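A minimal sketch of these learning-rate effects on a simple smooth function; the quadratic bowl f(θ_0, θ_1) = θ_0² + θ_1², the step count, and the three α values are assumptions chosen for illustration, not part of the quiz:

```python
import numpy as np

def grad_f(theta):
    # Gradient of f(theta_0, theta_1) = theta_0**2 + theta_1**2, a simple smooth bowl.
    return 2.0 * theta

def descend(alpha, steps=50):
    theta = np.array([1.0, -1.0])
    for _ in range(steps):
        theta = theta - alpha * grad_f(theta)  # simultaneous update of both parameters
    return float(np.sum(theta**2))             # value of f at the final point

print(descend(alpha=0.01))  # very small alpha: f decreases, but slowly (still far from 0)
print(descend(alpha=0.1))   # moderate alpha: converges close to the minimum at 0
print(descend(alpha=1.1))   # too-large alpha: f blows up instead of decreasing
```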
Question 5
Suppose that for some linear regression problem (say, predicting housing prices as in the lecture) we have some training set, and for our training set we managed to find some θ_0, θ_1 such that J(θ_0, θ_1) = 0. Which of the statements below must then be true? (Check all that apply.)
Answer
- For this to be true, we must have θ_0 = 0 and θ_1 = 0 so that h_θ(x) = 0. (False)
  - If J(θ_0, θ_1) = 0, that means the line defined by the equation y = θ_0 + θ_1 x perfectly fits all of our data. There's no particular reason to expect that the values of θ_0 and θ_1 that achieve this are both 0 (unless y^(i) = 0 for all of our training examples).
- Our training set can be fit perfectly by a straight line, i.e., all of our training examples lie perfectly on some straight line. (True)
  - If J(θ_0, θ_1) = 0, that means the line defined by the equation y = θ_0 + θ_1 x perfectly fits all of our data.
- For this to be true, we must have y^(i) = 0 for every value of i = 1, 2, ..., m. (False)
  - So long as all of our training examples lie on a straight line, we will be able to find θ_0 and θ_1 so that J(θ_0, θ_1) = 0. It is not necessary that y^(i) = 0 for all of our examples (see the sketch below).
- We can perfectly predict the value of y even for new examples that we have not yet seen (e.g., we can perfectly predict prices of even new houses that we have not yet seen). (False)
  - Even though we can fit our training set perfectly, this does not mean that we'll always make perfect predictions on houses in the future / on houses we have not yet seen.
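A small sketch of the key point: when all training examples lie exactly on a line, the squared-error cost $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$ is zero for the right parameters even though the y^(i) are nonzero. The data points here are hypothetical:

```python
import numpy as np

def J(theta_0, theta_1, x, y):
    """Squared-error cost of linear regression: (1 / (2m)) * sum((h(x) - y)**2)."""
    m = len(x)
    residuals = theta_0 + theta_1 * x - y
    return residuals @ residuals / (2 * m)

# Hypothetical training set lying exactly on the line y = 2 + 3x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 + 3.0 * x

print(J(2.0, 3.0, x, y))  # 0.0: a perfect fit, even though no y^(i) is 0
print(J(0.0, 0.0, x, y))  # 26.75: theta_0 = theta_1 = 0 does not give J = 0 here
```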
"Coursera-machine learning" Linear regression with one Variable-quiz