This is the popular machine learning course on Coursera, taught by Andrew Ng. While studying neural networks I found that my foundation in some basic concepts was weak, so I decided to take this course to fill in the gaps. My current plan is to watch up through the neural network material; I may not watch the later parts.
Of course, while watching I still take notes and do the homework; otherwise it would only be a cursory read. These notes are for myself only, so I won't repeat things I already understand — they are essentially a personal record. If you are interested, go and study the "Machine Learning" course carefully yourself.
Below are the points from the first week that I think deserve extra attention.
1. What the hypothesis and the cost function are each "a function of"
Suppose the function to be fitted is h(x) = θ1x + θ0, where x is an input sample point and the θi are the parameters to be learned; an expression of this form is called the hypothesis. In this course you can hear Andrew Ng read h(x) as "h of x", meaning that h is a function of x: in h(x), the independent variable, i.e. the argument, is the input sample point x.
It should be emphasized that for the hypothesis h(x) the argument is the input sample point x, while for the cost function J(θ) the argument is the parameters to be learned. The "mountain" surface usually shown when explaining gradient descent is obtained by letting the parameters θi take many different values and seeing for which values of θi the whole cost function reaches its lowest point.
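To make the distinction concrete, here is a minimal Python sketch (the data values and variable names are made up for illustration): the hypothesis takes x as its argument, while the squared-error cost function takes θ0 and θ1 as its arguments and is evaluated on a fixed data set.

```python
import numpy as np

def hypothesis(theta0, theta1, x):
    """h(x) = theta1 * x + theta0 -- a function of the input x."""
    return theta1 * x + theta0

def cost(theta0, theta1, x, y):
    """J(theta0, theta1) = (1 / 2m) * sum((h(x_i) - y_i)^2) -- a function of the parameters."""
    m = len(x)
    errors = hypothesis(theta0, theta1, x) - y
    return np.sum(errors ** 2) / (2 * m)

# Toy data (illustrative only): points lying roughly on y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])

print(cost(theta0=1.0, theta1=2.0, x=x, y=y))   # small cost: good fit
print(cost(theta0=0.0, theta1=0.0, x=x, y=y))   # large cost: bad fit
```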
2. Contour Diagram
A contour diagram can be used in place of the three-dimensional surface plot.
For example, for the hypothesis y = θ1x + θ0, which has two parameters (θ1 and θ0), the cost function drawn as a three-dimensional surface looks like this:
But this is how it is expressed in contour diagrams:
Every point on the same ring has the same value of the cost function, so the diagram is equivalent to a top view of the mountain. It is like how a mountain in geography is drawn not as a picture of its shape but as a contour map, which is why the plot is a set of concentric rings.
When the minimum falls at the position of the red cross, the fitted result is the blue line on the left side of the figure; you can see that the θ1 and θ0 at that point correspond to a hypothesis (that blue straight-line equation) which fits well. The location of the red cross can be thought of as the lowest valley of the mountain.
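As an illustration of how such a contour diagram can be produced, here is a minimal matplotlib sketch (it reuses the toy data from the sketch above; the grid ranges and contour levels are arbitrary choices): it evaluates J over a grid of (θ0, θ1) values and draws the rings of equal cost.

```python
import numpy as np
import matplotlib.pyplot as plt

# Same toy data and squared-error cost as above (illustrative values only).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])

def cost(theta0, theta1):
    errors = theta1 * x + theta0 - y
    return np.sum(errors ** 2) / (2 * len(x))

# Evaluate J over a grid of (theta0, theta1) values.
theta0_vals = np.linspace(-5, 7, 100)
theta1_vals = np.linspace(-1, 5, 100)
J = np.array([[cost(t0, t1) for t0 in theta0_vals] for t1 in theta1_vals])

T0, T1 = np.meshgrid(theta0_vals, theta1_vals)
plt.contour(T0, T1, J, levels=np.logspace(-1, 3, 20))  # rings of equal cost
plt.xlabel('theta0')
plt.ylabel('theta1')
plt.title('Contour plot of J(theta0, theta1)')
plt.show()
```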
3. Gradient descent updates all parameters simultaneously
Let me just explain it with the slide.
The slide above makes it clear at a glance what a simultaneous (synchronous) update is and what a non-simultaneous update is.
The same applies to the many weights of a neural network: you don't change one parameter, then compute the next gradient, then change the next parameter. Instead, backpropagation (BP) computes all the gradient values in a single pass, and then every parameter of the entire network is updated simultaneously.
The reason is obvious: look at the non-simultaneous update on the right side of the figure. It first changes θ0, then uses the already-updated θ0 to compute the gradient for θ1 and updates θ1 with it, which clearly isn't what we want.
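Here is a minimal Python sketch of the difference (the toy data and learning rate are chosen arbitrarily for illustration): the simultaneous version computes both gradients from the same point and stores the results in temporaries before assigning them, while the non-simultaneous version overwrites θ0 before θ1's gradient is computed.

```python
import numpy as np

# Toy data (illustrative only) and an arbitrary learning rate.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])
alpha = 0.01

def gradients(theta0, theta1):
    """Partial derivatives of J with respect to theta0 and theta1."""
    errors = theta1 * x + theta0 - y
    return np.mean(errors), np.mean(errors * x)

def simultaneous_step(theta0, theta1):
    # Correct: both gradients come from the SAME (theta0, theta1);
    # the results go into temporaries and are assigned together.
    d0, d1 = gradients(theta0, theta1)
    temp0 = theta0 - alpha * d0
    temp1 = theta1 - alpha * d1
    return temp0, temp1

def non_simultaneous_step(theta0, theta1):
    # Incorrect: theta0 is overwritten first, so the gradient for theta1
    # is computed at a point that is no longer the original one.
    d0, _ = gradients(theta0, theta1)
    theta0 = theta0 - alpha * d0
    _, d1 = gradients(theta0, theta1)   # uses the already-updated theta0
    theta1 = theta1 - alpha * d1
    return theta0, theta1

print(simultaneous_step(0.0, 0.0))
print(non_simultaneous_step(0.0, 0.0))
```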
(recorded in 2016.6.28)
By Yau Wang Nanshan
Coursera Course "Machine learning" study notes (WEEK1)