Model Representation
Ng's video uses a house-price example, a data set relating the house area x to the price y:
Area (x) | Price (y)
2104     | 460
1416     | 232
1534     | 315
852      | 178
...      | ...
Here we define:
m: the number of training samples (m = 4 samples are visible in the table above)
x^{(i)}: the i-th input variable/feature; with multiple input variables, x^{(i)} is a vector
y^{(i)}: the output variable (the price) of the i-th training example
…divided by 2 on top of the mean of the squared errors. The equation that measures the deviation is called the cost function: the smaller the deviation, the lower the value of the cost function and the better the fit.

4. How do I train a model? The purpose of training is to achieve a good fit, that is, to make the value of the cost function as small as possible. Training here means choosing a set of coefficients θ (once the model is determined, its parameters are the coefficients θ) that brings the cost function to its minimum.
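For reference, the standard formulas this paragraph paraphrases, the linear hypothesis and the squared-error cost function:

h_\theta(x) = \theta_0 + \theta_1 x

J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2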
Just finished watching all the videos of this course. Thank you, Andrew, for elaborating all the basic ML concepts and algorithms in an easy-to-understand way. I watched most of the course videos on BART, and unfortunately I didn't have a chance to work on the programming assignments, but just following the videos helps a ton. All the topics are so well organized and internally related. I've got so many 'aha' moments, and…
…the cost function and the derivative for each parameter when using it. We implement the costFunction ourselves and pass in the corresponding parameters; it can return the following two values at a time: the value of the cost function (jVal) and the gradient for each parameter.
For example, call the fminunc() function and use @ to pass in a handle to the costFunction. Along with the initialized theta, you can also pass in options ('GradObj', 'on' means "turn on the gradient objective", that is, we will provide the gradient for this function ourselves):
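A minimal Octave sketch of this pattern, using the course's toy quadratic cost whose minimum is at theta = [5; 5]; save the function as costFunction.m:

    % costFunction.m -- returns both the cost jVal and its gradient
    function [jVal, gradient] = costFunction(theta)
      jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;   % toy cost J(theta)
      gradient = zeros(2, 1);                       % one entry per parameter
      gradient(1) = 2 * (theta(1) - 5);             % dJ/dtheta(1)
      gradient(2) = 2 * (theta(2) - 5);             % dJ/dtheta(2)
    endfunction

Then, at the Octave prompt:

    % 'GradObj','on' tells fminunc that costFunction supplies the gradient itself
    options = optimset('GradObj', 'on', 'MaxIter', 100);
    initialTheta = zeros(2, 1);
    [optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);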
6.7 Multi-class classification
…updated, until a final value of each θ_j is obtained. The full derivative is calculated as follows:

\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

④ Vectorized representation of the hypothesis function, the cost function, and the gradient descent algorithm.

The hypothesis function in vector form:

h_\theta(x) = X\theta

The cost function:

J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)

The gradient descent update for θ in vector form:

\theta := \theta - \frac{\alpha}{m} X^T (X\theta - y)

(There was an error in the original formula: the expression after the first equals sign should not be divided by m; it is corrected here.)
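As a concrete check of the vectorized update, one gradient step in Octave on a made-up three-point data set (alpha = 0.1 is an arbitrary learning rate):

    X = [1 1; 1 2; 1 3];        % design matrix, first column of ones
    y = [1; 2; 3];
    m = size(X, 1);             % number of training examples
    alpha = 0.1;                % learning rate (arbitrary choice)
    theta = [0; 0];
    theta = theta - (alpha / m) * X' * (X * theta - y);   % one vectorized step
    % theta is now approximately [0.2; 0.467]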
…for linear regression. We substitute the formula of the cost function J into the gradient descent algorithm, then use partial derivatives to simplify it, and finally we obtain the update formulas below. The detailed derivation requires some knowledge of calculus, but we can use the results directly: the algorithm repeatedly applies these two formulas to revise the values of the two parameters until J reaches a minimum. Now that we have these formulas…
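Written out, the two update rules referred to above (the single-feature case; both parameters are updated simultaneously):

\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)

\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}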
"linear regression, gradient descent"The regular equationThe training features are represented as X-matrices, the results are expressed as Y-vectors, and the linear regression model is still the same, and the loss function is unchanged. Then θ can be derived directly from the following formula:The derivation process involves the knowledge of linear algebra, where the linear algebra knowledge is not expanded in detail.Set m as the number of training samples; x is the independent variable in the
A minimal implementation of costfunctionj consistent with the calls below:

    function J = costfunctionj(X, y, theta)
      m = size(X, 1);               % number of training examples
      predictions = X * theta;      % h(x) for every example
      sqrErrors = (predictions - y) .^ 2;
      J = 1 / (2 * m) * sum(sqrErrors);
    endfunction
Initialize the matrices for the preceding data set and call the function to compute the value of the cost function.
    X = [1 1; 1 2; 1 3];
    y = [1; 2; 3];
    theta = [0; 1];        % theta = (0, 1), so h(x) = x and the cost is 0
    J = costfunctionj(X, y, theta)
    J = 0
    theta = [0; 0];        % theta = (0, 0), so h(x) = 0; the data is not fitted at all
    J = costfunctionj(X, y, theta)
    J = 2.3333
    (1^2 + 2^2 + 3^2) / (2*3)    % the same value of the cost function by hand
    ans = 2.3333
…the dual problem of the original (primal) problem. Relative to the primal problem, only the order of min and max is exchanged, and here equality holds. The conditions are as follows:
① each inequality constraint g_i is a convex function (a linear function is convex);
② the equality constraints h_i are affine functions (of the form h(w) = w^T x + b);
③ there exists a w such that g_i(w) < 0 for all i.
Under these conditions, there must exist ω*, α*, β* such that ω* is the solution of the…
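In symbols, this is the standard weak/strong duality statement; the three conditions above (a Slater-style strict feasibility condition among them) turn the inequality into an equality:

d^* = \max_{\alpha, \beta:\, \alpha_i \ge 0} \; \min_{w} \; \mathcal{L}(w, \alpha, \beta) \;\le\; \min_{w} \; \max_{\alpha, \beta:\, \alpha_i \ge 0} \; \mathcal{L}(w, \alpha, \beta) = p^*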
…one category against the rest, obtaining N classifiers. When a prediction is needed, input the data into each classifier and select the class with the largest probability as the output (see the sketch after this summary).

Summary
Logistic regression is built on the basis of linear regression. The model gives, through the sigmoid function, the probability that the output is 1; to apply it, the output should follow a Bernoulli distribution. The gradient descent algorithm is also useful here, and there are some more efficient algorithms. At first, you can use…
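A tiny Octave sketch of this decision rule; the probabilities are made-up outputs of three hypothetical one-vs-all classifiers:

    probs = [0.10; 0.85; 0.30];   % sigmoid outputs of N = 3 classifiers for one input
    [pmax, label] = max(probs);   % choose the class with the largest probability
    % label = 2 is returned as the prediction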
Skip the use of Octave for now; come back to it later when it is needed.

Week Three:
Logistic Regression: for 0/1 classification.
Hypothesis representation: the sigmoid function, also called the logistic function.
Decision boundary: theta's transpose times x >= 0 gives the boundary; non-linear decision boundaries are possible by constructing polynomials of x.
Cost function.
Simplified cost function and gradient descent: because y takes only two values, the two cases merge into a single formula; then take the partial derivative (the denominator should be ignored).
Advanced optimization: conjugate gradient, BFGS, L-BFGS…
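Written out, the standard formulas these notes point to, the sigmoid hypothesis and the merged cost:

h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]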