This page collects excerpts from notes on cost functions and related topics from Stanford's Machine Learning course on Coursera.
these matrices: the superscripted Θ^(j) is a weight matrix that controls the mapping from layer j to layer j+1 (from the first layer to the second, or from the second to the third). The first hidden unit computes its value like this: a^(2)_1 equals the sigmoid (S-shaped) function, also called the logistic activation function, applied to a linear combination of the inputs. The second hidden unit equals the sigmoid function applied to its own linear combination. The parameter matrix controls the mapping from the first layer to the second.
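The hidden-unit computation above can be sketched in a few lines of NumPy. The layer sizes and weights here are hypothetical, chosen only for illustration:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) activation function g(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: 3 inputs plus a bias term, 2 hidden units.
x = np.array([1.0, 0.5, -1.2, 0.3])   # x[0] = 1 is the bias term
Theta1 = np.random.randn(2, 4)        # weight matrix Theta^(1): layer 1 -> layer 2

# a^(2) = g(Theta^(1) x): each hidden unit applies the sigmoid
# to a linear combination of the inputs.
a2 = sigmoid(Theta1 @ x)
```

Each entry of `a2` is one hidden unit's activation, always strictly between 0 and 1.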
Chapters II through IV use linear regression. Chapter II covers simple linear regression (SLR, a single variable), chapter III covers the basics of linear algebra, and chapter IV covers multivariate regression (more than one independent variable).
The purpose of this article is to implement some of the algorithms that appear in chapter II. It is suitable for readers who have already completed this chapter of the Stanford course. I am just a beginner and…
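As a starting point, simple (single-variable) linear regression from chapter II can be fitted in closed form with ordinary least squares. This is a minimal sketch, not the article's own implementation; the example data is made up:

```python
import numpy as np

def simple_linear_regression(x, y):
    """Fit y ~ theta0 + theta1 * x by ordinary least squares."""
    x_bar, y_bar = x.mean(), y.mean()
    theta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    theta0 = y_bar - theta1 * x_bar
    return theta0, theta1

# Example: points lying exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1
theta0, theta1 = simple_linear_regression(x, y)
```

On noiseless data like this, the fit recovers the intercept 1 and slope 2 exactly.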
The representation is shown in the figure (image omitted). As you can see, the hypothesis h fits the sample points best and the cost function J attains its minimum; the values of θ0 and θ1 can be read off as the horizontal and vertical coordinates of J's minimum.

5. Gradient Descent
To minimize J, the idea is to change θ and thereby change the value of J. The initial value of θ does not matter here; only the updates are considered. Gradient descent here means…
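The idea of repeatedly changing θ to drive J down can be sketched as batch gradient descent on the single-variable cost J(θ0, θ1) = (1/2m) Σ (θ0 + θ1·x − y)². The learning rate and data below are illustrative choices, not from the original notes:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iters=1000):
    """Minimize J(theta0, theta1) = (1/2m) * sum((theta0 + theta1*x - y)^2)."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0            # initial values are arbitrary
    for _ in range(iters):
        err = theta0 + theta1 * x - y
        # Simultaneous update: step in the negative gradient direction.
        t0 = theta0 - alpha * err.sum() / m
        t1 = theta1 - alpha * (err * x).sum() / m
        theta0, theta1 = t0, t1
    return theta0, theta1

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1
theta0, theta1 = gradient_descent(x, y)
```

With a suitable learning rate, the iterates converge to the same θ0 = 1, θ1 = 2 that least squares gives.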
The mean vector for the image above is: (equation omitted in source)
1.2 Gaussian discriminant analysis model
When we have a classification problem whose input features are continuous random variables, we can apply Gaussian discriminant analysis (GDA): model p(x|y) with a multivariate Gaussian distribution, as follows:
The distributions are written like this:

y ∼ Bernoulli(φ)
x | y = 0 ∼ N(μ0, Σ)
x | y = 1 ∼ N(μ1, Σ)

Here, the parameters of our model are φ, Σ, μ0 and μ1 (note that there are two different mean vectors but only one covariance matrix). Its log-likelihood is…
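Maximizing that log-likelihood gives closed-form estimates: φ is the fraction of positive labels, μ0 and μ1 are the per-class sample means, and Σ is the averaged outer product of each sample's deviation from its own class mean. A minimal sketch, with made-up toy data:

```python
import numpy as np

def gda_fit(X, y):
    """Closed-form MLE for GDA: phi, mu0, mu1, and the shared covariance Sigma."""
    phi = y.mean()                      # estimate of P(y = 1)
    mu0 = X[y == 0].mean(axis=0)        # mean of the negative class
    mu1 = X[y == 1].mean(axis=0)        # mean of the positive class
    # Shared covariance: deviations are taken from each sample's own class mean.
    mu = np.where(y[:, None] == 1, mu1, mu0)
    diff = X - mu
    Sigma = diff.T @ diff / len(y)
    return phi, mu0, mu1, Sigma

# Toy data: two classes clustered around different means.
X = np.array([[0.0, 0.0], [0.2, -0.1], [2.0, 2.0], [2.1, 1.9]])
y = np.array([0, 0, 1, 1])
phi, mu0, mu1, Sigma = gda_fit(X, y)
```

Note the single `Sigma` shared by both classes, matching the "only one covariance matrix" remark above.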
classifier will be severely affected, as shown in the figure (image omitted). To address these two problems, we adjust the optimization problem to:

min over w, b, ξ of (1/2)‖w‖² + C Σᵢ ξᵢ
subject to y⁽ⁱ⁾(wᵀx⁽ⁱ⁾ + b) ≥ 1 − ξᵢ and ξᵢ ≥ 0

Note: when ξᵢ > 1, the point is allowed to be misclassified, and ξᵢ is added to the objective function as a penalty. Applying Lagrangian duality again gives the dual problem. Surprisingly, after adding this L1 regularization term, the only change to the dual is the extra constraint αᵢ ≤ C on each multiplier. Note that the computation of b* also changes (see Platt's paper). The KKT…
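The adjusted primal objective above can be evaluated directly: each slack variable is ξᵢ = max(0, 1 − yᵢ(w·xᵢ + b)), and C weights the total slack against the margin term. This is only a sketch of evaluating the objective for a candidate (w, b), not of solving the optimization; the data and weights are invented for the example:

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Primal soft-margin SVM objective:
    (1/2)||w||^2 + C * sum_i xi_i, where xi_i = max(0, 1 - y_i (w . x_i + b)).
    Labels y are in {-1, +1}; xi_i > 1 corresponds to a misclassified point.
    """
    margins = y * (X @ w + b)
    xi = np.maximum(0.0, 1.0 - margins)    # slack variables
    return 0.5 * np.dot(w, w) + C * xi.sum()

# Toy check: a separating hyperplane with functional margin >= 1 has zero slack,
# so the objective reduces to (1/2)||w||^2.
X = np.array([[2.0, 2.0], [-2.0, -2.0]])
y = np.array([1.0, -1.0])
w = np.array([0.5, 0.5])
obj = soft_margin_objective(w, 0.0, X, y, C=1.0)
```

Here both functional margins equal 2, so every ξᵢ is 0 and the objective is (1/2)‖w‖² = 0.25.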
\( J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x_i) - y_i)^2 \)

To break it apart, it is \( \frac{1}{2} \bar{x} \), where \( \bar{x} \) is the mean of the squares of \( h_\theta(x_i) - y_i \), the difference between the predicted value and the actual value. This function is otherwise called the "squared error function", or "mean squared error". The mean is halved \( (\frac{1}{2}) \) as a convenience for computing the gradient descent updates, since differentiating the square term cancels out the \( \frac{1}{2} \). The following image summarizes this (image omitted).
same. In addition, it is necessary to apply feature scaling to the features before running the gradient descent algorithm.

Some options beyond gradient descent: besides the gradient descent algorithm, there are other algorithms often used to minimize the cost function. They are more complex and more powerful, typically do not require manually selecting a learning rate, and are…
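A common form of feature scaling is standardization: subtract each feature's mean and divide by its standard deviation, so all features are on a comparable scale and gradient descent converges faster. A minimal sketch, with made-up data where one feature is a thousand times larger than the other:

```python
import numpy as np

def feature_scale(X):
    """Standardize each column (feature) to zero mean and unit standard deviation."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

# Features on very different scales, e.g. room count vs. house size.
X = np.array([[1.0, 2000.0],
              [2.0, 3000.0],
              [3.0, 4000.0]])
X_scaled, mu, sigma = feature_scale(X)
```

The returned `mu` and `sigma` should be kept so that new inputs at prediction time can be scaled the same way.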
it is easy to cause overflow. Because x and ln(x) have the same monotonicity, we take the logarithm of both sides. This yields the J(θ) that Andrew gave; the only difference is that Andrew puts a negative coefficient in front, which turns the maximum into a minimum so that the gradient descent algorithm can be used. In fact, the original formula can also do the job; the algorithm simply becomes gradient ascent, which makes no real difference.

Conclusion: here I recommend…
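The point about taking logarithms can be made concrete: multiplying many probabilities underflows, while summing their logs does not, and the leading minus sign converts likelihood maximization into cost minimization. A minimal sketch of the resulting logistic regression cost (the `eps` guard and toy data are my additions, not from the notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y):
    """Negative mean log-likelihood J(theta) for logistic regression.

    Taking the log turns the product of probabilities into a sum (avoiding
    underflow), and the minus sign turns maximization into minimization.
    """
    h = sigmoid(X @ theta)
    eps = 1e-12                          # guard against log(0)
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))

X = np.array([[1.0, 0.0],
              [1.0, 1.0]])              # first column is the bias feature
y = np.array([0.0, 1.0])
theta = np.zeros(2)
cost = logistic_cost(theta, X, y)       # h = 0.5 for every sample
```

With θ = 0, every prediction is 0.5, so the cost equals ln 2 regardless of the labels.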
5.1 Cost Function

Suppose the training set is {(x⁽¹⁾, y⁽¹⁾), (x⁽²⁾, y⁽²⁾), …, (x⁽ᵐ⁾, y⁽ᵐ⁾)}.

L = total number of layers in the network
s_l = number of units (not counting the bias unit) in layer l
K = number of output units/classes

For the example neural network, L = 4, s1 = 3, s2 = 5, s3 = 5, s4 = 4.

Cost function for logistic regression:

J(θ) = −(1/m) Σᵢ [ y⁽ⁱ⁾ log h_θ(x⁽ⁱ⁾) + (1 − y⁽ⁱ⁾) log(1 − h_θ(x⁽ⁱ⁾)) ] + (λ/2m) Σⱼ θⱼ²

The cost function of a neural network:

J(Θ) = −(1/m) Σᵢ Σₖ [ y_k⁽ⁱ⁾ log(h_Θ(x⁽ⁱ⁾))ₖ + (1 − y_k⁽ⁱ⁾) log(1 − (h_Θ(x⁽ⁱ⁾))ₖ) ] + (λ/2m) Σ_l Σᵢ Σⱼ (Θⱼᵢ⁽ˡ⁾)²

5.2 Backpropagation

A popular ex…
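The neural-network cost above can be evaluated directly given the network's outputs: the data term sums the logistic cost over all m samples and K output units, and the regularization term sums the squared weights while skipping each matrix's bias column. A minimal sketch, assuming the outputs h have already been computed by forward propagation:

```python
import numpy as np

def nn_cost(h, Y, Thetas, lam):
    """Regularized neural-network cost.

    h: (m, K) matrix of network outputs (h_Theta(x^(i)))_k.
    Y: (m, K) one-hot matrix of labels.
    Thetas: list of weight matrices Theta^(l); the first column holds bias weights.
    lam: regularization parameter lambda.
    """
    m = Y.shape[0]
    data_term = -np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h)) / m
    # Bias-column weights (j = 0) are not regularized.
    reg_term = lam / (2 * m) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)
    return data_term + reg_term

# Toy check with m = 1 sample, K = 2 outputs, and no regularization:
# both output terms contribute -log(0.5), so the cost is 2 * ln 2.
h = np.array([[0.5, 0.5]])
Y = np.array([[1.0, 0.0]])
cost = nn_cost(h, Y, [np.zeros((2, 3))], lam=0.0)
```

With K = 1 this reduces to the logistic regression cost above, which is the relationship the notes are pointing out.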