Decision Boundary
Last time we introduced a new model, logistic regression. In logistic regression, we predict:
- when $h_\theta(x)$ is greater than or equal to 0.5, predict $y = 1$
- when $h_\theta(x)$ is less than 0.5, predict $y = 0$
Based on these predictions, recall the S-shaped sigmoid function:

$$g(z) = \frac{1}{1 + e^{-z}}$$
According to the graph of this function, we know that:
- when $z = 0$, $g(z) = 0.5$
- when $z > 0$, $g(z) > 0.5$
- when $z < 0$, $g(z) < 0.5$
We also have

$$h_\theta(x) = g(\theta^T x)$$

so:

- when $\theta^T x \ge 0$, the model predicts $y = 1$
- when $\theta^T x < 0$, the model predicts $y = 0$
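As a quick check of these facts, here is a minimal Octave sketch (my own illustration, not code from the original notes; the helper name `sigmoid` is my choice):

```octave
% The sigmoid function g(z) = 1 / (1 + e^(-z))
sigmoid = @(z) 1 ./ (1 + exp(-z));

% The three cases listed above
sigmoid(0)     % = 0.5
sigmoid(2)     % > 0.5, so the model would predict y = 1
sigmoid(-2)    % < 0.5, so the model would predict y = 0
```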
That covers how a logistic regression model makes its predictions. Now suppose we have the model

$$h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$$

and the parameter vector is $\theta = [-3,\ 1,\ 1]^T$. Then the model predicts $y = 1$ whenever $-3 + x_1 + x_2 \ge 0$, that is, whenever $x_1 + x_2 \ge 3$.
We can draw the line $x_1 + x_2 = 3$; this dividing line of our model, called the decision boundary, separates the region where the model predicts $y = 1$ from the region where it predicts $y = 0$.
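To make this concrete, here is a small Octave sketch of this particular model (my own illustration; the sample points are hypothetical):

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

theta = [-3; 1; 1];             % [theta0; theta1; theta2]

% A few hypothetical points on either side of the line x1 + x2 = 3
X = [1 0.5 0.5;                 % x1 + x2 = 1, below the boundary
     1 2.0 2.0;                 % x1 + x2 = 4, above the boundary
     1 1.0 2.0];                % x1 + x2 = 3, exactly on the boundary

h = sigmoid(X * theta);         % h_theta(x) for each point
predictions = (h >= 0.5)        % gives 0, 1, 1: y = 1 exactly when x1 + x2 >= 3
```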
Now suppose our data are distributed in such a way that no straight line can separate the two classes. What kind of model can fit data like that?
For example, the decision boundary might be a circle centered at the origin with radius 1, so that a curve separates the region where $y = 1$ from the region where $y = 0$. To represent such a boundary we need quadratic features:

$$h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2)$$
If the parameter vector is $\theta = [-1,\ 0,\ 0,\ 1,\ 1]^T$, the resulting decision boundary is exactly the circle $x_1^2 + x_2^2 = 1$, centered at the origin with radius 1.
More generally, we can use very complex models to fit decision boundaries of very complex shapes.
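A minimal Octave sketch of the circular case (again my own illustration; the feature-mapping helper and the test points are assumptions, not part of the original text):

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

theta = [-1; 0; 0; 1; 1];                 % [theta0; theta1; theta2; theta3; theta4]

% Map a point (x1, x2) to the quadratic feature vector [1, x1, x2, x1^2, x2^2]
mapFeatures = @(x1, x2) [1, x1, x2, x1.^2, x2.^2];

inside  = mapFeatures(0.3, 0.4);          % x1^2 + x2^2 = 0.25 < 1
outside = mapFeatures(1.0, 1.0);          % x1^2 + x2^2 = 2.00 > 1

sigmoid(inside  * theta) >= 0.5           % 0: inside the circle,  predict y = 0
sigmoid(outside * theta) >= 0.5           % 1: outside the circle, predict y = 1
```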
Cost Function for the Logistic Regression Model
For linear regression, the cost function we defined was the sum of squared errors of the model. In principle we could use the same definition for logistic regression, but the problem is that when we substitute

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$

into a cost function defined this way, the result is a non-convex function.
This means the cost function would have many local minima, which would interfere with the gradient descent algorithm's search for the global minimum.
Therefore, we redefine the cost function of logistic regression as:

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\big(h_\theta(x^{(i)}),\ y^{(i)}\big)$$
where $\mathrm{Cost}\big(h_\theta(x^{(i)}),\ y^{(i)}\big)$ is the cost of a single training example, defined as follows:

$$\mathrm{Cost}\big(h_\theta(x),\ y\big) = \begin{cases} -\log\big(h_\theta(x)\big) & \text{if } y = 1 \\ -\log\big(1 - h_\theta(x)\big) & \text{if } y = 0 \end{cases}$$
Plotting $\mathrm{Cost}\big(h_\theta(x),\ y\big)$ against $h_\theta(x)$ for each value of $y$ shows the relationship between the two. The $\mathrm{Cost}$ function constructed this way has the following properties:
- When the actual $y = 1$ and $h_\theta(x) = 1$, the error is 0; when $y = 1$ but $h_\theta(x) \ne 1$, the error grows larger as $h_\theta(x)$ gets smaller.
- When the actual $y = 0$ and $h_\theta(x) = 0$, the cost is 0; when $y = 0$ but $h_\theta(x) \ne 0$, the error grows larger as $h_\theta(x)$ gets larger.
Simplifying the $\mathrm{Cost}\big(h_\theta(x),\ y\big)$ constructed above gives the following expression:

$$\mathrm{Cost}\big(h_\theta(x),\ y\big) = -\,y \log\big(h_\theta(x)\big) - (1 - y)\log\big(1 - h_\theta(x)\big)$$
This simplification merely combines the two branches of the piecewise $\mathrm{Cost}\big(h_\theta(x),\ y\big)$ above into a single expression.
Substituting the simplified expression back into the cost function gives:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log\big(h_\theta(x^{(i)})\big) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big]$$
This is the cost function of the logistic regression model.
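As a concrete reference, here is a minimal Octave implementation of this cost function (my own sketch of the formula above; it assumes `X` is an m-by-(n+1) design matrix with a leading column of ones, `y` is an m-by-1 vector of 0/1 labels, and `theta` is an (n+1)-by-1 parameter vector):

```octave
function J = logisticCost(theta, X, y)
  % Cost J(theta) for logistic regression, as defined above
  m = length(y);                          % number of training examples
  h = 1 ./ (1 + exp(-X * theta));         % h_theta(x) for every example
  J = -(1 / m) * (y' * log(h) + (1 - y)' * log(1 - h));
end
```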
After obtaining such a cost function, we can use the gradient descent algorithm to find the parameters that minimize it.
Gradient descent algorithm:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) \qquad \text{(simultaneously update all } \theta_j\text{)}$$
Taking the derivative, we get:

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \Big( h_\theta(x^{(i)}) - y^{(i)} \Big)\, x_j^{(i)}$$
* Note: although this update rule looks identical on the surface to the one for linear regression, here $h_\theta(x) = g(\theta^T x)$, which is different from linear regression, so the two algorithms are in fact not the same. In addition, it is still necessary to apply feature scaling before running the gradient descent algorithm.
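The update rule could be implemented along the following lines (a sketch under the same assumptions about `X` and `y` as before; the learning rate `alpha` and the number of iterations are arbitrary inputs chosen by the caller):

```octave
function theta = gradientDescent(theta, X, y, alpha, numIters)
  % Batch gradient descent for the logistic regression cost function
  m = length(y);
  for iter = 1:numIters
    h = 1 ./ (1 + exp(-X * theta));       % current predictions h_theta(x)
    grad = (1 / m) * X' * (h - y);        % (1/m) * sum over i of (h - y) * x_j
    theta = theta - alpha * grad;         % simultaneous update of every theta_j
  end
end
```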
Some options beyond the gradient descent algorithm:
In addition to the gradient descent algorithm, there are other algorithms commonly used to minimize the cost function. They are more sophisticated, typically do not require manually choosing a learning rate, and are often faster than gradient descent. Examples include the conjugate gradient method (Conjugate Gradient), BFGS (Broyden-Fletcher-Goldfarb-Shanno), and L-BFGS (limited-memory BFGS). These algorithms are more complex; if there is interest, we can discuss them further later.
In MATLAB or Octave, there is a built-in minimization function, fminunc. To use it, we need to provide a cost function that also returns the partial derivative with respect to each parameter. Here is an example:
```octave
function [jVal, gradient] = costFunction(theta)
  % Toy cost (theta1 - 5)^2 + (theta2 - 5)^2 and its gradient
  jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;
  gradient = zeros(2, 1);
  gradient(1) = 2 * (theta(1) - 5);    % partial derivative with respect to theta(1)
  gradient(2) = 2 * (theta(2) - 5);    % partial derivative with respect to theta(2)
end

options = optimset('GradObj', 'on', 'MaxIter', 100);   % e.g. allow up to 100 iterations
initialTheta = zeros(2, 1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
```
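If we wanted to minimize the logistic regression cost from earlier instead of this toy quadratic, the same pattern applies. A hedged sketch (the function and variable names, the 400-iteration cap, and the assumed `X`/`y` data are all my own choices, not from the original):

```octave
% Saved as logisticCostFunction.m: cost and gradient in the form fminunc expects
function [J, grad] = logisticCostFunction(theta, X, y)
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));
  J    = -(1 / m) * (y' * log(h) + (1 - y)' * log(1 - h));
  grad = (1 / m) * X' * (h - y);
end
```

Usage would then look like:

```octave
options      = optimset('GradObj', 'on', 'MaxIter', 400);
initialTheta = zeros(size(X, 2), 1);
[optTheta, minCost] = fminunc(@(t) logisticCostFunction(t, X, y), initialTheta, options);
```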
*PS: I have uploaded the MATLAB and Octave code for the machine learning algorithms to my coding.net project; anyone who needs it can contact me.