Machine Learning -- Decision Boundaries and the Cost Function of the Logistic Regression Model

Source: Internet
Author: User

Decision Boundary

Last time we discussed a new model, the logistic regression model (Logistic Regression). In logistic regression, we predict:

    • When hθ(x) is greater than or equal to 0.5, predict y = 1
    • When hθ(x) is less than 0.5, predict y = 0
These predictions are based on the S-shaped sigmoid function

    g(z) = 1 / (1 + e^(-z))

whose graph rises from 0 to 1 and passes through 0.5 at z = 0.
From the graph of the function, we know that:

    • when z = 0, g(z) = 0.5
    • when z > 0, g(z) > 0.5
    • when z < 0, g(z) < 0.5

We also have:

    hθ(x) = g(θ^T x)

So:

    • whenever θ^T x >= 0, hθ(x) >= 0.5 and the model predicts y = 1
    • whenever θ^T x < 0, hθ(x) < 0.5 and the model predicts y = 0
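
As a minimal sketch of the sigmoid and this prediction rule (not from the original article; the sample values of z are made up for illustration), in Octave:

    % Sigmoid function g(z) = 1 / (1 + e^(-z))
    sigmoid = @(z) 1 ./ (1 + exp(-z));

    z = [-2 0 2];
    g = sigmoid(z)             % approx. 0.1192  0.5000  0.8808
    predictions = (g >= 0.5)   % 0  1  1  -> y = 1 is predicted exactly when z >= 0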


That covers how logistic regression makes its predictions. Now suppose we have the model hθ(x) = g(θ0 + θ1·x1 + θ2·x2), and the parameter θ is the vector [-3 1 1]. Then when -3 + x1 + x2 >= 0, that is, when x1 + x2 >= 3, the model predicts y = 1.

We can draw the line x1 + x2 = 3. This is the dividing line of our model, also called the decision boundary, which separates the region predicted as y = 1 from the region predicted as y = 0.
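
The following is a small illustrative sketch, not part of the original article, that checks this model numerically in Octave (the example points are made up):

    sigmoid = @(z) 1 ./ (1 + exp(-z));
    theta = [-3; 1; 1];                  % the parameter vector [-3 1 1]

    % A few example points [x1 x2]; the leading column of ones is the intercept feature x0
    points = [1 1; 2 1; 1 2; 2 2; 3 1];
    X = [ones(size(points, 1), 1) points];

    h = sigmoid(X * theta);              % hypothesis h_theta(x)
    pred = (h >= 0.5)                    % 0 1 1 1 1 -> y = 1 exactly when x1 + x2 >= 3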


Now suppose our data are distributed in a way that no straight line can separate the two classes. What can our model do to fit such data?


Suppose, for example, that the boundary we need is a circle centered at the origin with radius 1, so that this curve separates the y = 1 region from the y = 0 region. Then we need quadratic features:

    hθ(x) = g(θ0 + θ1·x1 + θ2·x2 + θ3·x1^2 + θ4·x2^2)

If the parameter vector is [-1 0 0 1 1], the decision boundary we get is x1^2 + x2^2 = 1, exactly the circle centered at the origin with radius 1.
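
As another illustrative sketch (not from the original), the same numerical check with the quadratic-feature model and θ = [-1 0 0 1 1]:

    sigmoid = @(z) 1 ./ (1 + exp(-z));
    theta = [-1; 0; 0; 1; 1];

    % Feature mapping [1, x1, x2, x1^2, x2^2] for a few made-up test points
    p = [0 0; 0.5 0.5; 1 0; 1 1];
    X = [ones(size(p, 1), 1), p(:, 1), p(:, 2), p(:, 1).^2, p(:, 2).^2];

    h = sigmoid(X * theta);
    pred = (h >= 0.5)        % 0 0 1 1 -> y = 1 exactly when x1^2 + x2^2 >= 1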

By using more complex models, we can fit decision boundaries of very complex shapes.


Cost Function for the Logistic Regression Model

For linear regression models, the cost function we defined was the sum of squared errors of the model:

    J(θ) = (1/2m) Σ_{i=1}^{m} (hθ(x^(i)) - y^(i))^2

In theory we could use the same definition for the logistic regression model, but the problem is that when we substitute

    hθ(x) = 1 / (1 + e^(-θ^T x))

into this cost function, the J(θ) we get is a non-convex function.


This means the cost function would have many local minima, which would interfere with the gradient descent algorithm finding the global minimum.
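
A small sketch (with made-up data, not from the original) that plots this squared-error cost for a one-parameter sigmoid hypothesis; the resulting curve flattens out at both ends rather than being a single convex bowl:

    sigmoid = @(z) 1 ./ (1 + exp(-z));

    % Tiny made-up 1-D dataset and a single-parameter hypothesis h = g(theta * x)
    x = [-2; -1; 1; 3];
    y = [ 1;  0; 1; 0];
    thetas = linspace(-10, 10, 400);
    J = arrayfun(@(t) mean((sigmoid(t * x) - y) .^ 2) / 2, thetas);

    plot(thetas, J);
    xlabel('theta'); ylabel('squared-error cost');   % not a convex, bowl-shaped curve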

Therefore, we redefine the cost function of logistic regression as:

    J(θ) = (1/m) Σ_{i=1}^{m} Cost(hθ(x^(i)), y^(i))

where Cost(hθ(x^(i)), y^(i)) is the cost of a single training example, defined as follows:

    Cost(hθ(x), y) = -log(hθ(x))        if y = 1
    Cost(hθ(x), y) = -log(1 - hθ(x))    if y = 0
Plotted against hθ(x), the two branches of Cost(hθ(x), y) are mirror-image curves: -log(hθ(x)) for y = 1 and -log(1 - hθ(x)) for y = 0.


The Cost(hθ(x), y) function constructed this way has these properties:

When the actual y = 1 and hθ(x) = 1, the cost is 0; when y = 1 but hθ(x) < 1, the cost grows larger as hθ(x) approaches 0.

When the actual y = 0 and hθ(x) = 0, the cost is 0; when y = 0 but hθ(x) > 0, the cost grows larger as hθ(x) approaches 1.
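
A quick numerical sketch of both branches (the hypothesis values are made up for illustration):

    h = [0.01 0.5 0.99];         % some hypothesis outputs h_theta(x)
    cost_y1 = -log(h)            % approx. 4.61  0.69  0.01 -> near 0 only when h is near 1
    cost_y0 = -log(1 - h)        % approx. 0.01  0.69  4.61 -> near 0 only when h is near 0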

Simplifying the constructed Cost(hθ(x), y) gives the following formula:

    Cost(hθ(x), y) = -y · log(hθ(x)) - (1 - y) · log(1 - hθ(x))

This simplification simply combines the two expressions above into one: when y = 1 the second term vanishes, and when y = 0 the first term vanishes.

Substituting this simplification into the cost function, we get:

    J(θ) = -(1/m) Σ_{i=1}^{m} [ y^(i) · log(hθ(x^(i))) + (1 - y^(i)) · log(1 - hθ(x^(i))) ]

This is the cost function of the logistic regression model.
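
A minimal vectorized Octave sketch of this cost function (not from the original; X is assumed to be the m-by-(n+1) design matrix with a leading column of ones and y an m-by-1 vector of 0/1 labels):

    function J = logisticCost(theta, X, y)
      % J(theta) = -(1/m) * sum( y .* log(h) + (1 - y) .* log(1 - h) )
      m = length(y);
      h = 1 ./ (1 + exp(-X * theta));    % hypothesis h_theta(x) for all examples
      J = -(1 / m) * sum(y .* log(h) + (1 - y) .* log(1 - h));
    end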

After obtaining such a cost function, we can use the gradient descent algorithm (Gradient descent) to find the parameters that can minimize the cost function.

Gradient Descent algorithm:

    Repeat {
        θj := θj - α · ∂J(θ)/∂θj
    }   (simultaneously update all θj)

Taking the partial derivatives, we get:

    Repeat {
        θj := θj - α · (1/m) Σ_{i=1}^{m} (hθ(x^(i)) - y^(i)) · xj^(i)
    }   (simultaneously update all θj)
* Note: Although this gradient descent update looks the same on the surface as the one for linear regression, here hθ(x) = g(θ^T x), which differs from linear regression's hθ(x) = θ^T x, so the two algorithms are actually not the same. In addition, it is still necessary to apply feature scaling (Feature Scaling) before running the gradient descent algorithm.
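
An illustrative Octave sketch of this update loop (not from the original; alpha, numIters, X and y are assumptions for illustration):

    function theta = gradientDescent(X, y, theta, alpha, numIters)
      % Batch gradient descent for logistic regression; updates all theta_j simultaneously
      m = length(y);
      for iter = 1:numIters
        h = 1 ./ (1 + exp(-X * theta));            % h_theta(x) = g(theta' * x)
        theta = theta - alpha * (1 / m) * (X' * (h - y));
      end
    end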


Some options beyond the gradient descent algorithm:

In addition to gradient descent, there are other algorithms commonly used to minimize the cost function. They are more sophisticated, typically do not require manually choosing a learning rate, and are often faster than gradient descent. Some examples: the conjugate gradient method (Conjugate Gradient), BFGS (Broyden-Fletcher-Goldfarb-Shanno), and limited-memory BFGS (L-BFGS). These algorithms are more complex; if there is interest, we can discuss them further later.

In MATLAB or Octave there is an unconstrained minimization function, fminunc. To use it, we need to provide the cost function and the partial derivative with respect to each parameter. Here is an example:

    function [jVal, gradient] = costFunction(theta)
      % Example cost function J(theta) = (theta1 - 5)^2 + (theta2 - 5)^2 and its gradient
      jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;
      gradient = zeros(2, 1);
      gradient(1) = 2 * (theta(1) - 5);
      gradient(2) = 2 * (theta(2) - 5);
    end

    options = optimset('GradObj', 'on', 'MaxIter', 100);
    initialTheta = zeros(2, 1);
    [optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
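
As a hedged sketch (not from the original article) of how the logistic regression cost function defined above could be plugged into fminunc; the helper name logisticCostFunction and the variables X and y are assumptions for illustration:

    function [jVal, gradient] = logisticCostFunction(theta, X, y)
      % Logistic regression cost J(theta) and its gradient, vectorized
      m = length(y);
      h = 1 ./ (1 + exp(-X * theta));
      jVal = -(1 / m) * (y' * log(h) + (1 - y)' * log(1 - h));
      gradient = (1 / m) * (X' * (h - y));
    end

    % Usage sketch: X is the m-by-(n+1) design matrix (with a column of ones), y is m-by-1
    options = optimset('GradObj', 'on', 'MaxIter', 400);
    initialTheta = zeros(size(X, 2), 1);
    [optTheta, cost] = fminunc(@(t) logisticCostFunction(t, X, y), initialTheta, options);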

*PS: I have uploaded the MATLAB/Octave code for the machine-learning algorithms to my coding.net project; anyone who needs it is welcome to contact me.
