[Original] Andrew Ng Stanford Machine Learning (6) -- Lecture 6: Logistic Regression

Lecture 6: Logistic Regression

6.1 Classification
6.2 Hypothesis Representation
6.3 Decision Boundary
6.4 Cost Function
6.5 Simplified Cost Function and Gradient Descent
6.6 Advanced Optimization
6.7 Multiclass Classification: One-vs-All

 

Although the name "logistic regression" contains the word regression, it is not a regression algorithm. It is a very powerful classification algorithm, perhaps even the most widely used classification algorithm in the world.

Feature scaling is also applicable to logistic regression.

6.1 Classification

Reference video: 6-1-classification (8 min).mkv

Definition of the binary classification problem: the output variable y can take only two values, y ∈ {0, 1}, where 0 denotes the negative class and 1 denotes the positive class.

6.2 Hypothesis Representation

Reference video: 6-2-hypothesis representation (7 min).mkv

We introduce a new model, logistic regression, whose output always lies between 0 and 1. The hypothesis of the logistic regression model is:

h_θ(x) = g(θ^T x)

where x represents the feature vector and g is the logistic function, also called the sigmoid function:

g(z) = 1 / (1 + e^(-z))

Its curve is an S shape that rises from 0 toward 1 and passes through 0.5 at z = 0.

Given the input x and the chosen parameters θ, h_θ(x) gives the probability that y = 1, i.e. h_θ(x) = P(y = 1 | x; θ); the probability that y = 0 is 1 - h_θ(x).
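A minimal Octave sketch of this hypothesis (the anonymous-function style and the example values of theta and x are illustrative assumptions, not taken from the lecture):

    g = @(z) 1 ./ (1 + exp(-z));    % sigmoid / logistic function
    h = @(theta, x) g(theta' * x);  % hypothesis h_theta(x) = g(theta' * x)

    theta = [-3; 1; 1];             % example parameters
    x = [1; 2; 2];                  % feature vector with a leading 1 for the bias term
    p = h(theta, x)                 % ~0.73: the estimated probability that y = 1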

6.3 Decision Boundary

Reference video: 6-3-decision-boundary (15 min).mkv

The decision boundary is the boundary between the region where we predict y = 1 and the region where we predict y = 0, i.e. the line that separates the area where y = 0 from the area where y = 1. It is determined by the hypothesis function: since g(z) ≥ 0.5 exactly when z ≥ 0, we predict y = 1 whenever θ^T x ≥ 0, and y = 0 otherwise.

Linear decision boundary: with h_θ(x) = g(θ0 + θ1 x1 + θ2 x2) and, for example, θ = (-3, 1, 1), the boundary is the straight line x1 + x2 = 3.

Non-linear decision boundary: with h_θ(x) = g(θ0 + θ1 x1 + θ2 x2 + θ3 x1^2 + θ4 x2^2) and, for example, θ = (-1, 0, 0, 1, 1), the boundary is the circle x1^2 + x2^2 = 1.
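A minimal Octave sketch of the resulting prediction rule, using the linear example above (the helper name predict is an assumption):

    % Predict y = 1 exactly when theta' * x >= 0, i.e. when h_theta(x) >= 0.5.
    predict = @(theta, x) double(theta' * x >= 0);

    theta = [-3; 1; 1];           % decision boundary: x1 + x2 = 3
    predict(theta, [1; 1; 1])     % 0: the point (1, 1) lies below the boundary
    predict(theta, [1; 3; 3])     % 1: the point (3, 3) lies above the boundary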

6.4 Cost Function

Reference video: 6-4-cost-function (11 min).mkv

If we reuse the cost function from linear regression, J(θ) will not be a convex function, and gradient descent can get stuck in one of many local optima.

To fit the parameters θ of the logistic regression model, the cost function is defined as:

Cost(h_θ(x), y) = -log(h_θ(x)) if y = 1
Cost(h_θ(x), y) = -log(1 - h_θ(x)) if y = 0

With this formula, when the prediction agrees with the actual label the cost is 0, and as the prediction moves toward the opposite label the cost becomes infinite.

When y = 1, the curve of Cost(h_θ(x), y) against h_θ(x) falls from infinity at h_θ(x) = 0 to zero at h_θ(x) = 1.

When y = 0, the curve rises from zero at h_θ(x) = 0 to infinity at h_θ(x) = 1.

 

There are the following rules:

Cost(h_θ(x), y) = 0 if h_θ(x) = y

Cost(h_θ(x), y) → ∞ if y = 0 and h_θ(x) → 1

Cost(h_θ(x), y) → ∞ if y = 1 and h_θ(x) → 0
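These rules can be checked numerically with a small Octave sketch (the value 0.99 is an illustrative assumption):

    % Per-example cost: -log(h) when y = 1, -log(1 - h) when y = 0.
    cost = @(h, y) -y .* log(h) - (1 - y) .* log(1 - h);

    cost(0.99, 1)   % ~0.01: confident and correct, near-zero cost
    cost(0.99, 0)   % ~4.6:  confident but wrong, large cost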

 

6.5 Simplified Cost Function and Gradient Descent

Reference video: 6-5-simplified cost function and gradient descent (10 min).mkv

The preceding two formulas can be combined into a single formula (when y equals 0 or 1, only one of the two terms remains):

Cost(h_θ(x), y) = -y log(h_θ(x)) - (1 - y) log(1 - h_θ(x))

The complete cost function is:

J(θ) = -(1/m) Σ_{i=1}^{m} [ y^(i) log(h_θ(x^(i))) + (1 - y^(i)) log(1 - h_θ(x^(i))) ]

A vectorized implementation is:

h = g(Xθ)
J(θ) = (1/m) (-y^T log(h) - (1 - y)^T log(1 - h))

The gradient descent procedure is:

Repeat { θ_j := θ_j - α ∂J(θ)/∂θ_j } (simultaneously updating all θ_j)

Differentiating J(θ) above gives:

∂J(θ)/∂θ_j = (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i)) x_j^(i)

Substituting this into the update rule yields the algorithm:

Repeat { θ_j := θ_j - (α/m) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i)) x_j^(i) }

The gradient descent algorithm above looks identical to the one for linear regression, but it is in fact completely different, because the definition of h_θ(x) has changed: in linear regression h_θ(x) = θ^T x is a linear function, whereas in logistic regression

h_θ(x) = 1 / (1 + e^(-θ^T x))

The vectorized implementation of gradient descent is:

θ := θ - (α/m) X^T (g(Xθ) - y)
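A minimal Octave sketch of the vectorized cost and gradient (saved as costFunction.m), followed by a descent loop; the function name, toy data, and learning rate are assumptions for illustration:

    function [J, grad] = costFunction(theta, X, y)
      % X: m x (n+1) design matrix with a leading column of ones; y: m x 1 labels in {0, 1}.
      m = length(y);
      h = 1 ./ (1 + exp(-X * theta));                         % h = g(X * theta)
      J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));   % cost J(theta)
      grad = (1 / m) * X' * (h - y);                          % gradient of J(theta)
    end

    X = [ones(4, 1), [1; 2; 3; 4]];   % toy design matrix
    y = [0; 0; 1; 1];
    theta = zeros(2, 1);
    alpha = 0.1;                      % learning rate
    for iter = 1:400
      [J, grad] = costFunction(theta, X, y);
      theta = theta - alpha * grad;   % theta := theta - (alpha/m) * X' * (g(X*theta) - y)
    end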

 

6.6 Advanced Optimization

Reference video: 6-6-advanced optimization (14 min).mkv

In addition to gradient descent, there are other algorithms that are often used to minimize the cost function. They are more complex but also more sophisticated: they generally do not require manually picking a learning rate, and they are usually faster than gradient descent. These include conjugate gradient, BFGS (Broyden-Fletcher-Goldfarb-Shanno), and L-BFGS (limited-memory BFGS).

These algorithms contain a clever inner loop called a line search, which automatically tries different learning rates. You only need to supply routines that compute the cost function and its derivatives, and they return the result. They are well suited to large-scale machine learning problems.

They are too complex to implement yourself; instead, call a library routine, for example Octave/MATLAB's unconstrained minimization function fminunc. It uses one of these advanced optimization algorithms and behaves like an enhanced version of gradient descent that automatically selects the learning rate and finds the optimal θ.

To use it, we must provide the cost function and its partial derivative with respect to each parameter. We implement costFunction ourselves; given θ, it returns two values at once: the cost jVal and the gradient vector.

For example, we call fminunc(), using @ to pass a function handle (pointer) to costFunction together with an initial theta, and we add options: setting GradObj to on means "gradient objective parameter on", i.e. we will supply a gradient to this function:
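A sketch of how this looks in Octave, using the lecture's toy example of minimizing J(θ) = (θ1 - 5)^2 + (θ2 - 5)^2 (variable names such as optTheta are illustrative):

    function [jVal, gradient] = costFunction(theta)
      % Cost J(theta) = (theta(1) - 5)^2 + (theta(2) - 5)^2 and its gradient.
      jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;
      gradient = zeros(2, 1);
      gradient(1) = 2 * (theta(1) - 5);   % dJ/dtheta1
      gradient(2) = 2 * (theta(2) - 5);   % dJ/dtheta2
    end

    options = optimset('GradObj', 'on', 'MaxIter', 100);
    initialTheta = zeros(2, 1);
    [optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);

Here fminunc returns optTheta ≈ (5, 5), the minimum of this cost function.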

6.7 Multiclass Classification: One-vs-All

Reference video: 6-7-multiclass classification _ One-vs-All (6 min).mkv

In a multiclass classification problem, y can take the n + 1 possible values {0, 1, ..., n}. The one-vs-all method works as follows (a code sketch follows the list):

(1) Split the problem into n + 1 binary classification problems.

(2) For each class i, train a classifier h_θ^(i)(x) that predicts the probability that y = i.

(3) For a new input, output the class whose classifier gives the highest probability.
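A minimal Octave sketch of the prediction step (the names predictOneVsAll and all_theta follow the course exercises but are assumptions here; each row of all_theta holds the fitted parameters of one class's classifier):

    function p = predictOneVsAll(all_theta, X)
      % all_theta: num_labels x (n+1); X: m x (n+1) design matrix with bias column.
      h = 1 ./ (1 + exp(-X * all_theta'));   % m x num_labels matrix of probabilities
      [~, p] = max(h, [], 2);                % index of the most probable class per example
    end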

Related Terms

decision boundary
loophole (vulnerability)
non-linear
penalize
