Lecture 6: Logistic Regression
6.1 Classification
6.2 Hypothesis Representation
6.3 Decision Boundary
6.4 Cost Function
6.5 Simplified Cost Function and Gradient Descent
6.6 Advanced Optimization
6.7 Multiclass Classification: One-vs-All
Although the name "logistic regression" contains "regression", it is not a regression algorithm. It is a powerful classification algorithm, perhaps the most widely used in the world.
Feature scaling also applies to logistic regression.
6.1 Classification
Reference video: 6-1-classification (8 min).mkv
Definition of the binary classification problem: the output variable y takes only two values, conventionally y ∈ {0, 1}, where 0 denotes the negative class and 1 the positive class.
6.2 Hypothesis Representation
Reference video: 6-2-hypothesis representation (7 min).mkv
We introduce a new model: logistic regression, whose output always lies between 0 and 1. The hypothesis of the logistic regression model is:

h_θ(x) = g(θᵀx)

where x is the feature vector and g is the logistic function, also called the sigmoid function:

g(z) = 1 / (1 + e^(−z))

Its curve is as follows:

Given the input x and the chosen parameters θ, h_θ(x) gives the probability that y = 1, i.e. h_θ(x) = P(y = 1 | x; θ), and the probability that y = 0 is 1 − h_θ(x).
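As a small sketch of this hypothesis (the course itself works in Octave; this plain-Python version and the names `sigmoid` and `hypothesis` are mine):

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta' * x): estimated probability that y = 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)
```

Note that sigmoid(0) = 0.5, and the output approaches 1 for large positive z and 0 for large negative z, so h_θ(x) always stays strictly between 0 and 1.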
6.3 Decision Boundary
Reference video: 6-3-decision boundary (15 min).mkv
The decision boundary is the line that separates the region where the hypothesis predicts y = 1 from the region where it predicts y = 0. It is a property of the hypothesis and its parameters, not of the training set. Since g(z) ≥ 0.5 exactly when z ≥ 0, the model predicts y = 1 whenever θᵀx ≥ 0, so the boundary is the set of points where θᵀx = 0.
Linear decision boundaries:
Non-linear decision boundaries:
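Because g(z) ≥ 0.5 exactly when z ≥ 0, classifying a point only requires the sign of θᵀx. A minimal sketch for a linear boundary (the θ values here are made up for illustration):

```python
def predict(theta, x):
    """Predict 1 where theta' * x >= 0 (i.e. g(theta' * x) >= 0.5), else 0."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if z >= 0 else 0

# Hypothetical parameters giving the linear boundary x1 + x2 = 3
# (the leading 1.0 in each input is the intercept term).
theta = [-3.0, 1.0, 1.0]
```

With these parameters, the point (1, 1) lies on the y = 0 side of the boundary and (2, 2) on the y = 1 side.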
6.4 Cost Function
Reference video: 6-4-cost function (11 min).mkv
If we reused the squared-error cost function from linear regression, J(θ) would not be convex, and gradient descent could get stuck in one of many local optima.
To fit the parameters θ of the logistic regression model, the cost for a single training example is defined as:

Cost(h_θ(x), y) = −log(h_θ(x)) if y = 1
Cost(h_θ(x), y) = −log(1 − h_θ(x)) if y = 0

With this definition, when the prediction matches the actual label the cost is 0, and when the prediction is maximally wrong the cost tends to infinity.
When y = 1, the curve of the cost against h_θ(x) is as follows:
When y = 0, the curve of the cost against h_θ(x) is as follows:
There are the following rules:
Cost(h_θ(x), y) = 0 if h_θ(x) = y
Cost(h_θ(x), y) → ∞ if y = 0 and h_θ(x) → 1
Cost(h_θ(x), y) → ∞ if y = 1 and h_θ(x) → 0
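These rules can be checked numerically with a direct transcription of the two-case cost (the function name `cost` is my own):

```python
import math

def cost(h, y):
    """Per-example cost: -log(h) if y = 1, -log(1 - h) if y = 0."""
    return -math.log(h) if y == 1 else -math.log(1.0 - h)
```

A confident correct prediction costs 0, while a confident wrong prediction (h near 0 when y = 1, or h near 1 when y = 0) incurs an enormous cost.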
6.5 Simplified Cost Function and Gradient Descent
Reference video: 6-5-simplified cost function and gradient descent (10 min).mkv
The two cases above can be combined into a single formula (when y equals 0 or 1, only one of the two terms survives):

Cost(h_θ(x), y) = −y · log(h_θ(x)) − (1 − y) · log(1 − h_θ(x))

The complete cost function is:

J(θ) = −(1/m) Σᵢ₌₁..m [ y⁽ⁱ⁾ log(h_θ(x⁽ⁱ⁾)) + (1 − y⁽ⁱ⁾) log(1 − h_θ(x⁽ⁱ⁾)) ]

A vectorized implementation is:

h = g(Xθ)
J(θ) = (1/m) · ( −yᵀ log(h) − (1 − y)ᵀ log(1 − h) )
The gradient descent update is:

θⱼ := θⱼ − α · ∂J(θ)/∂θⱼ (simultaneously for all j)

Working out the derivative of J(θ) gives:

∂J(θ)/∂θⱼ = (1/m) Σᵢ₌₁..m ( h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾ ) xⱼ⁽ⁱ⁾

Substituting it into the update rule yields the algorithm:

θⱼ := θⱼ − (α/m) Σᵢ₌₁..m ( h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾ ) xⱼ⁽ⁱ⁾
This gradient descent algorithm looks identical to the one for linear regression, but it is in fact completely different, because h_θ(x) has changed: in linear regression h_θ(x) is the linear function θᵀx, whereas in logistic regression it is defined as h_θ(x) = 1 / (1 + e^(−θᵀx)).
The vectorized implementation of the gradient descent update is:

θ := θ − (α/m) · Xᵀ( g(Xθ) − y )
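Putting the vectorized cost and update together, a minimal NumPy sketch (the course works in Octave; this Python translation and the tiny four-example data set are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost(theta, X, y):
    """J(theta) = -(1/m) * (y' log(h) + (1 - y)' log(1 - h)), h = g(X theta)."""
    m = len(y)
    h = sigmoid(X.dot(theta))
    return -(y.dot(np.log(h)) + (1 - y).dot(np.log(1 - h))) / m

def gradient_descent(X, y, alpha=0.1, iters=5000):
    """Repeat theta := theta - (alpha/m) * X' (g(X theta) - y)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        theta -= (alpha / m) * X.T.dot(sigmoid(X.dot(theta)) - y)
    return theta

# Tiny made-up data set: one feature plus an intercept column of ones.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y)
```

After training, J(θ) is well below its value at θ = 0 and all four examples are classified correctly.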
6.6 Advanced Optimization
Reference video: 6-6-advanced optimization (14 min).mkv
Besides gradient descent, other algorithms are often used to minimize the cost function. They are more complex but also more sophisticated: they usually require no manually chosen learning rate and often converge faster than gradient descent. They include the conjugate gradient method, BFGS (Broyden–Fletcher–Goldfarb–Shanno), and L-BFGS (limited-memory BFGS).
These algorithms contain a clever inner loop called a line search, which automatically tries different learning rates. You only need to supply them with routines that compute the cost function and its derivative terms, and they return the result. They are well suited to large-scale machine learning problems.
They are too complex to implement yourself; instead, call a library routine such as MATLAB/Octave's unconstrained minimization function fminunc. It uses one of these advanced optimization algorithms and behaves like an enhanced gradient descent that automatically chooses the learning rate and finds good values of θ.
When using it, we must supply the cost function and the derivative with respect to each parameter: we implement costFunction ourselves, taking the parameter vector theta and returning two values at once, the cost J and the gradient.
For example, call the fminunc() function, using @ to pass in a pointer to the costFunction function, together with an initial theta and an options structure ('GradObj', 'on' means the gradient objective parameter is on, i.e. our function also returns the gradient):
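For readers working in Python rather than Octave, `scipy.optimize.minimize` plays a role analogous to fminunc. This sketch assumes SciPy and NumPy are installed, and the non-separable toy data set is made up; `jac=True` is the counterpart of 'GradObj', 'on', telling the optimizer that our cost function also returns the gradient:

```python
import numpy as np
from scipy.optimize import minimize  # fminunc-like unconstrained optimizer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_function(theta, X, y):
    """Return the two values at once: the cost J and the gradient."""
    m = len(y)
    h = sigmoid(X.dot(theta))
    J = -(y.dot(np.log(h)) + (1 - y).dot(np.log(1 - h))) / m
    grad = X.T.dot(h - y) / m
    return J, grad

# Made-up, non-separable 1-D data with an intercept column.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 1.0],
              [1.0, 2.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

# BFGS picks its own step sizes via line search, like fminunc.
res = minimize(cost_function, np.zeros(2), args=(X, y), method='BFGS', jac=True)
theta = res.x
```

The optimizer drives the cost below its value at θ = 0 without us ever specifying a learning rate.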
6.7 Multiclass Classification: One-vs-All
Reference video: 6-7-multiclass classification_one-vs-all (6 min).mkv
In a multiclass classification problem, y can take the n + 1 values {0, 1, ..., n}. Method:
(1) Split the problem into n + 1 binary classification problems.
(2) For each class i, train a classifier h_θ⁽ⁱ⁾(x) that predicts the probability that y = i.
(3) For a new input, output the class whose classifier gives the highest probability.
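The three steps can be sketched as follows, assuming NumPy; the three-class data set, the plain gradient-descent trainer, and all function names are my own illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, alpha=0.2, iters=10000):
    """Fit one binary logistic classifier by batch gradient descent."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= (alpha / len(y)) * X.T.dot(sigmoid(X.dot(theta)) - y)
    return theta

def one_vs_all(X, y, num_classes):
    """Steps (1)-(2): one classifier per class, relabeling that class as 1."""
    return np.array([train_logistic(X, (y == c).astype(float))
                     for c in range(num_classes)])

def predict_one_vs_all(all_theta, X):
    """Step (3): pick the class whose classifier reports the highest h."""
    return np.argmax(sigmoid(X.dot(all_theta.T)), axis=1)

# Made-up data: three classes near the corners of a triangle
# (first column is the intercept term).
X = np.array([[1.0, 0.0, 0.0], [1.0, 0.5, 0.5],
              [1.0, 4.0, 0.0], [1.0, 3.5, 0.5],
              [1.0, 2.0, 3.0], [1.0, 2.0, 2.5]])
y = np.array([0, 0, 1, 1, 2, 2])
all_theta = one_vs_all(X, y, 3)
```

Each classifier only answers "is it my class or not"; argmax over the three probabilities turns those binary answers into a single multiclass prediction.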
Related Terms
decision boundary
loophole
non-linear
penalize
[Original] Andrew Ng, Stanford Machine Learning (6) -- Lecture 6: Logistic Regression