Introduction
The machine learning section records some of the notes I have taken while learning, covering linear regression, logistic regression, softmax regression, neural networks, and SVM. The main learning material comes from Stanford professor Andrew Ng's Coursera course, along with online courses such as the UFLDL Tutorial and Stanford CS231n, as well as a large number of related online materials (listed later).
Preface
This article mainly introduces the basics of logistic regression. The article is arranged as follows:
1) Logistic regression definition
2) Hypothesis function
3) Decision boundary
4) Cost function
5) Optimization method
Logistic regression definition
In simple terms, logistic regression is a machine learning method for solving binary classification (0 or 1) problems, used to estimate the likelihood of something: for example, the likelihood that a user buys a product, the likelihood that a patient suffers from a disease, or the likelihood that an ad is clicked by a user.
Note that "likelihood" is used here rather than the mathematical "probability": the result of logistic regression is not a probability value in the mathematical sense and cannot be used directly as one. The result is often used in a weighted sum with other feature values, rather than multiplied directly.
So what is the relationship between logistic regression and linear regression?
Logistic regression and linear regression are both generalized linear models. Logistic regression assumes that the dependent variable y follows a Bernoulli distribution, whereas linear regression assumes that it follows a Gaussian distribution.
Logistic regression therefore shares much with linear regression: remove the sigmoid mapping function, and the algorithm is exactly linear regression. One can say that logistic regression rests on the theory of linear regression, but by introducing a nonlinear factor through the sigmoid function it can easily handle the 0/1 classification problem.
Every machine learning algorithm has a mathematical basis, with its own assumptions and corresponding constraints. So if you want to understand machine learning algorithms in depth, you must pick up the math textbooks: statistics, probability theory, calculus, and so on.
Hypothesis function
The hypothesis function of logistic regression is as follows:

h_θ(x) = g(θ^T x) = 1 / (1 + e^(−θ^T x))
The function g(z) = 1 / (1 + e^(−z)) is called the sigmoid function, also known as the logistic function. Its curve is S-shaped: the values lie in the interval (0, 1), and as z moves away from 0 the function value quickly approaches 0 or 1. This property allows us to interpret the output in a probabilistic way.
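As a quick illustration, here is a minimal NumPy sketch of the sigmoid function; the function name and the test values are my own choices:

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)): maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Values of z far from 0 quickly saturate toward 0 or 1:
print(sigmoid(np.array([-10.0, -1.0, 0.0, 1.0, 10.0])))
# -> approximately [4.54e-05, 0.269, 0.5, 0.731, 0.99995]
```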
A machine learning model, in effect, restricts the decision function to a certain set of conditions, and this set of conditions determines the model's hypothesis space. Naturally, we also want this set of restrictions to be simple and reasonable. The assumption made by the logistic regression model is:

P(y = 1 | x; θ) = g(θ^T x) = 1 / (1 + e^(−θ^T x))
The g here is the sigmoid function mentioned above, and the corresponding decision function is:

y* = 1 if P(y = 1 | x) > 0.5, and y* = 0 otherwise
Choosing 0.5 as the threshold is common practice; in applications, different thresholds can be chosen according to the specific situation. If high precision on positive examples is required, choose a larger threshold; if high recall on positive examples is required, choose a smaller one.
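To make the hypothesis and decision functions concrete, here is a small NumPy sketch; the parameter values and helper names are hypothetical, chosen only for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    # h_theta(x) = g(theta^T x): the estimated likelihood that y = 1.
    return sigmoid(np.dot(theta, x))

def predict(theta, x, threshold=0.5):
    # Decision function: output 1 when h_theta(x) exceeds the threshold.
    return 1 if hypothesis(theta, x) > threshold else 0

theta = np.array([-3.0, 1.0, 1.0])       # hypothetical parameters
x = np.array([1.0, 2.0, 2.0])            # x0 = 1 (intercept), x1 = 2, x2 = 2
print(predict(theta, x))                 # 1, since theta^T x = 1 > 0
print(predict(theta, x, threshold=0.9))  # 0: a stricter threshold favors precision
```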
Decision boundary
A decision boundary, also called a decision surface, is a line or surface in n-dimensional space used to separate samples of different categories.
First, look at two images from Andrew Ng's course:
Linear decision boundary:

(figure from the course: two classes separated by a straight line, given by θ^T x = 0)

Nonlinear decision boundary:

(figure from the course: two classes separated by a curved boundary, again given by θ^T x = 0)
The two figures above clearly illustrate what a decision boundary is: the decision boundary is actually an equation, and in logistic regression it is defined by θ^T x = 0.
It is important to understand the difference and the relationship between the hypothesis function and the decision boundary. A decision boundary is a property of the hypothesis function, determined by the parameters of the hypothesis function.
In logistic regression, the hypothesis function (h = g(z)) computes the probability that a sample belongs to a category; the decision function (y* = 1(h > 0.5)) assigns the sample its category; and the decision boundary (θ^T x = 0) is an equation that marks the classification boundary of the model.
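Continuing with the hypothetical parameters θ = [−3, 1, 1] from the sketch above, the decision boundary θ^T x = 0 is the line −3 + x1 + x2 = 0, i.e. x1 + x2 = 3. A tiny check:

```python
import numpy as np

theta = np.array([-3.0, 1.0, 1.0])  # hypothetical parameters

def on_boundary(x1, x2, tol=1e-9):
    # A point lies on the decision boundary when theta^T [1, x1, x2] = 0.
    return abs(theta @ np.array([1.0, x1, x2])) < tol

print(on_boundary(1.0, 2.0))  # True:  1 + 2 = 3, exactly on the boundary
print(on_boundary(2.0, 2.0))  # False: 2 + 2 = 4, so theta^T x > 0 and h > 0.5
```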
Cost function
The cost function used in linear regression is:

J(θ) = (1/(2m)) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

This squared-error cost is easy to understand, but it cannot be used for logistic regression, for the following reason:
If we use this cost, J(θ) becomes a non-convex function of the parameter θ, because in logistic regression h_θ(x) is the sigmoid function introduced above. The resulting function has many local optima; if gradient descent is run on a function like this, there is no guarantee that it converges to the global minimum.
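A small numerical illustration (not a proof; the one-feature toy data are made up): with a sigmoid hypothesis, the squared-error cost violates the midpoint condition for convexity somewhere on a fine grid of θ values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([-5.0, -1.0, 4.0])  # toy inputs
y = np.array([1.0, 0.0, 1.0])    # toy labels

def squared_error_cost(theta):
    h = sigmoid(theta * x)
    return np.mean((h - y) ** 2) / 2.0

thetas = np.linspace(-10.0, 10.0, 2001)
costs = np.array([squared_error_cost(t) for t in thetas])

# A convex function never rises above the chord between its neighbors:
# J(t) <= (J(t - d) + J(t + d)) / 2. Check where this fails.
chord_mid = (costs[:-2] + costs[2:]) / 2.0
print(bool(np.any(costs[1:-1] > chord_mid)))  # True -> J is not convex
```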
Instead, we want our cost function J(θ) to be convex, that is, a single bowl-shaped function with one minimum. If gradient descent is applied to such a function, it is guaranteed to converge to the global minimum. So, since the sigmoid h_θ(x) makes the squared-error J(θ) non-convex, we need to find a different cost function that is convex, which allows us to use a well-understood algorithm such as gradient descent while still being sure of finding the global minimum.
Therefore, we use the following form to compute the cost of a single sample:

Cost(h_θ(x), y) = −log(h_θ(x))       if y = 1
Cost(h_θ(x), y) = −log(1 − h_θ(x))   if y = 0

The cost function in logistic regression is then:

J(θ) = −(1/m) · Σ_{i=1}^{m} [ y^(i) · log(h_θ(x^(i))) + (1 − y^(i)) · log(1 − h_θ(x^(i))) ]
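A minimal NumPy sketch of this cost function, evaluated on a tiny made-up dataset (the first column of X is the intercept term x0 = 1):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # J(theta) = -(1/m) * sum(y*log(h) + (1 - y)*log(1 - h))
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

X = np.array([[1.0, 0.5],
              [1.0, 2.3],
              [1.0, -1.2]])
y = np.array([0.0, 1.0, 0.0])
print(cost(np.zeros(2), X, y))  # log(2) ~ 0.693: at theta = 0, h = 0.5 everywhere
```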
Additional information: extremum and optimization issues
The so-called extremum, simply put, is the largest (or smallest) of a set of comparable quantities. The study of extremum problems has long been regarded as a fascinating subject. Pólya wrote: "Although each person has their own problems, we may note that most of these problems are concerned with maxima or minima. We always want to reach a certain goal at the lowest possible cost, or to achieve the greatest possible result with a given effort, or to do the maximum amount of work within a given time, and of course we want to run the minimum risk. The reason mathematical problems about maxima and minima interest us is that they idealize our everyday problems." (Pólya, Mathematics and Plausible Reasoning, Vol. 1, p. 133.) We will see that many practical problems and mathematical problems can be reduced to various kinds of extremum problems before they can be solved in a unified way.
Optimization Method
In logistic regression, the cost function is still minimized using gradient descent; the complete form is as follows:

repeat until convergence {
    θ_j := θ_j − α · (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x_j^(i)    (simultaneously update all θ_j)
}
Note: for logistic regression and linear regression, the gradient descent algorithm looks identical in form (the parameter update rule appears basically the same), but the two are in fact completely different, because the hypothesis functions h_θ(x) differ. Pay special attention to this.
Its vectorized implementation is as follows:

θ := θ − (α/m) · X^T · (g(Xθ) − y)
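A runnable sketch of this vectorized update on a toy, linearly separable dataset; the learning rate, iteration count, and data are arbitrary illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iterations=5000):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        # theta := theta - (alpha/m) * X^T (g(X theta) - y)
        theta -= (alpha / m) * (X.T @ (sigmoid(X @ theta) - y))
    return theta

# Toy data: first column is the intercept term x0 = 1.
X = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y)
print(sigmoid(X @ theta) > 0.5)  # [False False  True  True]
```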