Stanford Machine Learning Open Course Notes (III): Logistic Regression

Source: Internet
Author: User

Course address: https://class.coursera.org/ml-003/class/index

Instructor: Andrew Ng

1. Classification

Consider a system that predicts whether a patient's tumor is benign or malignant. We can use a value y ∈ {0, 1}: y = 0 means benign and y = 1 means malignant. We collected 8 samples recording the size and nature of each tumor. Plotted on the plane, the points look as follows:


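If we fit a straight line to the data with linear regression, we can choose a threshold of 0.5. Project the point where the line reaches 0.5 onto the tumor-size axis (the blue line): tumors to the left of it are classified as y = 0 (benign) and tumors to the right as y = 1 (malignant). This rule can also be written as:

$$h(x) \ge 0.5 \Rightarrow y = 1, \qquad h(x) < 0.5 \Rightarrow y = 0$$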


However, if one more sample is added, the result may look like this:

Because of the newly added sample point, linear regression produces a line that is shifted to the right. If we keep 0.5 as the threshold, treating points projected to its left as benign and points to its right as malignant, we make errors: it is no longer true for every point that h(x) ≥ 0.5 when y = 1 and h(x) < 0.5 when y = 0. The two points marked in the green box are counterexamples. Linear regression is therefore not suitable for this example, and we need a different kind of function. The logistic regression model is the most common choice; it is more robust and suits a wide range of data distributions.
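The shift can be reproduced with a small sketch. The tumor sizes and labels below are made up for illustration (the original data survives only in the figures), and `fit_line` and `count_errors` are hypothetical helper names. The code fits a least-squares line, finds where it crosses 0.5, and repeats after adding one very large malignant sample.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def count_errors(xs, ys, intercept, slope):
    """Count samples misclassified by the rule: predict 1 when h(x) >= 0.5."""
    return sum(int(intercept + slope * x >= 0.5) != y for x, y in zip(xs, ys))


# Hypothetical data: 8 tumors with sizes 1..8; the four largest are malignant.
sizes = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

b0, b1 = fit_line(sizes, labels)
cut_before = (0.5 - b0) / b1  # tumor size at which h(x) = 0.5
errors_before = count_errors(sizes, labels, b0, b1)

# Add one very large malignant tumor and refit.
sizes_more = sizes + [30]
labels_more = labels + [1]
c0, c1 = fit_line(sizes_more, labels_more)
cut_after = (0.5 - c0) / c1
errors_after = count_errors(sizes_more, labels_more, c0, c1)

print(f"cut before: {cut_before:.2f}, errors: {errors_before}")
print(f"cut after:  {cut_after:.2f}, errors: {errors_after}")
```

With this made-up data the crossing point moves from 4.5 to about 5.55, so the malignant tumor of size 5 ends up on the benign side even though none of the original samples changed.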

2. Hypothesis Representation

Return to the tumor problem above. Would it be less error-prone to express the prediction as a probability? We can let h(x) denote the probability that the patient's tumor is malignant: if the probability is greater than 0.5 the tumor is predicted to be malignant, otherwise benign. Adding a sample as in the earlier figure then no longer causes classification errors. For this we need h(x) to stay within the range [0, 1], which the sigmoid function guarantees:

$$h(x) = g(\theta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}}$$


The expression above is the sigmoid function, and its graph is an S-shaped curve. As z ranges from negative to positive infinity, the function value stays in the interval (0, 1), and the value 0.5 separates negative z from positive z. As a result, adding sample points no longer shifts h(x) in a way that causes misclassification. One more remark: given x and θ, the probabilities that y takes the values 0 and 1 must sum to 1:

$$P(y = 0 \mid x; \theta) + P(y = 1 \mid x; \theta) = 1$$
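A minimal sketch of these properties in Python, using only the standard library. The parameter vector `theta` and the input `x` are hypothetical values chosen for illustration, and the two-branch form of `sigmoid` is just one common way to avoid overflow for large |z|.

```python
import math


def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def hypothesis(theta, x):
    """h(x) = g(theta^T x); x is expected to start with the constant 1."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


# The value is 0.5 at z = 0 and stays strictly between 0 and 1 elsewhere.
for z in (-10, -1, 0, 1, 10):
    print(z, round(sigmoid(z), 4))

# Hypothetical parameters and input, chosen only for illustration.
theta = [-4.5, 1.0]
x = [1.0, 6.0]
p_malignant = hypothesis(theta, x)  # P(y = 1 | x; theta)
p_benign = 1.0 - p_malignant        # P(y = 0 | x; theta)
print(round(p_malignant, 3), round(p_benign, 3))
```

Here θ^T x = 1.5, so h(x) is about 0.818: the model reads this as roughly an 82% chance of y = 1 and an 18% chance of y = 0.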


3. Decision Boundary

By the earlier assumption, y = 1 when h(x) ≥ 0.5 and y = 0 otherwise. Here h(x) is the sigmoid function applied to θ^T x, that is:

$$h(x) = g(\theta^T x) \ge 0.5 \iff \theta^T x \ge 0$$

So we only need to compare θ^T x (written theta' * x in Octave) with 0.

For example, suppose h(x) = g(θ0 + θ1·x1 + θ2·x2) with parameters θ = [-3, 1, 1]^T. To decide the value of y we only need the value of θ^T x:

$$\theta^T x = -3 + x_1 + x_2$$

If -3 + x1 + x2 ≥ 0 then y = 1; otherwise y = 0. In the (x1, x2) coordinate plane, the regions y = 0 and y = 1 are separated by the line x1 + x2 = 3. This dividing line is what "decision boundary" means. The decision boundary can also be nonlinear, as shown below:
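The example with θ = [-3, 1, 1]^T can be checked directly. In this sketch the helper names `score` and `predict` and the test points are assumptions made for illustration; the code shows that thresholding θ^T x at 0 gives the same answer as thresholding h(x) at 0.5.

```python
import math

THETA = [-3.0, 1.0, 1.0]  # parameters from the example above


def score(theta, x1, x2):
    """theta^T x with x = [1, x1, x2]."""
    return theta[0] + theta[1] * x1 + theta[2] * x2


def predict(theta, x1, x2):
    """Predict y = 1 when theta^T x >= 0, which is the same as h(x) >= 0.5."""
    return 1 if score(theta, x1, x2) >= 0 else 0


# Illustrative points on both sides of the line x1 + x2 = 3, and one on it.
for x1, x2 in [(1, 1), (1, 2), (2, 2), (0, 5)]:
    z = score(THETA, x1, x2)
    h = 1.0 / (1.0 + math.exp(-z))
    print((x1, x2), "theta^T x =", z, "h(x) =", round(h, 3),
          "y =", predict(THETA, x1, x2))
```

The point (1, 2) lies exactly on the line x1 + x2 = 3, where h(x) = 0.5; with the rule h(x) ≥ 0.5 it is assigned y = 1.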


We can see that the shape of the boundary depends heavily on the form of h(x).

4. Cost Function

Next we define a cost function for the new h(x) so that we can solve for θ. Its overall form is the same as before:

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\big(h(x^{(i)}), y^{(i)}\big)$$


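Because the sigmoid function has been introduced and y takes only the values 0 and 1, the form of the per-sample cost must change:

$$\mathrm{Cost}\big(h(x), y\big) = \begin{cases} -\log\big(h(x)\big) & \text{if } y = 1 \\ -\log\big(1 - h(x)\big) & \text{if } y = 0 \end{cases}$$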


Analysis shows that when y = 1, the cost is 0 if h(x) = 1, and the cost grows without bound as h(x) approaches 0, as shown:


The case y = 0 is analogous: the cost is 0 if h(x) = 0 and grows without bound as h(x) approaches 1.
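The behavior of the two cases can be seen numerically. This is a small sketch with a hypothetical function name `sample_cost`; it assumes h lies strictly between 0 and 1 on the side where the logarithm is taken, since `math.log(0)` raises an error.

```python
import math


def sample_cost(h, y):
    """Per-sample cost: -log(h) when y == 1, -log(1 - h) when y == 0."""
    if y == 1:
        return -math.log(h)
    return -math.log(1.0 - h)


# y = 1: the cost shrinks to 0 as h approaches 1 and blows up as h approaches 0.
for h in (0.999, 0.9, 0.5, 0.1, 0.001):
    print("y=1", h, round(sample_cost(h, 1), 4))

# y = 0: the mirror image.
for h in (0.001, 0.1, 0.5, 0.9, 0.999):
    print("y=0", h, round(sample_cost(h, 0), 4))
```

A confident wrong prediction (for example h = 0.001 when y = 1) is penalized far more heavily than an uncertain one (h = 0.5).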

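5. Simplified Cost Function and Gradient Descent

In the previous section the cost function was split into two cases. In fact, the two cases can be combined into a single function:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log h(x^{(i)}) + \big(1 - y^{(i)}\big) \log\big(1 - h(x^{(i)})\big) \Big]$$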

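After this modification, we can still use gradient descent to solve for θ:

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \big(h(x^{(i)}) - y^{(i)}\big) x_j^{(i)}$$

The update rule has the same form as in linear regression; the only difference is that h(x) is now the sigmoid of θ^T x rather than θ^T x itself.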
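A minimal batch gradient descent sketch follows, in pure Python. The data set, the learning rate `alpha = 0.1`, and the iteration count are assumptions chosen so that this small example converges; they are not values from the course. The labels deliberately overlap (a malignant tumor of size 4 and a benign one of size 5) so that the minimum of the cost is finite.

```python
import math


def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def h(theta, x):
    """Hypothesis h(x) = g(theta^T x)."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def cost(theta, X, y):
    """J(theta) = -(1/m) * sum(y*log(h) + (1-y)*log(1-h))."""
    eps = 1e-12  # keeps log() away from 0
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(h(theta, xi), eps), 1.0 - eps)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return -total / len(X)


def gradient_descent(X, y, alpha, iterations):
    """Batch gradient descent starting from theta = 0."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        errors = [h(theta, xi) - yi for xi, yi in zip(X, y)]
        grad = [sum(e * xi[j] for e, xi in zip(errors, X)) / m for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]  # simultaneous update
    return theta


# Hypothetical data: tumor sizes 1..8 with overlapping labels.
sizes = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]
X = [[1.0, float(s)] for s in sizes]  # x0 = 1 for the intercept term

initial_cost = cost([0.0, 0.0], X, y)
theta = gradient_descent(X, y, alpha=0.1, iterations=20000)
final_cost = cost(theta, X, y)
boundary = -theta[0] / theta[1]  # tumor size at which h(x) = 0.5

print("cost:", round(initial_cost, 4), "->", round(final_cost, 4))
print("decision boundary near size", round(boundary, 2))
```

Because the features are not scaled, plain gradient descent needs many iterations here, which is one motivation for the faster methods in the next section.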

6. Advanced Optimization

To find the minimum of the cost function, gradient descent is not the only option, just as with linear regression; there are also the other methods listed in the brackets on the left of the figure. The advantages and disadvantages of these methods are as follows:

Which method to use should be decided according to the specific problem.

7. Multi-class Classification: One-vs-All

Sometimes the problem is not as simple as deciding whether a patient's tumor is malignant or benign. For example, we may need to decide whether the weather is sunny, cloudy, rainy, or snowy. A single line can separate two classes, but what about multi-class classification?

There is a simple method: separate out only one class at a time. With several classes, we construct as many decision boundaries, that is, as many h(x):


In the example above, we need three logistic regression classifiers h(x) to distinguish the three classes. To determine the class of a sample point, we compute the three h(x) values, pick the class whose value is largest, and assign the point to that class. The reason is that each h(x) represents the probability that x belongs to the corresponding class, so the class with the highest probability is the most likely one.
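The prediction step can be sketched as follows. The three parameter vectors, the class names, and the input point are hypothetical; in practice each vector would come from training one binary classifier ("this class" versus "everything else").

```python
import math


def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_one_vs_all(thetas, x):
    """Evaluate every classifier's h(x) and return the class with the largest value."""
    scores = {
        label: sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        for label, theta in thetas.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical parameters for three already-trained binary classifiers.
thetas = {
    "class_a": [-1.0, 2.0, -1.0],
    "class_b": [0.5, -1.0, 1.0],
    "class_c": [-2.0, 0.0, 1.0],
}

x = [1.0, 3.0, 1.0]  # x0 = 1, then two features
label, scores = predict_one_vs_all(thetas, x)
print(label, {k: round(v, 3) for k, v in scores.items()})
```

Note that the three values come from independently trained classifiers, so they need not sum to 1; only their ranking is used.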

-----------------------------------------------------------------------

In this lecture, I think the most important thing is the sigmoid function. Starting from the tumor diagnosis example, we found that a binary classification problem can be expressed with probabilities, which is why the sigmoid function was introduced. Of course, the sigmoid function is not only used in logistic regression; it has other important uses as well. A teacher in the machine learning group recommended an article about the sigmoid function. The link is: http://www.guzili.com/?p=45195

