Pattern Recognition Learning Notes (31) -- Linear Regression


1. Supervised learning

Regression algorithms are a common kind of supervised learning algorithm, so before talking about regression, let us first say a few words about supervised learning.

We have already studied many classifier design methods, such as the perceptron, SVM, and so on. Their common feature is that, given samples with class labels, we train a learning machine and then let it correctly classify new, unlabeled samples. Pattern recognition of this kind is supervised pattern recognition; from the learning machine's point of view, it is supervised learning.

To give an example, take Andrew Ng's favorite example of predicting house prices. The supervised learning process can be represented graphically as follows (in the omitted figure, the training set is fed to a learning algorithm, which outputs a hypothesis h that maps a house's features to a predicted price):


In the house price prediction problem, the variable we want to regress, or predict (the house price), is continuous, so we call this learning problem a regression problem. Conversely, if the target variable to be predicted takes only a few discrete values, it is a classification problem.

2. What is linear regression

Sometimes, when we examine the attributes of an object, we find that there is a linear or approximately linear relationship between one attribute variable and one or several others, such as the speed of a car and the number of its wheels, or a person's weight and height. The following figure describes the relationship between age and blood pressure:


The relationship can be approximated as linear. In the (omitted) scatter plot there are many sample data points, and from these data points we work out the relationship between blood pressure and age (marked in the figure by a straight line that approximates the underlying function). Using known data points to solve for the linear relationship between attribute variables in this way is called linear regression.

A regression like this one, which only investigates the influence of age on blood pressure, is a single-variable (univariate) linear regression and is relatively simple. According to professional medical knowledge, however, there are other factors, such as body weight, that also have some relationship with blood pressure. Linear regression in which the variable we care about is related to several independent variables is called multivariate linear regression, and its formal representation is:


$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \varepsilon$

where y is the variable to be regressed, $x_1, \ldots, x_n$ are the independent-variable features that are related to it, $\beta_1, \ldots, \beta_n$ are the corresponding regression coefficients, $\varepsilon$ is the residual of the regression (that is, the error made when estimating y with a linear function of x), and $\beta_0$ is a constant term;

Of course, the independent-variable features above can be either continuous or discrete.
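As a purely illustrative sketch of the model above (the coefficients, feature ranges, and noise level below are made up, not taken from any real dataset):

```python
import numpy as np

# Hypothetical instance of the multivariate linear model
#   y = beta_0 + beta_1 * x_1 + beta_2 * x_2 + epsilon
rng = np.random.default_rng(0)

beta_0 = 80.0                              # constant term (intercept)
beta = np.array([0.9, 0.4])                # regression coefficients for two features

X = rng.uniform(20, 70, size=(100, 2))     # 100 samples, 2 feature columns
epsilon = rng.normal(0.0, 2.0, size=100)   # residual (noise) term
y = beta_0 + X @ beta + epsilon            # the linear model with noise
```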

3. Solving linear regression

Next, to illustrate the details of linear regression, we again use the house price prediction example.

Now we have data relating each house's size and number of rooms to its price, as in the (omitted) table.

Therefore, we have two features, namely the house size and the number of rooms. In general, how well a supervised learning problem can be solved depends on the quality of the selected features; the details of feature selection will be covered later.


In supervised learning we need to construct a hypothesis function h. How? Following the discussion above, if the price has an approximately linear relationship with the house size and the number of rooms, then we can approximate the price with the following linear function h(x) of x:

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$    (1)

where $x_1$ and $x_2$ are the house size and the number of rooms, and $\theta_0, \theta_1, \theta_2$ are the regression coefficients, which parameterize the space of linear mappings from the feature vector x to the regression target y;

With a slight transformation (taking $x_0 = 1$), the formula above can be written as a vector product:

$h_\theta(x) = \sum_{i=0}^{n} \theta_i x_i = \theta^T x$    (2)

where n is the number of features;
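A minimal sketch of the hypothesis in its vectorized form (2), with the feature vector augmented by $x_0 = 1$; the parameter and feature values are made up for illustration:

```python
import numpy as np

def h(theta, x):
    """Hypothesis h_theta(x) = theta^T x, where x[0] is assumed to be 1."""
    return theta @ x

# Hypothetical parameter values for the house price example:
# [intercept, weight for house size, weight for number of rooms]
theta = np.array([50.0, 0.1, 25.0])
x = np.array([1.0, 2104.0, 3.0])   # x0 = 1, size = 2104, rooms = 3
print(h(theta, x))                 # predicted price in made-up units
```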

With this hypothesis function model and a known training data set of m samples $(x^{(i)}, y^{(i)})$, what do we do next? Obviously, we use the training data to estimate the regression coefficients in the hypothesis function. It is reasonable to require that $h(x)$ be close to y, and to measure this we define a loss function:

$J(\theta) = \frac{1}{2}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$    (3)
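A minimal sketch of this loss, assuming X is the design matrix whose rows are the training samples (with the $x_0 = 1$ column included) and y is the vector of targets:

```python
import numpy as np

def cost(theta, X, y):
    """Squared-error loss (3): J(theta) = 1/2 * sum_i (h_theta(x^(i)) - y^(i))^2."""
    residuals = X @ theta - y          # h_theta(x^(i)) - y^(i) for every sample
    return 0.5 * float(residuals @ residuals)
```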

Formula (3) should look familiar to everyone: the most basic way to solve linear regression is least squares, i.e. making the sum of the squared residuals over all samples (which is exactly $J(\theta)$) as small as possible. To carry out the minimization we can use gradient descent (there are of course other methods):

1) First give $\theta$ an initial value;

2) Repeatedly update $\theta$ with the following formula:

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$

where $\alpha$ is the correction step size, called the learning rate in machine learning;

Note: in each iteration, all of the $\theta_j$ are updated simultaneously; the next iteration does not start until every one of them has been updated.

(The omitted figures showed the expanded update rules, the left one for the univariate case and the right one for the multivariate case. Substituting (3) into the update and differentiating gives, for the multivariate case, $\theta_j := \theta_j - \alpha \sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$.)
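Combining the two steps, here is a minimal sketch of batch gradient descent for loss (3); the zero initialization, default learning rate, and iteration count are illustrative assumptions, not prescriptions:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, n_iters=1000):
    """Minimize J(theta) from (3) by batch gradient descent.

    X is the m x (n+1) design matrix whose first column is all ones (x0 = 1);
    every component of theta is updated simultaneously in each iteration.
    """
    theta = np.zeros(X.shape[1])              # step 1: give theta an initial value
    for _ in range(n_iters):
        # dJ/dtheta_j = sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i), as a matrix product
        gradient = X.T @ (X @ theta - y)
        theta = theta - alpha * gradient      # step 2: simultaneous update of all theta_j
    return theta
```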

4. Reasonable selection of learning rate

If the learning rate is too large, the step size is too big: the loss function will not decrease on every iteration, and in the end the algorithm cannot converge. If the learning rate is too small, the step size is too small: progress is slow and convergence is very slow.

According to Andrew Ng in the Stanford ML course, a good way to choose the learning rate is to try candidate values spaced roughly 3x apart, for example:

..., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, ...

That is, if 0.001 converges too slowly, increase it by about 10x and try 0.01; if 0.01 does not work well, take about 3x the smaller value, i.e. 0.003;
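A minimal sketch of that trial procedure, reusing the `cost` and `gradient_descent` sketches above; the training data is made up, and the simple "final loss is finite and lower than the starting loss" check stands in for inspecting the loss curve by hand:

```python
import numpy as np

# Made-up training data: design matrix with an x0 = 1 column, plus targets.
rng = np.random.default_rng(1)
features = rng.uniform(0.0, 1.0, size=(50, 2))
X = np.column_stack([np.ones(50), features])
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0.0, 0.1, size=50)

# Try learning rates roughly 3x apart, from large to small, and keep the
# first (largest) one for which gradient descent actually reduces the loss.
for alpha in [0.3, 0.1, 0.03, 0.01, 0.003, 0.001]:
    theta = gradient_descent(X, y, alpha=alpha, n_iters=100)
    J = cost(theta, X, y)
    if np.isfinite(J) and J < cost(np.zeros(X.shape[1]), X, y):
        print(f"learning rate {alpha} looks usable, J = {J:.4f}")
        break
```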

5. Standardization of features

This issue has been mentioned before. It is a point that beginners easily overlook, so pay attention to the range of values of the independent-variable features. If a feature's values are too large or too small, it can saturate the function so that changes in the feature can no longer be distinguished. Moreover, if a feature with an overly large range is used as input, it tends to end up playing no useful role in the classification; although we can compensate with a very small weight, that creates obvious disparities among the weights, which is also bad for the convergence of the learning process. To avoid this situation, the features need to be standardized. The most common practice is to normalize each feature using its mean and variance:

$x_j := \dfrac{x_j - \mu_j}{\sigma_j}$

where $\mu_j$ is the mean of feature j and $\sigma_j$ is its standard deviation (the square root of the variance);
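A minimal sketch of this normalization; the raw feature values below (house size and number of rooms) are made up for illustration:

```python
import numpy as np

def standardize(X):
    """Scale each feature column to zero mean and unit standard deviation."""
    mu = X.mean(axis=0)        # per-feature mean
    sigma = X.std(axis=0)      # per-feature standard deviation
    return (X - mu) / sigma, mu, sigma

# Made-up raw features: [house size, number of rooms]
X_raw = np.array([[2104.0, 3.0],
                  [1600.0, 3.0],
                  [2400.0, 3.0],
                  [1416.0, 2.0]])
X_norm, mu, sigma = standardize(X_raw)
print(X_norm.mean(axis=0), X_norm.std(axis=0))   # approximately 0 and 1
```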

For the house price prediction example, this means normalizing the house size and the number of rooms each by their own mean and standard deviation before running gradient descent.

This article is based on Andrew Ng's videos for the Stanford machine learning course.


