Machine learning Algorithms Interview-Dictation (5): Regression

Source: Internet
Author: User

This series is meant for job interviews in which the interviewer asks about algorithms, so each algorithm gets only a brief introduction; common interview questions about each algorithm will be added later.

I. Logistic regression

Logistic regression uses the existing data to fit a regression formula for the classification boundary, and then classifies with that formula. It is computationally cheap and easy to implement and understand, but it is prone to underfitting and its classification accuracy may not be high.

Logistic regression can be viewed as a probability estimate. It uses the sigmoid function:

$$h(x) = \sigma(w^T x) = \frac{1}{1 + e^{-w^T x}}, \qquad w = [w_1, w_2, \ldots, w_n]^T$$

After the parameters $[w_1, w_2, \ldots, w_n]$ have been trained on the data, $h(x)$ can be computed for a new sample and compared with 0.5: if it is greater, the sample is assigned to one class; if it is smaller, to the other. The key question is therefore how to train the parameters from the data.
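A minimal sketch of the prediction step in plain Python (standard library only). The weights are made-up values for illustration, `x[0] = 1.0` is assumed to act as an intercept term, and a tie at exactly 0.5 is sent to class 0, which is an arbitrary choice the text does not specify.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so e is in (0, 1) and cannot overflow
    return e / (1.0 + e)


def h(w: list[float], x: list[float]) -> float:
    """h(x) = sigmoid(w^T x), read as the estimated probability of class 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def classify(w: list[float], x: list[float]) -> int:
    """Compare h(x) with 0.5: above -> class 1, otherwise class 0 (a tie goes to 0)."""
    return 1 if h(w, x) > 0.5 else 0


# Hypothetical trained weights: intercept -1.0, slope 2.0.
w = [-1.0, 2.0]
print(h(w, [1.0, 0.5]))         # w^T x = 0, so this prints 0.5
print(classify(w, [1.0, 2.0]))  # w^T x = 3 > 0, so this prints 1
```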

1. Gradient ascent

To find the maximum of a function, the best approach is to search along the direction of its gradient: $w := w + \alpha \nabla_w f(w)$, where $\alpha$ is the step size. For the logistic regression log-likelihood the gradient is $X^T (y - h)$, where $X$ is the data matrix, $y$ the vector of labels, and $h$ the vector of predictions $h(x)$.

Start with all weights equal, compute the training error by comparing the predictions with the known labels, update the weights, and repeat until some stopping condition is met. One caveat: this method is only practical for fairly small data sets, because every weight update uses the entire data set, which makes it very slow on large ones.
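A minimal sketch of batch gradient ascent for logistic regression, standard library only, using the update $w := w + \alpha X^T (y - h)$. It assumes 0/1 labels, a first column of `X` fixed at 1 for the intercept, and a fixed iteration count as the stopping condition; `alpha`, `max_iter`, and the toy data are illustrative assumptions, not tuned values.

```python
import math


def sigmoid(z: float) -> float:
    """Overflow-safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def grad_ascent(X: list[list[float]], y: list[int],
                alpha: float = 0.01, max_iter: int = 500) -> list[float]:
    """Batch gradient ascent on the logistic log-likelihood.

    Each iteration uses the whole data set: h and error are length-m vectors,
    and the update is w <- w + alpha * X^T (y - h).
    """
    m, n = len(X), len(X[0])
    w = [1.0] * n                     # all weights start equal
    for _ in range(max_iter):         # stopping condition: fixed iteration budget
        h = [sigmoid(sum(wj * xj for wj, xj in zip(w, row))) for row in X]
        error = [yi - hi for yi, hi in zip(y, h)]
        for j in range(n):            # j-th component of X^T (y - h)
            w[j] += alpha * sum(X[i][j] * error[i] for i in range(m))
    return w


# Made-up 1-D data; column 0 is the constant intercept feature.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
w = grad_ascent(X, y, alpha=0.1)
preds = [1 if sigmoid(w[0] + w[1] * row[1]) > 0.5 else 0 for row in X]
print(preds)  # [0, 0, 1, 1]
```

Because `error` is computed once per iteration before any weight changes, all components of `w` are updated from the same predictions, which is what "batch" means here.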


2. Stochastic gradient ascent improves on this. Instead of using the entire data set for each update, as gradient ascent does, it updates the weights using a single data point at a time. This essentially removes vector operations: in gradient ascent, $h(x)$ and the error are vectors, whereas in stochastic gradient ascent both become scalars, which greatly increases speed. However, because each update is based on a single sample, the weights fluctuate locally, which can affect the accuracy of the result.
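A minimal sketch of the stochastic version under the same assumptions as batch gradient ascent (0/1 labels, intercept column, standard library only). Samples are visited in their stored order here; `passes` (how many sweeps over the data) and the toy data are illustrative assumptions.

```python
import math


def sigmoid(z: float) -> float:
    """Overflow-safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def stoc_grad_ascent(X: list[list[float]], y: list[int],
                     alpha: float = 0.01, passes: int = 1) -> list[float]:
    """Stochastic gradient ascent: one sample per weight update."""
    n = len(X[0])
    w = [1.0] * n
    for _ in range(passes):
        for row, label in zip(X, y):
            h = sigmoid(sum(wj * xj for wj, xj in zip(w, row)))  # a scalar, not a vector
            error = label - h                                     # also a scalar
            for j in range(n):
                w[j] += alpha * error * row[j]
    return w


X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
w = stoc_grad_ascent(X, y, alpha=0.1, passes=200)
print([1 if sigmoid(w[0] + w[1] * row[1]) > 0.5 else 0 for row in X])  # [0, 0, 1, 1]
```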


3. The fluctuation can be reduced in two ways: first, adjust the step size $\alpha$ at every iteration so that it decreases over time; second, choose the sample used for each weight update at random.
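One way to implement both refinements, sketched in plain Python. The schedule `alpha = 4 / (1 + j + i) + 0.01` is an assumed choice: it shrinks as passes (`j`) and steps (`i`) accumulate, and the `0.01` floor keeps it from reaching zero. Each pass draws every sample once in random order, with a seeded `random.Random` so runs are reproducible; `passes`, `seed`, and the toy data are illustrative.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Overflow-safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def stoc_grad_ascent_improved(X: list[list[float]], y: list[int],
                              passes: int = 150, seed: int = 0) -> list[float]:
    """Stochastic gradient ascent with a decreasing step size and random sample order."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    w = [1.0] * n
    for j in range(passes):
        remaining = list(range(m))          # every sample is used once per pass
        for i in range(m):
            # Assumed schedule: shrinks with j and i, never drops below 0.01.
            alpha = 4.0 / (1.0 + j + i) + 0.01
            k = remaining.pop(rng.randrange(len(remaining)))  # pick a random sample
            h = sigmoid(sum(wt * xt for wt, xt in zip(w, X[k])))
            error = y[k] - h
            for t in range(n):
                w[t] += alpha * error * X[k][t]
    return w


X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
w = stoc_grad_ascent_improved(X, y)
print([1 if sigmoid(w[0] + w[1] * row[1]) > 0.5 else 0 for row in X])  # [0, 0, 1, 1]
```

Sampling without replacement within a pass is a design choice: it still randomizes the order but guarantees that no sample is skipped in any sweep.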


II. Linear regression

The goal of regression is to predict a numeric target value. The simplest approach is to derive a formula for the target from the training data; linear regression fits a straight line to the data and uses it to predict the target value.

The goal is to find the regression coefficients $w$ in

$$y = x^T w$$

Using least squares, i.e. minimizing $\sum_{i=1}^{m} (y_i - x_i^T w)^2$, gives

$$\hat{w} = (X^T X)^{-1} X^T y$$

The result contains a matrix inverse, so you must check that $(X^T X)^{-1}$ actually exists!
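A minimal sketch of least squares in plain Python. Rather than forming $(X^T X)^{-1}$ explicitly, it solves the normal equations $X^T X \hat{w} = X^T y$ by Gaussian elimination, which gives the same $\hat{w}$ whenever the inverse exists; a near-zero pivot is treated as "the inverse does not exist". The helper names, the absolute tolerance `1e-12`, and the toy data are assumptions for this example.

```python
def solve(A: list[list[float]], b: list[float], tol: float = 1e-12) -> list[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Raises ValueError when a pivot is (numerically) zero, i.e. A is singular.
    """
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]   # augmented matrix [A | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))   # largest pivot in column c
        if abs(M[p][c]) < tol:
            raise ValueError("matrix is singular: the inverse does not exist")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                           # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ols(X: list[list[float]], y: list[float]) -> list[float]:
    """Least-squares coefficients w_hat from the normal equations X^T X w = X^T y."""
    n = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(n)] for a in range(n)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(n)]
    return solve(XtX, Xty)


# Made-up data lying exactly on y = 1 + 2x (column 0 is the intercept).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
print(ols(X, y))  # approximately [1.0, 2.0]
```

Solving the system instead of inverting is the usual numerical practice; the check for existence of the inverse happens implicitly at the pivot test.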


III. Locally weighted linear regression

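One serious problem with linear regression is underfitting, because it seeks the unbiased estimator with the minimum mean squared error. Locally weighted linear regression (LWLR) mitigates this by giving each training point a weight according to how close it is to the point being predicted.

The result is:

$$\hat{w} = (X^T W X)^{-1} X^T W y$$

where $W$ is a diagonal matrix of point weights. A common choice is a Gaussian kernel, $W_{ii} = \exp\left(-\frac{\lVert x_i - x \rVert^2}{2k^2}\right)$, where $x$ is the query point and $k$ controls how quickly the weight decays with distance.

This improves the accuracy of the fit but also increases computation: every prediction must compute the distance from the query point to all training samples.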
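A minimal sketch of a single LWLR prediction in plain Python. It assumes a Gaussian kernel for the diagonal weight matrix $W$ in $\hat{w} = (X^T W X)^{-1} X^T W y$, namely $W_{ii} = \exp(-\lVert x_i - x \rVert^2 / (2k^2))$; the bandwidth `k` and the toy data are illustrative. It solves $X^T W X \hat{w} = X^T W y$ by Gaussian elimination, so it also fails when $X^T W X$ is singular, which can happen when `k` is so small that only one point gets non-negligible weight.

```python
import math


def solve(A: list[list[float]], b: list[float], tol: float = 1e-12) -> list[float]:
    """Gaussian elimination with partial pivoting; ValueError if A is singular."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < tol:
            raise ValueError("matrix is singular: the inverse does not exist")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def lwlr_predict(x_query: list[float], X: list[list[float]], y: list[float],
                 k: float = 1.0) -> float:
    """Predict y at x_query with locally weighted linear regression."""
    m, n = len(X), len(X[0])
    # Diagonal of W: Gaussian kernel on the distance to the query point.
    wts = [math.exp(-sum((a - b) ** 2 for a, b in zip(row, x_query)) / (2.0 * k * k))
           for row in X]
    # Normal equations of the weighted problem: (X^T W X) w_hat = X^T W y.
    XtWX = [[sum(wts[i] * X[i][a] * X[i][b] for i in range(m)) for b in range(n)]
            for a in range(n)]
    XtWy = [sum(wts[i] * X[i][a] * y[i] for i in range(m)) for a in range(n)]
    w_hat = solve(XtWX, XtWy)
    return sum(a * b for a, b in zip(x_query, w_hat))   # x^T w_hat


# Made-up curved data y = x^2; a single straight line through all of it underfits.
X = [[1.0, float(x)] for x in range(5)]
y = [float(x * x) for x in range(5)]
print(lwlr_predict([1.0, 2.0], X, y, k=0.5))  # about 4.2; the global line gives 6.0 here
```

Note that the whole training set is kept and re-weighted for every query, which is exactly the extra cost mentioned in the text.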


Linear regression and locally weighted linear regression both work only when the inverse exists. But what if the data have more features than sample points? Then $X^T X$ is not full rank and its inverse does not exist. Ridge regression solves this by replacing $X^T X$ with $X^T X + \lambda I$, which is invertible for any $\lambda > 0$, giving $\hat{w} = (X^T X + \lambda I)^{-1} X^T y$; otherwise the approach is the same as before.
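A minimal sketch of ridge regression in plain Python: it adds `lam` ($\lambda$) to the diagonal of $X^T X$ and solves $(X^T X + \lambda I)\hat{w} = X^T y$ by Gaussian elimination. The value of `lam` is an illustrative assumption; in practice it is tuned (for example by cross-validation), and features are often standardized first. The example has one sample and three features, so the plain least-squares system is singular while the ridge system is not.

```python
def solve(A: list[list[float]], b: list[float], tol: float = 1e-12) -> list[float]:
    """Gaussian elimination with partial pivoting; ValueError if A is singular."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < tol:
            raise ValueError("matrix is singular: the inverse does not exist")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ridge(X: list[list[float]], y: list[float], lam: float = 1.0) -> list[float]:
    """Ridge coefficients: solve (X^T X + lam * I) w = X^T y."""
    n = len(X[0])
    A = [[sum(row[a] * row[b] for row in X) + (lam if a == b else 0.0)
          for b in range(n)] for a in range(n)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(n)]
    return solve(A, Xty)


# One sample, three features: X^T X is 3x3 with rank 1, so it has no inverse.
X = [[1.0, 2.0, 3.0]]
y = [1.0]
print(ridge(X, y, lam=1.0))  # approximately [1/15, 2/15, 3/15]
```

With `lam=0.0` the function reduces to ordinary least squares and raises on this example; larger `lam` shrinks the coefficients toward zero.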

There is also forward stepwise regression: at each step it increases or decreases one weight by a small amount, recomputes the error for the new $w$, and keeps the change only if the error gets smaller.
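A minimal sketch of the procedure as described, in plain Python: weights start at zero, each iteration tries adding and subtracting `eps` to every weight, takes the single change that lowers the residual sum of squares the most, and stops when no change helps. The names, the step `eps`, the iteration cap, and the toy data are illustrative assumptions; this small-increment greedy scheme is often called forward stagewise regression in other texts.

```python
def rss(X: list[list[float]], y: list[float], w: list[float]) -> float:
    """Residual sum of squares of the linear model y ~ X w."""
    return sum((yi - sum(a * b for a, b in zip(row, w))) ** 2 for row, yi in zip(X, y))


def stepwise(X: list[list[float]], y: list[float],
             eps: float = 0.01, max_iter: int = 1000) -> list[float]:
    """Greedy small-step regression: nudge one weight at a time while the error drops."""
    n = len(X[0])
    w = [0.0] * n
    err = rss(X, y, w)
    for _ in range(max_iter):
        best_w, best_err = None, err
        for j in range(n):
            for sign in (-1.0, 1.0):
                w_try = list(w)
                w_try[j] += sign * eps          # nudge one weight by a small amount
                e = rss(X, y, w_try)
                if e < best_err:                # keep only changes that lower the error
                    best_w, best_err = w_try, e
        if best_w is None:                      # no single nudge helps: stop
            break
        w, err = best_w, best_err
    return w


# Made-up data lying exactly on y = 1 + 2x (column 0 is the intercept).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
print(stepwise(X, y, eps=0.01, max_iter=5000))  # close to [1.0, 2.0]
```

The final accuracy is limited by `eps`: the search stops as soon as no step of that size improves the error.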


For a detailed introduction to regression, refer to: http://www.cnblogs.com/jerrylead/archive/2011/03/05/1971867.html


Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
