[Note] Linear Regression & Gradient Descent

I. Summary

Linear regression is a supervised learning algorithm used for numerical prediction of continuous values.

After an initial model is chosen, its parameters are determined from the training set to obtain the final prediction function. Predictions are then made by plugging the independent variables into that function.

II. Basic Process

1. Preliminary modeling: determine the hypothesis function h(x) (used for the final prediction).

2. Build the cost function J(θ) (also called the objective function or loss function), used to evaluate the parameters θ.

3. Solve for the parameters θ: take the partial derivatives (the gradient) of the cost function, then use the gradient descent algorithm to obtain the final θ values.

4. Substitute the final θ values into the hypothesis function to obtain the prediction function.

III. Notation Conventions

x: independent variable, i.e. the feature value

y: dependent variable, i.e. the result

h(x): hypothesis function

J(θ): cost function

n: number of independent variables, i.e. the number of features

m: number of examples in the training set

α: learning rate, i.e. the step size of each gradient descent update

IV. Detailed Process

1. Initial Modeling

Create a hypothesis function based on the characteristics of the training data. Here we create a basic linear function:
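
In the notation above, and with the usual convention that x_0 = 1 for the intercept term, the standard form of such a linear hypothesis is:

    h(x) = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n = \sum_{j=0}^{n} \theta_j x_j = \theta^T x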

Note the last step of the formula: if the parameters and the independent variables of each example are organized into vectors, the hypothesis can also be computed with matrix operations.

By convention, column vectors are used. The parameter vector θ is first transposed into a row vector, which can then form an inner product with the column vector x to produce the result.
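
A minimal sketch of this vectorized computation, using NumPy and made-up example values (not part of the original post):

    import numpy as np

    # hypothetical example: theta and x as vectors, with x[0] = 1 for the intercept term
    theta = np.array([1.0, 0.5])   # [theta0, theta1]
    x = np.array([1.0, 3.0])       # [x0, x1], with x0 = 1

    # h(x) = theta^T x, i.e. the inner product of the two vectors
    h = np.dot(theta, x)
    print(h)  # 2.5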

2. Create the Cost Function

To make the hypothesis function fit the actual data as well as possible, we use the least-squares method (LMS) to build a cost function, substitute the training data into it, and then try to make its value as small as possible.
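
The standard least-squares cost, where the superscript (i) denotes the i-th training example, is:

    J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right)^2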

The factor of 1/2 in front is added purely for convenience in the later derivative computation (it cancels the 2 produced by differentiating the square).

How do we make this value as small as possible?

The method is to take the partial derivatives of the cost function with respect to θ and then use the gradient descent algorithm to converge to a minimum.

3. Solve for the Parameters

The gradient descent algorithm is used to continuously correct the values of θ until convergence.

To determine whether it has converged, any of the following criteria can be used (a code sketch follows this list):

1. The change in θ is no longer significant (the difference between successive iterations is smaller than a given threshold);

2. Substitute the training set into the hypothesis function and watch the change in the sum over all examples (i.e. the cost), until that change is no longer significant;

3. In special cases, the number of iterations can also be capped (a maximum iteration count prevents an endless loop).
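
A minimal sketch of how these three stopping criteria might be combined in code (all names and thresholds here are hypothetical, not from the implementation below):

    # hypothetical convergence check combining the three criteria above
    def converged(theta, theta_prev, cost, cost_prev, count, epsilon=1e-4, loop_max=10000):
        # 1. the change in theta is below the threshold
        if all(abs(t - p) < epsilon for t, p in zip(theta, theta_prev)):
            return True
        # 2. the change in the cost over the training set is below the threshold
        if abs(cost - cost_prev) < epsilon:
            return True
        # 3. the maximum number of iterations has been reached
        return count >= loop_max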

The formula used in the iteration is as follows:
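
The standard gradient descent update, applied to every parameter θ_j simultaneously, is:

    \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)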

α is the learning rate, i.e. the step size of each iteration. If it is too small, convergence is slow; if it is too large, the steps overshoot and the minimum cannot be found reliably. This value needs to be tuned for the actual problem.

To implement this in code, leaving a derivative sitting in the formula will not do, so we first work out the rightmost partial derivative by substituting the cost function:
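
For a single training example, the standard derivation of that partial derivative is:

    \frac{\partial}{\partial \theta_j} J(\theta) = \frac{\partial}{\partial \theta_j} \, \frac{1}{2} \left( h(x) - y \right)^2 = \left( h(x) - y \right) x_j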

In fact, there is one parameter per independent variable, so j ranges from 0 to n.

So the gradient descent formula becomes like this:
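
Substituting the derivative, the update for a single training example becomes:

    \theta_j := \theta_j - \alpha \left( h(x) - y \right) x_j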

If it is substituted into the training set, the formula is:
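
Summing the error term over the whole training set gives the standard batch form:

    \hat{\theta}_j := \theta_j - \alpha \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right) x_j^{(i)}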

(The hat above θ on the left of the formula indicates that this is the updated estimate.)

This is the well-known Batch Gradient Descent algorithm.

However, this algorithm becomes very inefficient when the training set is large, because every update of θ requires substituting the entire training set into the calculation.
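
For comparison, here is a minimal NumPy sketch of batch gradient descent under the formula above, run on the same toy data as the implementation in section V; it is illustrative only, not part of the original implementation:

    import numpy as np

    # toy training data: x0 = 1 (intercept), one feature x1
    X = np.array([[1, 1.15], [1, 1.9], [1, 3.06], [1, 4.66], [1, 6.84], [1, 7.95]])
    y = np.array([1.37, 2.4, 3.02, 3.06, 4.22, 5.42])

    alpha = 0.005          # learning rate
    theta = np.zeros(2)    # initialize the parameters to zero

    for _ in range(10000):
        # error of the hypothesis on every training example
        errors = X.dot(theta) - y
        # batch update: the gradient sums the error term over the whole training set
        theta = theta - alpha * X.T.dot(errors)

    print(theta)  # fitted parameters, close to the ideal theta0 = 1, theta1 = 0.5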

Therefore, in practice an improvement is usually used instead: the Incremental Gradient Descent (Stochastic Gradient Descent) algorithm:
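
Its update rule repeats the single-example form from above, cycling through the training examples one at a time:

    \text{for } i = 1, \dots, m: \quad \theta_j := \theta_j - \alpha \left( h(x^{(i)}) - y^{(i)} \right) x_j^{(i)}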

This algorithm uses only a single training example for each update.

This sacrifices the optimal descent direction at each step, but the overall trajectory still moves downhill, just with some detours along the way, and the efficiency is greatly improved.

4. Obtain the final prediction function

Substitute the final θ values into the initial hypothesis function; the result is the final prediction function.

V. Code Implementation

Incremental gradient descent, Python code implementation:

# -*- coding: UTF-8 -*-
"""Incremental (stochastic) gradient descent, fitting y = 1 + 0.5x."""

# training dataset
# independent variables X = (x0, x1), with x0 fixed to 1 for the intercept
X = [(1, 1.15), (1, 1.9), (1, 3.06), (1, 4.66), (1, 6.84), (1, 7.95)]
# hypothesis function: h(x) = theta0 * x[0] + theta1 * x[1]
# Y holds the actual values corresponding to the ideal theta
Y = [1.37, 2.4, 3.02, 3.06, 4.22, 5.42]

# two termination conditions
loop_max = 10000   # maximum number of iterations
epsilon = 0.0001   # convergence precision

alpha = 0.005      # learning rate (step size)
diff = 0           # error of the hypothesis on the current example
error0 = 0         # theta[0] from the previous iteration (for the convergence check)
error1 = 0         # theta[1] from the previous iteration (for the convergence check)
m = len(X)         # number of training examples

# initialize the parameters to zero
theta = [0, 0]
count = 0
finish = 0

while count < loop_max:
    count += 1
    # traverse the training set, updating theta after every single example
    for i in range(m):
        # hypothesis value minus the actual value for example i
        diff = (theta[0] + theta[1] * X[i][1]) - Y[i]
        # incremental gradient descent: update theta using only this one example
        theta[0] = theta[0] - alpha * diff * X[i][0]
        theta[1] = theta[1] - alpha * diff * X[i][1]
    # the whole training set has been traversed; check whether theta has converged
    if abs(theta[0] - error0) < epsilon and abs(theta[1] - error1) < epsilon:
        print('theta: [%f, %f]' % (theta[0], theta[1]), 'error1:', error1)
        finish = 1
    else:
        # remember the current theta for the next convergence check
        error0, error1 = theta
    if finish:
        break

print('finish count: %s' % count)

The final result is:

theta: [1.066522, 0.515434] error1: 0.515449819658
finish count: 564

The ideal values are θ0 = 1, θ1 = 0.5.

The computed values are θ0 = 1.066522, θ1 = 0.515434.

The algorithm ran for 564 iterations.

Note that in practice the two parameters, convergence precision and learning rate, need to be tuned repeatedly to produce a satisfactory convergence result!

In addition, if the function has local extrema, you can randomly initialize θ several times, obtain multiple results, and then pick the best solution among them.

The normal equation method will not be discussed here ~

-- EOF --
