Multiple Linear Regression Exercises


Data

Data download: Ex3data.zip. Students who have followed Andrew Ng's machine learning course at Stanford will recognize this data set of house sizes and prices. The set contains 47 samples: X holds the size of the house (in square feet) and the number of bedrooms, and y holds the price of the house. Our goal is to predict the price of a house with an area of 1650 square feet and 3 bedrooms.
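Assuming the archive unpacks into the two files ex3x.dat and ex3y.dat used by the main program further below, a minimal sketch for loading and inspecting the data:

x = load('ex3x.dat');   % 47x2 matrix: house area and number of bedrooms
y = load('ex3y.dat');   % 47x1 column vector: house prices
size(x)                 % should report 47 2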

Data preprocessing

We first prepend an x0 column of ones to X for the intercept term:

m = size(X, 1);      % number of training samples
x = [ones(m, 1), X];

Features that differ greatly in scale hurt the convergence of gradient descent, so we standardize (feature-scale) the data: compute the mean and standard deviation of each column, then subtract the mean and divide by the standard deviation.

sigma = std(x);   % standard deviation of each column
mu = mean(x);     % mean of each column
x(:,2) = (x(:,2) - mu(2)) ./ sigma(2);   % standardize the house-area column
x(:,3) = (x(:,3) - mu(3)) ./ sigma(3);   % standardize the bedroom-count column

Gradient Descent

The relevant formulas for gradient descent were covered in a previous post and are not derived again here.
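For reference, the cost function and the batch update rule that the main program below implements are the standard vectorized forms:

J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)

\theta := \theta - \frac{\alpha}{m} X^T (X\theta - y)

where m is the number of training samples and \alpha is the learning rate.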

The choice of learning rate

The learning rate is generally chosen from a roughly geometric progression with ratio 3, e.g. 0.01, 0.03, 0.1, 0.3, 1, 1.3 (the candidates enumerated in the main program below).

Run gradient descent with each candidate rate, observe how the value of the cost function changes over the iterations, and adjust the learning rate accordingly to find the best one.

theta = zeros(size(x(1,:)))';   % initialize fitting parameters
alpha = %% Your initial learning rate %%
J = zeros(50, 1);
for num_iterations = 1:50
    J(num_iterations) = %% Calculate your cost function here %%
    theta = %% Result of gradient descent update %%
end
% Now plot J
% Technically, the first J starts at the zeroth iteration,
% but MATLAB/Octave doesn't have a zero index
figure;
plot(0:49, J(1:50), '-')
xlabel('Number of iterations')
ylabel('Cost J')

If you choose a learning rate in a good range, the plot of cost J should decrease steadily and level off within the 50 iterations.

If your plot looks very different from that, try adjusting the learning rate.

Of course, you can also solve the problem with the normal equations.
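The closed-form solution that Method Two in the code below uses is

\theta = (X^T X)^{-1} X^T y

and, since the normal equations need no feature scaling, the prediction there plugs in the raw inputs [1, 1650, 3] directly.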

Below we give the main program code:


% Method One: gradient descent
x = load('ex3x.dat');
y = load('ex3y.dat');
x = [ones(size(x,1), 1), x];
meanx = mean(x);   % mean of each column
sigmax = std(x);   % standard deviation of each column
x(:,2) = (x(:,2) - meanx(2)) ./ sigmax(2);
x(:,3) = (x(:,3) - meanx(3)) ./ sigmax(3);

figure
itera_num = 100;          % number of iterations to run
sample_num = size(x, 1);  % number of training samples
alpha = [0.01, 0.03, 0.1, 0.3, 1, 1.3];  % candidate rates, roughly tripling each step, enumerated directly
plotstyle = {'b', 'r', 'g', 'k', 'b--', 'r--'};

theta_grad_descent = zeros(size(x(1,:)));
for alpha_i = 1:length(alpha)   % try each learning rate to see which works best
    theta = zeros(size(x,2), 1);      % initialize theta to 0
    Jtheta = zeros(itera_num, 1);     % cost at each iteration
    for i = 1:itera_num   % run itera_num iterations for this learning rate
        Jtheta(i) = (1/(2*sample_num)) .* (x*theta - y)' * (x*theta - y);
        grad = (1/sample_num) .* x' * (x*theta - y);
        theta = theta - alpha(alpha_i) .* grad;
    end
    plot(0:49, Jtheta(1:50), char(plotstyle(alpha_i)), 'LineWidth', 2)  % cell entry must be converted with char
    hold on
    if (1 == alpha(alpha_i))   % experiments show alpha = 1 works best; keep that theta
        theta_grad_descent = theta
    end
end
legend('0.01', '0.03', '0.1', '0.3', '1', '1.3');
xlabel('Number of iterations')
ylabel('Cost function')

% prediction: standardize the new inputs with the training mean and std
price_grad_descend = theta_grad_descent' * [1, (1650 - meanx(2))/sigmax(2), (3 - meanx(3))/sigmax(3)]'

%% Method Two: normal equations
x = load('ex3x.dat');
y = load('ex3y.dat');
x = [ones(size(x,1), 1), x];
theta_norequ = inv(x'*x) * x' * y
price_norequ = theta_norequ' * [1, 1650, 3]'


The cost curves obtained by gradient descent at the different learning rates are as follows:


The graph shows that convergence is fastest when the learning rate is 1, so we choose a learning rate of 1. The resulting parameters and the predicted price are as follows:

When we solve the problem with the normal equations instead, the parameters and the predicted price are as follows:


The two methods give slightly different results, but the difference is small: both predict a price of roughly 290,000.
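To see how close the two answers are, a minimal check you can run after the main program above (it reuses the variables price_grad_descend and price_norequ computed there):

fprintf('gradient descent prediction: %.2f\n', price_grad_descend);
fprintf('normal equations prediction: %.2f\n', price_norequ);
fprintf('absolute difference:         %.2f\n', abs(price_grad_descend - price_norequ));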

Reference article:

http://www.cnblogs.com/tornadomeet/archive/2013/03/15/2962116.html

http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=DeepLearning&doc=exercises/ex3/ex3.html
