Coursera Open Class Machine Learning: Linear Regression with multiple variables

Multiple features

When choosing a house, we consider not only the area but also the structure, the age, the neighborhood, and so on. As mentioned above, we cannot predict house prices from the area alone.

Consider more features

We therefore add more features to the house price problem:

Notation

$n$: number of features

$m$: number of training examples

$x^{(i)}$: the input features of the $i$-th training example

$x^{(i)}_j$: the value of feature $j$ in the $i$-th training example

$h_\theta(x)$: the hypothesis function

Correspondingly, the hypothesis $h_\theta(x)$ becomes:

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$

Combining this with the vector notation from the previous section and adding $x_0 = 1$, we obtain:

$\mathbf{x} = \begin{bmatrix} x_0 \cr x_1 \cr \vdots \cr x_n \end{bmatrix}, \qquad \mathbf{\theta} = \begin{bmatrix} \theta_0 \cr \theta_1 \cr \vdots \cr \theta_n \end{bmatrix}$

$h_\theta(x) = \mathbf{\theta}^T \mathbf{x}$

where $x_0 = 1$.
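As a quick illustration (a minimal NumPy sketch; the values of theta and x below are invented for this example), the vectorized hypothesis is just a dot product:

```python
import numpy as np

# Hypothetical parameter vector and feature vector (n = 2 features here),
# with x[0] = 1 as the intercept term.
theta = np.array([50.0, 0.1, 3.0])   # theta_0, theta_1, theta_2
x = np.array([1.0, 2104.0, 5.0])     # x_0 = 1, x_1 = area, x_2 = rooms

# h_theta(x) = theta^T x
h = theta @ x
```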

Cost Function

$J(\mathbf{\theta}) = J(\theta_0, \theta_1, \cdots, \theta_n) = \frac{1}{2m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2$
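A minimal NumPy sketch of this cost function (the design matrix X with a leading column of ones and the function name are my own conventions, not part of the course code):

```python
import numpy as np

def compute_cost(X, y, theta):
    """J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2.

    X: (m, n+1) design matrix whose first column is all ones,
    y: (m,) target vector, theta: (n+1,) parameter vector.
    """
    m = len(y)
    errors = X @ theta - y          # h_theta(x^(i)) - y^(i) for every example
    return (errors @ errors) / (2 * m)
```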

Multi-variable Gradient Descent

As before, repeat the following update until convergence:

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1, \cdots, \theta_n)$

Computing the partial derivative gives:

$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) \, x^{(i)}_j$
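The simultaneous update for all $j$ can be written as a single matrix product. A sketch, assuming the same X, y, theta shapes as in the cost-function example; the default learning rate and iteration count are arbitrary:

```python
import numpy as np

def gradient_descent(X, y, theta, alpha=0.01, num_iters=1000):
    """Repeat theta_j := theta_j - alpha * (1/m) * sum_i (h - y) * x_j
    for all j simultaneously, for a fixed number of iterations."""
    m = len(y)
    for _ in range(num_iters):
        gradient = X.T @ (X @ theta - y) / m   # all partial derivatives at once
        theta = theta - alpha * gradient
    return theta
```

In practice one would also record $J(\theta)$ at each iteration to check that it is actually decreasing.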

Feature Normalization

In practice, features often take values over very different ranges, for example 1 to 10,000,000. Working with such values directly makes the computation heavy and gradient descent converges poorly, so narrowing the range is a good choice.

Normalization restricts each feature to a comparable range, which simplifies later processing and speeds up convergence.

Mean Normalization

$x_i = \frac{x_i - \bar{x}}{\sigma}$

$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2}$
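A sketch of mean normalization applied column by column to a feature matrix (NumPy's default std uses the $1/n$ convention, matching the formula above; dividing by the range instead of the standard deviation is also common):

```python
import numpy as np

def normalize_features(X_raw):
    """Scale each feature to (x - mean) / std, one column per feature."""
    mu = X_raw.mean(axis=0)      # \bar{x} for each feature
    sigma = X_raw.std(axis=0)    # population standard deviation
    return (X_raw - mu) / sigma, mu, sigma
```

The same mu and sigma must be reused to scale new inputs at prediction time.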

Step Size Selection

Choosing the step size is no different in the multi-variable case than in the single-variable case: it should be neither too large nor too small. For the reasoning, refer to the previous article; it is exactly the same.

Polynomial Regression

Merging Features

If a house is described by two dimensions, length and width, predicting its price with two variables is harder than with one. In fact, we can replace length and width with their product, the area; this leaves a single variable, which is much easier to work with.

Polynomial

Sometimes linear regression cannot fit the given samples well; for example, in house price prediction:

$h_\theta(x) = \theta_0 + \theta_1 (size) + \theta_2 (size)^2 + \theta_3 (size)^3$

This formula may fit the samples better, but it now contains squared and cubed terms, so it is no longer linear.

In fact, it can be converted into a linear form:

$x_1 = (size)$

$x_2 = (size)^2$

$x_3 = (size)^3$

This is now the familiar multi-variable linear regression.

A square-root term (for example $\sqrt{size}$) can also be chosen, depending on the actual shape of the data.
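A small sketch of this substitution (the size values below are invented): build $x_1, x_2, x_3$ from the single size feature and then run ordinary multi-variable regression on the new columns.

```python
import numpy as np

size = np.array([1000.0, 1500.0, 2000.0, 2500.0])   # hypothetical house sizes

# x_1 = size, x_2 = size^2, x_3 = size^3, plus the x_0 = 1 column.
X_poly = np.column_stack([np.ones_like(size), size, size**2, size**3])
```

Feature scaling matters even more here, because size**3 dwarfs size.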

Normal Equation

Besides iterative methods, linear algebra can be used to compute $\mathbf{\theta}$ directly.

For example, consider four training examples of house-price data.

Least Squares

$\mathbf{\theta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$
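A minimal sketch of the normal equation in NumPy (X and y are assumed to be the design matrix and target vector as before):

```python
import numpy as np

def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y directly, with no iteration."""
    return np.linalg.solve(X.T @ X, X.T @ y)
```

Solving the linear system is generally preferable to forming the inverse explicitly.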

Advantages and Disadvantages of Gradient Descent vs. the Normal Equation

Gradient Descent:
  • Requires choosing the step size $\alpha$;
  • Requires many iterations;
  • Works well even when $n$ is large.

Normal equation:
  • No need to choose $\alpha$;
  • No iteration required;
  • Must compute $(\mathbf{X}^T \mathbf{X})^{-1}$, which takes $O(n^3)$ time;
  • Very slow when $n$ is large.
Handling a Non-Invertible Matrix in the Normal Equation

If $\mathbf{X}^T \mathbf{X}$ is non-invertible (singular), possible remedies are:

  • Remove redundant (linearly dependent) features;

  • Delete some features when there are too many (for example, when $m \le n$).
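In practice, one common remedy is to use the pseudo-inverse, which still returns a least-squares solution when $\mathbf{X}^T \mathbf{X}$ is singular. A sketch:

```python
import numpy as np

def normal_equation_pinv(X, y):
    """Pseudo-inverse version of the normal equation: works even if
    X^T X is not invertible (redundant features or m <= n)."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y
```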

References

This article mainly references the following materials:

  • Andrew Ng, Machine Learning (Coursera)
  • Coursera Open Course Notes: Stanford Machine Learning, Lecture 4, "Linear Regression with Multiple Variables"
