Introduction to Machine Learning (11) -- Gradient Descent with Multiple Features



Definition

Multivariate hypothesis: the output is determined by a multidimensional input; that is, the input is a multidimensional feature vector. For the earlier housing-price prediction example, suppose the input grows from a single variable (house area) by three more variables: the number of bedrooms, the number of floors, and the age of the house. The dataset is shown below, with price as the output and the first four columns as the input:


Multi-feature vectors

Notation:

n = number of features

x^(i) = the input features of the i-th training example, which can be viewed as a feature vector

x_j^(i) = the value of feature j in the i-th training example, i.e., the j-th entry of that feature vector

Suppose h(x) = θ0 + θ1x1 + ... In so-called multivariate linear regression, each input x is an (n+1)-dimensional vector [x0, ..., xn]. The linear hypothesis model with multiple features is:

hθ(x) = θ0 + θ1x1 + θ2x2 + θ3x3 + θ4x4

For convenience, define x0 = 1; then multivariate linear regression can be written as:

hθ(x) = θ^T x

where θ = [θ0, θ1, ..., θn]^T and x = [x0, x1, ..., xn]^T are both (n+1)-dimensional vectors.
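As a quick sketch of the vectorized hypothesis θ^T x, using made-up values for θ and one training example's features (none of these numbers come from the original dataset):

```python
import numpy as np

# Hypothetical parameter vector theta_0 .. theta_4 and one example's features x_1 .. x_4
theta = np.array([50.0, 0.1, 20.0, 5.0, -2.0])
features = np.array([2104.0, 5.0, 1.0, 45.0])

x = np.concatenate(([1.0], features))  # prepend x_0 = 1
h = theta @ x                          # h_theta(x) = theta^T x, a single scalar prediction
```

Prepending the constant 1 is what lets the intercept θ0 be folded into the same dot product as the other parameters.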


Gradient descent for multiple features

The gradient descent algorithm extends to multiple variables as follows.

For the hypothesis: hθ(x) = θ^T x = θ0 + θ1x1 + θ2x2 + ... + θnxn

with parameters θ0, θ1, ..., θn, which can be represented as an (n+1)-dimensional vector θ.

The cost function is:

J(θ) = (1/(2m)) * Σ_{i=1..m} ( hθ(x^(i)) − y^(i) )²
The gradient descent update rule then becomes (repeat until convergence, updating all θj simultaneously):

θj := θj − α * (1/m) * Σ_{i=1..m} ( hθ(x^(i)) − y^(i) ) * x_j^(i),   for j = 0, 1, ..., n
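The batch update rule can be sketched in Python with NumPy. The toy data, learning rate, and iteration count below are illustrative, not taken from the original example:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, iters=1000):
    """Batch gradient descent for linear regression.
    X is the (m, n+1) design matrix whose first column is all ones (x_0 = 1)."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        error = X @ theta - y                # h_theta(x^(i)) - y^(i) for every example
        theta -= alpha * (X.T @ error) / m   # simultaneous update of every theta_j
    return theta

# Toy data generated from y = 2 + 3*x1, so theta should approach [2, 3]
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 5.0, 8.0, 11.0])
theta = gradient_descent(X, y, alpha=0.1, iters=2000)
```

Note that the matrix form `X.T @ error` computes the sum over all m examples for every feature j at once, which is why no explicit loop over j is needed.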
Feature scaling for multiple features

If a machine learning problem has multiple features, and you can ensure that those features take values in similar ranges, gradient descent will converge faster. Otherwise it will not only converge slowly but may also fluctuate or even oscillate. This is what the feature scaling technique addresses.

Idea: standardize the values of each feature so that they roughly fall in the range −1 ≤ x ≤ 1.

Method one: simple normalization, i.e., divide each value by the maximum value of its feature. Then:

For example, in the housing-price problem: feature 1 is the size of the house (0-2000); feature 2 is the number of rooms (1-5).

The contours are as follows:


After simple normalization,


its contours become:


Method two: mean normalization. The basic idea is to replace xi with xi − μi so that the feature's mean is approximately 0 (this is not applied to x0 = 1). The mean-normalization formula is:

xi := (xi − μi) / si
where:

si can be the range of values (maximum − minimum) or the standard deviation;

μi is the mean of feature xi over the training set.
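Mean normalization can be sketched as follows; the housing-like numbers are made up for illustration, and the range (max − min) is used for si:

```python
import numpy as np

def mean_normalize(X):
    """Scale each feature column to roughly [-1, 1] via (x - mean) / range.
    Using the standard deviation for s is an equally common choice."""
    mu = X.mean(axis=0)                  # mu_i: per-feature mean
    s = X.max(axis=0) - X.min(axis=0)    # s_i: per-feature range (max - min)
    return (X - mu) / s, mu, s

# Columns: house size, number of rooms (illustrative values)
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0],
              [1416.0, 2.0]])
X_norm, mu, s = mean_normalize(X)
```

After this transformation every column has mean 0 and its values fit comfortably inside [−1, 1], so the cost function's contours become much closer to circular and gradient descent takes a more direct path to the minimum.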


The choice of learning rate

Two points to note for the gradient descent algorithm:

- "Debugging": how to ensure that gradient descent is executing correctly;

- how to choose a correct step size (learning rate) α.

The second point is important, and it is also key to ensuring the convergence of gradient descent. To verify that gradient descent is working correctly, check that J(θ) decreases on every iteration; if one iteration decreases J(θ) by less than some small value ϵ, the algorithm is considered to have converged. For example:


In the segment of the curve between iteration 300 and iteration 400, J(θ) no longer falls by much; by iteration 400 the curve looks nearly flat, which means gradient descent has essentially converged by then, because the cost function is no longer decreasing. A curve like this helps you determine whether gradient descent has converged.

In other words, if the cost function J(θ) decreases by less than some small value ε in an iteration, it is considered to have converged; however, choosing a suitable threshold ε is usually quite difficult. In practice, one tries a series of α values when running gradient descent, for example 0.001, 0.01, and so on, taking roughly a 10x step between values, plots J(θ) against the number of iterations for each α, and then picks an α value that makes J(θ) descend quickly.
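The trial-of-α procedure above can be sketched like this; the toy data, the ε threshold, and the fixed 400-iteration budget are all illustrative choices, not prescribed by the original text:

```python
import numpy as np

def cost(X, y, theta):
    """J(theta) = (1/2m) * sum of squared errors."""
    m = len(y)
    return ((X @ theta - y) ** 2).sum() / (2 * m)

def run(X, y, alpha, iters=400):
    """Run gradient descent and record J(theta) after every iteration."""
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(iters):
        theta -= alpha * (X.T @ (X @ theta - y)) / len(y)
        history.append(cost(X, y, theta))
    return history

# Toy data generated from y = 2 + 2*x1
X = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0]])
y = np.array([3.0, 4.0, 5.0, 6.0])

for alpha in (0.001, 0.01, 0.1):            # candidate rates roughly 10x apart
    J = run(X, y, alpha)
    converged = abs(J[-2] - J[-1]) < 1e-6   # epsilon-style convergence check
    print(f"alpha={alpha}: final J={J[-1]:.6f}, converged={converged}")
```

Plotting each `J` history against the iteration count (e.g. with matplotlib) reproduces the curves described above: too small an α gives a slowly sinking curve, while a well-chosen α drives J(θ) down quickly and then flattens out.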
