
TensorFlow can be used for simple linear regression and gradient descent examples. This article walks through single- and multi-variable linear regression, the cost function, and gradient descent, ending with a small TensorFlow implementation.

Linear regression is a supervised learning method, so it follows the usual supervised learning procedure: a training set is given, a linear function is learned from it, and the learned function is then tested to see how well it fits the training data. The best function is the one that minimizes the cost function.

Single-Variable Linear Regression:

A) Because it is linear regression, the learned function is a linear function;

B) Because it is single-variable, there is only one input variable x.

We can write the single-variable linear regression model as:

h(x) = theta0 + theta1 * x

We often call x the feature and h(x) the hypothesis.
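As a minimal sketch in Python (the function and parameter names here are illustrative, not from the original text):

def h(x, theta0, theta1):
    # single-variable hypothesis: h(x) = theta0 + theta1 * x
    return theta0 + theta1 * x

print(h(2.0, theta0=1.0, theta1=3.0))  # 1 + 3*2 = 7.0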

Given the method described above, a natural question arises: how do we measure how well the linear function fits the data?

This is where the Cost Function comes in. The smaller the cost function, the better the linear regression fits the training set. The minimum possible value is 0, which means a perfect fit.

For example:

We want to predict the price of a house from its size, given a training set of (size, price) pairs.


Plotted, the training set is a scatter of points in the size-price plane.


We need to fit a straight line to these points that minimizes the cost function. Although we have not yet defined the cost function precisely, the goal is clear: given the input vector x, the output vector y, and the theta vector, output a cost value.

Cost Function:

Purpose of the cost function: to evaluate the hypothesis function. The smaller the cost function, the better the hypothesis fits the training data.

Treated as a black box, the role of the cost function is simple: it takes the parameters theta (together with the data) as input and returns a single number measuring the fit.

But we naturally want to know the internal structure of the cost function. For squared-error linear regression it is:

J(theta0, theta1) = 1/(2m) * sum_{i=1..m} (h(x^(i)) - y^(i))^2

where:

x^(i) denotes the i-th element of vector x;

y^(i) denotes the i-th element of vector y;

h denotes the known hypothesis function, and m denotes the number of training examples.
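The formula translates directly into a few lines of NumPy (a sketch; the sample data reuses the small exercise at the end of this article, where J(0, 1) = 0.5):

import numpy as np

def cost(theta0, theta1, x, y):
    # J(theta0, theta1) = 1/(2m) * sum((h(x_i) - y_i)^2)
    m = len(x)
    h = theta0 + theta1 * x           # hypothesis on all m examples at once
    return ((h - y) ** 2).sum() / (2 * m)

x = np.array([3.0, 2.0, 4.0, 0.0])
y = np.array([4.0, 1.0, 3.0, 1.0])
print(cost(0.0, 1.0, x, y))  # 0.5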

If theta0 is fixed at 0, J becomes a function of theta1 alone, and its graph is a parabola in one variable.

If neither theta0 nor theta1 is fixed, J is a function of both, and its graph is a surface over the (theta0, theta1) plane.

Of course, we can also represent this surface in two dimensions as a contour map, where each contour connects (theta0, theta1) pairs with equal cost.

Note that for linear regression the cost function is always bowl-shaped (convex), so it has only one minimum point.

Gradient Descent:

However, another problem arises. Given a function, the cost function tells us how well it fits, but there are far too many candidate functions to try one by one.

This leads to Gradient Descent, which lets us find a minimum of the cost function. (Gradient descent is only one of many ways to solve this problem; another is the Normal Equation.)

The intuition behind gradient descent: picture the function as a mountain. Standing on a hillside, we look around and ask in which direction a small step takes us downhill fastest, then step that way.

Method:

A) Determine the size of each step, called the learning rate alpha;

B) Choose any initial values for theta0 and theta1;

C) Determine the downhill direction and update theta0 and theta1 by one step of the predefined size;

D) Stop when the descent per step becomes smaller than a defined threshold.

Algorithm: repeat until convergence, updating for j = 0 and j = 1 simultaneously:

theta_j := theta_j - alpha * (d/d theta_j) J(theta0, theta1)
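A minimal NumPy sketch of one update step, using the partial derivatives of the squared-error cost defined earlier (the function name is illustrative):

import numpy as np

def gradient_step(theta0, theta1, x, y, alpha):
    m = len(x)
    h = theta0 + theta1 * x             # current hypothesis on all examples
    grad0 = (h - y).sum() / m           # dJ/dtheta0
    grad1 = ((h - y) * x).sum() / m     # dJ/dtheta1
    # both gradients are computed before either theta changes (simultaneous update)
    return theta0 - alpha * grad0, theta1 - alpha * grad1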

Features:

A) The minimum reached depends on the chosen initial values, so the result of gradient descent is in general only a local minimum;

B) The closer we get to a minimum, the smaller the gradient, so the descent slows down.

Question 1: If the initial values of theta0 and theta1 are already at a local minimum, how will they change?

Answer: At a local minimum the derivative is 0, so the update leaves them unchanged.

Question 2: With a correctly chosen learning rate alpha, the cost function should keep getting smaller. So how should alpha be set?

Answer: Watch the cost during training. If it keeps decreasing, the current alpha is fine; otherwise, try a smaller alpha.

The gradient descent process proceeds step by step downhill from the initial point. Starting from different initial points, the paths end at different minima, which again shows that gradient descent in general finds only a local minimum.

Note: the step size is very important. If it is too small, finding the minimum of the function is very slow; if it is too large, the update may overshoot the minimum.

Overshooting means that each step jumps past the minimum to the other side, so the iterates oscillate around it or even diverge.

If the cost J increases after a learning rate is chosen, the learning rate needs to be reduced.
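This advice is easy to check numerically. The sketch below (illustrative data; alpha = 2.5 is deliberately too large for it) runs a few steps at two learning rates and prints the cost after each step:

import numpy as np

x = np.array([3.0, 2.0, 4.0, 0.0])
y = np.array([4.0, 1.0, 3.0, 1.0])
m = len(x)

def run(alpha, steps=5):
    t0 = t1 = 0.0
    costs = []
    for _ in range(steps):
        h = t0 + t1 * x
        t0, t1 = t0 - alpha * (h - y).sum() / m, t1 - alpha * ((h - y) * x).sum() / m
        costs.append((((t0 + t1 * x) - y) ** 2).sum() / (2 * m))
    return [round(c, 3) for c in costs]

print(run(0.1))  # the cost shrinks every step: this alpha is fine
print(run(2.5))  # the cost explodes: this alpha must be reduced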

Integrating Gradient Descent with Linear Regression:

Gradient descent can be used to obtain the minimum value of a function.

Linear regression requires minimizing its cost function.

Therefore, we can apply gradient descent to the cost function, integrating gradient descent with linear regression. Substituting the squared-error cost into the update rule gives:

theta0 := theta0 - alpha * (1/m) * sum_{i=1..m} (h(x^(i)) - y^(i))

theta1 := theta1 - alpha * (1/m) * sum_{i=1..m} (h(x^(i)) - y^(i)) * x^(i)
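Putting the pieces together, here is a self-contained batch gradient descent sketch (the names and the stopping threshold tol are illustrative choices, implementing step D of the method above):

import numpy as np

def cost(t0, t1, x, y):
    return (((t0 + t1 * x) - y) ** 2).sum() / (2 * len(x))

def fit(x, y, alpha=0.1, tol=1e-9, max_iters=10000):
    # batch gradient descent; stop when the per-step decrease in cost < tol
    t0 = t1 = 0.0
    m = len(x)
    prev = cost(t0, t1, x, y)
    for _ in range(max_iters):
        h = t0 + t1 * x
        t0, t1 = t0 - alpha * (h - y).sum() / m, t1 - alpha * ((h - y) * x).sum() / m
        cur = cost(t0, t1, x, y)
        if prev - cur < tol:
            break
        prev = cur
    return t0, t1

x = np.array([3.0, 2.0, 4.0, 0.0])
y = np.array([4.0, 1.0, 3.0, 1.0])
print(fit(x, y))  # the (theta0, theta1) of the best-fit line for these points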

Gradient descent works through repeated iteration, so we care about the number of iterations, which determines how fast gradient descent runs. To reduce the number of iterations, Feature Scaling is introduced.

Feature Scaling:

This method is applied before gradient descent to speed up its convergence.

Idea: rescale the values of each feature so that they fall roughly in the range -1 <= x <= 1.

A common method is Mean Normalization: x' = (x - mean(x)) / (max(x) - min(x)); a variant divides by the standard deviation instead, (x - mean(x)) / std(x).
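As a quick illustration (the size values below are made up for the example), both variants take only a line each in NumPy:

import numpy as np

def mean_normalize(x):
    # (x - mean(x)) / (max(x) - min(x))
    return (x - x.mean()) / (x.max() - x.min())

def standardize(x):
    # (x - mean(x)) / std(x)
    return (x - x.mean()) / x.std()

sizes = np.array([2104.0, 1416.0, 1534.0, 852.0])
print(mean_normalize(sizes))  # values now fall roughly within [-1, 1]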

Exercise questions

We want to predict the final exam score from the midterm exam score, using a hypothesis of the form h(x) = theta0 + theta1 * (midterm exam) + theta2 * (midterm exam)^2.

The following training set is given, where (midterm exam)^2 is the feature to be scaled:

midterm exam   (midterm exam)^2   final exam
89             7921               96
72             5184               74
94             8836               87
69             4761               78

If we perform mean normalization on the (midterm exam)^2 feature, what is the scaled value for the example with (midterm exam)^2 = 4761?

Answer: max = 8836, min = 4761, mean = 6675.5, so the scaled value is (4761 - 6675.5) / (8836 - 4761) ≈ -0.47.

Multi-Variable Linear Regression

In the previous sections we introduced linear regression with a single input variable. The real world is rarely that simple, so here we introduce linear regression with multiple variables.

For example, a house price is determined by many factors, such as size, number of bedrooms, number of floors, and age of home. Here we assume the price is determined by these four features.

We previously defined the single-variable linear regression model:

h(x) = theta0 + theta1 * x

The multi-variable linear regression model generalizes it:

h(x) = theta0 + theta1 * x1 + theta2 * x2 + ... + thetan * xn

(with n = 4 in the house example). The cost function keeps the same form:

J(theta) = 1/(2m) * sum_{i=1..m} (h(x^(i)) - y^(i))^2

where each x^(i) is now a feature vector.

If we use gradient descent to solve multi-variable linear regression, the traditional gradient descent algorithm still applies; each theta_j is updated simultaneously:

theta_j := theta_j - alpha * (1/m) * sum_{i=1..m} (h(x^(i)) - y^(i)) * x_j^(i)
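A vectorized sketch (the convention of a design matrix with a leading column of ones is an assumption of this example, not stated in the original text):

import numpy as np

def gradient_descent_multi(X, y, alpha=0.01, iters=1000):
    # X: m x (n+1) design matrix whose first column is all ones (for theta0)
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m   # all partial derivatives at once
        theta -= alpha * grad
    return theta

# For the house example, each row of X would be
# [1, size, bedrooms, floors, age], with features scaled as described above.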

Comprehensive exercise

We want to predict a student's second-year results from the first-year results. Here x is the number of A grades the student earned in the first year, and y is the number of A grades earned in the second year. The following dataset is given:

x   y
3   4
2   1
4   3
0   1

(1) How many training examples are there?

Answer: 4.

(2) What is the value of J(0, 1)?

Solution: J(0, 1) = 1/(2*4) * [(3-4)^2 + (2-1)^2 + (4-3)^2 + (0-1)^2] = 1/8 * (1 + 1 + 1 + 1) = 1/2 = 0.5.

We can also use vectorization to compute J(0, 1) quickly: in matrix form, J(theta) = 1/(2m) * (X*theta - y)^T * (X*theta - y).
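A few NumPy lines confirm the vectorized computation (the leading column of ones in X stands in for theta0):

import numpy as np

X = np.array([[1.0, 3.0],
              [1.0, 2.0],
              [1.0, 4.0],
              [1.0, 0.0]])     # one row per training example
y = np.array([4.0, 1.0, 3.0, 1.0])
theta = np.array([0.0, 1.0])   # theta0 = 0, theta1 = 1

residual = X @ theta - y
print(residual @ residual / (2 * len(y)))  # 0.5, matching the hand calculation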

Below is a simple implementation in TensorFlow:

#!/usr/bin/env python
from __future__ import print_function

import tensorflow as tf
import numpy as np

# Note: this uses the TensorFlow 1.x graph API; under TensorFlow 2,
# import tensorflow.compat.v1 as tf and call tf.disable_v2_behavior() first.

trX = np.linspace(-1, 1, 101)
# create a y value which is approximately linear but with some random noise
trY = 2 * trX + np.ones(trX.shape) * 4 + np.random.randn(*trX.shape) * 0.03

X = tf.placeholder(tf.float32)  # create symbolic variables
Y = tf.placeholder(tf.float32)

def model(X, w, b):
    # linear regression is just X*w + b, so this model is pretty simple
    return tf.multiply(X, w) + b

w = tf.Variable(0.0, name="weights")  # create a shared variable for the weight
b = tf.Variable(0.0, name="biases")   # create a variable for the bias
y_model = model(X, w, b)

cost = tf.square(Y - y_model)  # use squared error for the cost function

# construct an optimizer to minimize cost and fit the line to the data
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

# launch the graph in a session
with tf.Session() as sess:
    # you need to initialize the variables (in this case w and b)
    init = tf.global_variables_initializer()
    sess.run(init)

    # train: one example at a time, for 100 passes over the data
    for i in range(100):
        for (x, y) in zip(trX, trY):
            sess.run(train_op, feed_dict={X: x, Y: y})

    print(sess.run(w))  # it should be something around 2
    print(sess.run(b))  # it should be something around 4

Reference: TensorFlow linear regression demo.

