Linear regression learning notes
Operating system: CentOS 7.3.1611 x64
Python version: 2.7.5
sklearn version: 0.18.2
TensorFlow version: 1.2.1
Linear regression is a statistical analysis method, based on regression analysis in mathematical statistics, for determining the quantitative relationship between two or more variables; it is very widely used. Its general form is y = w'x + e, where e is a normally distributed error term with mean 0.
Based on the number of independent variables, linear regression can be divided into simple (one-variable) linear regression and multiple linear regression.
Among regression models, simple (single-variable) regression is the simplest and most robust, but it often cannot describe the behavior of complex systems, so prediction based on multiple regression is more common. Traditional multiple regression models are generally linear. Because some variables may be insignificant and the explanatory variables may be correlated with one another, the normal equations of the regression can become seriously ill-conditioned, which affects the stability of the regression equation. A basic problem in multiple linear regression is therefore to find the "optimal" regression equation.
Simple (one-variable) linear regression
When a regression analysis involves only one independent variable and one dependent variable, and their relationship can be approximated by a straight line, it is called simple (one-variable) linear regression. The expression is as follows:
Y = a + b*X + e
Here a is the intercept, b is the slope of the line, and e is the error term. The equation predicts the value of the target variable (Y) from a given predictor variable (X).
With a = 1, b = 2 and e = 0.1, the line is Y = 1 + 2 * X + 0.1; a quick way to plot it is sketched below.
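A minimal sketch to draw this line (not from the original post; the x-range of 0 to 10 is chosen arbitrarily):

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Sketch: plot Y = 1 + 2*X + 0.1 over an arbitrary x-range
import numpy as np
import matplotlib.pyplot as plt

X = np.linspace(0, 10, 100)
Y = 1 + 2 * X + 0.1
plt.plot(X, Y)
plt.xlabel("X")
plt.ylabel("Y")
plt.show()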
Common application scenarios:
Simple commodity price prediction, cost estimation, etc.
Solving simple linear regression with sklearn
Sample Code:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# version : Python 2.7.5

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(1)
X = 10 * rng.rand(30)
Y = 1 + 2 * X + rng.randn(30)
#print X
#print Y

model = LinearRegression(fit_intercept=True)
model.fit(X[:, np.newaxis], Y)

xfit = np.linspace(0, 20, 100)
yfit = model.predict(xfit[:, np.newaxis])

plt.scatter(X, Y)
plt.plot(xfit, yfit)
plt.show()
Github address of the Code:
https://github.com/mike-zhang/pyExamples/blob/master/algorithm/LinearRegression/lr_sklearn_test1.py
Running the script displays a scatter plot of the samples together with the fitted straight line.
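As a small addition (not part of the original script), the fitted slope and intercept can be printed after model.fit(...) and compared with the true values 2 and 1:

# Inspect the fitted parameters of the LinearRegression model above
print "slope     :", model.coef_[0]      # close to 2
print "intercept :", model.intercept_    # close to 1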
Solving linear regression with TensorFlow
Sample Code:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Python version     : 2.7.5
# TensorFlow version : 1.2.1

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

N = 200        # number of sample points
trainNum = 30  # number of training iterations

# Data generated from y = w * x + b with w = 3.0, b = 0.9 plus Gaussian noise
X = np.linspace(-1, 1, N)
Y = 3.0 * X + np.random.standard_normal(X.shape) * 0.3 + 0.9
X = X.reshape([N, 1])
Y = Y.reshape([N, 1])

# Plot the expected (true) line over the samples
plt.scatter(X, Y)
plt.plot(X, 3.0 * X + 0.9)
plt.show()

# Build the model
inputX = tf.placeholder(dtype=tf.float32, shape=[None, 1])
outputY = tf.placeholder(dtype=tf.float32, shape=[None, 1])
W = tf.Variable(tf.random_normal([1, 1], stddev=0.01))
B = tf.Variable(tf.random_normal([1], stddev=0.01))
pred = tf.matmul(inputX, W) + B
loss = tf.reduce_sum(tf.pow(pred - outputY, 2))
train = tf.train.GradientDescentOptimizer(0.001).minimize(loss)
tf.summary.scalar("loss", loss)
merged = tf.summary.merge_all()
init = tf.global_variables_initializer()

# Train
with tf.Session() as sess:
    sess.run(init)
    for i in range(trainNum):
        sess.run(train, feed_dict={inputX: X, outputY: Y})
        predArr, lossArr = sess.run([pred, loss], feed_dict={inputX: X, outputY: Y})
        # print "lossArr :", lossArr
        # print "predArr :", predArr
        summary_str = sess.run(merged, feed_dict={inputX: X, outputY: Y})
        WArr, bArr = sess.run([W, B])
        print(WArr, bArr)

# Plot the fitted line
plt.scatter(X, Y)
plt.plot(X, WArr * X + bArr)
plt.show()
Github address of the Code:
https://github.com/mike-zhang/pyExamples/blob/master/algorithm/LinearRegression/lr_tensorflow_test1.py
The running effect is as follows:
(array([[ 0.4075802]], dtype=float32), array([ 0.35226884], dtype=float32))
(array([[ 0.75750935]], dtype=float32), array([ 0.56450701], dtype=float32))
(array([[ 1.06031227]], dtype=float32), array([ 0.69184995], dtype=float32))
(array([[ 1.32233584]], dtype=float32), array([ 0.76825565], dtype=float32))
(array([[ 1.54907179]], dtype=float32), array([ 0.81409913], dtype=float32))
(array([[ 1.7452724]], dtype=float32), array([ 0.84160519], dtype=float32))
(array([[ 1.91505003]], dtype=float32), array([ 0.85810882], dtype=float32))
(array([[ 2.06196308]], dtype=float32), array([ 0.868011], dtype=float32))
(array([[ 2.18909097]], dtype=float32), array([ 0.87395233], dtype=float32))
(array([[ 2.29909801]], dtype=float32), array([ 0.8775171], dtype=float32))
(array([[ 2.39428997]], dtype=float32), array([ 0.87965596], dtype=float32))
(array([[ 2.47666216]], dtype=float32), array([ 0.8809393], dtype=float32))
(array([[ 2.54794097]], dtype=float32), array([ 0.88170928], dtype=float32))
(array([[ 2.60962057]], dtype=float32), array([ 0.88217127], dtype=float32))
(array([[ 2.66299343]], dtype=float32), array([ 0.88244849], dtype=float32))
(array([[ 2.70917845]], dtype=float32), array([ 0.88261479], dtype=float32))
(array([[ 2.7491436]], dtype=float32), array([ 0.88271457], dtype=float32))
(array([[ 2.78372645]], dtype=float32), array([ 0.88277447], dtype=float32))
(array([[ 2.81365204]], dtype=float32), array([ 0.88281041], dtype=float32))
(array([[ 2.8395474]], dtype=float32), array([ 0.88283193], dtype=float32))
(array([[ 2.8619554]], dtype=float32), array([ 0.88284487], dtype=float32))
(array([[ 2.88134551]], dtype=float32), array([ 0.88285261], dtype=float32))
(array([[ 2.89812446]], dtype=float32), array([ 0.88285726], dtype=float32))
(array([[ 2.91264367]], dtype=float32), array([ 0.88286006], dtype=float32))
(array([[ 2.92520738]], dtype=float32), array([ 0.88286173], dtype=float32))
(array([[ 2.93607926]], dtype=float32), array([ 0.88286275], dtype=float32))
(array([[ 2.94548702]], dtype=float32), array([ 0.88286334], dtype=float32))
(array([[ 2.95362759]], dtype=float32), array([ 0.8828637], dtype=float32))
(array([[ 2.9606719]], dtype=float32), array([ 0.88286394], dtype=float32))
(array([[ 2.96676755]], dtype=float32), array([ 0.88286406], dtype=float32))
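The printed (W, b) pairs converge towards the generating values 3.0 and 0.9. As an optional sanity check (an addition, not part of the original script), the closed-form least-squares fit from numpy.polyfit on the same kind of data gives the values the training loop is heading for:

# Sanity-check sketch: generate the same kind of data and compute the
# closed-form least-squares slope/intercept with a degree-1 polyfit.
import numpy as np

N = 200
X = np.linspace(-1, 1, N)
Y = 3.0 * X + np.random.standard_normal(X.shape) * 0.3 + 0.9
w_ls, b_ls = np.polyfit(X, Y, 1)
print(w_ls, b_ls)   # close to 3.0 and 0.9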
Multiple linear regression
When a regression analysis involves two or more independent variables, and the relationship between the dependent variable and the independent variables is linear, it is called multiple linear regression.
The expression is as follows:
Y = a0 + a1 * X1 + a2 * X2 + ... + an * Xn + e
Where,
(a0, a1, a2, a3, ..., an) is the unknown parameter vector.
(X1, X2, X3, ..., Xn) are the explanatory variables, which may be fixed (designed) or random.
e is the random error term.
This equation predicts the value of the target variable (Y) from a given predictor vector (X1, X2, X3, ..., Xn).
When a0 = 1, a1 = 2, a2 = 3, e = 0.1, the equation is as follows:
Y = 1 + 2 * X1 + 3 * X2 + 0.1
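Before turning to sklearn, here is a minimal NumPy-only sketch (my addition, with a small noise term standing in for e) of the classical least-squares solution obtained from the normal equations a = (X'X)^-1 X'y; it is exactly these normal equations that can become ill-conditioned when the explanatory variables are correlated, as noted earlier:

# Normal-equations sketch for Y = 1 + 2*X1 + 3*X2 + noise
import numpy as np

rng = np.random.RandomState(1)
N = 100
X1 = 10 * rng.rand(N)
X2 = 10 * rng.rand(N)
Y = 1 + 2 * X1 + 3 * X2 + 0.1 * rng.randn(N)

A = np.column_stack([np.ones(N), X1, X2])       # design matrix with intercept column
coef = np.linalg.solve(A.T.dot(A), A.T.dot(Y))  # solves (A'A) coef = A'Y
print(coef)                                     # approximately [1, 2, 3]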
Solving multiple linear regression with sklearn
Sample Code:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# version : Python 2.7.5

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(1)
N = 10
X = np.array(N * [10 * rng.rand(2)])
b = [2, 3]
Y = 1 + np.matmul(X, b) + rng.randn(N)
print X
print Y

model = LinearRegression()
model.fit(X, Y)

xfit = np.array(10 * [10 * rng.rand(2)])
yfit = model.predict(xfit)
print "xfit :"
print xfit
print "yfit :"
print yfit
Github address of the Code:
https://github.com/mike-zhang/pyExamples/blob/master/algorithm/LinearRegression/lr_sklearn_test2.py
The running effect is as follows:
[[ 4.17022005 7.20324493]
 [ 4.17022005 7.20324493]
 [ 4.17022005 7.20324493]
 [ 4.17022005 7.20324493]
 [ 4.17022005 7.20324493]
 [ 4.17022005 7.20324493]
 [ 4.17022005 7.20324493]
 [ 4.17022005 7.20324493]
 [ 4.17022005 7.20324493]
 [ 4.17022005 7.20324493]]
[ 30.42200315 29.87720628 31.81558253 28.6486362 32.69498666 30.188968 31.26921399 30.70080452 32.41228283 28.89003419]
xfit :
[[ 1.40386939 1.98101489]
 [ 1.40386939 1.98101489]
 [ 1.40386939 1.98101489]
 [ 1.40386939 1.98101489]
 [ 1.40386939 1.98101489]
 [ 1.40386939 1.98101489]
 [ 1.40386939 1.98101489]
 [ 1.40386939 1.98101489]
 [ 1.40386939 1.98101489]
 [ 1.40386939 1.98101489]]
yfit :
[ 12.7586356 12.7586356 12.7586356 12.7586356 12.7586356 12.7586356 12.7586356 12.7586356 12.7586356 12.7586356]
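One thing worth noting (an observation of mine, not from the original post): np.array(N * [10 * rng.rand(2)]) repeats a single random point N times, which is why every row of X, and every prediction, is identical in the output above. A small variant sketch that draws an independent point per sample lets the model actually recover the coefficients:

# Variant sketch: independent random rows so the design matrix has full rank
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(1)
N, b = 10, [2, 3]
X = 10 * rng.rand(N, 2)                  # one independent random point per sample
Y = 1 + np.matmul(X, b) + rng.randn(N)   # Y = 1 + 2*X1 + 3*X2 + noise
model = LinearRegression().fit(X, Y)
print "coef_      :", model.coef_        # close to [2, 3]
print "intercept_ :", model.intercept_   # close to 1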
Okay, that's all. I hope it will help you.
Github address of the examples: https://github.com/mike-zhang/pyExamples