Linear Regression with the PyMC3 GLM Module


GLM stands for generalized linear model. Several software toolkits exist for Bayesian statistics, such as JAGS, BUGS, Stan, and PyMC; using them well requires a good understanding of the models being built.

The Traditional Form of Linear Regression

In general, the frequentist school expresses linear regression as:

$$y = X\beta + \epsilon$$

where $y$ is the output we want to predict (the dependent variable), $X$ is the predictor (the independent variable), $\beta$ are the model coefficients (parameters) to be estimated, and $\epsilon$ is the error term, which is assumed to be normally distributed.

The optimal $\beta$ can be found by ordinary least squares (OLS) or by maximum likelihood estimation.
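For reference, assuming $X^{\top}X$ is invertible, the OLS solution has the well-known closed form

$$\hat{\beta} = (X^{\top} X)^{-1} X^{\top} y$$

which coincides with the maximum likelihood estimate when the errors are normally distributed.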

The Probabilistic Form of Linear Regression

The Bayesian school instead describes the linear regression model as a full probability distribution, which can be written as:

$$y \sim \mathcal{N}(X\beta, \sigma^2)$$

where $y$ is a random variable whose values (the individual data points) follow a normal distribution. The mean of this normal distribution is determined by the linear term $X\beta$, and the variance by $\sigma^2$.
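The two formulations are equivalent: adding zero-mean normal noise to a deterministic mean is the same as drawing from a normal distribution centered at that mean,

$$y = X\beta + \epsilon,\ \ \epsilon \sim \mathcal{N}(0, \sigma^2) \iff y \sim \mathcal{N}(X\beta, \sigma^2)$$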

This description has the following two advantages:
* Priors: we can place whatever prior information we have on the parameters.
* Quantified uncertainty: we obtain not just a single point estimate of $\beta$ but its entire posterior distribution, which characterizes how plausible different values are.

Bayesian GLMs in PyMC3

Using the GLM module in PyMC3, complex models can be constructed simply.

%matplotlib inline

import pymc3 as pm
import numpy as np
import matplotlib.pyplot as plt
Generating Data

A straight line is defined by an intercept and a slope, and data points are generated around it from a normal distribution whose mean lies on the line.

# data size (the value was missing in the original text; 200 is assumed,
# matching the PyMC3 GLM tutorial this example follows)
size = 200

# true line parameters
true_intercept = 1
true_slope = 2

# y = a + b*x
x = np.linspace(0, 1, size)
true_regression_line = true_intercept + true_slope * x

# add noise (sampled data)
y = true_regression_line + np.random.normal(scale=.5, size=size)

# simulated data
data = dict(x=x, y=y)

# plot the data
fig = plt.figure(figsize=(7, 7))
ax = fig.add_subplot(111, xlabel='x', ylabel='y',
                     title='Generated data and underlying model')
ax.plot(x, y, 'x', label='sampled data')
ax.plot(x, true_regression_line, label='true regression line', lw=2.0)
plt.legend(loc=0);

Direct Modeling Estimation

Bayesian linear regression is used to model and fit the sampled data.

with pm.Model() as naive_model:
    # model specifications in PyMC3 are wrapped in a with-statement

    # define priors
    intercept = pm.Normal('Intercept', 0, sd=20)
    slope = pm.Normal('Slope', 0, sd=20)
    sigma = pm.HalfCauchy('sigma', beta=10, testval=1.)

    # define likelihood
    likelihood = pm.Normal('y', mu=intercept + slope * x,
                           sd=sigma, observed=y)

Sampling uses the default sampler; PyMC3 does not need explicitly specified initial values. The NUTS sampler is initialized with ADVI.

with naive_model:
    naive_trace = pm.sample(2000)
Auto-assigning NUTS sampler...
Initializing NUTS using advi...
Average ELBO = -157.32: 100%|██████████| 200000/200000 [00:18<00:00, 10888.26it/s]
Finished [100%]: Average ELBO = -157.3
100%|██████████| 2000/2000 [00:04<00:00, 432.16it/s]
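The log above shows NUTS being auto-assigned and initialized with ADVI. As a sketch, the same call can be written with the initialization made explicit; this assumes the PyMC3 3.x `pm.sample` signature, where `init` selects the initialization method and `n_init` its number of iterations:

# equivalent call with the ADVI initialization spelled out
# ('advi' and n_init=200000 match the defaults reported in the log)
with naive_model:
    naive_trace = pm.sample(2000, init='advi', n_init=200000)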

The fit shows an estimated intercept of about 1.027 (true value 1), an estimated slope of about 1.893 (true value 2), and an estimated noise standard deviation of about 0.528 (true value 0.5): accurate estimates.
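These numbers can be read from a summary table; a minimal sketch using PyMC3's built-in `pm.summary`, whose `mean` column contains the point estimates quoted above:

# posterior means, standard deviations, and credible intervals
# for Intercept, Slope, and sigma
pm.summary(naive_trace)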

# pm.traceplot(naive_trace);
pm.plot_posterior(naive_trace);

GLM Modeling Estimation

Now the same model is estimated using the GLM module.

with pm.Model() as glm_model:
    pm.glm.glm('y ~ x', data,
               {'Intercept': pm.Normal.dist(mu=0, sd=20),
                'sd': pm.HalfCauchy.dist(beta=10, testval=1.),
                'x': pm.Normal.dist(mu=0, sd=20)})

??pm.glm.glm
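The priors dictionary above overrides the module's defaults. If the defaults are acceptable, the model reduces to a one-liner; this sketch assumes the same PyMC3 3.x `pm.glm.glm` API, and the model name `glm_default_model` is just illustrative:

# same regression with the GLM module's default priors
with pm.Model() as glm_default_model:
    pm.glm.glm('y ~ x', data)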
with glm_model:
    glm_trace = pm.sample(2000)
Auto-assigning NUTS sampler...
Initializing NUTS using advi...
Average ELBO = -157.31: 100%|██████████| 200000/200000 [00:21<00:00, 9193.40it/s]
Finished [100%]: Average ELBO = -157.32
100%|██████████| 2000/2000 [00:05<00:00, 391.03it/s]
# pm.traceplot(glm_trace);
pm.plot_posterior(glm_trace);

Analyzing the Model

Bayesian inference yields not only the best-fitting line but also the complete posterior information for the parameters being estimated.

The left side of the trace plot shows the marginal posterior of each of the three random variables: the x-axis gives possible parameter values, and the y-axis the corresponding probability density. The right side shows the sampled values of the three variables over the course of sampling; all three have converged and are stable (no jumps or odd patterns).
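This description matches the output of `pm.traceplot`, which is commented out in the code above; to reproduce the two-column figure:

# left column: marginal posteriors; right column: sampled values per draw
pm.traceplot(glm_trace);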

Second, the maximum a posteriori estimate of each random variable is very close to the true value (the value used to simulate the data).

Below, the parameters are estimated by their posterior means.

est_intercept = naive_trace['Intercept'].mean()
est_slope = naive_trace['Slope'].mean()
est_regression_line = est_intercept + est_slope * x

??pm.glm.plot_posterior_predictive

plt.figure(figsize=(7, 7))
plt.plot(x, y, 'x', label='data')

pm.glm.plot_posterior_predictive(glm_trace, samples=200, c='m')
plt.plot(x, true_regression_line, label='true regression line', lw=3.0, c='y')
plt.plot(x, est_regression_line, label='mean estimate regression line', c='k')
plt.title('Posterior predictive regression lines')
plt.legend(loc=0)
plt.xlabel('x')
plt.ylabel('y');
