Linear Regression with the PyMC3 GLM Module


GLM stands for generalized linear model. Several software toolkits exist for Bayesian statistics, such as JAGS, BUGS, Stan, and PyMC; using them well requires a good understanding of the models being built.

The Traditional Form of Linear Regression

In general, the frequentist school expresses linear regression as:

$$y = X\beta + \epsilon$$

where $y$ is the output we want to predict (the dependent variable), $X$ is the predictor (the independent variable), $\beta$ are the model coefficients (parameters) to be estimated, and $\epsilon$ is the error term, which is assumed to be normally distributed.

The optimal $\beta$ can be found by ordinary least squares (OLS) or by maximum likelihood estimation.
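For reference, assuming $X^{\top}X$ is invertible, the OLS solution has the well-known closed form

$$\hat{\beta} = (X^{\top} X)^{-1} X^{\top} y$$

which coincides with the maximum likelihood estimate when the errors are normally distributed.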

The Probabilistic Form of Linear Regression

The Bayesian school instead describes the linear regression model as a full probability distribution, which can be written as:

$$y \sim \mathcal{N}(X\beta, \sigma^2)$$

where $y$ is a random variable whose values (the individual data points) follow a normal distribution. The mean of this normal distribution is determined by the linear term $X\beta$, and the variance by $\sigma^2$.
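The two formulations are equivalent: adding zero-mean normal noise to a deterministic mean is the same as drawing from a normal distribution centered at that mean,

$$y = X\beta + \epsilon,\ \ \epsilon \sim \mathcal{N}(0, \sigma^2) \iff y \sim \mathcal{N}(X\beta, \sigma^2)$$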

This description has the following two advantages:
* Priors: we can place whatever prior information we have on the parameters.
* Quantified uncertainty: we obtain not just a single point estimate of $\beta$ but its entire posterior distribution, which characterizes how plausible different values are.

Bayesian GLMs in PyMC3

Using the GLM module in PyMC3, complex models can be constructed simply.

%matplotlib inline

import pymc3 as pm
import numpy as np
import matplotlib.pyplot as plt
Generating Data

A straight line is defined by an intercept and a slope, and data points are generated around it from a normal distribution whose mean lies on the line.

# data size (the value was missing in the original text; 200 is assumed,
# matching the PyMC3 GLM tutorial this example follows)
size = 200

# true line parameters
true_intercept = 1
true_slope = 2

# y = a + b*x
x = np.linspace(0, 1, size)
true_regression_line = true_intercept + true_slope * x

# add noise (sampled data)
y = true_regression_line + np.random.normal(scale=.5, size=size)

# simulated data
data = dict(x=x, y=y)

# plot the data
fig = plt.figure(figsize=(7, 7))
ax = fig.add_subplot(111, xlabel='x', ylabel='y',
                     title='Generated data and underlying model')
ax.plot(x, y, 'x', label='sampled data')
ax.plot(x, true_regression_line, label='true regression line', lw=2.0)
plt.legend(loc=0);

Direct Modeling Estimation

Bayesian linear regression is used to model and fit the sampled data.

with pm.Model() as naive_model:
    # model specifications in PyMC3 are wrapped in a with-statement

    # define priors
    intercept = pm.Normal('Intercept', 0, sd=20)
    slope = pm.Normal('Slope', 0, sd=20)
    sigma = pm.HalfCauchy('sigma', beta=10, testval=1.)

    # define likelihood
    likelihood = pm.Normal('y', mu=intercept + slope * x,
                           sd=sigma, observed=y)

Sampling uses the default sampler; PyMC3 does not need explicitly specified initial values. The NUTS sampler is initialized with ADVI.

with naive_model:
    naive_trace = pm.sample(2000)
Auto-assigning NUTS sampler...
Initializing NUTS using advi...
Average ELBO = -157.32: 100%|██████████| 200000/200000 [00:18<00:00, 10888.26it/s]
Finished [100%]: Average ELBO = -157.3
100%|██████████| 2000/2000 [00:04<00:00, 432.16it/s]
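The log above shows NUTS being auto-assigned and initialized with ADVI. As a sketch, the same call can be written with the initialization made explicit; this assumes the PyMC3 3.x `pm.sample` signature, where `init` selects the initialization method and `n_init` its number of iterations:

# equivalent call with the ADVI initialization spelled out
# ('advi' and n_init=200000 match the defaults reported in the log)
with naive_model:
    naive_trace = pm.sample(2000, init='advi', n_init=200000)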

The fit shows an estimated intercept of about 1.027 (true value 1), an estimated slope of about 1.893 (true value 2), and an estimated noise standard deviation of about 0.528 (true value 0.5): accurate estimates.
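These numbers can be read from a summary table; a minimal sketch using PyMC3's built-in `pm.summary`, whose `mean` column contains the point estimates quoted above:

# posterior means, standard deviations, and credible intervals
# for Intercept, Slope, and sigma
pm.summary(naive_trace)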

# pm.traceplot(naive_trace);
pm.plot_posterior(naive_trace);

GLM Modeling Estimation

Now the same model is estimated using the GLM module.

with pm.Model() as glm_model:
    pm.glm.glm('y ~ x', data,
               {'Intercept': pm.Normal.dist(mu=0, sd=20),
                'sd': pm.HalfCauchy.dist(beta=10, testval=1.),
                'x': pm.Normal.dist(mu=0, sd=20)})

??pm.glm.glm
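The priors dictionary above overrides the module's defaults. If the defaults are acceptable, the model reduces to a one-liner; this sketch assumes the same PyMC3 3.x `pm.glm.glm` API, and the model name `glm_default_model` is just illustrative:

# same regression with the GLM module's default priors
with pm.Model() as glm_default_model:
    pm.glm.glm('y ~ x', data)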
with glm_model:
    glm_trace = pm.sample(2000)
Auto-assigning NUTS sampler...
Initializing NUTS using advi...
Average ELBO = -157.31: 100%|██████████| 200000/200000 [00:21<00:00, 9193.40it/s]
Finished [100%]: Average ELBO = -157.32
100%|██████████| 2000/2000 [00:05<00:00, 391.03it/s]
# pm.traceplot(glm_trace);
pm.plot_posterior(glm_trace);

Analyzing the Model

Bayesian inference yields not only the best-fitting line but also the complete posterior information for the parameters being estimated.

The left side of the trace plot shows the marginal posterior of each of the three random variables: the x-axis gives possible parameter values, and the y-axis the corresponding probability density. The right side shows the sampled values of the three variables over the course of sampling; all three have converged and are stable (no jumps or odd patterns).
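This description matches the output of `pm.traceplot`, which is commented out in the code above; to reproduce the two-column figure:

# left column: marginal posteriors; right column: sampled values per draw
pm.traceplot(glm_trace);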

Second, the maximum a posteriori estimate of each random variable is very close to the true value (the value used to simulate the data).

Below, the parameters are estimated by their posterior means.

est_intercept = naive_trace['Intercept'].mean()
est_slope = naive_trace['Slope'].mean()
est_regression_line = est_intercept + est_slope * x

??pm.glm.plot_posterior_predictive

plt.figure(figsize=(7, 7))
plt.plot(x, y, 'x', label='data')

pm.glm.plot_posterior_predictive(glm_trace, samples=200, c='m')
plt.plot(x, true_regression_line, label='true regression line', lw=3.0, c='y')
plt.plot(x, est_regression_line, label='mean estimate regression line', c='k')
plt.title('Posterior predictive regression lines')
plt.legend(loc=0)
plt.xlabel('x')
plt.ylabel('y');
