GLM: Linear regression
GLM stands for generalized linear models.
Several software packages for Bayesian statistics, such as JAGS, BUGS, Stan, and PyMC, support these models; using the toolkits well requires a good understanding of the models being built. We start with the traditional form of linear regression.
The traditional form of linear regression
In general, the frequentist school expresses linear regression as:
$$y = X\beta + \epsilon$$
where $y$ is the output we want to predict (the dependent variable), $X$ is the predictor (the independent variable), $\beta$ is the model coefficient (parameter) to be estimated, and $\epsilon$ is the error term, which is assumed to be normally distributed.
The optimal value of $\beta$ can be found using ordinary least squares or maximum likelihood estimation.
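For reference, this frequentist estimate can be computed directly; a minimal NumPy sketch (assuming data arrays x and y like those generated in the "Generating Data" section below):
import numpy as np
# design matrix with a column of ones for the intercept term
X = np.column_stack([np.ones_like(x), x])
# ordinary least squares: minimize ||y - X beta||^2
beta_hat, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print('estimated intercept and slope:', beta_hat)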
The probabilistic form of linear regression
The Bayesian approach describes the linear regression model in terms of probability distributions, which can be written as:
$$y \sim \mathcal{N}(X\beta, \sigma^2)$$
where $y$ is a random variable whose values (the data points) follow a normal distribution. The mean of this normal distribution is determined by the linear predictor $X\beta$, and its variance by $\sigma^2$.
This description has the following two advantages:
* Priors: we can assign any prior we have to the parameters.
* Quantifying uncertainty: we obtain not just a single point estimate of $\beta$ but its entire posterior distribution, which characterizes how likely different values are.
Bayesian GLMs in PyMC3
Using the GLM module in PyMC3, such models can be constructed simply.
%matplotlib inline
import pymc3 as pm
import numpy as np
import matplotlib.pyplot as plt
Generating Data
A straight line is defined by an intercept and a slope, and normally distributed data points are generated around it, with the line giving the mean.
# data size
size = 200
# True line parameter
true_intercept = 1
true_slope = 2
# y = a + b*x
x = np.linspace(0, 1, size)
true_regression_line = true_intercept + true_slope * x
# Add noise (sampled data)
y = true_regression_line + np.random.normal(scale=.5, size=size)
# Simulated data
data = dict(x=x, y=y)
# Plot the data
fig = plt.figure(figsize=(7, 7))
ax = fig.add_subplot(111, xlabel='x', ylabel='y', title='Generated data and underlying model')
ax.plot(x, y, 'x', label='sampled data')
ax.plot(x, true_regression_line, label='true regression line', lw=2.0)
plt.legend(loc=0)
Direct Modeling Estimation
A Bayesian linear regression model is built by hand and fit to the sampled data.
with pm.Model() as naive_model:  # model specifications in PyMC3 are wrapped in a with-statement
    # Define priors
    intercept = pm.Normal('Intercept', 0, sd=20)
    slope = pm.Normal('Slope', 0, sd=20)
    sigma = pm.HalfCauchy('sigma', beta=10, testval=1.)
    # Define likelihood
    likelihood = pm.Normal('y', mu=intercept + slope * x, sd=sigma, observed=y)
Sampling uses the default sampler; PyMC3 does not require initial values to be specified explicitly. The NUTS sampler is initialized with ADVI.
with naive_model:
    naive_trace = pm.sample(2000)
Auto-assigning NUTS sampler...
Initializing NUTS using advi...
Average ELBO = -157.32: 100%|██████████| 200000/200000 [00:18<00:00, 10888.26it/s]
Finished [100%]: Average ELBO = -157.3
100%|██████████| 2000/2000 [00:04<00:00, 432.16it/s]
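The same initialization can also be requested explicitly; a minimal sketch, assuming PyMC3's init and n_init arguments to pm.sample (the values mirror the defaults reported in the log above):
# explicitly ask for ADVI initialization of the NUTS sampler
with naive_model:
    naive_trace = pm.sample(2000, init='advi', n_init=200000)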
The fitting results show that the estimated intercept is about 1.027 (true value 1) and the estimated slope is about 1.893 (true value 2). The noise standard deviation is estimated at about 0.528 (true value 0.5). The estimates are accurate.
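These numbers can be read directly from the trace; a minimal sketch using PyMC3's summary function (the exact columns depend on the PyMC3 version):
# tabular summary: posterior means, standard deviations, and intervals
pm.summary(naive_trace)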
# pm.traceplot(naive_trace)
pm.plot_posterior(naive_trace)
GLM Modeling Estimation
The same model is now estimated using the GLM module.
with pm.Model() as glm_model:
    pm.glm.glm('y ~ x', data,
               {'Intercept': pm.Normal.dist(mu=0, sd=20),
                'sd': pm.HalfCauchy.dist(beta=10, testval=1.),
                'x': pm.Normal.dist(mu=0, sd=20)})
??pm.glm.glm
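If no prior dictionary is passed, the GLM module falls back to its own default priors; a minimal sketch (the defaults themselves are whatever the installed PyMC3 version provides):
with pm.Model() as default_glm_model:
    # same formula, but let pm.glm.glm pick its default priors
    pm.glm.glm('y ~ x', data)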
with glm_model:
    glm_trace = pm.sample(2000)
Auto-assigning NUTS sampler...
Initializing NUTS using advi...
Average ELBO = -157.31: 100%|██████████| 200000/200000 [00:21<00:00, 9193.40it/s]
Finished [100%]: Average ELBO = -157.32
100%|██████████| 2000/2000 [00:05<00:00, 391.03it/s]
# pm.traceplot(glm_trace)
pm.plot_posterior(glm_trace)
Analyzing the Model
Bayesian inference yields not only the best-fit line but also the complete posterior information for the parameters being estimated.
In the trace plot, the left side shows the marginal posterior of each of the three random variables: the x-axis is the possible value and the y-axis the corresponding probability density. The right side shows the sampled values of the three random variables; all three converge well and are stable (no jumps or strange patterns).
Second, the maximum a posteriori estimate of each random variable is very close to the true value (the value used to simulate the data).
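The trace plot described here is produced by the traceplot call commented out above; a minimal sketch for the GLM trace:
# left column: marginal posteriors; right column: sampled values over iterations
pm.traceplot(glm_trace)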
The parameters are estimated below using their posterior means.
est_intercept = naive_trace['Intercept'].mean()
est_slope = naive_trace['Slope'].mean()
est_regression_line = est_intercept + est_slope * x
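Because the full posterior is available, uncertainty can be quantified alongside these point estimates; a minimal sketch computing 95% credible intervals with plain NumPy percentiles:
# central 95% credible intervals for intercept and slope
ci_intercept = np.percentile(naive_trace['Intercept'], [2.5, 97.5])
ci_slope = np.percentile(naive_trace['Slope'], [2.5, 97.5])
print('intercept 95% CI:', ci_intercept)
print('slope 95% CI:', ci_slope)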
??pm.glm.plot_posterior_predictive
plt.figure(figsize=(7, 7))
plt.plot(x, y, 'x', label='data')
pm.glm.plot_posterior_predictive(glm_trace, samples=200, c='m')
plt.plot(x, true_regression_line, label='true regression line', lw=3.0, c='y')
plt.plot(x, est_regression_line, label='mean estimate regression line', c='k')
plt.title('Posterior predictive regression lines')
plt.legend(loc=0)
plt.xlabel('x')
plt.ylabel('y')