Hierarchical Bayesian model
For a sequence of random variables $y_{1},\dots,y_{n}$, if for every permutation $\pi$ the joint density satisfies $p(y_{1},\dots,y_{n})=p(y_{\pi_{1}},\dots,y_{\pi_{n}})$, the variables are called exchangeable. When we lack information to distinguish these random variables, exchangeability is a reasonable property to require of $p(y_{1},\dots,y_{n})$. In that case, each random variable can be regarded as an independent draw from a population whose properties are described by a fixed but unknown parameter $\phi$, i.e.:
$$
\phi \sim p(\phi)
$$
$$
\{y_{1},\dots,y_{n}\mid\phi\} \sim^{i.i.d.} p(y\mid\phi)
$$
Now consider hierarchical data $\{y_{1},\dots,y_{m}\}$, where group $j$ consists of $y_{j}=\{y_{1,j},\dots,y_{n_{j},j}\}$. Then
$$
\{y_{1,j},\dots,y_{n_{j},j}\mid\phi_{j}\} \sim^{i.i.d.} p(y\mid\phi_{j})
$$
But how should we model the group parameters $\phi_{1},\dots,\phi_{m}$? If the groups themselves are drawn from a larger population, then these parameters are also exchangeable, so
$$
\{\phi_{1},\dots,\phi_{m}\mid\psi\} \sim^{i.i.d.} p(\phi\mid\psi)
$$
In summary, we obtain three probability distributions:
- Within-group sampling: $\{y_{1,j},\dots,y_{n_{j},j}\mid\phi_{j}\} \sim^{i.i.d.} p(y\mid\phi_{j})$
- Between-group sampling: $\{\phi_{1},\dots,\phi_{m}\mid\psi\} \sim^{i.i.d.} p(\phi\mid\psi)$
- Prior distribution: $\psi \sim p(\psi)$
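To make the three-level structure concrete, here is a minimal simulation sketch in Python/NumPy. The normal sampling models and all numeric values are illustrative assumptions, not part of the source:

```python
import numpy as np

rng = np.random.default_rng(0)

# Level 3 (prior): fix psi = (mu, tau^2); values chosen arbitrarily.
mu, tau2 = 50.0, 25.0

# Level 2 (between-group sampling): draw m group parameters phi_j i.i.d.
m = 8
phi = rng.normal(mu, np.sqrt(tau2), size=m)

# Level 1 (within-group sampling): draw n_j observations in each group.
sigma2 = 16.0
n = rng.integers(5, 20, size=m)   # unequal group sizes
y = [rng.normal(phi[j], np.sqrt(sigma2), size=n[j]) for j in range(m)]
```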
Hierarchical normal model
In what follows, the hierarchical normal model is used to describe heterogeneity in the means of several groups; both the within-group and between-group sampling distributions are normal:
- Within-group model: $\phi_{j}=\{\theta_{j},\sigma^2\}$, $\;p(y\mid\phi_{j})=\text{normal}(\theta_{j},\sigma^2)$
- Between-group model: $\psi=\{\mu,\tau^2\}$, $\;p(\theta_{j}\mid\psi)=\text{normal}(\mu,\tau^2)$
The fixed but unknown parameters of the model are $\sigma^2$, $\mu$, and $\tau^2$. For convenience, we give them the standard semiconjugate normal and inverse-gamma priors:
- $1/\sigma^2 \sim \text{gamma}(\nu_{0}/2,\ \nu_{0}\sigma_{0}^2/2)$
- $1/\tau^2 \sim \text{gamma}(\eta_{0}/2,\ \eta_{0}\tau_{0}^2/2)$
- $\mu \sim \text{normal}(\mu_{0},\gamma_{0}^2)$
The model structure is as follows:
Posterior inference:
Key results for the univariate normal model:
Result 1: Suppose the sampling model is $\{y_{1},\dots,y_{n}\mid\theta,\sigma^2\} \sim^{i.i.d.} \text{normal}(\theta,\sigma^2)$. If $\theta \sim \text{normal}(\mu_{0},\tau_{0}^2)$ and $1/\sigma^2 \sim \text{gamma}(\nu_{0}/2,\nu_{0}\sigma_{0}^2/2)$, then $p(\theta\mid\sigma^2,y_{1},\dots,y_{n}) = \text{normal}(\mu_{n},\tau_{n}^2)$, where $\mu_{n}=\frac{\mu_{0}/\tau_{0}^2+n\bar{y}/\sigma^2}{1/\tau_{0}^2+n/\sigma^2}$ and $\tau_{n}^2=\big(\frac{1}{\tau_{0}^2}+\frac{n}{\sigma^2}\big)^{-1}$.
Result 2: Under the same sampling model and priors, $p(\sigma^2\mid\theta,y_{1},\dots,y_{n}) = \text{inverse-gamma}(\nu_{n}/2,\ \nu_{n}\sigma_{n}^2(\theta)/2)$, where $\nu_{n}=\nu_{0}+n$, $\sigma_{n}^2(\theta)=\frac{1}{\nu_{n}}\big[\nu_{0}\sigma_{0}^2+n s_{n}^2(\theta)\big]$, and $s_{n}^2(\theta)=\sum_{i}(y_{i}-\theta)^2/n$.
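As a quick illustration, here are hypothetical helper functions that transcribe the two results directly; the names and signatures are illustrative choices, not from the source:

```python
import numpy as np

def theta_posterior_params(y, sigma2, mu0, tau02):
    """Result 1: (mu_n, tau_n^2) of p(theta | sigma^2, y)."""
    n, ybar = len(y), np.mean(y)
    tau_n2 = 1.0 / (1.0 / tau02 + n / sigma2)
    mu_n = tau_n2 * (mu0 / tau02 + n * ybar / sigma2)
    return mu_n, tau_n2

def precision_posterior_params(y, theta, nu0, s02):
    """Result 2: gamma(shape, rate) of p(1/sigma^2 | theta, y)."""
    y = np.asarray(y)
    # nu_n * sigma_n^2(theta) = nu0*sigma_0^2 + sum_i (y_i - theta)^2
    shape = (nu0 + len(y)) / 2.0
    rate = (nu0 * s02 + np.sum((y - theta) ** 2)) / 2.0
    return shape, rate
```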
The unknown parameters of the hierarchical model are the group means $(\theta_{1},\dots,\theta_{m})$, the within-group variance $\sigma^2$, and the between-group mean $\mu$ and variance $\tau^2$. Their joint posterior $p(\theta_{1},\dots,\theta_{m},\mu,\tau^2,\sigma^2\mid y_{1},\dots,y_{m})$ can be approximated by constructing a Gibbs sampler, which iterates over the full conditional distribution of each parameter.
$$
\begin{aligned}
&p(\theta_{1},\dots,\theta_{m},\mu,\tau^2,\sigma^2\mid y_{1},\dots,y_{m})\\
&\propto p(\mu,\tau^2,\sigma^2)\times p(\theta_{1},\dots,\theta_{m}\mid\mu,\tau^2,\sigma^2)\times p(y_{1},\dots,y_{m}\mid\theta_{1},\dots,\theta_{m},\mu,\tau^2,\sigma^2)\\
&= p(\mu)\,p(\tau^2)\,p(\sigma^2)\big\{\prod_{j=1}^m p(\theta_{j}\mid\mu,\tau^2)\big\}\big\{\prod_{j=1}^m \prod_{i=1}^{n_{j}} p(y_{i,j}\mid\theta_{j},\sigma^2)\big\}
\end{aligned}
$$
From the dependence structure among the random variables, we can read off the full conditional distribution of each parameter:
$$
p(\mu\mid\theta_{1},\dots,\theta_{m},\tau^2,\sigma^2,y_{1},\dots,y_{m}) \propto p(\mu)\prod_{j=1}^m p(\theta_{j}\mid\mu,\tau^2)
$$
$$
p(\tau^2\mid\theta_{1},\dots,\theta_{m},\mu,\sigma^2,y_{1},\dots,y_{m}) \propto p(\tau^2)\prod_{j=1}^m p(\theta_{j}\mid\mu,\tau^2)
$$
$$
p(\theta_{j}\mid\mu,\tau^2,\sigma^2,y_{1},\dots,y_{m}) \propto p(\theta_{j}\mid\mu,\tau^2)\prod_{i=1}^{n_{j}} p(y_{i,j}\mid\theta_{j},\sigma^2)
$$
$$
\begin{aligned}
p(\sigma^2\mid\theta_{1},\dots,\theta_{m},y_{1},\dots,y_{m}) &\propto p(\sigma^2)\prod_{j=1}^m \prod_{i=1}^{n_{j}} p(y_{i,j}\mid\theta_{j},\sigma^2)\\
&\propto (\sigma^2)^{-(\nu_{0}/2+1)}e^{-\frac{\nu_{0}\sigma_{0}^2}{2\sigma^2}}\,(\sigma^2)^{-\sum_{j} n_{j}/2}\,e^{-\frac{\sum_{j}\sum_{i}(y_{i,j}-\theta_{j})^2}{2\sigma^2}}
\end{aligned}
$$
Applying the two results above, we obtain:
$$
\{\mu\mid\theta_{1},\dots,\theta_{m},\tau^2\} \sim \text{normal}\Big(\frac{m\bar{\theta}/\tau^2+\mu_{0}/\gamma_{0}^2}{m/\tau^2+1/\gamma_{0}^2},\ \big[m/\tau^2+1/\gamma_{0}^2\big]^{-1}\Big)
$$
$$
\{1/\tau^2\mid\theta_{1},\dots,\theta_{m},\mu\} \sim \text{gamma}\Big(\frac{\eta_{0}+m}{2},\ \frac{\eta_{0}\tau_{0}^2+\sum_{j}(\theta_{j}-\mu)^2}{2}\Big)
$$
$$
\{\theta_{j}\mid y_{1,j},\dots,y_{n_{j},j},\mu,\tau^2,\sigma^2\} \sim \text{normal}\Big(\frac{n_{j}\bar{y}_{j}/\sigma^2+\mu/\tau^2}{n_{j}/\sigma^2+1/\tau^2},\ \big[n_{j}/\sigma^2+1/\tau^2\big]^{-1}\Big)
$$
$$
\{1/\sigma^2\mid\theta_{1},\dots,\theta_{m},y_{1},\dots,y_{m}\} \sim \text{gamma}\Big(\frac{1}{2}\big[\nu_{0}+\sum_{j=1}^m n_{j}\big],\ \frac{1}{2}\big[\nu_{0}\sigma_{0}^2+\sum_{j=1}^m \sum_{i=1}^{n_{j}}(y_{i,j}-\theta_{j})^2\big]\Big)
$$
Computation procedure:
1. Set the prior distribution parameters:
$(\nu_{0},\sigma_{0}^2) \rightarrow p(\sigma^2)$
$(\eta_{0},\tau_{0}^2) \rightarrow p(\tau^2)$
$(\mu_{0},\gamma_{0}^2) \rightarrow p(\mu)$
2. Update each unknown parameter by sampling from its full conditional distribution: given the current state $\{\theta_{1}^{(s)},\dots,\theta_{m}^{(s)},\mu^{(s)},\tau^{2(s)},\sigma^{2(s)}\}$, generate the new state as follows:
$sample:\;\mu^{(s+1)} \sim p(\mu\mid\theta_{1}^{(s)},\dots,\theta_{m}^{(s)},\tau^{2(s)})$
$sample:\;\tau^{2(s+1)} \sim p(\tau^2\mid\theta_{1}^{(s)},\dots,\theta_{m}^{(s)},\mu^{(s+1)})$
$sample:\;\sigma^{2(s+1)} \sim p(\sigma^2\mid\theta_{1}^{(s)},\dots,\theta_{m}^{(s)},y_{1},\dots,y_{m})$
$for\;each\;j\in\{1,\dots,m\},\;sample\;\theta_{j}^{(s+1)} \sim p(\theta_{j}\mid\mu^{(s+1)},\tau^{2(s+1)},\sigma^{2(s+1)},y_{j})$
Iterating these steps until convergence yields samples from the joint posterior of the model parameters, as in the sketch below.
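Here is a minimal NumPy sketch of the sampler. The function name, hyperparameter defaults, and the initialization at sample statistics are placeholder assumptions; `y` is the list-of-arrays layout from the earlier simulation sketch:

```python
import numpy as np

def gibbs_hierarchical_normal(y, S=5000, mu0=50.0, g02=25.0,
                              eta0=1.0, t02=100.0, nu0=1.0, s02=100.0,
                              seed=1):
    """Gibbs sampler for the hierarchical normal model (shared sigma^2)."""
    rng = np.random.default_rng(seed)
    m = len(y)
    n = np.array([len(yj) for yj in y])
    ybar = np.array([np.mean(yj) for yj in y])

    # initialize the chain at simple sample-based estimates
    theta = ybar.copy()
    sigma2 = np.mean([np.var(yj, ddof=1) for yj in y])
    mu, tau2 = ybar.mean(), ybar.var(ddof=1)

    draws = {"mu": np.empty(S), "tau2": np.empty(S),
             "sigma2": np.empty(S), "theta": np.empty((S, m))}

    for s in range(S):
        # mu | theta, tau^2  (normal full conditional)
        v = 1.0 / (m / tau2 + 1.0 / g02)
        mu = rng.normal(v * (m * theta.mean() / tau2 + mu0 / g02), np.sqrt(v))

        # 1/tau^2 | theta, mu  (gamma full conditional; scale = 1/rate)
        b = (eta0 * t02 + np.sum((theta - mu) ** 2)) / 2.0
        tau2 = 1.0 / rng.gamma((eta0 + m) / 2.0, 1.0 / b)

        # 1/sigma^2 | theta, y  (gamma full conditional)
        ss = sum(np.sum((yj - tj) ** 2) for yj, tj in zip(y, theta))
        b = (nu0 * s02 + ss) / 2.0
        sigma2 = 1.0 / rng.gamma((nu0 + n.sum()) / 2.0, 1.0 / b)

        # theta_j | mu, tau^2, sigma^2, y_j  (normal, vectorized over j)
        v = 1.0 / (n / sigma2 + 1.0 / tau2)
        theta = rng.normal(v * (n * ybar / sigma2 + mu / tau2), np.sqrt(v))

        draws["mu"][s], draws["tau2"][s] = mu, tau2
        draws["sigma2"][s], draws["theta"][s] = sigma2, theta

    return draws
```

With the simulated data from the first sketch, `draws = gibbs_hierarchical_normal(y, S=2000)` gives Monte Carlo approximations such as `draws["mu"].mean()` for the posterior mean of $\mu$.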
Going further, suppose that not only the means but also the variances differ across groups, with $\sigma_{j}^2$ denoting the variance of group $j$. The sampling model becomes $\{y_{1,j},\dots,y_{n_{j},j}\} \sim^{i.i.d.} \text{normal}(\theta_{j},\sigma_{j}^2)$, and the full conditional distribution of $\theta_{j}$ is $\{\theta_{j}\mid y_{1,j},\dots,y_{n_{j},j},\mu,\tau^2,\sigma_{j}^2\} \sim \text{normal}\Big(\frac{n_{j}\bar{y}_{j}/\sigma_{j}^2+\mu/\tau^2}{n_{j}/\sigma_{j}^2+1/\tau^2},\ \big[n_{j}/\sigma_{j}^2+1/\tau^2\big]^{-1}\Big)$.
How do we estimate $\sigma_{j}^2$? Start by assuming:
$$
1/\sigma_{1}^2,\dots,1/\sigma_{m}^2 \sim^{i.i.d.} \text{gamma}(\nu_{0}/2,\ \nu_{0}\sigma_{0}^2/2)
$$
The corresponding full conditional distribution is:
$$
\{1/\sigma_{j}^2\mid y_{1,j},\dots,y_{n_{j},j},\theta_{j}\} \sim \text{gamma}\big([\nu_{0}+n_{j}]/2,\ \big[\nu_{0}\sigma_{0}^2+\textstyle\sum_{i}(y_{i,j}-\theta_{j})^2\big]/2\big)
$$
The values of $\sigma_{1}^2,\dots,\sigma_{m}^2$ can therefore also be updated within the Gibbs sampling iterations, as in the sketch below.
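Inside the Gibbs loop this becomes a per-group update, replacing the single shared-variance step. A minimal sketch, reusing the (assumed) naming of the sampler above:

```python
import numpy as np

def update_group_variances(y, theta, nu0, s02, rng):
    """Sample each 1/sigma_j^2 from its gamma full conditional."""
    sigma2 = np.empty(len(y))
    for j, yj in enumerate(y):
        shape = (nu0 + len(yj)) / 2.0
        rate = (nu0 * s02 + np.sum((yj - theta[j]) ** 2)) / 2.0
        sigma2[j] = 1.0 / rng.gamma(shape, 1.0 / rate)
    return sigma2
```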
If $\nu_{0}$ and $\sigma_{0}^2$ are fixed, the $\sigma_{j}^2$ are conditionally independent of one another, meaning that the estimate of $\sigma_{m}^2$ cannot borrow information from $\sigma_{1}^2,\dots,\sigma_{m-1}^2$. But if group $m$ has a small sample size, we would like to use $\sigma_{1}^2,\dots,\sigma_{m-1}^2$ to stabilize the estimate of $\sigma_{m}^2$. What should we do? The answer is to treat $\nu_{0}$ and $\sigma_{0}^2$ themselves as parameters to be estimated. The overall structure of the model is then as follows:
Our unknown parameters are therefore: the within-group sampling parameters $\{(\theta_{1},\sigma_{1}^2),\dots,(\theta_{m},\sigma_{m}^2)\}$, the between-group mean-heterogeneity parameters $\{\mu,\tau^2\}$, and the between-group variance-heterogeneity parameters $\{\nu_{0},\sigma_{0}^2\}$. The full conditionals of $\{\mu,\tau^2\}$ and $\{(\theta_{1},\sigma_{1}^2),\dots,(\theta_{m},\sigma_{m}^2)\}$ were given above, so we now discuss the estimation of $\{\nu_{0},\sigma_{0}^2\}$. Assuming a conjugate-class gamma prior for $\sigma_{0}^2$, $\sigma_{0}^2 \sim \text{gamma}(a,b)$, we have
$$
\{\sigma_{0}^2\mid\sigma_{1}^2,\dots,\sigma_{m}^2,\nu_{0}\} \sim \text{gamma}\Big(a+\frac{m\nu_{0}}{2},\ b+\frac{\nu_{0}}{2}\sum_{j=1}^m \frac{1}{\sigma_{j}^2}\Big)
$$
A simple conjugate prior for $\nu_{0}$ does not exist, but if we restrict $\nu_{0}$ to the integers the problem becomes simple. Suppose $\nu_{0}$ has a geometric prior on $\{1,2,\dots\}$, so that $p(\nu_{0}) \propto e^{-\alpha\nu_{0}}$; then
$$
\begin{aligned}
& p(\nu_{0}\mid\sigma_{0}^2,\sigma_{1}^2,\dots,\sigma_{m}^2)\\
& \propto p(\nu_{0})\times p(\sigma_{1}^2,\dots,\sigma_{m}^2\mid\nu_{0},\sigma_{0}^2)\\
& \propto \Big(\frac{(\nu_{0}\sigma_{0}^2/2)^{\nu_{0}/2}}{\Gamma(\nu_{0}/2)}\Big)^m \Big(\prod_{j=1}^m \frac{1}{\sigma_{j}^2}\Big)^{\nu_{0}/2-1}\times \exp\Big\{-\nu_{0}\Big(\alpha+\frac{1}{2}\sigma_{0}^2\sum_{j} 1/\sigma_{j}^2\Big)\Big\}
\end{aligned}
$$
Since $\nu_{0}$ is restricted to a grid of integers, this unnormalized full conditional can be sampled directly, which completes the Gibbs sampler; a sketch of both updates follows.
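Both updates can be written as one step of the sampler. In this sketch, the function name, the grid bound `nu_max`, and the use of SciPy's `gammaln` for log-scale numerical stability are my assumptions, not part of the source:

```python
import numpy as np
from scipy.special import gammaln

def update_s02_nu0(sigma2, nu0, a, b, alpha, rng, nu_max=100):
    """Sample sigma_0^2 and nu_0 from their full conditionals."""
    prec = 1.0 / np.asarray(sigma2)   # group precisions 1/sigma_j^2
    m = prec.size

    # sigma_0^2 | ... ~ gamma(a + m*nu0/2, rate = b + nu0*sum(prec)/2)
    s02 = rng.gamma(a + m * nu0 / 2.0, 1.0 / (b + nu0 * prec.sum() / 2.0))

    # nu_0 | ...: evaluate the unnormalized log full conditional on the
    # integer grid {1, ..., nu_max}, then sample from normalized weights
    nus = np.arange(1, nu_max + 1)
    logp = (m * (0.5 * nus * np.log(nus * s02 / 2.0) - gammaln(nus / 2.0))
            + (nus / 2.0 - 1.0) * np.log(prec).sum()
            - nus * (alpha + 0.5 * s02 * prec.sum()))
    w = np.exp(logp - logp.max())
    nu0 = int(rng.choice(nus, p=w / w.sum()))
    return s02, nu0
```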
Reference: Hoff, Peter D. A First Course in Bayesian Statistical Methods. Springer Science & Business Media, 2009.