PRML 4: Generative Models


From the perspective of probability theory, a classification problem is usually divided into two stages: (I) the inference stage, in which we build a probabilistic model of some form and obtain the posterior probability distribution $p(c_k \mid \vec{x})$ in some way; (II) the decision stage, in which we use that posterior distribution to make predictions for feature vectors whose labels are unknown. In practice there are two schools of thought: (I) the frequentist school typically estimates the model parameters by MLE or MAP, obtains a single definite probability distribution in the inference stage, and then in the decision stage chooses the prediction that minimizes the expected loss; (II) the Bayesian school learns (or approximates) a distribution over the parameters and then marginalizes over all possible parameter values to obtain the prediction, e.g.

$p(c_k \mid \vec{x}_{n+1}, X, \vec{t}) = \int p(c_k \mid \vec{x}_{n+1}, \vec{w}) \cdot p(\vec{w} \mid X, \vec{t}) \, d\vec{w}$, which requires the posterior $p(\vec{w} \mid X, \vec{t})$; when a conjugate prior is chosen, this posterior can often be updated with an online (sequential) algorithm.
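To make the marginalization concrete, here is a minimal sketch that approximates the predictive integral by Monte Carlo sampling. It assumes a two-class problem with a logistic likelihood $p(c_1 \mid \vec{x}, \vec{w}) = \sigma(\vec{w}^T\vec{x})$ and a Gaussian approximation to the posterior $p(\vec{w} \mid X, \vec{t})$ (e.g. obtained via the Laplace approximation); these modeling choices and all names in the code are illustrative, not taken from the text above.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def predictive_prob(x_new, w_mean, w_cov, n_samples=10000, seed=0):
    """Monte Carlo estimate of p(c_1 | x_new, X, t) = integral of sigmoid(w.x_new) p(w | X, t) dw,
    assuming p(w | X, t) is (approximately) Gaussian with the given mean and covariance."""
    rng = np.random.default_rng(seed)
    W = rng.multivariate_normal(w_mean, w_cov, size=n_samples)  # samples w_s ~ p(w | X, t)
    return sigmoid(W @ x_new).mean()                            # average of p(c_1 | x_new, w_s)
```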

Regardless of the school, the inference stage can be approached with two kinds of models: (I) a generative model builds a parametric model of the class-conditional distribution $p(\vec{x} \mid c_k)$, first obtains the joint distribution $p(c_k, \vec{x}) = p(\vec{x} \mid c_k)\, p(c_k)$, and then computes the posterior $p(c_k \mid \vec{x}) = p(c_k, \vec{x}) / \sum_j p(c_j, \vec{x})$; (II) a discriminative model directly models and learns the posterior probability distribution, the most classic example being logistic regression.

The discriminant function mentioned earlier is a shortcut: instead of being grounded in probability theory, it models the classification decision directly and then constructs an appropriate objective function to optimize. Compared with that approach, the probability-based methods have the following advantages: (I) the expected loss (loss matrix) can be revised at any time without retraining, since the posterior is kept; (II) the imbalance between positive and negative samples can be addressed by constructing a training set with an appropriate prior distribution; (III) independently trained models on separate sets of features can be fused, e.g. $p(c_k \mid \vec{x}, \vec{y}) = \frac{p(c_k \mid \vec{x})\, p(c_k \mid \vec{y})}{p(c_k)}$.
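The fusion formula in (III) follows from assuming that $\vec{x}$ and $\vec{y}$ are conditionally independent given the class; a short derivation (not spelled out above) is:

$p(c_k \mid \vec{x}, \vec{y}) \propto p(\vec{x}, \vec{y} \mid c_k)\, p(c_k) = p(\vec{x} \mid c_k)\, p(\vec{y} \mid c_k)\, p(c_k) = \frac{p(c_k \mid \vec{x})\, p(\vec{x})}{p(c_k)} \cdot \frac{p(c_k \mid \vec{y})\, p(\vec{y})}{p(c_k)} \cdot p(c_k) \propto \frac{p(c_k \mid \vec{x})\, p(c_k \mid \vec{y})}{p(c_k)}$,

where the factors $p(\vec{x})$ and $p(\vec{y})$ are dropped because they do not depend on $c_k$.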

A typical generative model is the Naive Bayes classifier with Laplace smoothing: given the class label, we assume the feature components are conditionally independent, i.e. $p(\vec{x}^{(i)} = a_{is}, \vec{x}^{(j)} = a_{jr} \mid c_k) = p(\vec{x}^{(i)} = a_{is} \mid c_k) \cdot p(\vec{x}^{(j)} = a_{jr} \mid c_k)$.

(1) Prior: $p(c_k) = \frac{\sum_{n=1}^N I(y_n = c_k) + 1}{N + K}$ for $0 \leq k < K$;

(2) Likelihood: $p(\vec{x}^{(j)} = a_{jl} \mid c_k) = \frac{\sum_{n=1}^N I(\vec{x}_n^{(j)} = a_{jl},\, y_n = c_k) + 1}{\sum_{n=1}^N I(y_n = c_k) + S_j}$, where $S_j$ is the number of distinct values the $j$-th feature component can take;

(3) Prediction: $y = \mathop{\arg\max}_{c_k}\, p(c_k) \cdot \prod_{j=0}^{D-1} p(\vec{x}^{(j)} = \vec{x}_{n+1}^{(j)} \mid c_k)$, where $D$ is the number of feature components.
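As a concrete illustration, here is a minimal Python sketch of the classifier defined by steps (1)-(3); all names (`NaiveBayes`, `fit`, `predict`, `n_classes`, ...) are illustrative, and the features are assumed to be integer-coded categorical values.

```python
import numpy as np

class NaiveBayes:
    def fit(self, X, y, n_classes):
        """X: (N, D) integer-coded categorical features; y: (N,) labels in 0..K-1."""
        N, D = X.shape
        counts = np.bincount(y, minlength=n_classes)
        # Prior with Laplace smoothing: p(c_k) = (count_k + 1) / (N + K)
        self.log_prior = np.log((counts + 1) / (N + n_classes))
        # One likelihood table per feature: p(x^(j) = a | c_k) with "+1" smoothing
        self.log_lik = []
        for j in range(D):
            S_j = X[:, j].max() + 1             # number of values feature j can take
            table = np.ones((n_classes, S_j))   # Laplace "+1" pseudo-counts
            for n in range(N):
                table[y[n], X[n, j]] += 1
            table /= (counts + S_j)[:, None]    # denominator: count_k + S_j
            self.log_lik.append(np.log(table))
        return self

    def predict(self, x):
        """argmax_k [ log p(c_k) + sum_j log p(x^(j) | c_k) ] for a single sample x."""
        scores = self.log_prior.copy()
        for j, table in enumerate(self.log_lik):
            scores += table[:, x[j]]
        return int(np.argmax(scores))
```

Working in log space avoids underflow when the product over many feature components becomes very small.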

Gaussian Discriminant Analysis (GDA) is another example, which predicts by choosing the class with maximum posterior probability (a MAP decision): given the class label, we assume the feature vector is Gaussian distributed. Here we take $K = 2$ as an example.

(1) Prior: $p(c_k) = \frac{1}{N} \sum_{n=1}^N I(y_n = c_k)$ for $k = 0, 1$;

(2) Likelihood: $p(\vec{x} \mid c_k) = \mathcal{N}(\vec{x} \mid \vec{\mu}_k, \Sigma)$, where by MLE we have

$\vec{\mu}_k = \frac{\sum_{n=1}^N I(y_n = c_k)\, \vec{x}_n}{\sum_{n=1}^N I(y_n = c_k)}$, and $\Sigma = \frac{1}{N} \sum_{n=1}^N (\vec{x}_n - \vec{\mu}_{y_n})(\vec{x}_n - \vec{\mu}_{y_n})^T$;

(3) Prediction: $y = \mathop{\arg\max}_{c_k}\, p(c_k) \cdot p(\vec{x}_{n+1} \mid c_k)$.
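Below is a minimal Python sketch of two-class GDA following steps (1)-(3); the class and function names are illustrative, and the Gaussian's normalization constant is dropped in the prediction because the covariance $\Sigma$ is shared between the two classes.

```python
import numpy as np

class GDA:
    def fit(self, X, y):
        """X: (N, D) real-valued features; y: (N,) labels in {0, 1}."""
        N, D = X.shape
        self.prior = np.array([np.mean(y == k) for k in (0, 1)])      # p(c_k)
        self.mu = np.array([X[y == k].mean(axis=0) for k in (0, 1)])  # class means mu_k
        centred = X - self.mu[y]                                      # x_n - mu_{y_n}
        self.sigma = centred.T @ centred / N                          # shared covariance Sigma
        self.sigma_inv = np.linalg.inv(self.sigma)
        return self

    def predict(self, x):
        """argmax_k [ log p(c_k) - (1/2)(x - mu_k)^T Sigma^{-1} (x - mu_k) ],
        i.e. the log of p(c_k) * N(x | mu_k, Sigma) with the shared constant dropped."""
        scores = []
        for k in (0, 1):
            d = x - self.mu[k]
            scores.append(np.log(self.prior[k]) - 0.5 * d @ self.sigma_inv @ d)
        return int(np.argmax(scores))
```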

The difference between GDA and logistic regression is that logistic regression makes weaker model assumptions: it does not require the data to follow class-conditional Gaussians with a shared covariance, so it applies more broadly. The advantage of GDA is that, when the data does obey its assumptions, it is more accurate than logistic regression and reaches the same performance with fewer training samples. The relationship between Naive Bayes and Softmax regression is similar: Softmax regression does not require the feature components to be conditionally independent, and it can in fact replace any generative model whose class-conditional distributions satisfy $p(\vec{x} \mid c_k) \propto e^{\vec{w}_k^T \vec{x}}$. In general, generative models tend to have high bias and low variance (prone to underfitting) and suit small training sets, while discriminative models tend to have low bias and high variance (prone to overfitting) and suit large training sets.

References:

1. Bishop, Christopher M. Pattern Recognition and Machine Learning [M]. Singapore: Springer, 2006.
