PRML 5: Kernel Methods

A kernel function implicitly maps a data point to some high-dimensional feature space and substitutes for the inner product of feature vectors, so a non-linearly separable classification problem can be converted into a linearly separable one. This trick can be applied to many feature-vector-based models such as SVM, which we introduced in previous articles.
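To make the trick concrete, here is a minimal sketch (assuming Python with numpy, which the original does not specify): for 2-D inputs, the kernel $k(\vec{x},\vec{y})=(\vec{x}^T\vec{y})^2$ returns exactly the inner product of explicit degree-2 feature vectors, without ever constructing those vectors inside the kernel.

```python
import numpy as np

# For 2-D inputs, k(x, y) = (x^T y)^2 is the inner product of the explicit
# degree-2 feature vectors phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2).
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k(x, y):
    return (x @ y) ** 2

x, y = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(k(x, y))          # 16.0 -- kernel evaluated directly in input space
print(phi(x) @ phi(y))  # 16.0 -- same value, via the explicit feature map
```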

To test the validity of a kernel function, we need Mercer's theorem: a function $k:\mathbb{R}^m\times\mathbb{R}^m\rightarrow\mathbb{R}$ is a Mercer kernel iff for every finite set $\{\vec{x}_1,\vec{x}_2,\ldots,\vec{x}_n\}$, the corresponding kernel matrix is symmetric positive semi-definite.
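This condition can be checked numerically on a finite sample: build the kernel matrix and verify symmetry and non-negative eigenvalues. A sketch, assuming numpy and using the Gaussian kernel introduced in the next paragraph as the concrete example:

```python
import numpy as np

def gaussian_kernel(xm, xn, sigma=1.0):
    return np.exp(-np.sum((xm - xn) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))  # an arbitrary finite set of points in R^3

# Kernel (Gram) matrix K_ij = k(x_i, x_j)
K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])

print(np.allclose(K, K.T))                    # symmetric
print(np.linalg.eigvalsh(K).min() >= -1e-10)  # PSD, up to round-off
```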

One of the good kernel functions is the Gaussian kernel $k(\vec{x}_m,\vec{x}_n)=\exp\{-\frac{1}{2\sigma^2}\|\vec{x}_m-\vec{x}_n\|^2\}$, whose feature space has infinite dimensionality. Another is the polynomial kernel $k(\vec{x}_m,\vec{x}_n)=(\vec{x}_m^T\vec{x}_n+c)^M$ with $c>0$. In practice, we can also construct a new kernel function from simple valid kernels using closure properties: sums, products, and positive scalings of valid kernels are again valid kernels.
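A small sketch of these closure properties (the kernel and coefficient choices are illustrative assumptions): combining a Gaussian and a polynomial kernel by positive scaling, addition, and multiplication still yields a positive semi-definite kernel matrix.

```python
import numpy as np

def gaussian(xm, xn, sigma=1.0):
    return np.exp(-np.sum((xm - xn) ** 2) / (2 * sigma ** 2))

def polynomial(xm, xn, c=1.0, M=2):
    return (xm @ xn + c) ** M

# Positive scaling, sum, and product of valid kernels are all valid kernels,
# so this composite is valid as well.
def composite(xm, xn):
    return 0.5 * gaussian(xm, xn) + gaussian(xm, xn) * polynomial(xm, xn)

rng = np.random.default_rng(1)
X = rng.normal(size=(15, 2))
K = np.array([[composite(xi, xj) for xj in X] for xi in X])
print(np.linalg.eigvalsh(K).min() >= -1e-10)  # the kernel matrix is still PSD
```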

We can also use a generative model to define kernel functions, such as:

(1) $k(\vec{x}_m,\vec{x}_n)=\int p(\vec{x}_m|\vec{z})\,p(\vec{x}_n|\vec{z})\,p(\vec{z})\,d\vec{z}$, where $\vec{z}$ is a latent variable;

(2) $k(\vec{x}_m,\vec{x}_n)=g(\vec{\theta},\vec{x}_m)^T F^{-1} g(\vec{\theta},\vec{x}_n)$, where $g(\vec{\theta},\vec{x})=\nabla_{\vec{\theta}}\ln p(\vec{x}|\vec{\theta})$ is the Fisher score,

and $F=\frac{1}{N}\sum_{n=1}^{N} g(\vec{\theta},\vec{x}_n)\,g(\vec{\theta},\vec{x}_n)^T$ is the (sample approximation of the) Fisher information matrix.
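As a concrete instance of form (1), the following sketch uses a discrete latent variable $z$ with 1-D Gaussian components (the mixture parameters are made-up examples). The resulting kernel is the inner product of the vectors $(p(x|z)\sqrt{p(z)})_z$, so it passes the Mercer check.

```python
import numpy as np

# Discrete latent variable z in {0, 1}; p(x|z) is a 1-D Gaussian per component.
# Then k(x_m, x_n) = sum_z p(x_m|z) p(x_n|z) p(z) is the inner product of the
# vectors (p(x|z) * sqrt(p(z)))_z, hence a valid kernel.
means = np.array([-1.0, 2.0])   # illustrative component parameters
stds = np.array([0.5, 1.0])
pz = np.array([0.3, 0.7])

def gauss_pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def k(xm, xn):
    return np.sum(gauss_pdf(xm, means, stds) * gauss_pdf(xn, means, stds) * pz)

X = np.linspace(-3, 3, 25)
K = np.array([[k(a, b) for b in X] for a in X])
print(np.linalg.eigvalsh(K).min() >= -1e-10)  # passes the Mercer check
```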

  A Gaussian process is a probabilistic discriminative model in which we assume that the set of values of $y(\vec{x})$ evaluated at an arbitrary set of points $\{\vec{x}_1,\vec{x}_2,\ldots,\vec{x}_N\}$ is jointly Gaussian distributed. Here, the kernel matrix determines the covariance.
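The definition can be visualized by sampling from the prior: function values on a grid of points are drawn from a joint Gaussian whose covariance is the kernel matrix. A minimal sketch (the Gaussian kernel and the jitter term are assumptions, the latter purely for numerical stability):

```python
import numpy as np

def k(xm, xn, sigma=1.0):
    return np.exp(-((xm - xn) ** 2) / (2 * sigma ** 2))

x = np.linspace(0, 5, 50)
K = np.array([[k(a, b) for b in x] for a in x])

# The 50 function values are jointly Gaussian with zero mean and covariance K;
# a small jitter keeps the Cholesky factorization numerically stable.
rng = np.random.default_rng(0)
L = np.linalg.cholesky(K + 1e-6 * np.eye(len(x)))
samples = L @ rng.normal(size=(len(x), 3))  # three functions drawn from the prior
print(samples.shape)                        # (50, 3)
```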

  Gaussian Process for Regression:

Typically, we choose $k(\vec{x}_m,\vec{x}_n)=\theta_0\exp\{-\frac{\theta_1}{2}\|\vec{x}_n-\vec{x}_m\|^2\}+\theta_2+\theta_3\vec{x}_m^T\vec{x}_n$, and assume that:

(1) Prior distribution $p(\vec{y}_N)=\mathcal{N}(\vec{y}_N|\vec{0},K_N)$;
(2) Likelihood $p(\vec{t}_N|\vec{y}_N)=\mathcal{N}(\vec{t}_N|\vec{y}_N,\beta^{-1}I_N)$.

Then, we have $p(\vec{t}_N)=\int p(\vec{t}_N|\vec{y}_N)\,p(\vec{y}_N)\,d\vec{y}_N=\mathcal{N}(\vec{t}_N|\vec{0},K_N+\beta^{-1}I_N)$. Here, $p(\vec{t}_N)$ is the likelihood of the hyperparameters $\vec{\theta}$, and we can use MLE to learn $\vec{\theta}$.
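A sketch of this MLE step, using the four-parameter kernel above (the synthetic data and the grid over $\theta_1$ are illustrative; a real implementation would ascend the gradient of the log marginal likelihood instead):

```python
import numpy as np

def kernel(xm, xn, th):
    th0, th1, th2, th3 = th
    return th0 * np.exp(-0.5 * th1 * (xm - xn) ** 2) + th2 + th3 * xm * xn

def log_marginal(x, t, th, beta=25.0):
    # ln p(t) = ln N(t | 0, C_N) with C_N = K_N + beta^{-1} I_N
    N = len(x)
    C = np.array([[kernel(a, b, th) for b in x] for a in x]) + np.eye(N) / beta
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (logdet + t @ np.linalg.solve(C, t) + N * np.log(2 * np.pi))

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 30)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=30)

# Crude grid over theta_1, holding the other hyperparameters fixed.
for th1 in [1.0, 10.0, 100.0]:
    print(th1, log_marginal(x, t, (1.0, th1, 0.0, 0.0)))
```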

Also, $p(\vec{t}_{N+1})=\mathcal{N}(\vec{t}_{N+1}|\vec{0},K_{N+1}+\beta^{-1}I_{N+1})$. Hence, denoting $\vec{k}=[k(\vec{x}_1,\vec{x}_{N+1}),k(\vec{x}_2,\vec{x}_{N+1}),\ldots,k(\vec{x}_N,\vec{x}_{N+1})]^T$, we can get the conditional Gaussian $p(t_{N+1}|\vec{t}_N)=\mathcal{N}(t_{N+1}\,|\,\vec{k}^T(K_N+\beta^{-1}I_N)^{-1}\vec{t}_N,\;k(\vec{x}_{N+1},\vec{x}_{N+1})-\vec{k}^T(K_N+\beta^{-1}I_N)^{-1}\vec{k}+\beta^{-1})$.
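Putting these predictive formulas into code gives a complete, if minimal, GP regressor; the kernel hyperparameters and test data below are assumptions for illustration:

```python
import numpy as np

def kernel(xm, xn, th=(1.0, 10.0, 0.0, 0.0)):
    th0, th1, th2, th3 = th
    return th0 * np.exp(-0.5 * th1 * (xm - xn) ** 2) + th2 + th3 * xm * xn

def gp_predict(x_train, t_train, x_new, beta=25.0):
    # mean = k^T (K_N + beta^{-1} I)^{-1} t
    # var  = k(x*, x*) + beta^{-1} - k^T (K_N + beta^{-1} I)^{-1} k
    C = np.array([[kernel(a, b) for b in x_train] for a in x_train])
    C += np.eye(len(x_train)) / beta
    kvec = np.array([kernel(a, x_new) for a in x_train])
    mean = kvec @ np.linalg.solve(C, t_train)
    var = kernel(x_new, x_new) + 1.0 / beta - kvec @ np.linalg.solve(C, kvec)
    return mean, var

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 30)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=30)
mean, var = gp_predict(x, t, 0.5)
print(mean, np.sqrt(var))  # mean near sin(pi) = 0, plus a predictive std
```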

  Gaussian Process for Classification:

We assume that $p(t_n=1|a_n)=\sigma(a_n)$, where $a_n=a(\vec{x}_n)$ is given a Gaussian-process prior, and take the following steps (a code sketch follows the list):

(1) Calculate $p(\vec{a}_N|\vec{t}_N)$ by the Laplace approximation;

(2) Since the joint prior $p(\vec{a}_{N+1})$ over $(\vec{a}_N,a_{N+1})$ is Gaussian, $p(a_{N+1}|\vec{a}_N)$ is a conditional Gaussian;

(3) $p(a_{N+1}|\vec{t}_N)=\int p(a_{N+1}|\vec{a}_N)\,p(\vec{a}_N|\vec{t}_N)\,d\vec{a}_N$;

(4) $p(t_{N+1}=1|\vec{t}_N)=\int \sigma(a_{N+1})\,p(a_{N+1}|\vec{t}_N)\,da_{N+1}$.
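A minimal numpy sketch of steps (1)–(4): it finds the Laplace mode by Newton iterations (PRML eq. 6.83) and approximates the final sigmoid-Gaussian convolution by $\sigma(\mu/\sqrt{1+\pi\sigma^2/8})$ (PRML eq. 4.153). The kernel choice, the jitter $\nu$, and the toy data are assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def kernel(xm, xn, theta=10.0):
    return np.exp(-0.5 * theta * (xm - xn) ** 2)

def gp_classify(x, t, x_new, nu=1e-6):
    # Targets t are in {0, 1}; C_N = K_N + nu*I keeps the matrix well conditioned.
    N = len(x)
    C = np.array([[kernel(a, b) for b in x] for a in x]) + nu * np.eye(N)

    # (1) Laplace approximation: Newton iterations for the mode of p(a_N | t_N).
    a = np.zeros(N)
    for _ in range(100):
        s = sigmoid(a)
        W = np.diag(s * (1 - s))
        a_new = C @ np.linalg.solve(np.eye(N) + W @ C, t - s + W @ a)
        if np.max(np.abs(a_new - a)) < 1e-9:
            a = a_new
            break
        a = a_new
    s = sigmoid(a)
    W = np.diag(s * (1 - s))

    # (2)+(3) Mean and variance of the conditional Gaussian p(a_{N+1} | t_N).
    kvec = np.array([kernel(xi, x_new) for xi in x])
    mean = kvec @ (t - s)
    var = kernel(x_new, x_new) + nu - kvec @ np.linalg.solve(np.linalg.inv(W) + C, kvec)

    # (4) Sigmoid-Gaussian convolution, approximated by sigma(kappa(var) * mean).
    return sigmoid(mean / np.sqrt(1.0 + np.pi * var / 8.0))

x = np.concatenate([np.linspace(-2, -0.5, 10), np.linspace(0.5, 2, 10)])
t = np.concatenate([np.zeros(10), np.ones(10)])
print(gp_classify(x, t, 1.5))   # close to 1
print(gp_classify(x, t, -1.5))  # close to 0
```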

References:

1. Bishop, Christopher M. Pattern Recognition and Machine Learning. Singapore: Springer, 2006.
