PRML 5: Kernel Methods

A kernel function implicitly maps a data point to some high-dimensional feature space and substitutes for the inner product of feature vectors, so a non-linearly separable classification problem can be converted into a linearly separable one. This trick can be applied to many feature-vector-based models such as SVM, which we introduced in previous articles.
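To make the trick concrete, here is a minimal sketch (assuming Python with numpy, which the original does not specify): for 2-D inputs, the kernel $k(\vec{x},\vec{y})=(\vec{x}^T\vec{y})^2$ returns exactly the inner product of explicit degree-2 feature vectors, without ever constructing those vectors inside the kernel.

```python
import numpy as np

# For 2-D inputs, k(x, y) = (x^T y)^2 is the inner product of the explicit
# degree-2 feature vectors phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2).
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k(x, y):
    return (x @ y) ** 2

x, y = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(k(x, y))          # 16.0 -- kernel evaluated directly in input space
print(phi(x) @ phi(y))  # 16.0 -- same value, via the explicit feature map
```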

To test the validity of a kernel function, we need Mercer's theorem: a function $k:\mathbb{R}^m\times\mathbb{R}^m\rightarrow\mathbb{R}$ is a Mercer kernel iff for every finite set $\{\vec{x}_1,\vec{x}_2,\ldots,\vec{x}_n\}$, the corresponding kernel matrix is symmetric positive semi-definite.
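This condition can be checked numerically on a finite sample: build the kernel matrix and verify symmetry and non-negative eigenvalues. A sketch, assuming numpy and using the Gaussian kernel introduced in the next paragraph as the concrete example:

```python
import numpy as np

def gaussian_kernel(xm, xn, sigma=1.0):
    return np.exp(-np.sum((xm - xn) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))  # an arbitrary finite set of points in R^3

# Kernel (Gram) matrix K_ij = k(x_i, x_j)
K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])

print(np.allclose(K, K.T))                    # symmetric
print(np.linalg.eigvalsh(K).min() >= -1e-10)  # PSD, up to round-off
```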

One of the good kernel functions is the Gaussian kernel $k(\vec{x}_m,\vec{x}_n)=\exp\{-\frac{1}{2\sigma^2}\|\vec{x}_m-\vec{x}_n\|^2\}$, whose feature space has infinite dimensionality. Another is the polynomial kernel $k(\vec{x}_m,\vec{x}_n)=(\vec{x}_m^T\vec{x}_n+c)^M$ with $c>0$. In practice, we can also construct a new kernel function from simple valid kernels using closure properties: sums, products, and positive scalings of valid kernels are again valid kernels.
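A small sketch of these closure properties (the kernel and coefficient choices are illustrative assumptions): combining a Gaussian and a polynomial kernel by positive scaling, addition, and multiplication still yields a positive semi-definite kernel matrix.

```python
import numpy as np

def gaussian(xm, xn, sigma=1.0):
    return np.exp(-np.sum((xm - xn) ** 2) / (2 * sigma ** 2))

def polynomial(xm, xn, c=1.0, M=2):
    return (xm @ xn + c) ** M

# Positive scaling, sum, and product of valid kernels are all valid kernels,
# so this composite is valid as well.
def composite(xm, xn):
    return 0.5 * gaussian(xm, xn) + gaussian(xm, xn) * polynomial(xm, xn)

rng = np.random.default_rng(1)
X = rng.normal(size=(15, 2))
K = np.array([[composite(xi, xj) for xj in X] for xi in X])
print(np.linalg.eigvalsh(K).min() >= -1e-10)  # the kernel matrix is still PSD
```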

We can also use a generative model to define kernel functions, such as:

(1) $k(\vec{x}_m,\vec{x}_n)=\int p(\vec{x}_m|\vec{z})\,p(\vec{x}_n|\vec{z})\,p(\vec{z})\,d\vec{z}$, where $\vec{z}$ is a latent variable;

(2) $k(\vec{x}_m,\vec{x}_n)=g(\vec{\theta},\vec{x}_m)^T F^{-1} g(\vec{\theta},\vec{x}_n)$, where $g(\vec{\theta},\vec{x})=\nabla_{\vec{\theta}}\ln p(\vec{x}|\vec{\theta})$ is the Fisher score,

and $F=\frac{1}{N}\sum_{n=1}^{N} g(\vec{\theta},\vec{x}_n)\,g(\vec{\theta},\vec{x}_n)^T$ is the (sample approximation of the) Fisher information matrix.
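As a concrete instance of form (1), the following sketch uses a discrete latent variable $z$ with 1-D Gaussian components (the mixture parameters are made-up examples). The resulting kernel is the inner product of the vectors $(p(x|z)\sqrt{p(z)})_z$, so it passes the Mercer check.

```python
import numpy as np

# Discrete latent variable z in {0, 1}; p(x|z) is a 1-D Gaussian per component.
# Then k(x_m, x_n) = sum_z p(x_m|z) p(x_n|z) p(z) is the inner product of the
# vectors (p(x|z) * sqrt(p(z)))_z, hence a valid kernel.
means = np.array([-1.0, 2.0])   # illustrative component parameters
stds = np.array([0.5, 1.0])
pz = np.array([0.3, 0.7])

def gauss_pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def k(xm, xn):
    return np.sum(gauss_pdf(xm, means, stds) * gauss_pdf(xn, means, stds) * pz)

X = np.linspace(-3, 3, 25)
K = np.array([[k(a, b) for b in X] for a in X])
print(np.linalg.eigvalsh(K).min() >= -1e-10)  # passes the Mercer check
```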

  A Gaussian process is a probabilistic discriminative model in which we assume that the set of values of $y(\vec{x})$ evaluated at an arbitrary set of points $\{\vec{x}_1,\vec{x}_2,\ldots,\vec{x}_N\}$ is jointly Gaussian distributed. Here, the kernel matrix determines the covariance.
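The definition can be visualized by sampling from the prior: function values on a grid of points are drawn from a joint Gaussian whose covariance is the kernel matrix. A minimal sketch (the Gaussian kernel and the jitter term are assumptions, the latter purely for numerical stability):

```python
import numpy as np

def k(xm, xn, sigma=1.0):
    return np.exp(-((xm - xn) ** 2) / (2 * sigma ** 2))

x = np.linspace(0, 5, 50)
K = np.array([[k(a, b) for b in x] for a in x])

# The 50 function values are jointly Gaussian with zero mean and covariance K;
# a small jitter keeps the Cholesky factorization numerically stable.
rng = np.random.default_rng(0)
L = np.linalg.cholesky(K + 1e-6 * np.eye(len(x)))
samples = L @ rng.normal(size=(len(x), 3))  # three functions drawn from the prior
print(samples.shape)                        # (50, 3)
```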

  Gaussian Process for Regression:

Typically, we choose $k(\vec{x}_m,\vec{x}_n)=\theta_0\exp\{-\frac{\theta_1}{2}\|\vec{x}_n-\vec{x}_m\|^2\}+\theta_2+\theta_3\vec{x}_m^T\vec{x}_n$, and assume that:

(1) Prior distribution $p(\vec{y}_N)=\mathcal{N}(\vec{y}_N|\vec{0},K_N)$;
(2) Likelihood $p(\vec{t}_N|\vec{y}_N)=\mathcal{N}(\vec{t}_N|\vec{y}_N,\beta^{-1}I_N)$.

Then, we have $p(\vec{t}_N)=\int p(\vec{t}_N|\vec{y}_N)\,p(\vec{y}_N)\,d\vec{y}_N=\mathcal{N}(\vec{t}_N|\vec{0},K_N+\beta^{-1}I_N)$. Here, $p(\vec{t}_N)$ is the likelihood of the hyperparameters $\vec{\theta}$, and we can use MLE to learn $\vec{\theta}$.
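A sketch of this MLE step, using the four-parameter kernel above (the synthetic data and the grid over $\theta_1$ are illustrative; a real implementation would ascend the gradient of the log marginal likelihood instead):

```python
import numpy as np

def kernel(xm, xn, th):
    th0, th1, th2, th3 = th
    return th0 * np.exp(-0.5 * th1 * (xm - xn) ** 2) + th2 + th3 * xm * xn

def log_marginal(x, t, th, beta=25.0):
    # ln p(t) = ln N(t | 0, C_N) with C_N = K_N + beta^{-1} I_N
    N = len(x)
    C = np.array([[kernel(a, b, th) for b in x] for a in x]) + np.eye(N) / beta
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (logdet + t @ np.linalg.solve(C, t) + N * np.log(2 * np.pi))

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 30)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=30)

# Crude grid over theta_1, holding the other hyperparameters fixed.
for th1 in [1.0, 10.0, 100.0]:
    print(th1, log_marginal(x, t, (1.0, th1, 0.0, 0.0)))
```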

Also, $p(\vec{t}_{N+1})=\mathcal{N}(\vec{t}_{N+1}|\vec{0},K_{N+1}+\beta^{-1}I_{N+1})$. Hence, denoting $\vec{k}=[k(\vec{x}_1,\vec{x}_{N+1}),k(\vec{x}_2,\vec{x}_{N+1}),\ldots,k(\vec{x}_N,\vec{x}_{N+1})]^T$, we can get the conditional Gaussian $p(t_{N+1}|\vec{t}_N)=\mathcal{N}(t_{N+1}\,|\,\vec{k}^T(K_N+\beta^{-1}I_N)^{-1}\vec{t}_N,\;k(\vec{x}_{N+1},\vec{x}_{N+1})-\vec{k}^T(K_N+\beta^{-1}I_N)^{-1}\vec{k}+\beta^{-1})$.
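Putting these predictive formulas into code gives a complete, if minimal, GP regressor; the kernel hyperparameters and test data below are assumptions for illustration:

```python
import numpy as np

def kernel(xm, xn, th=(1.0, 10.0, 0.0, 0.0)):
    th0, th1, th2, th3 = th
    return th0 * np.exp(-0.5 * th1 * (xm - xn) ** 2) + th2 + th3 * xm * xn

def gp_predict(x_train, t_train, x_new, beta=25.0):
    # mean = k^T (K_N + beta^{-1} I)^{-1} t
    # var  = k(x*, x*) + beta^{-1} - k^T (K_N + beta^{-1} I)^{-1} k
    C = np.array([[kernel(a, b) for b in x_train] for a in x_train])
    C += np.eye(len(x_train)) / beta
    kvec = np.array([kernel(a, x_new) for a in x_train])
    mean = kvec @ np.linalg.solve(C, t_train)
    var = kernel(x_new, x_new) + 1.0 / beta - kvec @ np.linalg.solve(C, kvec)
    return mean, var

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 30)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=30)
mean, var = gp_predict(x, t, 0.5)
print(mean, np.sqrt(var))  # mean near sin(pi) = 0, plus a predictive std
```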

  Gaussian Process for Classification:

We assume that $p(t_n=1|a_n)=\sigma(a_n)$, where $a_n=a(\vec{x}_n)$ is given a Gaussian-process prior, and take the following steps (a code sketch follows the list):

(1) Calculate $p(\vec{a}_N|\vec{t}_N)$ by the Laplace approximation;

(2) Since the joint prior $p(\vec{a}_{N+1})$ over $(\vec{a}_N,a_{N+1})$ is Gaussian, $p(a_{N+1}|\vec{a}_N)$ is a conditional Gaussian;

(3) $p(a_{N+1}|\vec{t}_N)=\int p(a_{N+1}|\vec{a}_N)\,p(\vec{a}_N|\vec{t}_N)\,d\vec{a}_N$;

(4) $p(t_{N+1}=1|\vec{t}_N)=\int \sigma(a_{N+1})\,p(a_{N+1}|\vec{t}_N)\,da_{N+1}$.
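A minimal numpy sketch of steps (1)–(4): it finds the Laplace mode by Newton iterations (PRML eq. 6.83) and approximates the final sigmoid-Gaussian convolution by $\sigma(\mu/\sqrt{1+\pi\sigma^2/8})$ (PRML eq. 4.153). The kernel choice, the jitter $\nu$, and the toy data are assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def kernel(xm, xn, theta=10.0):
    return np.exp(-0.5 * theta * (xm - xn) ** 2)

def gp_classify(x, t, x_new, nu=1e-6):
    # Targets t are in {0, 1}; C_N = K_N + nu*I keeps the matrix well conditioned.
    N = len(x)
    C = np.array([[kernel(a, b) for b in x] for a in x]) + nu * np.eye(N)

    # (1) Laplace approximation: Newton iterations for the mode of p(a_N | t_N).
    a = np.zeros(N)
    for _ in range(100):
        s = sigmoid(a)
        W = np.diag(s * (1 - s))
        a_new = C @ np.linalg.solve(np.eye(N) + W @ C, t - s + W @ a)
        if np.max(np.abs(a_new - a)) < 1e-9:
            a = a_new
            break
        a = a_new
    s = sigmoid(a)
    W = np.diag(s * (1 - s))

    # (2)+(3) Mean and variance of the conditional Gaussian p(a_{N+1} | t_N).
    kvec = np.array([kernel(xi, x_new) for xi in x])
    mean = kvec @ (t - s)
    var = kernel(x_new, x_new) + nu - kvec @ np.linalg.solve(np.linalg.inv(W) + C, kvec)

    # (4) Sigmoid-Gaussian convolution, approximated by sigma(kappa(var) * mean).
    return sigmoid(mean / np.sqrt(1.0 + np.pi * var / 8.0))

x = np.concatenate([np.linspace(-2, -0.5, 10), np.linspace(0.5, 2, 10)])
t = np.concatenate([np.zeros(10), np.ones(10)])
print(gp_classify(x, t, 1.5))   # close to 1
print(gp_classify(x, t, -1.5))  # close to 0
```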

References:

1. Bishop, Christopher M. Pattern Recognition and Machine Learning. Singapore: Springer, 2006.
