Machine Learning---kernel functions

Source: Internet
Author: User
Tags: svm

When I mention the word "kernel" (in Chinese it is the same word as "nuclear"), your mind probably jumps straight to a mushroom cloud. People tend to turn pale at the mere mention of anything nuclear, but the kernel we are talking about today is much gentler and more lovable.

As I mentioned earlier, SVM's killer weapon is the kernel function, so this post can be read as the sequel to http://www.cnblogs.com/xiaohuahua108/p/5934282.html. But let me stress up front that kernel functions are not used only in SVM; a kernel is a general tool related to mapping low-dimensional data into a high-dimensional space.

It looks something like this: data that was originally two-dimensional is mapped into a higher-dimensional space. It should also be noted that when going from a low dimension to a high dimension, there is no fixed rule for the number of dimensions; the mapping can even go from one infinite-dimensional space to another.

1. Introduction to kernels

1.1 The kernel method


Kernel methods are a class of algorithms for pattern analysis or recognition, whose best-known member is the support vector machine (SVM). The general task of pattern analysis is to find and study general types of relations (such as clusters, rankings, principal components, correlations, classifications) in general types of data (such as sequences, text documents, sets of points, vectors, images, etc.). Kernel methods map the data into a higher-dimensional space, in the hope that in this higher-dimensional space the data becomes more easily separable or better structured. There is no constraint on the form of this mapping, which can even lead to an infinite-dimensional space. However, the mapping function hardly ever needs to be computed explicitly, so the kernel can be regarded as a tool for computing inner products of the high-dimensional space while staying in the low-dimensional space.

1.2 The kernel trick

The kernel trick is a very interesting and powerful tool. It is powerful because it provides a bridge from linearity to non-linearity for any algorithm that can be expressed solely in terms of dot products between two vectors. It comes from the fact that, if we first map our input data into a higher-dimensional space, then a linear operation in that high-dimensional space behaves non-linearly in the original space.

Now, the kernel trick is interesting because the mapping never has to be computed. If our algorithm can be expressed only in terms of inner products between two vectors, all we need to do is replace that inner product with the inner product of some other suitable space. That is where the "trick" lies: wherever a dot product is used, it is replaced by a kernel function. The kernel function denotes an inner product in a feature space and is usually written as:

K(x, y) = <φ(x), φ(y)>

Using kernel functions, the algorithm can then be carried into a higher-dimensional space without ever explicitly mapping the input points into that space. This is highly desirable, because our high-dimensional feature space can even be infinite-dimensional and thus impossible to compute in explicitly.

That was a big wall of text, so let me re-emphasize the bold part above: the kernel lets us compute the dot product of the high-dimensional data while staying in the low-dimensional space.
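To make this concrete, here is a minimal Python sketch (the feature map φ, the sample vectors, and the use of numpy are my own illustrative choices, not something from the original post). For the degree-2 polynomial kernel K(x, y) = (x·y)^2 on two-dimensional inputs, an explicit feature map is φ(x) = (x1^2, √2·x1·x2, x2^2), and the inner product of the mapped vectors equals the kernel value computed directly in the low-dimensional space:

```python
import numpy as np

def phi(x):
    # Explicit map from R^2 into a 3-dimensional feature space.
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly2_kernel(x, y):
    # Kernel value computed directly in the original 2-D space.
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

print(np.dot(phi(x), phi(y)))  # inner product in the feature space: 16.0
print(poly2_kernel(x, y))      # same value, no explicit mapping needed: 16.0
```

The two printed numbers agree, which is exactly the point of the trick: the right-hand computation never leaves the two-dimensional space.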

1.3 Properties of kernel functions

A kernel function should be continuous and symmetric, and preferably should have a positive (semi-)definite Gram matrix. Kernels that satisfy Mercer's theorem are positive semi-definite, meaning their kernel matrices have only non-negative eigenvalues. Using a positive definite kernel ensures that the optimization problem is convex and the solution is unique.

However, many kernel functions that are not strictly positive definite have also been shown to perform very well in practice. An example is the sigmoid kernel: although it is widely used, it is not positive semi-definite for certain values of its parameters. Boughorbel (2005) also showed experimentally that kernels that are only conditionally positive definite can outperform most classical kernels in some applications.

Kernels can also be classified as anisotropic stationary, isotropic stationary, compactly supported, locally stationary, non-stationary, or separable non-stationary. In addition, a kernel can be labeled scale-invariant or scale-dependent, which is an interesting property because scale-invariant kernels leave the training process unchanged under a rescaling of the data.

Supplement on Mercer's theorem: any positive semi-definite function can be used as a kernel function. By a positive semi-definite function f(x_i, x_j) we mean the following: given a set of training data (x_1, x_2, ..., x_n), define the n×n matrix with elements A_ij = f(x_i, x_j); if this matrix is positive semi-definite, then f(x_i, x_j) is called a positive semi-definite function. Note that Mercer's condition is a sufficient condition for being a kernel function, not a necessary one: a function that does not satisfy Mercer's theorem can still be a kernel function.
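As a small numerical illustration of this supplement (the toy data, parameter values, and kernel choices below are my own assumptions), we can build the Gram matrix A_ij = f(x_i, x_j) for a kernel and inspect its eigenvalues. A Gaussian kernel gives only (numerically) non-negative eigenvalues, while the sigmoid kernel mentioned above may produce negative ones for some parameter settings:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))  # 20 toy points in R^3

def gram(kernel, X):
    # Gram matrix A_ij = kernel(x_i, x_j).
    n = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

def gaussian(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma**2))

def sigmoid(x, y, a=2.0, c=-1.0):
    return np.tanh(a * np.dot(x, y) + c)

for name, k in [("gaussian", gaussian), ("sigmoid", sigmoid)]:
    eigvals = np.linalg.eigvalsh(gram(k, X))  # symmetric matrix -> eigvalsh
    print(name, "min eigenvalue:", eigvals.min())
```

A negative minimum eigenvalue in the sigmoid case would confirm that it is not positive semi-definite for that parameter choice, in line with the discussion above.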

1.4 How to choose a kernel

My advisor says this is a worldwide open problem. I don't really understand it myself; if you happen to, feel free to message me privately.

But it is said that the Gaussian kernel usually works well.

2. Commonly used kernels

2.1 Linear kernel

The linear kernel is the simplest kernel function. It is given by the inner product <x, y> plus an optional constant c. Kernel algorithms that use a linear kernel are usually equivalent to their non-kernel counterparts; for example, kernel PCA (KPCA) with a linear kernel is the same as standard PCA.

Expression: K(x, y) = x^T y + c
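A tiny sketch of this (the example data and the choice c = 0 are my own): with c = 0 the Gram matrix of the linear kernel is simply X·Xᵀ, which is why kernel algorithms with a linear kernel collapse to their ordinary counterparts.

```python
import numpy as np

def linear_kernel(x, y, c=0.0):
    # Plain inner product plus an optional constant.
    return np.dot(x, y) + c

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
K = np.array([[linear_kernel(xi, xj) for xj in X] for xi in X])
print(np.allclose(K, X @ X.T))  # True: nothing non-linear happened
```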

2.2 Polynomial kernel

The polynomial kernel is a non-stationary kernel. It is well suited to problems where all of the training data has been normalized. (As I recall, the data is usually normalized anyway, right?)

Expression: K(x, y) = (α x^T y + c)^d

The tunable parameters are the slope α, the constant term c, and the polynomial degree d.
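As a quick sketch (the parameter and data values below are just example choices of mine), the kernel is straightforward to write down and evaluate:

```python
import numpy as np

def polynomial_kernel(x, y, alpha=1.0, c=1.0, d=3):
    # K(x, y) = (alpha * x^T y + c)^d
    return (alpha * np.dot(x, y) + c) ** d

x = np.array([0.5, -1.0])
y = np.array([2.0, 1.0])
print(polynomial_kernel(x, y))  # x.y = 0, so (1*0 + 1)^3 = 1.0
```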

2.3 Gaussian kernel

The Gaussian kernel is an example of a radial basis function (RBF) kernel.

Expression: K(x, y) = exp(-||x - y||^2 / (2σ^2))

Alternatively, it can also be implemented as K(x, y) = exp(-γ ||x - y||^2).

The tunable parameter σ plays a major role in the performance of the kernel and should be carefully tuned to the problem at hand. If it is overestimated, the exponential behaves almost linearly and the high-dimensional projection starts to lose its non-linear power. If it is underestimated, on the other hand, the function lacks regularization and the decision boundary becomes highly sensitive to noise in the training data.
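A minimal sketch of that behaviour (the points and σ values are my own illustrative choices): with a very small σ the kernel value between two distinct points collapses towards 0, while a very large σ pushes every kernel value towards 1, flattening out the non-linearity.

```python
import numpy as np

def gaussian_kernel(x, y, sigma):
    # K(x, y) = exp(-||x - y||^2 / (2 * sigma^2))
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma**2))

x = np.array([0.0, 0.0])
y = np.array([1.0, 1.0])
for sigma in (0.1, 1.0, 100.0):
    print(sigma, gaussian_kernel(x, y, sigma))
# sigma = 0.1  -> ~0    (underestimated: overly local, sensitive to noise)
# sigma = 1.0  -> ~0.37
# sigma = 100  -> ~1    (overestimated: almost linear, non-linearity is lost)
```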

2.4 Exponential kernel

The exponential kernel is closely related to the Gaussian kernel; only the square of the norm is omitted. It is also a radial basis function kernel.

Expression: K(x, y) = exp(-||x - y|| / (2σ^2)). It really does look a lot like the Gaussian kernel, haha.

2.5 Laplacian kernel

The Laplacian kernel is completely equivalent to the exponential kernel, except that it is less sensitive to changes in the σ parameter. Being equivalent, it is also a radial basis function kernel.

Expression: K(x, y) = exp(-||x - y|| / σ)

It is important to note that the observations made above about the σ parameter of the Gaussian kernel also apply to the exponential and Laplacian kernels.
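For completeness, here is a small sketch of both kernels (the example vectors and σ = 1 are my own choices); note that both use the plain Euclidean norm rather than its square:

```python
import numpy as np

def exponential_kernel(x, y, sigma=1.0):
    # K(x, y) = exp(-||x - y|| / (2 * sigma^2))
    return np.exp(-np.linalg.norm(x - y) / (2 * sigma**2))

def laplacian_kernel(x, y, sigma=1.0):
    # K(x, y) = exp(-||x - y|| / sigma)
    return np.exp(-np.linalg.norm(x - y) / sigma)

x, y = np.array([0.0, 0.0]), np.array([3.0, 4.0])  # ||x - y|| = 5
print(exponential_kernel(x, y))  # exp(-2.5) ≈ 0.082
print(laplacian_kernel(x, y))    # exp(-5)   ≈ 0.0067
```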

All right, that's it for today; when I have time I'll share some other kernel functions. If you think my writing is decent, please follow me or give this post a recommendation.
