Understanding Support Vector Machines (II): Kernel Functions

Source: Internet
Author: User

Recall the definition of a kernel function (see Statistical Learning Methods, Definition 7.6):
Let χ be the input space (a subset of Euclidean space, or a discrete set) and H the feature space (a Hilbert space). If there exists a mapping

φ(x): χ → H

such that for all x, z ∈ χ the function K(x, z) = φ(x) · φ(z),
then K(x, z) is called a kernel function and φ(x) the mapping function; φ(x) · φ(z) is the inner product of the images of x and z in the feature space.
Because the mapping function may be very complicated and expensive to compute, in practice the kernel function is used to obtain the inner product directly, without increasing the computational cost; the mapping function is only a logical device that describes the relationship between the input space and the feature space. For example:
Let the input space be χ = R³,
the mapping function φ(x) = (x_1 x_1, x_1 x_2, x_1 x_3, x_2 x_1, x_2 x_2, x_2 x_3, x_3 x_1, x_3 x_2, x_3 x_3),
and the kernel function K(x, z) = (x · z)².
Now take two samples x = (1, 2, 3) and z = (4, 5, 6). Computing the inner product via the mapping function and via the kernel function proceeds as follows:
φ(x) = (1, 2, 3, 2, 4, 6, 3, 6, 9)
φ(z) = (16, 20, 24, 20, 25, 30, 24, 30, 36)
φ(x) · φ(z) = 16 + 40 + 72 + 40 + 100 + 180 + 72 + 180 + 324 = 1024
Computing directly via K(x, z): (4 + 10 + 18)² = 32² = 1024.
By comparison, the kernel function clearly requires far less computation than the explicit mapping.
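The worked example above can be checked in a few lines. A minimal sketch using NumPy (`phi` and `quad_kernel` are illustrative names):

```python
import numpy as np

# Explicit mapping phi(x): all pairwise products x_i * x_j (9 dimensions for R^3)
def phi(v):
    return np.outer(v, v).ravel()

# Kernel shortcut K(x, z) = (x . z)^2
def quad_kernel(x, z):
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0, 3.0])
z = np.array([4.0, 5.0, 6.0])

via_mapping = float(phi(x) @ phi(z))   # inner product in the 9-D feature space
via_kernel = quad_kernel(x, z)         # (4 + 10 + 18)^2

print(via_mapping, via_kernel)         # both 1024.0
```

Note that the kernel route needs one 3-dimensional dot product and a square, while the explicit route first builds two 9-dimensional vectors.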

From the above, if we know the kernel function, we can perform the nonlinear transformation without increasing the computational cost. So our next task is to determine the kernel function.
The kernel functions we refer to are positive definite kernel functions. What kind of function is a positive definite kernel?
The necessary and sufficient condition for a positive definite kernel is given here (see Statistical Learning Methods, Theorem 7.5):
Let K: χ × χ → R be a symmetric function. Then K(x, z) is a positive definite kernel function if and only if, for any m and any x_i ∈ χ, i = 1, 2, ..., m, the corresponding Gram matrix

K = [K(x_i, x_j)]_{m×m}

is positive semi-definite.
From this necessary and sufficient condition we obtain an equivalent definition of the positive definite kernel:
Let χ be the input space and K(x, z) a symmetric function defined on χ × χ. If, for any m and any x_i ∈ χ, i = 1, 2, ..., m, the corresponding Gram matrix K = [K(x_i, x_j)]_{m×m} is positive semi-definite, then K(x, z) is called a positive definite kernel.
A function satisfying this condition is called a positive definite kernel function.
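The definition suggests a direct numerical check: build the Gram matrix for a set of sample points and inspect its eigenvalues. A sketch (the sample points and the tolerance are arbitrary choices for illustration):

```python
import numpy as np

def gram_is_psd(kernel, points, tol=1e-10):
    """Build the Gram matrix K[i][j] = kernel(x_i, x_j) and test whether
    all eigenvalues are (numerically) non-negative."""
    m = len(points)
    K = np.array([[kernel(points[i], points[j]) for j in range(m)]
                  for i in range(m)])
    eigvals = np.linalg.eigvalsh(K)   # K is symmetric, so eigvalsh applies
    return bool(np.all(eigvals >= -tol))

# The quadratic kernel (x . z)^2 from the earlier example passes the test.
quad = lambda x, z: float(np.dot(x, z)) ** 2
pts = [np.array([1.0, 2.0, 3.0]),
       np.array([4.0, 5.0, 6.0]),
       np.array([0.0, 1.0, -1.0])]
print(gram_is_psd(quad, pts))  # True
```

Passing this check for one finite sample is only a necessary condition; the definition requires it for every finite set of points.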
It is worth mentioning, however, that this definition characterizes positive definite kernels; there are also kernel functions that are not positive definite, such as the multiquadric kernel function.
In practical applications, the Mercer theorem is often used to determine kernel functions. A kernel function obtained via the Mercer theorem is called a Mercer kernel. The definitions of the positive definite kernel and the Mercer kernel are as follows:

As the definitions show, the positive definite kernel is more general than the Mercer kernel: the positive definite kernel only requires the function to be symmetric on the input space, while the Mercer kernel requires a symmetric continuous function.

II. Common Kernel Functions
1. Linear kernel function
The linear kernel function is the simplest kernel function and can be viewed as a special case of the radial basis kernel function; its formula is:

K(x, z) = x · z
It is mainly used in the linearly separable case, corresponding to the linearly separable SVM and the linear SVM of the previous article. It finds the optimal linear classifier in the original input space, with the advantages of few parameters and fast computation.
2. Polynomial kernel function
The polynomial kernel is suitable for orthonormalized data (vectors that are mutually orthogonal with norm 1); its formula is:

K(x, z) = (x · z + 1)^d
The polynomial kernel is a global kernel function: data points that are far apart still influence the kernel value. The larger the parameter d, the higher the dimension of the mapping and the greater the computational cost; when d is too large, the learning complexity becomes too high and over-fitting occurs easily.
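The global behaviour and the effect of d can be seen numerically. A sketch assuming the common form K(x, z) = (x · z + 1)^d (the test points are arbitrary):

```python
import numpy as np

# Polynomial kernel K(x, z) = (x . z + 1)^d
def poly_kernel(x, z, d=2):
    return (np.dot(x, z) + 1.0) ** d

x = np.array([1.0, 0.0])
z_far = np.array([10.0, 10.0])   # a point far away from x

# Distant points still produce large kernel values, and the value
# grows quickly with the degree d -- the kernel is "global".
values = [poly_kernel(x, z_far, d=d) for d in (1, 2, 3)]
print(values)  # [11.0, 121.0, 1331.0]
```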
3. Radial basis kernel function & Gaussian kernel function
The radial basis kernel function is a local kernel function: its value decreases as the data point moves farther from the center point. Its formula is:

K(x, z) = exp(-γ ||x - z||²)

The Gaussian kernel function can be seen as a particular form of the radial basis kernel function:

K(x, z) = exp(-||x - z||² / (2σ²))
The Gaussian radial basis kernel is robust to noise in the data. Because of its strong locality, the parameter σ determines the function's range of influence, and the locality weakens as σ increases.
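The locality described above is easy to see numerically. A sketch assuming the Gaussian form K(x, z) = exp(-||x - z||² / (2σ²)) with illustrative test points:

```python
import numpy as np

# Gaussian (RBF) kernel K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))
def rbf_kernel(x, z, sigma=1.0):
    diff = x - z
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

x = np.zeros(2)
near = np.array([0.5, 0.0])
far = np.array([5.0, 0.0])

# Local behaviour: the kernel value decays fast as z moves away from x.
print(rbf_kernel(x, near), rbf_kernel(x, far))

# A larger sigma widens the range of influence (weaker locality).
print(rbf_kernel(x, far, sigma=5.0) > rbf_kernel(x, far, sigma=1.0))  # True
```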
4. sigmoid kernel function
The sigmoid kernel function originates from neural networks and is widely used in deep learning and machine learning. Its formula is:

K(x, z) = tanh(γ x · z + r)
When the sigmoid function is used as the kernel, the support vector machine implements a kind of multilayer perceptron neural network. The theoretical basis of the SVM (a convex quadratic program) guarantees a global rather than a local optimum, and also ensures good generalization to unknown samples.
5. String kernel function
Kernel functions can be defined not only on Euclidean spaces but also on sets of discrete data. A string kernel function is a kernel defined on a set of strings; intuitively, it measures the similarity of a pair of strings, and it is applied in text categorization, information retrieval, and so on.
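As an illustrative sketch (a simple p-spectrum kernel, not a specific kernel from the text), string similarity can be measured by counting shared substrings of length p:

```python
from collections import Counter

def spectrum_kernel(s, t, p=2):
    """p-spectrum string kernel: inner product of substring-count vectors."""
    grams_s = Counter(s[i:i + p] for i in range(len(s) - p + 1))
    grams_t = Counter(t[i:i + p] for i in range(len(t) - p + 1))
    return sum(count * grams_t[g] for g, count in grams_s.items())

# Similar strings share many length-p substrings, dissimilar ones few.
print(spectrum_kernel("kernel", "kernels"))  # 5 shared bigrams
print(spectrum_kernel("kernel", "vector"))   # 0 shared bigrams
```

Because this is an inner product of explicit count vectors, it is positive definite by construction.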
