Through learning, we can extract rules from known facts and use those rules to make correct predictions and judgments about future or as-yet-unobserved phenomena; in other words, we acquire the ability to generalize. In the study of intelligent machines, researchers likewise hope to use machines (computers) to simulate this learning ability; this is the problem of machine learning. Data-based machine learning is an important part of modern intelligent technology. Its purpose is to uncover the dependencies hidden in known data through learning, and thereby gain the ability to predict and judge unseen data. Over the past decade, artificial neural networks, with their powerful parallel processing mechanism, ability to approximate arbitrary functions, and capacity for learning, self-organization, and adaptation, have been widely applied in pattern recognition, prediction, and decision-making. However, neural networks are strongly affected by the complexity of the network structure and of the samples, and often generalize poorly. In particular, neural network learning algorithms lack quantitative analysis and a complete theoretical foundation, and do not really capture the essence of the learning process.
One of the important theoretical foundations of existing machine learning methods is statistics. Traditional statistics studies asymptotic theory, i.e., approximation behavior as the number of samples tends to infinity, and most existing learning methods rest on this assumption. In practice, however, the number of samples is often limited, so some theoretically excellent learning methods may perform poorly in practice.
Compared with traditional statistics, statistical learning theory (SLT) is a theory dedicated to studying the laws of machine learning with small samples. Vapnik and others have pursued this line of research since the 1960s and 1970s. By the mid-1990s their theory had matured [17], and, given the lack of substantial theoretical progress in learning methods such as neural networks, statistical learning theory attracted more and more attention.
Statistical learning theory rests on a solid theoretical foundation and provides a unified framework for the problem of learning from finite samples. It subsumes many existing methods and promises to help resolve several difficult problems (such as choosing a neural network structure and avoiding local minima). Based on this theory, a new general learning method, the support vector machine (SVM), has been developed, and it has already demonstrated many advantages over existing methods. Some scholars believe that SVM is becoming a new research hotspot after neural networks and will effectively advance the theory and techniques of machine learning.
SVM is a method built on structural risk minimization: its learning strategy is the principle of structural risk minimization, which holds that to minimize the expected risk, the empirical risk and the confidence interval should be minimized simultaneously.
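This trade-off can be written as Vapnik's generalization bound: with probability at least 1 − η, for a function class of VC dimension h trained on l samples,

```latex
R(\alpha) \;\le\; R_{\mathrm{emp}}(\alpha)
  + \sqrt{\frac{h\left(\ln\frac{2l}{h} + 1\right) - \ln\frac{\eta}{4}}{l}}
```

where R is the expected risk, R_emp the empirical risk, and the square-root term the confidence interval. Structural risk minimization chooses the function class (the value of h) that minimizes the sum of the two terms, rather than minimizing the empirical risk alone.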
The basic idea of SVM is as follows:
(1) It is a learning machine designed for finite samples. It minimizes the structural risk, seeking a compromise between the accuracy of approximating the given data and the complexity of the approximating function, in order to obtain the best generalization ability;
(2) It ultimately reduces to a convex quadratic programming problem, so in theory the global optimal solution is obtained, avoiding the local extrema that the neural network approach cannot escape;
(3) It maps the practical problem into a high-dimensional feature space through a nonlinear transformation and constructs a linear decision function in that high-dimensional space, realizing a nonlinear decision function in the original space. This cleverly sidesteps the curse of dimensionality, guarantees good generalization ability, and makes the algorithm's complexity independent of the sample dimension.
At present, SVM algorithms are applied in pattern recognition, regression estimation, and probability density function estimation, and their efficiency and accuracy have matched or exceeded those of traditional learning algorithms.
For the empirical risk R, different loss functions can be used, such as the ε-insensitive function, the quadratic function, the Huber function, and the Laplace function.
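As a concrete illustration, the four loss functions above can be sketched in plain Python; the function and parameter names (eps, delta) are my own choices, not from the text:

```python
def eps_insensitive(r, eps=0.1):
    """epsilon-insensitive loss: zero inside the eps band, linear outside."""
    return max(0.0, abs(r) - eps)

def quadratic(r):
    """Quadratic (squared-error) loss."""
    return r * r

def huber(r, delta=1.0):
    """Huber loss: quadratic near zero, linear in the tails."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)

def laplace(r):
    """Laplace (absolute-error) loss."""
    return abs(r)
```

The ε-insensitive loss is what gives support vector regression its sparsity: residuals smaller than ε contribute nothing to the empirical risk, so the corresponding samples do not become support vectors.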
Kernel functions include the polynomial kernel, the Gaussian radial basis kernel, the exponential radial basis kernel, the multi-layer perceptron (sigmoid) kernel, the Fourier series kernel, the spline kernel, and the B-spline kernel. Although some experiments show that different kernel functions produce almost the same results in classification, in regression the choice of kernel function often has a greater impact on the fitting results.
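A minimal sketch of the first few kernels in plain Python; the default parameter values (degree, c, sigma, kappa, theta) are illustrative assumptions, not prescriptions from the text:

```python
import math

def dot(x, z):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=3, c=1.0):
    """Polynomial kernel: (<x, z> + c)^degree."""
    return (dot(x, z) + c) ** degree

def gaussian_rbf_kernel(x, z, sigma=1.0):
    """Gaussian radial basis kernel: exp(-||x - z||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def exponential_rbf_kernel(x, z, sigma=1.0):
    """Exponential radial basis kernel: exp(-||x - z|| / (2 sigma^2))."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)))
    return math.exp(-d / (2.0 * sigma ** 2))

def mlp_kernel(x, z, kappa=1.0, theta=-1.0):
    """Multi-layer perceptron (sigmoid) kernel: tanh(kappa * <x, z> + theta)."""
    return math.tanh(kappa * dot(x, z) + theta)
```

Each function takes two sample vectors and returns the inner product of their images in the kernel-induced feature space, without ever computing that feature map explicitly.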
The support vector regression (SVR) algorithm constructs a linear decision function in a high-dimensional space, after raising the dimension, to realize nonlinear regression; it rests mainly on the ε-insensitive loss function and the kernel function. If we picture the fitted model as a curve in a multi-dimensional space, then the result obtained with the ε-insensitive function is an "ε-tube" enclosing the curve and the training points. Among all the sample points, only those lying on the "tube wall" determine the position of the tube; these training samples are called support vectors. To adapt to the nonlinearity of the training set, the traditional fitting approach adds higher-order terms after the linear terms. This is effective, but the extra adjustable parameters increase the risk of overfitting. The support vector regression algorithm resolves this conflict with kernel functions: replacing the linear terms of the linear equation with a kernel function makes the original linear algorithm "nonlinear", i.e., capable of nonlinear regression. At the same time, introducing the kernel function achieves the goal of "raising the dimension" while keeping the number of adjustable parameters, and hence the risk of overfitting, under control.
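To make the kernel substitution concrete, here is a toy sketch in plain Python using kernel ridge regression, a least-squares relative of SVR that shares the same trick of replacing inner products with a kernel (the ε-insensitive loss and support-vector sparsity are omitted for brevity; all names and parameter values are illustrative):

```python
def poly_kernel(x, z, degree=2, c=1.0):
    """Polynomial kernel on scalars: (x*z + c)^degree."""
    return (x * z + c) ** degree

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit(xs, ys, kernel, lam=1e-6):
    """Dual coefficients alpha = (K + lam*I)^(-1) y for the kernel matrix K."""
    n = len(xs)
    K = [[kernel(xs[i], xs[j]) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(K, ys)

def predict(x, xs, alphas, kernel):
    """Kernel expansion f(x) = sum_i alpha_i K(x_i, x): linear in the
    high-dimensional feature space, nonlinear in the original space."""
    return sum(a * kernel(xi, x) for a, xi in zip(alphas, xs))

# Fit y = x^2, a nonlinear target, with a degree-2 polynomial kernel.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [4.0, 1.0, 0.0, 1.0, 4.0]
alphas = fit(xs, ys, poly_kernel)
```

Calling predict(1.5, xs, alphas, poly_kernel) then returns a value close to 1.5² = 2.25, even though the model is a linear expansion in the kernel-induced feature space; the only adjustable parameters are the n dual coefficients, regardless of how high the feature-space dimension is.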
Several useful links:
http://www.support-vector.net/index.html
http://www.support-vector.net/software.html
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
http://www.isis.ecs.soton.ac.uk/isystems/kernel/
http://www.ecs.soton.ac.uk/~SRG/publications/pdf/svmations
http://www.kernel-machines.org/

Good books:
1. V. Vapnik. Statistical Learning Theory. Springer, New York, 1998.
2. V. Vapnik. The Nature of Statistical Learning Theory. Springer, New York, 1995.
3. Steve R. Gunn. Support Vector Machines for Classification and Regression. University of Southampton, 1997.
4. I am Pei, Sun Deshan. Modern Data Analysis. Beijing: China Machine Press, 2006.