SVM for Linear Regression
Method Analysis
In a regression dataset $\{(x_i, t_i)\}_{i=1}^{N}$, the target $t_i$ is not a discrete class label but a continuous value; predicting a price with linear regression is a typical example. For ordinary linear regression, the objective is the regularized squared-error function:
$$\frac{1}{2}\sum_{i=1}^{N}\bigl(y(x_i) - t_i\bigr)^2 + \frac{\lambda}{2}\lVert w \rVert^2 .$$
In the SVM regression algorithm, the objective is to learn a hyperplane $y(x) = w^\top x + b$ and use its value as the prediction. To obtain a sparse solution, i.e. so that the hyperplane parameters $w, b$ depend not on all of the samples but only on a subset of them (the support vectors, just as in the SVM classification algorithm), the squared error is replaced by an $\epsilon$-insensitive error function.
The $\epsilon$-insensitive error function is defined as follows: if the difference between the predicted value and the actual value is smaller than the threshold $\epsilon$, the sample is not penalized; if it exceeds $\epsilon$, the penalty is the amount by which the difference exceeds the threshold:
$$E_\epsilon\bigl(y(x) - t\bigr) = \begin{cases} 0, & \lvert y(x) - t\rvert < \epsilon \\ \lvert y(x) - t\rvert - \epsilon, & \text{otherwise.} \end{cases}$$
(Figure: the $\epsilon$-insensitive error function plotted against the squared error function.)
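To make the two losses concrete, here is a minimal NumPy sketch; the function names and the sample residuals are illustrative, not part of the original text:

```python
import numpy as np

def squared_error(y_pred, t):
    """Ordinary squared error used in linear regression."""
    return 0.5 * (y_pred - t) ** 2

def eps_insensitive_error(y_pred, t, eps=0.1):
    """Epsilon-insensitive error: zero inside the epsilon tube,
    linear in the excess |y_pred - t| - eps outside it."""
    return np.maximum(np.abs(y_pred - t) - eps, 0.0)

# Residuals inside the tube (|r| < eps) contribute nothing;
# residuals outside it are penalized linearly.
residuals = np.array([-0.3, -0.05, 0.0, 0.08, 0.5])
print(eps_insensitive_error(residuals, 0.0))  # [0.2 0.  0.  0.  0.4]
print(squared_error(residuals, 0.0))          # [0.045 0.00125 0. 0.0032 0.125]
```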
Objective Function
Observing the form of this error function, we can see that it carves out a tube of width $2\epsilon$ around the regression function: sample points lying inside the tube are not penalized, which is why it is called the $\epsilon$-tube (the shaded red region in the figure).
Replacing the squared-error term with the $\epsilon$-insensitive error, the minimization of the regularized error function becomes the optimization goal:
$$C\sum_{i=1}^{N} E_\epsilon\bigl(y(x_i) - t_i\bigr) + \frac{1}{2}\lVert w\rVert^2 .$$
Because this objective contains absolute values, it is not differentiable everywhere and cannot be minimized directly by setting derivatives to zero. We therefore transform it into a constrained optimization problem. A common approach is to introduce two slack variables $\xi_i \ge 0$ and $\hat{\xi}_i \ge 0$ for each sample, measuring how far the point lies above or below the tube.
As shown in the figure:

When the actual value of a sample point lies above the tube, the excess is written as $\xi_i = t_i - \bigl(y(x_i) + \epsilon\bigr)$;
when the actual value lies below the tube, the excess is written as $\hat{\xi}_i = \bigl(y(x_i) - \epsilon\bigr) - t_i$.

Therefore, the relaxed condition that each sample point must satisfy with respect to the tube is:

above the tube: $t_i \le y(x_i) + \epsilon + \xi_i$;
below the tube: $t_i \ge y(x_i) - \epsilon - \hat{\xi}_i$.
The error function can then be written as a convex quadratic optimization problem:
$$\min_{w,\,b,\,\xi,\,\hat{\xi}}\; C\sum_{i=1}^{N}\bigl(\xi_i + \hat{\xi}_i\bigr) + \frac{1}{2}\lVert w\rVert^2$$
Constraints:
$$\xi_i \ge 0,\quad \hat{\xi}_i \ge 0,\quad t_i \le y(x_i) + \epsilon + \xi_i,\quad t_i \ge y(x_i) - \epsilon - \hat{\xi}_i,\qquad i = 1,\dots,N.$$
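Before forming the Lagrangian, note that this primal problem can also be handed directly to a generic convex solver. Below is a minimal sketch using CVXPY; the toy data and the hyperparameter values C and eps are illustrative assumptions, not part of the original derivation:

```python
import numpy as np
import cvxpy as cp

# Toy one-dimensional regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
t = 1.5 * X[:, 0] + 0.3 + rng.normal(scale=0.2, size=50)

N, d = X.shape
C, eps = 10.0, 0.1

w = cp.Variable(d)
b = cp.Variable()
xi = cp.Variable(N, nonneg=True)      # slack above the tube
xi_hat = cp.Variable(N, nonneg=True)  # slack below the tube

y_pred = X @ w + b
objective = cp.Minimize(C * cp.sum(xi + xi_hat) + 0.5 * cp.sum_squares(w))
constraints = [y_pred + eps + xi >= t,        # t_i <= y(x_i) + eps + xi_i
               y_pred - eps - xi_hat <= t]    # t_i >= y(x_i) - eps - xi_hat_i

cp.Problem(objective, constraints).solve()
print(w.value, b.value)
```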
Written as the Lagrangian function, with multipliers $a_i, \hat{a}_i \ge 0$ for the tube constraints and $\mu_i, \hat{\mu}_i \ge 0$ for the non-negativity of the slack variables:
$$L = C\sum_{i=1}^{N}\bigl(\xi_i + \hat{\xi}_i\bigr) + \frac{1}{2}\lVert w\rVert^2 - \sum_{i=1}^{N}\bigl(\mu_i \xi_i + \hat{\mu}_i \hat{\xi}_i\bigr) - \sum_{i=1}^{N} a_i\bigl(\epsilon + \xi_i + y(x_i) - t_i\bigr) - \sum_{i=1}^{N} \hat{a}_i\bigl(\epsilon + \hat{\xi}_i - y(x_i) + t_i\bigr).$$
Dual Problem
The problem above is a minimization problem.
As in the analysis of the SVM classification algorithm, we rewrite it as a dual problem.
First, set the partial derivatives of the Lagrangian with respect to $w$, $b$, $\xi_i$, and $\hat{\xi}_i$ to zero:
$$\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{N}\bigl(a_i - \hat{a}_i\bigr)x_i, \qquad \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N}\bigl(a_i - \hat{a}_i\bigr) = 0,$$
$$\frac{\partial L}{\partial \xi_i} = 0 \;\Rightarrow\; a_i + \mu_i = C, \qquad \frac{\partial L}{\partial \hat{\xi}_i} = 0 \;\Rightarrow\; \hat{a}_i + \hat{\mu}_i = C.$$
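Substituting these conditions back into the Lagrangian eliminates $w$, $b$, and the slack variables. Stated here for completeness, following the usual textbook derivation, this yields the SVR dual (with $k(x_i, x_j) = x_i^\top x_j$ in the linear case, or any kernel function in general):

$$\max_{a,\hat{a}}\; \tilde{L}(a, \hat{a}) = -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\bigl(a_i - \hat{a}_i\bigr)\bigl(a_j - \hat{a}_j\bigr)k(x_i, x_j) - \epsilon\sum_{i=1}^{N}\bigl(a_i + \hat{a}_i\bigr) + \sum_{i=1}^{N}\bigl(a_i - \hat{a}_i\bigr)t_i$$

subject to $0 \le a_i \le C$ and $0 \le \hat{a}_i \le C$ (the box constraints follow from $a_i + \mu_i = C$ and $\hat{a}_i + \hat{\mu}_i = C$ with non-negative multipliers), together with $\sum_i (a_i - \hat{a}_i) = 0$.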
Hyperplane Computation
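Following the usual SVR derivation, once the dual has been solved for $a_i$ and $\hat{a}_i$, the prediction for a new input $x$ depends only on the support vectors (the samples with $a_i \neq \hat{a}_i$), and $b$ can be recovered from the KKT conditions using any point with $0 < a_j < C$, which lies exactly on the upper edge of the tube:

$$y(x) = \sum_{i=1}^{N}\bigl(a_i - \hat{a}_i\bigr)k(x, x_i) + b, \qquad b = t_j - \epsilon - \sum_{i=1}^{N}\bigl(a_i - \hat{a}_i\bigr)k(x_j, x_i).$$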
Support Vector Machine Regression (SVR)
A support vector machine can also be used as a regression method, retaining all the main features that characterize the algorithm (maximal margin). Support Vector Regression (SVR) uses the same principles as the SVM for classification, with only a few minor differences. First of all, because the output is a real number, it has infinitely many possible values, so predicting the training targets exactly becomes very difficult. In the regression case a margin of tolerance (epsilon) is therefore set, in analogy to the margin that the classification SVM would have requested from the problem. Besides this, the algorithm is also somewhat more involved, and this has to be taken into consideration. However, the main idea is always the same: minimize the error by finding the hyperplane that maximizes the margin, keeping in mind that part of the error is tolerated.
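As a concrete illustration of these ideas, here is a minimal sketch using scikit-learn's SVR estimator on synthetic data; the dataset and the hyperparameter values for C, epsilon, and gamma are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem (illustrative only).
rng = np.random.default_rng(42)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
t = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

# epsilon controls the width of the tolerance tube; C trades off
# flatness of the function against how strongly tube violations are penalized.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma="scale")
model.fit(X, t)

# Only samples lying on or outside the epsilon tube become support vectors.
print("support vectors:", len(model.support_), "of", len(X), "samples")
print("prediction at x=2.5:", model.predict([[2.5]]))
```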
Linear SVR
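For the linear case, the predictor and the tube condition take the form below, written in the common SVR notation with dual coefficients $\alpha_i, \alpha_i^{*}$; this restates the dual solution derived earlier with a linear kernel:

$$y = \sum_{i=1}^{N}\bigl(\alpha_i - \alpha_i^{*}\bigr)\langle x_i, x\rangle + b, \qquad \text{subject to } \lvert t_i - \langle w, x_i\rangle - b\rvert \le \epsilon \;\text{ (relaxed by the slack variables when violations are allowed).}$$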
Non-linear SVR
The kernel functions transform the data into a higher-dimensional feature space to make it possible to perform the linear separation there.
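Under this view, the non-linear predictor keeps the same form as the linear one, with the inner product replaced by a kernel evaluation (again a restatement of the dual solution above, with $K$ denoting the kernel):

$$y = \sum_{i=1}^{N}\bigl(\alpha_i - \alpha_i^{*}\bigr) K(x_i, x) + b .$$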
Kernel functions
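The kernels most commonly paired with SVR are the standard ones, listed here for reference; $\gamma$, $r$, and $d$ are kernel hyperparameters:

$$\text{linear: } K(x_i, x_j) = x_i^\top x_j, \qquad \text{polynomial: } K(x_i, x_j) = \bigl(\gamma\, x_i^\top x_j + r\bigr)^{d},$$
$$\text{RBF (Gaussian): } K(x_i, x_j) = \exp\bigl(-\gamma \lVert x_i - x_j\rVert^2\bigr), \qquad \text{sigmoid: } K(x_i, x_j) = \tanh\bigl(\gamma\, x_i^\top x_j + r\bigr).$$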