Job hunting lately has me in a bit of a panic: studying machine learning on one hand, grinding coding problems on the other. Still, I'll keep working through this course, because it really is good. What kind of grunt work I end up landing is up to fate.
The core of this lecture is how to bring the kernel trick into logistic regression.
First, the expression involving the slack variables is rewritten, turning the constrained problem into an unconstrained one.
After rewriting soft-margin SVM into this "unconstrained" form, it suddenly looks a lot like L2 regularization.
If you view SVM as a regularized model, C corresponds to lambda.
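The correspondence can be written out explicitly (a sketch from memory of the lecture's formulas, so treat the constants as indicative rather than exact):

```latex
% Soft-margin SVM with the slack variables eliminated:
\min_{b,\mathbf{w}}\; \tfrac{1}{2}\mathbf{w}^{\top}\mathbf{w}
  + C\sum_{n=1}^{N}\max\bigl(1 - y_n(\mathbf{w}^{\top}\mathbf{z}_n + b),\, 0\bigr)

% Generic L2-regularized linear model:
\min_{\mathbf{w}}\; \tfrac{\lambda}{N}\mathbf{w}^{\top}\mathbf{w}
  + \tfrac{1}{N}\sum_{n=1}^{N}\mathrm{err}\bigl(y_n,\, \mathbf{w}^{\top}\mathbf{z}_n\bigr)
```

Matching the two forms term by term, a large C plays the role of a small lambda (roughly C proportional to 1/lambda).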
The above only compares soft-margin SVM and L2 regularization in form; next, the similarity is analyzed from the perspective of the error measure.
From the error-measure point of view, SVM indeed looks even more like logistic regression.
From the binary classification perspective, soft-margin SVM and L2-regularized logreg are related in two ways:
(1) both the soft-margin SVM error (hinge) and the logreg error (cross-entropy) upper-bound the PLA's 0/1 error measure
(2) the soft-margin SVM and logreg error curves look very similar
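Point (1) is easy to check numerically: both convex surrogates upper-bound the 0/1 error pointwise in the score s = y(w'z + b). A quick numpy sketch of my own (not code from the lecture):

```python
import numpy as np

# s = y * (w'z + b): positive means correctly classified
s = np.linspace(-3.0, 3.0, 601)

err_01 = (s <= 0).astype(float)       # PLA's 0/1 error measure
err_hinge = np.maximum(0.0, 1.0 - s)  # soft-margin SVM (hinge) error
err_sce = np.log2(1.0 + np.exp(-s))   # scaled (log base 2) cross-entropy

# Both surrogates dominate the 0/1 error everywhere on the grid.
print(np.all(err_hinge >= err_01), np.all(err_sce >= err_01))  # → True True
```

The log base 2 scaling is what makes the cross-entropy curve pass through the point (0, 1), so it touches the 0/1 error exactly where hinge does.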
Why say all this? I think what Lin is getting at is:
(1) logistic regression is good at binary classification, and SVM is good at the kernel trick
(2) so let's move the kernel trick into logreg
First, a probabilistic SVM algorithm is given.
The specific approach has two steps:
(1) run kernel soft-margin SVM on the data first to find w_SVM and b_SVM
(2) introduce two scalar variables A and B into logreg (A rescales the scores, B shifts the intercept)
There are two benefits to this approach:
(1) the machinery of the dual SVM can be reused, so the kernel trick carries over directly
(2) what remains is an unconstrained optimization problem in A and B, which can be solved by gradient methods and the like
The fitted A should ideally be positive, and B should be close to zero (otherwise the original SVM was doing a poor job in the first place).
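Step (2) is short enough to sketch in plain numpy. Here I assume the SVM scores s_n = w_SVM'z_n + b_SVM have already been computed in step (1), and simply fake them with synthetic data; A and B are then fit by gradient descent on the cross-entropy (my own sketch, not the lecture's code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for SVM decision values from step (1): mostly correct signs.
y = rng.choice([-1.0, 1.0], size=200)
s = 2.0 * y + rng.normal(scale=0.8, size=200)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

A, B = 1.0, 0.0   # scale and intercept shift
lr = 0.1
for _ in range(500):
    m = -y * (A * s + B)         # flipped margins
    g = -y * sigmoid(m)          # d err / d(A*s + B), per point
    A -= lr * np.mean(g * s)     # gradient step on A
    B -= lr * np.mean(g)         # gradient step on B

print(f"A={A:.3f}, B={B:.3f}")   # A should come out positive, B near zero
```

Since the objective is convex and smooth in (A, B), plain gradient descent with a small step size is enough here.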
The method above is only an approximate way of combining SVM with logreg. In fact, there is a more exact way to use the kernel trick in logistic regression.
The core of the kernel trick is that w can be represented as a linear combination of the input vectors (i.e., represented by the data).
This has already been shown for PLA and for SVM; does it also hold for logreg?
In other words, can the result be generalized?
In fact it can: it holds for any linear model with L2 regularization, as follows.
The claim is: for any linear model with L2 regularization, the optimal w can always be expressed as a linear combination of the z_n.
The intuitive proof: the key is to split w into a component parallel to the span of the z_n and a component perpendicular to it.
It's then easy to show:
(1) the component perpendicular to the z-space plays no role in the err term
(2) as for the regularization term, if w had a component perpendicular to the z-space it could not be the minimizer: removing that perpendicular component would shrink w'w while changing nothing else
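Both points can be sanity-checked numerically: project a random w onto the span of the z_n, and the predictions w'z_n are unchanged while the norm can only shrink. A quick numpy sketch of my own:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 5, 20                 # fewer points than dimensions
Z = rng.normal(size=(N, d))  # rows are the z_n
w = rng.normal(size=d)       # an arbitrary weight vector

# Orthogonal projection of w onto span{z_1, ..., z_N}:
# w_par = Z' (Z Z')^{-1} Z w   (rows of Z independent almost surely)
w_par = Z.T @ np.linalg.solve(Z @ Z.T, Z @ w)

# (1) the err term only sees w'z_n, which is unchanged:
assert np.allclose(Z @ w, Z @ w_par)
# (2) dropping the perpendicular component strictly shrinks w'w:
assert w_par @ w_par < w @ w
```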
This is the Representer Theorem: it holds for any L2-regularized linear model.
This conclusion is great: any L2-regularized linear model can be kernelized.
So the L2-regularized logreg problem is solved, because the representer theorem already tells us the form of w.
The problem then becomes an unconstrained optimization over the N variables beta_n, and the kernel trick migrates over to logreg.
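Putting it together, kernel logistic regression is short to sketch: parameterize w = sum_m beta_m z_m, so that w'z_n = (K beta)_n, and run gradient descent on beta. A minimal numpy version with an RBF kernel (my own sketch, not the lecture's code; the kernel width gamma and lambda are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# Two Gaussian blobs, labels in {-1, +1}.
X = np.vstack([rng.normal(-1.5, 1.0, (40, 2)), rng.normal(1.5, 1.0, (40, 2))])
y = np.hstack([-np.ones(40), np.ones(40)])
N, lam, gamma = len(y), 0.1, 0.5

# RBF kernel matrix: K[n, m] = exp(-gamma * ||x_n - x_m||^2)
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-gamma * sq)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

beta = np.zeros(N)
lr = 0.05
for _ in range(1000):
    s = K @ beta   # s_n = w'z_n under w = sum_m beta_m z_m
    # objective: (lam/N) beta'K beta + (1/N) sum_n log(1 + exp(-y_n s_n))
    grad = (2 * lam / N) * (K @ beta) + K @ (-y * sigmoid(-y * s)) / N
    beta -= lr * grad

acc = np.mean(np.sign(K @ beta) == y)
print(f"training accuracy: {acc:.2f}")
```

Note how every appearance of the data is through K alone, which is exactly what "the kernel trick migrates over" means.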
From another point of view, the original L2-logreg problem has been transformed into a problem solved in beta-space. The catch is that most of the beta_n may be nonzero (unlike the sparse alpha of the SVM dual), which can consume a lot of computing resources.
"Kernel Logistic Regression", Hsuan-Tien Lin, Machine Learning Techniques