Preface
The Machine Learning section records notes I have taken along the way: study notes from online courses and tutorials, reading notes on papers, debugging logs for algorithm code, thoughts on cutting-edge theory, and so on. Different column series will be opened for different kinds of content.
Machine learning is an exciting and fascinating field of research, with both elegant theoretical formulas and practical engineering techniques. In the process of learning and applying machine learning algorithms, I have become increasingly attracted to this field, and I regret not having been exposed to this magical subject earlier! At the same time, I feel very fortunate to live in an era when machine learning technology is developing so rapidly, and to do related work.
The purpose of blogging is to push myself to summarize what I have learned, to think through algorithmic principles, to deepen my technical understanding, and to practice expressing myself in writing. At the same time, I hope these shared experiences help newcomers, let me meet friends doing related work, and attract criticism and guidance from the experts!
Preface
The [Machine Learning] Coursera Notes series is compiled from the notes I took while studying Andrew Ng's machine learning course on Coursera. The content covers linear regression, logistic regression, Softmax regression, SVM, neural networks, CNNs, and more. The main learning materials come from Andrew Ng's machine learning course on Coursera, the UFLDL Tutorial, Stanford CS231n, and other online courses and tutorials, with reference to a large amount of related material online.
This article is mainly organized from the course notes for "Support Vector Machines (SVMs)", together with some classic textbooks and the classic SVM blogs online (listed later). The SVM that Ng explains is simplified to the point that it differs greatly from a traditional SVM tutorial, and many details are never elaborated, so I expect many students develop lots of questions while studying the course. For that reason, this article does not try to organize the knowledge structure of the SVM lectures (they are simply too distorted, and many important points are skipped). Instead, it contrasts each point in the course with the corresponding point in a conventional SVM tutorial, hoping to give the reader a clear train of thought.
The article subsections are arranged as follows:
1) The basic idea of SVM
2) Optimization objective
3) Hypothesis function and decision boundary
4) What is a support vector
5) Kernel functions
6) Final summary
01. Ng's SVM course
Ng's SVM lectures on Coursera could not be simpler; the treatment of SVM is even a little distorted, and many students develop lots of questions while studying this chapter. Readers who have seen more of SVM will find that, in order to ease us into the subject, to remove our fear of SVM theory, and to build our confidence, Ng really has simplified SVM theory as far as it can be simplified: no VC dimension, no structural risk concept, no SMO algorithm, no KKT conditions, no Lagrange multipliers. All of the rigorous, beautiful mathematics of SVM is omitted.
By contrast, the first paragraph of Microsoft's SVM tutorial reads:
The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization.
Most textbooks start by reviewing the functional margin and the geometric margin, derive the optimization objective of SVM from the idea of margin maximization, and then explain what support vectors are, what hard-margin and soft-margin classifiers are, and other important details. Ng omits all of this and tries to present the simplest side of SVM to beginners, letting the reader touch the gate of SVM with the least possible effort.
So on the one hand, Ng deserves praise; on the other hand, readers should be reminded that SVM is not nearly as simple as it appears in Ng's course.
02. About this blog post
Organizing this blog post about SVM was tricky, for two reasons:
1) Ng's SVM lectures are very simplified, and the line of explanation is inconsistent with conventional SVM textbooks (for example, in how the SVM objective is derived), so it is hard to integrate Ng's explanation with the conventional one.
2) SVM has a rigorous mathematical theory behind it, involving a whole basket of theoretical concepts and formulas, and there are already many in-depth articles; another blog post introducing the theory in depth would add little. If time and energy allow later, I will write up a complete derivation of the SVM-related theory and algorithms.
So in the end, this post only organizes the content of Ng's Coursera lectures, with a certain amount of expansion. If questions come up while you study the course, you may find answers here; if you want to learn SVM more deeply, I recommend studying the professional materials (some are listed below).
03. High-quality SVM blogs, tutorials, and other resources
There are many high-quality blogs about SVM online; here are a few recommendations:
Jasper — SVM Introductory Series
http://www.blogjava.net/zhenandaci/category/31868.html
pluskid — Support Vector Machine series
http://blog.pluskid.org/?page_id=683
JerryLead — Support Vector Machine series
http://www.cnblogs.com/jerrylead/tag/Machine%20Learning/
More good tutorials:
Microsoft — A Tutorial on Support Vector Machines for Pattern Recognition
https://www.microsoft.com/en-us/research/publication/a-tutorial-on-support-vector-machines-for-pattern-recognition/
Support Vector and Kernel machines
http://www.supportvector.net/icmltutorial.pdf
Support Vector Machine (and statistical learning theory) Tutorial
http://www.cs.columbia.edu/~kathy/cs4701/documents/jason_svm_tutorial.pdf
VC Dimension
http://www.svms.org/vcdimension/

1. The basic idea of SVM
1.1 Explanation in conventional tutorials
SVM (Support Vector Machines) is a linear classifier defined by the largest margin in feature space. Li Hang's Statistical Learning Methods describes the basic idea of SVM this way:
The basic idea of support vector machine learning is to find the separating hyperplane that both correctly partitions the training data set and has the largest geometric margin.
In other words, SVM not only requires that the two classes of samples be correctly separated, but also requires the geometric margin to be as large as possible, from which it follows that the solution of the SVM optimization problem is unique. More formally, the SVM learning algorithm is an algorithm for solving a convex quadratic programming problem.
From this idea one obtains the objective function and constraints of SVM, and hence its optimization target.

1.2 Ng's explanation
Now let's look at how Ng explains it.
Ng presents SVM as derived from logistic regression: starting from logistic regression, he modifies the per-sample cost function, the regularization parameter, and so on:
Many supervised learning algorithms are related to logistic regression, such as neural networks and SVMs. One could even say these classification algorithms originate from logistic regression; the differences among them lie mainly in the cost computation, the regularization parameters, and so on.
Following this idea, Ng derives the optimization objective of SVM.
2. Optimization objective
2.1 Explanation in conventional tutorials
Conventional SVM textbooks, when deriving the SVM objective function, involve at least two mathematical concepts: the functional margin and the geometric margin. SVM is called a large-margin classifier, where the margin refers to the geometric margin.
With these two concepts, the basic idea of SVM (maximize the geometric margin γ) yields the objective function and constraints:

max_{w,b} γ   s.t.   y_i (w·x_i + b) / ||w|| ≥ γ,  i = 1, ..., N
Then, using the relationship between the functional margin γ̂ = γ·||w|| and the geometric margin, the optimization problem is rewritten as:

max_{w,b} γ̂ / ||w||   s.t.   y_i (w·x_i + b) ≥ γ̂,  i = 1, ..., N
Since rescaling the functional margin (taking γ̂ = 1) does not affect the solution, and maximizing 1/||w|| is equivalent to minimizing ||w||²/2, the SVM optimization objective becomes:

min_{w,b} (1/2)||w||²   s.t.   y_i (w·x_i + b) ≥ 1,  i = 1, ..., N
2.2 Ng's explanation
In Ng's course, SVM is derived from logistic regression, so first look at the cost function of logistic regression and gradually work toward the cost function of SVM.
In logistic regression, for a single sample (x, y), the cost function is:

cost(θ; x, y) = -y·log(h_θ(x)) - (1 - y)·log(1 - h_θ(x)),   where h_θ(x) = 1 / (1 + e^(-θᵀx))
1) If y = 1, let z = θᵀx; the cost -log(h_θ(x)) decreases smoothly toward 0 as z grows.
2) If y = 0, the cost -log(1 - h_θ(x)) decreases smoothly toward 0 as z becomes more negative.
In SVM, for a single sample (x, y), the cost is changed to:

y·cost₁(z) + (1 - y)·cost₀(z),   z = θᵀx
So:
1) If y = 1, cost₁(z) is a piecewise-linear approximation of -log(h_θ(x)): it is 0 for z ≥ 1 and grows linearly as z decreases below 1.
2) If y = 0, cost₀(z) is a piecewise-linear approximation of -log(1 - h_θ(x)): it is 0 for z ≤ -1 and grows linearly as z increases above -1.
As you can see, SVM uses a polyline as the new cost function. The new form is computationally advantageous: the evaluation of -log(sigmoid(z)) is replaced by a direct evaluation of a linear function.
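As a sketch (my own illustration, not from the course; Ng never writes cost₁/cost₀ in closed form), the standard hinge-shaped surrogates with elbows at z = 1 and z = -1 look like this in Python:

```python
import numpy as np

def logistic_cost_y1(z):
    # Smooth logistic-regression cost for y = 1: -log(sigmoid(z))
    return -np.log(1.0 / (1.0 + np.exp(-z)))

def svm_cost1(z):
    # Piecewise-linear SVM surrogate for y = 1: zero once z >= 1
    return np.maximum(0.0, 1.0 - z)

def svm_cost0(z):
    # Piecewise-linear SVM surrogate for y = 0: zero once z <= -1
    return np.maximum(0.0, 1.0 + z)

z = np.array([-2.0, 0.0, 2.0])
print(svm_cost1(z).tolist())  # [3.0, 1.0, 0.0]
print(svm_cost0(z).tolist())  # [0.0, 1.0, 3.0]
```

Evaluating the polyline only needs a comparison and a subtraction, while the logistic cost needs an exponential and a logarithm.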
Starting from this single-sample cost, we can compare the full optimization objectives of logistic regression and SVM:
For logistic regression, the (regularized) objective function is:

J(θ) = (1/m) Σᵢ [ -y⁽ⁱ⁾ log h_θ(x⁽ⁱ⁾) - (1 - y⁽ⁱ⁾) log(1 - h_θ(x⁽ⁱ⁾)) ] + (λ/2m) Σⱼ θⱼ²
SVM replaces the per-sample cost in this objective with the easier-to-compute piecewise-linear functions, so it can be written as:

J(θ) = (1/m) Σᵢ [ y⁽ⁱ⁾ cost₁(θᵀx⁽ⁱ⁾) + (1 - y⁽ⁱ⁾) cost₀(θᵀx⁽ⁱ⁾) ] + (λ/2m) Σⱼ θⱼ²
At the same time, SVM changes the form of the regularization parameter (Ng says this is merely a convention in the SVM community): it uses C in place of 1/λ and drops the 1/m factor (which does not affect the optimum), giving the SVM objective:

J(θ) = C Σᵢ [ y⁽ⁱ⁾ cost₁(θᵀx⁽ⁱ⁾) + (1 - y⁽ⁱ⁾) cost₀(θᵀx⁽ⁱ⁾) ] + (1/2) Σⱼ θⱼ²
Tip 1: Why can 1/m be removed?
Multiplying the objective function by a positive constant does not affect the optimal solution. For example, minimizing (u - 5)² and minimizing 10(u - 5)² both give u = 5.
Tip 2: Why use C instead of 1/λ?
First, note that replacing 1/λ with the parameter C leaves the optimal parameters of the two objectives the same:
Logistic regression: A + λB
SVM: C·A + B  (with C = 1/λ)
Both λ and C can be understood as control parameters (regularization parameters): during optimization they trade off the fitting term A against the regularization term B, i.e., λ and C steer and influence the optimization process.
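Both tips reduce to one fact: minimizing a positive multiple of an objective gives the same minimizer. With C = 1/λ:

```latex
\arg\min_\theta \bigl[ A(\theta) + \lambda B(\theta) \bigr]
= \arg\min_\theta \tfrac{1}{\lambda}\bigl[ A(\theta) + \lambda B(\theta) \bigr]
= \arg\min_\theta \bigl[ C\,A(\theta) + B(\theta) \bigr],
\qquad C = \tfrac{1}{\lambda},\ \lambda > 0.
```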
2.3 Where did C go?
Ng explains this:
If C is very large, the optimization focuses on choosing parameters that make the first term equal to 0. This leads to a new optimization problem with a changed objective, namely minimizing:

(1/2) Σⱼ θⱼ²
subject to the constraints:

θᵀx⁽ⁱ⁾ ≥ 1   if y⁽ⁱ⁾ = 1
θᵀx⁽ⁱ⁾ ≤ -1  if y⁽ⁱ⁾ = 0
These two constraints guarantee that the first term of J(θ) is 0. In other words, by setting C very large, minimizing the objective is converted into a constrained quadratic programming (QP) problem, described overall as:

min_θ (1/2) Σⱼ θⱼ²   s.t.   θᵀx⁽ⁱ⁾ ≥ 1 if y⁽ⁱ⁾ = 1,   θᵀx⁽ⁱ⁾ ≤ -1 if y⁽ⁱ⁾ = 0
Solving this quadratic program yields a decision boundary with the maximum-margin property.
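A quick sketch of this "very large C ≈ hard margin" view, using scikit-learn (my own illustration with toy data, not part of the course):

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data: two clusters (illustrative only)
X = np.array([[0, 0], [0, 1], [1, 0],
              [3, 3], [3, 4], [4, 3]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# With a very large C, the soft-margin objective approximates the
# hard-margin QP above: margin violations are penalized so heavily
# that the solver must drive the first (cost) term to near zero.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)
print(clf.predict([[0.5, 0.5], [3.5, 3.5]]).tolist())  # [0, 1]
```

On separable data like this, the large-C solution is the maximum-margin separating hyperplane; on noisy data, a large C would overfit, which is exactly why C matters.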
That is Ng's derivation of the SVM optimization objective. When I first saw it I was genuinely puzzled; it did not feel systematic.
And consider C: notice that it has disappeared from the final problem. That is awkward, because we still need C to control the degree of regularization.
2.4 Comparison
The SVM optimization target given in Li Hang's Statistical Learning Methods (without slack variables):

min_{w,b} (1/2)||w||²   s.t.   y_i (w·x_i + b) ≥ 1,  i = 1, ..., N
Most papers use this form as well; every step of the derivation is well-reasoned and easy to follow.
Adding the penalty term, i.e., the slack variables ξᵢ of SVM, the optimization target becomes:

min_{w,b,ξ} (1/2)||w||² + C Σᵢ ξᵢ   s.t.   y_i (w·x_i + b) ≥ 1 - ξᵢ,  ξᵢ ≥ 0,  i = 1, ..., N
The SVM optimization target given by Ng:

min_θ C Σᵢ [ y⁽ⁱ⁾ cost₁(θᵀx⁽ⁱ⁾) + (1 - y⁽ⁱ⁾) cost₀(θᵀx⁽ⁱ⁾) ] + (1/2) Σⱼ θⱼ²
Making C very large forces the first term to 0, which together with the constraints gives the final SVM optimization problem:

min_θ (1/2) Σⱼ θⱼ²   s.t.   θᵀx⁽ⁱ⁾ ≥ 1 if y⁽ⁱ⁾ = 1,   θᵀx⁽ⁱ⁾ ≤ -1 if y⁽ⁱ⁾ = 0
When I first saw this I was baffled, so I give this side-by-side comparison here, hoping that readers who were equally confused will be enlightened.
3. Hypothesis function and decision boundary
This section discusses the hypothesis function of SVM, i.e., how SVM produces an output for a sample. Different tutorials give different views, and different open-source SVM implementations also differ.
The classification decision function given in Li Hang's Statistical Learning Methods is:

f(x) = sign(w·x + b)
Some tutorials instead give a classification decision function of the form:

f(x) = +1 if w·x + b ≥ 1;   f(x) = -1 if w·x + b ≤ -1
Samples that fall inside the margin are rejected (left unclassified).
The difference between the two is simply how samples inside the margin are handled.
There is no unified convention here; choose according to your needs.
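The two conventions can be sketched directly from the score w·x + b (a hypothetical list of scores is used here for illustration):

```python
import numpy as np

def sign_decision(scores):
    # Convention 1: classify every sample by sign(w.x + b)
    return np.where(np.asarray(scores) >= 0, 1, -1)

def margin_decision(scores):
    # Convention 2: only classify samples outside the margin;
    # samples with |w.x + b| < 1 are rejected (label 0 here)
    s = np.asarray(scores)
    return np.where(s >= 1, 1, np.where(s <= -1, -1, 0))

scores = [2.5, 0.3, -0.7, -1.2]  # hypothetical values of w.x + b
print(sign_decision(scores).tolist())    # [1, 1, -1, -1]
print(margin_decision(scores).tolist())  # [1, 0, 0, -1]
```

The two functions only disagree on the middle two samples, whose scores fall strictly inside the margin.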
4. What is a support vector
Ng's course does not explain what support vectors are, so here is a brief introduction.
The positive and negative samples closest to the separating hyperplane H fall exactly on the margin boundaries H1 and H2; these samples are the support vectors.
As you can see, most samples actually have no effect on the separating hyperplane; only a small subset of the training samples determines it, namely the support vectors.
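To see this concretely, scikit-learn exposes the support vectors of a fitted model (my own toy example, not from the course):

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data: two clusters of three points each
X = np.array([[0, 0], [0, 1], [1, 0],
              [3, 3], [3, 4], [4, 3]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)
# Only the samples lying on the margin boundaries end up as support
# vectors; the remaining training points do not affect the hyperplane.
print(len(clf.support_vectors_), "of", len(X), "samples are support vectors")
```

Deleting any non-support-vector point and refitting would give the same separating hyperplane.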
5. Kernel functions
The kernel function is the essence of SVM, and the key to how SVM handles nonlinear classification problems.
The basic idea of the kernel function method is:
Use a transformation to map the original features into a new feature space, learn a linear classification model in the new space, and thereby solve the nonlinear classification problem of the original feature space.
In other words, the kernel method converts a problem that is linearly inseparable in the original feature space into a linearly separable problem in a new feature space.
Ng's course gives a visual introduction to kernel functions; what a kernel does can be viewed as a feature mapping:
and gives one such mapping, the similarity of x to a landmark l:

f = exp( -||x - l||² / (2σ²) )
This is the Gaussian kernel function.
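The Gaussian similarity feature can be written out directly (σ is the kernel bandwidth; the landmark l is a chosen reference point — values here are my own illustration):

```python
import numpy as np

def gaussian_similarity(x, landmark, sigma=1.0):
    # f = exp(-||x - l||^2 / (2 * sigma^2)):
    # equals 1 when x sits on the landmark, decays toward 0 far away
    diff = np.asarray(x, dtype=float) - np.asarray(landmark, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

print(gaussian_similarity([1.0, 2.0], [1.0, 2.0]))  # 1.0 (x on the landmark)
print(gaussian_similarity([0.0, 0.0], [3.0, 4.0]))  # ~3.7e-06 (far from it)
```

A smaller σ makes the similarity fall off faster, so each landmark influences a tighter neighborhood; that is the knob traded off against C when tuning a Gaussian-kernel SVM.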
Ng's treatment here is very brief; for more on the principles of kernel functions, I recommend studying the classic references.
6. Final summary
SVM is a fine tool: easy to understand and easy to use. Many open-source SVM toolkits now encapsulate the algorithm excellently; following a tutorial, you can solve a simple classification problem in minutes. At the same time, there is a great deal to study in SVM, and its theory is very rich; students in our research group are still doing SVM research, so do not underestimate this tool.
Usually, when I get a machine learning problem, I try SVM first. That gives a rough assessment of the difficulty of the problem and quickly reveals where the hard parts of the task lie.
Ng's course presents a relatively simplified version of SVM. Mastering the course content is basically enough to apply SVM tools such as LIBSVM to real tasks, but to understand this tool deeply you still need to work through a proper textbook.
[Machine Learning] Coursera Notes — Support Vector Machines