http://www.csdn.net/article/2012-12-28/2813275-Support-Vector-Machine
Abstract: The support vector machine (SVM) has become a very popular algorithm. This article explains how SVM works and gives several examples that use the Python scikits library. As a machine learning algorithm, SVM can be used to solve classification and regression problems; it uses the kernel trick to transform the data and then, based on those transformations, finds an optimal boundary among the possible outputs.
"CSDN report" Support vector Machines (SVM) has become a very popular algorithm. In this article, Greg lamp simply explains how it works, and he gives a few examples of using the Python scikits library. All the code is available on GitHub, and Greg lamp will further elaborate on the details of using scikits and Sklearn. CSDN This technical article is compiled and organized:
What is SVM?
SVM is a machine learning algorithm that can be used to solve classification and regression problems. It uses a technique called the kernel trick to transform the data and then, based on those transformations, finds an optimal boundary among the possible outputs. In short, it performs some very complex data transformations and then works out how to separate the user's data according to predefined labels or outputs.
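To make that fit/predict workflow concrete, here is a minimal sketch (not from the original article) using scikit-learn's svm.SVC; the points, labels, and query point are invented purely for illustration.

from sklearn import svm

# four made-up 2-D points with made-up labels, purely for illustration
X = [[0, 0], [0, 1], [2, 0], [2, 1]]
y = [0, 0, 1, 1]

clf = svm.SVC()          # non-linear (RBF) kernel by default
clf.fit(X, y)            # learn a boundary between the two labels
print(clf.predict([[1.8, 0.5]]))   # should print [1], the label of the nearby training points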
What makes it so powerful?
Of course, SVM is fully capable of both classification and regression. In this article, Greg Lamp focuses on classification with SVM, in particular non-linear SVM, that is, SVM with a non-linear kernel. A non-linear SVM computes a boundary that is not necessarily a straight line; the advantage is that it can capture much more complex relationships among the data points without the user having to perform difficult transformations by hand. The disadvantage is that training takes much longer because it is far more computationally intensive.
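As a rough illustration of the difference (not part of Greg Lamp's code), the sketch below trains the same SVC with a linear kernel and with a non-linear RBF kernel on a synthetic data set that no straight line can separate; the choice of sklearn.datasets.make_circles and its parameters are assumptions made here for illustration only.

from sklearn import svm
from sklearn.datasets import make_circles

# a synthetic two-class data set that no straight line can separate
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05)

linear_clf = svm.SVC(kernel='linear').fit(X, y)   # straight-line boundary
rbf_clf = svm.SVC(kernel='rbf').fit(X, y)         # curved boundary

print("linear kernel accuracy: %.2f" % linear_clf.score(X, y))  # noticeably lower
print("rbf kernel accuracy:    %.2f" % rbf_clf.score(X, y))     # close to 1.0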
What is kernel trick?
The kernel trick takes the data you give it: you feed in some features that seem reasonably informative, it processes them, and out comes data you barely recognize, a bit like unravelling a strand of DNA. You start from a vector of data, pass it through the kernel trick, and it keeps being decomposed and recombined until it forms a much larger data set, one that is usually very hard to interpret directly. This is where the magic lies: the expanded data set has much more pronounced boundaries, so the SVM algorithm can compute a far better hyperplane.
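A small worked sketch (again, not from the original article) may make this more tangible: for a degree-2 polynomial kernel, evaluating the kernel on the raw points gives exactly the same value as explicitly mapping the points into a higher-dimensional space first and taking an ordinary dot product, so the SVM can work "in" the expanded space without ever constructing it. The feature map phi below is the standard textbook one and is used only for illustration.

import numpy as np

def phi(v):
    # explicit degree-2 feature map for a 2-D point [x1, x2]
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

a = np.array([1.0, 2.0])
b = np.array([3.0, 0.5])

explicit = np.dot(phi(a), phi(b))  # dot product after mapping into 3-D space
kernel = np.dot(a, b) ** 2         # polynomial kernel K(a, b) = (a . b)^2, no mapping needed

print("explicit dot product: %.1f" % explicit)  # 16.0
print("kernel value:         %.1f" % kernel)    # 16.0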
Here is another example: suppose you are a farmer and you have a problem. You need to build a fence to protect your herd from wolves, but where should the fence go? If you are a very data-driven farmer, one approach is to build a classifier based on the positions of the herd and the wolves in your pasture. Comparing several different classifiers, as shown in the figure, the SVM clearly produces a perfect separation. Greg Lamp sees this story as a nice illustration of the advantage of a non-linear classifier: it is obvious that both the logistic model and the decision tree model can only use straight-line boundaries.
The implementation code (farmer.py, Python) is as follows:
import numpy as np
import pylab as pl
import pandas as pd

from sklearn import svm
from sklearn import linear_model
from sklearn import tree


def plot_results_with_hyperplane(clf, clf_name, df, plt_nmbr):
    x_min, x_max = df.x.min() - .5, df.x.max() + .5
    y_min, y_max = df.y.min() - .5, df.y.max() + .5

    # step between points. i.e. [0, 0.02, 0.04, ...]
    step = .02
    # to plot the boundary, we're going to create a matrix of every possible point
    # then label each point as a wolf or cow using our classifier
    xx, yy = np.meshgrid(np.arange(x_min, x_max, step),
                         np.arange(y_min, y_max, step))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    # this gets our predictions back into a matrix
    Z = Z.reshape(xx.shape)

    # create a subplot (we're going to have more than 1 plot on a given image)
    pl.subplot(2, 2, plt_nmbr)
    # plot the boundaries
    pl.pcolormesh(xx, yy, Z, cmap=pl.cm.Paired)

    # plot the wolves and cows
    for animal in df.animal.unique():
        pl.scatter(df[df.animal == animal].x,
                   df[df.animal == animal].y,
                   marker=animal,
                   label="cows" if animal == "x" else "wolves",
                   color='black')
    pl.title(clf_name)
    pl.legend(loc="best")


data = open("cows_and_wolves.txt").read()
data = [row.split('\t') for row in data.strip().split('\n')]

animals = []
for y, row in enumerate(data):
    for x, item in enumerate(row):
        # x's are cows, o's are wolves
        if item in ['o', 'x']:
            animals.append([x, y, item])

df = pd.DataFrame(animals, columns=["x", "y", "animal"])
df['animal_type'] = df.animal.apply(lambda x: 0 if x == "x" else 1)

# train using the x and y position coordinates
train_cols = ["x", "y"]

clfs = {
    "SVM": svm.SVC(),
    "Logistic": linear_model.LogisticRegression(),
    "Decision Tree": tree.DecisionTreeClassifier(),
}

plt_nmbr = 1
for clf_name, clf in clfs.iteritems():
    clf.fit(df[train_cols], df.animal_type)
    plot_results_with_hyperplane(clf, clf_name, df, plt_nmbr)
    plt_nmbr += 1
pl.show()
Let the SVM do some more difficult work!
Admittedly, things are not always this easy. If the relationship between the independent and dependent variables is non-linear, it is hard for simpler models to approach the accuracy of an SVM. If that is still hard to picture, consider the following example: suppose we have a data set containing green and red points. When we plot their coordinates, they form a concrete shape: a red circle surrounded by green (it looks like the flag of Bangladesh). If, for some reason, we lose one third of the data set, then as we recover it we would like a way to reconstruct the contour of that missing third as well as possible.
So how do we guess what shape that missing third most closely resembles? One approach is to build a model using the roughly 80% of the data we still have as a training set. Greg Lamp chose three different models to try:
- Logistic model (GLM)
- Decision tree model (DT)
- SVM
Greg Lamp trains each model and then uses it to predict the missing third of the data set. We can see how the different models perform from the results:
The implementation code (svmflag.py, Python) is as follows:
import numpy as np
import pylab as pl
import pandas as pd

from sklearn import svm
from sklearn import linear_model
from sklearn import tree
from sklearn.metrics import confusion_matrix

x_min, x_max = 0, 15
y_min, y_max = 0, 10
step = .1

# to plot the boundary, we're going to create a matrix of every possible point
# then label each point using our classifier
xx, yy = np.meshgrid(np.arange(x_min, x_max, step), np.arange(y_min, y_max, step))

df = pd.DataFrame(data={'x': xx.ravel(), 'y': yy.ravel()})
df['color_gauge'] = (df.x - 7.5)**2 + (df.y - 5)**2
df['color'] = df.color_gauge.apply(lambda x: "red" if x <= 15 else "green")
df['color_as_int'] = df.color.apply(lambda x: 0 if x == "red" else 1)

print "Points on flag:"
print df.groupby('color').size()
print

figure = 1

# plot a figure for the entire dataset
for color in df.color.unique():
    idx = df.color == color
    pl.subplot(2, 2, figure)
    pl.scatter(df[idx].x, df[idx].y, color=color)
    pl.title('Actual')

train_idx = df.x < 10

train = df[train_idx]
test = df[~train_idx]

print "Training Set Size: %d" % len(train)
print "Test Set Size: %d" % len(test)

# train using the x and y position coordinates
cols = ["x", "y"]

clfs = {
    "SVM": svm.SVC(degree=0.5),
    "Logistic": linear_model.LogisticRegression(),
    "Decision Tree": tree.DecisionTreeClassifier()
}

# racehorse different classifiers and plot the results
for clf_name, clf in clfs.iteritems():
    figure += 1

    # train the classifier
    clf.fit(train[cols], train.color_as_int)

    # get the predicted values from the test set
    test['predicted_color_as_int'] = clf.predict(test[cols])
    test['pred_color'] = test.predicted_color_as_int.apply(
        lambda x: "red" if x == 0 else "green")

    # create a new subplot on the plot
    pl.subplot(2, 2, figure)
    # plot each predicted color
    for color in test.pred_color.unique():
        # plot only rows where pred_color is equal to color
        idx = test.pred_color == color
        pl.scatter(test[idx].x, test[idx].y, color=color)

    # plot the training set as well
    for color in train.color.unique():
        idx = train.color == color
        pl.scatter(train[idx].x, train[idx].y, color=color)

    # add a dotted line to show the boundary between the training and test set
    # (everything to the right of the line is in the test set)
    # this plots a vertical line
    train_line_y = np.linspace(y_min, y_max)         # evenly spaced array from 0 to 10
    train_line_x = np.repeat(10, len(train_line_y))  # repeat 10 (threshold for training set) n times
    # add a black, dotted line to the subplot
    pl.plot(train_line_x, train_line_y, 'k--', color="black")
    pl.title(clf_name)

    print "Confusion Matrix for %s:" % clf_name
    print confusion_matrix(test.color, test.pred_color)
pl.show()
Conclusion:
From these experimental results, SVM is without doubt the clear winner. To see why, look at the DT and GLM models: both are obviously using straight-line boundaries. Greg Lamp's inputs contained no transformations that would let the models capture the non-linear relationship among x, y, and color. If Greg Lamp had to define specific transformations himself just so that the GLM and DT models could produce better results, why waste the time? In fact, with no complex transformations or scaling at all, SVM misclassified only 117 of the 5,000 test points (98% accuracy, compared with 51% for the DT model and only 12% for the GLM!).
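To make that last point concrete, here is a small sketch (not in the original article): if we hand the logistic model the non-linear transformation ourselves, namely the color_gauge column (squared distance from the flag's centre) that svmflag.py already computes, then plain logistic regression separates red from green almost perfectly. The appeal of SVM is that it finds such a transformation on its own. The snippet assumes the train and test DataFrames from the script above are still in scope.

from sklearn import linear_model

# assumes the train/test DataFrames from svmflag.py above are in scope
glm = linear_model.LogisticRegression()
glm.fit(train[["color_gauge"]], train.color_as_int)
score = glm.score(test[["color_gauge"]], test.color_as_int)
print("GLM accuracy with a hand-crafted feature: %.3f" % score)  # should be close to 1.0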
Where is the limitation?
Many people wonder: if SVM is so powerful, why not use it for everything? Unfortunately, the most magical part of SVM is also its biggest weakness: the complex data transformations and the resulting boundaries are very hard to explain. That is why SVM is often called a black box, whereas the GLM and DT models are just the opposite: they are easy to understand. (Compiled by @CSDN Wang Peng, reviewed by Zhonghao)
This article was compiled by CSDN and may not be reproduced without permission. To reprint, please contact [email protected]