Background

This post reproduces Table 12.2 of ESL §12.3 (Support Vector Machines and Kernels). The setup is as follows: 100 observations are generated in each of two classes. The first class has four standard normal independent features \(X_1, X_2, X_3, X_4\). The second class also has four standard normal independent features, but conditioned on \(9\le \sum X_j^2\le 16\). This is a relatively easy problem. For a second, harder problem, six standard Gaussian noise features are appended to the inputs.
Generate Data
```r
########################################
## Generate dataset
## "No Noise Features":  num_noise = 0
## "Six Noise Features": num_noise = 6
########################################
genXY <- function(n = 100, num_noise = 0) {
  ## class 1: standard normal features
  m1 = matrix(rnorm(n * (4 + num_noise)), ncol = 4 + num_noise)
  ## class 2: standard normal features conditioned on 9 <= sum(x_j^2) <= 16
  m2 = matrix(nrow = n, ncol = 4 + num_noise)
  for (i in 1:n) {
    while (TRUE) {
      m2[i, ] = rnorm(4 + num_noise)
      tmp = sum(m2[i, 1:4]^2)
      if (tmp >= 9 & tmp <= 16) break
    }
  }
  x = rbind(m1, m2)
  y = rep(c(1, 2), each = n)
  return(data.frame(x = x, y = as.factor(y)))
}
```
Model Training
- SVM is fitted by directly calling the `svm()` function in the `e1071` package
- Both BRUTO and MARS are called from the `mda` package; since both are regression methods, they are converted to classifiers by comparing the fitted value with the numeric class labels and assigning the closer class
- The book states that MARS is used with no restriction on the interaction order; in the actual code the order is set to 10
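The closest-label rule used for BRUTO and MARS in the second bullet can be sketched in base R; `to_label` is a hypothetical helper name, and the fitted values below are made up for illustration:

```r
## Convert regression fits to class labels coded 1 and 2: assign the
## numerically closer label, i.e. split at the midpoint 1.5.
to_label <- function(fitted_vals) {
  ifelse(fitted_vals < 1.5, 1, 2)
}

to_label(c(1.2, 1.7, 1.49))  # -> 1 2 1
```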
Cross-validation to Select an Appropriate \(C\)

I select \(C\) in two steps:

- Coarse search: find the optimal \(C\) over a wide range
- Fine search: search again in a narrow grid around the best value from the previous step

Take care that the optimum does not land on the boundary of the search range. SVM/poly5 is used as an example below; the other methods are similar.
```r
## SVM/poly5
set.seed(123)
poly5 = tune.svm(y ~ ., data = dat, kernel = "polynomial",
                 degree = 5, cost = 2^(-4:8))
summary(poly5)
```
The optimal \(C\) selected at this stage is 32; we then refine the search around it:
```r
set.seed(1234)
poly5 = tune.svm(y ~ ., data = dat, kernel = "polynomial",
                 degree = 5, cost = seq(16, 64, by = 2))
summary(poly5)
```
So \(C\) is taken to be 28.
The optimal \(C\) for the other methods is chosen in the same way; one experimental result is as follows:
| Method | Best Cost |
| --- | --- |
| SV Classifier | 2.6 |
| SVM/poly 2 | 1 |
| SVM/poly 5 | 28 |
| SVM/poly 10 | 0.5 |
Of course, in practice we do not need to refit the model with the chosen parameters, because the result returned by `tune.svm()` already contains the optimal model; it can be used directly, e.g. `poly5$best.model`.
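As a self-contained sketch of reusing the tuned model (the data here are made up, and the small cost grid is only so the example runs quickly):

```r
library(e1071)

set.seed(123)
x <- matrix(rnorm(100 * 4), ncol = 4)
y <- factor(rep(c(1, 2), each = 50))
dat <- data.frame(x = x, y = y)

## tune.svm() keeps the model refitted at the best cost in $best.model
tuned <- tune.svm(y ~ ., data = dat, kernel = "polynomial",
                  degree = 2, cost = 2^(-2:2))
best <- tuned$best.model   # ready to use, no manual refitting needed
pred <- predict(best, newdata = dat[, -ncol(dat)])
```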
Calculate Test Error
```r
predict.mars2 <- function(model, newdata) {
  pred = predict(model, newdata)
  ifelse(pred < 1.5, 1, 2)
}

## NOTE: the defaults for n and nrep were lost in transcription and are assumed here
calcErr <- function(model, n = 100, nrep = 50, num_noise = 0, method = "SVM") {
  err = sapply(1:nrep, function(i) {
    dat = genXY(n, num_noise = num_noise)
    datx = dat[, -ncol(dat)]
    daty = dat[, ncol(dat)]
    if (method == "SVM")
      pred = predict(model, newdata = datx)
    else if (method == "MARS")
      pred = predict.mars2(model, newdata = datx)
    else if (method == "BRUTO")
      pred = predict.mars2(model, newdata = as.matrix(datx))
    sum(pred != daty) / (2 * n)  # attention!! the total number of observations is 2n, not n
  })
  return(list(testerr = mean(err), se = sd(err)))
}
```
It is worth noting that for BRUTO and MARS, since the program treats them as regression models, the fitted values need to be further converted to class labels. Because the classes are coded 1 and 2, an observation is assigned to class 1 if its fitted value is below 1.5, and to class 2 otherwise.
Results
Comparing the results with Table 12.2, the error rates of each method and the relative sizes of the standard errors are quite consistent.
Bayes Error Rate
For Category 1,
\[\sum_{j=1}^4 X_j^2\sim \chi^2(4)\]
For Category 2, \(\sum_{j=1}^4 X_j^2\) follows the truncated distribution with density
\[\frac{f(t)\, I(9\le t\le 16)}{\int_9^{16} f(t)\, dt}\,,\]
where \(f(t)\) is the density function of \(\chi^2(4)\).
Under the Bayes rule, a point is assigned to Category 2 exactly when \(9\le\sum_{j=1}^4 X_j^2\le 16\), since the truncated density exceeds \(f(t)\) on that shell; only Category 1 points falling in the shell are misclassified. The Bayes error rate is therefore
\[\frac{1}{2}\int_{9}^{16} f(t)\, dt \approx 0.029\,.\]
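This value is easy to verify numerically with the \(\chi^2(4)\) CDF in base R:

```r
## P(9 <= chi^2_4 <= 16) / 2: only Category 1 points landing in the
## shell are misclassified by the Bayes rule
bayes_err <- (pchisq(16, df = 4) - pchisq(9, df = 4)) / 2
round(bayes_err, 3)  # -> 0.029
```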
The complete code can be found in skin-of-the-orange.R.
Permanent link to this article: Simulation: Tab. 12.2, comparing SVM, MARS and BRUTO (R language) with a simple example