How to convert data to a LIBSVM-supported format and do regression analysis

Tags: svm, rbf kernel

The data in this experiment was provided by the teacher: date, 24-hour temperature, and electrical-load data for 2006-2008, plus date and 24-hour temperature data for 2009. The goal is to predict the 24-hour electricity load for 2009. The experimental data itself is not included in this post.

The general steps for prediction with LIBSVM are: normalize the data, convert it into the format LIBSVM requires, select the parameters, build a model from the three years 2006-2008 with the best parameters, and use that model to predict the 2009 power load. In practice, I first built a model on the 2006-2007 data and predicted the 2008 data to obtain a test error. It turned out that the model trained on the 2006-2007 data predicted the 2008 power load better than a model trained on the 2007 data alone, so in the end I built the model from all three years, 2006-2008, for the prediction.

When LIBSVM trains a model, the parameters to set are:

-s SVM type, with values 0, 1, 2, 3, 4; for regression, select 3 (epsilon-SVR) or 4 (nu-SVR).
-t kernel function type, with values 0, 1, 2, 3: 0 is the linear kernel, 1 the polynomial kernel, 2 the RBF (radial basis function) kernel, 3 the sigmoid kernel.
-g gamma, a parameter of the polynomial, RBF, and sigmoid kernels. The default is 1/k, where k is the number of attributes.
-c the cost parameter C of the loss function for C-SVC, epsilon-SVR, and nu-SVR; it defaults to 1.

For a detailed description of the parameters, see the LIBSVM usage and parameter-setting documentation.
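As a quick illustration, here is a minimal sketch with toy data (my own example, not from the original post). It trains an epsilon-SVR with an RBF kernel through the MATLAB interface; the MEX function is called libsvmtrain here to match the name used later in this post (in a stock LIBSVM build it may instead be compiled as svmtrain):

% Toy regression problem: 100 samples, 4 attributes, noisy linear target.
X = rand(100, 4);
y = X * [1; -2; 0.5; 3] + 0.1 * randn(100, 1);
% -s 3: epsilon-SVR, -t 2: RBF kernel, -c 1: cost, -g 0.25: gamma = 1/(4 attributes)
model = libsvmtrain(y, X, '-s 3 -t 2 -c 1 -g 0.25');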

Here are the steps to make a regression prediction:

1. Convert the data to the format LIBSVM requires

The required data format is:

<target> 1:<first attribute value> 2:<second attribute value> ...

For example:

2 1:7 2:5 ...
1 1:4 2:2 ...

That is, for a classification problem the first column is the class label; for regression it is the target value.
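For the regression task in this post, the first column would hold the (normalized) load value; a hypothetical row with two temperature attributes might look like:

0.574 1:0.312 2:0.488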

Download the write4libsvm.m format converter from the Web, run it directly in MATLAB, and select the data file you want to convert; it is very easy to use.

write4libsvm.m

function write4libsvm
% Converts data to the format LIBSVM requires. Note that the input is a
% .mat file; the output can be saved as .txt or .dat.
% The original data is stored as:
%     [label  first attribute value  second attribute value ...]
% The converted file satisfies the LIBSVM format requirements, i.e.:
%     [label  1:first attribute value  2:second attribute value  3:third attribute value ...]
% [email protected]
% 2004.6
[filename, pathname] = uigetfile({'*.mat', 'Data file (*.mat)'; ...
    '*.*', 'All files (*.*)'}, 'Select a data file');
try
    S = load([pathname filename]);
    fieldName = fieldnames(S);
    str = cell2mat(fieldName);
    B = getfield(S, str);
    [m, n] = size(B);
    [filename, pathname] = uiputfile({'*.txt;*.dat', 'Data file (*.txt;*.dat)'; ...
        '*.*', 'All files (*.*)'}, 'Save data file');
    fid = fopen([pathname filename], 'w');
    if (fid ~= -1)
        for k = 1:m
            fprintf(fid, '%3d', B(k,1));          % label / target value first
            for kk = 2:n
                fprintf(fid, '\t%d', (kk-1));     % attribute index
                fprintf(fid, ':');
                fprintf(fid, '%d', B(k,kk));      % attribute value
            end
            k  % echo the row index as a crude progress indicator
            fprintf(fid, '\n');
        end
        fclose(fid);
    else
        msgbox('Unable to save the file!');
    end
catch
end
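If you prefer to skip the file-selection dialogs, the same conversion can be done non-interactively. This is a minimal sketch with a hypothetical output file name, assuming, as in write4libsvm.m, that B holds one sample per row in [target attr1 attr2 ...] layout:

% Toy data: two samples in [target attr1 attr2 ...] layout.
B = [2 7 5; 1 4 2];
fid = fopen('toy_libsvm.txt', 'w');
for k = 1:size(B, 1)
    fprintf(fid, '%g', B(k, 1));                            % target first
    fprintf(fid, ' %d:%g', [1:size(B, 2)-1; B(k, 2:end)]);  % index:value pairs
    fprintf(fid, '\n');
end
fclose(fid);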
2. Select a kernel function type

I chose the RBF kernel function.
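For reference, the kernel that -t 2 selects computes the following for two feature vectors u and v, gamma being the -g option (toy values below):

% RBF kernel value for toy vectors u and v
u = [0.2 0.5]; v = [0.3 0.1]; gamma = 0.25;
K = exp(-gamma * norm(u - v)^2)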

3. Normalize the data

If you do not normalize, the final prediction error will be very large. The attributes are normalized by a small program. At first I did not normalize, and the test MAPE was as high as 14%; after normalizing the attributes, the test MAPE dropped to 3.9556%.

clear;
load('X1.mat');                      % X1.mat is the training set
load('X2.mat');                      % X2.mat is the test set
X1_1 = Normalization(X1);
X2_1 = Normalization(X2);
% Save the results as X1_1.mat and X2_1.mat, then run write4libsvm.m to
% convert them into the LIBSVM-format files X1_1.csv and X2_1.csv.
% Go into the D:\softwares_diy\MATLAB\R2014a\toolbox\libsvm-3.21 directory
% and add D:\softwares_diy\MATLAB\R2014a\toolbox\libsvm-3.21\matlab to the path.
[Y1, X1] = libsvmread('X1_1.csv');   % Y1, X1: data for 2006-2008
[Y2, X2] = libsvmread('X2_1.csv');   % Y2, X2: data for 2009
Y1_train = Y1(1:17520, :);           % 2006-2007 data for training (17520 = 2 x 8760 hourly records)
X1_train = X1(1:17520, :);
Y1_test  = Y1(17521:end, :);         % 2008 data for testing
X1_test  = X1(17521:end, :);
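The Normalization helper itself is not shown in the post. Below is a minimal sketch of what such a routine might look like, assuming column-wise min-max scaling to [0, 1] (an assumption on my part; repmat is used so it also runs on older releases such as R2014a, and columns with constant values would need special handling):

function Xn = Normalization(X)
% Hypothetical min-max scaling of each column (attribute) to [0, 1].
mn = min(X, [], 1);
mx = max(X, [], 1);
Xn = (X - repmat(mn, size(X, 1), 1)) ./ repmat(mx - mn, size(X, 1), 1);
end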
4. Parameter optimization

The important parameters to tune are -c and -g. -c specifies the cost parameter C of the loss function, and -g sets the gamma value for the polynomial, RBF, and sigmoid kernels.

I use the program SVMcg.m to find the optimal parameters c and g by specifying the range over which c varies and the range over which g varies.

Here is the forecast code:

% Find the best c and g.
% Train on the 2006-2007 data, test on the 2008 data.
% SVMcg(train_label, train, cmin, cmax, gmin, gmax, v, cstep, gstep, accstep)
% The parameter c varies over [2^cmin, 2^cmax],
% the parameter g varies over [2^gmin, 2^gmax];
% cstep is the step size for c, gstep is the step size for g.
[bestacc, bestc, bestg] = SVMcg(Y1_train, X1_train, 0, 8, -1, 4, 2, 1, 1, 0.9);  % ran for a long time
cmd = ['-s 3 -t 2 -c ', num2str(bestc), ' -g ', num2str(bestg)];
model = libsvmtrain(Y1_train, X1_train, cmd);
[y_08_pre, mse, decision_values] = libsvmpredict(Y1_test, X1_test, model);
MAPE = mean(abs(y_08_pre - Y1_test) ./ Y1_test);   % MAPE on the 2008 data
RMSE = sqrt(mean((y_08_pre - Y1_test).^2));
MAE  = mean(abs(y_08_pre - Y1_test));
MSE  = mean((y_08_pre - Y1_test).^2);
clear model cmd y_08_pre mse decision_values MAPE RMSE MAE MSE bestacc bestc bestg;

% Train on the 2006-2008 data, predict 2009.
[bestacc, bestc, bestg] = SVMcg(Y1, X1, 0, 8, -1, 4, 2, 1, 1, 0.9);
cmd = ['-s 3 -t 2 -c ', num2str(bestc), ' -g ', num2str(bestg)];
model = libsvmtrain(Y1, X1, cmd);
[y_09_pre, mse, decision_values] = libsvmpredict(Y2, X2, model);

Here y_09_pre is the predicted 24-hour-a-day power load for 2009. Since there is no ground truth for the 2009 power load, the MSE value returned by libsvmpredict is ignored.
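One caveat worth adding: if the load column was normalized together with the attributes, y_09_pre is in normalized units and has to be mapped back before it can be read as a load. A minimal sketch, assuming the min-max scaling above and hypothetical minimum and maximum training loads:

% load_min / load_max: min and max of the raw training-load column (toy values).
load_min = 320; load_max = 980;
y_09_load = y_09_pre * (load_max - load_min) + load_min;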

SVMcg.m

function [bestacc, bestc, bestg] = SVMcg(train_label, train, cmin, cmax, gmin, gmax, v, cstep, gstep, accstep)
% SVMcg: cross-validation grid search over c and g, by faruto
% email: [email protected]  QQ: 516667408  http://blog.sina.com.cn/faruto  BNU
% last modified 2009.8
% Super Moderator @ www.ilovematlab.cn

%% Default values for the parameters of SVMcg
if nargin < 10
    accstep = 1.5;
end
if nargin < 8
    accstep = 1.5;
    cstep = 1;
    gstep = 1;
end
if nargin < 7
    accstep = 1.5;
    v = 3;
    cstep = 1;
    gstep = 1;
end
if nargin < 6
    accstep = 1.5;
    v = 3;
    cstep = 1;
    gstep = 1;
    gmax = 5;
end
if nargin < 5
    accstep = 1.5;
    v = 3;
    cstep = 1;
    gstep = 1;
    gmax = 5;
    gmin = -5;
end
if nargin < 4
    accstep = 1.5;
    v = 3;
    cstep = 1;
    gstep = 1;
    gmax = 5;
    gmin = -5;
    cmax = 5;
end
if nargin < 3
    accstep = 1.5;
    v = 3;
    cstep = 1;
    gstep = 1;
    gmax = 5;
    gmin = -5;
    cmax = 5;
    cmin = -5;
end

%% x: c, y: g, cg: acc
[X, Y] = meshgrid(cmin:cstep:cmax, gmin:gstep:gmax);
[m, n] = size(X);
cg = zeros(m, n);

%% Record acc for the different c & g, and find the best acc with the smallest c
bestc = 0;
bestg = 0;
bestacc = 0;
basenum = 2;
for i = 1:m
    for j = 1:n
        cmd = ['-v ', num2str(v), ' -c ', num2str(basenum^X(i,j)), ...
               ' -g ', num2str(basenum^Y(i,j))];
        cg(i,j) = libsvmtrain(train_label, train, cmd);
        if cg(i,j) > bestacc
            bestacc = cg(i,j);
            bestc = basenum^X(i,j);
            bestg = basenum^Y(i,j);
        end
        if (cg(i,j) == bestacc && bestc > basenum^X(i,j))
            bestacc = cg(i,j);
            bestc = basenum^X(i,j);
            bestg = basenum^Y(i,j);
        end
    end
end

%% Draw the acc for the different c & g
[C, h] = contour(X, Y, cg, 60:accstep:100);
clabel(C, h, 'FontSize', 10, 'Color', 'r');
xlabel('log2c', 'FontSize', 10);
ylabel('log2g', 'FontSize', 10);
grid on;
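Note that SVMcg as posted is faruto's classification version: with the -v option, libsvmtrain returns cross-validation accuracy for classification but mean squared error for regression. To use it for epsilon-SVR, as this post does, the cmd string should include '-s 3 -t 2' and the loop should keep the smallest value rather than the largest (this is what faruto's SVMcgForRegress variant does). A sketch of the lines that change inside the grid loop:

% For regression, initialize bestmse = Inf before the loop, then:
cmd = ['-v ', num2str(v), ' -s 3 -t 2 -c ', num2str(basenum^X(i,j)), ...
       ' -g ', num2str(basenum^Y(i,j))];
cg(i,j) = libsvmtrain(train_label, train, cmd);   % returns CV mean squared error
if cg(i,j) < bestmse                              % minimize MSE, not maximize accuracy
    bestmse = cg(i,j);
    bestc = basenum^X(i,j);
    bestg = basenum^Y(i,j);
end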
