The wine data comes from the UCI repository and records the chemical composition of wines from three different varieties grown in the same region of Italy, so that the wines can be classified automatically by quantitative methods.
The dataset has 178 samples, each with 13 attributes, and every sample comes with its correct class label, which is used to verify the accuracy of the SVM classification.
First, we visualize the data:
    % Load the test data: classnumber = 3, the 178*13 matrix wine,
    % and the 178*1 column vector wine_labels
    load chapter_wineclass.mat;

    % Draw the box plot of the test data
    figure;
    boxplot(wine, 'orientation', 'horizontal', 'labels', categories);
    title('Box plot of the wine data', 'FontSize', 12);
    xlabel('Attribute value', 'FontSize', 12);
    grid on;

    % Plot each dimension of the test data
    % (a 3-by-5 grid is assumed here; it holds the class panel plus the 13 attribute panels)
    figure
    subplot(3, 5, 1);
    hold on
    for run = 1:178
        plot(run, wine_labels(run), '*');
    end
    xlabel('Sample', 'FontSize', 10);
    ylabel('Class label', 'FontSize', 10);
    title('class', 'FontSize', 10);
    for run = 2:14
        subplot(3, 5, run);
        hold on;
        str = ['attribute ', num2str(run-1)];
        for i = 1:178
            plot(i, wine(i, run-1), '*');
        end
        xlabel('Sample', 'FontSize', 10);
        ylabel('Attribute value', 'FontSize', 10);
        title(str, 'FontSize', 10);
    end
(Figure 1)
(Figure 2)
Figure 1 shows the box plot of the wine data, and Figure 2 shows the class label and each of the 13 attributes plotted against the sample index. From these plots alone it is hard to tell which class a given sample belongs to. Next we will try to use an SVM for classification.
Data preprocessing
    % Select the training set and the test set
    % Use samples 1-30 of the first class, 60-95 of the second class,
    % and 131-153 of the third class as the training set
    train_wine = [wine(1:30, :); wine(60:95, :); wine(131:153, :)];
    % The labels of the training set must be separated in the same way
    train_wine_labels = [wine_labels(1:30); wine_labels(60:95); wine_labels(131:153)];
    % Use samples 31-59 of the first class, 96-130 of the second class,
    % and 154-178 of the third class as the test set
    test_wine = [wine(31:59, :); wine(96:130, :); wine(154:178, :)];
    % The labels of the test set must be separated in the same way
    test_wine_labels = [wine_labels(31:59); wine_labels(96:130); wine_labels(154:178)];

    % Data preprocessing: normalize the training set and test set to the [0, 1] interval
    [mtrain, ntrain] = size(train_wine);
    [mtest, ntest] = size(test_wine);

    dataset = [train_wine; test_wine];
    % mapminmax is MATLAB's normalization function; it scales each row,
    % so the data is transposed before and after the call
    [dataset_scale, ps] = mapminmax(dataset', 0, 1);
    dataset_scale = dataset_scale';

    train_wine = dataset_scale(1:mtrain, :);
    test_wine = dataset_scale((mtrain+1):(mtrain+mtest), :);
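For reference, the scaling to [0, 1] applied above can be written as a formula. This is a sketch of min-max normalization for a single attribute; the symbols x, x_min and x_max are notation introduced here (the value of the attribute for one sample, and the attribute's minimum and maximum over the combined training and test data), not names from the code.

```latex
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
```

Because the minimum and maximum are taken over the training and test samples together, every scaled value in both sets lies in [0, 1].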
SVM model training and prediction
    % Train the SVM model (svmtrain and svmpredict come from the LIBSVM MATLAB interface)
    model = svmtrain(train_wine_labels, train_wine, '-c 2 -g 1');

    % Predict the test set with the trained SVM model
    [predict_label, accuracy, dec_value1] = svmpredict(test_wine_labels, test_wine, model);
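The option string sets two LIBSVM parameters: the penalty parameter C (here 2) and the kernel parameter gamma (here 1). Since no kernel type option is given, the call presumably relies on LIBSVM's default kernel, the radial basis function (RBF). Under that assumption, the kernel evaluated between two samples is:

```latex
K(x_i, x_j) = \exp\left(-\gamma \, \lVert x_i - x_j \rVert^2\right), \qquad \gamma = 1
```

A larger gamma makes the kernel more local, and a larger C penalizes training errors more heavily, so both values are usually tuned rather than fixed in advance.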
Result Analysis
    % Result analysis

    % Plot the actual and predicted classes of the test set
    % The plot shows that only one test sample is misclassified
    figure;
    hold on;
    plot(test_wine_labels, 'o');
    plot(predict_label, 'r*');
    xlabel('Test set sample', 'FontSize', 12);
    ylabel('Class label', 'FontSize', 12);
    legend('Actual test set class', 'Predicted test set class');
    title('Actual and predicted classes of the test set', 'FontSize', 12);
    grid on;
The SVM classification accuracy reaches 98.8764%: only one of the 89 test samples is misclassified. On this dataset the SVM performs very well.
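The accuracy figure can be recomputed by hand from the true and predicted labels. The following is a minimal sketch that uses small made-up label vectors (not the wine results) and only core MATLAB functions. The variable names true_labels, pred_labels, n_correct, n_total and acc are hypothetical. With the real data, test_wine_labels and predict_label would take the place of the toy vectors.

```matlab
% Hypothetical toy labels, for illustration only (not the wine results)
true_labels = [1; 1; 2; 2; 3; 3];
pred_labels = [1; 1; 2; 3; 3; 3];

% Count the samples whose predicted label equals the true label
n_correct = sum(pred_labels == true_labels);
n_total = numel(true_labels);

% Accuracy as a percentage
acc = 100 * n_correct / n_total;

fprintf('Accuracy = %.4f%% (%d/%d)\n', acc, n_correct, n_total);
```

For the wine test set, 88 correct predictions out of 89 give 88/89, which is approximately 98.8764%.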