http://archive.ics.uci.edu/ml/
This is the machine learning repository maintained by the University of California, Irvine (UCI). It currently holds 187 datasets, and the number keeps growing. UCI datasets are widely used as standard benchmark datasets.
The "Multiple Features" dataset on UCI is a handwritten digit recognition problem. The image of each digit is represented by 649 features in six groups.
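As a quick sanity check on the figure of 649, the sizes of the six feature groups can be summed. The group names and sizes below come from the dataset's own documentation rather than from this article, so treat them as an assumption and verify them against the dataset description.

```matlab
% Feature groups of the Multiple Features dataset
% (names and sizes assumed from the dataset documentation)
groups = {'fou', 'fac', 'kar', 'pix', 'zer', 'mor'};
sizes  = [76, 216, 64, 240, 47, 6];

% Total number of features across all groups
total = sum(sizes);
fprintf('%d groups, %d features in total\n', numel(groups), total);
```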
UCI data can be read with MATLAB's dlmread (or textread, or MATLAB's Import Data tool). However, any non-numeric class labels must first be replaced with numbers such as 1/2/3; otherwise the data cannot be read.
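The label replacement described above can be sketched as follows. This is a minimal illustration on a hypothetical in-memory cell array of class names (classNames), not a complete import routine; unique is used here as one possible way to assign an integer code to each distinct class.

```matlab
% Hypothetical class-name column, as it might appear in a UCI data file
classNames = {'Iris-setosa'; 'Iris-versicolor'; 'Iris-setosa'; 'Iris-virginica'};

% unique returns the sorted distinct names and, as its third output,
% the index of each original entry into that sorted list
[classes, ~, labels] = unique(classNames);

% labels now holds one integer code per sample and can be stored
% alongside the numeric attributes
disp(labels(:)');
```

The codes follow the alphabetical order of the class names, so the mapping should be recorded (here in classes) if the original names are needed later.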
Each data file (*.data) contains records of individual samples, each described as "attribute-value" pairs. The accompanying *.info file contains extensive documentation (for some datasets, such as Iris below, this file is named *.names). (Some files _generate_ databases; they do not contain *.data files.) In addition to the datasets and domain knowledge, the utilities directory contains useful material for working with the datasets.
The following uses Iris from UCI as an example to describe a dataset:
Ucidata\Iris contains three files:
Index
iris.data
iris.names
Index is a directory listing of all the files in this folder. For example, the Index file for Iris reads as follows:
Index of iris
18 Mar 1996 105 Index
08 Mar 1993 4551 iris.data
30 May 1989 2604 iris.names
iris.data is the Iris data file, with the following content:
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
......
7.0,3.2,4.7,1.4,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
......
6.3,3.3,6.0,2.5,Iris-virginica
6.4,3.2,4.5,1.5,Iris-versicolor
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
......
As shown above, the attribute values are separated by commas with no spaces in between (5.1,3.5,1.4,0.2,). The last column is the class of the sample described by that row, that is, the decision attribute (for example, Iris-setosa).
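One way to read rows in this format without editing the file first is textscan, which can parse the four numeric columns and the trailing class name in one pass. The sketch below works on a few sample rows held in a string rather than on the real file, and the variable names are illustrative only.

```matlab
% A few sample rows in the iris.data layout (comma-separated, class name last)
raw = sprintf('%s\n', ...
    '5.1,3.5,1.4,0.2,Iris-setosa', ...
    '7.0,3.2,4.7,1.4,Iris-versicolor', ...
    '6.3,3.3,6.0,2.5,Iris-virginica');

% Four floating-point columns followed by one string column
C = textscan(raw, '%f%f%f%f%s', 'Delimiter', ',');

features   = [C{1:4}];   % numeric matrix, one row per sample
classNames = C{5};       % cell array of class names, one per sample
```

For the real file, the same format string could be used with a file identifier obtained from fopen in place of the string raw.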
iris.names describes the Iris data: its title, source, past usage, relevant information, number of instances, and attributes:
......
7. Attribute Information:
1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
5. class:
-- Iris setosa
-- Iris versicolour
-- Iris virginica
......
9. Class distribution: 33.3% for each of 3 classes.
For examples of how this data is used, refer to the relevant papers or to later content on this site.
The following takes the wine data as an example: it is imported into MATLAB and tested with the libsvm mentioned above.
> uiimport('wine.data')
This imports the data; a 178*14 array named wine appears in the workspace.
Extract the labels and the attribute data, then save them to a MAT-file.
> wine_label = wine(:, 1);
> wine_data = wine(:, 2:end);
> save winedat.mat
(Next time you can simply run > load winedat)
Train an SVM on the data to obtain the wine model.
> modelw = svmtrain(wine_label, wine_data);
.*
optimization finished, #iter = 239
nu = 0.892184
obj = -61.125695, rho = 0.131965
nSV = 130, nBSV = 53
.*
optimization finished, #iter = 193
nu = 0.882853
obj = -50.421538, rho = -0.166754
nSV = 107, nBSV = 42
.*
optimization finished, #iter = 214
nu = 0.800233
obj = -53.411663, rho = -0.286931
nSV = 119, nBSV = 44
Total nSV = 178
Classification result (predicted on the same 178 samples that were used for training, so this is training accuracy):
> [plabelw, accuracyw] = svmpredict(wine_label, wine_data, modelw);
Accuracy = 100% (178/178) (classification)
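The accuracy figure printed by svmpredict is the percentage of predicted labels that match the true labels. A minimal sketch of that calculation, using short hypothetical label vectors in place of the real svmpredict output:

```matlab
% Hypothetical true and predicted labels (the real vectors have 178 entries)
trueLabel = [1; 1; 2; 2; 3; 3];
predLabel = [1; 1; 2; 3; 3; 3];

nCorrect = sum(predLabel == trueLabel);          % number of matching labels
accuracy = 100 * nCorrect / numel(trueLabel);    % percentage of correct labels

fprintf('Accuracy = %g%% (%d/%d)\n', accuracy, nCorrect, numel(trueLabel));
```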