Machine Learning UCI database

Source: Internet
Author: User

http://archive.ics.uci.edu/ml/

 

The UCI database is a machine learning repository maintained by the University of California, Irvine (UCI). At the time of writing it holds 187 datasets, and the number keeps growing. UCI datasets are widely used as standard benchmark datasets.

The "Multiple Features" dataset on UCI is a handwritten digit recognition problem. The image of each digit is represented by 649 features organized in six groups.

 

 

UCI data can be read with MATLAB's dlmread (or textread, or MATLAB's Import Data tool). However, any non-numeric class label must first be replaced with a number such as 1/2/3; otherwise, the data cannot be read.
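The label replacement described above can also be done programmatically rather than by hand. The following is a minimal sketch in Python (not MATLAB), assuming a comma-separated file whose last field is the class name. The sample rows, the function name encode_labels, and the choice to number classes in order of first appearance are illustrative assumptions.

```python
import csv
import io

# Hypothetical sample in the UCI comma-separated layout; the class label
# is the last, non-numeric field of each row.
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
4.9,3.0,1.4,0.2,Iris-setosa
"""


def encode_labels(text):
    """Replace the trailing class name on each row with an integer code.

    Codes are assigned 1, 2, 3, ... in order of first appearance.
    Returns (numeric_rows, mapping_from_name_to_code).
    """
    codes = {}
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:  # skip blank lines
            continue
        *features, label = record
        code = codes.setdefault(label, len(codes) + 1)
        rows.append([float(v) for v in features] + [float(code)])
    return rows, codes


rows, codes = encode_labels(SAMPLE)
```

Writing the rows back out as plain numbers would give a file that a purely numeric reader such as dlmread should be able to load.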

 

Each data file (*.data) contains many individual sample records described as "attribute-value" pairs. The corresponding *.info file contains extensive documentation. (Some files _generate_ databases; they do not have *.data files.) To supplement the datasets and domain knowledge, the utilities directory contains useful material for working with the datasets.

 

The following uses Iris in UCI as an example to describe the dataset:

 

The ucidata\iris folder contains three files:

Index

iris.data

iris.names

 

Index is a directory listing of all the files in this folder. For example, the Index file for Iris reads as follows:

Index of iris

18 Mar 1996 105 Index

08 Mar 1993 4551 iris.data

30 May 1989 2604 iris.names

 

iris.data is the Iris data file, with the following content:

5.1,3.5,1.4,0.2,Iris-setosa

4.9,3.0,1.4,0.2,Iris-setosa

4.7,3.2,1.3,0.2,Iris-setosa

......

7.0,3.2,4.7,1.4,Iris-versicolor

6.4,3.2,4.5,1.5,Iris-versicolor

6.9,3.1,4.9,1.5,Iris-versicolor

......

6.3,3.3,6.0,2.5,Iris-virginica

5.8,2.7,5.1,1.9,Iris-virginica

7.1,3.0,5.9,2.1,Iris-virginica

......

In the file itself, the attributes are separated by commas only, with no spaces in between (5.1,3.5,1.4,0.2,). The last column is the decision attribute (class label) for that row, for example Iris-setosa.

 

iris.names describes the Iris data: the title, sources, past usage, relevant information, number of instances, attribute information, and so on:

......

7. Attribute Information:

1. sepal length in cm

2. sepal width in cm

3. petal length in cm

4. petal width in cm

5. class:

-- Iris setosa

-- Iris versicolour

-- Iris virginica

......

9. Class distribution: 33.3% for each of 3 classes.
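The class distribution figure quoted from iris.names can be checked directly from the label column. Below is a minimal Python sketch, assuming the labels have already been read into a list. The function name class_distribution is hypothetical, and the list is built here from the known 50 samples per class rather than read from the file.

```python
from collections import Counter

# Iris has 150 instances, 50 per class; this list stands in for the
# label column of iris.data.
labels = (
    ["Iris-setosa"] * 50
    + ["Iris-versicolor"] * 50
    + ["Iris-virginica"] * 50
)


def class_distribution(labels):
    """Return the percentage of samples belonging to each class."""
    counts = Counter(labels)
    total = len(labels)
    return {name: 100.0 * n / total for name, n in counts.items()}


distribution = class_distribution(labels)
for name, percent in distribution.items():
    print(f"{name}: {percent:.1f}%")
```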

 

For examples of using this data, please refer to other papers or content later on this site.

 

The following takes the wine data as an example: import it into MATLAB and test it with the libsvm toolbox mentioned earlier.

 

>> uiimport('wine.data')

After the import, a 178×14 array named wine appears in the workspace.

Extract the class labels and the attribute data, then save them as a MATLAB .mat file.

>> wine_label = wine(:, 1);

>> wine_data = wine(:, 2:end);

>> save winedat.mat

 

(Next time you can simply run >> load winedat.)
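For readers working outside MATLAB, the same split (first column as class label, remaining columns as attributes) can be sketched in Python. This is only an illustration: the function name split_label_first is hypothetical, and the sample rows are shortened to three attributes with illustrative values, whereas the real wine.data has 13 attributes per row.

```python
import csv
import io

# Illustrative rows in the wine.data layout: class label first, then the
# attribute values (shortened to three attributes here).
SAMPLE = """1,14.23,1.71,2.43
2,12.37,0.94,1.36
3,12.86,1.35,2.32
"""


def split_label_first(text):
    """Split comma-separated rows into (labels, data).

    The first field of each row is the integer class label; the
    remaining fields are the numeric attributes.
    """
    labels, data = [], []
    for record in csv.reader(io.StringIO(text)):
        if not record:  # skip blank lines
            continue
        labels.append(int(record[0]))
        data.append([float(v) for v in record[1:]])
    return labels, data


wine_label, wine_data = split_label_first(SAMPLE)
```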

 

Train an SVM on the wine data to obtain the model.

>> modelw = svmtrain(wine_label, wine_data);

.*

optimization finished, #iter = 239

nu = 0.892184

obj = -61.125695, rho = 0.131965

nSV = 130, nBSV = 53

.*

optimization finished, #iter = 193

nu = 0.882853

obj = -50.421538, rho = -0.166754

nSV = 107, nBSV = 42

.*

optimization finished, #iter = 214

nu = 0.800233

obj = -53.411663, rho = -0.286931

nSV = 119, nBSV = 44

Total nSV = 178

 

Classification result

>> [plabelw, accuracyw] = svmpredict(wine_label, wine_data, modelw);

Accuracy = 100% (178/178) (classification)
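Note that this accuracy is measured on the same 178 samples that were used for training, so it probably overstates how well the model would do on unseen data. One common remedy is to hold out part of the data before training. The following Python sketch shows only the splitting step. The function name holdout_split, the 30% test fraction, and the toy labels and data are assumptions for illustration and are not part of libsvm.

```python
import random


def holdout_split(labels, data, test_fraction=0.3, seed=0):
    """Shuffle sample indices and split them into train and test parts.

    Returns ((train_labels, train_data), (test_labels, test_data)).
    """
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)  # fixed seed for repeatability
    n_test = int(round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx):
        return [labels[i] for i in idx], [data[i] for i in idx]

    return pick(train_idx), pick(test_idx)


# Toy stand-ins for wine_label and wine_data.
labels = [1] * 6 + [2] * 4
data = [[float(i)] for i in range(10)]

(train_labels, train_data), (test_labels, test_data) = holdout_split(labels, data)
```

The model would then be trained on the training part only and evaluated on the test part.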

 
