The four most popular machine learning datasets (reposted)

Source: Internet
Author: User

Machine learning algorithms operate on data. The nature of the data determines which algorithms are suitable, and the quality of the data determines how well those algorithms perform. It is therefore important to study and analyze data. This article, the first in a series on studying data, introduces four of the most popular machine learning datasets.

Iris

Iris, also known as the Iris flower dataset, is a classic dataset for multivariate analysis. Each flower is described by four attributes: sepal length, sepal width, petal length and petal width. These are used to predict which of three species the flower belongs to: setosa, versicolour or virginica.

Dataset characteristics: Multivariate
Number of instances: 150
Area: Life
Attribute characteristics: Real
Number of attributes: 4
Date donated: 1988-07-01
Associated tasks: Classification
Missing values: None
Number of web hits: 563347

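The UCI copy of Iris is distributed as a plain comma-separated file (iris.data). Each row holds the four measurements followed by the species name. The Python sketch below assumes that layout and splits each row into a feature vector and a label. The helper name `parse_iris_row` is hypothetical, and the three sample rows are only illustrations of the row format.

```python
from collections import Counter

# Hypothetical helper: assumes the UCI iris.data layout
# "sepal_length,sepal_width,petal_length,petal_width,class".
def parse_iris_row(line):
    fields = line.strip().split(",")
    features = [float(x) for x in fields[:4]]  # four real-valued measurements
    label = fields[4]                          # e.g. "Iris-setosa"
    return features, label

# Illustrative rows in the iris.data format.
sample = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

# Skip blank lines, since a data file may end with empty lines.
rows = [parse_iris_row(line) for line in sample.splitlines() if line.strip()]
X = [features for features, _ in rows]
y = [label for _, label in rows]

print(len(X[0]), Counter(y))
```

Keeping features and labels in separate lists (`X`, `y`) matches the shape most classification code expects.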
Adult

This dataset was extracted from the 1994 U.S. Census database. It can be used to predict whether a person's income exceeds $50K per year, and the class variable records exactly that. The attribute variables include age, work class, education, occupation, race and other key information. Notably, 8 of the 14 attribute variables are categorical.

Dataset characteristics: Multivariate
Number of instances: 48842
Area: Social
Attribute characteristics: Categorical, Integer
Number of attributes: 14
Date donated: 1996-05-01
Associated tasks: Classification
Missing values: Yes
Number of web hits: 393977

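Adult has missing values and a two-valued class variable, so two preprocessing steps usually come first. Incomplete records are dropped or imputed, and the class is mapped to 0/1. The Python sketch below assumes the UCI adult.data conventions: comma-separated fields, "?" marking a missing value, and class labels ">50K" / "<=50K". The helper name `parse_adult_row` is hypothetical, and the sample rows only illustrate the format.

```python
# Hypothetical helper: assumes 14 comma-separated attributes followed by the
# class label, with "?" marking a missing attribute value.
def parse_adult_row(line):
    fields = [f.strip() for f in line.split(",")]
    attributes, label = fields[:14], fields[14]
    has_missing = "?" in attributes
    # Map the two-valued class to 1 (income above 50K) or 0. rstrip(".")
    # tolerates labels written as ">50K." in some copies of the data.
    target = 1 if label.rstrip(".") == ">50K" else 0
    return attributes, target, has_missing

# Illustrative rows in the adult.data format.
sample = [
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K",
    "52, ?, 209642, HS-grad, 9, Married-civ-spouse, ?, "
    "Husband, White, Male, 0, 0, 45, United-States, >50K",
]

parsed = [parse_adult_row(line) for line in sample]
# Keep only complete records for a simple first analysis.
complete = [(attrs, t) for attrs, t, missing in parsed if not missing]
targets = [t for _, t, _ in parsed]
print(len(complete), targets)
```

Dropping incomplete rows is the simplest choice. Imputing the missing categories instead keeps more data at the cost of extra assumptions.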
Wine

This dataset contains 178 records of wines derived from three different cultivars. Its 13 attributes are the amounts of 13 chemical constituents in each wine. Chemical analysis can therefore be used to infer which cultivar a wine comes from. Notably, all attribute variables are continuous.

Dataset characteristics: Multivariate
Number of instances: 178
Area: Physical
Attribute characteristics: Integer, Real
Number of attributes: 13
Date donated: 1991-07-01
Associated tasks: Classification
Missing values: None
Number of web hits: 337319

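In the UCI wine.data file, the class identifier (1, 2 or 3) is the first field of each row, followed by the 13 continuous measurements. The Python sketch below assumes that layout. The helper name `parse_wine_row` is hypothetical, and the row shown only illustrates the format.

```python
# Hypothetical helper: assumes the first field is the class identifier (1-3)
# and the remaining 13 fields are real-valued chemical measurements.
def parse_wine_row(line):
    fields = line.strip().split(",")
    label = int(fields[0])
    features = [float(x) for x in fields[1:]]  # float() also accepts ".28"
    if len(features) != 13:
        raise ValueError(f"expected 13 attributes, got {len(features)}")
    return features, label

# Illustrative row in the wine.data format.
row = "1,14.23,1.71,2.43,15.6,127,2.8,3.06,.28,2.29,5.64,1.04,3.92,1065"
features, label = parse_wine_row(row)
print(label, len(features), features[7])
```

Every attribute is converted to `float`, including integer-valued ones, because they are all treated as continuous variables.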
Car Evaluation

This dataset is about car evaluation. The class variable is the car's rating: unacc, acc, good or vgood (unacceptable, acceptable, good, very good). The six attribute variables are buying price, maintenance cost, number of doors, passenger capacity, luggage boot size and safety. Notably, all six attributes are ordinal categorical variables. For example, passenger capacity takes the values "2", "4" or "more", and safety takes "low", "med" or "high".

Dataset characteristics: Multivariate
Number of instances: 1728
Area: N/A
Attribute characteristics: Categorical
Number of attributes: 6
Date donated: 1997-06-01
Associated tasks: Classification
Missing values: None
Number of web hits: 272901

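All six attributes and the class are ordered categories, so a common conversion is ordinal encoding. Each level is mapped to an integer that preserves its order. The Python sketch below assumes the column names and level spellings used in the UCI car.data file (buying, maint, doors, persons, lug_boot, safety). Treat those spellings, and the hypothetical helper `encode_car_row`, as assumptions to check against the dataset's documentation.

```python
# Assumed column order and level spellings for car.data.
# Each list runs from the "lowest" level to the "highest".
LEVELS = {
    "buying":   ["low", "med", "high", "vhigh"],
    "maint":    ["low", "med", "high", "vhigh"],
    "doors":    ["2", "3", "4", "5more"],
    "persons":  ["2", "4", "more"],
    "lug_boot": ["small", "med", "big"],
    "safety":   ["low", "med", "high"],
}
CLASSES = ["unacc", "acc", "good", "vgood"]  # unacceptable < acceptable < good < very good
COLUMNS = list(LEVELS)  # dicts preserve insertion order (Python 3.7+)

def encode_car_row(line):
    """Ordinally encode one comma-separated car.data row (hypothetical helper)."""
    fields = line.strip().split(",")
    codes = [LEVELS[col].index(value) for col, value in zip(COLUMNS, fields[:6])]
    target = CLASSES.index(fields[6])
    return codes, target

codes, target = encode_car_row("vhigh,vhigh,2,2,small,low,unacc")
print(codes, target)
```

Ordinal codes keep the ordering that one-hot encoding would discard. The trade-off is that they imply equal spacing between adjacent levels.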
Summary

Comparing the four datasets above gives a rough guide to choosing among them:

- To test on a large amount of data, consider Adult.
- To study correlations between variables, choose Iris or Wine, whose attribute values are all integers or real numbers.
- To study logistic regression, choose Adult, whose class variable takes only two values.
- To study the conversion of categorical variables, choose Car Evaluation, whose attributes are all ordinal categories.

To go further, you will need to learn these datasets in more depth.
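For the correlation studies suggested for Iris and Wine, the usual starting point is the Pearson correlation coefficient between two continuous attributes, r = cov(x, y) / (σx σy). A minimal standard-library Python sketch follows. The function name `pearson` is hypothetical; Python 3.10+ also provides `statistics.correlation`.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centred cross-product and squared sums; the 1/n factors cancel in r.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Synthetic example: y grows linearly with x, so r should be 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(pearson(xs, ys))
```

Computing `pearson` for every pair of Iris or Wine attribute columns (parsed as floats) fills in the attribute correlation matrix.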

All of the datasets above are available from the UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/
