Introduction to Datasets
1. "Abalone Age" dataset (Abalone Data Set): the task is to predict the age of an abalone by predicting its number of rings. The data set comes from the UCI (University of California, Irvine) Machine Learning Repository.
There are eight attributes in total: sex, length, diameter, etc.
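The link between rings and age can be written down directly. According to the UCI description of this data set, the age in years is the ring count plus 1.5. The helper name `age_from_rings` below is only illustrative and not part of any library.

```python
def age_from_rings(rings: int) -> float:
    """Estimate abalone age in years from the ring count (rings + 1.5)."""
    if rings < 0:
        raise ValueError("ring count cannot be negative")
    return rings + 1.5


# Example: a predicted ring count of 9 corresponds to about 10.5 years.
print(age_from_rings(9))
```

So a model that predicts rings is, in effect, predicting age up to a constant offset.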
An introduction to specific properties
Method one: Using BP
Method two: Using ELM
Method Three: Using SVM
My takeaway: after mapping things out in XMind, I found that each of these methods already has ready-made, integrated functions that can be used directly. What we have to do is understand what each function means and know the overall process. Understanding is the foundation of everything and the basis for using the functions freely.
2. Introduction to the "Heart Disease" data set
(Statlog (Heart) Data Set): the task is to determine whether a subject has heart disease from the values of age, sex, blood pressure, and other attributes.
Characteristics of the specific attributes:
Chest pain
Resting blood pressure
Serum cholesterol
Fasting blood sugar
Resting electrocardiographic (ECG) results
Maximum heart rate achieved
Exercise-induced angina
Oldpeak (ST depression induced by exercise relative to rest)
Slope of the peak exercise ST segment
Number of major vessels
Thal
Input: 13 attributes
Output: 1 for yes, 0 for no
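A minimal sketch of this input/output split, assuming each raw record is stored as 14 whitespace-separated numbers: 13 attributes followed by a class value of 1 (absence) or 2 (presence), which is then recoded to the 0/1 output used here. The file layout is an assumption, the record below is made up for illustration, and `parse_heart_row` is a hypothetical helper.

```python
from typing import List, Tuple


def parse_heart_row(line: str) -> Tuple[List[float], int]:
    """Split one record into 13 input values and a 0/1 label."""
    values = line.split()
    if len(values) != 14:
        raise ValueError(f"expected 14 fields, got {len(values)}")
    features = [float(v) for v in values[:13]]
    raw_label = int(float(values[13]))
    if raw_label not in (1, 2):
        raise ValueError(f"unexpected class value: {raw_label}")
    # Assumed raw coding: 1 = no heart disease, 2 = heart disease.
    label = 1 if raw_label == 2 else 0
    return features, label


# Made-up record, not taken from the data set.
row = "63 1 4 140 260 0 2 120 1 1.5 2 1 7 2"
features, label = parse_heart_row(row)
print(len(features), label)
```

Checking the field count and the class value up front makes a malformed line fail loudly instead of silently producing a wrong label.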
The process for the three methods is:
3. Introduction to the "Cancer Patient Survival" data set
(Haberman's Survival Data Set): the task is to determine a patient's survival status from the patient's age at the time of operation, the year of operation, and the number of positive axillary lymph nodes detected.
The three attributes are: the patient's age at the time of operation, the year of the operation, and the number of positive axillary lymph nodes detected.
Survival status: 1 means the patient survived for five years or longer; 2 means the patient did not survive five years.
Input: Three properties
Output: Two labels
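The label convention can be made explicit when loading the data. The sketch below assumes comma-separated records of the form age, year of operation, positive nodes, survival status. That layout is an assumption, the sample rows are invented for illustration, and `parse_haberman` is a hypothetical helper.

```python
from collections import Counter
from typing import List, Tuple

# Label meaning as described in the text: 1 = survived five years or longer.
SURVIVAL = {1: "survived >= 5 years", 2: "did not survive 5 years"}


def parse_haberman(lines: List[str]) -> Tuple[List[List[int]], List[int]]:
    """Return (inputs, labels): three attributes per patient and a 1/2 label."""
    inputs, labels = [], []
    for line in lines:
        age, year, nodes, status = (int(v) for v in line.split(","))
        if status not in SURVIVAL:
            raise ValueError(f"unexpected survival status: {status}")
        inputs.append([age, year, nodes])
        labels.append(status)
    return inputs, labels


# Invented sample rows, for illustration only.
rows = ["30,64,1,1", "45,60,9,2", "52,66,0,1"]
X, y = parse_haberman(rows)
counts = Counter(SURVIVAL[label] for label in y)
print(counts)
```

Counting the labels right after loading is a quick way to see how unbalanced the two classes are before training any of the three methods.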
4. "Wheat Seeds" data set (Seeds Data Set)
The task is to determine the seed variety from the physical characteristics of seeds from three different wheat varieties (Kama, Rosa, Canadian).
Specific properties:
Perimeter
Compactness
Length of kernel
Width of kernel
Asymmetry coefficient
Length of kernel groove
Input: the attributes above
Output: the variety the seed belongs to
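Because the output is one of three varieties, a network-style classifier such as BP usually needs the class written as a vector. Below is a minimal sketch of one-hot encoding, assuming the classes are numbered 1 to 3 in the order Kama, Rosa, Canadian (an assumption about how the file codes them). The names `one_hot` and `decode` are hypothetical helpers.

```python
from typing import List

# Assumed order for class ids 1, 2, 3.
VARIETIES = ["Kama", "Rosa", "Canadian"]


def one_hot(class_id: int) -> List[int]:
    """Encode a 1-based class id as a vector with a single 1."""
    if not 1 <= class_id <= len(VARIETIES):
        raise ValueError(f"class id out of range: {class_id}")
    return [1 if i == class_id - 1 else 0 for i in range(len(VARIETIES))]


def decode(scores: List[float]) -> str:
    """Return the variety whose output score is largest."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return VARIETIES[best]


print(one_hot(2))
print(decode([0.1, 0.2, 0.7]))
```

Encoding the target this way gives the network one output unit per variety, and decoding picks the unit with the highest score.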
5. "Do Pima Indians have diabetes?" data set
(Pima Indians Diabetes Data Set): the task is to reach a diagnosis by studying eight numeric attributes.
The last column of the dataset is the class attribute: 0 means no diabetes; 1 indicates diabetes.
Plasma glucose concentration at 2 hours in an oral glucose tolerance test
Diastolic blood pressure
Triceps skin fold thickness
2-hour serum insulin
Body mass index
Diabetes pedigree function
6. "Wine Category" data set
(Wine Data Set) records the results of a chemical composition analysis of wines from three different cultivars grown in the same region of Italy.
The specific properties are:
Reading notes on "Research on Comparison and Analysis of Data Mining Classification Algorithms Based on Neural Networks" (Master of Engineering thesis, Anhui University; author: Changkai), part (ii): Introduction to Datasets