Data Mining Dataset Resources

Source: Internet
Author: User

1. Climate Monitoring Data Set http://cdiac.ornl.gov/ftp/ndp026b

2. Some useful websites for downloading test datasets

http://www.cs.toronto.edu/~roweis/data.html

http://kdd.ics.uci.edu/summary.task.type.html

http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/

http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/

http://www.phys.uni.torun.pl/~duch/software.html

You can find the Reuters dataset at the URL below:

http://www.research.att.com/~lewis/reuters21578.html

 

Various datasets are available on the following website:

http://kdd.ics.uci.edu/summary.data.type.html

For text classification, the Rainbow dataset can also be used:

http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html

3. With this many test datasets available, anyone writing a paper should at least use them to test how effective their algorithms are.

Some of the links below may no longer be reachable, but others should still work:

Machine learning datasets collected by UCI

ftp://pami.sjtu.edu.cn/

http://www.ics.uci.edu/~mlearn/mlrepository.htm

StatLib

http://liama.ia.ac.cn/SCILAB/scilabindexgb.htm

http://lib.stat.cmu.edu/

Sample databases

http://kdd.ics.uci.edu/

http://www.ics.uci.edu/~mlearn/mlrepository.html

Websites for fund data mining

http://www.gotofund.com/index.asp

http://lans.ece.utexas.edu/~strehl/

Reuters dataset

http://www.research.att.com/~lewis/reuters21578.html

Various datasets:

http://kdd.ics.uci.edu/summary.data.type.html

http://www.mlnet.org/cgi-bin/mlnetois.pl?File1_datasets.html

http://lib.stat.cmu.edu/datasets/

http://dctc.sjtu.edu.cn/adaptive/datasets/

http://fimi.cs.helsinki.fi/data/

http://www.almaden.ibm.com/software/quest/Resources/index.shtml

http://miles.cnuce.cnr.it/~palmeri/datam/DCI/

Text classification and Web

http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html

http://www.w3.org/TR/WD-logfile-960221.html

http://www.w3.org/Daemon/User/Config/Logging.html#AccessLog

http://www.w3.org/1998/11/05/WC-workshop/Papers/bala2.html

http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/

http://www.web-caching.com/traces-logs.html

http://www-2.cs.cmu.edu/webkb

http://www.cs.auc.dk/research/DP/tdb/TimeCenter/TimeCenterPublications/TR-75.pdf

http://www.cs.cornell.edu/projects/kddcup/index.html

 

Time series data

http://www.stat.wisc.edu/~reinsel/bjr-data/

Test data for the Apriori algorithm

http://www.almaden.ibm.com/cs/quest/syndata.html

Data generator links

http://www.cse.cuhk.edu.hk/~kdd/data_collection.html

http://www.almaden.ibm.com/cs/quest/syndata.html

 

Association:

http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar

http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html#assocSynData

WEKA:

http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar

1. A jar file containing 37 classification problems, originally obtained from the UCI Repository

http://prdownloads.sourceforge.net/weka/datasets-UCI.jar

2. A jar file containing 37 regression problems, obtained from various sources

http://prdownloads.sourceforge.net/weka/datasets-numeric.jar

3. A jar file containing 30 regression datasets collected by Luis Torgo

http://prdownloads.sourceforge.net/weka/regression-datasets.jar
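The WEKA jar files listed above are ordinary zip archives, so they can be unpacked without WEKA itself. The following is a minimal sketch in Python, assuming the datasets inside are stored as .arff files (WEKA's native format; the exact contents of each jar are not confirmed here). The helper names list_arff_members and extract_arff are hypothetical, and the file name in the usage note is a placeholder for whichever jar you downloaded.

```python
import zipfile


def list_arff_members(jar_path):
    """Return the sorted names of .arff files stored inside a jar (zip) archive."""
    with zipfile.ZipFile(jar_path) as jar:
        # Assumption: datasets are .arff files; other entries (e.g. the manifest) are skipped.
        return sorted(name for name in jar.namelist()
                      if name.lower().endswith(".arff"))


def extract_arff(jar_path, out_dir):
    """Extract only the .arff members into out_dir and return their names."""
    names = list_arff_members(jar_path)
    with zipfile.ZipFile(jar_path) as jar:
        jar.extractall(out_dir, members=names)
    return names
```

For example, after saving one of the jars locally as datasets-UCI.jar, calling extract_arff("datasets-UCI.jar", "uci_arff") would unpack whatever .arff files it contains into the uci_arff directory.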

Cancer genes:

http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi

Financial data:

http://lisp.vse.cz/pkdd99/Challenge/chall.htm

 

Links provided by another contributor:

http://www.cs.toronto.edu/~roweis/data.html

http://kdd.ics.uci.edu/summary.task.type.html

http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/

http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/

http://www.phys.uni.torun.pl/~duch/software.html

You can find the Reuters dataset at the URL below:

http://www.research.att.com/~lewis/reuters21578.html

Various datasets are available on the following website:

http://kdd.ics.uci.edu/summary.data.type.html

For text classification, the Rainbow dataset can also be used:

http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html

 

Download the financial data (~17.5 MB zipped file, ~67 MB unzipped data)

Download the medical data (~2 MB zipped file, ~6 MB unzipped data)

http://lisp.vse.cz/pkdd99/Challenge/chall.htm

 

Datasets linked from KDnuggets:

http://www.kdnuggets.com/datasets/index.html

Another good resource website is http://kdd.ics.uci.edu/. The data resources it contains are listed below by category:

Direct Marketing:

KDD Cup 1998 Data

GIS:

Forest CoverType

Indexing:

Corel Image Features

Pseudo Periodic Synthetic Time Series

Intrusion Detection:

KDD Cup 1999 Data

Process Control:

Synthetic Control Chart Time Series

Recommendation Systems:

Entree Chicago Recommendation Data

Robots:

Pioneer-1 Mobile Robot Data

Robot Execution Failures

Sign Language Recognition:

Australian Sign Language Data

High-quality Australian Sign Language Data

Text Categorization:

20 Newsgroups Data

Reuters-21578 Text Categorization Collection

NSF Research Awards Abstracts 1990-2003

World Wide Web:

Microsoft Anonymous Web Data

MSNBC Anonymous Web Data

Syskill Webert Web Data

 

Here is one more, which I found on a foreign blog (the day before Children's Day):

http://www.fs.fed.us/fire/fuelman/

(Sina Weibo: @ quanliang _ machine learning)

 
