Datasets used for data mining, information retrieval, and knowledge discovery

Source: Internet
Author: User

1. Climate monitoring dataset: http://cdiac.ornl.gov/ftp/ndp026b

2. Some useful websites for downloading test datasets

http://www.cs.toronto.edu/~roweis/data.html
http://kdd.ics.uci.edu/summary.task.type.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.phys.uni.torun.pl/~duch/software.html
You can find the Reuters dataset at the URL below:
http://www.research.att.com/~lewis/reuters21578.html

Various datasets are available on the following websites:
http://kdd.ics.uci.edu/summary.data.type.html

For text classification, another usable dataset is the Rainbow dataset:
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html

3. With this many test datasets available, anyone writing a paper should at least use some of them to test the effectiveness of their algorithms.
Some of the links below may be inaccessible, but others still work:

Machine learning datasets collected by UCI
ftp://pami.sjtu.edu.cn/
http://www.ics.uci.edu/~mlearn/mlrepository.htm

StatLib
http://liama.ia.ac.cn/SCILAB/scilabindexgb.htm
http://lib.stat.cmu.edu/

Sample databases
http://kdd.ics.uci.edu/
http://www.ics.uci.edu/~mlearn/mlrepository.html

Website for fund data mining
http://www.gotofund.com/index.asp

http://lans.ece.utexas.edu/~strehl/

Reuters dataset
http://www.research.att.com/~lewis/reuters21578.html

Various datasets:
http://kdd.ics.uci.edu/summary.data.type.html
http://www.mlnet.org/cgi-bin/mlnetois.pl?File1_datasets.html
http://lib.stat.cmu.edu/datasets/
http://dctc.sjtu.edu.cn/adaptive/datasets/
http://fimi.cs.helsinki.fi/data/
http://www.almaden.ibm.com/software/quest/Resources/index.shtml
http://miles.cnuce.cnr.it/~palmeri/datam/DCI/

Text Classification & Web
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html

http://www.w3.org/TR/WD-logfile-960221.html
http://www.w3.org/Daemon/User/Config/Logging.html#AccessLog
http://www.w3.org/1998/11/05/WC-workshop/Papers/bala2.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.web-caching.com/traces-logs.html
http://www-2.cs.cmu.edu/webkb
http://www.cs.auc.dk/research/DP/tdb/TimeCenter/TimeCenterPublications/TR-75.pdf
http://www.cs.cornell.edu/projects/kddcup/index.html

 


Time series data
http://www.stat.wisc.edu/~reinsel/bjr-data/

 

Test data for the Apriori algorithm
http://www.almaden.ibm.com/cs/quest/syndata.html
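
To illustrate what transaction-style test data of this kind is used for, here is a minimal Python sketch of level-wise Apriori frequent-itemset mining. The toy transactions, the function name `apriori`, and the use of an absolute `min_support` count are assumptions made for illustration only; they are not taken from the IBM Quest data or its file format.

```python
from itertools import combinations

# Hypothetical toy transactions, not data from the site above.
TOY_TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset found in at least
    `min_support` transactions (absolute count, not a fraction)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count the surviving candidates against the transactions.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


if __name__ == "__main__":
    for itemset, count in sorted(apriori(TOY_TRANSACTIONS, 3).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

Real benchmark files are far larger than this toy list, so a practical implementation would count candidates in a single pass per level rather than rescanning the transactions once per candidate.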

Data generator links
http://www.cse.cuhk.edu.hk/~kdd/data_collection.html
http://www.almaden.ibm.com/cs/quest/syndata.html
Association:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html#assocSynData

Original article address: http://www.cnblogs.com/bobomouse/archive/2007/05/26/760513.html

WEKA:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
1. A JAR file containing 37 classification problems, originally obtained from the UCI repository
http://prdownloads.sourceforge.net/weka/datasets-UCI.jar
2. A JAR file containing 37 regression problems, obtained from various sources
http://prdownloads.sourceforge.net/weka/datasets-numeric.jar
3. A JAR file containing 30 regression datasets collected by Luis Torgo
http://prdownloads.sourceforge.net/weka/regression-datasets.jar
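
A JAR file is an ordinary ZIP archive, so its contents can be inspected without WEKA or Java. The sketch below lists the dataset files inside such an archive. It assumes the datasets are stored as `.arff` files (WEKA's native format) and that the JAR has already been downloaded; the function name `list_arff_entries` and the local file name in the usage line are hypothetical.

```python
import zipfile


def list_arff_entries(jar_source):
    """Return the sorted names of all .arff entries in a JAR.

    `jar_source` may be a path or a binary file-like object; a JAR is
    read here simply as a ZIP archive.
    """
    with zipfile.ZipFile(jar_source) as jar:
        return sorted(name for name in jar.namelist()
                      if name.lower().endswith(".arff"))


if __name__ == "__main__":
    # Assumes a local copy saved under this name.
    for name in list_arff_entries("datasets-UCI.jar"):
        print(name)
```

Individual entries can then be extracted with `zipfile.ZipFile.extract` and opened in WEKA or any ARFF reader.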

Cancer genes:
http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi

Financial data:
http://lisp.vse.cz/pkdd99/Challenge/chall.htm

 

Links provided by another contributor:
http://www.cs.toronto.edu/~roweis/data.html
http://kdd.ics.uci.edu/summary.task.type.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.phys.uni.torun.pl/~duch/software.html
You can find the Reuters dataset at the URL below:
http://www.research.att.com/~lewis/reuters21578.html

Various datasets are available on the following websites:
http://kdd.ics.uci.edu/summary.data.type.html

For text classification, another usable dataset is the Rainbow dataset:
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html

Download the financial data (~17.5 MB zipped, ~67 MB unzipped)
Download the medical data (~2 MB zipped, ~6 MB unzipped)
http://lisp.vse.cz/pkdd99/Challenge/chall.htm


Dataset links from KDnuggets:
http://www.kdnuggets.com/datasets/index.html

 

Another good resource is the UCI KDD Archive: http://kdd.ics.uci.edu/. The data resources it contains are listed below by task:

 

Direct marketing
KDD Cup 1998 data

GIS
Forest covertype

Indexing
Corel image features
Pseudo Periodic Synthetic Time Series

Intrusion Detection
KDD Cup 1999 data

Process Control
Synthetic Control Chart Time Series

Recommendation Systems
Entree Chicago recommendation data

Robots
Pioneer-1 mobile robot data
Robot execution failures

Sign Language Recognition
Australian sign language data
High-quality Australian sign language data

Text Categorization
20 newsgroups data
Reuters-21578 Text Categorization collection
NSF Research Awards Abstracts 1990-2003

World Wide Web
Microsoft anonymous Web Data
MSNBC anonymous Web Data
Syskill Webert Web Data

One more resource, which I found on an overseas blog: http://www.fs.fed.us/fire/fuelman/
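
Because some of the links collected here may no longer be reachable, it can help to check them before relying on them. The following is a minimal sketch using only the Python standard library; the function name `check_url`, the 10-second timeout, and the injectable `opener` parameter are illustrative assumptions, and the two URLs in the usage section are simply taken from the lists above.

```python
import http.client
import urllib.error
import urllib.request


def check_url(url, timeout=10, opener=urllib.request.urlopen):
    """Try to open `url` and return (reachable, detail).

    `detail` is the status reported by the response on success, or the
    error message on failure. `opener` exists so that the function can
    be exercised without network access.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return True, str(getattr(response, "status", None))
    except (urllib.error.URLError, http.client.HTTPException,
            OSError, ValueError) as exc:
        return False, str(exc)


if __name__ == "__main__":
    urls = [
        "http://kdd.ics.uci.edu/",
        "http://lib.stat.cmu.edu/datasets/",
    ]
    for url in urls:
        reachable, detail = check_url(url)
        print("OK  " if reachable else "FAIL", url, detail)
```

A failed check only means the server did not answer at that moment; archived or mirrored copies of a dataset may still exist elsewhere.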
