1. Climate Monitoring Data Set http://cdiac.ornl.gov/ftp/ndp026b
2. Some useful websites for downloading test datasets
http://www.fs.fed.us/fire/fuelman/
http://www.cs.toronto.edu/~roweis/data.html
http://kdd.ics.uci.edu/summary.task.type.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.phys.uni.torun.pl/~duch/software.html
The Reuters dataset is available at: http://www.research.att.com/~lewis/reuters21578.html
This site offers a variety of datasets: http://kdd.ics.uci.edu/summary.data.type.html
For text classification, the Rainbow dataset is another option:
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
3. Machine learning datasets collected by UCI
ftp://pami.sjtu.edu.cn/
http://www.ics.uci.edu/~mlearn/MLRepository.htm
4. StatLib
http://liama.ia.ac.cn/SCILAB/scilabindexgb.htm
http://lib.stat.cmu.edu/
5. Websites for fund data mining
http://www.gotofund.com/index.asp
http://lans.ece.utexas.edu/~strehl/
6. Text classification and the Web
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
http://www.w3.org/TR/WD-logfile-960221.html
http://www.w3.org/Daemon/User/Config/Logging.html#AccessLog
http://www.w3.org/1998/11/05/WC-workshop/Papers/bala2.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.web-caching.com/traces-logs.html
http://www-2.cs.cmu.edu/webkb
http://www.cs.auc.dk/research/DP/tdb/TimeCenter/TimeCenterPublications/TR-75.pdf
http://www.cs.cornell.edu/projects/kddcup/index.html
7. Time series data
http://www.stat.wisc.edu/~reinsel/bjr-data/
8. Test data for the Apriori algorithm
http://www.almaden.ibm.com/cs/quest/syndata.html
9. Data generator links
http://www.cse.cuhk.edu.hk/~kdd/data_collection.html
http://www.almaden.ibm.com/cs/quest/syndata.html
10. Association:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html#assocSynData
11. WEKA:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
1. A jar file containing 37 classification problems, originally obtained from the UCI Repository
http://prdownloads.sourceforge.net/weka/datasets-UCI.jar
2. A jar file containing 37 regression problems, obtained from various sources
http://prdownloads.sourceforge.net/weka/datasets-numeric.jar
3. A jar file containing 30 regression datasets collected by Luis Torgo
http://prdownloads.sourceforge.net/weka/regression-datasets.jar
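Since a jar file is an ordinary ZIP archive, the datasets in these downloads can be inspected without running WEKA. The following is only a minimal sketch in Python using the standard zipfile module. It assumes the jars store their datasets as .arff files, which is not stated above; the function names and the in-memory demo jar are hypothetical and serve only as illustration.

```python
import io
import zipfile


def list_arff_members(jar_source):
    """Return the sorted names of .arff entries in a jar (a jar is a ZIP archive)."""
    with zipfile.ZipFile(jar_source) as jar:
        return sorted(
            name for name in jar.namelist() if name.lower().endswith(".arff")
        )


def read_member_text(jar_source, member, encoding="utf-8"):
    """Return the text of one entry from the jar without unpacking the rest."""
    with zipfile.ZipFile(jar_source) as jar:
        return jar.read(member).decode(encoding, errors="replace")


def build_demo_jar():
    """Build a tiny in-memory stand-in for a downloaded jar (hypothetical contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as jar:
        jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
        jar.writestr(
            "demo/toy.arff",
            "@relation toy\n@attribute x numeric\n@data\n1\n",
        )
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    demo = build_demo_jar()
    for name in list_arff_members(demo):
        print(name)
```

With a real download, the same functions take a file path instead of the in-memory buffer, for example list_arff_members("datasets-UCI.jar") once that file has been saved locally.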
12. Cancer genes:
http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi
13. Financial data:
http://lisp.vse.cz/pkdd99/Challenge/chall.htm
14. A good resource website is http://kdd.ics.uci.edu/. The data resources it contains are as follows (by application field):
Direct Marketing
  KDD Cup 1998 Data
GIS
  Forest CoverType
Indexing
  Corel Image Features
  Pseudo Periodic Synthetic Time Series
Intrusion Detection
  KDD Cup 1999 Data
Process Control
  Synthetic Control Chart Time Series
Recommendation Systems
  Entree Chicago Recommendation Data
Robots
  Pioneer-1 Mobile Robot Data
  Robot Execution Failures
Sign Language Recognition
  Australian Sign Language Data
  High-quality Australian Sign Language Data
Text Categorization
  20 Newsgroups Data
  Reuters-21578 Text Categorization Collection
  NSF Research Awards Abstracts 1990-2003
World Wide Web
  Microsoft Anonymous Web Data
  MSNBC Anonymous Web Data
  Syskill Webert Web Data