Introduction to the Kaggle Big Data Competition Platform
Among big data competition platforms, the main ones in China are the Tianchi Big Data Competition and DataCastle, while the main one abroad is Kaggle. Kaggle is a data mining competition platform; its website is https://www.kaggle.com/. Many institutions and companies post a problem, its description, and the expected outcome on Kaggle, and collect solutions from a large community of data scientists through competition, which reflects the idea of collective intelligence. After registering on the site, you can download the datasets of competitions that interest you, analyze the data, build models, and submit your results. Submissions are ranked by their results, and top performers may also win prize money, interview opportunities, and so on.
Figure 1 shows the competitions in progress as they appear after entering the Kaggle website. The competitions are of different types and can be filtered through seven options: All Categories, Featured, Recruitment, Research, Playground, Getting Started, and In Class. Featured competitions (pink strip on the left) generally offer relatively generous prizes, and the competition is relatively intense. Research competitions (yellow strip on the left) offer smaller prizes. Recruitment competitions offer no prize money, but can lead to internship or interview opportunities at the company that posted the project, which also gives companies another way to recruit talent. Playground competitions are practice competitions, mainly for beginners to get hands-on experience, and beginners are advised to start here. Getting Started competitions teach data mining step by step and serve as a good introductory tutorial. In addition to these public competitions, Kaggle also offers private competitions for active participants and provides the Kaggle In Class program for university groups. Kaggle's blog, No Free Hunch, is also a good place to learn; its columns include Data Science News, Kaggle News, Kernels, Tutorials, and Winners' Interviews.
Figure 1 Kaggle Home
Competition Process:
1. Open a competition that interests you and download the dataset (CSV format), which generally includes a training set and a test set. Read the data description and the task description to make the requirements clear;
2. Build a model in any language and with any algorithm you are comfortable with, train it on the training set, then use the trained model to predict the labels of the test set and write those predicted labels to a file as the final submission;
3. The system selects 25% of the data in the submitted file for a preliminary evaluation and ranks participants by the resulting accuracy. When the competition ends, the remaining 75% of the data is used for the final evaluation, which determines the final accuracy.
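The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the tiny in-memory CSV data, the column names (id, feature, label), the threshold "model", and the helper names are all assumptions made for the example, and the way the 25% subset is chosen here (a fixed random shuffle) is a guess, not Kaggle's actual procedure.

```python
import csv
import io
import random
from collections import Counter

# Hypothetical miniature data standing in for the downloaded train/test CSV files.
TRAIN_CSV = """id,feature,label
1,0.2,0
2,0.9,1
3,0.8,1
4,0.1,0
"""
TEST_CSV = """id,feature
5,0.7
6,0.3
7,0.95
8,0.05
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def train_threshold_model(rows):
    """Toy 'model': the midpoint between the mean feature value of each class."""
    sums, counts = Counter(), Counter()
    for row in rows:
        sums[row["label"]] += float(row["feature"])
        counts[row["label"]] += 1
    means = {label: sums[label] / counts[label] for label in counts}
    return (means["0"] + means["1"]) / 2


def predict(threshold, rows):
    """Label a test row 1 if its feature exceeds the threshold, else 0."""
    return {row["id"]: int(float(row["feature"]) > threshold) for row in rows}


def write_submission(predictions):
    """Serialize the predictions as CSV text with id,label columns."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "label"])
    for row_id, label in predictions.items():
        writer.writerow([row_id, label])
    return out.getvalue()


def split_scores(predictions, truth, preliminary_fraction=0.25, seed=0):
    """Accuracy on a fixed random 25% of rows (preliminary) and on the other 75% (final)."""
    ids = sorted(truth)
    random.Random(seed).shuffle(ids)
    cut = max(1, int(len(ids) * preliminary_fraction))

    def accuracy(subset):
        return sum(predictions[i] == truth[i] for i in subset) / len(subset)

    return accuracy(ids[:cut]), accuracy(ids[cut:])


# Step 1: load the data. Step 2: train, predict, and build the submission file.
threshold = train_threshold_model(read_rows(TRAIN_CSV))
predictions = predict(threshold, read_rows(TEST_CSV))
submission = write_submission(predictions)

# Step 3: the platform holds the true test labels; these values are invented here.
hidden_truth = {"5": 1, "6": 0, "7": 1, "8": 0}
preliminary_acc, final_acc = split_scores(predictions, hidden_truth)
print(submission)
print(preliminary_acc, final_acc)
```

In a real competition the model would normally be built with libraries such as pandas and scikit-learn, and the submission file must follow the exact format given in that competition's data description.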
Kernels:
Kernels provide an environment that brings together data analysis, datasets, code, and output. Opening one shows an interface similar to Jupyter Notebook. Python code can be run directly in it, cells can be switched freely between code and Markdown, and the work is easy to reproduce and share. Another advantage is that you do not need to download the datasets or set up a local Python installation and libraries (such as pandas and NumPy); the data mining can be done directly on the web. Kernels can also be used to share code, which makes them a good place for beginners to learn, and answering questions in the forums can also earn points.