Kaggle competition: Digit Recognizer

Source: Internet
Author: User

Classify handwritten digits using the famous MNIST data

This competition was the first in a series of tutorial competitions designed to introduce people to machine learning.

The goal in this competition is to take an image of a single handwritten digit and determine what that digit is. As the competition progresses, we will release tutorials that explain different machine learning algorithms and help you get started.


The data for this competition were taken from the MNIST dataset. The MNIST ("Modified National Institute of Standards and Technology") dataset is a classic within the machine learning community that has been extensively studied. More detail about the dataset, including the machine learning algorithms that have been tried on it and their levels of success, can be found at http://yann.lecun.com/exdb/mnist/index.html.

Competition page: http://www.kaggle.com/c/digit-recognizer

Handwritten digit recognition

Data description: http://www.kaggle.com/c/digit-recognizer/data

Each image is 28 pixels high and 28 pixels wide, 784 pixels in total, and each pixel is a single integer between 0 and 255, so an image is represented by 28x28 = 784 numbers. The training data has 785 columns: a label column followed by 784 pixel columns. The test data has the same pixel columns but no label column. Objective: train a model on the training data and use it to predict the label of each image in the test data.
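As a small sketch of how one flat row maps back to a picture: assuming the pixels are stored row by row (row-major order, which is what NumPy's default `reshape` expects), the pixel in row i and column j (both counted from 0) sits in column `pixel{i*28 + j}`. The helper names `pixel_column` and `row_to_image` are illustrative only, not part of any library.

```python
import numpy as np

SIDE = 28  # each image is SIDE x SIDE pixels


def pixel_column(i: int, j: int) -> str:
    """Column name for the pixel at row i, column j (0-based), assuming row-major order."""
    return f"pixel{i * SIDE + j}"


def row_to_image(pixels) -> np.ndarray:
    """Reshape one flat row of 784 pixel values into a 28 x 28 array (row-major)."""
    arr = np.asarray(pixels)
    if arr.size != SIDE * SIDE:
        raise ValueError(f"expected {SIDE * SIDE} values, got {arr.size}")
    return arr.reshape(SIDE, SIDE)


# Stand-in row: the value k is stored at flat position k, i.e. in column pixel{k}
flat = np.arange(SIDE * SIDE)
img = row_to_image(flat)
print(img.shape)                      # (28, 28)
print(pixel_column(3, 5), img[3, 5])  # pixel89 89
```

Because `img[i, j]` and column `pixel{i*28 + j}` hold the same value, the reshape in the notebook below is all that is needed to turn a CSV row back into a picture.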

The following IPython notebook session turns the pixel values back into pictures:

In [1]:
pwd
C:\Users\zhaohf\Desktop
In [5]:
cd ../../../workspace/kaggle/DigitRecognizer/Data/
C:\workspace\kaggle\DigitRecognizer\Data
In [6]:
ls
 Volume in drive C is OS
 Volume Serial Number is 6C93-0DF3

 Directory of C:\workspace\kaggle\DigitRecognizer\Data

2015/01/15  16:04    <DIR>          .
2015/01/15  16:04    <DIR>          ..
2014/12/28  15:06           240,909 rf_benchmark.csv
2015/01/15  16:04        51,118,294 test.csv
2014/12/28  15:06        51,118,296 test.csv.bak
2014/12/28  15:06        76,775,041 train.csv
               4 File(s)    179,252,540 bytes
               2 Dir(s)  105,536,135,168 bytes free
In [7]:
import pandas as pd
df = pd.read_csv('train.csv', header=0).head()  # keep only the first 5 rows
In [8]:
df
Out[8]:
   label  pixel0  pixel1  pixel2  pixel3  pixel4  pixel5  pixel6  pixel7  pixel8  ...  pixel774  pixel775  pixel776  pixel777  pixel778  pixel779  pixel780  pixel781  pixel782  pixel783
0      1       0       0       0       0       0       0       0       0       0  ...         0         0         0         0         0         0         0         0         0         0
1      0       0       0       0       0       0       0       0       0       0  ...         0         0         0         0         0         0         0         0         0         0
2      1       0       0       0       0       0       0       0       0       0  ...         0         0         0         0         0         0         0         0         0         0
3      4       0       0       0       0       0       0       0       0       0  ...         0         0         0         0         0         0         0         0         0         0
4      0       0       0       0       0       0       0       0       0       0  ...         0         0         0         0         0         0         0         0         0         0

5 rows × 785 columns

In [9]:
df['label']
Out[9]:
0    1
1    0
2    1
3    4
4    0
Name: label, dtype: int64
In [14]:
df = df.loc[:, 'pixel0':]  # drop the label column (.loc replaces the removed .ix indexer)
In [15]:
df
Out[15]:
   pixel0  pixel1  pixel2  pixel3  pixel4  pixel5  pixel6  pixel7  pixel8  pixel9  ...  pixel774  pixel775  pixel776  pixel777  pixel778  pixel779  pixel780  pixel781  pixel782  pixel783
0       0       0       0       0       0       0       0       0       0       0  ...         0         0         0         0         0         0         0         0         0         0
1       0       0       0       0       0       0       0       0       0       0  ...         0         0         0         0         0         0         0         0         0         0
2       0       0       0       0       0       0       0       0       0       0  ...         0         0         0         0         0         0         0         0         0         0
3       0       0       0       0       0       0       0       0       0       0  ...         0         0         0         0         0         0         0         0         0         0
4       0       0       0       0       0       0       0       0       0       0  ...         0         0         0         0         0         0         0         0         0         0

5 rows × 784 columns

In [21]:
%matplotlib inline
import matplotlib.pyplot as plt
# Draw each of the 5 rows as a 28x28 image in a 2x5 grid
for i in range(df.shape[0]):
    img = df.iloc[i].values.reshape((28, 28))
    plt.subplot(2, 5, i + 1)
    plt.imshow(img)
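The loop above reshapes one row at a time. A vectorized alternative, sketched here under the assumption that the pixel columns are named `pixel0` to `pixel783` as in `train.csv`, selects all pixel columns at once and reshapes them into an array of shape (n_images, 28, 28). The DataFrame below is random synthetic data standing in for the real file, and `frame_to_images` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

SIDE = 28
PIXEL_COLS = [f"pixel{k}" for k in range(SIDE * SIDE)]  # pixel0 .. pixel783


def frame_to_images(df: pd.DataFrame) -> np.ndarray:
    """Return an array of shape (n_rows, 28, 28) built from the pixel columns only."""
    # Selecting the columns by name works whether or not a label column is present
    return df[PIXEL_COLS].to_numpy().reshape(-1, SIDE, SIDE)


# Synthetic stand-in for a few rows of train.csv: a label column plus 784 pixel columns
rng = np.random.default_rng(0)
data = rng.integers(0, 256, size=(5, SIDE * SIDE))
df = pd.DataFrame(data, columns=PIXEL_COLS)
df.insert(0, "label", [1, 0, 1, 4, 0])

images = frame_to_images(df)
labels = df["label"].to_numpy()
print(images.shape)  # (5, 28, 28)
```

Each `images[n]` can then be passed straight to `plt.imshow`, and `labels[n]` gives the matching digit for a subplot title.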


The following script trains a random forest classifier on the training data and predicts the labels of the test data:

from numpy import loadtxt, savetxt
from sklearn.ensemble import RandomForestClassifier

# Load the training data: first column is the label, the remaining 784 are pixels
train = loadtxt('../data/train.csv', delimiter=',', skiprows=1)
X_train = train[:, 1:]
y_train = train[:, 0]
print(X_train.shape)
print(y_train.shape)

# The test data has only the 784 pixel columns
X_test = loadtxt('../data/test.csv', delimiter=',', skiprows=1)
print(X_test.shape)

print('Training...')
rf = RandomForestClassifier(n_estimators=100)
rf_model = rf.fit(X_train, y_train)

print('Predicting...')
# One (ImageId, Label) row per test image; ImageId starts at 1
pred = [[index + 1, x] for index, x in enumerate(rf_model.predict(X_test))]
savetxt('../submissions/myrf_benchmark.csv', pred, delimiter=',', fmt='%d,%d',
        header='ImageId,Label', comments='')
print('Done.')
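The submission step writes one row per test image, numbering images from 1. As an illustrative sketch, the same formatting can be checked on a few made-up predictions without loading the real data. The helper `write_submission` is hypothetical, and the `ImageId,Label` header is assumed to match Kaggle's sample submission file.

```python
import io

import numpy as np


def write_submission(predictions, fh) -> None:
    """Write predicted labels as ImageId,Label rows, with ImageId starting at 1."""
    preds = np.asarray(predictions, dtype=int)
    ids = np.arange(1, len(preds) + 1)  # test images are numbered from 1
    rows = np.column_stack((ids, preds))
    # comments='' stops savetxt from prefixing the header line with '# '
    np.savetxt(fh, rows, fmt="%d", delimiter=",", header="ImageId,Label", comments="")


buf = io.StringIO()
write_submission([2, 0, 9], buf)
print(buf.getvalue())
```

Writing to an in-memory buffer keeps the check independent of the `../submissions/` directory; passing a real file path instead of `buf` writes the actual submission.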

First Submission Results:

