Kaggle Competition: Digit Recognizer

Last Update: 2015-01-16   Source: Internet

Author: User


Classify handwritten digits using the famous MNIST data

This competition was the first in a series of tutorial competitions designed to introduce people to machine learning.

The goal in this competition is to take an image of a single handwritten digit and determine which digit it is. As the competition progresses, we will release tutorials that explain different machine learning algorithms to help you get started.

The data for this competition were taken from the MNIST dataset. The MNIST ("Modified National Institute of Standards and Technology") dataset is a classic within the machine learning community and has been extensively studied. More detail about the dataset, including the machine learning algorithms that have been tried on it and their levels of success, can be found at http://yann.lecun.com/exdb/mnist/index.html.

Competition page: http://www.kaggle.com/c/digit-recognizer

Handwritten digit recognition

Data description: http://www.kaggle.com/c/digit-recognizer/data

Each image is 28 pixels high and 28 pixels wide, and each pixel is a single number between 0 and 255, so each image is represented by 28 × 28 = 784 numbers. Each row of the training data consists of a label column followed by 784 pixel-value columns; the test data has the same pixel columns but no label column. The goal is to train a model on the training data and use it to predict the labels of the test data.
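To make the layout concrete, here is a small sketch of how a flat row of 784 values maps back to a 28 × 28 image. It assumes row-major order, i.e. column `pixelx` sits at row `x // 28` and column `x % 28` (this is how the Kaggle data page describes it, and it matches NumPy's default reshape order). The helper names `pixel_position` and `row_to_image` are hypothetical, chosen for illustration only.

```python
import numpy as np

SIDE = 28  # each image is 28 x 28 pixels


def pixel_position(x):
    """Map the index x of column 'pixelx' to its (row, column) in the image.

    Assumes row-major order: x = row * 28 + col.
    """
    return divmod(x, SIDE)


def row_to_image(pixels):
    """Turn a flat sequence of 784 pixel values into a 28 x 28 array."""
    arr = np.asarray(pixels)
    if arr.size != SIDE * SIDE:
        raise ValueError("expected 784 pixel values, got %d" % arr.size)
    # NumPy's default (C) order is row-major, matching x = row * 28 + col
    return arr.reshape(SIDE, SIDE)


# Synthetic row (not real MNIST data): each value is its index modulo 256
flat = np.arange(784) % 256
img = row_to_image(flat)
print(pixel_position(783))    # (27, 27)
print(img.shape)              # (28, 28)
print(img[1, 0] == flat[28])  # True
```

Using `divmod` keeps the index arithmetic and the reshape consistent, so a pixel looked up by name and the same pixel read from the reshaped array always agree.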

The following IPython notebook session restores the images from their pixel values:

In [1]:

pwd

C:\Users\zhaohf\Desktop

In [5]:

cd ../../../workspace/kaggle/DigitRecognizer/Data/

C:\workspace\kaggle\DigitRecognizer\Data

In [6]:

ls

 Volume in drive C is OS
 Volume Serial Number is 6C93-0DF3

 Directory of C:\workspace\kaggle\DigitRecognizer\Data

2015/01/15  16:04    <DIR>          .
2015/01/15  16:04    <DIR>          ..
2014/12/28  15:06           240,909 rf_benchmark.csv
2015/01/15  16:04        51,118,294 test.csv
2014/12/28  15:06        51,118,296 test.csv.bak
2014/12/28  15:06        76,775,041 train.csv
               4 File(s)    179,252,540 bytes
               2 Dir(s)  105,536,135,168 bytes free

In [7]:

import pandas as pd

df = pd.read_csv('train.csv', header=0).head()  # keep only the first 5 rows

In [8]:

df

Out[8]:

	label
0	1	...
1	0	...
2	1	...
3	4	...
4	0	...

5 rows × 785 columns
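Calling `.head()` after `read_csv` still parses the whole train.csv (about 77 MB according to the directory listing) before discarding all but five rows. A possible alternative is the `nrows` parameter of `pandas.read_csv`, which stops after the given number of data rows; in this notebook that would be `pd.read_csv('train.csv', header=0, nrows=5)`. The sketch below demonstrates it on a tiny in-memory CSV; the sample values are made up and merely stand in for the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for train.csv: a label column plus 3 pixel columns
csv_text = """label,pixel0,pixel1,pixel2
1,0,0,255
0,0,128,0
4,12,0,0
"""

# nrows limits how many data rows are parsed; the header row is still used
df_small = pd.read_csv(io.StringIO(csv_text), header=0, nrows=2)
print(df_small.shape)              # (2, 4)
print(df_small['label'].tolist())  # [1, 0]
```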

In [9]:

df['label']

Out[9]:

0    1
1    0
2    1
3    4
4    0
Name: label, dtype: int64

In [14]:

df = df.loc[:, 'pixel0':]  # drop the label column

In [15]:

df

Out[15]:

	pixel0	pixel1	pixel2	pixel3	pixel4	pixel5	pixel6	pixel7	pixel8	pixel9	...	pixel774	pixel775	pixel776	pixel777	pixel778	pixel779	pixel780	pixel781	pixel782	pixel783
0	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
1	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
2	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
3	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
4	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0

5 rows × 784 columns

In [21]:

%matplotlib inline

import matplotlib.pyplot as plt

# Draw each of the five rows as an image in a 2 x 5 grid
for i in range(df.shape[0]):
    img = df.iloc[i].values.reshape((28, 28))  # 784 pixels -> 28 x 28 image
    plt.subplot(2, 5, i + 1)
    plt.imshow(img)
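When matplotlib is not available (for example in a plain terminal session), a rough text rendering can serve as a quick check that a row really reshapes into a recognizable digit. This sketch is not part of the original notebook; the `ascii_digit` helper and its threshold of 128 are illustrative assumptions, and the demo image is a synthetic vertical bar rather than real MNIST data.

```python
import numpy as np


def ascii_digit(pixels, threshold=128):
    """Render a flat 784-value row as text: '#' for ink, '.' for background.

    threshold is an arbitrary cut-off chosen for illustration.
    """
    img = np.asarray(pixels).reshape(28, 28)
    return "\n".join(
        "".join("#" if v >= threshold else "." for v in row) for row in img
    )


# Synthetic example: a vertical stroke in columns 13-14, rows 4-23
demo = np.zeros((28, 28), dtype=int)
demo[4:24, 13:15] = 255
print(ascii_digit(demo.ravel()))
```

In the notebook above, `print(ascii_digit(df.iloc[0].values))` would be the corresponding call for the first row.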

The following script trains a random forest on the training data and predicts the labels of the test data:

from numpy import loadtxt, savetxt
from sklearn.ensemble import RandomForestClassifier

# Load the training data (skip the header row); column 0 is the label
train = loadtxt('../data/train.csv', delimiter=',', skiprows=1)
X_train = train[:, 1:]  # 784 pixel columns
print(X_train.shape)
y_train = train[:, 0]   # label column
print(y_train.shape)

# The test data has only the 784 pixel columns
x_test = loadtxt('../data/test.csv', delimiter=',', skiprows=1)
print(x_test.shape)

print('Training...')
rf = RandomForestClassifier(n_estimators=100)
rf_model = rf.fit(X_train, y_train)

print('Predicting...')
# ImageId in the submission file starts at 1
pred = [[index + 1, x] for index, x in enumerate(rf_model.predict(x_test))]
savetxt('../submissions/myrf_benchmark.csv', pred, delimiter=',', fmt='%d,%d',
        header='ImageId,Label', comments='')
print('Done.')
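The `[index+1, x]` pairs in the script exist because the submission file numbers test images from 1 while `enumerate` counts from 0. Also, because `loadtxt` returns floats, the predicted labels come back as floats, which `fmt='%d,%d'` turns into integers. The same file can be written with Python's standard `csv` module, which some may find easier to read than `savetxt`'s combined format string. This is a sketch only: `write_submission` is a hypothetical helper, and the predictions in the example are made up.

```python
import csv
import io


def write_submission(predictions, out):
    """Write rows "ImageId,Label" with a 1-based ImageId.

    predictions: predicted digits in test-set order (ints or floats).
    out: a text file object (open real files with newline='').
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ImageId", "Label"])
    for image_id, label in enumerate(predictions, start=1):
        writer.writerow([image_id, int(label)])  # 2.0 -> 2


# Hypothetical float predictions, like those returned for float labels
buf = io.StringIO()
write_submission([2.0, 0.0, 9.0], buf)
print(buf.getvalue())
```

For the real submission, `out` would be a file opened with `open('../submissions/myrf_benchmark.csv', 'w', newline='')` and `predictions` would be `rf_model.predict(x_test)`.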

First submission results:

[Figure: screenshot of the first submission's results]

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
