Python Scikit-learn Learning notes-handwritten numerals recognition

Source: Internet
Author: User
Tags: svm

This is a handwritten digit recognition experiment, a real-life example of using sklearn. The original example URL (linked at the end) has the corresponding description and code.

The experiment uses 1797 samples that ship with sklearn's built-in datasets, so we can load them directly. Each sample has two parts: image and target. The image is an 8*8 grayscale picture, and the target is the image's category, which here is a handwritten digit from 0 to 9.

The code starts by loading the data:

```python
# Standard scientific Python imports
import matplotlib.pyplot as plt

# Import datasets, classifiers and performance metrics
from sklearn import datasets, svm, metrics

# The digits dataset
digits = datasets.load_digits()
```
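The sample count and image size described above are easy to verify once the dataset is loaded; a minimal sketch:

```python
from sklearn import datasets

digits = datasets.load_digits()

# 1797 samples, each an 8x8 grayscale image with a label in 0-9
print(digits.images.shape)   # (1797, 8, 8)
print(digits.target.shape)   # (1797,)
print(sorted(set(digits.target)))
```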

After that, the first four training samples are extracted and plotted. For the usage of the enumerate function, see the following URL:

http://blog.csdn.net/suofiya2008/article/details/5603861

```python
images_and_labels = list(zip(digits.images, digits.target))
for index, (image, label) in enumerate(images_and_labels[:4]):
    plt.subplot(2, 4, index + 1)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Training: %i' % label)
```

The classifier used is the support vector classifier, SVC.

For the principles of support vector machines, take a look at this blog post:

http://www.cnblogs.com/v-July-v/archive/2012/06/01/2539022.html

Here only the gamma parameter is specified.

More optional parameters are described at the following URL:

http://scikit-learn.org/0.13/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC

In the SVC I tried changing the kernel function: except for kernel='sigmoid', which performs poorly, the other kernels give very similar results.
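That comparison can be reproduced with a short loop; a sketch, assuming the same half/half split used below (exact scores will vary slightly between sklearn versions):

```python
from sklearn import datasets, svm

digits = datasets.load_digits()
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
half = n_samples // 2

# Train on the first half, score accuracy on the second half, per kernel
for kernel in ('rbf', 'linear', 'poly', 'sigmoid'):
    clf = svm.SVC(gamma=0.001, kernel=kernel)
    clf.fit(data[:half], digits.target[:half])
    print('%-8s %.3f' % (kernel, clf.score(data[half:], digits.target[half:])))
```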

```python
# To apply a classifier on this data, we need to flatten the image, to
# turn the data in a (samples, feature) matrix:
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))

# Create a classifier: a support vector classifier
classifier = svm.SVC(gamma=0.001, kernel='poly')

# We learn the digits on the first half of the digits
classifier.fit(data[:n_samples // 2], digits.target[:n_samples // 2])
```

Then comes the training and testing stage, where all the data is split into two halves: the first half is the training set, the second half is the test set.

```python
# Now predict the value of the digit on the second half:
expected = digits.target[n_samples // 2:]
predicted = classifier.predict(data[n_samples // 2:])

print("Classification report for classifier %s:\n%s\n"
      % (classifier, metrics.classification_report(expected, predicted)))
print("Confusion matrix:\n%s" % metrics.confusion_matrix(expected, predicted))
```

Let's look at the evaluation output. The classification report gives four numbers per class: precision, recall, f1-score, and support.

The f1-score is computed from precision and recall; it is their harmonic mean:

F1 = 2 * precision * recall / (precision + recall)

Support is the number of samples of each class actually present in the test set (the ground-truth count), not the number of predictions made for that class.
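The relationship between the three scores can be checked by hand on a toy binary problem; a sketch with made-up labels:

```python
from sklearn import metrics

# Toy ground truth and predictions: 3 true positives,
# 1 false negative, 1 false positive, 3 true negatives
expected  = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]

p = metrics.precision_score(expected, predicted)  # TP / (TP + FP) = 3/4
r = metrics.recall_score(expected, predicted)     # TP / (TP + FN) = 3/4
f1 = metrics.f1_score(expected, predicted)

# f1 is the harmonic mean of precision and recall
assert abs(f1 - 2 * p * r / (p + r)) < 1e-9
print(p, r, f1)  # 0.75 0.75 0.75
```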

The second output is the confusion matrix, which is widely used in image accuracy evaluation to compare classification results against the actual measured values; the accuracy of every class can be read off the matrix. It is built by comparing each sample's true class with its predicted class. In sklearn's convention, each row corresponds to a true class and each column to a predicted class: entry (i, j) counts the samples of true class i that were predicted as class j, so the diagonal holds the correctly classified samples and everything off the diagonal is a misclassification.
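The row/column convention is easy to confirm on a toy example; a sketch with made-up labels:

```python
from sklearn import metrics

# True labels vs predictions for a 3-class toy problem
expected  = [0, 0, 1, 1, 2, 2]
predicted = [0, 0, 1, 2, 2, 2]

cm = metrics.confusion_matrix(expected, predicted)
print(cm)
# [[2 0 0]
#  [0 1 1]
#  [0 0 2]]
# Row i = true class i, column j = predicted class j:
# the single off-diagonal entry cm[1, 2] == 1 is the one
# sample of true class 1 that was misclassified as class 2.
```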

Finally, a few images from the test set can be plotted alongside their predictions.

```python
images_and_predictions = list(zip(digits.images[n_samples // 2:], predicted))
for index, (image, prediction) in enumerate(images_and_predictions[:4]):
    plt.subplot(2, 4, index + 5)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Prediction: %i' % prediction)
```


The original example URL:

http://scikit-learn.org/stable/auto_examples/classification/plot_digits_classification.html# Example-classification-plot-digits-classification-py
