Python Scikit-learn Learning notes-handwritten numerals recognition

Source: Internet
Author: User
Tags: svm

This is a handwritten digit recognition experiment, a real-life example of using sklearn. The original example URL (linked at the end) has the corresponding description and code.

The experiment uses 1797 samples that ship with sklearn's built-in datasets, so we can load them directly. Each sample has two parts: image and target. The image is an 8*8 grayscale picture, and the target is the image's category, which here is a handwritten digit from 0 to 9.

The code starts by loading the data:

```python
# Standard scientific Python imports
import matplotlib.pyplot as plt

# Import datasets, classifiers and performance metrics
from sklearn import datasets, svm, metrics

# The digits dataset
digits = datasets.load_digits()
```
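The sample count and image size described above are easy to verify once the dataset is loaded; a minimal sketch:

```python
from sklearn import datasets

digits = datasets.load_digits()

# 1797 samples, each an 8x8 grayscale image with a label in 0-9
print(digits.images.shape)   # (1797, 8, 8)
print(digits.target.shape)   # (1797,)
print(sorted(set(digits.target)))
```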

After that, the first four training samples are extracted and plotted. For the usage of the enumerate function, see the following URL:

http://blog.csdn.net/suofiya2008/article/details/5603861

```python
images_and_labels = list(zip(digits.images, digits.target))
for index, (image, label) in enumerate(images_and_labels[:4]):
    plt.subplot(2, 4, index + 1)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Training: %i' % label)
```

The classifier used is the support vector classifier, SVC.

For the principles of support vector machines, take a look at this blog post:

http://www.cnblogs.com/v-July-v/archive/2012/06/01/2539022.html

Here only the gamma parameter is specified.

More optional parameters are described at the following URL:

http://scikit-learn.org/0.13/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC

In the SVC I tried changing the kernel function: except for kernel='sigmoid', which performs poorly, the other kernels give very similar results.
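That comparison can be reproduced with a short loop; a sketch, assuming the same half/half split used below (exact scores will vary slightly between sklearn versions):

```python
from sklearn import datasets, svm

digits = datasets.load_digits()
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
half = n_samples // 2

# Train on the first half, score accuracy on the second half, per kernel
for kernel in ('rbf', 'linear', 'poly', 'sigmoid'):
    clf = svm.SVC(gamma=0.001, kernel=kernel)
    clf.fit(data[:half], digits.target[:half])
    print('%-8s %.3f' % (kernel, clf.score(data[half:], digits.target[half:])))
```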

```python
# To apply a classifier on this data, we need to flatten the image, to
# turn the data in a (samples, feature) matrix:
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))

# Create a classifier: a support vector classifier
classifier = svm.SVC(gamma=0.001, kernel='poly')

# We learn the digits on the first half of the digits
classifier.fit(data[:n_samples // 2], digits.target[:n_samples // 2])
```

Then comes the training and testing stage, where all the data is split into two halves: the first half is the training set, the second half is the test set.

```python
# Now predict the value of the digit on the second half:
expected = digits.target[n_samples // 2:]
predicted = classifier.predict(data[n_samples // 2:])

print("Classification report for classifier %s:\n%s\n"
      % (classifier, metrics.classification_report(expected, predicted)))
print("Confusion matrix:\n%s" % metrics.confusion_matrix(expected, predicted))
```

Let's look at the evaluation output. The classification report gives four numbers per class: precision, recall, f1-score, and support.

The f1-score is computed from precision and recall; it is their harmonic mean:

F1 = 2 * precision * recall / (precision + recall)

Support is the number of samples of each class actually present in the test set (the ground-truth count), not the number of predictions made for that class.
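The relationship between the three scores can be checked by hand on a toy binary problem; a sketch with made-up labels:

```python
from sklearn import metrics

# Toy ground truth and predictions: 3 true positives,
# 1 false negative, 1 false positive, 3 true negatives
expected  = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]

p = metrics.precision_score(expected, predicted)  # TP / (TP + FP) = 3/4
r = metrics.recall_score(expected, predicted)     # TP / (TP + FN) = 3/4
f1 = metrics.f1_score(expected, predicted)

# f1 is the harmonic mean of precision and recall
assert abs(f1 - 2 * p * r / (p + r)) < 1e-9
print(p, r, f1)  # 0.75 0.75 0.75
```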

The second output is the confusion matrix, which is widely used in image accuracy evaluation to compare classification results against the actual measured values; the accuracy of every class can be read off the matrix. It is built by comparing each sample's true class with its predicted class. In sklearn's convention, each row corresponds to a true class and each column to a predicted class: entry (i, j) counts the samples of true class i that were predicted as class j, so the diagonal holds the correctly classified samples and everything off the diagonal is a misclassification.
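The row/column convention is easy to confirm on a toy example; a sketch with made-up labels:

```python
from sklearn import metrics

# True labels vs predictions for a 3-class toy problem
expected  = [0, 0, 1, 1, 2, 2]
predicted = [0, 0, 1, 2, 2, 2]

cm = metrics.confusion_matrix(expected, predicted)
print(cm)
# [[2 0 0]
#  [0 1 1]
#  [0 0 2]]
# Row i = true class i, column j = predicted class j:
# the single off-diagonal entry cm[1, 2] == 1 is the one
# sample of true class 1 that was misclassified as class 2.
```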

Finally, a few images from the test set can be plotted alongside their predictions.

```python
images_and_predictions = list(zip(digits.images[n_samples // 2:], predicted))
for index, (image, prediction) in enumerate(images_and_predictions[:4]):
    plt.subplot(2, 4, index + 5)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Prediction: %i' % prediction)
```


The original example URL:

http://scikit-learn.org/stable/auto_examples/classification/plot_digits_classification.html# Example-classification-plot-digits-classification-py
