This is a handwritten digit recognition experiment, a classic real-world scikit-learn example. The original example page (linked at the end) contains the corresponding description and code.
The dataset contains 1797 samples and ships with scikit-learn, so we can load it directly. Each sample consists of two parts: an image and a target. The image is an 8*8 pixel array, and the target is the image's category, i.e. one of the handwritten digits 0-9.
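These numbers are easy to verify directly; a minimal sketch (assuming scikit-learn is installed):

```python
from sklearn import datasets

digits = datasets.load_digits()

print(len(digits.images))          # 1797 samples
print(digits.images[0].shape)      # each image is an 8x8 array -> (8, 8)
print(sorted(set(digits.target)))  # the classes are the digits 0-9
```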
The code starts by loading the data:
```python
# Standard scientific Python imports
import matplotlib.pyplot as plt

# Import datasets, classifiers and performance metrics
from sklearn import datasets, svm, metrics

# The digits dataset
digits = datasets.load_digits()
```
After that, the first four training samples are extracted and plotted. For the usage of the enumerate function, see the following URL:
http://blog.csdn.net/suofiya2008/article/details/5603861
```python
images_and_labels = list(zip(digits.images, digits.target))
for index, (image, label) in enumerate(images_and_labels[:4]):
    plt.subplot(2, 4, index + 1)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Training: %i' % label)
```
The classifier used is the support vector classifier SVC.
For the principles behind support vector machines, take a look at this blog post:
http://www.cnblogs.com/v-July-v/archive/2012/06/01/2539022.html
Here only the gamma parameter is specified.
More optional parameters are documented at the following URL:
http://scikit-learn.org/0.13/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC
In the SVC I tried changing the kernel function; apart from kernel='sigmoid', which performs poorly, the other kernels give results that do not differ much.
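That comparison can be reproduced with a short sketch. The half-and-half split mirrors the example below, and the exact scores may vary slightly between scikit-learn versions:

```python
from sklearn import datasets, svm

digits = datasets.load_digits()
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
split = n_samples // 2

# Train on the first half, score on the second half, for each kernel
for kernel in ('rbf', 'poly', 'linear', 'sigmoid'):
    clf = svm.SVC(gamma=0.001, kernel=kernel)
    clf.fit(data[:split], digits.target[:split])
    score = clf.score(data[split:], digits.target[split:])
    print('%-8s accuracy: %.3f' % (kernel, score))
```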
```python
# To apply a classifier on this data, we need to flatten the image, to
# turn the data in a (samples, feature) matrix:
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))

# Create a classifier: a support vector classifier
classifier = svm.SVC(gamma=0.001, kernel='poly')

# We learn the digits on the first half of the digits
# (// keeps the index an integer under Python 3)
classifier.fit(data[:n_samples // 2], digits.target[:n_samples // 2])
```
Next come the training and testing sessions: all the data is divided into two parts, the first half as the training set and the second half as the test set.
```python
# Now predict the value of the digit on the second half:
expected = digits.target[n_samples // 2:]
predicted = classifier.predict(data[n_samples // 2:])

print("Classification report for classifier %s:\n%s\n"
      % (classifier, metrics.classification_report(expected, predicted)))
print("Confusion matrix:\n%s" % metrics.confusion_matrix(expected, predicted))
```
Now let's look at the evaluation output. The first part reports four quantities per class: precision, recall, F1-score and support.
The F1-score is computed from precision and recall: F1 = 2 * precision * recall / (precision + recall).
Support is the number of samples of each true class in the test set.
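To make these quantities concrete, here is a toy binary example computed by hand (the labels are made up for illustration):

```python
# Hand-computed precision/recall/F1 for the positive class of a toy example
y_true = [1, 1, 1, 1, 0, 0]   # four positive samples -> support of class 1 is 4
y_pred = [1, 1, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)                           # 3 / 4 = 0.75
recall = tp / (tp + fn)                              # 3 / 4 = 0.75
f1 = 2 * precision * recall / (precision + recall)   # 0.75
print(precision, recall, f1)
```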
The second output is the confusion matrix, which compares the classification results with the true labels. Entry (i, j) counts the samples whose true class is i and whose predicted class is j, so in scikit-learn each row corresponds to a true class, each column to a predicted class, and the values on the diagonal are the correctly classified samples. A large off-diagonal entry immediately shows which pairs of digits the classifier confuses.
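A small sketch of this on made-up labels, using scikit-learn's rows-are-true, columns-are-predicted convention:

```python
from sklearn import metrics

# Toy 3-class labels, made up for illustration
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

cm = metrics.confusion_matrix(y_true, y_pred)
print(cm)
# [[2 0 0]    row 0: both true-0 samples predicted as 0
#  [0 1 1]    row 1: one true-1 sample predicted as 1, one mislabelled as 2
#  [0 0 2]]   row 2: both true-2 samples predicted as 2
```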
Finally, a few samples from the test set can be drawn together with their predictions.
```python
images_and_predictions = list(zip(digits.images[n_samples // 2:], predicted))
for index, (image, prediction) in enumerate(images_and_predictions[:4]):
    plt.subplot(2, 4, index + 5)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Prediction: %i' % prediction)
```
The original example URL
http://scikit-learn.org/stable/auto_examples/classification/plot_digits_classification.html#example-classification-plot-digits-classification-py
Python Scikit-learn Learning notes-handwritten numerals recognition