Python scikit-learn Study Notes: Handwritten Digit Recognition

Last updated: 2015-04-28 Source: Internet



Tags: python sklearn

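This is a handwritten digit recognition experiment, a practical example of sklearn in use. The original example page contains the corresponding explanation and code.

The experiment uses 1,797 samples, which ship with sklearn's datasets module, so we can load them directly. Each sample consists of two parts: an image and a target. The image is an 8×8 picture, and the target is the class of that picture, which for us is simply the handwritten digit 0-9.

The code starts by loading the data.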

```python
# Standard scientific Python imports
import matplotlib.pyplot as plt

# Import datasets, classifiers and performance metrics
from sklearn import datasets, svm, metrics

# The digits dataset
digits = datasets.load_digits()
```
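As a quick sanity check of the figures quoted above, the following sketch inspects the loaded dataset. It only relies on the `images` and `target` attributes of the object returned by `load_digits()`; everything else is illustrative.

```python
from sklearn import datasets

digits = datasets.load_digits()

# 1797 samples, each stored as an 8x8 image
print(digits.images.shape)

# One label per image
print(digits.target.shape)

# The distinct classes are the digits 0-9
classes = sorted(set(digits.target.tolist()))
print(classes)
```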

Next, the first four training samples are taken and plotted. For how the enumerate function is used here, see:

http://blog.csdn.net/suofiya2008/article/details/5603861

```python
images_and_labels = list(zip(digits.images, digits.target))
for index, (image, label) in enumerate(images_and_labels[:4]):
    plt.subplot(2, 4, index + 1)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Training: %i' % label)
```
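For readers who have not met `enumerate` before, here is a minimal pure-Python sketch of the pattern used in the plotting loop. The names `labels`, `samples`, and `positions` and the placeholder strings are made up for illustration.

```python
labels = ['zero', 'one', 'two']

# enumerate yields (index, item) pairs, counting from 0
pairs = list(enumerate(labels))
print(pairs)  # [(0, 'zero'), (1, 'one'), (2, 'two')]

# When each item is itself a tuple, it can be unpacked in the loop header,
# just like (image, label) in the plotting code
samples = list(zip(['img_a', 'img_b'], [7, 3]))
positions = []
for index, (image, label) in enumerate(samples):
    # index + 1 mirrors how the subplot position is derived
    positions.append((index + 1, image, label))
print(positions)  # [(1, 'img_a', 7), (2, 'img_b', 3)]
```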

The classifier used is a support vector classifier, SVC.

For the theory behind support vector machines, see this blog post:

http://www.cnblogs.com/v-July-v/archive/2012/06/01/2539022.html

The original example only sets the gamma parameter here.

More optional parameters are listed at:

http://scikit-learn.org/0.13/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC

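I tried swapping the kernel function of the SVM: apart from kernel='sigmoid', which performed noticeably worse, the other kernels gave much the same results.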
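A rough sketch of how such a kernel comparison could be run is shown here. It assumes the same gamma=0.001 and the same half-and-half split as the example, and repeats the loading and flattening steps so that it is self-contained. The `scores` dictionary and the loop are my own scaffolding, and the exact accuracies will depend on the scikit-learn version.

```python
from sklearn import datasets, svm

digits = datasets.load_digits()
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
half = n_samples // 2

# Train on the first half and score on the second half, once per kernel
scores = {}
for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    classifier = svm.SVC(gamma=0.001, kernel=kernel)
    classifier.fit(data[:half], digits.target[:half])
    # score() returns the mean accuracy on the given test data
    scores[kernel] = classifier.score(data[half:], digits.target[half:])

for kernel, accuracy in scores.items():
    print('%-8s accuracy: %.3f' % (kernel, accuracy))
```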

```python
# To apply a classifier on this data, we need to flatten the images,
# turning the data into a (samples, features) matrix:
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))

# Create a classifier: a support vector classifier
classifier = svm.SVC(gamma=0.001, kernel='poly')

# We learn the digits on the first half of the digits
# (integer division, because slice indices must be ints in Python 3)
classifier.fit(data[:n_samples // 2], digits.target[:n_samples // 2])
```
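The flattening step is easier to see on a tiny array. In this sketch two made-up 2×3 "images" stand in for the 8×8 digits, and the array contents are arbitrary.

```python
import numpy as np

# Two fake 2x3 "images" standing in for the 8x8 digits
images = np.arange(12).reshape((2, 2, 3))

# -1 lets NumPy infer the feature count: 2 * 3 = 6 per sample
flat = images.reshape((len(images), -1))
print(flat.shape)  # (2, 6)
print(flat[0])     # [0 1 2 3 4 5]
```

Applied to the digits data, the same call turns the (1797, 8, 8) image array into a (1797, 64) feature matrix.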

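Then come training and testing. The example splits the data into two parts: the first half is the training set and the second half is the test set.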

```python
# Now predict the value of the digit on the second half:
expected = digits.target[n_samples // 2:]
predicted = classifier.predict(data[n_samples // 2:])

print("Classification report for classifier %s:\n%s\n"
      % (classifier, metrics.classification_report(expected, predicted)))
print("Confusion matrix:\n%s" % metrics.confusion_matrix(expected, predicted))
```
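The example splits the data by hand with slices. As an alternative that the original example does not use, the same ordered half-and-half split can probably be written with `train_test_split` from `sklearn.model_selection`, assuming a scikit-learn version that supports its `shuffle` parameter. The toy arrays `X` and `y` below are made up.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 11 samples with 2 features each, labels 0..10
X = np.arange(22).reshape((11, 2))
y = np.arange(11)

# shuffle=False keeps the original order, like plain slicing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, shuffle=False)

# Compare against the manual split used in the example
half = len(X) // 2
print(y_train.tolist() == y[:half].tolist())
print(y_test.tolist() == y[half:].tolist())
```

With an odd number of samples, both approaches should put the extra sample in the test set.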

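A few words on the evaluation output. First come the four columns of the classification report: precision, recall, f1-score, and support.

The f1-score is computed from precision and recall as their harmonic mean:

$$F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$

support is the number of samples in the test set that truly belong to each class.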
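To make the four columns concrete, here is a small pure-Python sketch that computes them for one class at a time. The helper `per_class_report` and the toy labels are hypothetical and not part of sklearn; `metrics.classification_report` does the real work in the example.

```python
def per_class_report(y_true, y_pred, label):
    """Return (precision, recall, f1, support) for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    support = tp + fn  # number of true samples of this class
    return precision, recall, f1, support

y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
for label in (0, 1, 2):
    print(label, per_class_report(y_true, y_pred, label))
```

For class 1 in this toy data, two of the three samples predicted as 1 are correct (precision 2/3) and both true 1s are found (recall 1), which gives an f1-score of 0.8 with a support of 2.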

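Next is the confusion matrix. In image classification accuracy assessment it is mainly used to compare the classification results against the ground-truth values, so that the accuracy of the classification can be read off a single matrix. It is computed by comparing the position and class of each ground-truth pixel with the corresponding position and class in the classified image. In the remote-sensing convention described here, each column represents the ground truth and each row represents the classification result. Note that metrics.confusion_matrix in scikit-learn uses the transposed layout: row i counts the samples whose true class is i, and column j counts the samples predicted as class j.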
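A tiny hand-rolled version shows how the counts are filled in. It follows the layout that scikit-learn's `metrics.confusion_matrix` uses (rows are true classes, columns are predicted classes). The helper `toy_confusion_matrix` and the labels are made up for illustration.

```python
def toy_confusion_matrix(y_true, y_pred, n_classes):
    """Count samples: rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
matrix = toy_confusion_matrix(y_true, y_pred, 3)
for row in matrix:
    print(row)
# Row 0 is [2, 1, 0]: two 0s were recognised, one was mistaken for a 1

# The row sums recover the support of each class
row_sums = [sum(row) for row in matrix]
print(row_sums)  # [3, 2, 1]
```

Correct predictions sit on the diagonal, so a good classifier produces a matrix that is close to diagonal.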

Finally, just plot a few samples from the test set and we are done.

```python
images_and_predictions = list(zip(digits.images[n_samples // 2:], predicted))
for index, (image, prediction) in enumerate(images_and_predictions[:4]):
    plt.subplot(2, 4, index + 5)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Prediction: %i' % prediction)

# Display the figure with the training samples and the predictions
plt.show()
```

Original example:

http://scikit-learn.org/stable/auto_examples/classification/plot_digits_classification.html#example-classification-plot-digits-classification-py
