Python scikit-learn Study Notes: Face Recognition with PCA + SVM

Last updated: 2015-05-16. Source: the Internet

Uploaded by: User


Tags: sklearn, machine learning, python, face recognition, pca

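Face recognition is a practical technology, yet it has always felt rather mysterious to me. I came across a face recognition example in sklearn; the code is at the following URL: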

http://scikit-learn.org/0.13/auto_examples/applications/face_recognition.html#example-applications-face-recognition-py

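First, a quick look at what PCA and SVM do. PCA stands for principal component analysis. It extracts the main factors of influence from multivariate data, which exposes the underlying structure and simplifies a complex problem. The purpose of computing principal components is to project high-dimensional data into a lower-dimensional space.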

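PCA is mainly used for dimensionality reduction. Take the multidimensional feature vectors built from a set of examples: some elements of those vectors carry no discriminative power. If an element equals 1 in every example, or stays very close to 1, it contributes almost nothing when used as a feature to tell the examples apart. The goal is therefore to keep the dimensions that vary a lot, that is, those with large variance, and to drop those that barely vary. Strictly speaking, PCA does this along new orthogonal axes (linear combinations of the original features) ordered by variance, rather than along the original axes. What remains is the most informative part of the features, and the amount of computation shrinks as well.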
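The idea can be sketched with plain NumPy. The data below is hypothetical (three made-up features, one of them nearly constant around 1), and the SVD route is only one common way to compute principal components; it is not taken from the scikit-learn example itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical data: feature 0 varies a lot, feature 1 follows it closely,
# and feature 2 is almost constant (close to 1 in every sample).
f0 = rng.normal(0.0, 3.0, n)
f1 = 0.5 * f0 + rng.normal(0.0, 0.3, n)
f2 = 1.0 + rng.normal(0.0, 0.01, n)
X = np.column_stack([f0, f1, f2])

# Center the data, then take the SVD; the rows of Vt are the principal directions.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Variance captured along each principal direction, as a fraction of the total.
explained_variance = S ** 2 / (n - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Keep only the first k directions and project the data onto them.
k = 1
X_reduced = X_centered @ Vt[:k].T

print("variance per original feature:", X.var(axis=0, ddof=1))
print("explained variance ratio:", explained_ratio)
print("reduced shape:", X_reduced.shape)
```

In this toy setup a single component carries almost all of the variance, so three features collapse to one with little loss.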

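SVM stands for support vector machine, which an earlier blog post touched on. The SVM method uses a nonlinear mapping p to send the sample space into a high-dimensional, possibly infinite-dimensional, feature space, so that a problem that is not linearly separable in the original sample space becomes linearly separable in the feature space. The following blog post also explains this: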

http://blog.csdn.net/viewcode/article/details/12840405
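A tiny illustration of that mapping idea, using only the standard library. The samples and the explicit map `phi` are hypothetical and chosen for clarity; the RBF kernel used later in the example performs a mapping of this kind implicitly rather than computing it.

```python
# One-dimensional samples: label +1 when |x| > 1, otherwise -1.
xs = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0]
ys = [1 if abs(x) > 1 else -1 for x in xs]


def separable_by_threshold(values, labels):
    """True if one threshold puts one class entirely above it and the other below."""
    pos = [v for v, l in zip(values, labels) if l == 1]
    neg = [v for v, l in zip(values, labels) if l == -1]
    return max(neg) < min(pos) or max(pos) < min(neg)


def phi(x):
    """Hypothetical explicit feature map: x -> (x, x^2)."""
    return (x, x * x)


# In the original space no single threshold on x separates the two classes.
separable_before = separable_by_threshold(xs, ys)

# After the mapping, a threshold on the second coordinate (x^2) separates them,
# which is a straight line in the two-dimensional feature space.
mapped = [phi(x) for x in xs]
separable_after = separable_by_threshold([m[1] for m in mapped], ys)

print(separable_before, separable_after)
```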

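Next, the dataset used in the experiment: Labeled Faces in the Wild (LFW). The download is roughly 200 MB. It contains more than 13,000 images of about 5,700 people, roughly 1,700 of whom have two or more photos. Website: http://vis-www.cs.umass.edu/lfw/index.html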

Finally, let's walk through the code.

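    from sklearn.datasets import fetch_lfw_people

    # Download the data (if it is not already on disk) and load it as numpy arrays
    lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

    # Image shape is needed later for plotting
    n_samples, h, w = lfw_people.images.shape

    # Use the flattened pixel values as features
    X = lfw_people.data
    n_features = X.shape[1]

    # The label to predict is the id of the person
    y = lfw_people.target
    target_names = lfw_people.target_names
    n_classes = target_names.shape[0]

    print("Total dataset size:")
    print("n_samples: %d" % n_samples)
    print("n_features: %d" % n_features)
    print("n_classes: %d" % n_classes)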

This part downloads the data and prints its dimensions.
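To make the shapes concrete, here is a small sketch with a made-up array standing in for `lfw_people.images`. The sizes (6 images of 50 x 37 pixels), the labels, and the names are assumptions for illustration only, not values read from the dataset.

```python
import numpy as np

# Hypothetical stand-in for lfw_people.images: 6 grayscale images of 50 x 37 pixels.
images = np.arange(6 * 50 * 37, dtype=np.float32).reshape(6, 50, 37)
n_samples, h, w = images.shape

# Flattening each image row by row gives one feature per pixel,
# which is the layout of the data array used as X.
data = images.reshape(n_samples, h * w)
n_features = data.shape[1]

# Hypothetical integer labels and the names they index into.
target = np.array([0, 1, 1, 2, 0, 2])
target_names = np.array(["Person A", "Person B", "Person C"])
n_classes = target_names.shape[0]

# Reshaping a feature row with (h, w) recovers the image, which is why
# h and w are kept around for plotting.
restored = data[0].reshape(h, w)

print(n_samples, n_features, n_classes)
```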

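    from time import time
    from sklearn.cross_validation import train_test_split
    from sklearn.decomposition import RandomizedPCA

    # The article omits the train/test split; a 75/25 split is assumed here,
    # following the linked scikit-learn example.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

    n_components = 150

    print("Extracting the top %d eigenfaces from %d faces" % (
        n_components, X_train.shape[0]))
    t0 = time()
    pca = RandomizedPCA(n_components=n_components, whiten=True).fit(X_train)
    print("done in %0.3fs" % (time() - t0))

    # Each principal component, reshaped to image size, is an "eigenface"
    eigenfaces = pca.components_.reshape((n_components, h, w))

    print("Projecting the input data on the eigenfaces orthonormal basis")
    t0 = time()
    X_train_pca = pca.transform(X_train)
    X_test_pca = pca.transform(X_test)
    print("done in %0.3fs" % (time() - t0))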

This part calls the PCA algorithm. The PCA reference is at:

http://scikit-learn.org/0.13/modules/generated/sklearn.decomposition.RandomizedPCA.html#sklearn.decomposition.RandomizedPCA
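What `whiten=True` and the eigenface reshape do can be sketched with NumPy alone. This is a hand-rolled approximation on random data with a hypothetical 8 x 6 image size; it is not the RandomizedPCA implementation, and the exact normalization constant used by the library may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w = 8, 6                       # hypothetical tiny image size
n_samples, n_components = 40, 5

# Random "images" whose pixels have different scales.
scales = rng.uniform(0.5, 5.0, size=h * w)
X_train = rng.normal(size=(n_samples, h * w)) * scales

# Exact PCA through the SVD of the centered data.
mean = X_train.mean(axis=0)
U, S, Vt = np.linalg.svd(X_train - mean, full_matrices=False)

# The leading rows of Vt form an orthonormal basis; reshaped, they are eigenfaces.
components = Vt[:n_components]
eigenfaces = components.reshape((n_components, h, w))

# Plain projection onto the basis.
projected = (X_train - mean) @ components.T

# Whitening: divide each projected coordinate by its standard deviation,
# so every retained component ends up with unit variance.
std = S[:n_components] / np.sqrt(n_samples - 1)
whitened = projected / std

print(eigenfaces.shape)
print(whitened.var(axis=0, ddof=1))
```

Putting all components on the same scale is useful before an RBF-kernel SVM, since that kernel depends on Euclidean distances between the projected vectors.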

    from sklearn.grid_search import GridSearchCV
    from sklearn.svm import SVC

    print("Fitting the classifier to the training set")
    t0 = time()
    # Candidate values for the penalty C and the RBF kernel coefficient gamma
    param_grid = {
        'C': [1e3, 5e3, 1e4, 5e4, 1e5],
        'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1],
    }
    # class_weight='auto' weights classes inversely to their frequency
    clf = GridSearchCV(SVC(kernel='rbf', class_weight='auto'), param_grid)
    clf = clf.fit(X_train_pca, y_train)
    print("done in %0.3fs" % (time() - t0))
    print("Best estimator found by grid search:")
    print(clf.best_estimator_)

Note that relatively little data is actually used for training.

This part calls the SVM algorithm and uses a grid search to find the best values of the parameters C and gamma. The SVM reference is at:

http://scikit-learn.org/0.13/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC
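The grid above can be unpacked with the standard library to see what is being searched. The sketch only enumerates the candidates and evaluates the RBF kernel on two made-up points; it does not train anything, and the helper `rbf_kernel` is written here for illustration rather than imported from scikit-learn.

```python
import itertools
import math

param_grid = {
    'C': [1e3, 5e3, 1e4, 5e4, 1e5],
    'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1],
}


def rbf_kernel(x, y, gamma):
    """RBF kernel value: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Every (C, gamma) combination is one candidate model for the grid search.
keys = sorted(param_grid)
candidates = [dict(zip(keys, values))
              for values in itertools.product(*(param_grid[k] for k in keys))]
print(len(candidates))  # 30

# Two hypothetical points at squared distance 25: a larger gamma makes the
# kernel decay faster, so the same pair of points looks less similar.
x, y = (1.0, 2.0), (4.0, 6.0)
similarities = [rbf_kernel(x, y, g) for g in param_grid['gamma']]
for g, s in zip(param_grid['gamma'], similarities):
    print(g, s)
```

A grid search fits each candidate with cross-validation and keeps the one with the best mean score, so the cost grows with the product of the list lengths.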

    import pylab as pl
    from sklearn.metrics import classification_report, confusion_matrix

    print("Predicting the people names on the testing set")
    t0 = time()
    y_pred = clf.predict(X_test_pca)
    print("done in %0.3fs" % (time() - t0))

    print(classification_report(y_test, y_pred, target_names=target_names))
    print(confusion_matrix(y_test, y_pred, labels=range(n_classes)))


    def plot_gallery(images, titles, h, w, n_row=3, n_col=4):
        """Helper function to plot a gallery of portraits"""
        pl.figure(figsize=(1.8 * n_col, 2.4 * n_row))
        pl.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.35)
        for i in range(n_row * n_col):
            pl.subplot(n_row, n_col, i + 1)
            pl.imshow(images[i].reshape((h, w)), cmap=pl.cm.gray)
            pl.title(titles[i], size=12)
            pl.xticks(())
            pl.yticks(())


    def title(y_pred, y_test, target_names, i):
        # Keep only the last word of each name so the titles stay short
        pred_name = target_names[y_pred[i]].rsplit(' ', 1)[-1]
        true_name = target_names[y_test[i]].rsplit(' ', 1)[-1]
        return 'predicted: %s\ntrue:      %s' % (pred_name, true_name)

    # Plot the predictions on a portion of the test set
    prediction_titles = [title(y_pred, y_test, target_names, i)
                         for i in range(y_pred.shape[0])]
    plot_gallery(X_test, prediction_titles, h, w)

    # Plot the gallery of the most significant eigenfaces
    eigenface_titles = ["eigenface %d" % i for i in range(eigenfaces.shape[0])]
    plot_gallery(eigenfaces, eigenface_titles, h, w)

    pl.show()

What remains is to run the model on the test set and plot the recognition results. Two figures are produced:

One shows the recognition results and the other shows the eigenfaces. Go and give it a try!
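For readers who want to see what the evaluation step reports before running the full pipeline, here is a minimal standard-library sketch. The names and label lists are made up, and `confusion` is a hand-written helper that follows the usual convention of rows for true labels and columns for predicted labels.

```python
# Hypothetical names and labels, for illustration only.
target_names = ["Person Alpha", "Person Beta", "Person Gamma"]
y_test = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]
n_classes = len(target_names)


def confusion(y_true, y_pred, n_classes):
    """Count matrix with rows = true class, columns = predicted class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def title(y_pred, y_test, target_names, i):
    """Same labelling idea as the plotting code: keep only the last word of a name."""
    pred_name = target_names[y_pred[i]].rsplit(' ', 1)[-1]
    true_name = target_names[y_test[i]].rsplit(' ', 1)[-1]
    return 'predicted: %s\ntrue:      %s' % (pred_name, true_name)


cm = confusion(y_test, y_pred, n_classes)

# Correct predictions sit on the diagonal of the matrix.
accuracy = sum(cm[i][i] for i in range(n_classes)) / float(len(y_test))

for row in cm:
    print(row)
print("accuracy: %.3f" % accuracy)
print(title(y_pred, y_test, target_names, 1))
```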
