Scikit-learn Machine Learning Module (Part 2)


Cross-validation

The common approach of simply splitting the data (for example 70% / 30%) into a training set and a test set is called simple cross-validation. K-fold cross-validation instead divides the data into K parts, uses K-1 of them for training and the remaining one as the test set, and selects the model with the smallest average generalization error. Concretely:

>>> from sklearn.neighbors import KNeighborsClassifier
>>> from sklearn.model_selection import cross_val_score
>>> clf = KNeighborsClassifier()
>>> cross_val_score(clf, X, y, cv=5)

Alternatively, a CV iterator such as cv=ShuffleSplit(n_splits=5) can be passed as the cv argument. An array of 5 cross-validation scores is returned, one per fold.
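For example, a minimal sketch of both variants (the iris data and KNeighborsClassifier here are illustrative assumptions, chosen only to make the snippet runnable):

# Sketch: 5-fold CV vs. a ShuffleSplit iterator (iris is only an example dataset)
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score, ShuffleSplit

X, y = load_iris(return_X_y=True)
clf = KNeighborsClassifier()

# standard 5-fold cross-validation
print(cross_val_score(clf, X, y, cv=5))

# pass a CV iterator instead of an integer: 5 random 75/25 splits
cv = ShuffleSplit(n_splits=5, test_size=0.25, random_state=0)
print(cross_val_score(clf, X, y, cv=cv))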


Using cross-validation for hyperparameter optimization

For regularized linear models there are Ridge regression (L2 regularization) and Lasso regression (L1 regularization):

from sklearn.linear_model import Ridge, Lasso
model = Ridge()  # use the default alpha parameter, or supply an alpha value

cross_val_score(model, X, y).mean() then gives the average cross-validation score.
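To see what hyperparameter search will automate, here is a rough sketch of scanning several alpha values by hand (the diabetes dataset and the alpha grid are assumptions made for illustration):

# Sketch: score Ridge for a range of alpha values manually
import numpy as np
from sklearn.datasets import load_diabetes   # example dataset, assumed
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)
alphas = np.logspace(-3, 3, 7)
for alpha in alphas:
    print(alpha, cross_val_score(Ridge(alpha=alpha), X, y, cv=3).mean())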


When we provide a series of candidate alpha values, GridSearchCV can find the optimal alpha automatically:

from sklearn.model_selection import GridSearchCV
gscv = GridSearchCV(Ridge(), dict(alpha=alphas), cv=3).fit(X, y)
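Once fitted, the search object exposes the usual GridSearchCV attributes; a short sketch of reading them back (the attribute names are the standard ones, the rest of the context is the snippet above):

# Inspect the fitted grid search
print(gscv.best_params_)             # the alpha value that scored best
print(gscv.best_score_)              # its mean cross-validation score
best_model = gscv.best_estimator_    # estimator refit on all of X, y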

Scikit-learn also provides estimators with built-in cross-validation, such as

from sklearn.linear_model import RidgeCV, LassoCV
model = RidgeCV(alphas=alphas, cv=3).fit(X, y)
This gives the same result as GridSearchCV, but to validate the resulting model you still need cross_val_score:

scores = cross_val_score(RidgeCV(alphas=alphas, cv=3), X, y, cv=3)
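As a self-contained illustration of this nested setup (the make_regression data and the alpha grid are assumptions, not part of the original example):

# Sketch: inner CV picks alpha, outer CV estimates generalization
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)
alphas = np.logspace(-3, 3, 7)
scores = cross_val_score(RidgeCV(alphas=alphas, cv=3), X, y, cv=3)
print(scores.mean())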

Unsupervised learning: dimensionality reduction and visualization

In Part 1 we already mentioned PCA, which takes parameters such as the number of components and whitening:

>>> from sklearn.decomposition import PCA
>>> pca = PCA(n_components=2, whiten=True)
>>> pca.fit(X)

This fits a PCA model on the data X, which can then be used to project new data into the reduced space:

X_pca = pca.transform(X)

X_pca now has only 2 features, and with whiten=True each component of the reduced data has zero mean and unit standard deviation.

Data reduced to 2 or 3 dimensions can be visualized much more easily in a plot.
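A small sketch of such a plot (matplotlib and the iris dataset are assumptions here, chosen only to make the example runnable):

# Sketch: scatter plot of a 2-D PCA projection
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)
X_pca = PCA(n_components=2, whiten=True).fit_transform(X)
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y)
plt.xlabel('first principal component')
plt.ylabel('second principal component')
plt.show()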


Manifold learning: sklearn.manifold.TSNE

The method is powerful, but harder to control for statistical analysis. For the digits data, whose samples have 64 features, mapping to 2-D already gives a usable visualization:

# Fit and transform with t-SNE
>>> from sklearn.manifold import TSNE
>>> tsne = TSNE(n_components=2, random_state=0)
>>> X_2d = tsne.fit_transform(X)
>>> # Visualize the data
>>> plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)

TSNE cannot transform new data, so fit_transform() has to be used.
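A self-contained version of the snippet above, assuming the 64-dimensional digits data is what is being visualized:

# Sketch: t-SNE embedding of the digits dataset
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)
X_2d = TSNE(n_components=2, random_state=0).fit_transform(X)
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)
plt.show()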


Example: eigenfaces (chaining PCA & SVMs)

1.

For face datasets there is Labeled Faces in the Wild (http://vis-www.cs.umass.edu/lfw/), which sklearn provides via

sklearn.datasets.fetch_lfw_people()
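For example (the min_faces_per_person and resize values below are illustrative choices, not prescribed by the text):

# Sketch: fetch Labeled Faces in the Wild
from sklearn.datasets import fetch_lfw_people
lfw = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
print(lfw.images.shape, lfw.target_names)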

However, you can also use the lighter-weight Olivetti faces dataset:

from sklearn import datasets
faces = datasets.fetch_olivetti_faces()

As before, split the data into a training set and a test set:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(faces.data, faces.target, random_state=0)

2.

Next, since the original dimensionality is too high, preprocess with PCA:

from sklearn import decomposition
pca = decomposition.PCA(n_components=150, whiten=True)
pca.fit(X_train)

This yields 150 principal components (pca.components_), each with the same size as an original image; the data is then represented in terms of these 150 "eigenfaces".

X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)

Both sets are now reduced to 150 dimensions.
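To get a feel for what the 150 components preserve, a sketch of reconstructing a face from its reduced representation (inverse_transform is PCA's standard reconstruction method; the plotting details are assumptions, continuing the variables above):

# Sketch: map a reduced test face back to pixel space and display it
import matplotlib.pyplot as plt
reconstructed = pca.inverse_transform(X_test_pca[:1])[0]
plt.imshow(reconstructed.reshape(faces.images[0].shape), cmap=plt.cm.bone)
plt.show()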


3.

Learn a classifier with an SVM model:

from sklearn import svm
clf = svm.SVC(C=5, gamma=0.001)
clf.fit(X_train_pca, y_train)

Visualize the results:

import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 6))
for i in range(15):  # show the first 15 test faces in a 3 x 5 grid
    ax = fig.add_subplot(3, 5, i + 1, xticks=[], yticks=[])
    ax.imshow(X_test[i].reshape(faces.images[0].shape), cmap=plt.cm.bone)
    y_pred = clf.predict(X_test_pca[i, np.newaxis])[0]
    color = 'black' if y_pred == y_test[i] else 'red'  # red title marks a misclassification
    ax.set_title(y_pred, fontsize='small', color=color)
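Beyond the visual check, a short sketch of a quantitative evaluation (classification_report and score are standard sklearn helpers; this step is not in the original walkthrough):

# Sketch: accuracy and per-class metrics on the test set
from sklearn import metrics
y_pred = clf.predict(X_test_pca)
print(clf.score(X_test_pca, y_test))                  # mean accuracy
print(metrics.classification_report(y_test, y_pred))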

In addition, sklearn provides a Pipeline mechanism that chains the two models together directly:

from sklearn.pipeline import Pipeline
clf = Pipeline([('pca', decomposition.PCA(n_components=150, whiten=True)),
                ('svm', svm.SVC(C=5, gamma=0.001))])
clf.fit(X_train, y_train)
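One benefit of the pipeline is that GridSearchCV can then tune the chained steps through the step__parameter naming convention; a minimal sketch, assuming the parameter grid values below (they are not from the original text):

# Sketch: grid-search the SVM hyperparameters through the pipeline
from sklearn.model_selection import GridSearchCV
param_grid = {'svm__C': [1, 5, 10], 'svm__gamma': [0.0001, 0.001, 0.01]}
search = GridSearchCV(clf, param_grid, cv=3).fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))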
