Cross-validation
Splitting the data just once, for example 70% into a training set and 30% into a test set, is called simple cross-validation. K-fold cross-validation instead divides the data into K parts, takes K-1 of them for training and the remaining part as the test set, repeats this over all folds, and selects the model with the smallest average generalization error. Concretely:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
clf = KNeighborsClassifier()
cross_val_score(clf, X, y, cv=5)
Alternatively, a CV iterator such as cv = ShuffleSplit(n_splits=5) can be created and passed as the cv argument.
An array of 5 scores, one per fold, is returned.
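Putting the two variants together, here is a minimal runnable sketch; using the iris data for X and y is my own assumption, the estimator and calls follow the text above:

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score, ShuffleSplit

X, y = load_iris(return_X_y=True)   # assumed example data
clf = KNeighborsClassifier()

# plain 5-fold cross-validation: an array of 5 scores
print(cross_val_score(clf, X, y, cv=5))

# the cv argument also accepts a CV iterator such as ShuffleSplit
cv = ShuffleSplit(n_splits=5)
print(cross_val_score(clf, X, y, cv=cv))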
Using cross-validation for hyperparameter optimization
For regularized linear models there are Ridge regression (L2 regularization) and Lasso regression (L1 regularization):
from sklearn.linear_model import Ridge, Lasso
model = Ridge()  # use the default alpha parameter, or supply an alpha value
cross_val_score(model, X, y).mean()
This gives the average cross-validated generalization score of the model.
When we have a range of candidate alpha values, GridSearchCV can find the optimal alpha automatically:
from sklearn.model_selection import GridSearchCV
gscv = GridSearchCV(Ridge(), dict(alpha=alphas), cv=3).fit(X, y)
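For example (a sketch; the diabetes data and the alpha grid are my own choices, not from the text), the result of the search can be read off best_params_ and best_score_:

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = load_diabetes(return_X_y=True)   # assumed example data
alphas = np.logspace(-3, 3, 13)         # assumed grid of regularization strengths

gscv = GridSearchCV(Ridge(), dict(alpha=alphas), cv=3).fit(X, y)
print(gscv.best_params_)   # the alpha that scored best
print(gscv.best_score_)    # its mean cross-validated score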
scikit-learn also provides estimators with built-in cross-validation, such as:
from sklearn.linear_model import RidgeCV, LassoCV
model = RidgeCV(alphas=alphas, cv=3).fit(X, y)
This gives the same result as GridSearchCV, but if you also want to validate the resulting model you still need cross_val_score, i.e. nested cross-validation:
scores = cross_val_score(RidgeCV(alphas=alphas, cv=3), X, y, cv=3)
Unsupervised learning: dimensionality reduction and visualization
We mentioned PCA above; it takes a number-of-components parameter, a whitening parameter, and so on:
from sklearn.decomposition import PCA
pca = PCA(n_components=2, whiten=True)
pca.fit(X)
This fits a PCA model on the data X, which can then be used to map new data down to the lower-dimensional space:
X_pca = pca.transform(X)
X_pca now has only 2 features, and because of whitening each feature has zero mean and a standard deviation of 1.
Data reduced to 2 or 3 dimensions is much easier to show in a plot.
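For instance, the projection can be viewed with a scatter plot (a sketch, assuming matplotlib and the class labels y from above):

import matplotlib.pyplot as plt

plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y)
plt.xlabel('first principal component')
plt.ylabel('second principal component')
plt.show()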
Manifold learning: sklearn.manifold.TSNE
This method is powerful, but it is harder to control for statistical analysis. For the digits data, where each sample has 64 features, mapping to 2 dimensions gives a useful visualization:
# Fit and transform with t-SNE
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
tsne = TSNE(n_components=2, random_state=0)
X_2d = tsne.fit_transform(X)

# Visualize the data
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)
TSNE cannot be applied to new data (there is no separate transform()), so fit_transform() must be used.
Example: eigenfaces (chaining PCA & SVMs)
For face datasets there is Labeled Faces in the Wild (http://vis-www.cs.umass.edu/lfw/), which scikit-learn offers via
sklearn.datasets.fetch_lfw_people()
but you can also use the lighter-weight Olivetti faces dataset:
from sklearn import datasets
faces = datasets.fetch_olivetti_faces()
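A quick look at what was loaded (a sketch; the shapes in the comments are what fetch_olivetti_faces returns, as far as I recall):

print(faces.data.shape)     # (400, 4096): 400 face images, 64 * 64 = 4096 pixel features each
print(faces.images.shape)   # (400, 64, 64): the same faces kept as 2-D images
print(faces.target.shape)   # (400,): integer person labels from 0 to 39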
As before, split the data into a training set and a test set:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(faces.data, faces.target, random_state=0)
Next, since the raw dimensionality is too high, we preprocess with PCA:
from sklearn import decomposition
pca = decomposition.PCA(n_components=150, whiten=True)
pca.fit(X_train)
This yields 150 principal components (pca.components_), each an image of the original size; the data will then be represented in terms of these 150 eigenfaces:
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
Both are now reduced to 150 dimensions.
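The eigenfaces themselves can be inspected by reshaping pca.components_ back to the image shape (a sketch, assuming matplotlib; showing the first 30 components is my own choice):

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(16, 6))
for i in range(30):
    ax = fig.add_subplot(3, 10, i + 1, xticks=[], yticks=[])
    ax.imshow(pca.components_[i].reshape(faces.images[0].shape), cmap=plt.cm.bone)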
Learn the classification with an SVM model:
from sklearn import svm
clf = svm.SVC(C=5., gamma=0.001)
clf.fit(X_train_pca, y_train)
Visualize the results:
import numpy as np

fig = plt.figure(figsize=(8, 6))
for i in range(15):
    ax = fig.add_subplot(3, 5, i + 1, xticks=[], yticks=[])
    ax.imshow(X_test[i].reshape(faces.images[0].shape), cmap=plt.cm.bone)
    y_pred = clf.predict(X_test_pca[i, np.newaxis])[0]
    color = 'black' if y_pred == y_test[i] else 'red'
    ax.set_title(y_pred, fontsize='small', color=color)
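A quantitative check (my own addition, not in the original text) can use the accuracy score and a per-class report:

from sklearn import metrics

y_pred = clf.predict(X_test_pca)
print(clf.score(X_test_pca, y_test))                   # overall accuracy on the test set
print(metrics.classification_report(y_test, y_pred))   # per-class precision / recall / F1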
In addition, scikit-learn provides a Pipeline that chains the two models together directly:
from sklearn.pipeline import Pipeline
clf = Pipeline([('pca', decomposition.PCA(n_components=150, whiten=True)),
                ('svm', svm.SVC(C=5., gamma=0.001))])
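The pipeline behaves like any single estimator; for instance (a sketch, reusing the settings above) it can be fit and scored directly on the raw face data:

clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))   # PCA is applied first, then the SVM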