Cross-validation
Splitting the data just once, for example 70% into a training set and 30% into a test set, is called simple cross-validation. K-fold cross-validation instead divides the data into K parts, takes K-1 of them for training and the remaining part as the test set, repeats this over all folds, and selects the model with the smallest average generalization error. Concretely:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
clf = KNeighborsClassifier()
cross_val_score(clf, X, y, cv=5)
Alternatively, a CV iterator such as cv = ShuffleSplit(n_splits=5) can be created and passed as the cv argument.
An array of 5 scores, one per fold, is returned.
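Putting the two variants together, here is a minimal runnable sketch; using the iris data for X and y is my own assumption, the estimator and calls follow the text above:

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score, ShuffleSplit

X, y = load_iris(return_X_y=True)   # assumed example data
clf = KNeighborsClassifier()

# plain 5-fold cross-validation: an array of 5 scores
print(cross_val_score(clf, X, y, cv=5))

# the cv argument also accepts a CV iterator such as ShuffleSplit
cv = ShuffleSplit(n_splits=5)
print(cross_val_score(clf, X, y, cv=cv))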
Using cross-validation for hyperparameter optimization
For regularized linear models there are Ridge regression (L2 regularization) and Lasso regression (L1 regularization):
from sklearn.linear_model import Ridge, Lasso
model = Ridge()  # use the default alpha parameter, or supply an alpha value
cross_val_score(model, X, y).mean()
This gives the average cross-validated generalization score of the model.
When we have a range of candidate alpha values, GridSearchCV can find the optimal alpha automatically:
from sklearn.model_selection import GridSearchCV
gscv = GridSearchCV(Ridge(), dict(alpha=alphas), cv=3).fit(X, y)
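For example (a sketch; the diabetes data and the alpha grid are my own choices, not from the text), the result of the search can be read off best_params_ and best_score_:

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = load_diabetes(return_X_y=True)   # assumed example data
alphas = np.logspace(-3, 3, 13)         # assumed grid of regularization strengths

gscv = GridSearchCV(Ridge(), dict(alpha=alphas), cv=3).fit(X, y)
print(gscv.best_params_)   # the alpha that scored best
print(gscv.best_score_)    # its mean cross-validated score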
scikit-learn also provides estimators with built-in cross-validation, such as:
from sklearn.linear_model import RidgeCV, LassoCV
model = RidgeCV(alphas=alphas, cv=3).fit(X, y)
This gives the same result as GridSearchCV, but if you also want to validate the resulting model you still need cross_val_score, i.e. nested cross-validation:
scores = cross_val_score(RidgeCV(alphas=alphas, cv=3), X, y, cv=3)
Unsupervised learning: dimensionality reduction and visualization
We mentioned PCA above; it takes a number-of-components parameter, a whitening parameter, and so on:
from sklearn.decomposition import PCA
pca = PCA(n_components=2, whiten=True)
pca.fit(X)
This fits a PCA model on the data X, which can then be used to map new data down to the lower-dimensional space:
X_pca = pca.transform(X)
X_pca now has only 2 features, and because of whitening each feature has zero mean and a standard deviation of 1.
Data reduced to 2 or 3 dimensions is much easier to show in a plot.
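For instance, the projection can be viewed with a scatter plot (a sketch, assuming matplotlib and the class labels y from above):

import matplotlib.pyplot as plt

plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y)
plt.xlabel('first principal component')
plt.ylabel('second principal component')
plt.show()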
Manifold learning: sklearn.manifold.TSNE
This method is powerful, but it is harder to control for statistical analysis. For the digits data, where each sample has 64 features, mapping to 2 dimensions gives a useful visualization:
# Fit and transform with t-SNE
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
tsne = TSNE(n_components=2, random_state=0)
X_2d = tsne.fit_transform(X)

# Visualize the data
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)
TSNE cannot be applied to new data (there is no separate transform()), so fit_transform() must be used.
Example: eigenfaces (chaining PCA & SVMs)
For face datasets there is Labeled Faces in the Wild (http://vis-www.cs.umass.edu/lfw/), which scikit-learn offers via
sklearn.datasets.fetch_lfw_people()
but you can also use the lighter-weight Olivetti faces dataset:
from sklearn import datasets
faces = datasets.fetch_olivetti_faces()
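A quick look at what was loaded (a sketch; the shapes in the comments are what fetch_olivetti_faces returns, as far as I recall):

print(faces.data.shape)     # (400, 4096): 400 face images, 64 * 64 = 4096 pixel features each
print(faces.images.shape)   # (400, 64, 64): the same faces kept as 2-D images
print(faces.target.shape)   # (400,): integer person labels from 0 to 39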
As before, split the data into a training set and a test set:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(faces.data, faces.target, random_state=0)
Next, since the raw dimensionality is too high, we preprocess with PCA:
from sklearn import decomposition
pca = decomposition.PCA(n_components=150, whiten=True)
pca.fit(X_train)
This yields 150 principal components (pca.components_), each an image of the original size; the data will then be represented in terms of these 150 eigenfaces:
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
Both are now reduced to 150 dimensions.
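The eigenfaces themselves can be inspected by reshaping pca.components_ back to the image shape (a sketch, assuming matplotlib; showing the first 30 components is my own choice):

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(16, 6))
for i in range(30):
    ax = fig.add_subplot(3, 10, i + 1, xticks=[], yticks=[])
    ax.imshow(pca.components_[i].reshape(faces.images[0].shape), cmap=plt.cm.bone)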
Learn the classification with an SVM model:
from sklearn import svm
clf = svm.SVC(C=5., gamma=0.001)
clf.fit(X_train_pca, y_train)
Visualize the results:
import numpy as np

fig = plt.figure(figsize=(8, 6))
for i in range(15):
    ax = fig.add_subplot(3, 5, i + 1, xticks=[], yticks=[])
    ax.imshow(X_test[i].reshape(faces.images[0].shape), cmap=plt.cm.bone)
    y_pred = clf.predict(X_test_pca[i, np.newaxis])[0]
    color = 'black' if y_pred == y_test[i] else 'red'
    ax.set_title(y_pred, fontsize='small', color=color)
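A quantitative check (my own addition, not in the original text) can use the accuracy score and a per-class report:

from sklearn import metrics

y_pred = clf.predict(X_test_pca)
print(clf.score(X_test_pca, y_test))                   # overall accuracy on the test set
print(metrics.classification_report(y_test, y_pred))   # per-class precision / recall / F1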
In addition, scikit-learn provides a Pipeline that chains the two models together directly:
from sklearn.pipeline import Pipeline
clf = Pipeline([('pca', decomposition.PCA(n_components=150, whiten=True)),
                ('svm', svm.SVC(C=5., gamma=0.001))])
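The pipeline behaves like any single estimator; for instance (a sketch, reusing the settings above) it can be fit and scored directly on the raw face data:

clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))   # PCA is applied first, then the SVM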