Scikit-learn Machine Learning Module (PART I)


Data in scikit-learn

Data format: a 2-D array or matrix of shape [n_samples, n_features]

Built-in datasets include the iris data, the digits data, the Boston housing-price data, and the diabetes data. For example:

from sklearn.datasets import load_iris

   
   >>> iris = load_iris()  # contains iris.data and iris.target
We can call print(iris.DESCR) to view more information about the dataset.
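For instance, a minimal sketch of loading the iris data and inspecting it:

    from sklearn.datasets import load_iris

    # Load the built-in iris dataset
    iris = load_iris()

    print(iris.data.shape)    # (150, 4): [n_samples, n_features]
    print(iris.target.shape)  # (150,): one class label per sample
    print(iris.DESCR[:200])   # beginning of the dataset description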


The basic principles of machine learning in scikit-learn

Linear regression:

from sklearn.linear_model import LinearRegression
Model parameters can be set at initialization, for example:

model = LinearRegression(normalize=True)
Given training data X and y, fitting the model requires only the call:

model.fit(X, y)
In addition, you can inspect the fitted coefficients through the model's coef_ attribute.
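A minimal end-to-end sketch (the toy data below is made up for illustration; note that the normalize argument shown above has been removed from recent scikit-learn versions, so the sketch omits it):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Toy data for illustration: y = 2*x + 1 plus a little noise
    rng = np.random.RandomState(0)
    X = rng.rand(50, 1)
    y = 2 * X.ravel() + 1 + 0.1 * rng.randn(50)

    model = LinearRegression()
    model.fit(X, y)
    print(model.coef_, model.intercept_)  # roughly [2.0] and 1.0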


Nearest neighbor algorithm:

from sklearn import neighbors
The neighbors module contains the KNN model, which is created by the following call (the n_neighbors parameter sets the number of nearest neighbors):

knn = neighbors.KNeighborsClassifier(n_neighbors=1)

   
   knn.fit(X, y)
Because the KNN algorithm needs no real training phase, prediction simply finds the stored sample nearest to the given sample and classifies it accordingly:

knn.predict(x), for example with x = [[3, 5, 4, 2]]
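Putting this together on the iris data (a sketch; the sample [[3, 5, 4, 2]] is the one from the example above):

    from sklearn import neighbors
    from sklearn.datasets import load_iris

    iris = load_iris()
    knn = neighbors.KNeighborsClassifier(n_neighbors=1)
    knn.fit(iris.data, iris.target)

    # Classify a new sample by its single nearest neighbor
    x = [[3, 5, 4, 2]]
    print(iris.target_names[knn.predict(x)])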

Linear SVM classification:

from sklearn.svm import LinearSVC
LinearSVC(loss='l1')  # or 'l2'
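A minimal sketch on the iris data; note that in recent scikit-learn versions the loss values are spelled 'hinge' and 'squared_hinge' rather than 'l1'/'l2':

    from sklearn.svm import LinearSVC
    from sklearn.datasets import load_iris

    iris = load_iris()
    # 'squared_hinge' is the current name for the old 'l2' loss
    svc = LinearSVC(loss='squared_hinge')
    svc.fit(iris.data, iris.target)
    print(svc.score(iris.data, iris.target))  # mean accuracy on the training data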



As the two examples above show, estimators for different types of algorithms are assigned to a model variable; for the model to learn from the training samples, we only need to call model.fit(X, y).

For a supervised estimator, new data is predicted with model.predict(X_new).

For classification problems, some estimators provide a model.predict_proba() method, which returns the probability of each class; the most likely class corresponds to the output of model.predict().

For an unsupervised estimator, features can be transformed. An unsupervised transformation uses only statistics of the features themselves, such as the mean, standard deviation, and bounds; examples include standardization and PCA dimensionality reduction.

For example, the difference between model.transform() and model.fit_transform(X, y=None) is:

fit_transform first fits the data. The fitting here is not the kind that involves a target y; rather, it computes the relevant statistics, such as the mean and standard deviation, from the given data, and then transforms it.

transform is generally used for test data: it does no fitting, but directly applies the statistics already fitted on the training data, such as the mean and standard deviation, to process the test data.
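As a sketch with StandardScaler (the statistics are estimated on the training data only):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target)

    scaler = StandardScaler()
    # fit_transform: estimate mean/std on the training data, then scale it
    X_train_scaled = scaler.fit_transform(X_train)
    # transform: reuse the training mean/std to scale the test data
    X_test_scaled = scaler.transform(X_test)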

Other model methods can be looked up in the documentation when needed.


Data dimensionality reduction: PCA

PCA (principal component analysis) can reduce the dimensionality of the data. Taking the handwritten digits as an example:

from sklearn.decomposition import PCA

   
   >>> pca = PCA(n_components=2)  # reduce to 2 dimensions

   
   >>> proj = pca.fit_transform(digits.data)
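A fuller sketch that plots the 2-D projection of the digits (this assumes matplotlib is installed):

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA

    digits = load_digits()
    pca = PCA(n_components=2)              # reduce to 2 dimensions
    proj = pca.fit_transform(digits.data)  # shape (n_samples, 2)

    # Color each projected point by its digit label
    plt.scatter(proj[:, 0], proj[:, 1], c=digits.target, cmap='tab10', s=10)
    plt.colorbar()
    plt.show()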

   
   

Gaussian naive Bayes classification

Gaussian naive Bayes is a simple and fast classification method. If a simple, fast method is enough to produce satisfactory results, there is no need to waste CPU resources designing a more complex algorithm. It is provided as sklearn.naive_bayes.GaussianNB.

Gaussian naive Bayes fits a Gaussian to the data of each label and then makes a rough classification of the test data. Although such a fit is not very accurate for real-world data, it still performs quite well, especially on text data.

from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
train_test_split automatically and randomly splits the data into a training set and a test set:

X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target)
This algorithm is invoked in the same way as the methods above; study the parameters for concrete usage:

clf = GaussianNB()
clf.fit(X_train, y_train)
At test time:

predicted = clf.predict(X_test)
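The whole pipeline in one runnable sketch:

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    digits = load_digits()
    X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target)

    clf = GaussianNB()
    clf.fit(X_train, y_train)
    predicted = clf.predict(X_test)
    print((predicted == y_test).mean())  # fraction of correct predictions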


Quantitative analysis of the results

There are a number of mature metrics in the module sklearn.metrics:

from sklearn import metrics

   
   >>> print(metrics.classification_report(expected, predicted))
For a classification evaluation, this returns the precision, recall, f1-score, and support.

Another is the confusion matrix, which is invoked as follows:

metrics.confusion_matrix(expected, predicted)
It can help us see where each class is misclassified.
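Continuing the naive Bayes sketch above, with expected set to the known test labels:

    from sklearn import metrics

    expected = y_test  # the true labels of the test set
    print(metrics.classification_report(expected, predicted))
    # Rows are the true classes, columns the predicted classes
    print(metrics.confusion_matrix(expected, predicted))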

Sometimes we can also plot the relationship between each feature dimension and the target, and manually select useful features.


Gradient boosting tree regression

Gradient boosted trees (GBT) are a very powerful tree-based regression method.

from sklearn.ensemble import GradientBoostingRegressor
clf = GradientBoostingRegressor()

   
   clf.fit(X_train, y_train)

   
   

   
   predicted = clf.predict(X_test)
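A runnable sketch on the diabetes regression data (chosen here because the Boston housing dataset mentioned earlier has been removed from recent scikit-learn versions):

    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    data = load_diabetes()
    X_train, X_test, y_train, y_test = train_test_split(data.data, data.target)

    clf = GradientBoostingRegressor()
    clf.fit(X_train, y_train)
    predicted = clf.predict(X_test)
    print(clf.score(X_test, y_test))  # R^2 score on the test set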


In addition:

Decision tree regression:

from sklearn.tree import DecisionTreeRegressor
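It is called in the same way as the other estimators; a minimal sketch on the diabetes data:

    from sklearn.datasets import load_diabetes
    from sklearn.tree import DecisionTreeRegressor

    data = load_diabetes()
    tree = DecisionTreeRegressor(max_depth=3)  # limit depth to curb overfitting
    tree.fit(data.data, data.target)
    print(tree.predict(data.data[:5]))  # predictions for the first five samples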


To be continued...








