Basic machine learning with scikit-learn (classification methods)

Last Update:2018-07-26 Source: Internet

Author: User



KNN principle:

We start with a collection of sample data, called the training sample set, in which every sample has a label; that is, we know which category each sample belongs to. When new, unlabeled data comes in, each of its features is compared with the corresponding features of the samples in the training set, and the algorithm extracts the class labels of the most similar samples (the nearest neighbors). In general, only the k most similar samples are considered, which is where the K in KNN comes from; K is usually an integer not greater than 20. Finally, the class that occurs most often among those K most similar samples is assigned to the new data.
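The procedure described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not how scikit-learn implements it: the function name `knn_predict`, the Euclidean distance, and the toy data are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Distance from the query to every labelled training sample
    # (Euclidean distance is an assumption; other metrics are possible).
    distances = [(math.dist(x, query), label) for x, label in zip(train_x, train_y)]
    # Keep only the k most similar (closest) samples.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The most frequent label among those k neighbours is the prediction.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data, made up for illustration: two well-separated groups.
train_x = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_x, train_y, [1.1, 1.0], k=3))  # a
print(knn_predict(train_x, train_y, [5.1, 5.0], k=3))  # b
```

An odd k is used here so that a two-class vote cannot end in a tie.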

Code:

# -*- coding: utf-8 -*-
from sklearn import datasets  # built-in datasets module
from sklearn.neighbors import KNeighborsClassifier  # KNN class from the sklearn.neighbors module
import numpy as np

iris = datasets.load_iris()  # load the iris dataset; iris is a dataset object with the sample data inside
iris_x = iris.data
iris_y = iris.target

# permutation takes an integer (150) and returns the integers 0-149 in random order
indices = np.random.permutation(len(iris_x))
iris_x_train = iris_x[indices[:-10]]  # 140 randomly chosen samples as the training set
iris_y_train = iris_y[indices[:-10]]  # labels of those 140 samples as the training labels
iris_x_test = iris_x[indices[-10:]]  # the remaining 10 samples as the test set
iris_y_test = iris_y[indices[-10:]]  # labels of the remaining 10 samples as the test labels

knn = KNeighborsClassifier()  # define a KNN classifier object
knn.fit(iris_x_train, iris_y_train)  # train: takes the training data and its labels
iris_y_predict = knn.predict(iris_x_test)  # predict: takes the test data
score = knn.score(iris_x_test, iris_y_test, sample_weight=None)  # compute the accuracy

print('iris_y_predict = ')
print(iris_y_predict)  # predicted labels
print('iris_y_test = ')
print(iris_y_test)  # true labels of the test set, for comparison
print('accuracy:', score)  # accuracy

SVM principle:

An SVM can be used for classification, in which case it is called SVC, or for predicting continuous values (regression), in which case it is called SVR.

Code:

from sklearn import svm

X = [[0, 0], [1, 1], [1, 0]]  # training samples
y = [0, 1, 1]  # training targets
clf = svm.SVC()
clf.fit(X, y)  # train the SVC model

result = clf.predict([[2, 2]])  # predict one test sample (predict expects a 2D array)
print(result)  # array holding the single predicted label
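For the regression case mentioned above, a minimal sketch with `svm.SVR` might look like the following. The toy data, the linear kernel, and the value of `C` are assumptions chosen for illustration; they do not come from the original article.

```python
from sklearn import svm

# Toy data, made up for illustration: targets follow y = 2 * x.
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [0.0, 2.0, 4.0, 6.0, 8.0]

# kernel="linear" and C=10.0 are illustrative choices, not recommendations.
reg = svm.SVR(kernel="linear", C=10.0)
reg.fit(X, y)  # same fit/predict interface as SVC

prediction = reg.predict([[2.5]])  # predict expects a 2D array of samples
print(prediction)  # a one-element array with a continuous value
```

Unlike SVC, which returns one of the class labels seen in training, SVR returns a real number, so it fits targets that are quantities rather than categories.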

Additionally, a trained model can be saved and loaded:

import joblib  # in older scikit-learn versions: from sklearn.externals import joblib

# Save the trained model to train_model.m
joblib.dump(clf, "train_model.m")
# Load the model
clf = joblib.load("train_model.m")
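One way to check that saving and loading worked is to compare the predictions of the original and the restored model. The sketch below assumes the standalone `joblib` package is installed and writes to a temporary directory; the names `restored` and `samples` are made up for the example.

```python
import os
import tempfile

import joblib
from sklearn import svm

X = [[0, 0], [1, 1], [1, 0]]
y = [0, 1, 1]
clf = svm.SVC()
clf.fit(X, y)

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "train_model.m")
    joblib.dump(clf, model_path)
    restored = joblib.load(model_path)

# The restored model should predict exactly what the original predicts.
samples = [[2, 2], [0, 0]]
same = list(clf.predict(samples)) == list(restored.predict(samples))
print(same)  # True
```

Model files of this kind are generally only safe to load from sources you trust, and they may not load correctly under a different scikit-learn version than the one that wrote them.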

Ensemble method (random forest) principle:

Ensemble learning solves a single prediction problem by building several models and combining them. It works by generating multiple classifiers/models that learn and make predictions independently of each other. These predictions are then combined into a single prediction, which is typically better than the prediction of any single classifier. Random forest is one kind of ensemble learning.
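The combining step can be sketched as a simple majority vote. This is an illustrative sketch only: the function name `majority_vote` and the three lists of predictions are hypothetical, and the data is contrived so that each model makes one mistake on a different sample.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model predictions into one prediction per sample.

    `predictions_per_model` holds one list of predicted labels per model,
    all of the same length (one entry per sample).
    """
    combined = []
    # zip(*...) regroups the predictions sample by sample.
    for votes in zip(*predictions_per_model):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical outputs of three independently trained classifiers on four
# samples whose true labels are [0, 1, 1, 2]; each model is wrong exactly once.
model_a = [0, 1, 1, 1]
model_b = [0, 1, 2, 2]
model_c = [1, 1, 1, 2]

print(majority_vote([model_a, model_b, model_c]))  # [0, 1, 1, 2]
```

Because the three models err on different samples, the vote recovers the true labels even though no single model does. Note that scikit-learn's RandomForestClassifier combines its trees by averaging their predicted class probabilities rather than by counting hard votes, so this shows the idea, not the library's exact procedure.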

Code:

# coding=utf-8
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
import numpy as np

# Use the iris dataset
iris = datasets.load_iris()  # iris is a dataset object with the sample data inside
iris_x = iris.data
iris_y = iris.target

# permutation takes an integer (150) and returns the integers 0-149 in random order
indices = np.random.permutation(len(iris_x))
X_train = iris_x[indices[:-10]]  # 140 randomly chosen samples as the training set
y_train = iris_y[indices[:-10]]  # labels of those 140 samples as the training labels
X_test = iris_x[indices[-10:]]  # the remaining 10 samples as the test set
y_test = iris_y[indices[-10:]]  # labels of the remaining 10 samples as the test labels

# Classifier: random forest
clfs = {'random_forest': RandomForestClassifier(n_estimators=50)}

# Fit a classifier on the training samples and print its score on the test set
def try_different_method(clf):
    clf.fit(X_train, y_train.ravel())
    score = clf.score(X_test, y_test.ravel())
    print('The score is:', score)

for clf_key in clfs.keys():
    print('The classifier is:', clf_key)
    clf = clfs[clf_key]
    try_different_method(clf)

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
