Basic machine learning with scikit-learn (classification methods)

Source: Internet
Author: User
Tags: svm

1.

KNN principle:

We start with a collection of sample data, also called the training sample set, in which every sample has a label; that is, we know which category each sample belongs to. When new, unlabeled data comes in, each of its features is compared with the corresponding features of the samples in the training set, and the algorithm extracts the category labels of the most similar samples (the nearest neighbors). In general only the k most similar samples are considered, which is where the K in KNN comes from; K is usually an integer not greater than 20. Finally, the category that occurs most often among those K most similar samples is assigned to the new data.
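
The procedure described above can be sketched in a few lines of plain Python. This is only a minimal illustration, assuming Euclidean distance as the similarity measure and a simple majority vote; the function name knn_predict and the toy data are made up for this example and are not part of scikit-learn.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Euclidean distance from the query to every training sample (an assumption;
    # other distance measures are possible)
    distances = [(math.dist(x, query), label) for x, label in zip(train_x, train_y)]
    # Keep only the k closest samples
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The most frequent label among them is the prediction
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data, made up for illustration: two well-separated clusters in 2-D
train_x = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_x, train_y, (1.1, 1.0), k=3))
print(knn_predict(train_x, train_y, (4.9, 5.1), k=3))
```

The first query lies in the cluster labeled "A" and the second in the cluster labeled "B", so the three nearest neighbors agree in both cases.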


Code:

# -*- coding: utf-8 -*-
from sklearn import datasets  # built-in datasets module
from sklearn.neighbors import KNeighborsClassifier  # the KNN class in sklearn.neighbors
import numpy as np

iris = datasets.load_iris()  # load the iris dataset; iris holds the sample data
iris_x = iris.data
iris_y = iris.target

# permutation takes an integer (150) and returns the integers 0-149 in random order
indices = np.random.permutation(len(iris_x))
iris_x_train = iris_x[indices[:-10]]  # 140 randomly chosen samples as the training set
iris_y_train = iris_y[indices[:-10]]  # labels of those 140 samples
iris_x_test = iris_x[indices[-10:]]  # the remaining 10 samples as the test set
iris_y_test = iris_y[indices[-10:]]  # labels of the 10 test samples

knn = KNeighborsClassifier()  # define a KNN classifier object
knn.fit(iris_x_train, iris_y_train)  # train on the training data and its labels
iris_y_predict = knn.predict(iris_x_test)  # predict labels for the test data
score = knn.score(iris_x_test, iris_y_test, sample_weight=None)  # accuracy on the test set

print('iris_y_predict = ')
print(iris_y_predict)  # predicted labels
print('iris_y_test = ')
print(iris_y_test)  # true labels of the test set, for comparison
print('accuracy:', score)  # accuracy
 

2.

SVM principle:

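A support vector machine (SVM) can be used for classification, in which case it is called SVC (support vector classification), or for predicting continuous values, i.e. regression, in which case it is called SVR (support vector regression).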
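
To make the SVC/SVR distinction concrete, here is a small sketch using scikit-learn's SVC and SVR classes. The data is made up for illustration, and the linear kernel and C=100.0 are choices made for this example only, not recommendations.

```python
from sklearn.svm import SVC, SVR

# Classification: the targets are discrete class labels
X_cls = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_cls = [0, 0, 0, 1, 1, 1]
clf = SVC(kernel="linear")
clf.fit(X_cls, y_cls)
cls_pred = clf.predict([[0.5, 0.5], [5.5, 5.5]])
print("SVC prediction:", cls_pred)

# Regression: the targets are continuous values (here y = 2x + 1)
X_reg = [[0], [1], [2], [3], [4], [5]]
y_reg = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
reg = SVR(kernel="linear", C=100.0)
reg.fit(X_reg, y_reg)
reg_pred = reg.predict([[6]])
print("SVR prediction:", reg_pred)
```

SVC returns one of the class labels seen during training, while SVR returns a real number. Because SVR tolerates errors smaller than its epsilon parameter, its prediction for x = 6 should be close to, but not necessarily exactly, 13.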


Code:

from sklearn import svm

X = [[0, 0], [1, 1], [1, 0]]  # training samples
y = [0, 1, 1]  # training targets
clf = svm.SVC()
clf.fit(X, y)  # train the SVC model

result = clf.predict([[2, 2]])  # predict a test sample (input must be 2-D)
print(result)  # the predicted value

A trained model can also be saved to disk and loaded back:

import joblib  # older scikit-learn versions exposed this as sklearn.externals.joblib

# Save the trained model to train_model.m
joblib.dump(clf, "train_model.m")
# Load the model
clf = joblib.load("train_model.m")

3.

Ensemble method (random forest) principle:

Ensemble learning solves a single prediction problem by combining several models. It works by building multiple classifiers/models that learn and make predictions independently of one another. These predictions are then combined into a single prediction, which is usually better than the prediction of any single classifier. Random forest is one kind of ensemble learning.
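
The idea of combining independent predictions can be sketched as a simple majority ("hard") vote. This is an illustration of the general principle only: the function majority_vote and the three model outputs are hypothetical, and scikit-learn's random forest actually averages the class probabilities of its trees rather than counting hard votes.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by majority vote."""
    combined = []
    # zip(*predictions) groups the votes of all models for each sample
    for sample_votes in zip(*predictions):
        combined.append(Counter(sample_votes).most_common(1)[0][0])
    return combined


# Hypothetical outputs of three independently trained classifiers on four samples
model_a = [0, 1, 1, 2]
model_b = [0, 1, 2, 2]
model_c = [1, 1, 1, 2]

combined = majority_vote([model_a, model_b, model_c])
print(combined)  # [0, 1, 1, 2]
```

Each model is wrong on at most one sample here, yet the combined vote follows the majority on every sample, which is the effect an ensemble relies on.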

Code:

# coding=utf-8
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
import numpy as np

# Use the iris dataset
iris = datasets.load_iris()  # load the iris dataset; iris holds the sample data
iris_x = iris.data
iris_y = iris.target

# permutation takes an integer (150) and returns the integers 0-149 in random order
indices = np.random.permutation(len(iris_x))
X_train = iris_x[indices[:-10]]  # 140 randomly chosen samples as the training set
y_train = iris_y[indices[:-10]]  # labels of those 140 samples
X_test = iris_x[indices[-10:]]  # the remaining 10 samples as the test set
y_test = iris_y[indices[-10:]]  # labels of the 10 test samples

# Classifier: random forest
clfs = {'random_forest': RandomForestClassifier(n_estimators=50)}

# Fit the classifier on the training samples and print its score on the test set
def try_different_method(clf):
    clf.fit(X_train, y_train.ravel())
    score = clf.score(X_test, y_test.ravel())
    print('The score is:', score)

for clf_key in clfs.keys():
    print('The classifier is:', clf_key)
    clf = clfs[clf_key]
    try_different_method(clf)


