Machine Learning (IV): Classification with the k-Nearest Neighbors Algorithm (KNN)

Source: Internet
Author: User

I. Fundamentals of the k-Nearest Neighbors Algorithm

KNN stands for k-Nearest Neighbors.

The idea is extremely simple.

It requires very little mathematics (close to none).

It works well in practice (what are its drawbacks?).

It illustrates many of the details that come up when using a machine learning algorithm.

It gives a fairly complete picture of the workflow of a machine learning application.

Implementing our own KNN

Create a simple test case:

    import numpy as np
    import matplotlib.pyplot as plt

    raw_data_X = [[3.393533211, 2.331273381],
                  [3.110073483, 1.781539638],
                  [1.343808831, 3.368360954],
                  [3.582294042, 4.679179110],
                  [2.280362439, 2.866990263],
                  [7.423436942, 4.696522875],
                  [5.745051997, 3.533989803],
                  [9.172168622, 2.511101045],
                  [7.792783481, 3.424088941],
                  [7.939820817, 0.791637231]]
    raw_data_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

    X_train = np.array(raw_data_X)
    y_train = np.array(raw_data_y)

    X_train
    # Output:
    # array([[3.39353321, 2.33127338],
    #        [3.11007348, 1.78153964],
    #        [1.34380883, 3.36836095],
    #        [3.58229404, 4.67917911],
    #        [2.28036244, 2.86699026],
    #        [7.42343694, 4.69652288],
    #        [5.745052  , 3.5339898 ],
    #        [9.17216862, 2.51110105],
    #        [7.79278348, 3.42408894],
    #        [7.93982082, 0.79163723]])

    y_train
    # Output:
    # array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

The KNN process

    from math import sqrt

    # x is the new sample to classify (a NumPy array with two features).
    # Compute the Euclidean distance from x to every training sample.
    distances = []
    for x_train in X_train:
        d = sqrt(np.sum((x_train - x) ** 2))
        distances.append(d)

    distances
    # Output:
    # [4.812566907609877, 5.229270827235305, 6.749798999160064,
    #  4.6986266144110695, 5.83460014556857, 1.4900114024329525,
    #  2.354574897431513, 1.3761132675144652, 0.3064319992975,
    #  2.5786840957478887]

    # The same computation written as a list comprehension
    # (it produces the same ten distances).
    distances = [sqrt(np.sum((x_train - x) ** 2))
                 for x_train in X_train]

    # Indexes of the training samples, sorted from nearest to farthest.
    np.argsort(distances)
    # Output:
    # array([8, 7, 5, 6, 9, 3, 0, 1, 4, 2])

    nearest = np.argsort(distances)
    k = 6

    # Labels of the k nearest training samples.
    topK_y = [y_train[neighbor] for neighbor in nearest[:k]]
    topK_y
    # Output:
    # [1, 1, 1, 1, 1, 0]

    # Majority vote among the k labels.
    from collections import Counter
    votes = Counter(topK_y)
    votes
    # Output:
    # Counter({0: 1, 1: 5})

    votes.most_common(1)
    # Output:
    # [(1, 5)]

    predict_y = votes.most_common(1)[0][0]
    predict_y
    # Output:
    # 1
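The walkthrough above uses a query point x whose definition is not shown. As a minimal, self-contained sketch of the same steps, the block below assumes x = (8.093607318, 3.365731514); this value is an assumption, picked because it is consistent with the distances listed above. It also replaces the Python loop with NumPy broadcasting, which is a stylistic choice rather than part of the original walkthrough.

```python
import numpy as np
from collections import Counter

X_train = np.array([[3.393533211, 2.331273381],
                    [3.110073483, 1.781539638],
                    [1.343808831, 3.368360954],
                    [3.582294042, 4.679179110],
                    [2.280362439, 2.866990263],
                    [7.423436942, 4.696522875],
                    [5.745051997, 3.533989803],
                    [9.172168622, 2.511101045],
                    [7.792783481, 3.424088941],
                    [7.939820817, 0.791637231]])
y_train = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# ASSUMPTION: the query point is not given in the text; this value is
# chosen to be consistent with the distances listed in the walkthrough.
x = np.array([8.093607318, 3.365731514])

# Broadcasting subtracts x from every row of X_train; summing the squared
# differences along axis 1 gives one squared distance per training sample.
distances = np.sqrt(np.sum((X_train - x) ** 2, axis=1))

k = 6
nearest = np.argsort(distances)[:k]          # indexes of the k closest samples
votes = Counter(y_train[nearest].tolist())   # count the labels among them
predict_y = votes.most_common(1)[0][0]       # the majority label wins

print(nearest)
print(predict_y)
```

With k = 6, five of the six nearest neighbors carry label 1, so the majority vote assigns the query point to class 1.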

II. How Machine Learning Algorithms Are Encapsulated in Scikit-learn
knn/knnn.py

    import numpy as np
    from math import sqrt
    from collections import Counter


    class KNNClassifier:

        def __init__(self, k):
            """Initialize the KNN classifier."""
            assert k >= 1, "k must be valid"
            self.k = k
            self._X_train = None
            self._y_train = None

        def fit(self, X_train, y_train):
            """Train the KNN classifier on the training data set X_train and y_train."""
            assert X_train.shape[0] == y_train.shape[0], \
                "the size of X_train must be equal to the size of y_train"
            assert self.k <= X_train.shape[0], \
                "the size of X_train must be at least k"

            # KNN has no real training step: it only stores the training data.
            self._X_train = X_train
            self._y_train = y_train
            return self

        def predict(self, X_predict):
            """Given the data set X_predict, return the vector of predicted results."""
            assert self._X_train is not None and self._y_train is not None, \
                "must fit before predict!"
            assert X_predict.shape[1] == self._X_train.shape[1], \
                "the feature number of X_predict must be equal to X_train"

            y_predict = [self._predict(x) for x in X_predict]
            return np.array(y_predict)

        def _predict(self, x):
            """Given a single sample x, return its predicted result."""
            assert x.shape[0] == self._X_train.shape[1], \
                "the feature number of x must be equal to X_train"

            distances = [sqrt(np.sum((x_train - x) ** 2))
                         for x_train in self._X_train]
            nearest = np.argsort(distances)

            topK_y = [self._y_train[i] for i in nearest[:self.k]]
            votes = Counter(topK_y)

            return votes.most_common(1)[0][0]

        def __repr__(self):
            return "KNN(k=%d)" % self.k
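The class above follows the fit/predict pattern used by scikit-learn estimators. For comparison, here is a hedged sketch of the same prediction made with scikit-learn's own KNeighborsClassifier. It assumes scikit-learn is installed, and the query point x is an assumed value (it is not defined in the text), chosen to be consistent with the distances listed in the first section.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[3.393533211, 2.331273381],
                    [3.110073483, 1.781539638],
                    [1.343808831, 3.368360954],
                    [3.582294042, 4.679179110],
                    [2.280362439, 2.866990263],
                    [7.423436942, 4.696522875],
                    [5.745051997, 3.533989803],
                    [9.172168622, 2.511101045],
                    [7.792783481, 3.424088941],
                    [7.939820817, 0.791637231]])
y_train = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# ASSUMPTION: query point not given in the text; value chosen for illustration.
x = np.array([8.093607318, 3.365731514])

# n_neighbors plays the role of k in the hand-written classifier.
knn_clf = KNeighborsClassifier(n_neighbors=6)
knn_clf.fit(X_train, y_train)

# predict expects a 2D array (one row per sample), so reshape the single point.
y_predict = knn_clf.predict(x.reshape(1, -1))
print(y_predict[0])
```

Like the hand-written predict method, scikit-learn's predict takes a matrix of samples and returns a vector of labels, which is why the single point is reshaped to one row.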

knn_function/knn.py

    import numpy as np
    from math import sqrt
    from collections import Counter


    def knn_classify(k, X_train, y_train, x):
        assert 1 <= k <= X_train.shape[0], "k must be valid"
        assert X_train.shape[0] == y_train.shape[0], \
            "the size of X_train must be equal to the size of y_train"
        assert X_train.shape[1] == x.shape[0], \
            "the feature number of x must be equal to X_train"

        # Euclidean distance from x to every training sample.
        distances = [sqrt(np.sum((x_train - x) ** 2)) for x_train in X_train]
        nearest = np.argsort(distances)

        # Majority vote among the labels of the k nearest samples.
        topK_y = [y_train[i] for i in nearest[:k]]
        votes = Counter(topK_y)

        return votes.most_common(1)[0][0]

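III. Training Data Sets and Test Data Sets

To judge the performance of a machine learning algorithm, split the data into a training data set, used to fit the model, and a test data set, used to evaluate it.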

playml/knn.py

    import numpy as np
    from math import sqrt
    from collections import Counter


    class KNNClassifier:

        def __init__(self, k):
            """Initialize the KNN classifier."""
            assert k >= 1, "k must be valid"
            self.k = k
            self._X_train = None
            self._y_train = None

        def fit(self, X_train, y_train):
            """Train the KNN classifier on the training data set X_train and y_train."""
            assert X_train.shape[0] == y_train.shape[0], \
                "the size of X_train must be equal to the size of y_train"
            assert self.k <= X_train.shape[0], \
                "the size of X_train must be at least k"

            self._X_train = X_train
            self._y_train = y_train
            return self

        def predict(self, X_predict):
            """Given the data set X_predict, return the vector of predicted results."""
            assert self._X_train is not None and self._y_train is not None, \
                "must fit before predict!"
            assert X_predict.shape[1] == self._X_train.shape[1], \
                "the feature number of X_predict must be equal to X_train"

            y_predict = [self._predict(x) for x in X_predict]
            return np.array(y_predict)

        def _predict(self, x):
            """Given a single sample x, return its predicted result."""
            assert x.shape[0] == self._X_train.shape[1], \
                "the feature number of x must be equal to X_train"

            distances = [sqrt(np.sum((x_train - x) ** 2))
                         for x_train in self._X_train]
            nearest = np.argsort(distances)

            topK_y = [self._y_train[i] for i in nearest[:self.k]]
            votes = Counter(topK_y)

            return votes.most_common(1)[0][0]

        def __repr__(self):
            return "KNN(k=%d)" % self.k

playml/model_selection.py

    import numpy as np


    def train_test_split(X, y, test_ratio=0.2, seed=None):
        """Split the data X and y into X_train, X_test, y_train, y_test according to test_ratio."""
        assert X.shape[0] == y.shape[0], \
            "the size of X must be equal to the size of y"
        assert 0.0 <= test_ratio <= 1.0, \
            "test_ratio must be valid"

        # Compare against None so that seed=0 is also honored.
        if seed is not None:
            np.random.seed(seed)

        # Shuffle the indexes rather than the data, so X and y stay aligned.
        shuffled_indexes = np.random.permutation(len(X))

        test_size = int(len(X) * test_ratio)
        test_indexes = shuffled_indexes[:test_size]
        train_indexes = shuffled_indexes[test_size:]

        X_train = X[train_indexes]
        y_train = y[train_indexes]

        X_test = X[test_indexes]
        y_test = y[test_indexes]

        return X_train, X_test, y_train, y_test
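To show how the split is meant to be used, the sketch below fits KNN on the training portion and measures accuracy on the held-out test portion. It is self-contained, so it restates compact versions of the split and the classifier rather than importing the playml modules. The synthetic two-cluster data, the helper name knn_predict, and the values k = 3 and seed = 666 are all assumptions made for illustration.

```python
import numpy as np
from collections import Counter


def train_test_split(X, y, test_ratio=0.2, seed=None):
    """Compact restatement of the split: shuffle indexes, then slice."""
    if seed is not None:
        np.random.seed(seed)
    shuffled = np.random.permutation(len(X))
    test_size = int(len(X) * test_ratio)
    test_idx, train_idx = shuffled[:test_size], shuffled[test_size:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


def knn_predict(k, X_train, y_train, X_test):
    """Hypothetical helper: predict a label for each row of X_test."""
    predictions = []
    for x in X_test:
        distances = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
        nearest = np.argsort(distances)[:k]
        votes = Counter(y_train[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)


# ASSUMPTION: synthetic data, two Gaussian clusters of 50 points each.
rng = np.random.RandomState(42)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)),
               rng.normal(5.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_ratio=0.2, seed=666)

y_predict = knn_predict(3, X_train, y_train, X_test)

# Accuracy: the fraction of test samples whose predicted label is correct.
accuracy = np.sum(y_predict == y_test) / len(y_test)

print(X_train.shape, X_test.shape)
print(accuracy)
```

Because the test samples never take part in fitting, the accuracy measured on them estimates how the classifier behaves on unseen data.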

playml/__init__.py
