I. Basics of the k-Nearest Neighbors Algorithm
KNN: the k-nearest neighbors algorithm (k-Nearest Neighbors)
The idea is extremely simple
Requires very little mathematics (almost none)
Works well in practice (drawbacks?)
Illustrates many of the details that come up when using a machine learning algorithm
Gives a fairly complete picture of the workflow of a machine learning application
    import numpy as np
    import matplotlib.pyplot as plt

Implementing our own KNN

Create a simple test case:

    raw_data_X = [[3.393533211, 2.331273381],
                  [3.110073483, 1.781539638],
                  [1.343808831, 3.368360954],
                  [3.582294042, 4.679179110],
                  [2.280362439, 2.866990263],
                  [7.423436942, 4.696522875],
                  [5.745051997, 3.533989803],
                  [9.172168622, 2.511101045],
                  [7.792783481, 3.424088941],
                  [7.939820817, 0.791637231]]
    raw_data_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

    X_train = np.array(raw_data_X)
    y_train = np.array(raw_data_y)

    X_train
    # array([[3.39353321, 2.33127338],
    #        [3.11007348, 1.78153964],
    #        [1.34380883, 3.36836095],
    #        [3.58229404, 4.67917911],
    #        [2.28036244, 2.86699026],
    #        [7.42343694, 4.69652288],
    #        [5.745052  , 3.5339898 ],
    #        [9.17216862, 2.51110105],
    #        [7.79278348, 3.42408894],
    #        [7.93982082, 0.79163723]])

    y_train
    # array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
The KNN process
    from math import sqrt

    # x: the new sample to classify, a 1-D NumPy array with 2 features
    # (assumed to be defined beforehand).
    # Euclidean distance from x to every training sample.
    distances = []
    for x_train in X_train:
        d = sqrt(np.sum((x_train - x) ** 2))
        distances.append(d)

    distances
    # [4.812566907609877, 5.229270827235305, 6.749798999160064,
    #  4.6986266144110695, 5.83460014556857, 1.4900114024329525,
    #  2.354574897431513, 1.3761132675144652, 0.3064319992975,
    #  2.5786840957478887]

The same computation as a list comprehension:

    distances = [sqrt(np.sum((x_train - x) ** 2)) for x_train in X_train]

    distances
    # [4.812566907609877, 5.229270827235305, 6.749798999160064,
    #  4.6986266144110695, 5.83460014556857, 1.4900114024329525,
    #  2.354574897431513, 1.3761132675144652, 0.3064319992975,
    #  2.5786840957478887]

    # Indexes of the training samples, sorted from nearest to farthest.
    np.argsort(distances)
    # array([8, 7, 5, 6, 9, 3, 0, 1, 4, 2])

    nearest = np.argsort(distances)
    k = 6

    # Labels of the k nearest samples.
    topk_y = [y_train[neighbor] for neighbor in nearest[:k]]
    topk_y
    # [1, 1, 1, 1, 1, 0]

    from collections import Counter

    # Majority vote among the k labels.
    votes = Counter(topk_y)
    votes
    # Counter({0: 1, 1: 5})

    votes.most_common(1)
    # [(1, 5)]

    predict_y = votes.most_common(1)[0][0]
    predict_y
    # 1
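The per-sample loop above can also be written with NumPy broadcasting. The following is only a sketch of that idea, not the author's code: the query point `x` is an assumption (it is not defined in the text above; the value used here was chosen because it reproduces the listed distances), and `np.linalg.norm` is swapped in for the explicit `sqrt(np.sum(...))`.

```python
import numpy as np
from collections import Counter

X_train = np.array([
    [3.393533211, 2.331273381],
    [3.110073483, 1.781539638],
    [1.343808831, 3.368360954],
    [3.582294042, 4.679179110],
    [2.280362439, 2.866990263],
    [7.423436942, 4.696522875],
    [5.745051997, 3.533989803],
    [9.172168622, 2.511101045],
    [7.792783481, 3.424088941],
    [7.939820817, 0.791637231],
])
y_train = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# Assumed query point (not given in the text above).
x = np.array([8.093607318, 3.365731514])

# Broadcasting subtracts x from every row of X_train; the norm along
# axis=1 yields one Euclidean distance per training sample.
distances = np.linalg.norm(X_train - x, axis=1)

k = 6
nearest = np.argsort(distances)[:k]          # indexes of the k closest samples
votes = Counter(y_train[nearest].tolist())   # count labels among them
predicted = votes.most_common(1)[0][0]       # label with the most votes

print(predicted)  # prints 1
```

For ten samples the difference is negligible; the vectorized form mainly matters when the training set is large.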
II. How scikit-learn encapsulates machine learning algorithms
knn/knnn.py
    import numpy as np
    from math import sqrt
    from collections import Counter


    class KNNClassifier:

        def __init__(self, k):
            """Initialize the KNN classifier."""
            assert k >= 1, "k must be valid"
            self.k = k
            self._X_train = None
            self._y_train = None

        def fit(self, X_train, y_train):
            """Train the KNN classifier on the training data X_train and y_train."""
            assert X_train.shape[0] == y_train.shape[0], \
                "the size of X_train must be equal to the size of y_train"
            assert self.k <= X_train.shape[0], \
                "the size of X_train must be at least k"

            # KNN has no real training step: it only stores the data.
            self._X_train = X_train
            self._y_train = y_train
            return self

        def predict(self, X_predict):
            """Given the data set X_predict, return the vector of predicted results."""
            assert self._X_train is not None and self._y_train is not None, \
                "must fit before predict!"
            assert X_predict.shape[1] == self._X_train.shape[1], \
                "the feature number of X_predict must be equal to X_train"

            y_predict = [self._predict(x) for x in X_predict]
            return np.array(y_predict)

        def _predict(self, x):
            """Given a single sample x, return its predicted value."""
            assert x.shape[0] == self._X_train.shape[1], \
                "the feature number of x must be equal to X_train"

            distances = [sqrt(np.sum((x_train - x) ** 2))
                         for x_train in self._X_train]
            nearest = np.argsort(distances)

            topk_y = [self._y_train[i] for i in nearest[:self.k]]
            votes = Counter(topk_y)
            return votes.most_common(1)[0][0]

        def __repr__(self):
            return "KNN(k=%d)" % self.k
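The class above follows the same fit/predict pattern that scikit-learn estimators use. For comparison, here is a minimal sketch of the same prediction done with scikit-learn's own `KNeighborsClassifier`. The training data is the test case from section I; the query point `x` is an assumption, since the text does not define it.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([
    [3.393533211, 2.331273381],
    [3.110073483, 1.781539638],
    [1.343808831, 3.368360954],
    [3.582294042, 4.679179110],
    [2.280362439, 2.866990263],
    [7.423436942, 4.696522875],
    [5.745051997, 3.533989803],
    [9.172168622, 2.511101045],
    [7.792783481, 3.424088941],
    [7.939820817, 0.791637231],
])
y_train = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# Assumed query point (not given in the text).
x = np.array([8.093607318, 3.365731514])

# n_neighbors plays the role of k.
knn_clf = KNeighborsClassifier(n_neighbors=6)
knn_clf.fit(X_train, y_train)

# predict expects a 2-D array of shape (n_samples, n_features),
# so the single sample is reshaped into a 1-row matrix.
y_predict = knn_clf.predict(x.reshape(1, -1))

print(y_predict[0])  # prints 1
```

Passing a matrix rather than a single vector is the reason `KNNClassifier.predict` above also loops over the rows of `X_predict`.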
knn_function/knn.py
    import numpy as np
    from math import sqrt
    from collections import Counter


    def knn_classify(k, X_train, y_train, x):
        """Predict the label of a single sample x by majority vote of its k nearest neighbors."""
        assert 1 <= k <= X_train.shape[0], "k must be valid"
        assert X_train.shape[0] == y_train.shape[0], \
            "the size of X_train must be equal to the size of y_train"
        assert X_train.shape[1] == x.shape[0], \
            "the feature number of x must be equal to X_train"

        distances = [sqrt(np.sum((x_train - x) ** 2)) for x_train in X_train]
        nearest = np.argsort(distances)

        topk_y = [y_train[i] for i in nearest[:k]]
        votes = Counter(topk_y)
        return votes.most_common(1)[0][0]
III. Training and test data sets
Evaluating the performance of a machine learning algorithm
playml/knn.py
    import numpy as np
    from math import sqrt
    from collections import Counter


    class KNNClassifier:

        def __init__(self, k):
            """Initialize the KNN classifier."""
            assert k >= 1, "k must be valid"
            self.k = k
            self._X_train = None
            self._y_train = None

        def fit(self, X_train, y_train):
            """Train the KNN classifier on the training data X_train and y_train."""
            assert X_train.shape[0] == y_train.shape[0], \
                "the size of X_train must be equal to the size of y_train"
            assert self.k <= X_train.shape[0], \
                "the size of X_train must be at least k"

            self._X_train = X_train
            self._y_train = y_train
            return self

        def predict(self, X_predict):
            """Given the data set X_predict, return the vector of predicted results."""
            assert self._X_train is not None and self._y_train is not None, \
                "must fit before predict!"
            assert X_predict.shape[1] == self._X_train.shape[1], \
                "the feature number of X_predict must be equal to X_train"

            y_predict = [self._predict(x) for x in X_predict]
            return np.array(y_predict)

        def _predict(self, x):
            """Given a single sample x, return its predicted value."""
            assert x.shape[0] == self._X_train.shape[1], \
                "the feature number of x must be equal to X_train"

            distances = [sqrt(np.sum((x_train - x) ** 2))
                         for x_train in self._X_train]
            nearest = np.argsort(distances)

            topk_y = [self._y_train[i] for i in nearest[:self.k]]
            votes = Counter(topk_y)
            return votes.most_common(1)[0][0]

        def __repr__(self):
            return "KNN(k=%d)" % self.k
playml/model_selection.py
    import numpy as np


    def train_test_split(X, y, test_ratio=0.2, seed=None):
        """Split the data X and y into X_train, X_test, y_train, y_test according to test_ratio."""
        assert X.shape[0] == y.shape[0], \
            "the size of X must be equal to the size of y"
        assert 0.0 <= test_ratio <= 1.0, \
            "test_ratio must be valid"

        # Compare against None so that seed=0 is honored as well.
        if seed is not None:
            np.random.seed(seed)

        # Shuffle the indexes rather than the data, so X and y stay aligned.
        shuffled_indexes = np.random.permutation(len(X))

        test_size = int(len(X) * test_ratio)
        test_indexes = shuffled_indexes[:test_size]
        train_indexes = shuffled_indexes[test_size:]

        X_train = X[train_indexes]
        y_train = y[train_indexes]

        X_test = X[test_indexes]
        y_test = y[test_indexes]

        return X_train, X_test, y_train, y_test
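One way to use such a split for judging performance is to fit on the training part, predict the test part, and report the fraction of correct predictions (accuracy). The sketch below is self-contained and makes several assumptions: `split_indexes` and `knn_predict` are hypothetical compact helpers written for this example rather than the playml functions, a local `np.random.default_rng` generator replaces `np.random.seed`, and the values `k=3` and `seed=666` are arbitrary.

```python
import numpy as np
from collections import Counter


def split_indexes(n, test_ratio=0.2, seed=None):
    """Hypothetical helper: shuffle 0..n-1 and cut it into train/test indexes."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n)
    test_size = int(n * test_ratio)
    return shuffled[test_size:], shuffled[:test_size]


def knn_predict(k, X_train, y_train, X_test):
    """Hypothetical helper: majority vote among the k closest training samples."""
    predictions = []
    for x in X_test:
        distances = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(distances)[:k]
        votes = Counter(y_train[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)


X = np.array([
    [3.393533211, 2.331273381],
    [3.110073483, 1.781539638],
    [1.343808831, 3.368360954],
    [3.582294042, 4.679179110],
    [2.280362439, 2.866990263],
    [7.423436942, 4.696522875],
    [5.745051997, 3.533989803],
    [9.172168622, 2.511101045],
    [7.792783481, 3.424088941],
    [7.939820817, 0.791637231],
])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

train_idx, test_idx = split_indexes(len(X), test_ratio=0.2, seed=666)
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

y_predict = knn_predict(3, X_train, y_train, X_test)

# Accuracy: fraction of test samples whose predicted label matches the true one.
accuracy = np.mean(y_predict == y_test)
print(X_train.shape, X_test.shape, accuracy)
```

With only two test samples the accuracy can take just the values 0, 0.5 or 1, so this toy data set only demonstrates the mechanics; a meaningful estimate needs a larger test set.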
playml/__init__.py
Machine Learning (IV): classification algorithms, the k-nearest neighbors algorithm (KNN)