TensorFlow implementation of the KNN (k-nearest neighbor) algorithm

First, an introduction to the principle of KNN:
KNN classifies a sample by computing the distances between feature vectors.
The overall idea is this: if most of the K samples most similar to a given sample in feature space (that is, its nearest neighbors) belong to a certain category, then the sample belongs to that category as well.
K is usually an integer no greater than 20. In the KNN algorithm, the selected neighbors are objects that have already been correctly classified: the method decides the category of the sample to be classified based only on the category of the nearest one or several samples.
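To make the principle concrete, here is a minimal NumPy sketch (illustrative only, not from the original post; knn_predict and the toy data are my own naming): it computes the distance from a query point to every training sample, takes the K nearest, and returns the majority label.

import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    # L1 (Manhattan) distance from the query point to every training sample
    distances = np.sum(np.abs(X_train - x), axis=1)
    # indices of the k nearest training samples
    nearest = np.argsort(distances)[:k]
    # majority vote among the neighbors' labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# toy data: class 0 near 0.0, class 1 near 1.0
X_train = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.15]), k=3))  # prints 0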
The key problem the KNN algorithm has to solve is the choice of K, which directly affects the classification result.
If a larger value of K is chosen, it is equivalent to predicting with training examples from a larger neighborhood. The advantage is that the estimation error of learning is reduced; the disadvantage is that the approximation error of learning increases.
If a smaller value of K is chosen, it is equivalent to predicting with training examples from a smaller neighborhood. The approximation error of learning decreases, and only training instances close (similar) to the input instance influence the prediction; the problem is that the estimation error of learning increases. In other words, decreasing K makes the overall model more complex and prone to overfitting.
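In practice K is usually chosen empirically, for example by evaluating several candidate values on held-out data. A self-contained sketch of that procedure under assumed toy data (the split, data, and helper are illustrative, not from the original post):

import numpy as np

rng = np.random.RandomState(0)
# toy two-class data: class 0 around (0, 0), class 1 around (1, 1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(1.0, 0.3, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
idx = rng.permutation(100)
X_tr, y_tr = X[idx[:70]], y[idx[:70]]    # training split
X_val, y_val = X[idx[70:]], y[idx[70:]]  # held-out validation split

def accuracy_for_k(k):
    correct = 0
    for x, label in zip(X_val, y_val):
        # k nearest training samples under the L1 distance
        nearest = np.argsort(np.sum(np.abs(X_tr - x), axis=1))[:k]
        # majority vote among their labels
        correct += int(np.bincount(y_tr[nearest]).argmax() == label)
    return correct / len(X_val)

# small K -> low approximation error, high estimation error (overfits);
# large K -> the reverse; keep the K that does best on held-out data
for k in (1, 3, 5, 9, 15):
    print("k =", k, "validation accuracy =", accuracy_for_k(k))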
Below is a TensorFlow implementation of KNN. The code comes from GitHub, slightly modified:
import numpy as np
import tensorflow as tf

# This test uses TensorFlow's built-in dataset; the dataset import code is below
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

Xtrain, Ytrain = mnist.train.next_batch(5000)  # take 5000 samples from the dataset as the training set
Xtest, Ytest = mnist.test.next_batch(200)      # take 200 samples from the dataset as the test set

# input placeholders
xtr = tf.placeholder("float", [None, 784])
xte = tf.placeholder("float", [784])

# compute the L1 distance
distance = tf.reduce_sum(tf.abs(tf.add(xtr, tf.negative(xte))), reduction_indices=1)
# index of the minimum distance
pred = tf.arg_min(distance, 0)

# classification accuracy
accuracy = 0.

# initialize variables
init = tf.global_variables_initializer()

# run the session and train the model
with tf.Session() as sess:
    # run the initializer
    sess.run(init)
    # loop over the test data
    for i in range(len(Xtest)):
        # get the index of the current sample's nearest neighbor,
        # feeding the training data into the placeholders
        nn_index = sess.run(pred, feed_dict={xtr: Xtrain, xte: Xtest[i, :]})
        # compare the nearest neighbor's class label with the true label
        print("Test", i, "Prediction:", np.argmax(Ytrain[nn_index]),
              "True Class:", np.argmax(Ytest[i]))
        # accumulate the accuracy
        if np.argmax(Ytrain[nn_index]) == np.argmax(Ytest[i]):
            accuracy += 1. / len(Xtest)
    print("Done!")
    print("Accuracy:", accuracy)
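Note that with pred = tf.arg_min(distance, 0) the code above is effectively 1-NN: each test sample takes the label of its single nearest neighbor. A sketch of generalizing it to K neighbors with tf.nn.top_k (my own extension, not part of the original code):

# generalize the graph above from 1-NN to K-NN
K = 5
# tf.nn.top_k returns the largest entries, so negate the distances
# to obtain the indices of the K smallest distances instead
_, knn_indices = tf.nn.top_k(tf.negative(distance), k=K)

# inside the session loop, replace the single-neighbor lookup with a vote:
#   indices = sess.run(knn_indices, feed_dict={xtr: Xtrain, xte: Xtest[i, :]})
#   votes = np.argmax(Ytrain[indices], axis=1)   # labels are one-hot
#   prediction = np.bincount(votes).argmax()     # majority class wins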
The above is the process of implementing KNN with TensorFlow.
Points to note:
The overall TensorFlow workflow is to design the computation graph first, then run a session that executes the graph; throughout this process, the visibility of intermediate data is poor.
The accuracy calculation above, and the comparison between the true and predicted labels, are actually done with NumPy and plain Python variables.
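A minimal self-contained illustration of that visibility point (the tensor names here are my own): printing a graph tensor shows only its description; an actual NumPy value appears only when it is fetched through a session.

import numpy as np
import tensorflow as tf

a = tf.placeholder("float", [3])
b = tf.reduce_sum(tf.abs(a))  # a graph node, not a value

print(b)  # prints a Tensor description, no data
with tf.Session() as sess:
    # only sess.run produces an actual NumPy value to inspect
    print(sess.run(b, feed_dict={a: np.array([1., -2., 3.])}))  # 6.0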