Exploring how NumPy vector operations accelerate the k-nearest neighbor algorithm
(As Kong Yiji would put it, the character "hui" in "fennel beans" has several ways of being written — and likewise, the same computation can be written in several ways.)
The k-nearest neighbor algorithm is implemented with three different ways of computing the image distances:
1. The most basic double loop
2. A single loop, using NumPy's broadcasting mechanism
3. No loop at all, using broadcasting together with the algebra of matrices
Each picture is stretched (flattened) into a one-dimensional array:
X_train: shape (num_train, D), one row per flattened training image
X: shape (num_test, D), one row per flattened test image
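For instance, a batch of 32×32 RGB images flattens into rows of length 3072. This is a minimal sketch; the array name `images` and the batch size are illustrative, not taken from the assignment data:

```python
import numpy as np

# hypothetical batch: 10 images, each 32x32 pixels with 3 color channels
images = np.zeros((10, 32, 32, 3))

# flatten each image into a single row: shape becomes (10, 3072)
X = images.reshape(images.shape[0], -1)
print(X.shape)  # (10, 3072)
```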
Method validation
```python
import numpy as np

a = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
b = np.array([[4, 4, 4], [5, 5, 5], [6, 6, 6], [7, 7, 7]])
```
Double loop:
```python
dists = np.zeros((3, 4))
for i in range(3):
    for j in range(4):
        dists[i][j] = np.sqrt(np.sum(np.square(a[i] - b[j])))
print(dists)
```
[[5.19615242 6.92820323 8.66025404 10.39230485]
[3.46410162 5.19615242 6.92820323 8.66025404]
[1.73205081 3.46410162 5.19615242 6.92820323]]
Single loop:
```python
dists = np.zeros((3, 4))
for i in range(3):
    dists[i] = np.sqrt(np.sum(np.square(a[i] - b), axis=1))
print(dists)
```
[[5.19615242 6.92820323 8.66025404 10.39230485]
[3.46410162 5.19615242 6.92820323 8.66025404]
[1.73205081 3.46410162 5.19615242 6.92820323]]
No loops:
```python
r1 = (np.sum(np.square(a), axis=1) * np.ones((b.shape[0], 1))).T
r2 = np.sum(np.square(b), axis=1) * np.ones((a.shape[0], 1))
r3 = -2 * np.dot(a, b.T)
print(np.sqrt(r1 + r2 + r3))
```
[[5.19615242 6.92820323 8.66025404 10.39230485]
[3.46410162 5.19615242 6.92820323 8.66025404]
[1.73205081 3.46410162 5.19615242 6.92820323]]
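Before moving on, the formulations can be checked against each other. The sketch below wraps the double-loop and no-loop versions from above in helper functions (the function names are mine, for illustration) and confirms they agree to floating-point precision:

```python
import numpy as np

a = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]], dtype=float)
b = np.array([[4, 4, 4], [5, 5, 5], [6, 6, 6], [7, 7, 7]], dtype=float)

def two_loops(a, b):
    # explicit pairwise Euclidean distances
    dists = np.zeros((a.shape[0], b.shape[0]))
    for i in range(a.shape[0]):
        for j in range(b.shape[0]):
            dists[i][j] = np.sqrt(np.sum(np.square(a[i] - b[j])))
    return dists

def no_loops(a, b):
    # fully vectorized version built from the three terms r1, r2, r3
    r1 = (np.sum(np.square(a), axis=1) * np.ones((b.shape[0], 1))).T
    r2 = np.sum(np.square(b), axis=1) * np.ones((a.shape[0], 1))
    r3 = -2 * np.dot(a, b.T)
    return np.sqrt(r1 + r2 + r3)

# both formulations should agree up to floating-point rounding
print(np.allclose(two_loops(a, b), no_loops(a, b)))  # True
```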
Principle of the no-loop algorithm:
(Note that the variables in the schematic, the validation code, and the final implementation do not correspond strictly one-to-one; some adjustments were made.)
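The identity behind the no-loop version is the expansion of the squared Euclidean distance between a test row x_i and a training row y_j:

```latex
\lVert x_i - y_j \rVert^2 = \lVert x_i \rVert^2 + \lVert y_j \rVert^2 - 2\, x_i \cdot y_j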
The full code is implemented as follows:
```python
import numpy as np

class KNearestNeighbor():
    def __init__(self):
        pass

    def train(self, X, y):
        self.X_train = X
        self.y_train = y

    # Choose how many loop bodies to use when computing the distances
    def predict(self, X, k=1, num_loops=0):
        if num_loops == 0:
            dists = self.compute_distances_no_loops(X)
        elif num_loops == 1:
            dists = self.compute_distances_one_loop(X)
        elif num_loops == 2:
            dists = self.compute_distances_two_loops(X)
        else:
            raise ValueError('Invalid value %d' % num_loops)
        return self.predict_labels(dists, k)

    def compute_distances_two_loops(self, X):
        num_test = X.shape[0]
        num_train = self.X_train.shape[0]
        dists = np.zeros((num_test, num_train))
        for i in range(num_test):
            for j in range(num_train):
                dists[i][j] = np.sqrt(np.sum(np.square(X[i] - self.X_train[j])))
        return dists

    def compute_distances_one_loop(self, X):
        num_test = X.shape[0]
        num_train = self.X_train.shape[0]
        dists = np.zeros((num_test, num_train))
        for i in range(num_test):
            dists[i] = np.sqrt(np.sum(np.square(X[i] - self.X_train), axis=1))
        return dists

    def compute_distances_no_loops(self, X):
        dists = np.sqrt(
            -2 * np.dot(X, self.X_train.T)
            + np.sum(np.square(self.X_train), axis=1) * np.ones((X.shape[0], 1))
            + (np.sum(np.square(X), axis=1) * np.ones((self.X_train.shape[0], 1))).T
        )
        return dists

    # Predict a label for each test point
    def predict_labels(self, dists, k=1):
        num_test = dists.shape[0]
        y_pred = np.zeros(num_test)
        for i in range(num_test):
            # sort indices by distance, keep the k nearest,
            # then look up the corresponding training labels
            closest_y = self.y_train[np.argsort(dists[i])[:k]]
            # majority vote; note the handy pairing of np.bincount() and np.argmax()
            y_pred[i] = np.argmax(np.bincount(closest_y))
        return y_pred
```
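The voting step leans on the pairing of np.bincount and np.argmax. A self-contained sketch of that trick, with hypothetical neighbor labels:

```python
import numpy as np

# labels of the k nearest neighbors (made-up values for illustration)
closest_y = np.array([2, 1, 2, 2, 0])

# np.bincount counts occurrences of each non-negative integer label
counts = np.bincount(closest_y)
print(counts)             # [1 1 3]

# np.argmax picks the label with the highest count (ties go to the smaller label)
print(np.argmax(counts))  # 2
```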
Selecting the hyperparameter k with cross-validation
We have implemented the k-nearest neighbor classifier, but we set the value k = 5 arbitrarily. We will now determine the best value of this hyperparameter with cross-validation.
```python
import numpy as np

num_folds = 5
k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100]

X_train_folds = []
y_train_folds = []
################################################################################
# TODO:                                                                        #
# Split up the training data into folds. After splitting, X_train_folds and   #
# y_train_folds should each be lists of length num_folds, where               #
# y_train_folds[i] is the label vector for the points in X_train_folds[i].    #
# Hint: look up the numpy array_split function.                               #
################################################################################
X_train_folds = np.split(X_train, num_folds)
y_train_folds = np.split(y_train, num_folds)
################################################################################
#                               END OF YOUR CODE                               #
################################################################################

# A dictionary holding the accuracies for different values of k that we find
# when running cross-validation. After running cross-validation,
# k_to_accuracies[k] should be a list of length num_folds giving the different
# accuracy values that we found when using that value of k.
k_to_accuracies = {}
################################################################################
# TODO:                                                                        #
# Perform k-fold cross validation to find the best value of k. For each       #
# possible value of k, run the k-nearest-neighbor algorithm num_folds times,  #
# where in each case you use all but one of the folds as training data and    #
# the last fold as a validation set. Store the accuracies for all folds and   #
# all values of k in the k_to_accuracies dictionary.                          #
################################################################################
classifier = KNearestNeighbor()
for k in k_choices:
    k_to_accuracies[k] = np.zeros(num_folds)
    for i in range(num_folds):
        # use every fold except the i-th as training data, the i-th as validation
        Xtr = np.concatenate((np.array(X_train_folds)[:i], np.array(X_train_folds)[(i+1):]), axis=0)
        ytr = np.concatenate((np.array(y_train_folds)[:i], np.array(y_train_folds)[(i+1):]), axis=0)
        Xte = np.array(X_train_folds)[i]
        yte = np.array(y_train_folds)[i]
        # [num_folds, num_per_fold, feature_of_x] -> [num_of_pictures, feature_of_x]
        Xtr = np.reshape(Xtr, (X_train.shape[0] * 4 // 5, -1))
        ytr = np.reshape(ytr, (y_train.shape[0] * 4 // 5,))
        Xte = np.reshape(Xte, (X_train.shape[0] // 5, -1))
        yte = np.reshape(yte, (y_train.shape[0] // 5,))

        classifier.train(Xtr, ytr)
        yte_pred = classifier.predict(Xte, k=k)
        # booleans are summed as floats to get the fraction of correct predictions
        accuracy = np.sum(yte_pred == yte, dtype=float) / len(yte)
        k_to_accuracies[k][i] = accuracy
################################################################################
#                               END OF YOUR CODE                               #
################################################################################

# Print out the computed accuracies
for k in sorted(k_to_accuracies):
    for accuracy in k_to_accuracies[k]:
        print('k = %d, accuracy = %f' % (k, accuracy))
```
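After cross-validation, the usual choice is the k with the highest mean accuracy across folds. A sketch of that final selection step, using made-up accuracy values in place of the real cross-validation results:

```python
import numpy as np

# hypothetical per-fold accuracies, standing in for the real k_to_accuracies
k_to_accuracies = {
    1: [0.52, 0.54, 0.55, 0.51, 0.53],
    5: [0.58, 0.59, 0.57, 0.60, 0.58],
    10: [0.56, 0.55, 0.57, 0.56, 0.55],
}

# average over the folds and pick the k with the best mean accuracy
mean_acc = {k: np.mean(v) for k, v in k_to_accuracies.items()}
best_k = max(mean_acc, key=mean_acc.get)
print(best_k)  # 5
```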
"cs231n" Job 1 question 1 Selection _ code understanding k Nearest Neighbor Algorithm & cross-validation Select parameter parameters