Nearest neighbor algorithm
The nearest neighbor algorithm, also called the k-nearest neighbor (kNN, k-NearestNeighbor) classification algorithm, is one of the simplest classification methods in data mining. "K nearest neighbors" simply means the k closest neighbors: each sample can be represented by the k samples nearest to it, and a new sample is assigned the category that occurs most often among those k neighbors.
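Written out, the rule looks like this. The notation is introduced here only for illustration. Euclidean distance and a plain majority vote are assumptions that match the Python code in this post; they are not the only possible choices. In the Iris example, m = 4 features, n = 149 training rows and k = 10.

```latex
% Training samples (x_i, y_i), i = 1, ..., n; test sample x with m features
d(x, x_i) = \lVert x - x_i \rVert_2 = \sqrt{\sum_{j=1}^{m} \left(x_j - x_{ij}\right)^2}

% N_k(x): indices of the k training samples with the smallest d(x, x_i)
\hat{y}(x) = \arg\max_{c} \sum_{i \in N_k(x)} \mathbf{1}\left[ y_i = c \right]
```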
For a good explanation of the k-nearest neighbor algorithm, see the article "KNN algorithm understanding".
Industry applications: customer churn prediction, fraud detection, etc. (kNN is well suited to classifying rare events).
Preface: the code is written for Python 2.7.
Iris data: http://pan.baidu.com/s/1bhuq0a. Test data: row 1 of the Iris data. Training data: rows 2 to 150 of the Iris data.
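The test/training split described above can be sketched with pandas as follows. This is a minimal sketch, assuming Python 3 with a recent pandas, and assuming the file has no header row, four numeric feature columns and the class label in the last column. The four-row `csv_text` is a hypothetical stand-in for `iris.csv`, so the example runs without the download.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for iris.csv: same layout (4 numeric features, then the
# class label, no header row), but only four rows so the example is self-contained.
csv_text = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

data = pd.read_csv(io.StringIO(csv_text), header=None)

# Row 1 of the file (position 0) is held out as the test sample.
test_row = data.iloc[0]
test_data = test_row.iloc[:-1].to_numpy(dtype=float)  # the four features
test_label = test_row.iloc[-1]                        # the true class

# Rows 2..150 of the file (positions 1..149) form the training set; iloc clips
# the slice end, so 1:150 also works on this short stand-in.
train = data.iloc[1:150]
train_data = train.iloc[:, :-1].to_numpy(dtype=float)  # feature matrix
train_labels = train.iloc[:, -1].to_numpy()            # class labels

print(test_data, test_label)
print(train_data.shape)  # (3, 4) here; (149, 4) with the full Iris file
```

Using `iloc` makes the positional slicing explicit, and converting the features to `float` avoids carrying a mixed-type object array (features plus label strings) into the distance computation.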
#coding:utf-8
from __future__ import print_function  # makes the final print work on Python 2.7 and 3
import pandas as pd
import numpy as np

class KNN(object):

    # Get the training data set
    def getTrainData(self):
        # raw string so the backslashes in the Windows path are kept literally
        dataSet = pd.read_csv(r'C:\pythonwork\practice_data\iris.csv', header=None)
        dataSetNP = np.array(dataSet[1:150])  # rows 2 to 150 of the file
        trainData = dataSetNP[:, 0:dataSetNP.shape[1]-1].astype(float)  # training features
        labels = dataSetNP[:, dataSetNP.shape[1]-1]  # training categories (last column)
        return trainData, labels

    # Get the category of the test data
    def classify(self, testData, trainData, labels, k):
        # Calculate the Euclidean distance between the test data and each training sample
        dist = []
        for i in range(len(trainData)):
            td = trainData[i, :]  # one training sample
            dist.append(np.linalg.norm(testData - td))  # Euclidean distance
        dist_collection = np.array(dist)  # all Euclidean distances as an array
        dist_index = dist_collection.argsort()[0:k]  # ascending order: indices of the k nearest
        k_labels = labels[dist_index]  # categories at those indices

        # Count how often each category occurs among the k nearest samples
        k_labels = list(k_labels)  # convert to a list
        labels_count = {}
        for i in k_labels:
            labels_count[i] = k_labels.count(i)  # occurrences of each category
        testData_label = max(labels_count, key=labels_count.get)  # most frequent category
        return testData_label

if __name__ == '__main__':
    kn = KNN()
    trainData, labels = kn.getTrainData()  # training set: the 149 Iris rows from line 2 to line 150
    testData = np.array([5.1, 3.5, 1.4, 0.2])  # line 1 of the Iris data
    k = 10  # number of nearest neighbors
    testData_label = kn.classify(testData, trainData, labels, k)  # category of the test data
    print('categories of test data:', testData_label)
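The same three steps (distance to every training row, pick the k smallest, majority vote) can also be written without the explicit loops. This is a sketch, not the post's method. It uses NumPy broadcasting with `np.linalg.norm(..., axis=1)` and `collections.Counter` in place of the hand-built dictionary. The standalone function name `classify` and the two-cluster toy data are hypothetical and chosen only for illustration; they are not Iris.

```python
from collections import Counter

import numpy as np


def classify(test_data, train_data, labels, k):
    """Return the majority label among the k training rows nearest to test_data."""
    train_data = np.asarray(train_data, dtype=float)
    test_data = np.asarray(test_data, dtype=float)
    # Broadcasting subtracts test_data from every row; axis=1 gives one norm per row.
    distances = np.linalg.norm(train_data - test_data, axis=1)
    # Indices of the k smallest distances, nearest first.
    nearest = np.argsort(distances)[:k]
    # Count the labels of the k neighbours and return the most common one.
    votes = Counter(np.asarray(labels)[nearest])
    return votes.most_common(1)[0][0]


# Hypothetical toy data (not Iris): two well-separated clusters in 2-D.
train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
labels = np.array(["a", "a", "a", "b", "b", "b"])

print(classify([1.1, 0.9], train, labels, k=3))  # -> a
print(classify([4.9, 5.0], train, labels, k=3))  # -> b
```

One design difference is worth noting. On Python 3.7+, `Counter.most_common` orders equal counts by first occurrence. Because the neighbours are visited nearest first, a tied vote goes to the label whose closest member is nearest. The dictionary plus `max(labels_count, key=labels_count.get)` approach returns whichever tied key the dict yields first, and on Python 2.7 that order is arbitrary.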
(Python implementation of the KNN algorithm)