Python implementation of the KNN algorithm

Source: Internet
Author: User

The proximity algorithm, also called the k-nearest neighbor (KNN, k-NearestNeighbor) classification algorithm, is one of the simplest classification methods in data mining. "K nearest neighbors" means the k closest neighbors: each sample can be represented by the k samples nearest to it, so a new sample is assigned the category that occurs most often among those k neighbors.
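To make the idea concrete, here is a minimal sketch in plain Python, assuming Euclidean distance and a simple majority vote. The function name knn_predict and the toy points are made up for illustration and do not come from the original article.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Return the majority label among the k training points closest to query."""
    # Pair each training label with the Euclidean distance from its point to the query
    distances = []
    for point, label in zip(train_points, train_labels):
        d = math.sqrt(sum((p - q) ** 2 for p, q in zip(point, query)))
        distances.append((d, label))
    # Sort by distance only and keep the k nearest neighbors
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote among the k neighbors
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters in the plane
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0), k=3))  # a
print(knn_predict(points, labels, (5.1, 5.0), k=3))  # b
```

With k=3 each query sits inside one cluster, so all three neighbors agree. When k is large enough to reach into the other cluster, the vote can become a tie, and the result then depends on how ties are broken.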

For more on the k-nearest neighbor algorithm, a very good article is "KNN algorithm understanding".

Industry applications: customer churn prediction, fraud detection, and similar tasks (KNN is better suited to classifying rare events).

Note up front: the code below was written for Python 2.7.

Data: the Iris data set, available at http://pan.baidu.com/s/1bhuq0a. Test data: row 1 of Iris. Training data: rows 2 to 150 of Iris.

# coding: utf-8
import pandas as pd
import numpy as np


class KNNA(object):

    # Get the training data set
    def getTrainData(self):
        # Raw string so the backslashes in the Windows path are not read as escapes
        dataSet = pd.read_csv(r'C:\pythonwork\practice_data\iris.csv', header=None)
        dataSetNP = np.array(dataSet[1:150])  # rows 2 to 150 of Iris
        # Training features: every column except the last, cast to float
        trainData = dataSetNP[:, 0:dataSetNP.shape[1] - 1].astype(float)
        # Training categories: the last column
        labels = dataSetNP[:, dataSetNP.shape[1] - 1]
        return trainData, labels

    # Get the category of the test data
    def classify(self, testData, trainData, labels, k):
        # Calculate the Euclidean distance between the test data and each training row
        dist = []
        for i in range(len(trainData)):
            td = trainData[i, :]  # one row of training data
            dist.append(np.linalg.norm(testData - td))  # Euclidean distance
        dist_collection = np.array(dist)  # all distances as an array
        dist_index = dist_collection.argsort()[0:k]  # indices of the k smallest distances
        k_labels = labels[dist_index]  # categories of those k neighbors

        # Count how often each category occurs among the k neighbors
        k_labels = list(k_labels)
        labels_count = {}
        for i in k_labels:
            labels_count[i] = k_labels.count(i)
        # The most frequent category is the prediction
        testData_label = max(labels_count, key=labels_count.get)
        return testData_label


if __name__ == '__main__':
    kn = KNNA()
    trainData, labels = kn.getTrainData()  # 149 training rows, row 2 to row 150 of Iris
    testData = np.array([5.1, 3.5, 1.4, 0.2])  # row 1 of Iris
    k = 10  # number of nearest neighbors
    testData_label = kn.classify(testData, trainData, labels, k)
    # %-formatting prints the same text under Python 2.7 and Python 3
    print('category of test data: %s' % testData_label)
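The listing above computes distances one row at a time and counts votes with list.count. As a possible alternative, the sketch below does the same work with NumPy broadcasting and collections.Counter. The function name classify_vectorized is hypothetical, and the five training rows and label strings are hand-typed stand-ins shaped like Iris rows. They are not read from iris.csv, whose exact label format is not shown in the article.

```python
from collections import Counter

import numpy as np


def classify_vectorized(test_data, train_data, labels, k):
    """Majority label among the k rows of train_data nearest to test_data."""
    # Broadcasting subtracts test_data from every row; the norm along axis=1
    # yields one Euclidean distance per training row
    dist = np.linalg.norm(train_data - test_data, axis=1)
    # Indices of the k smallest distances, in ascending order
    nearest = np.argsort(dist)[:k]
    # Count the labels of those neighbors and return the most common one
    return Counter(labels[nearest].tolist()).most_common(1)[0][0]


# Hypothetical stand-in data: four features per row, one label per row
train_data = np.array([
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5],
    [6.3, 3.3, 6.0, 2.5],
])
labels = np.array(["setosa", "setosa", "versicolor", "versicolor", "virginica"])
test_data = np.array([5.1, 3.5, 1.4, 0.2])

print(classify_vectorized(test_data, train_data, labels, k=3))  # setosa
```

The result should match the loop-based version for the same inputs, except possibly when two categories tie for the most votes, since the two versions may break ties differently.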
