import operator

from numpy import array, tile


def start():
    group, labels = createDataSet()
    return classify0([3, 3], group, labels, 4)


def createDataSet():
    # Freely defined here; it represents a dataset whose classes are already known
    group = array([[1, 2], [2, 3], [1, 1], [4, 5]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels


def classify0(inX, dataSet, labels, k):
    """
    inX is the input test sample, in [x, y] form
    dataSet is the training sample set
    labels is the list of training sample labels
    k is the number of nearest neighbors that vote
    """
    # The shape of a matrix is a tuple; dataSet.shape here returns (4, 2),
    # i.e. (number of rows, number of columns).
    # So shape[0] is the number of rows in the dataset, which is the number of samples,
    # and shape[1] is the number of columns in the dataset.
    dataSetSize = dataSet.shape[0]

    # ---------- explanatory output ----------
    print("dataSet.shape[0] returns the number of rows in the matrix:")
    print(dataSetSize)
    cols = dataSet.shape[1]
    print("dataSet.shape[1] returns the number of columns of the matrix:")
    print(cols)
    print(dataSet.shape)
    print("dataSet.shape type:")
    print(type(dataSet.shape))
    # ----------------------------------------

    # "Mat" is short for matrix; diffMat is the matrix of differences, and the result is also a matrix.
    # For a description of the tile function, see http://www.cnblogs.com/Sabre/p/7976702.html
    # Simply put, tile repeats inX dataSetSize times along the row dimension (4 times here)
    # and once along the column dimension. An inX of [1, 1], for example, becomes
    # [[1, 1], [1, 1], [1, 1], [1, 1]], which can then be subtracted from dataSet element by element.
    # This is done because the Euclidean distance formula is used to find the distance
    # between the input point and each existing point.
    # Step 1: the difference between the input point and every known point; the output is a matrix.
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet

    # ---------- explanatory output ----------
    print("diffMat:" + str(diffMat))
    # ----------------------------------------

    # Square the matrix element by element, i.e. square each difference
    sqDiffMat = diffMat ** 2

    # ---------- explanatory output ----------
    print("sqDiffMat:" + str(sqDiffMat))
    # ----------------------------------------

    # sum(axis=1) adds up the values in each row of the matrix,
    # e.g. [[0 0] [1 1] [0 1] [9 9]] gives [0, 2, 1, 18], the sums of squares.
    # sum(axis=0) adds up the values in each column of the matrix.
    sqDistances = sqDiffMat.sum(axis=1)

    # ---------- explanatory output ----------
    print("sqDistances:" + str(sqDistances))
    # ----------------------------------------

    # Take the square root of the sums of squares to get the distances; the output is an array
    distances = sqDistances ** 0.5

    # ---------- explanatory output ----------
    print("distance from the unknown point to each known point:", distances)
    # ----------------------------------------

    # argsort() returns the indices that would sort the array in ascending order.
    # For the array [0 2 1 18], argsort gives [0 2 1 3]: the smallest element is at index 0,
    # the second smallest (1) is at index 2, the third smallest (2) is at index 1,
    # and the fourth (18) is at index 3.
    # The original array is left unchanged, so the indices can be matched against the labels.
    sortedDistIndicies = distances.argsort()

    # ---------- explanatory output ----------
    print("index position:", sortedDistIndicies)
    # ----------------------------------------

    # Create an empty dictionary
    classCount = {}

    # k is the number of nearest samples to compare
    for i in range(k):
        # Look up the label of the i-th nearest sample.
        # In the argsort example above ([0 2 1 3]) with labels ['A', 'A', 'B', 'B']:
        # sortedDistIndicies[0] == 0, so labels[0] == 'A', voteIlabel == 'A'
        # sortedDistIndicies[1] == 2, so labels[2] == 'B', voteIlabel == 'B'
        # sortedDistIndicies[2] == 1, so labels[1] == 'A', voteIlabel == 'A'
        # sortedDistIndicies[3] == 3, so labels[3] == 'B', voteIlabel == 'B'
        voteIlabel = labels[sortedDistIndicies[i]]

        # ---------- explanatory output ----------
        print("label " + str(i) + ": " + voteIlabel)
        # ----------------------------------------

        # dict.get(key, default=None) returns the value stored for key,
        # or default if the dict does not contain key (note that default is None unless given).
        # The first time classCount.get is called, classCount is still empty.
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1

        # ---------- explanatory output ----------
        print("Visit " + str(i + 1) + ", classCount[" + voteIlabel + "] value is: " + str(classCount[voteIlabel]))
        print("the contents of classCount are:")
        print(classCount)
        # ----------------------------------------

    # sorted(iterable[, cmp[, key[, reverse]]])  (Python 2 signature)
    # Function: return a new sorted list from the items in iterable.
    # The first parameter is an iterable, and the return value is a list of its elements in sorted order.
    # Python 2 had three optional parameters: cmp, key, and reverse.
    # Python 3, which this script uses, dropped cmp and accepts only key and reverse.
    # 1) cmp specifies a custom comparison function that receives two elements of iterable and
    #    returns a negative number if the first is less than the second, 0 if they are equal,
    #    or a positive number if the first is greater. The default value is None.
    # 2) key specifies a one-argument function used to extract a comparison key from each element.
    #    The default value is None.
    # 3) reverse is a Boolean. If set to True, the elements are sorted in descending order.
    # operator.itemgetter(1) is easier to understand through an example:
    # a = [11, 22, 33]
    # b = operator.itemgetter(2)
    # b(a)
    # output: 33
    # b = operator.itemgetter(2, 0, 1)
    # b(a)
    # output: (33, 11, 22)
    # operator.itemgetter does not return a value; it returns a function that,
    # when applied to an object, fetches the item(s) at the given position(s).
    # It is a bit involved, so it is not explained further here.
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    print(sortedClassCount)

    # After sorting by vote count in descending order, the first entry is the label that occurs
    # most often among the k nearest neighbors; that label is the category of the test sample.
    print("final result, test sample category: ", end="")
    print(sortedClassCount[0][0])
    return sortedClassCount[0][0]


if __name__ == "__main__":
    start()
Output:
dataSet.shape[0] returns the number of rows in the matrix:
4
dataSet.shape[1] returns the number of columns of the matrix:
2
(4, 2)
dataSet.shape type:
<class 'tuple'>
diffMat:[[ 2  1]
 [ 1  0]
 [ 2  2]
 [-1 -2]]
sqDiffMat:[[4 1]
 [1 0]
 [4 4]
 [1 4]]
sqDistances:[5 1 8 5]
distance from the unknown point to each known point: [2.23606798 1.         2.82842712 2.23606798]
index position: [1 0 3 2]
label 0: A
Visit 1, classCount[A] value is: 1
the contents of classCount are:
{'A': 1}
label 1: A
Visit 2, classCount[A] value is: 2
the contents of classCount are:
{'A': 2}
label 2: B
Visit 3, classCount[B] value is: 1
the contents of classCount are:
{'A': 2, 'B': 1}
label 3: B
Visit 4, classCount[B] value is: 2
the contents of classCount are:
{'A': 2, 'B': 2}
[('A', 2), ('B', 2)]
final result, test sample category: A
[Finished in 5.3s]
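Note that with k = 4 the vote in this run ends in a 2-2 tie, and 'A' is returned only because of how the tie is broken. The sketch below reproduces just the final sorting step with the vote counts copied from the output; the variable names are illustrative and not part of the original listing.

```python
import operator

# Vote counts copied from the output above (k = 4): a 2-2 tie.
class_count = {'A': 2, 'B': 2}

# sorted() is stable, even with reverse=True, so entries with equal counts
# keep the order in which they were inserted into the dict.
ranked = sorted(class_count.items(), key=operator.itemgetter(1), reverse=True)
winner = ranked[0][0]
print(ranked)

# The same tie with 'B' inserted first: now 'B' comes out on top.
flipped = {'B': 2, 'A': 2}
flipped_winner = sorted(flipped.items(), key=operator.itemgetter(1), reverse=True)[0][0]
print(flipped_winner)
```

So the result of a tied vote depends on which label was seen first, that is, on which neighbor is nearest. Choosing k = 3 here would avoid the tie, since the three nearest labels in the output are A, A, B.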
What does classify0 do in Listing 2-1 (the k-nearest neighbors algorithm) of Machine Learning in Action?
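In short, classify0 measures the Euclidean distance from the input point to every training sample, takes the k nearest samples, and returns the label that occurs most often among them. The following is a minimal sketch of those same steps, not the book's code: the function name knn_classify is hypothetical, NumPy broadcasting is swapped in for tile, and collections.Counter replaces the hand-written dictionary count.

```python
from collections import Counter

import numpy as np


def knn_classify(in_x, data_set, labels, k):
    """Hypothetical condensed variant of classify0 (not the book's code)."""
    # Broadcasting subtracts in_x from every row, so tile() is not needed.
    diff = np.asarray(data_set) - np.asarray(in_x)
    # Euclidean distance from in_x to each training sample.
    distances = np.sqrt((diff ** 2).sum(axis=1))
    # Indices of the k nearest samples; a stable sort keeps ties in row order.
    nearest = distances.argsort(kind="stable")[:k]
    # Count the labels of those neighbors and return the most common one.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


group = np.array([[1, 2], [2, 3], [1, 1], [4, 5]])
labels = ['A', 'A', 'B', 'B']
result = knn_classify([3, 3], group, labels, 3)
print(result)
```

The sketch is called with k = 3 rather than the k = 4 used in the listing, so that the vote among the nearest neighbors cannot end in a tie.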