Simple Description
This algorithm is mainly used to measure the distance between different feature values. With this distance, you can classify them.
KNN for short.
Known: the training set and the label of each training set.
Next, compare with the data in the training set to calculate the most similar k Distance. Select the category with the most similarity data. As the classification of new data.
Python instance
Copy codeThe Code is as follows:
#-*-Coding: cp936 -*-
# In Windows, cp936 encoding is used. It is better to use UTF-8 in linux.
From numpy import * # introduce the scientific computing package
Import operator # classic python function library. Operator module.
# Create a dataset
Def createDataSet ():
Group = array ([[1.0, 1.1], [1.0, 1.0], [0.1], [0,])
Labels = ['A', 'A', 'B', 'B']
Return group, labels
# Algorithm core
# Classification: input vector used for classification. It will be classified soon.
# DataSet: training sample set
# Labels: Label Vector
Def classfy0 (datasets, dataSet, labels, k ):
# Distance Calculation
DataSetSize = dataSet. shape [0] # obtain the number of rows in the array. I know that there are several training data sets.
DiffMat = tile (partition, (dataSetSize, 1)-dataSet # tile: Functions in numpy. Tile expands the original array into four identical arrays. DiffMat obtains the difference between the target and the training value.
SqDiffMat = diffMat ** 2 # each element is square
SqDistances = sqDiffMat. sum (axis = 1) # multiply the corresponding column to obtain the square of each distance.
Distances = sqDistances ** 0.5 # Start, get the distance.
SortedDistIndicies = distances. argsort () # sort in ascending order
# Select the nearest k points.
ClassCount = {}
For I in range (k ):
VoteIlabel = labels [sortedDistIndicies [I]
ClassCount [voteIlabel] = classCount. get (voteIlabel, 0) + 1
# Sorting
SortedClassCount = sorted (classCount. iteritems (), key = operator. itemgetter (1), reverse = True)
Return sortedClassCount [0] [0]
Unexpected gains
Add a self-written module to the default search path of python: Create a xxx. pth file under the python/lib/-packages directory and write it to the path of the Self-written module.