Implementation of KNN in Python practice

Source: Internet
Author: User

Implementing the K-Nearest Neighbor (KNN) classification algorithm in Python is a well-worn topic, and there is already plenty of material about it on the Internet. Still, I decided to record my own learning experience here.

1. Configure the numpy Library

The numpy library is a third-party Python library for array and matrix operations, and a great deal of numerical code depends on it. For how to set it up, see the article "The twists and turns of configuring the third-party libraries Numpy and matplotlib in Python". After the configuration is complete, import the numpy library into the current project.

2. Prepare training samples

Here, we simply construct four points and use corresponding labels as KNN training samples:

# ==================== Create a training sample ====================
def createdataset():
    group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    labels = ['A', 'B', 'C', 'D']
    return group, labels

One small detail: when array() constructs and initializes a numpy array, it takes the data as a single argument. The list of rows must therefore be wrapped in an outer pair of square brackets. The following call is invalid:

group = array([1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1])
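To see the difference in practice, here is a minimal sketch that builds the four training points the valid way, checks the resulting shape, and then attempts the flat call. It uses `import numpy as np` instead of the article's `from numpy import *`, and the helper name `try_flat_call` is made up for illustration. The exact error type and message are likely to differ between NumPy versions, so the sketch only reports the exception's name.

```python
import numpy as np

# Valid: a single argument, a list containing four [x, y] rows.
group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
print(group.shape)  # (4, 2)


def try_flat_call():
    """Hypothetical helper: attempt the invalid call and report the error name."""
    try:
        # Four positional arguments: NumPy does not read the extra lists
        # as additional rows, so the call is rejected.
        np.array([1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1])
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


error_name = try_flat_call()
print(error_name)
```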

3. Create a classification function

K-Nearest Neighbor algorithms generally classify samples by Euclidean distance. We therefore subtract the training data from the input data in each dimension, square the differences, sum them, and take the square root, as shown below:

# ==================== Euclidean distance classification ====================
def classify(sample, dataset, labels, k):
    DataSetSize = dataset.shape[0]  # shape[0] is the number of rows; shape[1] is the number of columns
    diffmat = tile(sample, (DataSetSize, 1)) - dataset
    SqDiffMat = diffmat ** 2
    SqDistances = SqDiffMat.sum(axis=1)
    Distance = SqDistances ** 0.5
    SortedDistanceIndicies = Distance.argsort()
    ClassCount = {}

Here, tile() is a numpy function that builds a larger array by repeating its input. In this example the training set holds four two-dimensional points, so the input sample (one two-dimensional point) is first repeated four times along the rows and once along the columns, which gives a matrix with four rows and two columns. The training matrix is then subtracted from it, the differences are squared and summed along each row, and the square root gives the distances. Once the distances are computed, the array's argsort() method returns the indices that sort them in ascending order.

Here is a PyCharm tip for viewing source code: while writing this program, I was not sure whether argsort() is a member function of the array object. Right-click the function and choose Go to Declaration to jump to the declaration of argsort(); the enclosing class shown there confirms that the array class does contain this member function, so the call is correct.
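The tile(), distance, and argsort() steps can be tried on their own with the article's four training points. This is only a small sketch: it uses `import numpy as np` rather than `from numpy import *`, and the variable names `sample`, `tiled`, `distances`, and `order` are chosen here for illustration.

```python
import numpy as np

dataset = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
sample = [0, 0]

# Repeat the sample 4 times along the rows and once along the columns.
tiled = np.tile(sample, (dataset.shape[0], 1))
print(tiled.shape)  # (4, 2)

# Row-wise Euclidean distance between the sample and each training point.
distances = (((tiled - dataset) ** 2).sum(axis=1)) ** 0.5

# argsort() returns the indices that would sort the distances ascending.
order = distances.argsort()
print(order.tolist())  # [2, 3, 1, 0]
```

The nearest point is row 2, which is the query point itself, followed by row 3 at distance 0.1.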

After sorting the distances, you can determine the class of the current sample from the labels that correspond to the K smallest distances:

    for i in range(k):
        VoteiLabel = labels[SortedDistanceIndicies[i]]
        ClassCount[VoteiLabel] = ClassCount.get(VoteiLabel, 0) + 1
    SortedClassCount = sorted(ClassCount.items(), key=operator.itemgetter(1), reverse=True)

A small caveat: in Python 2 the dictionary elements are retrieved with dict.iteritems(), while in Python 3 this becomes dict.items(). The argument "key=operator.itemgetter(1)" tells sorted() to order the (label, count) pairs by their second element, that is, by the vote count. Note that the operator module must be imported beforehand. In this way, we decide which class the test sample belongs to by counting how many times each label appears among the K smallest distances.
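The voting step can be illustrated in isolation. In the sketch below, the list `nearest_labels` is invented data standing in for the labels of the K nearest neighbours (it does not come from the article's training set), and the names `class_count` and `ranked` are chosen for this example.

```python
import operator

# Hypothetical labels of the k = 5 nearest neighbours, nearest first.
nearest_labels = ['A', 'B', 'A', 'C', 'A']

# Tally one vote per neighbour.
class_count = {}
for label in nearest_labels:
    class_count[label] = class_count.get(label, 0) + 1

# Sort the (label, count) pairs by count, largest first.
ranked = sorted(class_count.items(), key=operator.itemgetter(1), reverse=True)
print(ranked[0][0])  # A
```

The first element of `ranked` is the pair with the most votes, so `ranked[0][0]` is the predicted label.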

4. Test

The complete KNN test code is provided here:

# coding: utf-8
from numpy import *
import operator

# ==================== Create a training sample ====================
def createdataset():
    group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    labels = ['A', 'B', 'C', 'D']
    return group, labels

# ==================== Euclidean distance classification ====================
def classify(sample, dataset, labels, k):
    DataSetSize = dataset.shape[0]  # shape[0] is the number of rows; shape[1] is the number of columns
    diffmat = tile(sample, (DataSetSize, 1)) - dataset
    SqDiffMat = diffmat ** 2
    SqDistances = SqDiffMat.sum(axis=1)
    Distance = SqDistances ** 0.5
    SortedDistanceIndicies = Distance.argsort()
    ClassCount = {}
    for i in range(k):
        VoteiLabel = labels[SortedDistanceIndicies[i]]
        ClassCount[VoteiLabel] = ClassCount.get(VoteiLabel, 0) + 1
    SortedClassCount = sorted(ClassCount.items(), key=operator.itemgetter(1), reverse=True)
    return SortedClassCount[0][0]

Groups, Labels = createdataset()
Result = classify([0, 0], Groups, Labels, 1)
print(Result)

Run the code and the program prints the result "C". One thing to mention: when each class has only one training sample, as it does here, the K value of KNN should be set to 1. With a larger K, every additional neighbour belongs to a different class and contributes a single vote, so the vote count can no longer separate the classes.
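The effect of K on this tiny data set can be checked with a short sketch. The helper `vote_counts` is a hypothetical function written for this illustration; it repeats the distance and voting steps from the article and returns the raw tally instead of the winning label.

```python
import numpy as np

group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ['A', 'B', 'C', 'D']


def vote_counts(sample, k):
    """Hypothetical helper: tally the labels of the k nearest training points."""
    tiled = np.tile(sample, (group.shape[0], 1))
    distances = (((tiled - group) ** 2).sum(axis=1)) ** 0.5
    counts = {}
    for index in distances.argsort()[:k]:
        counts[labels[index]] = counts.get(labels[index], 0) + 1
    return counts


print(vote_counts([0, 0], 1))  # {'C': 1}
print(vote_counts([0, 0], 3))  # {'C': 1, 'D': 1, 'B': 1}
```

With K = 3 every label among the neighbours receives exactly one vote, so the tally is a tie and the winner depends only on the order in which the labels were counted.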
