Python in Practice: Implementing KNN

Source: Internet
Author: User

Implementing the k-nearest neighbor (KNN) classification algorithm in Python is a well-worn topic, and there is plenty of material about it online, but I still want to record my own learning experience here.

1. Configure the NumPy library

NumPy is a third-party Python library for matrix operations, and most mathematical computation in Python relies on it. For how to set it up, see the article "Python Configuration third-party libraries NumPy and Matplotlib's tortuous path". Once the configuration is complete, import the whole NumPy library into the current project.

2. Prepare training samples

Here we simply construct four points with corresponding labels as the KNN training samples:

# ==================== Create training samples ====================
def createDataSet():
    group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    labels = ['A', 'B', 'C', 'D']
    return group, labels

One small detail: the array() function constructs and initializes a NumPy array from a single data argument, so the rows must be wrapped together in one outer pair of brackets. The following call, for example, is not valid:

group = array([1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1])
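The difference between the two calls can be checked directly. This is only a minimal sketch: the helper name try_build is hypothetical, array is imported explicitly instead of through a wildcard import, and the exact exception type and message depend on the NumPy version, so the helper catches both TypeError and ValueError.

```python
from numpy import array

# One argument: a list of four [x, y] pairs gives a 4x2 array.
group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
print(group.shape)  # (4, 2)


def try_build():
    """Hypothetical helper: return True if the four-argument call is rejected."""
    try:
        # Four separate arguments: only the first is treated as data,
        # so NumPy rejects the call.
        array([1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1])
    except (TypeError, ValueError):
        return True
    return False


rejected = try_build()
print(rejected)
```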

3. Create a classification function

KNN generally classifies by Euclidean distance, so in each dimension we subtract the training data from the input data, square the differences, sum them, and then take the square root, as follows:

# ==================== Euclidean distance classification ====================
def classify(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]  # number of rows; shape[1] is the number of columns
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)
    distance = sqDistances ** 0.5
    sortedDistanceIndicies = distance.argsort()
    classCount = {}

Here tile() is NumPy's array-repetition function. For example, the training set has four two-dimensional points, so the input sample (one two-dimensional point) is first repeated 4 times along the rows and once along the columns, which gives a 4x2 matrix. The training matrix is then subtracted from it, the differences are squared and summed along each row, and the square root gives the distances. Once the distances are computed, calling the array's member function argsort() returns the indices that sort the distances in ascending order.

A PyCharm tip for viewing source code: suppose that while writing this program we are not sure whether argsort() is a member function of the array object. Select the function name, then right-click and choose Go To, Declaration. This jumps to the declaration of argsort(), and the surrounding code confirms that the array class does contain this member function, so the call is fine.
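As a rough illustration of these steps, the sketch below runs tile(), the squaring and summing, and argsort() on the four training points from section 2 with the input [0, 0]. It imports array and tile explicitly rather than with a wildcard import, and the variable names tiled, distances, and order are my own choices for this example.

```python
from numpy import array, tile

group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
inX = [0, 0]

# Repeat the input point 4 times down the rows and once across the columns.
tiled = tile(inX, (group.shape[0], 1))
print(tiled.shape)  # (4, 2)

# Per-dimension difference, squared, summed per row, then square-rooted.
distances = (((tiled - group) ** 2).sum(axis=1)) ** 0.5

# argsort() returns the row indices ordered from nearest to farthest.
order = distances.argsort()
print(order.tolist())  # [2, 3, 1, 0]
```

The point [0, 0] (index 2) is at distance 0, followed by [0, 0.1], [1.0, 1.0], and [1.0, 1.1].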

After the distances are sorted, the next step is to decide which class the current sample belongs to, based on the labels of the k smallest distances:

    for i in range(k):
        voteIlabel = labels[sortedDistanceIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)

One small caveat: Python 2 code iterates over dictionary entries with the dict.iteritems() member function, whereas Python 3 uses dict.items(). "key=operator.itemgetter(1)" tells sorted() to order the (label, count) pairs by their second element, the count; note that the operator module must be imported beforehand. In this way the test sample is assigned to the class whose label occurs most often among the k smallest distances.
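The voting step can be tried on its own, without NumPy. In the sketch below the list nearest_labels is made-up data standing in for the labels of the k nearest neighbors, and the names class_count and ranked are chosen for this example only.

```python
import operator

# Labels of the k nearest neighbors, nearest first (made-up values).
nearest_labels = ['B', 'A', 'B']

class_count = {}
for label in nearest_labels:
    # get() returns 0 the first time a label is seen.
    class_count[label] = class_count.get(label, 0) + 1

# Sort the (label, count) pairs by count, largest first.
ranked = sorted(class_count.items(), key=operator.itemgetter(1), reverse=True)
print(ranked)        # [('B', 2), ('A', 1)]
print(ranked[0][0])  # B
```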

4. Testing

The full KNN test code is given here:

# coding: utf-8
from numpy import *
import operator


# ==================== Create training samples ====================
def createDataSet():
    group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    labels = ['A', 'B', 'C', 'D']
    return group, labels


# ==================== Euclidean distance classification ====================
def classify(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]  # number of rows; shape[1] is the number of columns
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)
    distance = sqDistances ** 0.5
    sortedDistanceIndicies = distance.argsort()
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistanceIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]


groups, labels = createDataSet()
result = classify([0, 0], groups, labels, 1)
print(result)

Run the code and the program prints the result "C". One thing worth mentioning: when each class has only a single training sample, as here, the KNN value of k should be set to 1.
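To see why a larger k adds nothing when every class has one sample, the sketch below counts the votes for k = 3 with the same four points and the input [0, 0]. It is an illustration under my own assumptions: it uses NumPy broadcasting in place of tile(), and the names nearest and votes are chosen for this example.

```python
from numpy import array

group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ['A', 'B', 'C', 'D']

# Broadcasting subtracts the input point from every row (used here instead of tile()).
distances = (((group - array([0, 0])) ** 2).sum(axis=1)) ** 0.5
nearest = distances.argsort()[:3]  # indices of the 3 nearest points

votes = {}
for i in nearest:
    votes[labels[i]] = votes.get(labels[i], 0) + 1
print(votes)  # {'C': 1, 'D': 1, 'B': 1}
```

Every label receives exactly one vote, so the count cannot separate the classes; only the nearest neighbor itself carries information.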
