Implementing data classification with the KNN algorithm in Python 3.2

Source: Internet
Author: User

1. Preface

I have been reading Machine Learning in Action over the past few days. The main reason I bought this book is that its algorithms are implemented in Python, a language I have grown increasingly fond of. Having read it, I can say it is really good: the book's explanations and implementations of several classic machine learning algorithms are very accessible. Today I worked through the KNN algorithm and implemented it in Python. The code is based mainly on the example in the book; after reading it, I added my own comments.

2. Basic principles of KNN algorithm

KNN is a supervised learning method, so you must prepare a dataset (the sample data) whose classification results are already known. Its basic principle is simple: for a data point to be classified, compare its feature values with the feature values of the samples; take the k samples whose features are most similar to the point and extract their classification labels; finally, choose the label that appears most often among those k as the classification result for the point.
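The procedure above can be sketched in a few lines of NumPy. This is a minimal illustration, not the book's code (the book-style version appears in section 4); `knn_predict` is a helper name chosen here for clarity:

```python
import numpy as np
from collections import Counter

def knn_predict(x, samples, labels, k):
    """Classify point x by majority vote among its k nearest samples."""
    # Euclidean distance from x to every sample row (via broadcasting)
    distances = np.sqrt(((samples - x) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = distances.argsort()[:k]
    # Most common label among those k neighbours
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

samples = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(knn_predict(np.array([1.0, 0.8]), samples, labels, 3))
```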

3. A simple problem to be solved

Existing data:

group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ['A', 'A', 'B', 'B']

Two groups of data to be classified:

[1.0, 0.8]

[0.5, 0.5]

Find the category of the two groups of data to be classified

4. Code

from numpy import array, tile
import operator

# Existing data and the corresponding labels
group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ['A', 'A', 'B', 'B']

def classify0(inX, dataSet, labels, k):
    """Compare inX against the existing dataset and its labels to obtain
    the most likely category for inX.

    inX:     the data point to be classified
    dataSet: existing sample data, one row per sample
    labels:  labels of the samples in dataSet
    k:       number of nearest neighbours used for the vote
    """
    dataSetSize = dataSet.shape[0]  # number of rows in the dataset
    # Calculate distances.
    # tile(a, (b, c)) repeats a b times along rows and c times along columns;
    # the line below expands inX to the same shape as the existing dataset,
    # then takes the difference
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2                  # square the differences
    sqDistances = sqDiffMat.sum(axis=1)       # sum each row
    distances = sqDistances ** 0.5            # square root: Euclidean distance
    sortedDistIndicies = distances.argsort()  # indices sorted by distance
    # Count the labels of the k nearest points in an initially empty dictionary
    classCount = {}
    for i in range(k):
        # the label of the i-th closest sample, looked up via the sorted index
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # Sort the count dictionary in descending order, largest count first
    sortedClassCount = sorted(classCount.items(),
                              key=operator.itemgetter(1), reverse=True)
    # Return the first entry, i.e. the most likely label
    return sortedClassCount[0][0]

print(classify0([1.0, 0.8], group, labels, 3))
print(classify0([0.5, 0.5], group, labels, 3))


5. Execution result

Running the script prints the predicted label for each of the two points: A for [1.0, 0.8] and B for [0.5, 0.5].