Implementing data classification with the KNN algorithm in Python 3.2

Source: Internet
Author: User

1. Preface

I have been reading Machine Learning in Action over the past few days. The main reason I bought this book is that its algorithms are implemented in Python, a language I have grown increasingly fond of. Having read it, I can say it is really good: the book's explanations and implementations of several classic machine learning algorithms are very accessible. Today I worked through the KNN algorithm and implemented it in Python. The code is based mainly on the example in the book; after reading it, I added my own comments.

2. Basic principles of KNN algorithm

KNN is a supervised learning method, so you must prepare a dataset (the sample data) whose classification results are already known. Its basic principle is simple: for a data point to be classified, compare its feature values with the feature values of the samples; take the k samples whose features are most similar to the point and extract their classification labels; finally, choose the label that appears most often among those k as the classification result for the point.
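The procedure above can be sketched in a few lines of NumPy. This is a minimal illustration, not the book's code (the book-style version appears in section 4); `knn_predict` is a helper name chosen here for clarity:

```python
import numpy as np
from collections import Counter

def knn_predict(x, samples, labels, k):
    """Classify point x by majority vote among its k nearest samples."""
    # Euclidean distance from x to every sample row (via broadcasting)
    distances = np.sqrt(((samples - x) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = distances.argsort()[:k]
    # Most common label among those k neighbours
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

samples = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(knn_predict(np.array([1.0, 0.8]), samples, labels, 3))
```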

3. A simple problem to be solved

Existing data:

group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ['A', 'A', 'B', 'B']

Two groups of data to be classified:

[1.0, 0.8]

[0.5, 0.5]

Find the category of the two groups of data to be classified

4. Code

from numpy import array, tile
import operator

# Existing data and the corresponding labels
group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ['A', 'A', 'B', 'B']

def classify0(inX, dataSet, labels, k):
    """Compare inX against the existing dataset and its labels to obtain
    the most likely category for inX.

    inX:     the data point to be classified
    dataSet: existing sample data, one row per sample
    labels:  labels of the samples in dataSet
    k:       number of nearest neighbours used for the vote
    """
    dataSetSize = dataSet.shape[0]  # number of rows in the dataset
    # Calculate distances.
    # tile(a, (b, c)) repeats a b times along rows and c times along columns;
    # the line below expands inX to the same shape as the existing dataset,
    # then takes the difference
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2                  # square the differences
    sqDistances = sqDiffMat.sum(axis=1)       # sum each row
    distances = sqDistances ** 0.5            # square root: Euclidean distance
    sortedDistIndicies = distances.argsort()  # indices sorted by distance
    # Count the labels of the k nearest points in an initially empty dictionary
    classCount = {}
    for i in range(k):
        # the label of the i-th closest sample, looked up via the sorted index
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # Sort the count dictionary in descending order, largest count first
    sortedClassCount = sorted(classCount.items(),
                              key=operator.itemgetter(1), reverse=True)
    # Return the first entry, i.e. the most likely label
    return sortedClassCount[0][0]

print(classify0([1.0, 0.8], group, labels, 3))
print(classify0([0.5, 0.5], group, labels, 3))


5. Execution result

Running the script prints the predicted label for each of the two points: A for [1.0, 0.8] and B for [0.5, 0.5].