[Machine learning algorithm-python implementation] Implementation of KNN-k Nearest Neighbor Algorithm (with source code)

Last Update:2014-05-05 Source: Internet

Author: User


(If you reprint this article, please credit the source: http://blog.csdn.net/buptgshengod)

1. Background

From now on, this blog will publish a machine learning algorithm and its Python implementation every week. Today's algorithm is KNN, the k-nearest neighbor algorithm, a supervised learning algorithm used for classification.

What is the difference between supervised and unsupervised learning? Supervised learning is used when the target variable of the training data is known; unsupervised learning is used when no target variable is available. Supervised learning is further divided into classification and regression, depending on whether the target variable is discrete or continuous.

In k-nearest neighbor, k is a parameter of the algorithm: the number of neighbors that are consulted. The idea is simple: treat the feature values of each sample as a vector. Suppose we give the program a sample with three feature values, (x1, x2, x3), and every sample already in the dataset is a vector (y1, y2, y3). We compute the distance between x and every y, pick the k samples y with the smallest distance, and assign x the class that occurs most often among the target values of those k samples.

Formula (Euclidean distance, as computed by the classify function in section 4): d(x, y) = sqrt((x1 - y1)^2 + (x2 - y2)^2 + (x3 - y3)^2)
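To make the distance idea concrete, here is a minimal sketch that computes the Euclidean distance from one query point to each of the four sample points defined in createDataset in section 4. The query point (50, 250) and the variable names are assumptions chosen only for illustration.

```python
import numpy as np

# The four sample points from createDataset (two features each).
group = np.array([[9, 400], [200, 5], [100, 77], [40, 300]], dtype=float)
labels = ['1', '2', '3', '1']

# Hypothetical query point, chosen only for illustration.
query = np.array([50.0, 250.0])

# Euclidean distance from the query to every row of group.
distances = np.sqrt(((group - query) ** 2).sum(axis=1))

# Row indices ordered from nearest to farthest, and their labels.
order = distances.argsort()
nearest_labels = [labels[i] for i in order]
```

With these raw, unnormalized values the two nearest points both carry label '1', so any k up to 3 would vote for class '1'.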

2. Python basics

numpy is Python's numerical computing library, used mainly for matrix operations. We use it heavily here. This section describes the numpy functions used in the code.
array: the array type provided by numpy. For example, the four rows and two columns of numbers in this example can be entered as follows:

group = array([[9, 400], [200, 5], [100, 77], [40, 300]])

shape: returns the dimensions as (rows, columns). Example: shape(group) = (4, 2)

zeros: creates an array of the given shape filled with zeros. Example: zeros(shape(group)) = [[0., 0.], [0., 0.], [0., 0.], [0., 0.]]
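The three functions above can be tried out with a short sketch like the following. It uses `import numpy as np` rather than the article's `from numpy import *`, which is purely a choice made for this example; the variable names `rows`, `cols`, and `empty` are also illustrative.

```python
import numpy as np

# Four rows and two columns, the same numbers as the article's dataset.
group = np.array([[9, 400], [200, 5], [100, 77], [40, 300]])

# shape returns (rows, columns); group.shape gives the same tuple.
rows, cols = np.shape(group)

# zeros takes a shape and returns a float array filled with 0.0.
empty = np.zeros(np.shape(group))
```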

tile: a numpy function (numpy.tile) that repeats an array. For example, tile(A, n) repeats array A n times to form a new array.
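A small sketch of both ways tile is typically called may help. The array `a` and the result names are assumptions for illustration; the second form, with a (rows, 1) tuple, is the one the article's code relies on.

```python
import numpy as np

a = np.array([1, 2])

# Repeat along the existing axis: [1 2 1 2 1 2]
repeated = np.tile(a, 3)

# Repeat as rows: a 4 x 2 array in which every row is [1, 2]
stacked = np.tile(a, (4, 1))
```

The row-stacking form lets a single vector be subtracted from every row of a dataset with the same shape.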

sum(axis=1): adds up the elements in each row of a matrix, giving one sum per row.
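As a quick illustration of the axis argument, the sketch below sums a small matrix along both axes. The matrix `m` is made up for this example.

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4],
              [5, 6]])

row_sums = m.sum(axis=1)  # one sum per row: [3, 7, 11]
col_sums = m.sum(axis=0)  # one sum per column: [9, 12]
```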

3. Dataset

4. The code is divided into three functions:
Create a dataset:

CreateDataset

from __future__ import division
from numpy import *
import operator

def createDataset():
    # four samples with two features each, and one class label per sample
    group = array([[9, 400], [200, 5], [100, 77], [40, 300]])
    labels = ['1', '2', '3', '1']
    return group, labels

Data normalization:

AutoNorm

def autoNorm(dataSet):
    # per-column minimum, maximum and range
    minVals = dataSet.min(0)
    maxVals = dataSet.max(0)
    ranges = maxVals - minVals
    normDataSet = zeros(shape(dataSet))
    m = dataSet.shape[0]
    # shift every column so that its minimum becomes 0
    normDataSet = dataSet - tile(minVals, (m, 1))
    # scale every column into [0, 1] (element-wise divide)
    normDataSet = normDataSet / tile(ranges, (m, 1))
    return normDataSet, ranges, minVals
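The same min-max scaling can be sketched with NumPy broadcasting instead of tile. This is an alternative formulation, not the article's code: the function name `min_max_normalize` and the query point are hypothetical, and the sketch assumes every column has at least two distinct values (otherwise the range is 0 and the division is undefined).

```python
import numpy as np

def min_max_normalize(data):
    """Scale each column of data into [0, 1]; return (scaled, ranges, min_vals)."""
    min_vals = data.min(axis=0)
    ranges = data.max(axis=0) - min_vals
    # Broadcasting applies the per-column values to every row.
    scaled = (data - min_vals) / ranges
    return scaled, ranges, min_vals

group = np.array([[9, 400], [200, 5], [100, 77], [40, 300]], dtype=float)
norm_group, ranges, min_vals = min_max_normalize(group)

# A new input must be scaled with the same min_vals and ranges.
query = np.array([50.0, 250.0])  # hypothetical input
norm_query = (query - min_vals) / ranges
```

Without this step the second feature, whose values span 5 to 400, would contribute more to the distance than the first, which spans 9 to 200.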

Classification function:

Classify

def classify(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    # Euclidean distance from inX to every row of dataSet
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat**2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances**0.5
    # row indices ordered from nearest to farthest
    sortedDistIndicies = distances.argsort()
    # count the labels of the k nearest rows
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # items() works on both Python 2 and Python 3
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
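To show how the three steps fit together, here is a self-contained end-to-end sketch. It is a compact variant under stated assumptions rather than the article's exact code: the function name `knn_classify`, the query point (50, 250), and k=3 are hypothetical, and voting uses collections.Counter in place of the dictionary plus sorted().

```python
from collections import Counter

import numpy as np

def knn_classify(query, data, labels, k):
    """Return the majority label among the k rows of data nearest to query."""
    distances = np.sqrt(((data - query) ** 2).sum(axis=1))
    nearest = distances.argsort()[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Same dataset as createDataset.
group = np.array([[9, 400], [200, 5], [100, 77], [40, 300]], dtype=float)
labels = ['1', '2', '3', '1']

# Min-max normalize the data and apply the same scaling to the query.
min_vals = group.min(axis=0)
ranges = group.max(axis=0) - min_vals
norm_group = (group - min_vals) / ranges

query = np.array([50.0, 250.0])  # hypothetical input
norm_query = (query - min_vals) / ranges

predicted = knn_classify(norm_query, norm_group, labels, k=3)
print(predicted)
```

For this query the three nearest samples carry the labels '1', '1', and '3', so the predicted class is '1'.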

5. Display Results

6. Download Code
