[Machine learning algorithms: Python implementation] Implementing KNN, the k-Nearest Neighbor Algorithm (with source code)

Source: Internet
Author: User

(If you reprint this article, please credit the source: http://blog.csdn.net/buptgshengod)

1. Background

From now on, the blogger will update this series weekly with machine learning algorithms and their Python implementations. The algorithm covered today is KNN, the k-nearest neighbor algorithm. KNN is a supervised learning algorithm used for classification.

What are supervised learning and unsupervised learning? Supervised learning is what we use when the target variable is known for the training data. Unsupervised learning is what we use when no specific target variable is available. Supervised learning is further divided into classification algorithms and regression algorithms, depending on whether the target variable is discrete or continuous.

In k-nearest neighbor, k is a parameter of the algorithm: the number of neighbors that are consulted. The general idea of the algorithm is relatively simple: treat the feature values of each sample in the dataset as a vector. We give the program a set of feature values to classify. If there are three features, we can think of it as the vector (x1, x2, x3). Each sample already in the system can likewise be seen as a vector (y1, y2, y3). By computing the distance between x and every y, we find the k samples y that are closest to x. The target value that occurs most often among these k samples is taken as the classification of x. Formula (the Euclidean distance, which is what the classify function in section 4 computes):

d(x, y) = sqrt((x1 - y1)^2 + (x2 - y2)^2 + (x3 - y3)^2)
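As an illustration of the idea described above, here is a minimal pure-Python sketch of the distance computation and the majority vote. The function names euclidean and knn_vote and the toy data are hypothetical and chosen only for this example; they are not taken from the original post.

```python
import math
from collections import Counter

def euclidean(x, y):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def knn_vote(x, samples, targets, k):
    """Return the most common target among the k samples nearest to x."""
    # Sort sample indices by their distance to x, nearest first
    order = sorted(range(len(samples)), key=lambda i: euclidean(x, samples[i]))
    nearest = [targets[i] for i in order[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Hypothetical toy data: three features per sample, two classes
samples = [(1.0, 1.0, 1.0), (1.0, 2.0, 1.0), (8.0, 9.0, 8.0), (9.0, 8.0, 9.0)]
targets = ['a', 'a', 'b', 'b']

print(euclidean((0, 0, 0), (1, 2, 2)))                    # 3.0
print(knn_vote((2.0, 1.0, 1.0), samples, targets, k=3))   # a
```

With k=3 the three nearest samples carry the targets 'a', 'a' and 'b', so the vote returns 'a'.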


2. Python basics

NumPy is a mathematical computing library for Python, used mainly for matrix operations. We will use it a lot here. This section describes some of the functions used in the code.
array: the array representation provided by NumPy. For example, the four rows and two columns of numbers in this example can be entered as follows:

group = array([[9, 400], [200, 5], [100, 77], [40, 300]])


shape: returns the (rows, columns) of an array. Example: shape(group) = (4, 2)


zeros: creates a zero-filled matrix of the given shape. For example, zeros(shape(group)) = [[0., 0.], [0., 0.], [0., 0.], [0., 0.]]
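A quick way to check these three functions is to run them on the sample array. This sketch uses the explicit `import numpy as np` form rather than the `from numpy import *` used in the article's code; that is only a style choice for the example.

```python
import numpy as np

# The four-row, two-column sample array from this article
group = np.array([[9, 400], [200, 5], [100, 77], [40, 300]])

print(np.shape(group))   # (4, 2)

# zeros takes a shape, so pass shape(group) rather than group itself
blank = np.zeros(np.shape(group))
print(blank)
```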


tile: located in the Python module numpy.lib.shape_base. Its function is to repeat an array. For example, tile(A, n) repeats array A n times to form a new array.
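A small sketch of tile with both an integer and a tuple as the repetition argument. The array a is a made-up example; the tuple form (4, 1) is the one the code in section 4 relies on to copy a single row once per sample.

```python
import numpy as np

a = np.array([1, 2])

# An integer repeats the array along its only axis
print(np.tile(a, 3))        # [1 2 1 2 1 2]

# A tuple (rows, cols) stacks 4 copies vertically and 1 copy horizontally
stacked = np.tile(a, (4, 1))
print(stacked.shape)        # (4, 2)
```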


sum(axis=1): adds up the elements of each row of the matrix, giving one sum per row.
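To make the axis argument concrete, here is a short sketch on a made-up 3x2 matrix, comparing axis=1 (one sum per row) with axis=0 (one sum per column).

```python
import numpy as np

m = np.array([[1, 2], [3, 4], [5, 6]])

row_sums = m.sum(axis=1)   # one value per row
col_sums = m.sum(axis=0)   # one value per column

print(row_sums)   # [ 3  7 11]
print(col_sums)   # [ 9 12]
```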


3. Dataset
 
4. The code is divided into three functions:
Create a dataset:

CreateDataset

from __future__ import division
from numpy import *
import operator

def createDataset():
    # Four samples with two features each, and one class label per sample
    group = array([[9, 400], [200, 5], [100, 77], [40, 300]])
    labels = ['1', '2', '3', '1']
    return group, labels

Data normalization:

AutoNorm

def autoNorm(dataSet):
    # Column-wise minimum, maximum and range of each feature
    minVals = dataSet.min(0)
    maxVals = dataSet.max(0)
    ranges = maxVals - minVals
    normDataSet = zeros(shape(dataSet))
    m = dataSet.shape[0]
    # Shift each feature so that its minimum becomes 0
    normDataSet = dataSet - tile(minVals, (m, 1))
    # Scale each feature to the range [0, 1] (element-wise divide)
    normDataSet = normDataSet / tile(ranges, (m, 1))
    return normDataSet, ranges, minVals
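The tile calls in autoNorm expand minVals and ranges to the shape of the dataset before subtracting and dividing. NumPy broadcasting can do the same without tile. The sketch below compares the two forms on the article's dataset; the variable names are chosen for this example only. Note that both forms divide by zero if a feature has the same value in every sample.

```python
import numpy as np

data = np.array([[9, 400], [200, 5], [100, 77], [40, 300]], dtype=float)
min_vals = data.min(axis=0)              # per-feature minimum: 9 and 5
ranges = data.max(axis=0) - min_vals     # per-feature range: 191 and 395

# tile-based form, as in autoNorm
m = data.shape[0]
norm_tiled = (data - np.tile(min_vals, (m, 1))) / np.tile(ranges, (m, 1))

# broadcasting form: the 1-D arrays are applied to every row automatically
norm_broadcast = (data - min_vals) / ranges

print(np.allclose(norm_tiled, norm_broadcast))   # True
print(norm_broadcast)
```

After normalization every feature lies between 0 and 1, so a feature with large raw values no longer dominates the distance.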

Classification function:

Classify

def classify(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    # Euclidean distance from inX to every sample in dataSet
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances ** 0.5
    # Indices of the samples, ordered from nearest to farthest
    sortedDistIndicies = distances.argsort()
    classCount = {}
    # Count the labels of the k nearest samples
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # items() works on both Python 2 and 3; iteritems() exists only in Python 2
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
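To show how the three steps fit together, here is a self-contained Python 3 sketch that normalizes the article's dataset and classifies one point. The function knn_classify is a compact, hypothetical variant of classify written for this example (it uses broadcasting and collections.Counter instead of tile and a dict), and the query point [30, 350] is made up.

```python
import numpy as np
from collections import Counter

def knn_classify(in_x, data_set, labels, k):
    """Majority label among the k rows of data_set nearest to in_x."""
    distances = np.sqrt(((data_set - in_x) ** 2).sum(axis=1))
    nearest = distances.argsort()[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Dataset and labels from createDataset
group = np.array([[9, 400], [200, 5], [100, 77], [40, 300]], dtype=float)
labels = ['1', '2', '3', '1']

# Min-max normalization, the same arithmetic as autoNorm
min_vals = group.min(axis=0)
ranges = group.max(axis=0) - min_vals
norm_group = (group - min_vals) / ranges

# Hypothetical query point; scale it with the same min_vals and ranges
query = np.array([30.0, 350.0])
norm_query = (query - min_vals) / ranges

print(knn_classify(norm_query, norm_group, labels, k=3))   # 1
```

The query must be scaled with the minVals and ranges returned by autoNorm; comparing a raw query against normalized samples would give meaningless distances.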


5. Display Results


6. Download Code


 






