Python Implementations of Machine Learning Algorithms: the k-Nearest Neighbor (KNN) Algorithm

Source: Internet
Author: User

1. Background

From now on, I will post one machine learning algorithm with a simple Python implementation every week. Today's algorithm is k-nearest neighbors (KNN), a supervised learning algorithm used for classification.

What is the difference between supervised and unsupervised learning? Supervised learning is used when the target variable is known; unsupervised learning is used when there is no specific target variable. Supervised learning is further divided into classification and regression algorithms, depending on whether the target variable is discrete or continuous.

In k-nearest neighbors, K is a parameter of the algorithm. The overall idea is simple: the feature values of each sample are treated as a vector. Suppose each sample has three features, so a new sample we give the program is the vector (x1, x2, x3), and each sample already in the dataset is a vector (y1, y2, y3). We compute the distance between x and every y, take the K samples with the shortest distances, and look at their target values (class labels). The class that occurs most often among these K labels is the predicted class of x.
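The idea above can be sketched in plain Python, without NumPy, as follows. This is a minimal illustration rather than the author's code: the function name `knn_classify`, the use of Euclidean distance (via `math.dist`, Python 3.8+), and majority voting with `collections.Counter` are assumptions made for the example. The data are the four samples used in section 4.

```python
import math
from collections import Counter


def knn_classify(x, samples, labels, k):
    """Hypothetical helper: classify feature vector x by majority vote
    among the k training samples closest to it (Euclidean distance)."""
    # Distance from x to every training sample, paired with that sample's label
    distances = [(math.dist(x, y), label) for y, label in zip(samples, labels)]
    # Keep the k pairs with the smallest distance (sort on distance only)
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The most common label among the k nearest neighbors is the prediction
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


samples = [(9, 400), (200, 5), (100, 77), (40, 300)]
labels = ['1', '2', '3', '1']
print(knn_classify((50, 350), samples, labels, k=3))  # prints 1
```

Sorting every distance costs O(n log n) per query, which is fine for a toy dataset like this one.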

Formula:
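Assuming the distance is the usual Euclidean distance (the most common choice for KNN, and the one that the `tile` and `sum(axis=1)` operations in the next section are suited to computing), for a query vector x and a training vector y with n features it is:

$$
d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
$$

For the three-feature case above, this is $\sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2}$.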

2. Python Basics: NumPy

NumPy is a numerical computing library for Python, used mainly for matrix operations, and we rely on it heavily here. Below are the NumPy features used in this chapter's code.

array: NumPy's array type. For example, the 4-row, 2-column data in this example can be entered as:

group = array([[9,400],[200,5],[100,77],[40,300]])

shape: gives the dimensions as (rows, columns). Example: shape(group) = (4,2)

zeros: creates an array of zeros with a given shape. For example, zeros(shape(group)) gives a 4×2 array of zeros: [[0,0],[0,0],[0,0],[0,0]]

tile: the tile function (numpy.tile, historically implemented in the module numpy.lib.shape_base) repeats an array. For example, tile(a, n) repeats the array a n times to form a new array, and tile(a, (m, 1)) stacks a vertically m times, giving m identical rows.

sum(axis=1): adds up the elements of each row of a matrix, producing one value per row.
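The snippet below exercises these functions on the `group` array, then combines `tile`, subtraction, squaring, and `sum(axis=1)` to get the Euclidean distance from one query point to every row, which is how these pieces typically fit together in KNN. It is a sketch; the query point `[50, 350]` is an arbitrary value chosen for illustration.

```python
from numpy import array, shape, tile, zeros

group = array([[9, 400], [200, 5], [100, 77], [40, 300]])

print(shape(group))         # (4, 2): 4 rows, 2 columns
print(zeros(shape(group)))  # a 4 x 2 array filled with 0.0

query = array([50, 350])    # hypothetical new sample
m = group.shape[0]

# tile(query, (m, 1)) stacks the query vertically m times, giving shape (4, 2)
stacked = tile(query, (m, 1))

# Squared differences per element, then sum(axis=1) adds each row's values
sq_dist = ((stacked - group) ** 2).sum(axis=1)
distances = sq_dist ** 0.5
print(distances)
```

In NumPy, `group - query` produces the same differences through broadcasting, so `tile` is not strictly required; the explicit version mirrors the functions listed above.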

3. Data sets

4. Code

The code is divided into three functions, as follows.


To create a dataset:

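createDataSet

from __future__ import division  # true division on Python 2; no effect on Python 3
from numpy import *
import operator


def createDataSet():
    # Four samples with two features each (one row per sample)
    group = array([[9, 400], [200, 5], [100, 77], [40, 300]])
    # Class label of each row, in the same order
    labels = ['1', '2', '3', '1']
    return group, labels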

Normalization of data:

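autoNorm

def autoNorm(dataSet):
    # Column-wise minimum and maximum (axis 0 runs down each column)
    minVals = dataSet.min(0)
    maxVals = dataSet.max(0)
    ranges = maxVals - minVals
    # Pre-allocate a zero array of the same shape (reassigned below)
    normDataSet = zeros(shape(dataSet))
    m = dataSet.shape[0]
    # Subtract each column's minimum from every row
    normDataSet = dataSet - tile(minVals, (m, 1))
    # Element-wise divide by each column's range, scaling values to [0, 1]
    normDataSet = normDataSet / tile(ranges, (m, 1))
    return normDataSet, ranges, minVals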
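The normalization function above applies min-max scaling to each column: newValue = (oldValue - min) / (max - min), so every feature ends up in [0, 1]. This matters for this dataset because the second feature spans 5 to 400 while the first spans 9 to 200, so without scaling the second feature would dominate the distance. The sketch below is an alternative written with NumPy broadcasting instead of `tile`, plus a hypothetical `normalize_query` helper illustrating a likely reason the function returns the ranges and minimum values: a new sample should be scaled with the training set's statistics before its distance is computed. Both function names are illustrative, not from the original.

```python
import numpy as np


def auto_norm(data_set):
    """Min-max scale each column to [0, 1], using broadcasting instead of tile."""
    min_vals = data_set.min(axis=0)
    ranges = data_set.max(axis=0) - min_vals
    # Note: a constant column has range 0, and the division then yields NaN
    return (data_set - min_vals) / ranges, ranges, min_vals


def normalize_query(x, ranges, min_vals):
    """Hypothetical helper: scale a new sample with the training statistics."""
    return (np.asarray(x) - min_vals) / ranges


group = np.array([[9, 400], [200, 5], [100, 77], [40, 300]])
norm_group, ranges, min_vals = auto_norm(group)
print(norm_group.min(axis=0), norm_group.max(axis=0))  # [0. 0.] [1. 1.]
print(normalize_query([50, 350], ranges, min_vals))
```

Like the `tile`-based version, this divides by each column's range, so a feature whose values are all identical must be handled separately.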
