The k-Nearest Neighbor Algorithm for Machine Learning in Python

Source: Internet
Author: User
Preface

Well... I recently started learning machine learning and went looking online for a book on the subject, called "Machine Learning in Action." Conveniently, the algorithms in that book are implemented in Python, and I had just learned some basic Python, so for me the book came at exactly the right time. Next, let me tell you about the practical side of things.

What is the K-nearest neighbor algorithm?

Simply put, the k-nearest neighbor algorithm classifies new data by measuring the distance between feature values. It works like this: we have a collection of sample data, also known as the training sample set, and every sample in it carries a label, so we know which class each sample in the set belongs to. When new, unlabeled data arrives, we compare each feature of the new data with the features of the samples in the set, and the algorithm extracts the class labels of the samples most similar to it. In general, we consider only the k most similar samples in the data set, which is where the name of the k-nearest neighbor algorithm comes from. Finally, the new data is assigned the class that appears most often among those k neighbors.
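The idea above can be sketched in a few lines (a toy illustration, not the book's implementation; the sample points and labels are made up to mirror the tiny data set used later, and this sketch uses Python 3.8+ for math.dist):

```python
import math

# Hypothetical training set: each point comes with a known label.
samples = [((1.0, 1.1), 'A'), ((1.0, 1.0), 'A'),
           ((0.0, 0.0), 'B'), ((0.0, 0.1), 'B')]

def knn_label(point, samples, k):
    """Return the majority label among the k samples closest to `point`."""
    # Sort the training samples by Euclidean distance to the query point.
    by_distance = sorted(samples, key=lambda s: math.dist(point, s[0]))
    # Keep the labels of the k nearest samples and take a majority vote.
    top_k = [label for _, label in by_distance[:k]]
    return max(set(top_k), key=top_k.count)

print(knn_label((0.0, 0.2), samples, 3))  # -> 'B' (two of its 3 neighbors are 'B')
```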

Question for the reader: can you tell whether the k-nearest neighbor algorithm is supervised learning or unsupervised learning?

Importing data using Python

From the working principle of the k-nearest neighbor algorithm, we can see that to classify data we first need sample data on hand; without sample data there is nothing to build a classifier from. So our first step is to import a sample data set.

Create a module named knn.py and write the following code:

from numpy import *
import operator

def createDataSet():
    group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels

In this code we import two Python modules: the scientific computing package NumPy and the operator module. NumPy is a standalone library, and most Python distributions do not install it by default, so we need to install it separately.

Download Address: http://sourceforge.net/projects/numpy/files/

There are many versions available; here I chose numpy-1.7.0-win32-superpack-python2.7.exe.

Implementation of K-nearest neighbor algorithm

The specific idea of the K-nearest neighbor algorithm is as follows:

(1) Calculate the distance between each point in the known-category data set and the current point

(2) Sort by distance in ascending order

(3) Select the k points closest to the current point

(4) Count the frequency of the category of each of those k points

(5) Return the most frequent category among those k points as the predicted classification of the current point

The Python implementation of the k-nearest neighbor algorithm is as follows:

# coding: utf-8
from numpy import *
import operator
import knn

group, labels = knn.createDataSet()

def classify(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances ** 0.5
    sortedDistIndices = distances.argsort()
    classCount = {}
    for i in range(k):
        numOfLabel = labels[sortedDistIndices[i]]
        classCount[numOfLabel] = classCount.get(numOfLabel, 0) + 1
    sortedClassCount = sorted(classCount.iteritems(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

my = classify([0, 0], group, labels, 3)
print my

(This is Python 2 code, matching the Python 2.7 environment mentioned above; under Python 3 you would use classCount.items() instead of classCount.iteritems() and print(my) instead of print my.)

Running the code produces the following result:

The output is B, which means our new data point ([0, 0]) belongs to class B.

Code explanation

I suspect many readers will still have questions about this code, so next I will focus on a few key points of this function, to make it easier for readers (and for my own later review) to understand the algorithm.

Parameters of the classify function:

inX: the input vector to classify
dataSet: the training sample set
labels: the vector of class labels
k: the number of nearest neighbors to use (the k in k-nearest neighbors)
shape: an attribute of a NumPy array describing its dimensions; for a two-dimensional array, shape[0] is the number of rows, i.e. the number of training samples.
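For instance, a quick check on the sample array from createDataSet() (shown in Python 3 syntax):

```python
import numpy as np

# The same sample points as in createDataSet()
group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
print(group.shape)     # (4, 2): 4 samples (rows) with 2 features (columns) each
print(group.shape[0])  # 4, the number of training samples
```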

tile(inX, (dataSetSize, 1)): repeats inX to build a two-dimensional array; dataSetSize is the number of rows in the generated array, and 1 means each row contains a single copy of inX (the columns are not repeated further). The whole line then subtracts each element of dataSet from the corresponding element of this repeated array, performing the matrix subtraction in a single step — simple and convenient, isn't it?
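Concretely, using the same query point and sample set as above (Python 3 syntax):

```python
import numpy as np

dataSet = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
inX = [0, 0]

# Stack 4 copies of inX into a (4, 2) array, then subtract the
# training set element by element.
diffMat = np.tile(inX, (dataSet.shape[0], 1)) - dataSet
print(diffMat)  # [[-1.  -1.1] [-1.  -1. ] [ 0.   0. ] [ 0.  -0.1]]
```

Incidentally, modern NumPy broadcasting makes the tile step optional: `np.array(inX) - dataSet` produces the same result.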

axis=1: with this argument, sum() adds up the entries within each row, producing one value per sample; axis=0 would instead sum down each column.
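A small demonstration with the squared differences from the example above (Python 3 syntax):

```python
import numpy as np

# Squared feature differences for the four training samples (diffMat ** 2)
sqDiffMat = np.array([[1.0, 1.21], [1.0, 1.0], [0.0, 0.0], [0.0, 0.01]])
print(sqDiffMat.sum(axis=1))  # one squared distance per row (per sample)
print(sqDiffMat.sum(axis=0))  # sums down each column instead
```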

argsort(): returns the indices that would sort the array in ascending order (so the first index points to the smallest element).
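For example, using the (rounded) distances of the [0, 0] query point to the four samples:

```python
import numpy as np

distances = np.array([1.49, 1.41, 0.0, 0.1])
order = distances.argsort()
print(order)  # [2 3 1 0]: distances[2] is the smallest, distances[0] the largest
```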

classCount.get(numOfLabel, 0) + 1: this line really is elegant. get() is a dictionary method: it returns the value stored under the key numOfLabel, or the default value 0 if the key is not yet present. Adding 1 to that value both initializes and increments the vote count for a label, so tallying the votes takes just one line of Python — simple and efficient.
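Here is the counting pattern in isolation, with hypothetical neighbor labels (Python 3 syntax):

```python
# Hypothetical labels of the three nearest neighbors
nearest = ['B', 'B', 'A']

classCount = {}
for label in nearest:
    # get() returns the current count, or 0 if the label is not yet a key
    classCount[label] = classCount.get(label, 0) + 1
print(classCount)  # {'B': 2, 'A': 1}
```

In modern Python, `collections.Counter(nearest).most_common(1)` accomplishes the same vote tally in a single call.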

Closing remarks

That more or less covers the principle and code of the k-nearest neighbor algorithm (kNN). The next task is to become thoroughly familiar with it, to the point of being able to write it from memory.

That is the whole content of this article; I hope you enjoyed it.
