1. Background
Going forward, I will post one machine learning algorithm with a simple Python implementation every week. Today's algorithm is KNN, the k-nearest neighbors algorithm. KNN is a supervised learning algorithm of the classifier type.
What is supervised learning, and what is unsupervised learning? Supervised learning is used when we know the target vector; unsupervised learning is used when we do not know a specific target variable. Supervised learning is further divided into classification and regression algorithms according to the type of the target variable (discrete or continuous, respectively).
K-nearest neighbors: K is a parameter of the algorithm that limits how many neighbors are considered. The overall idea is relatively simple: treat the feature values of each record in the dataset as a vector. We give the program a set of feature values; assuming there are three features, it can be written as the vector (x1, x2, x3). Each record already in the system can likewise be regarded as a vector (y1, y2, y3). We compute the distance between x and every y, and pick the K vectors y with the shortest distances. The target values of those K neighbors determine the classification of x; typically the most common label among them is chosen.
Formula:
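The distance is usually the Euclidean distance, which is what this sketch assumes: d(x, y) = sqrt((x1 - y1)^2 + (x2 - y2)^2 + (x3 - y3)^2). As a rough illustration of the idea, not the implementation used in this post, the helper below finds the K nearest samples and takes the most common label among them. The name knn_sketch, the sample points, and the labels 'A' and 'B' are all invented for this example.

```python
import numpy as np
from collections import Counter

def knn_sketch(x, samples, labels, k):
    """Classify x by majority vote among its k nearest samples.

    Hypothetical helper for illustration; assumes Euclidean distance.
    """
    # Distance from x to every row of samples
    distances = np.sqrt(((samples - x) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = distances.argsort()[:k]
    # Count the labels of those k neighbors and return the most common one
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Invented toy data: two small clusters labeled 'A' and 'B'
samples = np.array([[1.0, 1.1, 1.0],
                    [1.0, 1.0, 0.9],
                    [0.0, 0.0, 0.1],
                    [0.1, 0.0, 0.0]])
labels = ['A', 'A', 'B', 'B']

result = knn_sketch(np.array([0.9, 1.0, 1.0]), samples, labels, k=3)
print(result)
```

With k=3, two of the three nearest samples carry the label 'A', so the query point is classified as 'A'.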
2. Python basics: NumPy
NumPy is a numerical computing library for Python, used mainly for matrix operations, and we use it heavily here. The following describes the features used in this chapter's code.
array: NumPy's array type. For example, the 4-row, 2-column data in this example can be entered as
group = array([[9,400],[200,5],[100,77],[40,300]])
shape: returns (rows, columns). Example: shape(group) = (4, 2)
zeros: creates a zero-filled matrix of the given shape, for example: zeros(shape(group)) = [[0,0],[0,0],[0,0],[0,0]]
tile: available as numpy.tile (historically defined in the module numpy.lib.shape_base); it repeats an array. For example, tile(a, n) repeats the array a n times, forming a new array.
sum(axis=1): adds up the elements of each row vector of the matrix, giving one sum per row.
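A quick sketch of these calls on the example array. It uses the explicit np. prefix rather than the star import used in this chapter's code, and the variable names are only for illustration.

```python
import numpy as np

group = np.array([[9, 400], [200, 5], [100, 77], [40, 300]])

rows_cols = np.shape(group)          # (rows, columns) of group
empty = np.zeros(np.shape(group))    # zero-filled matrix with the same shape
repeated = np.tile([1, 2], (3, 1))   # the row [1, 2] stacked 3 times
row_sums = group.sum(axis=1)         # one sum per row

print(rows_cols)
print(empty)
print(repeated)
print(row_sums)
```

Passing a tuple such as (3, 1) to tile repeats the input 3 times down the rows and once across the columns, which is the form the normalization code relies on.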
3. Data sets
4. Code
The code is divided into three functions, described in turn below.
To create a dataset:
createDataSet
from __future__ import division
from numpy import *
import operator

def createDataSet():
    # Four samples with two features each
    group = array([[9,400],[200,5],[100,77],[40,300]])
    # Class label of each sample, in the same order as the rows of group
    labels = ['1', '2', '3', '1']
    return group, labels
Normalization of data:
autoNorm
def autoNorm(dataSet):
    # Column-wise minimum and maximum of each feature
    minVals = dataSet.min(0)
    maxVals = dataSet.max(0)
    ranges = maxVals - minVals
    normDataSet = zeros(shape(dataSet))
    m = dataSet.shape[0]  # number of rows
    # Subtract the column minimum from every row
    normDataSet = dataSet - tile(minVals, (m,1))
    # print(normDataSet)
    normDataSet = normDataSet / tile(ranges, (m,1))  # element-wise divide
    # print(normDataSet)
    return normDataSet, ranges, minVals
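
As a check on what the normalization function computes, the sketch below applies the same min-max scaling, (value - min) / (max - min), to the sample group from the dataset function. It relies on NumPy broadcasting instead of tile, which is an alternative to the code above rather than the version used in this post; min_max_normalize is a name invented for this example.

```python
import numpy as np

def min_max_normalize(data):
    """Scale every column of data into the range [0, 1].

    Illustrative alternative to the tile-based version; assumes no
    column is constant (a zero range would cause division by zero).
    """
    min_vals = data.min(axis=0)            # per-column minimum
    ranges = data.max(axis=0) - min_vals   # per-column max - min
    # Broadcasting applies the per-column vectors to every row
    return (data - min_vals) / ranges, ranges, min_vals

group = np.array([[9, 400], [200, 5], [100, 77], [40, 300]])
norm, ranges, min_vals = min_max_normalize(group)

print(ranges)    # per-column ranges: 191 and 395
print(min_vals)  # per-column minimums: 9 and 5
print(norm)
```

After scaling, each column runs from 0 to 1, so the second feature (with values up to 400) no longer dominates the distance calculation simply because of its larger numeric range.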