Python 3 and Machine Learning Practice, Part 1: The Simplest k-Nearest Neighbor (kNN) Algorithm

Last Update:2018-07-29 Source: Internet

Author: User



Introduction to the k-nearest neighbor algorithm:

The k-nearest neighbor algorithm computes the distance between the data to be classified and every labeled sample, takes the k samples (k is usually no more than 20) that are closest to it, counts the categories of those k samples, and assigns the data to the category that occurs most often.

Two points should be noted:

1. Sometimes the features need to be weighted according to how much each one contributes to the classification.

2. If the features contribute equally to the classification but their numeric ranges differ widely, the features with large values will dominate the result, so the feature data needs to be normalized. Normalization is a common preprocessing step and there are many ways to do it. Readers who want more detail can refer to the blog post "Re-discussion on normalization in machine learning (normalization method)".
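Both points can be seen in a small sketch. The data, the `weighted_distance` helper, and the hand-picked weights below are all assumptions made up for illustration; they are not part of the original article.

```python
import numpy as np

# Hypothetical samples: column 0 is in the thousands, column 1 lies in [0, 1].
samples = np.array([[1000.0, 0.1],
                    [1200.0, 0.9]])
query = np.array([1100.0, 0.85])

def weighted_distance(a, b, weights):
    """Euclidean distance with one weight per feature (weights are assumed, not learned)."""
    return np.sqrt(np.sum(weights * (a - b) ** 2))

# Unweighted: both samples are about 100 away, because column 0 swamps column 1.
plain = [weighted_distance(query, s, np.ones(2)) for s in samples]

# Down-weighting column 0 lets the difference in column 1 show through.
weights = np.array([1e-6, 1.0])  # hypothetical weights chosen by hand
weighted = [weighted_distance(query, s, weights) for s in samples]

print(plain)
print(weighted)
```

With equal weights the two distances are nearly identical, so the small-valued feature has almost no say in which sample counts as nearest. Normalizing the features is the more systematic way to remove this effect.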

A Python implementation of linear (min-max) normalization:

from numpy import tile

# Linear (min-max) normalization
def autoNorm(dataSet):
    minValues = dataSet.min(0)           # column-wise minimum
    maxValues = dataSet.max(0)           # column-wise maximum
    ranges = maxValues - minValues
    m = dataSet.shape[0]                 # number of samples
    normDataSet = dataSet - tile(minValues, (m, 1))
    normDataSet = normDataSet / tile(ranges, (m, 1))
    return normDataSet, ranges, minValues
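The normalization routine returns the range and minimum along with the scaled data, which suggests they are meant to be reused when a new sample arrives. The sketch below shows one way that could look. The function name `min_max_normalize` and the data are hypothetical, and NumPy broadcasting is used in place of the explicit `tile` calls.

```python
import numpy as np

def min_max_normalize(data_set):
    """Scale each column to [0, 1]; also return the range and minimum of each column."""
    min_values = data_set.min(axis=0)
    ranges = data_set.max(axis=0) - min_values
    return (data_set - min_values) / ranges, ranges, min_values

# Hypothetical training data with two features on very different scales.
train = np.array([[1000.0, 0.2],
                  [3000.0, 0.4],
                  [2000.0, 1.0]])
norm_train, ranges, min_values = min_max_normalize(train)

# A new sample is scaled with the training minimum and range, not its own.
query = np.array([2500.0, 0.6])
norm_query = (query - min_values) / ranges

print(norm_train)
print(norm_query)
```

This sketch assumes every column has a nonzero range; a constant column would cause a division by zero and needs separate handling.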

A Python 3 implementation of the simplest k-nearest neighbor algorithm:

from numpy import array, tile
import operator

# group: sample data
# labels: category labels corresponding to the sample data
def createDataSet():
    group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels

# Define the k-nearest neighbor algorithm
def classify0(inX, dataSet, labels, k):
    '''
    :param inX: data to be classified
    :param dataSet: sample data
    :param labels: labels of the sample data
    :param k: number of nearest points to select
    :return: category of the data to be classified
    '''
    # Compute the Euclidean distance
    # shape: NumPy attribute, the array dimensions as a tuple
    dataSetSize = dataSet.shape[0]
    # tile: NumPy function, builds an array by repeating inX
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances ** 0.5
    # argsort: NumPy method, sorts and returns the indices
    sortedDistIndicies = distances.argsort()
    classCount = {}
    # Count the categories of the k nearest points and return the most frequent one
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # items(): Python 3 syntax; Python 2 needs iteritems()
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

if __name__ == "__main__":
    group, labels = createDataSet()
    print(classify0([0, 0], group, labels, 3))
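The same steps (distance, sort, vote) can also be written more compactly. The following is only an alternative sketch, not the article's code: the name `knn_classify` is hypothetical, broadcasting replaces `tile`, and `collections.Counter` does the vote counting.

```python
from collections import Counter
import numpy as np

def knn_classify(query, samples, labels, k):
    """Return the most common label among the k samples closest to query."""
    # Broadcasting subtracts query from every row, so tile() is not needed.
    distances = np.sqrt(((samples - query) ** 2).sum(axis=1))
    nearest = distances.argsort()[:k]            # indices of the k smallest distances
    votes = Counter(labels[i] for i in nearest)  # label -> number of votes
    return votes.most_common(1)[0][0]

samples = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(knn_classify(np.array([0.0, 0.0]), samples, labels, 3))  # B
print(knn_classify(np.array([1.0, 1.2]), samples, labels, 3))  # A
```

For the query [0, 0] the three nearest samples carry the labels B, B, and A, so the vote returns B.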

Sample data stored in a text file can be read with the following code:

from numpy import zeros

# Read data from a text file
def file2matrix(filename, num):
    '''
    Note: the last field of each line in the file is the label
    :param filename: file name
    :param num: number of features in each sample
    :return: sample array and labels
    '''
    with open(filename) as fr:
        arrayOfLines = fr.readlines()       # read the text into a list of lines; note the difference from readline()
    numberOfLines = len(arrayOfLines)       # get the number of lines of text
    returnMat = zeros((numberOfLines, num))
    classLabelVector = []
    index = 0
    for line in arrayOfLines:
        line = line.strip()                 # strip the trailing newline
        listFromLine = line.split('\t')     # split the line on \t into a list of elements
        returnMat[index, :] = listFromLine[0:num]
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat, classLabelVector
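For files in the same layout (tab-separated numeric features followed by an integer label), `numpy.loadtxt` offers a shorter route. This is an alternative sketch rather than the article's method; the temporary file and its contents are made up so that the example runs on its own.

```python
import os
import tempfile
import numpy as np

# Write a small hypothetical tab-separated file: two features, then an integer label.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write("1.0\t1.1\t1\n1.0\t1.0\t1\n0.0\t0.0\t2\n0.0\t0.1\t2\n")
    path = f.name

num = 2  # number of feature columns, same meaning as the num parameter above
data = np.loadtxt(path, delimiter='\t', ndmin=2)  # ndmin=2 keeps a 2-D shape even for one line
features = data[:, :num]                          # first num columns are the features
class_labels = data[:, -1].astype(int).tolist()   # last column holds the labels
os.remove(path)

print(features)
print(class_labels)
```

Unlike the line-by-line loop, this approach requires every column, including the label, to be numeric.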

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
