Python 3 and Machine Learning Practice 1: The Simplest k-Nearest Neighbor Algorithm (k-Nearest Neighbor, KNN)

Source: Internet
Author: User

Introduction to the k-nearest neighbor algorithm:

The k-nearest neighbor algorithm works in three steps. It computes the distance between the sample to be classified and every sample in the training data. It then takes the k samples closest to it (k is usually no larger than 20). Finally, it assigns the new sample to the category that occurs most often among those k neighbors.
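The three steps can be traced with a minimal pure-Python sketch. It uses only the standard library: `math.dist` (Python 3.8+) and `collections.Counter`. The function name `knn_classify` is hypothetical and not from the original post. Ties in the vote are not handled specially; `most_common` keeps the label that was counted first.

```python
import math
from collections import Counter

def knn_classify(query, samples, labels, k):
    """Classify `query` by majority vote among its k nearest samples (hypothetical helper)."""
    # Step 1: Euclidean distance from the query to every sample
    distances = [math.dist(query, s) for s in samples]
    # Step 2: indices of the k smallest distances
    nearest = sorted(range(len(samples)), key=lambda i: distances[i])[:k]
    # Step 3: majority vote among the labels of those k neighbors
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

samples = [[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]]
labels = ['A', 'A', 'B', 'B']
print(knn_classify([0.0, 0.0], samples, labels, 3))  # → B
```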

Note the following:

1. Sometimes the features should be weighted according to how much each one contributes to the classification;

2. Suppose the features contribute equally to the classification but their value ranges differ greatly. In that case, features with large values dominate the distance and skew the result, so the data should be normalized. Normalization is a common preprocessing step, and there are many ways to do it. The original post reproduced a blog article on the subject for readers who want more detail: Re-discussion on normalization in machine learning (normalization method)
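Point 1 can be made concrete with a feature-weighted Euclidean distance, d(x, y) = sqrt(sum_j w_j (x_j - y_j)^2). The sketch below assumes the weights are chosen by hand; `weighted_distances` and the weight values are hypothetical. The toy data also shows the scale problem behind point 2: with equal weights, the large-valued first feature decides which sample is nearest.

```python
import numpy as np

def weighted_distances(inX, dataSet, weights):
    """Feature-weighted Euclidean distance from inX to each row of dataSet.

    `weights` is a hypothetical per-feature weight vector; a larger weight
    makes that feature count more in the distance.
    """
    diff = dataSet - np.asarray(inX)   # broadcasting: one row of differences per sample
    return np.sqrt((weights * diff ** 2).sum(axis=1))

dataSet = np.array([[1000.0, 0.9], [1003.0, 0.1]])
inX = [1002.0, 0.85]

# Equal weights: the first feature's large values decide the nearest sample.
print(weighted_distances(inX, dataSet, np.array([1.0, 1.0])).argmin())   # → 1
# Down-weighting the first feature lets the second feature matter.
print(weighted_distances(inX, dataSet, np.array([1e-4, 1.0])).argmin())  # → 0
```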

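A Python implementation of linear (min-max) normalization:

'''
normalization
'''
from numpy import tile

def autoNorm(dataSet):
    '''
    Linear (min-max) normalization: newValue = (oldValue - min) / (max - min)
    :param dataSet: sample array, one row per sample
    :return: normalized array, per-feature ranges, per-feature minimums
    '''
    minValues = dataSet.min(0)          # column-wise minimum
    maxValues = dataSet.max(0)          # column-wise maximum
    ranges = maxValues - minValues
    m = dataSet.shape[0]                # number of samples
    normDataSet = dataSet - tile(minValues, (m, 1))
    normDataSet = normDataSet / tile(ranges, (m, 1))
    return normDataSet, ranges, minValues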
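The normalization function returns the ranges and minimums as well as the scaled data. A new sample must be scaled with the training set's minimum and range before it is classified; otherwise its distances are not comparable. The sketch below assumes the same min-max formula but uses NumPy broadcasting instead of `tile()`. The name `auto_norm` is hypothetical. A constant feature column gives a range of 0 and would cause a division by zero, which this sketch does not guard against.

```python
import numpy as np

def auto_norm(dataSet):
    """Min-max normalization using NumPy broadcasting instead of tile() (hypothetical helper)."""
    minValues = dataSet.min(axis=0)
    ranges = dataSet.max(axis=0) - minValues   # a constant column gives 0 here
    return (dataSet - minValues) / ranges, ranges, minValues

dataSet = np.array([[10.0, 0.5], [20.0, 1.5], [30.0, 1.0]])
normDataSet, ranges, minValues = auto_norm(dataSet)
print(normDataSet)

# Scale a new sample with the training minimum and range, not its own.
inX = np.array([25.0, 1.0])
print((inX - minValues) / ranges)
```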

The simplest k-nearest neighbor classifier implemented in Python 3:

from numpy import array, tile
import operator

'''
group: sample data
labels: category label of each sample
'''
def createDataSet():
    group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels

'''
k-nearest neighbor classifier
'''
def classify0(inX, dataSet, labels, k):
    '''
    :param inX: the sample to be classified
    :param dataSet: sample data
    :param labels: labels of the sample data
    :param k: number of nearest neighbors to use
    :return: predicted category of inX
    '''
    # Compute Euclidean distances
    # shape: NumPy attribute giving the array dimensions
    dataSetSize = dataSet.shape[0]
    # tile: NumPy function that repeats inX once per sample row
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances ** 0.5
    # argsort: NumPy method returning the indices that would sort the array
    sortedDistIndicies = distances.argsort()
    classCount = {}
    # Count the categories of the k nearest samples; return the most frequent one
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # dict.items() in Python 3; the Python 2 version of this code used iteritems()
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

if __name__ == "__main__":
    group, labels = createDataSet()
    print(classify0([0, 0], group, labels, 3))    # prints B
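`classify0` builds the difference matrix explicitly with `tile()`. NumPy broadcasting gives the same matrix without copying `inX`, because an array of shape (d,) is stretched across every row of an (n, d) array. The sketch below follows the same steps as `classify0`, but the name `classify_broadcast` is hypothetical. It counts votes with `np.unique(..., return_counts=True)`, so ties go to the first label in sorted order. That differs from `classify0`'s tie behaviour.

```python
import numpy as np

def classify_broadcast(inX, dataSet, labels, k):
    """Same steps as classify0, but relying on NumPy broadcasting (hypothetical helper)."""
    # inX (shape (d,)) is broadcast against dataSet (shape (n, d)),
    # which is what tile(inX, (n, 1)) does explicitly in classify0
    distances = np.sqrt(((np.asarray(inX) - dataSet) ** 2).sum(axis=1))
    nearest = distances.argsort()[:k]          # indices of the k closest samples
    values, counts = np.unique(np.asarray(labels)[nearest], return_counts=True)
    return values[counts.argmax()]             # ties go to the first label in sorted order

group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(classify_broadcast([0, 0], group, labels, 3))   # → B
```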
Sample data stored in a text file can be read with the following code:

'''
read data from a text file
'''
from numpy import zeros

def file2matrix(filename, num):
    '''
    Note: the last column of each line in the file is the label
    :param filename: file name
    :param num: number of features per sample
    :return: sample array and label list
    '''
    with open(filename) as fr:
        arrayOfLines = fr.readlines()        # read the whole file into a list of lines; note the difference from readline()
    numberOfLines = len(arrayOfLines)        # number of lines in the file
    returnMat = zeros((numberOfLines, num))
    classLabelVector = []
    index = 0
    for line in arrayOfLines:
        line = line.strip()                  # remove the trailing newline and surrounding whitespace
        listFromLine = line.split('\t')      # split the line on tabs into a list of fields
        returnMat[index, :] = [float(x) for x in listFromLine[0:num]]
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat, classLabelVector
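When every column in the file is numeric and the label is an integer, as `file2matrix` assumes, `numpy.loadtxt` can do the parsing in one call. In the sketch below, the helper name `load_samples` and the file contents are made up for illustration. `io.StringIO` stands in for a real file, and a filename can be passed instead. `ndmin=2` keeps the result two-dimensional even for a one-line file.

```python
import io
import numpy as np

def load_samples(fileobj, num):
    """Read tab-separated samples whose last column is an integer label (hypothetical helper)."""
    data = np.loadtxt(fileobj, delimiter='\t', ndmin=2)
    return data[:, :num], data[:, -1].astype(int).tolist()

# Made-up contents: two features plus a label per line
fake_file = io.StringIO("1.0\t2.5\t1\n3.0\t0.5\t2\n")
returnMat, classLabelVector = load_samples(fake_file, 2)
print(returnMat)
print(classLabelVector)   # → [1, 2]
```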
