Machine Learning in Action Notes: k-Nearest Neighbor Algorithm 1 (Classifying Action Movies and Romance Movies)

Source: Internet
Author: User

The k-nearest neighbor algorithm classifies data by measuring the distance between the feature values of different samples.
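The distance used in this note is the Euclidean distance: take the difference of each pair of features, square it, sum the squares, and take the square root. The classify0 code below does the same thing. Here is a minimal standard-library sketch. The helper name euclidean_distance is hypothetical, and it assumes that features are equal-length lists of numbers such as [fight scenes, kiss scenes].

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    # Square the difference of each feature, sum the squares, then take the square root
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Unknown movie [18, 90] vs. the sample movie [2, 100]: sqrt(16**2 + 10**2) = sqrt(356)
print(euclidean_distance([18, 90], [2, 100]))
```

Because every feature contributes a squared term, a feature measured on a much larger scale will dominate the distance.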

Characteristics of the k-nearest neighbor algorithm:
Advantages: high accuracy, insensitive to outliers, makes no assumptions about the input data.
Disadvantages: high computational complexity and high space complexity.
Applicable data types: numeric and nominal values.

Principle of the k-nearest neighbor algorithm:
We start with a collection of sample data, also called the training set. Every item in the sample set has a label, so we know which category each item belongs to. When new, unlabeled data is input, the algorithm compares each feature of the new data with the corresponding feature of the items in the sample set. It then extracts the category labels of the most similar items (the nearest neighbors).
In general, we only keep the k most similar items in the sample set. This is where the "k" in "k-nearest neighbor" comes from. Usually k is an integer no greater than 20.
Finally, the category that occurs most often among the k most similar items becomes the category of the new data.
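The three steps above are: measure the distance to every labeled sample, keep the k closest, and take a majority vote. They can be sketched in plain Python without NumPy. The function name knn_classify is hypothetical. The sample data is the six-movie data set used in this note's code.

```python
import math
from collections import Counter

def knn_classify(query, samples, labels, k):
    """Label `query` by majority vote among its k nearest samples (hypothetical helper)."""
    # Distance from the query to every labeled sample, paired with that sample's label
    distances = [(math.dist(query, s), label) for s, label in zip(samples, labels)]
    # Sort by increasing distance and keep the k closest samples
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The label that occurs most often among the k nearest neighbors wins
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

samples = [[3, 104], [2, 100], [1, 81], [101, 10], [99, 5], [98, 2]]
labels = ["Romance movie"] * 3 + ["Action movie"] * 3
print(knn_classify([18, 90], samples, labels, 3))  # Romance movie
```

There is no training step in this sketch: all of the work happens at query time. This is why the algorithm's computational cost is high.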

Case 1: Classifying movies by their number of fight scenes and kiss scenes

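Case analysis: First we need to know how many fight scenes and kiss scenes the unknown movie contains. Then we compute the distance between the unknown movie and each movie in the sample set. Sorting the distances in increasing order gives the k nearest movies. The category that occurs most often among these k movies is taken as the category of the unknown movie.

General workflow of the k-nearest neighbor algorithm:
(1) Collect data: any method can be used.
(2) Prepare data: the distance calculation needs numeric values; a structured data format is best.
(3) Analyze data: any method can be used.
(4) Train the algorithm: this step does not apply to the k-nearest neighbor algorithm.
(5) Test the algorithm: compute the error rate.
(6) Use the algorithm: first input the sample data and structured output results. Then run the k-nearest neighbor algorithm to determine which category each input belongs to. Finally, carry out any follow-up processing on the computed categories.

Code: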

kNN.py

from numpy import array, tile
import operator

# Create the data set
def createDataSet():
    # Use numpy's array to build a 2-D array; each row is
    # [number of fight scenes, number of kiss scenes] for one movie
    group = array([[3, 104], [2, 100], [1, 81], [101, 10], [99, 5], [98, 2]])
    labels = ['Romance movie', 'Romance movie', 'Romance movie',
              'Action movie', 'Action movie', 'Action movie']
    return group, labels

def classify0(inX, dataSet, labels, k):
    '''
    inX: the test vector to classify
    dataSet: the training data set, as a 2-D matrix
    labels: the category label of each row of dataSet
    k: the number of nearest neighbors that vote
    '''
    # Number of rows in the 2-D array
    dataSetSize = dataSet.shape[0]
    # tile turns inX into a 2-D array with the same number of rows as dataSet
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    # Euclidean distance: square the differences, sum each row, take the square root
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances ** 0.5
    # Indices of the samples ordered from nearest to farthest
    sortedDistIndicies = distances.argsort()
    classCount = {}
    # Let the k nearest neighbors vote for their own label
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # operator.itemgetter(1) sorts by the dictionary value (the vote count), and
    # reverse=True sorts in descending order. Note that in Python 3 dictionaries
    # no longer have an iteritems() method, so items() is used instead.
    sortedClassCount = sorted(classCount.items(),
                              key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
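In classify0, the tile call copies inX once per row of dataSet before subtracting. NumPy broadcasting performs the same row-wise subtraction without building that copy. numpy.linalg.norm with axis=1 then returns the Euclidean length of each row. The following is a sketch of this alternative. It assumes that NumPy is installed, and the function name classify_broadcast is hypothetical, not from the book.

```python
import numpy as np
from collections import Counter

def classify_broadcast(inX, dataSet, labels, k):
    """Variant of classify0 that uses broadcasting instead of tile (hypothetical name)."""
    # dataSet - inX broadcasts the 1-D query across every row of the 2-D array,
    # and norm(..., axis=1) gives the Euclidean distance for each row
    distances = np.linalg.norm(dataSet - np.asarray(inX), axis=1)
    # Indices of the k smallest distances, nearest first
    nearest = distances.argsort()[:k]
    # Majority vote among the k nearest labels
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

group = np.array([[3, 104], [2, 100], [1, 81], [101, 10], [99, 5], [98, 2]])
labels = ["Romance movie"] * 3 + ["Action movie"] * 3
# tile followed by subtraction gives exactly the same matrix as broadcasting
print(np.array_equal(np.tile([18, 90], (group.shape[0], 1)) - group,
                     np.array([18, 90]) - group))  # True
print(classify_broadcast([18, 90], group, labels, 3))  # Romance movie
```

The sign of the difference does not matter, because each term is squared inside the norm.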

Test code:
import kNN

group, labels = kNN.createDataSet()
kNN.classify0([18, 90], group, labels, 3)

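Result: the unknown movie [18, 90] is classified as a romance movie. Its three nearest neighbors, [2, 100], [1, 81] and [3, 104] (at distances of about 18.9, 19.2 and 20.5), are all romance movies.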
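Step (5) of the general workflow asks for an error rate, but this note only classifies a single unknown movie. One possible way to estimate an error rate on such a tiny data set is leave-one-out testing. Hold out each movie in turn, classify it using the remaining five, and count the mistakes. This is an assumption for illustration rather than the book's procedure. The function names are hypothetical, and the code uses only the standard library.

```python
import math
from collections import Counter

def knn_classify(query, samples, labels, k):
    # Majority vote among the k samples closest to `query` (Euclidean distance)
    nearest = sorted(zip(samples, labels), key=lambda sl: math.dist(query, sl[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def leave_one_out_error_rate(samples, labels, k):
    """Fraction of samples misclassified when each one is held out in turn (hypothetical)."""
    errors = 0
    for i in range(len(samples)):
        # Use every sample except the i-th as the training set
        rest = samples[:i] + samples[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        # Classify the held-out sample and compare with its true label
        if knn_classify(samples[i], rest, rest_labels, k) != labels[i]:
            errors += 1
    return errors / len(samples)

samples = [[3, 104], [2, 100], [1, 81], [101, 10], [99, 5], [98, 2]]
labels = ["Romance movie"] * 3 + ["Action movie"] * 3
print(leave_one_out_error_rate(samples, labels, 3))  # 0.0
print(leave_one_out_error_rate(samples, labels, 5))  # 1.0
```

With k = 5, every held-out movie is outvoted three to two by the other category. This illustrates why k should stay small relative to the number of samples in each category.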

Reference book:
Machine Learning in Action
Author: Peter
Publisher: Posts & Telecom Press
