Machine Learning Basics (I): K-Nearest Neighbor Method

Source: Internet
Author: User

Machine learning is broadly divided into two major categories: supervised learning and unsupervised learning. Supervised learning can in turn be divided into classification and regression. The task of classification is to assign a sample to one of a set of known categories, so the class label of every training sample must be given; face recognition, behavior recognition, and object detection are all classification tasks. The task of regression is to predict a numeric value, for example predicting price trends from housing market data (area, location, etc.). Unsupervised learning can also be divided into two categories: clustering and density estimation. Clustering partitions a collection of data into several groups without any category information. Density estimation estimates statistical parameters of a collection of data in order to describe that data; the RBM (restricted Boltzmann machine) used in deep learning is one example.

Following the chapter order of "Machine Learning in Action", we start with the K-nearest neighbor method (K-nearest neighbors, KNN).

The K-nearest neighbor method is a supervised learning method, and its principle is very simple. Suppose we have a set of labeled samples, meaning that each sample has a known class label. When a test sample arrives whose category we need to determine, we compute its distance to every labeled sample, select the K samples closest to the test sample, and let their labels vote. The label with the highest number of votes becomes the label of the test sample.
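The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `knn_classify`, its parameters, and the toy data are all assumptions made for this example, and Euclidean distance is assumed as the distance measure.

```python
import math
from collections import Counter


def knn_classify(query, samples, labels, k):
    """Return the majority label among the k samples nearest to query."""
    # Euclidean distance from the query to every labeled sample.
    distances = [math.dist(query, sample) for sample in samples]
    # Indices of the k samples with the smallest distances.
    nearest = sorted(range(len(samples)), key=lambda i: distances[i])[:k]
    # Majority vote over the labels of those k neighbors.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data, invented purely for illustration.
samples = [(1.0, 1.1), (1.0, 1.0), (0.0, 0.0), (0.0, 0.1)]
labels = ["A", "A", "B", "B"]
print(knn_classify((0.1, 0.2), samples, labels, k=3))  # prints B
```

Note that this sketch does nothing special about tied votes; with two classes, choosing an odd K avoids ties.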

Example (movie classification):

(Figure I)

In (Figure I), the horizontal axis indicates the number of fight scenes in a movie, and the vertical axis indicates the number of kisses. We want to classify the movie marked with a question mark in (Figure I); the counts and categories of the other films are shown in (Figure II):

(Figure II)

It can be seen from (Figure II) that three of the films belong to the romance category and three belong to the action category. How, then, do we determine the category of the film represented by the question mark? Based on the KNN principle, we need to compute the distance between the question mark and every other movie in the coordinate system shown in (Figure I). The calculated Euclidean distances are shown in (Figure III):

(Figure III)
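The distance step can be sketched as follows. For two movies at (x1, y1) and (x2, y2), the Euclidean distance is sqrt((x1 - x2)^2 + (y1 - y2)^2). The movie names and the fight and kiss counts below are hypothetical values invented for illustration; they are not the numbers from the figures.

```python
import math

# Hypothetical (fights, kisses) counts; NOT the values from the figures.
movies = [
    ("movie_1", (2, 95), "romance"),
    ("movie_2", (4, 88), "romance"),
    ("movie_3", (1, 80), "romance"),
    ("movie_4", (100, 8), "action"),
    ("movie_5", (97, 4), "action"),
    ("movie_6", (95, 1), "action"),
]
unknown = (20, 85)  # hypothetical counts for the question-mark movie

# Compute the Euclidean distance to each known movie, nearest first.
ranked = sorted(
    (math.dist(unknown, counts), name, label)
    for name, counts, label in movies
)

for distance, name, label in ranked:
    print(f"{name}: {label}, distance {distance:.1f}")
```

With these made-up counts, the three smallest distances all belong to romance movies, which is the situation the example goes on to describe.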

Because our labels fall into only two categories, suppose we choose K = 6/2 = 3. The three nearest films are all romance, so the film marked with the question mark is classified as romance.
