20151014_KNN: a distance-based classification algorithm

Source: Internet
Author: User

1. Principle

  KNN computes the distance from every training sample to the tuple to be classified and takes the K training samples nearest to that tuple. The tuple is assigned to whichever class holds the majority among those K samples.

Training samples are described by n numeric attributes, so each sample represents a point in an n-dimensional space, and all training samples are stored in this n-dimensional pattern space. Given an unknown sample, the k-nearest-neighbor classifier searches the pattern space for the k training samples closest to the unknown sample.

2. Required Information
    • Training set
    • A distance measure
    • The number of nearest neighbors, K
    1. Calculate the distance between two points
      1. For example, the Euclidean distance between points x = (x1, ..., xn) and y = (y1, ..., yn) can be used: d(x, y) = sqrt((x1-y1)^2 + (x2-y2)^2 + ... + (xn-yn)^2)
    2. Determine the classification result from the nearest-neighbor list
      1. Method one: select the class label held by the majority of the K nearest neighbors
      2. Method two: weight each vote by distance, using the weight factor w = 1/d^2
    3. Selection of the K value

If k is too small, the classifier is too sensitive to noise in the data.

If k is too large, the neighborhood may contain points from other classes.

An empirical rule of thumb is k ≤ √q, where q is the number of training tuples. Commercial algorithms typically use 10 as the default value.
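The three steps above (distance, voting, choice of K) can be sketched in Java roughly as follows. This is a minimal illustration rather than a reference implementation: the class name `KnnVoting` and its method names are hypothetical, ties between classes are broken arbitrarily, and treating a zero distance as a very large weight is an assumption made here to avoid dividing by zero.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; all names are illustrative and not from the article.
final class KnnVoting {

    // Euclidean distance between two points with the same number of attributes.
    static double euclidean(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("points must have the same dimension");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double diff = x[i] - y[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Method one (weighted == false): every neighbor casts one vote.
    // Method two (weighted == true): every neighbor votes with weight w = 1/d^2.
    // labels.get(i) is the class of the i-th neighbor, distances[i] its distance.
    static String vote(List<String> labels, double[] distances, boolean weighted) {
        Map<String, Double> score = new HashMap<>();
        for (int i = 0; i < labels.size(); i++) {
            double d = distances[i];
            double w = 1.0;
            if (weighted) {
                // Assumption: an exact match (d == 0) gets a huge weight instead of 1/0.
                w = (d == 0.0) ? Double.MAX_VALUE : 1.0 / (d * d);
            }
            score.merge(labels.get(i), w, Double::sum);
        }
        String best = null;
        for (Map.Entry<String, Double> e : score.entrySet()) {
            if (best == null || e.getValue() > score.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }

    // Rule of thumb from the text: k <= sqrt(q), where q is the number of training tuples.
    static int maxK(int q) {
        return Math.max(1, (int) Math.floor(Math.sqrt(q)));
    }
}
```

With one neighbor of class A at distance 1 and two neighbors of class B at distance 2, method one picks B (two votes against one), while method two picks A (weight 1 against 0.25 + 0.25), which shows how the weighting can change the result.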

3. General description

Algorithm: K-nearest neighbor classification algorithm
Input: training data T; number of nearest neighbors k; tuple t to be classified.
Output: output category c.

(1)  N = ∅; // define the neighbor set
(2)  FOR each d ∈ T DO BEGIN
(3)    IF |N| < k THEN // the size of N is kept at k
(4)      N = N ∪ {d};
(5)    ELSE
(6)      IF ∃ u ∈ N such that sim(t, u) < sim(t, d) THEN BEGIN
           /* If N contains a tuple u whose similarity to t is less than the similarity between t and d (the strict inequality guarantees u ≠ d), then d can join N once u has been removed, and d becomes a new member of N. */
(7)        N = N - {u}; // remove u
(8)        N = N ∪ {d}; // add d
(9)      END
(10) END
(11) c = the class to which the most u ∈ N belong.

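The pseudocode could be turned into Java along the following lines. This is a hedged sketch, not the implementation the article refers to: the names `KnnSketch`, `Sample`, and `classify` are hypothetical, similarity is assumed to mean "smaller Euclidean distance is more similar", and in step (6) the sketch always evicts the least similar member of N, which is one reasonable reading of "some u ∈ N".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the pseudocode; class and method names are illustrative.
final class KnnSketch {

    // One labeled training tuple.
    static final class Sample {
        final double[] point;
        final String label;

        Sample(double[] point, String label) {
            this.point = point;
            this.label = label;
        }
    }

    // Assumption: similarity is measured by Euclidean distance (smaller = more similar).
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return Math.sqrt(sum);
    }

    static String classify(List<Sample> training, double[] t, int k) {
        if (k < 1 || training.isEmpty()) {
            throw new IllegalArgumentException("need k >= 1 and a non-empty training set");
        }
        List<Sample> neighbors = new ArrayList<>();      // (1) N = empty set
        for (Sample d : training) {                      // (2) for each d in T
            if (neighbors.size() < k) {                  // (3) keep |N| at most k
                neighbors.add(d);                        // (4)
                continue;
            }
            // (6) look for the least similar (farthest) member u of N
            int farthest = 0;
            for (int i = 1; i < neighbors.size(); i++) {
                if (distance(t, neighbors.get(i).point)
                        > distance(t, neighbors.get(farthest).point)) {
                    farthest = i;
                }
            }
            // (7)-(8) replace u with d only if d is strictly closer to t than u
            if (distance(t, neighbors.get(farthest).point) > distance(t, d.point)) {
                neighbors.set(farthest, d);
            }
        }
        // (11) c = the class to which the most members of N belong
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        for (Sample u : neighbors) {
            int c = counts.merge(u.label, 1, Integer::sum);
            if (best == null || c > counts.get(best)) {
                best = u.label;
            }
        }
        return best;
    }
}
```

The sketch recomputes distances inside the loop to stay close to the pseudocode. A practical version would more likely cache each distance or keep the neighbors in a bounded priority queue.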
4. KNN Advantages and disadvantages

  Advantages: The principle is simple and the algorithm is easy to implement. It supports incremental learning. It can model complex decision spaces with hyper-polygonal boundaries.

  Disadvantages: The computational overhead is high, so it needs efficient storage techniques and support from parallel hardware.
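Both points can be seen in a small sketch. The class `IncrementalKnnStore` below is hypothetical and, for brevity, answers queries with k = 1. Adding a training tuple is just an append, with no retraining, which is the sense in which learning is incremental. Every query, on the other hand, scans all stored tuples, which is where the computational overhead comes from.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; names are not from the article. Uses k = 1 for brevity.
final class IncrementalKnnStore {
    private final List<double[]> points = new ArrayList<>();
    private final List<String> labels = new ArrayList<>();

    // Incremental learning: a new training tuple is simply stored; nothing is refit.
    void add(double[] point, String label) {
        points.add(point.clone());
        labels.add(label);
    }

    int size() {
        return points.size();
    }

    // Each query visits every stored tuple, so the cost grows with the training set size.
    String nearestLabel(double[] t) {
        if (points.isEmpty()) {
            throw new IllegalStateException("no training tuples stored");
        }
        int best = 0;
        double bestDist = squaredDistance(t, points.get(0));
        for (int i = 1; i < points.size(); i++) {
            double d = squaredDistance(t, points.get(i));
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return labels.get(best);
    }

    // Squared Euclidean distance; it preserves the ordering, so sqrt can be skipped.
    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return sum;
    }
}
```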

5. Java implementations

Refer to: Java implementation of K nearest neighbor (KNN) algorithm
