Statistical Learning Methods, Chapter 3: The k-Nearest Neighbor Method (Study Notes)


Chapter 3: The k-Nearest Neighbor Method

The k-nearest neighbor (kNN) algorithm is simple and intuitive: given a training data set and a new input instance, find the k instances in the training set that are closest to the new instance; the input instance is then assigned to the class held by the majority of these k instances. When k = 1 this is called the nearest-neighbor algorithm: the new instance x is simply assigned the class of its single nearest neighbor in the training set.
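As a minimal sketch of this procedure (the helper name knn_classify and the toy data set are illustrative, not from the book), a brute-force implementation computes all distances, takes the k closest training points, and votes:

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k=3):
    """Brute-force kNN: label x by majority vote among its k nearest training points."""
    # Euclidean distance from x to every training instance.
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(dists)[:k]
    # Majority vote over the labels of those k neighbors.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Tiny illustrative data set: two classes in the plane.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_classify(X_train, y_train, np.array([1.1, 0.9]), k=3))  # -> 0
print(knn_classify(X_train, y_train, np.array([5.1, 5.0]), k=1))  # -> 1 (nearest neighbor)
```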

3.1 The k-Nearest Neighbor Model

The model is determined by three basic elements: the distance metric, the choice of the value of k, and the classification decision rule.

3.1.1 Distance Metric

The distance between two instances in the feature space reflects how similar they are. The book uses the general $L_p$ distance (Minkowski distance) between instances $x_i$ and $x_j$:

$$L_p(x_i, x_j) = \left( \sum_{l=1}^{n} \left| x_i^{(l)} - x_j^{(l)} \right|^p \right)^{\frac{1}{p}}, \quad p \ge 1$$

When p = 2 this is the Euclidean distance; when p = 1 it is the Manhattan distance.
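As a quick numeric check of this formula (the points here are illustrative), the sketch below computes the $L_p$ distance for both special cases:

```python
import numpy as np

def lp_distance(xi, xj, p=2):
    """Minkowski (L_p) distance between two feature vectors, p >= 1."""
    return np.sum(np.abs(xi - xj) ** p) ** (1.0 / p)

xi, xj = np.array([1.0, 1.0]), np.array([4.0, 5.0])
print(lp_distance(xi, xj, p=1))  # Manhattan: |3| + |4| = 7.0
print(lp_distance(xi, xj, p=2))  # Euclidean: sqrt(9 + 16) = 5.0
```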

3.1.2 Choice of k

The choice of the value of k has a significant effect on the results of the k-nearest neighbor method. If a small k is chosen, prediction relies on training instances in a small neighborhood, so the approximation error of "learning" decreases: only training instances close to (and hence similar to) the input instance affect the prediction. The disadvantage is that the estimation error of "learning" increases, and the prediction becomes very sensitive to the neighboring instance points; if a neighboring point happens to be noise, the prediction goes wrong. In other words, a small k makes the model more complex and prone to overfitting. Conversely, a larger k reduces the estimation error but increases the approximation error, since distant and dissimilar training instances also influence the prediction. In the extreme case k = N, the model predicts the class with the most training instances no matter what the input is, which is far too simple. In practice, cross-validation is generally used to select the optimal value of k.
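A minimal sketch of selecting k by cross-validation, assuming scikit-learn is available (the data set and the candidate range for k are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Score each candidate k with 5-fold cross-validation and keep the best one.
best_k, best_score = None, -1.0
for k in range(1, 16):
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score

print(f"best k = {best_k}, cross-validated accuracy = {best_score:.3f}")
```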

3.1.3 Classification Decision Rule

The usual rule is majority voting: the class of the input instance is determined by the majority class among its k nearest training instances.
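Majority voting has a simple justification under 0-1 loss, which the book gives: writing $N_k(x)$ for the set of the k nearest neighbors of x, the misclassification rate of predicting class $c_j$ is

$$\frac{1}{k} \sum_{x_i \in N_k(x)} I(y_i \neq c_j) = 1 - \frac{1}{k} \sum_{x_i \in N_k(x)} I(y_i = c_j),$$

so minimizing the misclassification rate is equivalent to maximizing $\sum_{x_i \in N_k(x)} I(y_i = c_j)$, which is exactly choosing the majority class. Majority voting therefore amounts to empirical risk minimization.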

3.2 Implementation of the k-Nearest Neighbor Method: the kd-Tree

A kd-tree is a binary tree that represents a partition of the k-dimensional space. Constructing a kd-tree is equivalent to repeatedly splitting the k-dimensional space with hyperplanes perpendicular to the coordinate axes, producing a series of k-dimensional hyper-rectangular regions. Each node of the kd-tree corresponds to one such k-dimensional hyper-rectangular region.

3.2.1 Constructing a kd-Tree

The construction algorithm works as follows: start from the root with the whole training set and cycle through the coordinate axes by depth, so that a node at depth j splits on dimension l = (j mod k) + 1. At each node, take the median of the contained instances' l-th coordinates as the splitting point, store the corresponding instance at the node, and recurse on the instances falling on either side of the splitting hyperplane; recursion stops when a region contains no more instances. Splitting at the median keeps the resulting kd-tree balanced, although a balanced tree is not necessarily the most efficient for search.

For ease of understanding, here is a concrete example, worked through in the code sketch below.
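A minimal sketch of the construction, assuming 2-D points (the Node and build_kdtree names and the sample points are illustrative, not taken from the book):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    point: tuple                 # instance stored at this node (the median point)
    axis: int                    # splitting dimension at this depth
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build_kdtree(points, depth=0):
    """Build a kd-tree by cycling through axes and splitting at the median."""
    if not points:
        return None
    k = len(points[0])           # dimensionality of the feature space
    axis = depth % k             # axis to split on at this depth
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2       # median index along the chosen axis
    return Node(
        point=points[mid],
        axis=axis,
        left=build_kdtree(points[:mid], depth + 1),
        right=build_kdtree(points[mid + 1:], depth + 1),
    )

# Illustrative 2-D data set.
pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
root = build_kdtree(pts)
print(root.point)        # median on the x-axis: (7, 2)
print(root.left.point)   # median on the y-axis within the left half: (5, 4)
```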

3.2.2 Searching a kd-Tree

Nearest-neighbor search in a kd-tree proceeds in two phases. First, descend from the root, moving left or right at each node according to the query point's coordinate on that node's splitting axis, until a leaf is reached; take the leaf's instance as the current nearest point. Then backtrack toward the root. At each node, update the current nearest point if the node's instance is closer, and check whether the hypersphere centered at the query point with radius equal to the current nearest distance intersects the node's splitting hyperplane: if it does, the other subtree may contain a closer point and must also be searched; otherwise it can be skipped. The search ends when backtracking reaches the root.

Similarly, an example facilitates comprehension; the search is worked through in the code sketch below.
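A minimal, self-contained sketch of the search; it repeats the median-split construction from the previous sketch, and the nearest helper and the query point are illustrative:

```python
import math

def build_kdtree(points, depth=0):
    """Same construction as above: split at the median, cycling through axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    """Recursive nearest-neighbor search with backtracking."""
    if node is None:
        return best
    # Update the current best if this node's instance is closer.
    d = math.dist(node["point"], query)
    if best is None or d < best[1]:
        best = (node["point"], d)
    axis = node["axis"]
    diff = query[axis] - node["point"][axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    # Search the subtree on the query's side of the splitting hyperplane first.
    best = nearest(near, query, best)
    # Only search the other side if the hypersphere around the query with
    # radius equal to the current best distance crosses the splitting hyperplane.
    if abs(diff) < best[1]:
        best = nearest(far, query, best)
    return best

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
root = build_kdtree(pts)
print(nearest(root, (3, 4.5)))  # -> ((2, 3), 1.80...), the nearest training point
```

Note how the backtracking test prunes the search: at the root (7, 2) the query is 4 units from the splitting hyperplane, farther than the current best distance, so the entire right subtree is skipped.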

Therefore, a kd-tree lets the search skip most of the data points, substantially reducing the amount of computation. For randomly distributed instances the average complexity of a kd-tree search is O(log N), where N is the number of training instances; the kd-tree is best suited to the case where N is much larger than the dimensionality of the space.
