Machine Learning in Action: the KNN Classification Algorithm

Source: Internet
Author: User
Tags: square root
I. Overview of the KNN algorithm

The KNN classification algorithm is simple and effective, and can be used for both classification and regression.
Core idea: given a sample dataset in which every record has known features and a class label, compare the features of a new record with those of the samples, find the K most similar (nearest-neighbor) records (typically k <= 20), and assign the new record the class that occurs most often among those K records.

In short: birds of a feather flock together.

II. For example:

As shown in the following illustration:

The blue squares and red triangles are points of known categories; the green circle is the new point to be classified.
If k = 3, the 3 nearest neighbors of the green point are 2 red triangles and 1 blue square, so by "the minority obeys the majority" the green point belongs to the triangle class.
If k = 5, the 5 nearest neighbors of the green point are 2 red triangles and 3 blue squares, so the green point belongs to the square class.
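The two votes above can be sketched directly. The neighbor ordering below is hypothetical (the figure only fixes the counts, not the exact distances):

```python
from collections import Counter

# Labels of the green point's neighbors, already sorted by ascending distance
# (the exact ordering is invented for illustration; only the counts match the figure)
nearest_labels = ["triangle", "triangle", "square", "square", "square"]

def majority_vote(labels, k):
    """Return the most common label among the k nearest neighbors."""
    return Counter(labels[:k]).most_common(1)[0][0]
```

With k = 3 the slice holds 2 triangles and 1 square, so the triangle class wins; with k = 5 the 3 squares outvote the 2 triangles.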

Distance calculation:
The most common distance measure is the Euclidean distance. For samples x = (x1, ..., xn) and y = (y1, ..., yn), the Euclidean distance is d(x, y) = sqrt((x1 - y1)^2 + ... + (xn - yn)^2).

III. Algorithm flow:
1. Calculate the distance between every point in the known-class dataset and the current point
2. Sort the points by ascending distance
3. Take the K points nearest to the current point
4. Count the frequency of each class among those K points
5. Return the most frequent class as the predicted class of the current point

IV. Code implementation:

import operator
import numpy as np

# Compute the distance between the point to be classified and every sample point
def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    # Tile the input point into a matrix the same shape as the sample data, then subtract
    diffMat = np.tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2                  # square the element-wise differences
    sqDistances = sqDiffMat.sum(axis=1)       # sum the squares for each sample
    distances = sqDistances ** 0.5            # take the square root: the Euclidean distance
    sortedDistIndicies = distances.argsort()  # indices sorted by ascending distance
    classCount = {}
    for i in range(k):
        # Select the k points with the smallest distances and tally their labels
        voteILabel = labels[sortedDistIndicies[i]]
        classCount[voteILabel] = classCount.get(voteILabel, 0) + 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
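For a quick sanity check, the same steps can be written more compactly with NumPy and collections.Counter; the `group`/`labels` dataset below is invented for illustration:

```python
from collections import Counter
import numpy as np

def knn_classify(inX, dataSet, labels, k):
    # Same logic as classify0: Euclidean distances in one call, then a majority vote
    distances = np.linalg.norm(dataSet - inX, axis=1)
    k_nearest = [labels[i] for i in distances.argsort()[:k]]
    return Counter(k_nearest).most_common(1)[0][0]

# A tiny hypothetical dataset: two features per sample, two classes
group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ["A", "A", "B", "B"]
```

For example, `knn_classify(np.array([0.9, 1.0]), group, labels, 3)` falls near the first two samples and is classified as "A".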
V. Summary

Advantages:
The KNN algorithm itself is simple and effective. It is a lazy-learning algorithm: no training on the training set is needed, so the training time cost is essentially zero.

Disadvantages:

High computational cost: the distance to every sample must be computed, so the classification time complexity of KNN is O(n), proportional to the total number of samples.

Setting the K value: the choice of K has a large effect on the results. If K is set too small, classification accuracy drops; if K is set too large and the test sample belongs to a class with few records in the training set, neighbors from other classes add noise and degrade the result. In general, K is chosen by cross-validation (starting from k = 1, with k <= 20). Rule of thumb: K is generally below the square root of the number of training samples.
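A minimal sketch of choosing K this way, using leave-one-out cross-validation under the rule of thumb above (the 1-D dataset is invented for illustration):

```python
import math
from collections import Counter

# Toy 1-D dataset with three well-separated classes (values are hypothetical)
samples = [1.0, 1.2, 1.4, 3.0, 3.2, 3.4, 3.6, 5.0, 5.2]
labels  = ["a", "a", "a", "b", "b", "b", "b", "c", "c"]

def knn_predict(x, xs, ys, k):
    # Sort training points by distance to x and vote among the k nearest
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]

def loocv_accuracy(k):
    # Leave-one-out cross-validation: classify each point using all the others
    hits = 0
    for i in range(len(samples)):
        xs = samples[:i] + samples[i + 1:]
        ys = labels[:i] + labels[i + 1:]
        hits += knn_predict(samples[i], xs, ys, k) == labels[i]
    return hits / len(samples)

max_k = min(20, int(math.sqrt(len(samples))))  # rule of thumb: K below sqrt(n), capped at 20
best_k = max(range(1, max_k + 1), key=loocv_accuracy)
```

On this tiny, well-separated dataset even k = 1 classifies every held-out point correctly; on real data the loop above would reveal the accuracy trade-off as K grows.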

Unbalanced samples lead to larger errors: when the classes are unbalanced, e.g. one class has a very large sample size while the others are very small, the K neighbors of a new sample may be dominated by the large class. Solution: give different weights to the samples.
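One common weighting scheme is to weight each neighbor by the inverse of its distance, so that nearer neighbors count for more; the neighbor list below is hypothetical:

```python
from collections import defaultdict

def weighted_vote(neighbors):
    """neighbors: list of (distance, label) pairs for the K nearest points.
    Each neighbor votes with weight 1/distance, so closer points count for more."""
    weights = defaultdict(float)
    for dist, label in neighbors:
        weights[label] += 1.0 / (dist + 1e-9)  # small epsilon avoids division by zero
    return max(weights, key=weights.get)

# Two very close "small"-class neighbors vs. three distant "big"-class neighbors
neighbors = [(0.2, "small"), (0.3, "small"), (2.0, "big"), (2.1, "big"), (2.2, "big")]
```

With plain majority voting these five neighbors would elect "big" (3 votes to 2); under inverse-distance weighting the two much closer "small" neighbors win.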
