A summary of the KNN algorithm
The KNN (k-nearest neighbors) algorithm is simple and effective, and can be used for both classification and regression.
Core principle: given a sample dataset in which every data point has features and a class label, compare the features of a new data point with those of the samples, find the k most similar samples (the nearest neighbors, usually with k <= 20), and take the class that occurs most often among those k samples as the class of the new data point.
In short: birds of a feather flock together.

II. For example:
As shown in the following illustration:
The blue squares and red triangles are known categories; the green circle is the data point to be classified.
If k = 3, the 3 nearest neighbors of the green dot are 2 red triangles and 1 blue square, so by majority vote the green dot belongs to the triangle category.
If k = 5, the 5 nearest neighbors of the green dot are 2 red triangles and 3 blue squares, so the green dot belongs to the blue-square category.
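The majority vote in this example can be sketched in a few lines of Python (the neighbor labels below are read off the illustration, not computed):

```python
from collections import Counter

# Labels of the nearest neighbors in the illustration above
neighbors_k3 = ["triangle", "triangle", "square"]
neighbors_k5 = ["triangle", "triangle", "square", "square", "square"]

# The most common label among the k nearest neighbors wins
print(Counter(neighbors_k3).most_common(1)[0][0])  # triangle
print(Counter(neighbors_k5).most_common(1)[0][0])  # square
```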
Distance calculation:
The most common distance measure is the Euclidean distance. For a sample x = (x1, ..., xn) and a sample y = (y1, ..., yn), the Euclidean distance between them is:

d(x, y) = sqrt((x1 - y1)^2 + (x2 - y2)^2 + ... + (xn - yn)^2)

III. The algorithm flow:
1. Calculate the distance between the current point and every point in the dataset of known classes.
2. Sort the points by increasing distance.
3. Take the k points nearest to the current point.
4. Count the frequency of each class among those k points.
5. Return the most frequent class among the k points as the predicted class of the current point.

IV. Code implementation:
# Calculate the distance between the point to be classified and the sample points
import operator
from numpy import tile

def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    # Tile the input point into a matrix the same shape as the sample data, then subtract
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2                    # square the per-feature differences
    sqDistances = sqDiffMat.sum(axis=1)         # sum the squares for each sample
    distances = sqDistances ** 0.5              # square root gives the Euclidean distance
    sortedDistIndicies = distances.argsort()    # indices sorted by ascending distance
    classCount = {}
    for i in range(k):
        # Tally the labels of the k nearest points
        voteILabel = labels[sortedDistIndicies[i]]
        classCount[voteILabel] = classCount.get(voteILabel, 0) + 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
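A quick usage sketch (the tiny dataset below is made up for illustration; a condensed copy of the classifier is included so the snippet runs on its own):

```python
import operator
from numpy import array, tile

def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    distances = (diffMat ** 2).sum(axis=1) ** 0.5   # Euclidean distances to all samples
    sortedDistIndicies = distances.argsort()
    classCount = {}
    for i in range(k):
        voteILabel = labels[sortedDistIndicies[i]]
        classCount[voteILabel] = classCount.get(voteILabel, 0) + 1
    return sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)[0][0]

# Made-up toy data: two 'A' points near (1, 1), two 'B' points near (0, 0)
group = array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(classify0([0.2, 0.1], group, labels, 3))  # B
```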
V. Summary
Advantages:
The KNN algorithm itself is simple and effective. It is a lazy-learning algorithm: it does not need a training phase, so its training time complexity is O(1).
Defects:
High computational cost: the distance to every sample must be computed, so the classification time complexity of KNN is O(n), proportional to the total number of samples.
Choice of k: the value of k has a large influence on the result. If k is set too small, the classifier becomes sensitive to noise and accuracy drops; if k is set too large and the test sample's true class is under-represented in the training set, the extra neighbors add noise and degrade the classification. In general k is chosen by cross-validation (starting from k = 1, with k <= 20); a rule of thumb is that k should be no larger than the square root of the number of training samples.
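One way to run such a cross-validation search is a leave-one-out loop (a sketch with synthetic data; the cluster centers and the candidate range of k are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two made-up Gaussian clusters, 20 points each
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

def loo_accuracy(X, y, k):
    """Leave-one-out accuracy of KNN for a given k."""
    hits = 0
    for i in range(len(X)):
        d = np.sqrt(((X - X[i]) ** 2).sum(axis=1))
        d[i] = np.inf                        # a point may not vote for itself
        nearest = np.argsort(d)[:k]
        hits += np.bincount(y[nearest]).argmax() == y[i]
    return hits / len(X)

# Search k = 1..6 (at most sqrt(40)) and keep the best-scoring value
best_k = max(range(1, 7), key=lambda k: loo_accuracy(X, y, k))
```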
Unbalanced data leads to larger errors: when the classes are unbalanced, e.g. one class has many samples and the others very few, the k nearest neighbors of a new sample may be dominated by the large class simply because it has more points. Resolution: give different weights to the neighbors' votes.
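A minimal sketch of such weighting (the 1/distance scheme and the toy numbers are illustrative assumptions, not the only choice):

```python
def weighted_vote(distances, labels, eps=1e-8):
    """Each neighbor votes with weight 1/(distance + eps) instead of a flat 1."""
    weights = {}
    for d, label in zip(distances, labels):
        weights[label] = weights.get(label, 0.0) + 1.0 / (d + eps)
    return max(weights, key=weights.get)

# Two very close minority-class neighbors outweigh three distant majority ones
print(weighted_vote([0.1, 0.2, 2.0, 2.1, 2.2],
                    ['rare', 'rare', 'common', 'common', 'common']))  # rare
```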