function to assign data points to the nearest cluster center

Source: Internet
Author: User
Background

When clustering a large sample set, running K-means on every sample is computationally expensive, so a random subset of the samples is clustered instead and the cluster centers are taken from that subset. Afterwards it is often necessary to find the nearest cluster center for every sample. This assignment step is common in index construction, e.g. OPQ (PAMI 2014) and the inverted multi-index (PAMI 2014).

Algorithm Steps

Let p be a feature vector of size 1*2000, where 2000 is the feature dimension. Let C be the cluster center matrix of size 256*2000, where 256 is the number of centers and 2000 is the feature dimension.
1. Squared norm of the sample
   Compute the sum of squares over the 2000 dimensions of p to obtain p_norm, a single floating-point number.
2. Squared norms of the cluster centers
   For each row of C, compute the sum of squares over its dimensions to obtain c_norm (256*1).
3. Extend p_norm to 256 entries (256*1) by repeating the scalar, so that it matches the shape of c_norm.
4. Compute p_norm = p_norm + c_norm.
5. Compute the vector of distances from p to the 256 centers: $dis = -2 C p^{T} + p\_norm$. Each entry is the squared Euclidean distance from p to one center.
6. Find the index of the smallest entry in dis. That index identifies the cluster center nearest to p.
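
The steps above can be sketched in Python with NumPy. This is a minimal illustration rather than the original author's code: the function name `nearest_center` and the random test data are assumptions made for the example, and NumPy broadcasting takes the place of the explicit expansion in step 3.

```python
import numpy as np

def nearest_center(p, C):
    """Return (index of the nearest row of C, squared distances to all rows).

    p: feature vector with d entries; C: (k, d) matrix of cluster centers.
    The name and signature are hypothetical, chosen for this sketch.
    """
    p = np.asarray(p, dtype=np.float64).ravel()   # shape (d,)
    C = np.asarray(C, dtype=np.float64)           # shape (k, d)
    p_norm = np.sum(p * p)                        # step 1: scalar
    c_norm = np.sum(C * C, axis=1)                # step 2: shape (k,)
    # steps 3-5: broadcasting repeats the scalar p_norm over all k centers
    dis = -2.0 * (C @ p) + (p_norm + c_norm)
    return int(np.argmin(dis)), dis               # step 6

# Illustrative data: 256 centers in 2000 dimensions, sample near center 17.
rng = np.random.default_rng(0)
C = rng.standard_normal((256, 2000))
p = C[17] + 0.01 * rng.standard_normal(2000)
idx, dis = nearest_center(p, C)
```

Since p_norm is the same for every center, it could be dropped without changing the result of step 6. It is kept here so that `dis` holds actual squared distances.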

Why does the formula in step 5 give a distance?
A: Let c be one cluster center vector $(c_1, c_2, ..., c_{2000})$ and let the sample point p be $(p_1, p_2, ..., p_{2000})$. The quantity computed in step 1 is $p_1^2 + ... + p_{2000}^2$, and the quantity computed in step 2 is $c_1^2 + ... + c_{2000}^2$. The formula in step 5 is therefore:
$-2 (p_1*c_1 + ... + p_{2000}*c_{2000}) + p_1^2 + ... + p_{2000}^2 + c_1^2 + ... + c_{2000}^2$
That is,
$(p_1 - c_1)^2 + ... + (p_{2000} - c_{2000})^2$
which is the squared Euclidean distance between p and c. The center that minimizes the squared distance also minimizes the distance itself.
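
The identity behind step 5 can also be checked numerically. The sketch below compares the expanded form with a direct sum of squared differences on a tiny hand-checkable example, and applies it to several samples at once. The function names and the 2-dimensional data are assumptions made for illustration only.

```python
import numpy as np

def sq_dist_direct(P, C):
    # (n, 1, d) - (1, k, d) -> (n, k, d), then sum of squares over d
    diff = P[:, None, :] - C[None, :, :]
    return np.sum(diff * diff, axis=2)

def sq_dist_expanded(P, C):
    p_norm = np.sum(P * P, axis=1)[:, None]     # (n, 1) squared norms of samples
    c_norm = np.sum(C * C, axis=1)[None, :]     # (1, k) squared norms of centers
    return -2.0 * (P @ C.T) + p_norm + c_norm   # (n, k) squared distances

P = np.array([[1.0, 2.0], [0.0, 0.0]])   # two samples
C = np.array([[3.0, 5.0], [1.0, 1.0]])   # two centers

direct = sq_dist_direct(P, C)
expanded = sq_dist_expanded(P, C)
labels = np.argmin(expanded, axis=1)     # nearest center for each sample

# Sample (1, 2) and center (3, 5): -2*(3 + 10) + 5 + 34 = 13 = 2^2 + 3^2
print(direct[0, 0], expanded[0, 0])      # 13.0 13.0
print(labels)                            # [1 1]
```

The expanded form replaces an (n, k, d) intermediate array with a single matrix product, which is the reason it is usually preferred when there are many samples and centers.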
