Machine Learning Public Course Notes (8): K-means Clustering and PCA Dimensionality Reduction


K-means algorithm

Unsupervised learning attempts to discover the underlying structure of a set of unlabeled data. Typical applications include:

    • Market segmentation
    • Social network analysis
    • Organizing computer clusters
    • Astronomical data analysis

The K-means algorithm is an unsupervised learning algorithm. Its input is a training set $\{x^{(1)},x^{(2)},\ldots, x^{(m)}\}$ (where $x^{(i)}\in \mathbb{R}^{n}$) and the number of clusters $k$ (the data are divided into $k$ classes). Its output is the $k$ cluster centers $\mu_1, \mu_2, \ldots, \mu_k$ and the cluster assignment of each data point $x^{(i)}$.

K-means algorithm Steps
    1. Randomly initialize $k$ cluster centers (cluster centroids) $\mu_1, \mu_2, \ldots, \mu_k$
    2. Cluster assignment: for each data point $x^{(i)}$, find the cluster center closest to it and assign the point to that class; $c^{(i)}=\arg\min\limits_{k}\|x^{(i)}-\mu_k\|^2$, where $c^{(i)}$ denotes the class of $x^{(i)}$
    3. Move centroids: update each cluster center $\mu_k$ to the mean of all data points currently assigned to class $k$
    4. Repeat steps 2 and 3 until convergence or until the maximum number of iterations is reached
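
Below is a minimal NumPy sketch of these steps; the function name `kmeans`, the `seed` parameter, and the convergence check on centroid movement are my own illustrative choices, not part of the course material.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Cluster the rows of X (m x n) into k groups; return (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    # Random initialization: pick k distinct training points as the initial centers.
    mu = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Cluster assignment: c[i] = argmin_j ||x_i - mu_j||^2
        dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # shape (m, k)
        c = dists.argmin(axis=1)
        # Move centroids: mu_j becomes the mean of the points assigned to cluster j
        # (an empty cluster keeps its previous center).
        new_mu = np.array([X[c == j].mean(axis=0) if np.any(c == j) else mu[j]
                           for j in range(k)])
        if np.allclose(new_mu, mu):   # stop once the centers no longer move
            break
        mu = new_mu
    return mu, c
```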

Figure 1 K-means Algorithm Example

Optimization target of K-means algorithm

The cost function that K-means optimizes is $$J(c^{(1)},\ldots,c^{(m)},\mu_1,\ldots,\mu_k)=\frac{1}{m}\sum\limits_{i=1}^{m}\|x^{(i)}-\mu_{c^{(i)}}\|^2$$ where $\mu_{c^{(i)}}$ denotes the center of the cluster to which the $i$-th data point $x^{(i)}$ belongs. We want to find the parameters that minimize this function, i.e. $$\min\limits_{\substack{c^{(1)},\ldots,c^{(m)} \\ \mu_1,\ldots,\mu_k}} J(c^{(1)},\ldots,c^{(m)},\mu_1,\ldots,\mu_k)$$
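
Assuming the hypothetical `kmeans` sketch above, $J$ can be computed directly from its outputs; the helper name `distortion` is illustrative only.

```python
def distortion(X, mu, c):
    """J = (1/m) * sum_i ||x^(i) - mu_{c^(i)}||^2, the cost that K-means minimizes."""
    return np.mean(((X - mu[c]) ** 2).sum(axis=1))
```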

Issues to be aware of
    • Random initialization: the commonly used method is to randomly select $k$ ($k < m$) data points from the training set as the initial cluster centers $\mu_1, \mu_2, \ldots, \mu_k$
    • Local optima: the quality of the clustering depends on the choice of the initial cluster centers. To avoid getting stuck in a local optimum (Figure 2), the algorithm should be run multiple times (e.g., 50 times) with different random initializations, keeping the result with the smallest $J$ (a code sketch follows Figure 3)
    • Choosing the value of $k$: use the elbow method; plot $J$ as a function of $k$ and pick the $k$ at the turning point where the rate of decrease suddenly slows. If the curve has no obvious elbow, choose $k$ according to the downstream purpose for which K-means is being run


Figure 2 Global optimal solution and local optimal solutions of the K-means algorithm

Figure 3 Choosing the value of $k$ with the elbow method: a clear elbow (left) and no obvious elbow (right)
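
A sketch of both points, reusing the hypothetical `kmeans` and `distortion` helpers above: run K-means several times for each candidate $k$, keep the run with the smallest $J$, then plot $J$ against $k$ and look for the elbow. The helper name and the placeholder data are assumptions for illustration.

```python
def best_of_n_runs(X, k, n_runs=50):
    """Run K-means n_runs times from different random initializations; keep the lowest-J run."""
    best = None
    for seed in range(n_runs):
        mu, c = kmeans(X, k, seed=seed)
        J = distortion(X, mu, c)
        if best is None or J < best[0]:
            best = (J, mu, c)
    return best

# Elbow method: compute the best J for each candidate k, then plot J against k.
X = np.random.randn(300, 2)                       # placeholder data for illustration
Js = [best_of_n_runs(X, k)[0] for k in range(1, 9)]
```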

Motivation for the PCA dimensionality reduction algorithm

Data compression: compress high-dimensional data ($n$-dimensional) into low-dimensional data ($k$-dimensional)

Data visualization: compress data into 2 or 3 dimensions so that it can be visualized easily

Formalization of PCA problem

If we need to compress two-dimensional data points into one-dimensional data points, we need to find a direction such that the error when the data points are projected onto this direction is minimized (that is, the distance from each point to the line is smallest). More generally, if we need to compress $n$-dimensional data points to $k$ dimensions, we need to find $k$ directions $u^{(1)}, u^{(2)}, \ldots, u^{(k)}$ such that the error when projecting the data points onto the subspace spanned by these directions is minimized.


Figure 4 PCA example: compress 2-dimensional data points into 1-dimensional data points by finding a new direction $u_1$ such that the projection error (the perpendicular distance in the figure, e.g., from $x^{(i)}$ to ${\widetilde x}^{(i)}$) is minimized

Note the difference between PCA and linear regression: PCA minimizes the projection error (Figure 5, right, yellow lines), whereas linear regression minimizes the error along the $y$ direction (Figure 5, left, yellow lines).

Figure 5 Difference between the optimization targets of linear regression and PCA

PCA algorithm Steps

1. Data preprocessing. Mean normalization: $\mu_j = \frac{1}{m}\sum\limits_{i=1}^{m}x_j^{(i)}$, $x_j^{(i)} := x_j^{(i)}-\mu_j$; feature scaling (optional, needed when the ranges of different features differ too much): $x_j^{(i)} := \frac{x_j^{(i)}-\mu_j}{\sigma_j}$

2. Compute the covariance matrix $$\Sigma=\frac{1}{m}\sum\limits_{i=1}^{m}x^{(i)} (x^{(i)})^T \quad \text{or, in matrix form,} \quad \Sigma = \frac{1}{m}X^T X$$ where $X$ is the $m \times n$ matrix whose rows are the training examples

3. Compute the eigenvectors of the covariance matrix $\Sigma$ via singular value decomposition: [U, S, V] = svd(Sigma)

4. Select the first $k$ columns of the matrix $U$ as the $k$ principal directions, forming the matrix $U_{reduce}$ (of size $n \times k$)

5. For each original data point $x$ ($x \in \mathbb{R}^n$), its reduced-dimension representation $z$ ($z \in \mathbb{R}^k$) is $z = U_{reduce}^T\, x$
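
The five steps above map almost line for line onto NumPy. The following sketch (the function name `pca` and its return values are my own choices) assumes the rows of `X` are the training examples.

```python
import numpy as np

def pca(X, k):
    """Reduce the m x n data matrix X to k dimensions; return (Z, U_reduce, mu, S)."""
    m = X.shape[0]
    # 1. Mean normalization (scaling each feature by its std could be added when ranges differ widely).
    mu = X.mean(axis=0)
    X_norm = X - mu
    # 2. Covariance matrix Sigma = (1/m) X^T X, an n x n matrix.
    Sigma = (X_norm.T @ X_norm) / m
    # 3. Singular value decomposition of Sigma.
    U, S, Vt = np.linalg.svd(Sigma)
    # 4. The first k columns of U are the k principal directions.
    U_reduce = U[:, :k]
    # 5. Project every example: z = U_reduce^T x (done for all rows at once).
    Z = X_norm @ U_reduce
    return Z, U_reduce, mu, S
```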

Application of PCA

Reconstructing data: for a reduced $k$-dimensional data point $z$, the approximate point recovered in $n$ dimensions is $x_{approx} = U_{reduce}\, z \approx x$
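
Continuing the hypothetical `pca` sketch above, reconstruction is a single matrix product; note that the mean subtracted during normalization has to be added back.

```python
def reconstruct(Z, U_reduce, mu):
    """x_approx = U_reduce z for each row, plus the mean subtracted during normalization."""
    return Z @ U_reduce.T + mu
```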

Selecting the value of $k$

    • Average squared projection error: $\frac{1}{m}\sum\limits_{i=1}^{m}\|x^{(i)}-x^{(i)}_{approx}\|^2$
    • Total variation: $\frac{1}{m}\sum\limits_{i=1}^{m}\|x^{(i)}\|^2$
    • Select the smallest value of $k$ such that $\frac{\frac{1}{m}\sum\limits_{i=1}^{m}\|x^{(i)}-x^{(i)}_{approx}\|^2}{\frac{1}{m}\sum\limits_{i=1}^{m}\|x^{(i)}\|^2} \leq 0.01$ (or $0.05$). Equivalently, using the matrix $S$ from the SVD, select the smallest $k$ such that $1-\frac{\sum\limits_{i=1}^{k}S_{ii}}{\sum\limits_{i=1}^{n}S_{ii}}\leq 0.01$ (or $0.05$), i.e., 99% (or 95%) of the variance is retained (see the sketch below)
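
Using the singular values `S` returned by the hypothetical `pca` sketch above (NumPy returns them as a vector, so $S_{ii}$ is `S[i]`), the smallest $k$ retaining 99% of the variance can be found as follows; the helper name is illustrative.

```python
def choose_k(S, threshold=0.01):
    """Smallest k with 1 - (sum of the first k S_ii) / (sum of all S_ii) <= threshold."""
    retained = np.cumsum(S) / np.sum(S)          # fraction of variance retained for k = 1..n
    return int(np.argmax(1.0 - retained <= threshold)) + 1
```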
Recommendations for applying PCA
    • To speed up supervised learning: (1) for labeled data, remove the labels and run PCA on the inputs to reduce their dimension; (2) train the model on the reduced-dimension data; (3) for a new data point, apply the same PCA mapping to obtain its reduced representation and feed it to the model to obtain the prediction. Note: only the training set should be used to compute the PCA mapping $x^{(i)}\rightarrow z^{(i)}$; the resulting mapping (the principal-direction matrix $U_{reduce}$ selected by PCA) is then applied to the validation and test sets
    • Do not use PCA to prevent overfitting; use regularization instead.
    • Before using PCA, try training the model on the raw data first; only if that does not work should you consider PCA, rather than reaching for PCA by default.
Reference documents

[1] Andrew Ng, Coursera Machine Learning public course, Week 8

[2] Ramble on Clustering: k-means. http://blog.pluskid.org/?p=17

[3] K-means clustering in a GIF. http://www.statsblogs.com/2014/02/18/k-means-clustering-in-a-gif/

[4] Wikipedia: Principal component analysis. https://en.wikipedia.org/wiki/Principal_component_analysis

[5] Explained Visually: Principal Component Analysis. http://setosa.io/ev/principal-component-analysis/
