ML: Dimensionality Reduction Algorithms - Overview


In machine learning, dimensionality reduction refers to mapping data points from an original high-dimensional space into a low-dimensional space. The essence of dimensionality reduction is to learn a mapping function f: x -> y, where x is the representation of the original data point and y is the low-dimensional representation of the point after the mapping. Usually the dimension of y is smaller than the dimension of x (although equal dimensions are also possible). f may be explicit or implicit, linear or nonlinear. Reasons for using dimensionality reduction:

    • Compress the data to reduce storage requirements.
    • Remove noise.
    • Extract features from the data to make classification easier.
    • Project the data into a low-dimensional space to make its distribution easy to visualize.
    • The number of variables (features) may be too large relative to the number of samples, which violates the requirements of some models. For example, with 100 samples but 200 features, most models will return an error warning that there are too many variables (features); see the sketch after this list.
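
As an illustration of the last point, here is a minimal sketch, assuming NumPy and scikit-learn; the choice of 20 components is arbitrary and purely illustrative:

    # Minimal sketch: more features (200) than samples (100).
    # PCA can keep at most min(n_samples, n_features) components.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 200))          # 100 samples, 200 features

    pca = PCA(n_components=20)               # 20 is an illustrative choice
    X_reduced = pca.fit_transform(X)

    print(X_reduced.shape)                        # (100, 20)
    print(pca.explained_variance_ratio_.sum())    # fraction of variance retained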

For the reasons above, and to better understand and read the data, some dimensionality reduction method is usually applied to reduce the number of variables (features) to a certain degree without losing the overwhelming majority of the information, generating features with as much explanatory power as possible while removing unnecessary ones. In many algorithms, dimensionality reduction therefore becomes part of data preprocessing, as sketched below. Moreover, some algorithms find it very difficult to achieve good results without dimensionality reduction as a preprocessing step.
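
A minimal sketch of that preprocessing pattern, assuming scikit-learn and its built-in digits dataset; the 16-component setting is illustrative, not prescriptive:

    # Minimal sketch: PCA as a preprocessing step in front of a classifier.
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline

    X, y = load_digits(return_X_y=True)      # 64 pixel features per sample

    pipe = Pipeline([
        ("reduce", PCA(n_components=16)),    # compress 64 features to 16
        ("clf", LogisticRegression(max_iter=1000)),
    ])

    print(cross_val_score(pipe, X, y, cv=5).mean())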

Classification of dimensionality reduction algorithms

There are many ways to reduce dimensionality, and it is difficult to apply a standard dimensionality reduction method to visual analysis in a single step without any adaptation. A common practice is to adapt the standard method to the specific application scenario interactively. Different angles yield different classifications; the main classification schemes are:

    • According to the characteristics of the data: linear dimensionality reduction and nonlinear dimensionality reduction.
    • According to whether supervision information about the data is considered and utilized: unsupervised, supervised, and semi-supervised dimensionality reduction.
    • According to which structure of the data is preserved: global-preserving, local-preserving, and globally-and-locally-consistent dimensionality reduction, among others.

Linear/Nonlinear

Linear dimensionality reduction means that the low-dimensional data obtained by the reduction preserves the linear relationships between the high-dimensional data points. The main linear dimensionality reduction methods are listed below, followed by a short sketch:

    • Principal Component Analysis (PCA)
    • Linear Discriminant Analysis (LDA)
    • Locality Preserving Projection (LPP): LPP is essentially a linear approximation of Laplacian Eigenmaps
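
To make the idea of an explicit linear mapping concrete, here is a minimal NumPy sketch of PCA: center the data, take the top-k eigenvectors of the covariance matrix, and project. This is one standard formulation under simplifying assumptions, not the only one (in practice SVD is often used instead):

    # Minimal sketch of PCA as an explicit linear map y = W^T (x - mean).
    import numpy as np

    def pca(X, k):
        X_centered = X - X.mean(axis=0)
        cov = np.cov(X_centered, rowvar=False)     # d x d covariance matrix
        eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
        W = eigvecs[:, ::-1][:, :k]                # top-k principal directions
        return X_centered @ W                      # low-dimensional coordinates

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 5))
    print(pca(X, 2).shape)                         # (50, 2)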

Nonlinear dimensionality reduction methods fall into two groups: one is kernel-based, and the other is usually called manifold learning. Assuming the data is uniformly sampled from a low-dimensional manifold embedded in a high-dimensional Euclidean space, manifold learning recovers the low-dimensional manifold structure from the high-dimensional samples, finding the low-dimensional manifold in the high-dimensional space and obtaining the corresponding embedding mapping. Nonlinear manifold learning methods include the following (a sketch follows the list):

    • Isometric Mapping (Isomap)
    • Locally Linear Embedding (LLE)
    • Laplacian Eigenmaps
    • Local Tangent Space Alignment (LTSA)
    • Maximum Variance Unfolding (MVU)
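
A minimal sketch of two of these methods, assuming scikit-learn's manifold module and its synthetic Swiss-roll dataset; n_neighbors=10 is an arbitrary illustrative setting:

    # Minimal sketch: unrolling a Swiss roll (3-D -> 2-D) with Isomap and LLE.
    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import Isomap, LocallyLinearEmbedding

    X, _ = make_swiss_roll(n_samples=1000, random_state=0)

    X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
    X_lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2).fit_transform(X)

    print(X_iso.shape, X_lle.shape)   # (1000, 2) (1000, 2)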

On the whole, linear methods are fast to compute and have low complexity, but their dimensionality reduction results on complex data are poor.

Supervised/Unsupervised

The main difference between supervised and unsupervised methods is whether the data samples carry class information.

    • The objective of unsupervised dimensionality reduction methods is to minimize the information lost during the reduction; examples include PCA, LPP, Isomap, LLE, Laplacian Eigenmaps, LTSA, and MVU.
    • The objective of supervised dimensionality reduction methods is to maximize the discrimination between classes; LDA is the classic example. See the sketch after this list.
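
A minimal sketch of the contrast, assuming scikit-learn and the iris dataset: PCA never sees the labels, while LDA fits on them (for iris's 3 classes, LDA can produce at most 2 components):

    # Minimal sketch: unsupervised PCA vs. supervised LDA.
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    X, y = load_iris(return_X_y=True)

    X_pca = PCA(n_components=2).fit_transform(X)       # labels y never used
    X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

    print(X_pca.shape, X_lda.shape)   # (150, 2) (150, 2)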

In fact, corresponding supervised or semi-supervised variants exist for the unsupervised dimensionality reduction algorithms.

Global/Local

    • Local methods consider only the local information of the sample set, that is, the relationships between each data point and its neighboring points. LLE is the representative local method; Laplacian Eigenmaps, LPP, and LTSA are also local.
    • Global methods consider not only the local geometric information of the samples but also the global information of the sample set, i.e., the relationships between sample points and non-adjacent points. Global algorithms include PCA, LDA, Isomap, and MVU.

Because local methods do not take into account the relationships between samples that lie far apart on the data manifold, they cannot ensure that such samples remain far apart in the low-dimensional embedding.
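
As an illustration of why: local methods typically start from a k-nearest-neighbor graph, which by construction encodes only relationships between nearby points. A minimal sketch, assuming scikit-learn's kneighbors_graph:

    # Minimal sketch: the sparse k-NN graph that local methods build on.
    import numpy as np
    from sklearn.neighbors import kneighbors_graph

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))

    # Each row stores distances to its 10 nearest neighbors only;
    # relationships to all other points are simply absent from the graph.
    G = kneighbors_graph(X, n_neighbors=10, mode="distance")
    print(G.shape, G.nnz)   # (200, 200) 2000 stored entries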
