ML: Dimensionality Reduction Algorithms - Overview


In machine learning, dimensionality reduction refers to mapping data points from an original high-dimensional space into a low-dimensional space. The essence of dimensionality reduction is to learn a mapping function f: x -> y, where x is the representation of the original data point and y is the low-dimensional representation of the point after the mapping. Usually the dimension of y is smaller than the dimension of x (although equal dimensions are also possible). f may be explicit or implicit, linear or nonlinear. Reasons for using dimensionality reduction:

    • Compress the data to reduce storage requirements.
    • Remove noise.
    • Extract features from the data to make classification easier.
    • Project the data into a low-dimensional space to make its distribution easy to visualize.
    • The number of variables (features) may be too large relative to the number of samples, which violates the requirements of some models. For example, with 100 samples but 200 features, most models will return an error warning that there are too many variables (features); see the sketch after this list.
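
As an illustration of the last point, here is a minimal sketch, assuming NumPy and scikit-learn; the choice of 20 components is arbitrary and purely illustrative:

    # Minimal sketch: more features (200) than samples (100).
    # PCA can keep at most min(n_samples, n_features) components.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 200))          # 100 samples, 200 features

    pca = PCA(n_components=20)               # 20 is an illustrative choice
    X_reduced = pca.fit_transform(X)

    print(X_reduced.shape)                        # (100, 20)
    print(pca.explained_variance_ratio_.sum())    # fraction of variance retained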

For the reasons above, and to better understand and read the data, some dimensionality reduction method is usually applied to reduce the number of variables (features) to a certain degree without losing the overwhelming majority of the information, generating features with as much explanatory power as possible while removing unnecessary ones. In many algorithms, dimensionality reduction therefore becomes part of data preprocessing, as sketched below. Moreover, some algorithms find it very difficult to achieve good results without dimensionality reduction as a preprocessing step.
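
A minimal sketch of that preprocessing pattern, assuming scikit-learn and its built-in digits dataset; the 16-component setting is illustrative, not prescriptive:

    # Minimal sketch: PCA as a preprocessing step in front of a classifier.
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline

    X, y = load_digits(return_X_y=True)      # 64 pixel features per sample

    pipe = Pipeline([
        ("reduce", PCA(n_components=16)),    # compress 64 features to 16
        ("clf", LogisticRegression(max_iter=1000)),
    ])

    print(cross_val_score(pipe, X, y, cv=5).mean())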

Classification of dimensionality reduction algorithms

There are many ways to reduce dimensionality, and it is difficult to apply a standard dimensionality reduction method to visual analysis in a single step without any adaptation. A common practice is to adapt the standard method to the specific application scenario interactively. Different angles yield different classifications; the main classification schemes are:

    • According to the characteristics of the data: linear dimensionality reduction and nonlinear dimensionality reduction.
    • According to whether supervision information about the data is considered and utilized: unsupervised, supervised, and semi-supervised dimensionality reduction.
    • According to which structure of the data is preserved: global-preserving, local-preserving, and globally-and-locally-consistent dimensionality reduction, among others.

Linear/Nonlinear

Linear dimensionality reduction means that the low-dimensional data obtained by the reduction preserves the linear relationships between the high-dimensional data points. The main linear dimensionality reduction methods are listed below, followed by a short sketch:

    • Principal Component Analysis (PCA)
    • Linear Discriminant Analysis (LDA)
    • Locality Preserving Projection (LPP): LPP is essentially a linear approximation of Laplacian Eigenmaps
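
To make the idea of an explicit linear mapping concrete, here is a minimal NumPy sketch of PCA: center the data, take the top-k eigenvectors of the covariance matrix, and project. This is one standard formulation under simplifying assumptions, not the only one (in practice SVD is often used instead):

    # Minimal sketch of PCA as an explicit linear map y = W^T (x - mean).
    import numpy as np

    def pca(X, k):
        X_centered = X - X.mean(axis=0)
        cov = np.cov(X_centered, rowvar=False)     # d x d covariance matrix
        eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
        W = eigvecs[:, ::-1][:, :k]                # top-k principal directions
        return X_centered @ W                      # low-dimensional coordinates

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 5))
    print(pca(X, 2).shape)                         # (50, 2)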

Nonlinear dimensionality reduction methods fall into two groups: one is kernel-based, and the other is usually called manifold learning. Assuming the data is uniformly sampled from a low-dimensional manifold embedded in a high-dimensional Euclidean space, manifold learning recovers the low-dimensional manifold structure from the high-dimensional samples, finding the low-dimensional manifold in the high-dimensional space and obtaining the corresponding embedding mapping. Nonlinear manifold learning methods include the following (a sketch follows the list):

    • Isometric Mapping (Isomap)
    • Locally Linear Embedding (LLE)
    • Laplacian Eigenmaps
    • Local Tangent Space Alignment (LTSA)
    • Maximum Variance Unfolding (MVU)
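
A minimal sketch of two of these methods, assuming scikit-learn's manifold module and its synthetic Swiss-roll dataset; n_neighbors=10 is an arbitrary illustrative setting:

    # Minimal sketch: unrolling a Swiss roll (3-D -> 2-D) with Isomap and LLE.
    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import Isomap, LocallyLinearEmbedding

    X, _ = make_swiss_roll(n_samples=1000, random_state=0)

    X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
    X_lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2).fit_transform(X)

    print(X_iso.shape, X_lle.shape)   # (1000, 2) (1000, 2)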

On the whole, linear methods are fast to compute and have low complexity, but their dimensionality reduction results on complex data are poor.

Supervised/Unsupervised

The main difference between supervised and unsupervised methods is whether the data samples carry class information.

    • The objective of unsupervised dimensionality reduction methods is to minimize the information lost during the reduction; examples include PCA, LPP, Isomap, LLE, Laplacian Eigenmaps, LTSA, and MVU.
    • The objective of supervised dimensionality reduction methods is to maximize the discrimination between classes; LDA is the classic example. See the sketch after this list.
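
A minimal sketch of the contrast, assuming scikit-learn and the iris dataset: PCA never sees the labels, while LDA fits on them (for iris's 3 classes, LDA can produce at most 2 components):

    # Minimal sketch: unsupervised PCA vs. supervised LDA.
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    X, y = load_iris(return_X_y=True)

    X_pca = PCA(n_components=2).fit_transform(X)       # labels y never used
    X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

    print(X_pca.shape, X_lda.shape)   # (150, 2) (150, 2)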

In fact, corresponding supervised or semi-supervised variants exist for the unsupervised dimensionality reduction algorithms.

Global/Local

    • Local methods consider only the local information of the sample set, that is, the relationships between each data point and its neighboring points. LLE is the representative local method; Laplacian Eigenmaps, LPP, and LTSA are also local.
    • Global methods consider not only the local geometric information of the samples but also the global information of the sample set, i.e., the relationships between sample points and non-adjacent points. Global algorithms include PCA, LDA, Isomap, and MVU.

Because local methods do not take into account the relationships between samples that lie far apart on the data manifold, they cannot ensure that such samples remain far apart in the low-dimensional embedding.
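
As an illustration of why: local methods typically start from a k-nearest-neighbor graph, which by construction encodes only relationships between nearby points. A minimal sketch, assuming scikit-learn's kneighbors_graph:

    # Minimal sketch: the sparse k-NN graph that local methods build on.
    import numpy as np
    from sklearn.neighbors import kneighbors_graph

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))

    # Each row stores distances to its 10 nearest neighbors only;
    # relationships to all other points are simply absent from the graph.
    G = kneighbors_graph(X, n_neighbors=10, mode="distance")
    print(G.shape, G.nnz)   # (200, 200) 2000 stored entries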
