Machine Learning Week 4 (Smelting Numbers into Gold): Dimensionality Reduction Techniques

Principal component analysis

A multivariate statistical method first proposed by Pearson in 1901 and later developed by Hotelling (1933).
The principal components reveal the largest differences between individuals, and they also reduce the number of variables that enter regression analysis and cluster analysis.
Either the sample covariance matrix or the correlation coefficient matrix can serve as the starting point of the analysis.
Number of components to retain: Kaiser's criterion (1960) discards components with eigenvalues less than 1 and keeps only those with eigenvalues greater than 1 (the rule is meant for the correlation matrix, where each standardized variable contributes a variance of exactly 1).
If no more than 3 to 5 components explain 80% of the total variation, the analysis can be considered successful.
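Both retention rules above can be checked directly from the eigenvalues of the correlation matrix. The sketch below assumes NumPy; the function name `kaiser_and_variance` and the synthetic two-factor data are hypothetical, chosen only for illustration.

```python
import numpy as np

def kaiser_and_variance(X, threshold=0.80):
    """Eigen-analysis of the correlation matrix of X (n samples x p variables)."""
    R = np.corrcoef(X, rowvar=False)                 # p x p correlation matrix
    eigvals = np.linalg.eigvalsh(R)[::-1]            # eigvalsh sorts ascending; reverse to descending
    n_kaiser = int(np.sum(eigvals > 1.0))            # Kaiser (1960): keep eigenvalues greater than 1
    cum_ratio = np.cumsum(eigvals) / eigvals.sum()   # cumulative share of total variance
    n_threshold = int(np.searchsorted(cum_ratio, threshold)) + 1  # fewest components reaching threshold
    return eigvals, n_kaiser, n_threshold

# Hypothetical data: 6 variables driven by 2 latent factors plus noise
rng = np.random.default_rng(0)
f = rng.normal(size=(500, 2))
X = np.hstack([f[:, [0]] + 0.3 * rng.normal(size=(500, 3)),
               f[:, [1]] + 0.3 * rng.normal(size=(500, 3))])
eigvals, n_kaiser, n_80 = kaiser_and_variance(X)
print(np.round(eigvals, 2), n_kaiser, n_80)  # both rules should point to 2 components
```

Because each block of three variables shares one latent factor, two eigenvalues come out near 2.8 and the other four near 0.08, so the two rules agree here; on real data they often do not.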

New, optimized indicators are obtained as linear combinations of the original variables.
Calculations that used many original indicators are reduced to a few optimized indicators that together account for most of the total variance.
Basic idea: recombine many correlated original indicators into a new, smaller set of mutually uncorrelated composite indicators that replace the originals.
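A minimal sketch of this idea, assuming NumPy and hypothetical data: each principal component score is a linear combination of the centered original variables, and the covariance matrix of the new scores is diagonal, i.e. the new indicators are uncorrelated. The function name `principal_components` is illustrative only.

```python
import numpy as np

def principal_components(X):
    """Return (eigenvalues, weight vectors, scores) for data X (n x p), from the covariance matrix."""
    Xc = X - X.mean(axis=0)                    # center each variable
    S = np.cov(Xc, rowvar=False)               # sample covariance matrix (p x p)
    eigvals, eigvecs = np.linalg.eigh(S)       # symmetric eigendecomposition, ascending order
    order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                      # each column is a linear combination of the original variables
    return eigvals, eigvecs, scores

# Hypothetical data: the first two columns are strongly correlated, the third is independent noise
rng = np.random.default_rng(1)
z = rng.normal(size=(300, 1))
X = np.hstack([z, 2 * z + 0.5 * rng.normal(size=(300, 1)), rng.normal(size=(300, 1))])
eigvals, W, Y = principal_components(X)
C = np.cov(Y, rowvar=False)                    # covariance of the new composite indicators
```

`C` equals `diag(eigvals)` up to rounding error: the components are uncorrelated, their variances are the eigenvalues, and the total variance (the trace) is unchanged.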

Geometric interpretation of principal component analysis
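Geometrically, PCA rotates the coordinate axes so that the first new axis points along the direction in which the point cloud is most spread out. The sketch below assumes NumPy and builds a hypothetical 2-D cloud stretched along the line y = x; the first eigenvector of its covariance matrix should point at roughly 45 degrees.

```python
import numpy as np

rng = np.random.default_rng(2)
t = rng.normal(scale=3.0, size=500)            # large spread along the diagonal
s = rng.normal(scale=0.5, size=500)            # small spread across it
u = np.array([1.0, 1.0]) / np.sqrt(2)          # unit vector along y = x
v = np.array([-1.0, 1.0]) / np.sqrt(2)         # unit vector perpendicular to it
X = np.outer(t, u) + np.outer(s, v)            # elongated, tilted point cloud

eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
first_axis = eigvecs[:, np.argmax(eigvals)]    # direction of greatest spread
angle = np.degrees(np.arctan2(first_axis[1], first_axis[0])) % 180  # sign of an eigenvector is arbitrary
```

The two eigenvalues are close to 9 and 0.25, the variances along and across the long axis of the ellipse; projecting onto the first axis keeps almost all of the spread.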

The mathematical model of principal component analysis

Written in matrix notation, the idea of principal component analysis ultimately becomes a linear algebra problem.

Specifically, it reduces to diagonalizing the covariance matrix, i.e. solving an eigenvalue problem.
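In standard notation (the symbols x, Σ, a_k, P and Λ are introduced here and are not taken from the original slides), the derivation can be summarized as follows: the first component maximizes variance under a unit-length constraint, which leads to an eigenvector equation, and collecting all eigenvectors diagonalizes Σ.

```latex
\begin{aligned}
&\max_{a_1}\ \operatorname{Var}(a_1^\top x) = a_1^\top \Sigma a_1
\quad \text{s.t. } a_1^\top a_1 = 1
\;\Longrightarrow\; \Sigma a_1 = \lambda_1 a_1,\\
&\Sigma = P \Lambda P^\top,\quad P = (a_1,\dots,a_p),\quad
\Lambda = \operatorname{diag}(\lambda_1,\dots,\lambda_p),\quad \lambda_1 \ge \dots \ge \lambda_p \ge 0,\\
&y = P^\top x \;\Longrightarrow\; \operatorname{Cov}(y) = P^\top \Sigma P = \Lambda,\qquad
\text{share of component } k = \frac{\lambda_k}{\sum_{i=1}^{p} \lambda_i}.
\end{aligned}
```

The second component solves the same problem with the extra constraint a_2^T a_1 = 0, and so on, which is why the components come out uncorrelated.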

Factor analysis

A dimensionality reduction method that generalizes and extends principal component analysis.
It is a statistical model for analyzing the underlying factors behind observed phenomena: each component of the original observation is described as a linear combination of as few unobservable common factors as possible, plus a specific (unique) factor.
Example: Academic achievement (mathematical ability, language ability, athletic ability, etc.)
Example: Life satisfaction (job satisfaction, family satisfaction)
Example: Shiry book, p. 522
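The verbal definition above corresponds to the orthogonal factor model. The notation below is standard but assumed here (x is the p-dimensional observation, f the m common factors, ε the specific factors); A and D are the loading matrix and specific-variance matrix that the estimation methods later in this section solve for.

```latex
\begin{aligned}
&x - \mu = A f + \varepsilon,\qquad A \in \mathbb{R}^{p \times m},\ m < p,\\
&E(f) = 0,\quad \operatorname{Cov}(f) = I_m,\quad E(\varepsilon) = 0,\quad
\operatorname{Cov}(\varepsilon) = D = \operatorname{diag}(\sigma_1^2,\dots,\sigma_p^2),\quad
\operatorname{Cov}(f,\varepsilon) = 0,\\
&\Longrightarrow\quad \Sigma = A A^\top + D,\qquad
\operatorname{Var}(x_i) = \underbrace{\textstyle\sum_{j=1}^{m} a_{ij}^2}_{h_i^2\ \text{(communality)}} + \sigma_i^2 .
\end{aligned}
```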

Main uses of factor analysis

Reduce the number of variables to analyze
Group the original variables by examining their correlations: highly correlated variables go into the same group, and each group is replaced by a common factor
Present the business meaning of the factors behind the problem more clearly

The difference from principal component analysis

Principal component analysis focuses on "variation": it transforms the original variables into new composite variables that maximize the "variance" of the data, thereby maximizing the differences between individual samples.

However, the resulting principal components are often hard to interpret from a business perspective.
Factor analysis focuses on the "common variation" shared by related variables, and combines strongly correlated original variables.

Its goal is to find a few key factors working behind the scenes, and its results tend to be easier to interpret with business knowledge.

Factor analysis uses a complex mathematical approach

A more complex mathematical model than principal component analysis
Methods for solving the model: principal component method, principal factor method, maximum likelihood method
The resulting factors can also be rotated to make their business meaning more apparent
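The outline does not say which rotation is used; varimax (Kaiser, 1958) is one common orthogonal choice and is swapped in here as an assumption. The sketch below assumes NumPy, and the loading matrix is hypothetical: a clean two-group structure deliberately blurred by a 30-degree rotation. Varimax should restore a pattern where each variable loads mainly on one factor, while the communalities (row sums of squared loadings) stay unchanged.

```python
import numpy as np

def varimax(A, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a p x m loading matrix A.

    Returns the rotated loadings A @ T and the orthogonal rotation matrix T.
    """
    p, m = A.shape
    T = np.eye(m)
    d = 0.0
    for _ in range(max_iter):
        L = A @ T
        # matrix whose orthogonal polar factor is the next rotation (gamma = 1 gives varimax)
        G = A.T @ (L**3 - (gamma / p) * L @ np.diag(np.sum(L**2, axis=0)))
        u, s, vt = np.linalg.svd(G)
        T = u @ vt                             # nearest orthogonal matrix to G
        d_old, d = d, np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:
            break                              # criterion has stopped improving
    return A @ T, T

# Hypothetical loadings with simple structure, then blurred by a 30-degree rotation
A_true = np.array([[0.9, 0.0], [0.8, 0.0], [0.85, 0.0],
                   [0.0, 0.9], [0.0, 0.8], [0.0, 0.85]])
cos30, sin30 = np.cos(np.radians(30)), np.sin(np.radians(30))
A = A_true @ np.array([[cos30, -sin30], [sin30, cos30]])
A_rot, T = varimax(A)
```

Columns of `A_rot` may come back permuted or sign-flipped relative to `A_true`; that is inherent to rotation and does not change the interpretation.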

Maximum likelihood method

Write down the likelihood function
Maximize the likelihood function
Algorithm description (Shiry book, p. 533)
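Assuming the n observations are independent draws from a multivariate normal distribution (an assumption added here, since the outline defers the details to the book), the log-likelihood of the factor model with Σ = AA^T + D is shown below. S_n denotes the sample covariance matrix with divisor n.

```latex
\begin{aligned}
\ln L(\mu, A, D) &= -\frac{n}{2}\Bigl[\, p\ln(2\pi) + \ln\lvert\Sigma\rvert
  + \operatorname{tr}\!\bigl(\Sigma^{-1} S_n\bigr) \Bigr]
  - \frac{n}{2}\,(\bar{x}-\mu)^\top \Sigma^{-1} (\bar{x}-\mu),\\
\Sigma &= A A^\top + D,\qquad
S_n = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^\top .
\end{aligned}
```

The last term vanishes at the estimate of μ (the sample mean), so the remaining task is to maximize the bracketed expression over A and D; a constraint such as A^T D^{-1} A being diagonal is commonly imposed so that A is uniquely determined.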

Principal component method

Estimate the mean vector and covariance matrix from the sample
Compute the eigenvalues and eigenvectors of the covariance matrix
Drop the parts with small eigenvalues and solve for the loading matrix A and the specific-variance matrix D
Program
Example
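The steps above can be sketched as follows, assuming NumPy. The function name `pc_factor_method` and the two-factor synthetic data are hypothetical. Column j of A is the j-th eigenvector scaled by the square root of its eigenvalue, and D collects whatever variance the m retained factors leave on the diagonal.

```python
import numpy as np

def pc_factor_method(X, m):
    """Principal component method: estimate loadings A (p x m) and specific variances D."""
    mu = X.mean(axis=0)                              # step 1: sample mean vector
    S = np.cov(X, rowvar=False)                      # step 1: sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)             # step 2: eigen-decomposition (ascending)
    top = np.argsort(eigvals)[::-1][:m]              # step 3: keep the m largest eigenvalues
    A = eigvecs[:, top] * np.sqrt(eigvals[top])      # column j of A = sqrt(lambda_j) * e_j
    D = np.diag(np.diag(S - A @ A.T))                # specific variances = leftover diagonal
    return mu, A, D, S

# Hypothetical data: 6 variables driven by 2 latent factors plus noise
rng = np.random.default_rng(0)
f = rng.normal(size=(500, 2))
X = np.hstack([f[:, [0]] + 0.3 * rng.normal(size=(500, 3)),
               f[:, [1]] + 0.3 * rng.normal(size=(500, 3))])
mu, A, D, S = pc_factor_method(X, m=2)
communalities = np.sum(A**2, axis=1)                 # h_i^2 for each variable
```

By construction, communality plus specific variance reproduces each diagonal entry of S exactly; only the off-diagonal entries are approximated by AA^T.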

Principal factor method

First, standardize the variables
Choose the number of factors m and initial estimates of the specific variances
Form the reduced correlation matrix R* (a p-by-p matrix): the correlation matrix with its diagonal replaced by the estimated communalities
Compute the eigenvalues and eigenvectors of R*, keep the first m and discard the rest
Compute A* and D*, then repeat the calculation until the estimates converge
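A minimal sketch of the iteration, assuming NumPy. The starting values (one minus the squared multiple correlation of each variable) are a common choice but an assumption here, as are the function name `principal_factor_method` and the tolerance. Since the variables are standardized, D* is simply one minus each communality.

```python
import numpy as np

def principal_factor_method(X, m, max_iter=200, tol=1e-6):
    """Iterated principal factor method on standardized variables."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)     # step 1: standardize
    R = np.corrcoef(Z, rowvar=False)                     # p x p correlation matrix
    psi = 1.0 / np.diag(np.linalg.inv(R))                # step 2: initial specific variances = 1 - SMC (assumed start)
    for _ in range(max_iter):
        R_star = R - np.diag(psi)                        # step 3: reduced correlation matrix
        eigvals, eigvecs = np.linalg.eigh(R_star)
        top = np.argsort(eigvals)[::-1][:m]              # step 4: keep the first m
        A = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))  # step 5: A*
        new_psi = 1.0 - np.sum(A**2, axis=1)             # step 5: D* = I - diag(A* A*^T)
        converged = np.max(np.abs(new_psi - psi)) < tol
        psi = new_psi
        if converged:
            break
    return A, np.diag(psi)

# Hypothetical data: 6 variables driven by 2 latent factors plus noise
rng = np.random.default_rng(0)
f = rng.normal(size=(500, 2))
X = np.hstack([f[:, [0]] + 0.3 * rng.normal(size=(500, 3)),
               f[:, [1]] + 0.3 * rng.normal(size=(500, 3))])
A, D = principal_factor_method(X, m=2)
```

In practice a specific variance can drift to zero or below (a so-called Heywood case); this sketch does not guard against that beyond clipping negative eigenvalues.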
