Partial Least Squares Regression (PLSR)

1. The problem

This section covers PLSR, which ties component analysis and regression together; PLSR arguably pushes this combination to its limit. What follows describes the idea rather than a complete tutorial. Let's first review the drawback of ordinary linear regression: if the number of samples m is smaller than the number of features n (m < n), or if the features are linearly correlated, then X^T X (an n * n matrix) has rank smaller than n and is therefore not invertible, so the least squares solution fails.

To solve this problem, we can use PCA to reduce the dimensionality of the sample matrix X (an m * n matrix). Call the reduced matrix X' (an m * r matrix; normally a prime denotes transpose, but here it is just a temporary name for the new matrix). The rank of X' is then r, and its columns are linearly independent.

2. PCA Revisited

Here, we will review PCA.

X denotes the sample matrix, containing m samples, each with n features. Assume that every feature has been mean-centered (zero mean).

If the rank of X is less than n, the rank of the covariance matrix of X is also less than n. Therefore, if linear regression is applied directly, the least squares method cannot give a unique solution. We want to use PCA to make the relevant matrix invertible, so that least squares regression can be applied. Such a regression is called principal component regression (PCR).

PCA representation: T = X P

X is the sample matrix, and P is the matrix of eigenvectors of the covariance matrix of X (of course, the first r eigenvectors selected after sorting the eigenvalues in descending order). T is the projection of X onto the new orthogonal subspace spanned by P (it is also the new matrix obtained after dimensionality reduction of the sample X).

From linear algebra we know that for any real symmetric matrix A there exists an orthogonal matrix P such that P^T A P is diagonal. Therefore the eigenvector matrix P can be chosen orthogonal.

In fact, the column vectors of T are also orthogonal. The less rigorous proof is as follows:

T^T T = (X P)^T (X P) = P^T (X^T X) P = Λ. Here we use the fact that P diagonalizes X^T X: Λ is a diagonal matrix whose diagonal entries are the eigenvalues, and we have normalized P so that P^T P = I. Since T^T T is diagonal, the columns of T are orthogonal. P is the eigenvector matrix of X^T X; furthermore, the columns of T are eigenvectors of X X^T (because X X^T (X p_i) = X (X^T X p_i) = λ_i (X p_i)).

After PCA, the new sample matrix T (m * r) has full column rank and orthogonal columns, so the regression coefficients can be obtained directly from the least squares formula β = (T^T T)^(-1) T^T y.
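
To make the PCR idea concrete, here is a minimal NumPy sketch (the function names pcr_fit and pcr_predict and the parameter r are illustrative, not from the original text): it centers the data, keeps the top r eigenvectors of X^T X, and runs ordinary least squares on the scores T.

```python
import numpy as np

def pcr_fit(X, y, r):
    """Minimal principal component regression (PCR) sketch.

    X: (m, n) sample matrix, y: (m,) targets, r: number of components kept.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean            # zero-mean each feature (and y)
    # eigen-decomposition of the (unnormalized) covariance matrix X^T X
    eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc)
    order = np.argsort(eigvals)[::-1]          # sort eigenvalues in descending order
    P = eigvecs[:, order[:r]]                  # keep the first r eigenvectors (n, r)
    T = Xc @ P                                 # scores (m, r): columns are orthogonal
    # T has full column rank, so the least squares formula applies directly
    beta = np.linalg.solve(T.T @ T, T.T @ yc)
    return x_mean, y_mean, P, beta

def pcr_predict(X, x_mean, y_mean, P, beta):
    return (X - x_mean) @ P @ beta + y_mean
```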

 

Another PCA representation: X = T P^T

(Assume that the rank of X is n.)

This formula is actually no different from the above representation.

(Of course, here we take P to be n * n, so P P^T = I and X = X P P^T = T P^T.)

If P is n * r, that is, if the eigenvectors with smaller eigenvalues are discarded, the formula above becomes X = T P^T + E.

Here E is the residual matrix. This formula actually has a strong geometric meaning. Let p_1 be the normalized eigenvector corresponding to the largest eigenvalue; then t_1 = X p_1 is the projection of X onto p_1, and t_1 p_1^T expresses that projection back in the original coordinate system. The figure below can help us understand:

The black lines represent the original coordinate axes, and the blue points represent the original four two-dimensional sample points. After PCA, two orthogonal eigenvector directions p_1 and p_2 are obtained. The green points are the projections of the sample points onto p_1 (the direction of maximum variance), and the red points are the projections onto p_2. Each component of t_1 = X p_1 is the coordinate of a green point along p_1, and each component of t_2 = X p_2 is the coordinate of a red point along p_2. Each row of t_1 p_1^T can be seen as a vector whose direction is p_1 and whose length is the corresponding component, such as the orange arrow in the figure; together they give the projection vectors of all samples onto p_1. Because p_1 and p_2 are orthogonal, adding the orange arrows of each point along the two directions, i.e., t_1 p_1^T + t_2 p_2^T, recovers the original sample points.

If some eigenvector directions are discarded, for example p_2, only part of the information of the original points can be recovered (the green points that remain have lost the information of the blue points in the other dimension). In addition, P is called the loading matrix and T is called the score matrix.
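
A small self-contained sketch of the decomposition X = T P^T + E on toy data (the variable names are placeholders): with the full loading matrix the reconstruction is exact; after dropping p_2, the discarded dimension shows up as the residual E.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))                  # four 2-D sample points, as in the figure
Xc = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc)
P = eigvecs[:, np.argsort(eigvals)[::-1]]    # loading matrix, columns sorted by eigenvalue

T = Xc @ P                                   # score matrix
print(np.allclose(Xc, T @ P.T))              # True: with the full P, X = T P^T exactly

P1 = P[:, :1]                                # keep only the largest-eigenvalue direction
T1 = Xc @ P1                                 # coordinates of the "green points"
E = Xc - T1 @ P1.T                           # residual matrix: the discarded dimension
print(np.round(E, 3))
```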

3. PLSR ideas and steps

To introduce PLSR we also need to review CCA. In CCA we project X and Y onto two lines to obtain u and v respectively, and then maximize the Pearson correlation coefficient of u and v, i.e., corr(u, v); the higher the correlation, the better. Formal representation:

Maximize: corr(u, v), where u = X a and v = Y b

Subject to: Var(u) = Var(v) = 1

where a and b are the projection directions to be found.
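
As a quick, hedged reference for this formulation, the following sketch uses scikit-learn's CCA on synthetic data (the data and names are made up for illustration): it finds the directions a and b internally, and we check the correlation of the resulting u and v.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
shared = rng.normal(size=(100, 1))                     # latent signal shared by X and Y
X = np.hstack([shared + 0.1 * rng.normal(size=(100, 1)),
               rng.normal(size=(100, 2))])             # 3 features
Y = np.hstack([shared + 0.1 * rng.normal(size=(100, 1)),
               rng.normal(size=(100, 1))])             # 2 features

cca = CCA(n_components=1)
u, v = cca.fit_transform(X, Y)                         # u = X a, v = Y b (after internal centering/scaling)
print("corr(u, v) =", np.corrcoef(u[:, 0], v[:, 0])[0, 1])
```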

Think about the disadvantages of CCA: its treatment of the features is crude. u is simply the projection of X onto one line, and the relationship between u and X is essentially a linear regression, so it inherits the shortcomings of linear regression. We would like to bring the component-extraction idea of PCA into CCA, so that u and v carry as much of the important information in the samples as possible. Another, more important, issue is that CCA only looks for the relationship between u and v, the projections of X and Y. Clearly, the relationship between u and v cannot be used to recover y from x; that is, no direct mapping from X to Y is found. This is also why kNN is most often used for prediction with CCA.

PLSR is smarter: it takes both PCA and CCA into account and also solves the mapping problem between X and Y. Look again at the figure in the PCA revisited section. Suppose that for CCA the projection direction of X is p_1; then CCA only considers the correlation between the green points of X and the projection of Y onto some line. The information of X and Y in the other dimensions is discarded, so there is no mapping from X to Y. PLSR goes one step further on top of CCA. Since the original blue points can be viewed as the superposition of the green points and the red points, the green points of X (the score t_1) are first used to regress Y (this may look a bit strange; writing out both sides of the product makes it easier to see, and here Y plays the same role as in linear regression). Then the red points of X (the next score) are used to regress the remaining part F of Y. In this way Y is the superposition of two regression parts. When a new x arrives, its green and red scores are obtained by projection, and y can then be recovered through the regression coefficients r, which realizes the mapping from X to Y. Of course, this is just a geometric description, which differs somewhat from the details below.

The following sets up PLSR:

1) Both X and Y have been standardized (mean subtracted and divided by the standard deviation).

2) Let the first component of X be t_1 and the first component of Y be u_1; the corresponding direction vectors p_1 and q_1 are both unit vectors. (The "component" here is not obtained from PCA.)

3) t_1 = X p_1 and u_1 = Y q_1. This step looks the same as CCA, but here p_1 and q_1 are also required to have principal-component properties, which gives the expected conditions 4) and 5) below.

4) t_1 and u_1 are the projections onto p_1 and q_1; we expect their variances Var(t_1) and Var(u_1) to be as large as possible.

5) corr(t_1, u_1) should be as large as possible. This is consistent with the idea of CCA.

6) Combining 4) and 5), the optimization goal becomes maximizing the covariance Cov(t_1, u_1) = sqrt(Var(t_1) Var(u_1)) * corr(t_1, u_1).

Formally speaking:

Maximize: Cov(t_1, u_1), i.e., (up to a factor 1/m) t_1^T u_1 = p_1^T X^T Y q_1

Subject to: p_1^T p_1 = 1, q_1^T q_1 = 1

It looks simpler than CCA. In fact it is not quite the case that, as in CCA, a single optimization finishes the problem: here p_1 and q_1 give only the first pair of components of PLSR; there are further components, and that information must be extracted as well.

Let's take a look at the solution to this optimization problem:

Introduce the Lagrange multipliers: L = p_1^T X^T Y q_1 − λ_1 (p_1^T p_1 − 1) − λ_2 (q_1^T q_1 − 1)

Take the partial derivatives with respect to p_1 and q_1 separately and set them to zero: X^T Y q_1 − 2 λ_1 p_1 = 0 and Y^T X p_1 − 2 λ_2 q_1 = 0.

From the above (multiplying the two equations on the left by p_1^T and q_1^T respectively, and using the constraints p_1^T p_1 = q_1^T q_1 = 1) we can see that 2 λ_1 = 2 λ_2 = p_1^T X^T Y q_1; denote this common value by θ_1.

Substituting the second equation into the first gives X^T Y Y^T X p_1 = θ_1^2 p_1.

Substituting the first equation into the second gives Y^T X X^T Y q_1 = θ_1^2 q_1.

θ_1 is exactly the value of the objective function to be maximized, so we want θ_1^2 to be as large as possible.

Therefore, p_1 is the unit eigenvector corresponding to the largest eigenvalue of the symmetric matrix X^T Y Y^T X, and q_1 is the unit eigenvector corresponding to the largest eigenvalue of Y^T X X^T Y.

As can be seen, the biggest difference from CCA is that PLSR strikes a balance between maximizing the projection variance and maximizing the correlation, while CCA only maximizes the correlation.

From p_1 and q_1 we obtain t_1 = X p_1 and u_1 = Y q_1.
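
A minimal sketch of this first-component computation, assuming X and Y have already been standardized (the function name is hypothetical): p_1 and q_1 are taken as the leading eigenvectors of X^T Y Y^T X and Y^T X X^T Y, and the scores follow by projection.

```python
import numpy as np

def first_pls_directions(X, Y):
    """Sketch: first pair of PLS directions for standardized X (m, n) and Y (m, q)."""
    C = X.T @ Y                                # cross-product matrix (covariance up to 1/m)
    # p1: unit eigenvector of X^T Y Y^T X belonging to the largest eigenvalue
    w_vals, w_vecs = np.linalg.eigh(C @ C.T)
    p1 = w_vecs[:, np.argmax(w_vals)]
    # q1: unit eigenvector of Y^T X X^T Y belonging to the largest eigenvalue
    c_vals, c_vecs = np.linalg.eigh(C.T @ C)
    q1 = c_vecs[:, np.argmax(c_vals)]
    if p1 @ C @ q1 < 0:                        # fix signs so the covariance of t1, u1 is positive
        q1 = -q1
    t1, u1 = X @ p1, Y @ q1                    # first components of X and Y
    return p1, q1, t1, u1
```

Equivalently, p_1 and q_1 are the leading left and right singular vectors of X^T Y, so an SVD of the cross-product matrix gives the same directions up to sign.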

The t_1 obtained here is analogous to the green points earlier, but the relationship between X and Y is being sought only on the green points. If things ended here, we would run into the same problem as CCA: there is still no mapping from X to Y.

Using the second representation from the PCA revisited section, we can continue to build regression equations: X = t_1 c_1^T + E and Y = u_1 d_1^T + G.

Here c_1 and d_1 are different from p_1 and q_1, but there is a certain relationship between them, which will be shown shortly. E and G are residual matrices.

The subsequent steps of PLSR are as follows:

1) Regress Y on t_1; the reason has been explained: we use the component of X to regress Y, i.e., Y = t_1 r_1^T + F.

2) Use the least squares method to compute c_1, d_1 and r_1: c_1 = X^T t_1 / (t_1^T t_1), d_1 = Y^T u_1 / (u_1^T u_1), r_1 = Y^T t_1 / (t_1^T t_1).

In effect, this step computes the regression (loading) vector for each of the three equations above.

The relationship between c_1 and p_1 is then: c_1 = X^T X p_1 / (p_1^T X^T X p_1), since t_1 = X p_1.

A word on this relationship: one could replace c_1 by p_1, which also satisfies the equation in a geometric sense, being the direction vector onto which X is projected. But here we want a regression, i.e., we want the residual E to be as small as possible, so the c_1 obtained by least squares is in general different from p_1.
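
Continuing the sketch above, one regression/deflation step might look like this (hypothetical helper name; it simply applies the least squares formulas of step 2)):

```python
import numpy as np

def pls_step_loadings(X, Y, t1, u1):
    """Sketch of step 2): least squares loadings and the residual (deflated) matrices."""
    c1 = X.T @ t1 / (t1 @ t1)       # X-loading:             X ≈ t1 c1^T
    d1 = Y.T @ u1 / (u1 @ u1)       # Y-loading on u1:       Y ≈ u1 d1^T
    r1 = Y.T @ t1 / (t1 @ t1)       # regression of Y on t1: Y ≈ t1 r1^T
    E = X - np.outer(t1, c1)        # residual of X, used as the new X
    F = Y - np.outer(t1, r1)        # residual of Y, used as the new Y
    return c1, d1, r1, E, F
```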

3) Use the residual E as the new X and the residual F as the new Y, and repeat the preceding steps to obtain p_2, q_2, t_2 and u_2:

The objective function is the same as before; p_2 and q_2 are the unit eigenvectors corresponding to the largest eigenvalues of the new matrices E^T F F^T E and F^T E E^T F, respectively, and t_2 = E p_2, u_2 = F q_2.

4) The second group of regression coefficients is computed in the same way: c_2 = E^T t_2 / (t_2^T t_2), r_2 = F^T t_2 / (t_2^T t_2), giving E = t_2 c_2^T + E_2 and F = t_2 r_2^T + F_2.

Here t_2 is orthogonal to the previous t_1; the proof is as follows: t_1^T E = t_1^T (X − t_1 c_1^T) = t_1^T X − (t_1^T t_1) c_1^T = t_1^T X − t_1^T X = 0, so t_1^T t_2 = t_1^T E p_2 = 0.

In fact, all of the score vectors t_1, t_2, ... are mutually orthogonal.

The direction vectors p_1, p_2, ... are also mutually orthogonal.

However, the loadings c_1, c_2, ... are in general not orthogonal.
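
These orthogonality claims can be spot-checked numerically. The following self-contained sketch (random data, illustrative names) extracts two components exactly as described above and prints the relevant inner products:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
Y = rng.normal(size=(30, 2))

def leading_vec(M):                              # unit eigenvector for the largest eigenvalue
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, np.argmax(vals)]

p1 = leading_vec(X.T @ Y @ Y.T @ X); t1 = X @ p1
c1 = X.T @ t1 / (t1 @ t1); r1 = Y.T @ t1 / (t1 @ t1)
E1, F1 = X - np.outer(t1, c1), Y - np.outer(t1, r1)      # deflated matrices

p2 = leading_vec(E1.T @ F1 @ F1.T @ E1); t2 = E1 @ p2
c2 = E1.T @ t2 / (t2 @ t2)

print(round(float(t1 @ t2), 10), round(float(p1 @ p2), 10))  # both ~0: scores and directions orthogonal
print(round(float(c1 @ c2), 10))                             # generally nonzero: loadings need not be orthogonal
```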

5) The regression equation obtained from the previous steps is: Y = t_1 r_1^T + t_2 r_2^T + F_2.

If the residual matrix F_2 is still large, continue the calculation in the same way.

6) After k such rounds, the final result is: Y = t_1 r_1^T + t_2 r_2^T + ... + t_k r_k^T + F_k. Since each t_i is obtained from X (and its successive residuals) by projection, a new x can be projected to get its scores, and y can then be recovered through the r_i; this realizes the mapping from X to Y.
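
Putting the pieces together, here is a hedged end-to-end sketch of the iteration described in this section (all names are illustrative; this follows the eigenvector-plus-deflation steps above rather than any particular library's implementation):

```python
import numpy as np

def pls_fit(X, Y, k):
    """Sketch of the PLSR iteration: X (m, n), Y (m, q), both standardized; k components."""
    E, F = X.copy(), Y.copy()
    Ps, Cs, Rs = [], [], []
    for _ in range(k):
        M = E.T @ F
        vals, vecs = np.linalg.eigh(M @ M.T)
        p = vecs[:, np.argmax(vals)]            # direction: leading eigenvector of E^T F F^T E
        t = E @ p                               # score of this round
        c = E.T @ t / (t @ t)                   # X-loading
        r = F.T @ t / (t @ t)                   # Y-regression coefficients on t
        E = E - np.outer(t, c)                  # deflate X
        F = F - np.outer(t, r)                  # deflate Y
        Ps.append(p); Cs.append(c); Rs.append(r)
    return Ps, Cs, Rs

def pls_predict(Xnew, Ps, Cs, Rs):
    """Recover Y for new samples: accumulate t_i r_i^T while deflating Xnew."""
    E = Xnew.copy()
    Yhat = np.zeros((Xnew.shape[0], Rs[0].shape[0]))
    for p, c, r in zip(Ps, Cs, Rs):
        t = E @ p
        Yhat += np.outer(t, r)
        E = E - np.outer(t, c)
    return Yhat

# usage sketch on synthetic standardized data
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
Y = X @ rng.normal(size=(5, 2)) + 0.1 * rng.normal(size=(50, 2))
X = (X - X.mean(0)) / X.std(0)
Y = (Y - Y.mean(0)) / Y.std(0)
Ps, Cs, Rs = pls_fit(X, Y, k=3)
print(np.abs(Y - pls_predict(X, Ps, Cs, Rs)).mean())   # residual should be small
```

For practical use, scikit-learn's sklearn.cross_decomposition.PLSRegression provides a tested implementation of PLS regression, including centering/scaling and a final coefficient matrix (coef_) that maps X directly to Y.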
