Principal Component Analysis

Source: Internet
Author: User

Principal Component Analysis (PCA), also called principal components analysis or principal component regression analysis

Directory

  • 1 What is principal component analysis?
  • 2 Basic idea of principal component analysis
  • 3 Principle of principal component analysis
  • 4 Main functions of principal component analysis
  • 5 Calculation steps of principal component analysis
  • 6 Applications of principal component analysis
    • 6.1 Case 1: Application of principal component analysis in beer flavor evaluation [1]
      • 6.1.1 Materials and methods
      • 6.1.2 Principle of principal component analysis
      • 6.1.3 Application of principal component analysis in beer quality consistency assessment
      • 6.1.4 Conclusion
  • 7 Related entries
  • 8 References

What is principal component analysis?

Principal component analysis (also called principal components analysis) uses the idea of dimensionality reduction to convert many indicators into a few composite indicators.

In statistics, principal component analysis (PCA) is a technique for simplifying datasets. It is a linear transformation that maps the data into a new coordinate system such that the greatest variance of any projection of the data lies on the first coordinate (called the first principal component), the second greatest variance on the second coordinate (the second principal component), and so on. PCA is often used to reduce the dimensionality of a dataset while keeping the features that contribute most to its variance. This is done by keeping the low-order principal components and ignoring the high-order ones, because the low-order components often retain the most important aspects of the data. This is not guaranteed, however; it depends on the specific application.
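As a rough illustration of the variance-ordering idea above, the following sketch (assuming NumPy and synthetic data invented for the example) diagonalizes the sample covariance matrix, keeps only the first k low-order components, and maps the data back to the original space. The helper names `pca_fit` and `reduce_and_reconstruct` are hypothetical.

```python
import numpy as np


def pca_fit(X):
    """Eigendecomposition of the sample covariance matrix, largest variance first."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)     # p x p sample covariance (n - 1 divisor)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder: first coordinate = largest variance
    return mean, eigvals[order], eigvecs[:, order]


def reduce_and_reconstruct(X, k):
    """Keep the first k (low-order) components, drop the rest, and map back."""
    mean, eigvals, eigvecs = pca_fit(X)
    W = eigvecs[:, :k]                       # p x k basis of the retained components
    scores = (X - mean) @ W                  # coordinates in the new system
    X_hat = scores @ W.T + mean              # rank-k approximation of X
    return scores, X_hat, eigvals


# Synthetic data: three indicators driven mostly by one hidden factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = np.hstack([latent, 2 * latent, -latent]) + 0.1 * rng.normal(size=(200, 3))

scores, X_hat, eigvals = reduce_and_reconstruct(X, k=1)
print(eigvals / eigvals.sum())               # variance share of each new coordinate
```

For centred data, the squared reconstruction error equals (n - 1) times the sum of the discarded eigenvalues, which is one way to quantify what ignoring the high-order components costs.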

The basic idea of principal component analysis

In empirical research, analyzing a problem comprehensively and systematically requires considering many influencing factors. In multivariate statistical analysis these factors are called indicators or variables. Because each variable reflects some information about the problem to a varying degree, and the indicators are correlated with one another, the information carried by the statistical data overlaps to some extent. When statistical methods are applied to problems with many variables, too many variables increase both the computational workload and the complexity of the analysis. Ideally, the quantitative analysis would involve fewer variables while still capturing most of the information. Principal component analysis is designed to meet this need and is well suited to such problems.

The same problem arises in evaluating the effectiveness of science popularization, which is difficult to quantify. In practice, evaluators often select several representative composite indicators and score them, so choosing the composite indicators is both the key and the difficult part. As noted above, PCA is well suited to this. Because many of the variables involved in the evaluation are correlated to some degree, a few dominant factors should lie behind them. By studying the internal structure of the correlation (or covariance) matrix of the original variables, we can find a few composite indicators that capture a given element of science-popularization effectiveness, each being a linear combination of the original variables. Such composite indicators retain the main information of the original variables and have better properties than the originals (for example, they are mutually uncorrelated), which helps us grasp the essentials of a complex evaluation problem. The idea can be summarized as follows: the p indicators involved in an evaluation element form a p-dimensional random vector $X = (X_1, X_2, \dots, X_p)^T$. An orthogonal transformation $Y = T^T X$, where T is an orthogonal matrix, produces new variables $Y_1, \dots, Y_p$ that are mutually uncorrelated, so the roles of the original indicators can be explained through the roles of the new components. We can then keep the principal components, discard the components that have little influence on the element, and analyze the original variables through the retained principal components. Each component is a linear combination of the original variables, and different components reflect different relationships among the original variables.
Because these underlying relationships are likely tied to specific processes, PCA can help identify the main factors among the intricate elements of science-popularization evaluation. This allows large amounts of statistical data to be used effectively in evaluating and analyzing the effectiveness of science popularization, and may inspire deeper research on such evaluation.

For example, in evaluating the development and utilization of popular science products, the indicators might include the number of people (in millions) involved in creating popular science products, the circulation (in millions) of popular science works, and the industrialization of popular science (millions of people at popular science demonstration bases). After principal component analysis, one or more principal components are chosen as composite indicators for evaluating the development and utilization of popular science products. This reduces the number of variables while retaining a degree of reliability, which makes the effectiveness of science popularization easier to evaluate.

Principle of principal component analysis

Principal component analysis is a statistical method for dimensionality reduction. Using an orthogonal transformation, it converts a random vector whose components are correlated into a new random vector whose components are uncorrelated. Algebraically, this transforms the covariance matrix of the original random vector into a diagonal matrix; geometrically, it transforms the original coordinate system into a new orthogonal coordinate system whose axes point along the p orthogonal directions in which the sample points are most spread out. The multidimensional variable system is then reduced in dimension, so that it can be represented with high accuracy by a low-dimensional system; by constructing a suitable value function, the low-dimensional system can be further reduced to a one-dimensional one.
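The algebraic statement above (the covariance matrix becomes diagonal under an orthogonal transformation) can be checked numerically. The sketch below assumes NumPy and a made-up 3 × 3 mixing matrix `A` used only to generate correlated data; `T` holds the orthonormal eigenvectors of the sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative mixing matrix (an assumption) that makes the three variables correlated.
A = np.array([[1.0, 0.0, 0.0],
              [0.8, 0.6, 0.0],
              [0.5, 0.3, 0.8]])
X = rng.normal(size=(500, 3)) @ A.T

S = np.cov(X, rowvar=False)              # covariance matrix of the original vector
eigvals, T = np.linalg.eigh(S)           # columns of T: orthonormal eigenvectors
eigvals, T = eigvals[::-1], T[:, ::-1]   # order directions by decreasing variance

Y = (X - X.mean(axis=0)) @ T             # y = T^T x for every centred sample
S_Y = np.cov(Y, rowvar=False)            # covariance of the new components

print(np.round(S_Y, 6))                  # off-diagonal entries are zero up to rounding
```

The diagonal of `S_Y` holds the eigenvalues, i.e. the variance carried by each new coordinate.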

Main functions of principal component analysis

In summary, principal component analysis is mainly used in the following ways.

1. Principal component analysis can reduce the dimensionality of the data space: an m-dimensional Y space replaces the p-dimensional X space ($m < p$) with little loss of information. Note that even when only one principal component $Y_1$ is kept ($m = 1$), $Y_1$ is still computed from all p X variables; for example, computing the mean of $Y_1$ requires the means of all the X variables. If, in the first m principal components selected, the coefficients of some $X_i$ are all close to zero, $X_i$ can be deleted, which is also a way to remove unnecessary variables.

2. The factor loadings $a_{ij}$ can sometimes reveal relationships among the X variables.

3. Graphical representation of multidimensional data. Geometric plots cannot be drawn when the dimension exceeds 3, and most problems in multivariate statistics involve more than 3 variables, so the data cannot be visualized directly. After principal component analysis, however, we can take the first two principal components (or one of them) and plot the n samples on a two-dimensional plane using their principal component scores. The plot shows each sample's position on the principal components intuitively, can be used to classify the samples, and can reveal outliers that lie far from most of the samples.

4. Building regression models with principal component analysis: the principal components are used as new independent variables in place of the original independent variables x in the regression analysis.

5. Screening regression variables with PCA. The selection of regression variables has important practical significance. To make a model easy to use for structural analysis, control, and prediction, the best subset of variables can be selected from the original variables. Using principal component analysis to screen variables allows the desired number of variables to be selected with relatively little computation.
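Items 1 and 4 above can be sketched as follows. This assumes NumPy, synthetic data, and an arbitrary cut-off `tol = 0.2` for a "near-zero" coefficient (the text gives no value); the function names `negligible_variables` and `pcr_fit` are hypothetical. Both work on standardized variables, i.e. PCA of the correlation matrix.

```python
import numpy as np


def _pca_corr(X):
    """Unit eigenvectors of the correlation matrix, ordered by decreasing eigenvalue."""
    eigvals, B = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], B[:, order]


def negligible_variables(X, m, tol=0.2):
    """Item 1: indices of variables whose coefficients in all of the first m
    components are below `tol` in absolute value (tol is an assumed cut-off)."""
    _, B = _pca_corr(X)
    return [i for i in range(X.shape[1]) if np.all(np.abs(B[i, :m]) < tol)]


def pcr_fit(X, y, m):
    """Item 4: regress y on the first m principal component scores of the
    standardized X, then express the fit in terms of the original X."""
    mean, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mean) / sd
    _, B = _pca_corr(X)
    U = Z @ B[:, :m]                                  # new independent variables
    design = np.column_stack([np.ones(len(U)), U])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    beta_x = (B[:, :m] @ coef[1:]) / sd               # slopes on the original X
    intercept = coef[0] - mean @ beta_x
    return intercept, beta_x


# Synthetic data: x1 and x2 are nearly collinear, x3 is unrelated to them.
rng = np.random.default_rng(0)
z = rng.normal(size=1000)
X = np.column_stack([z + 0.1 * rng.normal(size=1000),
                     z + 0.1 * rng.normal(size=1000),
                     rng.normal(size=1000)])
y = X[:, 0] + X[:, 1] + 0.5 * X[:, 2] + 0.1 * rng.normal(size=1000)

print(negligible_variables(X, m=1))    # with this data, index 2 (x3) is expected to be flagged
intercept, beta = pcr_fit(X, y, m=2)   # drop the low-variance x1 - x2 direction
```

With m equal to the number of variables, `pcr_fit` reproduces ordinary least squares; choosing a smaller m drops the low-variance directions that make collinear regressors unstable.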

Calculation steps of principal component analysis

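1. Standardize the raw indicator data. Collect n samples of the p-dimensional random vector $X = (X_1, X_2, \dots, X_p)^T$, namely $x_i = (x_{i1}, x_{i2}, \dots, x_{ip})^T$, $i = 1, 2, \dots, n$, with $n > p$. Arrange them into a sample matrix and apply the following standardization to its elements:

$$z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \quad i = 1, \dots, n;\ j = 1, \dots, p, \qquad \bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \quad s_j^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_{ij} - \bar{x}_j\right)^2$$

This gives the standardized matrix $Z = (z_{ij})_{n \times p}$.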

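2. Compute the correlation coefficient matrix of the standardized matrix Z:

$$R = (r_{ij})_{p \times p} = \frac{Z^T Z}{n-1}, \qquad r_{ij} = \frac{1}{n-1}\sum_{k=1}^{n} z_{ki} z_{kj}, \quad i, j = 1, 2, \dots, p$$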

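3. Solve the characteristic equation of the sample correlation matrix R, $|R - \lambda I_p| = 0$, to obtain p eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p$, and determine the principal components. Choose m as the smallest value for which the information utilization rate reaches 85%:

$$\frac{\sum_{j=1}^{m} \lambda_j}{\sum_{j=1}^{p} \lambda_j} \ge 0.85$$

For each $\lambda_j$, $j = 1, 2, \dots, m$, solve $R b = \lambda_j b$ for the unit eigenvector $b_j^{o}$.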

4. Convert the standardized indicator variables into principal components:

$$U_{ij} = z_i^T b_j^{o}, \quad j = 1, 2, \dots, m$$

where $z_i$ is the i-th row of Z. $U_1$ is called the first principal component, $U_2$ the second principal component, ..., and $U_p$ the p-th principal component.

5. Comprehensive evaluation of the m principal components

The final evaluation value is obtained as a weighted sum of the m principal components, with each weight equal to that component's variance contribution rate:

$$E_i = \sum_{j=1}^{m} w_j U_{ij}, \qquad w_j = \frac{\lambda_j}{\sum_{k=1}^{p} \lambda_k}$$
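Steps 1 to 5 can be strung together in a short sketch. It assumes NumPy, uses the 85% threshold from step 3, and takes the raw contribution rates as weights in step 5. Eigenvector signs are arbitrary, so the sketch flips each one to have a non-negative coefficient sum; that orientation rule is an assumption, not part of the steps above. `pca_evaluate` and the data are hypothetical.

```python
import numpy as np


def pca_evaluate(X, threshold=0.85):
    """Steps 1-5: returns (m, contribution rates, scores U, evaluation values)."""
    n, p = X.shape
    # Step 1: z_ij = (x_ij - mean_j) / s_j, with s_j using the (n - 1) divisor.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 2: R = Z^T Z / (n - 1), the sample correlation matrix.
    R = Z.T @ Z / (n - 1)
    # Step 3: eigenvalues and unit eigenvectors of R, largest eigenvalue first.
    eigvals, B = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, B = eigvals[order], B[:, order]
    # Assumed orientation rule: make each eigenvector's coefficients sum to >= 0.
    B = B * np.where(B.sum(axis=0) < 0, -1.0, 1.0)
    contrib = eigvals / eigvals.sum()                 # variance contribution rates
    cumulative = np.cumsum(contrib)
    m = min(int(np.searchsorted(cumulative, threshold)) + 1, p)  # smallest m reaching threshold
    # Step 4: U_ij = z_i^T b_j for j = 1..m.
    U = Z @ B[:, :m]
    # Step 5: weighted sum with the contribution rates as weights.
    evaluation = U @ contrib[:m]
    return m, contrib, U, evaluation


# Hypothetical data: 40 samples, 5 indicators driven by two hidden factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 2))
X = latent @ np.array([[1.0, 0.9, 0.8, 0.0, 0.1],
                       [0.0, 0.1, 0.2, 1.0, 0.9]]) + 0.2 * rng.normal(size=(40, 5))

m, contrib, U, evaluation = pca_evaluate(X)
print(m, np.round(contrib, 3))
print(np.argsort(evaluation)[::-1])    # samples ranked from highest to lowest evaluation value
```

Some texts instead normalize the step-5 weights by the cumulative contribution of the m retained components; that rescales every evaluation value by the same positive factor and does not change the ranking.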

Applications of principal component analysis

Case 1: Application of principal component analysis in beer flavor evaluation [1]

Beer is a food whose flavor is described by many indicators. To understand beer flavor fully, breweries have developed a large number of detection methods for analyzing beer indicators, yet most companies are at a loss as to how to use the resulting mass of data. As introduced above, principal component analysis can be applied in this situation. In recent years, to gain a better understanding of beer flavor, researchers have increasingly used multivariate statistical techniques, for two main reasons: ① in the beer field, almost no problem can be characterized by a single variable (a single indicator); for example, the quality and consistency of a beer cannot be explained by the diacetyl indicator alone; ② the emergence of many statistical software packages and the spread of personal computers have promoted the use of multivariate statistical analysis. An important task of multivariate statistics in beer flavor research is to identify the relationship between beer style and beer physicochemical indicators (flavor-compound indicators are also physicochemical indicators). For example, multivariate techniques can identify the relationship between flavor-compound indicators and beer flavor, or the flavor differences between different beers.

Commonly used multivariate statistical techniques include cluster analysis, discriminant analysis, principal component analysis, and regression analysis. Principal component analysis suits products described by many indicators: it can distinguish products by their similarity, and the results can be displayed in one-, two-, or three-dimensional coordinate plots, which is particularly intuitive. It can also be used to study the relationships underlying different variables, and the principal components can then be interpreted in terms of those variables.

Given the power of principal component analysis in beer flavor quality applications, this paper briefly introduces the basic principle of principal component analysis and its application to beer consistency monitoring, an application that has attracted wide attention in China's beer industry.

1 Materials and Methods

1.1 Instruments

HP 6890 capillary gas chromatograph (Agilent, USA) with FID detector, HP 7694E headspace autosampler, and HP gas chromatography ChemStation.

1.2 Analytical methods

1.2.1 Sample Preparation

The beer is chilled to 5 ℃. 5 mL of beer is pipetted into a 20 mL headspace vial, 0.10 mL of 2.0 g/L n-butanol solution is added, and the vial is sealed with a septum and aluminum cap, shaken to mix, and analyzed by headspace gas chromatography.

1.2.2 Chromatographic conditions

Capillary column: DB-WAXETR, 30 m × 0.53 mm i.d., film thickness 1.0 μm. Column temperature: initial 35 ℃, raised at 10 ℃/min to 150 ℃, then at 20 ℃/min to 180 ℃. Injector temperature 150 ℃; detector temperature 200 ℃. Carrier gas: high-purity nitrogen at 5 mL/min; hydrogen 30 mL/min; air 400 mL/min. Split injection was used; split ratio:

  

2 Principle of principal component analysis

2.1 Why principal component analysis is needed in beer research. An example illustrates the point. Suppose there are 6 beer samples labeled A to F, each described by 3 indicators. The indicators can be analytical data, sensory data, or both. For the sake of discussion, assume the three indicators are bitterness (BU), DMS, and alcohol content. To understand how similar the six samples are and to classify them, they can be plotted in three-dimensional space, as shown in Figure 1. In this simple example the 6 samples clearly fall into two groups: A-C and D-F. The measured indicators explain this grouping; for example, the first group (A-C) has higher bitterness and lower alcohol content. This example involves only 6 samples and 3 indicators, but in practice both the number of samples and the number of indicators are large. With 20 indicators, for instance, the samples cannot be plotted in a 20-dimensional coordinate system. Principal component analysis solves this problem of comparing samples described by many indicators.

2.2 Principle of principal component analysis

The first step of principal component analysis is to standardize all indicator data. The usual method is $(x_{ij} - \bar{x}_j)/\delta_j$, where $x_{ij}$ is the value of indicator j for sample i, and $\bar{x}_j$ and $\delta_j$ are the mean and standard deviation of indicator j. After standardization every variable has mean 0 and standard deviation 1. The advantage of standardization is that it removes differences in units and in orders of magnitude between indicators.

The second step is to compute the correlation matrix of the indicators, which identifies groups of highly correlated indicators. The covariance among such a group can be represented by a new variable, called the first component. After the first component is removed, the residual correlation matrix is computed; from it, the next group of highly correlated variables is found, and their covariance is represented by the second component, which is orthogonal to the first. After the contribution of the second component is removed, the third component can be extracted, and so on until all the variance of the original data has been extracted. The result is that the original data are converted into the same number of new variables, which are mutually orthogonal.
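The "extract, remove, repeat" procedure described above resembles matrix deflation. The sketch below, assuming NumPy and an illustrative 3 × 3 correlation matrix, finds each component by power iteration and then subtracts its contribution from the residual matrix; library routines such as `numpy.linalg.eigh` obtain the same components in a single decomposition. `extract_by_deflation` is a hypothetical helper.

```python
import numpy as np


def extract_by_deflation(R, n_components, iters=1000, seed=0):
    """Extract components one at a time: find the dominant direction of the
    current residual matrix by power iteration, record it, subtract its
    contribution, and repeat on what is left."""
    rng = np.random.default_rng(seed)
    residual = R.copy()
    eigvals, vectors = [], []
    for _ in range(n_components):
        v = rng.normal(size=R.shape[0])
        for _ in range(iters):
            v = residual @ v
            v /= np.linalg.norm(v)                # keep a unit vector
        lam = v @ residual @ v                    # variance captured by this component
        eigvals.append(lam)
        vectors.append(v)
        residual = residual - lam * np.outer(v, v)   # remove this component's share
    return np.array(eigvals), np.column_stack(vectors)


# Illustrative correlation matrix of three indicators.
R = np.array([[1.0, 0.8, 0.6],
              [0.8, 1.0, 0.5],
              [0.6, 0.5, 1.0]])
vals, V = extract_by_deflation(R, n_components=3)
print(np.round(vals, 4))
print(np.round(np.sort(np.linalg.eigvalsh(R))[::-1], 4))   # same values in one step
```

The extracted vectors come out mutually orthogonal, and their variances add up to the trace of R, i.e. all of the original variance is accounted for.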

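Each sample's standardized original data are thus converted into values of a series of components; conversely, each sample's data can be expressed as a linear combination of the new components. For example, a dataset with nine indicators can be written as:

$$\begin{aligned}
X_1 &= a_{11} F_1 + a_{12} F_2 + \dots + a_{19} F_9 \\
X_2 &= a_{21} F_1 + a_{22} F_2 + \dots + a_{29} F_9 \\
&\ \ \vdots \\
X_9 &= a_{91} F_1 + a_{92} F_2 + \dots + a_{99} F_9
\end{aligned}$$

where $X_1, \dots, X_9$ are the standardized values of the raw data, $F_1, \dots, F_9$ are the new components, and $a_{ij}$ measures the degree of correlation between the original variable $X_i$ and the component $F_j$; it is generally called a factor loading.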
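The expansion above and the meaning of the loadings can be checked numerically. In this sketch (NumPy and synthetic data assumed), the components are scaled to unit variance, so the loadings are $a_{ij} = \sqrt{\lambda_j}\, b_{ij}$, where $b_{ij}$ is the i-th entry of the j-th unit eigenvector of the correlation matrix; under that scaling each $a_{ij}$ equals the correlation between $X_i$ and $F_j$.

```python
import numpy as np

rng = np.random.default_rng(3)
# Illustrative well-conditioned mixing matrix (an assumption) giving correlated data.
M = np.array([[1.0, 0.5, 0.2, 0.0],
              [0.0, 1.0, 0.4, 0.1],
              [0.0, 0.0, 1.0, 0.3],
              [0.0, 0.0, 0.0, 1.0]])
X = rng.normal(size=(100, 4)) @ M

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized values X_i
eigvals, B = np.linalg.eigh(np.corrcoef(X, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, B = eigvals[order], B[:, order]

F = Z @ B / np.sqrt(eigvals)       # component scores F_j, scaled to unit variance
A = B * np.sqrt(eigvals)           # loadings: column j of B times sqrt(lambda_j)

Z_rebuilt = F @ A.T                # X_i = a_i1 F_1 + ... + a_ip F_p for every sample
corr_ZF = Z.T @ F / (len(Z) - 1)   # correlation between each X_i and each F_j
print(np.round(A, 3))
```

Keeping all p components rebuilds the standardized data exactly; keeping only the first few gives the approximation used in practice.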

The variance contribution rates are produced by the computer's principal component program. In general, the total variance of the original data is highly concentrated in the first few components, so only a small number of principal components need to be kept, chosen according to the lowest acceptable cumulative variance contribution rate. Finally, the selected principal components are used to recompute values for the samples; these recomputed values are the principal component scores.

Because the variance of the original data matrix is usually concentrated in the first few principal components (generally two or three), the samples' standardized factor scores can be plotted in a two-dimensional coordinate plane, and the samples can then be classified by their similarity. This classification can in turn be interpreted using the factor loadings.
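A minimal sketch of the two-dimensional score plot, assuming NumPy and two hypothetical groups of samples described by three indicators (the numbers are invented, not taken from the paper). It returns the coordinates to plot; passing them to a plotting library, for example matplotlib's `scatter`, is left out. `first_two_scores` is a hypothetical helper.

```python
import numpy as np


def first_two_scores(X):
    """Standardize X and return each sample's (PC1, PC2) scores and their variance shares."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, B = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, B = eigvals[order], B[:, order]
    scores = Z @ B[:, :2]                  # one (x, y) point per sample
    return scores, eigvals[:2] / eigvals.sum()


# Hypothetical data: 20 samples per group, 3 indicators per sample.
rng = np.random.default_rng(2)
group_a = rng.normal(loc=[30.0, 40.0, 4.0], scale=0.3, size=(20, 3))
group_b = rng.normal(loc=[27.0, 43.0, 7.0], scale=0.3, size=(20, 3))
X = np.vstack([group_a, group_b])

scores, share = first_two_scores(X)
print(np.round(share, 3))                  # fraction of total variance on PC1 and PC2
```

With groups this well separated, the two groups fall on opposite sides of the PC1 axis, which is the kind of pattern the score plots in this case study are read for.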

 

3 Application of principal component analysis in beer quality consistency Assessment

3.1 Application of principal component analysis in evaluating flavor differences between beer brands

Beer is an alcoholic beverage, and its flavor is the main factor in consumers' choice of beer. Beer clearly differs from an aqueous alcohol solution of the same strength, mainly because, besides alcohol, it contains hundreds of trace components such as aldehydes, higher alcohols, and esters. For brewers it is very important to taste their own beer against competing beers: to understand how their beer differs from competing products, to analyze why competing beers are popular in the market so as to improve their own products, or to identify the style and characteristics of their own beer and pursue differentiated competition. Side-by-side tasting is a very good method, but it makes it hard to find the essential differences from competing products and hard to form qualitative and quantitative measures that can guide production. To solve this problem, brewers can analyze the flavor components of beer. In theory, the more components analyzed, the more information is obtained, but the harder an overall comparison becomes. At that point, principal component analysis can be used to extract the main composite components, which are then plotted in a plane coordinate system for comparison.

  

Figure 2 shows the first two principal components, in plane coordinates, from a principal component analysis of the flavor components of beers on the Chinese market. The flavor components analyzed were acetaldehyde, ethyl acetate, isobutyl alcohol, isoamyl acetate, isoamyl alcohol, and ethyl hexanoate, and the analysis spanned half a year. The first two principal components extracted account for 83.1% of the total information, which indicates that these two components can replace the sample information carried by the original 6 flavor components. Budweiser, Heineken, and Tsingtao are three famous brands in China's beer market, and their quality is widely recognized.

As shown in Figure 2, although the flavor-compound content of Budweiser, Heineken, and Tsingtao varies over time, each beer forms its own cluster, and the centers of the three clusters sit like the vertices of a triangle, forming a "flavor triangle". Figure 2 also shows that a southern brand of beer has its own pattern, distinct from Tsingtao, Heineken, and Budweiser; this conclusion is in fact borne out by sensory tasting. The classification produced by principal component analysis can be explained by analyzing the principal components. Figure 3 shows the factor loading plot of the first two principal components.

  

As shown in Figure 3, principal component 1 is mainly determined by ethyl acetate, isoamyl acetate, and ethyl hexanoate: the higher the ester content, the larger principal component 1, so principal component 1 represents the ester flavor of beer. Principal component 2 is mainly determined by acetaldehyde, isobutyl alcohol, and isoamyl alcohol, components that represent the "alcohol strength" of beer: the higher their content, the larger principal component 2, i.e. the heavier the beer tastes. With this interpretation, the classification in Figure 2 can be analyzed. Budweiser is a "strong-aroma" beer with moderate alcohol flavor and relatively strong ester flavor; Heineken is a "strong" beer with strong alcohol and ester flavors; Tsingtao has a heavy alcohol flavor but a weak ester flavor; and the southern brand is a "light" beer with weak alcohol and ester flavors.

3.2 Application of principal component analysis in evaluating the flavor consistency of the same brand

3.2.1 Application of principal component analysis in evaluating consistency between production plants of the same brand

Over the past decade or so, China's beer industry has grown very fast, and beer enterprises have become ever larger. Many have expanded beyond their original production site and built plants in other regions. For some companies, the new plants serve the same consumers as before. In that case, beer from a new plant must match the style of the original beer; otherwise consumers will not accept it. Figure 4 shows the principal component analysis plot of the same type of beer produced at three plants of the same company.

Figure 4 shows that the beer produced by the three plants is broadly consistent, because the same variety fluctuates within a small range at all three plants. Plant 1 has a long production history and stable production, so its fluctuation is small (the circle in the figure); plants 2 and 3 are slightly less stable because both are new and still going through a running-in period. In addition, the flavor of plant 2 is similar to that of plant 1, whereas plant 3, the newest plant, shows slightly weaker consistency with plant 1.

3.2.2 Application of principal component analysis in evaluating beer consistency at the same plant

Beer of the same type produced by the same plant shows flavor fluctuations caused by variations in water quality and raw materials at different times. Principal component analysis can likewise evaluate the consistency of a product over time. Taking a certain type of beer produced by a brewery in 2006 as an example, the use of principal component analysis to evaluate product flavor consistency is described below. To evaluate flavor consistency, a brewery must first decide which flavor indicators to measure. Headspace capillary gas chromatography can currently determine about 10 flavor compounds, namely acetaldehyde, DMS, ethyl methyl ester, ethyl acetate, isobutyl acetate, n-butanol, isobutyl alcohol, isobutyl acetate, isoamyl alcohol, and ethyl hexanoate. Earlier statistical techniques, such as statistical process control charts, can describe only the fluctuation of a single indicator, not the overall fluctuation of the product; moreover, a fluctuation in one indicator does not necessarily change the product style. Principal component analysis, by contrast, shows the fluctuation of the product as a whole and therefore reflects product variability better than a control chart.

Figure 5 shows the first two principal components, in plane coordinates, of the 10 flavor indicators of a certain type of beer produced by a brewery in 2006. The two principal components account for about 60% of the product information. In Figure 5, the small ellipse is the 95% confidence region, i.e. about 5% of the points are expected to fall outside it. By following up and analyzing the points outside the ellipse, the causes of the fluctuations can be found and avoided in future production, improving product consistency.
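The 95% ellipse can be approximated as follows, assuming NumPy and synthetic scores. The sketch treats the squared Mahalanobis distance of each point in the two-component score plane as chi-square distributed with 2 degrees of freedom, whose 95% quantile is $-2\ln(0.05) \approx 5.99$. This is a large-sample approximation, and the paper does not say which construction it used; small samples are often handled with a Hotelling T² limit based on the F distribution instead. `outside_ellipse` is a hypothetical helper.

```python
import numpy as np


def outside_ellipse(scores, confidence=0.95):
    """Flag 2-D score points outside the `confidence` ellipse (chi-square approximation)."""
    centered = scores - scores.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(centered, rowvar=False))
    # Squared Mahalanobis distance of every point from the centre.
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    limit = -2.0 * np.log(1.0 - confidence)    # chi-square quantile with 2 degrees of freedom
    return d2 > limit


# Synthetic first-two-component scores plus one deliberately planted outlier.
rng = np.random.default_rng(3)
scores = rng.normal(size=(5000, 2)) * [2.0, 0.7]
scores = np.vstack([scores, [10.0, 10.0]])

flags = outside_ellipse(scores)
print(flags.mean(), flags[-1])    # share of flagged points, and whether the planted point is flagged
```

The flagged points are the candidates to follow up on when tracing the causes of flavor fluctuations.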

  

4 Conclusion

4.1 Principal component analysis can eliminate collinearity among variables, reduce the number of variables, and simplify subsequent analysis.

4.2 Principal component analysis can distinguish products by their similarity, and the results can be displayed in one-, two-, or three-dimensional coordinate plots, which is particularly intuitive.

4.3 Principal component analysis condenses the sample data so that the overall consistency of the samples can be analyzed in a plane coordinate plot, whereas general statistical techniques can evaluate only one indicator at a time.

4.4 Combined with static headspace capillary gas chromatography for analyzing beer flavor components, principal component analysis can be used effectively to evaluate the flavor differences between beer brands and the flavor consistency and uniformity of the same beer.

 

Related entries
  • Factor Analysis
References
  1. Wei Weiping, Li Hong, Zhang Wujiu. Principal component analysis method and its evaluation in beer flavor. Brewing Technology, 2007, No. 11 (total No. 161).


References:

1. http://zh.wikipedia.org/wiki/%E4%B8%BB%E6%88%90%E5%88%86%E5%88%86%E6%9E%90

2. http://wiki.mbalib.com/wiki/%E4%B8%BB%E6%88%90%E5%88%86%E5%88%86%E6%9E%90%E6%B3%95
