Author: shizhixin
Blog: http://blog.csdn.net/shizhixin
Weibo: http://weibo.com/zhixinshi
Email: zstarstone@163.com
Date: 2016-04-14
Note: This notebook summarizes material from online sources and books; my thanks go to all the authors, especially those cited in the references. This note is for learning and exchange only.

1 Overview
Real training data always presents a variety of problems:

1. For example, a car sample may contain a "kilometers/hour" maximum-speed feature as well as a "miles/hour" maximum-speed feature; these two features are obviously redundant.

2. Consider the final-exam results of a mathematics department's undergraduates, with three columns: interest in mathematics, review time, and exam score. We know that learning mathematics well requires strong interest, so the second column is correlated with the first, and the third is strongly correlated with the second. Could the first and second columns be combined into one?

3. Consider a sample with very many features but very few examples, so that fitting a regression directly is difficult and prone to overfitting. Take Beijing housing prices: suppose the features of a house are (size, location, orientation, school-district status, construction age, second-hand or not, floor, total floors), yet the sample contains fewer than 10 houses. Fitting house features -> house price with so many features will overfit.

4. Similar to the second case: suppose the document-term matrix we build in IR contains two terms, "learn" and "study", which are treated as independent in the traditional vector space model. Semantically, however, the two are similar and their frequencies are close, so they could be merged into a single feature.

5. During signal transmission, because the channel is not ideal, the signal received at the other end carries noise; how can that noise be filtered out?

In many of these cases the features are correlated with the class label but contain noise or redundancy. A feature dimensionality-reduction method is then needed to reduce the number of features, remove noise and redundancy, and lower the risk of overfitting.
A method called Principal Component Analysis (PCA) is discussed below to address the problems above. The idea of PCA is to map n-dimensional features onto d dimensions (d < n), which form a new set of orthogonal features. These d-dimensional features, called principal components, are reconstructed d-dimensional features rather than simply the n-dimensional features with the remaining n-d dimensions dropped.
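To make the idea concrete, here is a minimal PCA sketch in Python with NumPy. This is my own illustration rather than code from the referenced material; the function name `pca`, the target dimension `d`, and the sample matrix `X` are made up for the example.

```python
import numpy as np

def pca(X, d):
    """Project n-dimensional samples (rows of X) onto d principal components."""
    # Center the data: subtract each feature's mean.
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features (n x n).
    cov = np.cov(X_centered, rowvar=False)
    # Eigen-decomposition; eigh suits symmetric matrices and
    # returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the d eigenvectors with the largest eigenvalues:
    # an orthogonal n x d basis (the principal components).
    top = np.argsort(eigvals)[::-1][:d]
    components = eigvecs[:, top]
    # Project the centered data onto the new basis.
    return X_centered @ components

# Made-up example: 5 samples with 3 correlated features, reduced to 2.
X = np.array([[2.5, 2.4, 1.0],
              [0.5, 0.7, 0.2],
              [2.2, 2.9, 1.1],
              [1.9, 2.2, 0.9],
              [3.1, 3.0, 1.3]])
print(pca(X, 2))
```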
2 Inner product and projection

Guided reading: Most people know that PCA projects the data points onto the new unit vector along which the variance is largest, but what is the projection of a vector, and what does it have to do with the inner product?
The inner product of two vectors is defined as:
(a_1,a_2,\cdots,a_n)^\mathsf{T}\cdot (b_1,b_2,\cdots,b_n)^\mathsf{T}=a_1b_1+a_2b_2+\cdots+a_nb_n
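As a quick numeric check of this definition (an example of my own, not from the original text), the component-wise sum agrees with NumPy's `dot`:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Component-wise definition: 1*4 + 2*5 + 3*6 = 32
print(np.dot(a, b))  # 32
```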
The inner product of two vectors is a real number, so it is easy to compute from an algebraic point of view; but how should the inner product be understood geometrically?

For simplicity, consider the inner product of two-dimensional vectors. Suppose there are two vectors a=(x_1,y_1)^\mathsf{T} and b=(x_2,y_2)^\mathsf{T}. Drop a perpendicular from point A to the vector OB; its foot C on OB is the projection of A onto B. If the angle between A and B is \alpha, then the projection length of A onto B is |A|\cos(\alpha).
Figure 1: Inner product and projection
There is another way to express the inner product of two vectors:
a\cdot b=|a||b|\cos(\alpha)
That is, the inner product of a and b equals the projection length of a onto b multiplied by the modulus of b. Further, if we assume b is a unit vector, i.e. its modulus |b|=1, then this becomes:
a\cdot b=|a|\cos(\alpha)
In other words, if vector b is a unit vector (modulus 1), then the inner product of a and the unit vector b equals the projection length of a onto b, i.e. the length of OC in Figure 1.
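A small sketch (with made-up vectors, not taken from the text) confirming that the inner product with a unit vector equals the projection length |a|\cos(\alpha):

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([2.0, 0.0])
b_unit = b / np.linalg.norm(b)   # normalize b so that |b| = 1

# Projection length of a onto b via the inner product with the unit vector.
proj_inner = np.dot(a, b_unit)

# The same length computed as |a| * cos(alpha).
cos_alpha = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
proj_cos = np.linalg.norm(a) * cos_alpha

print(proj_inner, proj_cos)      # both 3.0
```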
3 Basis and coordinates in a basis

As shown in Figure 1, vector B can be represented by its point coordinates as