Follow me to learn algorithms: PCA (dimensionality reduction)


PCA is a "black box" style of dimensionality reduction. It maps the data onto new directions, choosing each direction so that the projected points are spread out as much as possible, i.e., so that the variance after projection is as large as possible. Each subsequent projection direction is chosen orthogonal to the current one.
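To make the variance-maximization idea concrete, here is a minimal sketch on synthetic 2-D data (the synthetic data and the use of np.linalg.eigh are illustrative assumptions, not part of the original article): the variance of the projection onto the top eigenvector of the covariance matrix is typically much larger than along an arbitrary direction.

import numpy as np

rng = np.random.default_rng(0)
# synthetic 2-D data with most of its variance along the x-axis
X = rng.normal(size=(500, 2)) * np.array([3.0, 1.0])
X = X - X.mean(axis=0)                    # center the data

cov = X.T @ X / (X.shape[0] - 1)          # covariance matrix
vals, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
top = vecs[:, -1]                         # eigenvector of the largest eigenvalue

rand_dir = rng.normal(size=2)
rand_dir /= np.linalg.norm(rand_dir)      # arbitrary unit direction

print('variance along the top eigenvector:', np.var(X @ top))
print('variance along a random direction: ', np.var(X @ rand_dir))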

Steps of PCA:

Step 1: center the current data (subtract the mean of each feature), then compute the covariance matrix: covariance matrix = Xᵀ · X / (m − 1), where X is the centered data matrix and m is the number of samples. The diagonal entries are the variances of the individual features; the off-diagonal entries are the covariances between features.

Step 2: diagonalize the covariance matrix, so that every off-diagonal covariance becomes 0 and only the diagonal entries remain. This diagonalization yields the eigenvalues and eigenvectors.

Step 3: multiplying the current data by the selected eigenvectors performs the dimensionality reduction. The ratio of an eigenvalue to the sum of all eigenvalues expresses the importance of the corresponding eigenvector, i.e., the proportion of variance it explains. The three steps are put together in the sketch below.
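As a summary, here is a minimal NumPy sketch of all three steps (the helper name pca_reduce, the random demo data, and the choice of np.linalg.eigh are illustrative assumptions, not part of the original program):

import numpy as np

def pca_reduce(X, k):
    # step 1: remove the mean of each feature
    X_centered = X - X.mean(axis=0)
    # covariance matrix of the centered data
    cov = X_centered.T @ X_centered / (X.shape[0] - 1)
    # step 2: eigendecomposition (eigh suits symmetric matrices)
    vals, vecs = np.linalg.eigh(cov)
    # sort eigenvalues and eigenvectors, largest eigenvalue first
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # importance of each eigenvector = eigenvalue / sum of eigenvalues
    importance = vals / vals.sum()
    # step 3: project onto the top k eigenvectors
    return X_centered @ vecs[:, :k], importance

X_demo = np.random.default_rng(1).normal(size=(100, 4))
X_reduced, importance = pca_reduce(X_demo, 2)
print(X_reduced.shape, importance)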

The following walks through the program step by step.

Step 1: import the data, remove the mean (standardize), and compute the covariance matrix.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# header=None keeps the first sample from being consumed as a header row
df = pd.read_csv('Iris.data', header=None)
print(df.head())
df.columns = ['Sepal_len', 'Sepal_wid', 'Petal_len', 'Petal_wid', 'class']
print(df.head())

# change the letters to numbers: map the class labels to 0, 1, 2
msg = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
df['class'] = df['class'].map(msg)

# X stores the features, y stores the labels
X = df.iloc[:, 0:4].values
y = df.iloc[:, 4].values

# standardize the features
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaler = scaler.fit_transform(X)

# mean of each column (close to 0 after standardization)
mean_vec = np.mean(X_scaler, axis=0)

# covariance matrix of the centered data
cov_mat = (X_scaler - mean_vec).T.dot(X_scaler - mean_vec) / (X_scaler.shape[0] - 1)
print(cov_mat)

# using np.cov gives the same result
cov_mat = np.cov(X_scaler.T)
print(cov_mat)
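If the Iris.data file is not available locally, the same arrays can be obtained from scikit-learn's bundled copy of the dataset (a sketch, assuming scikit-learn is installed; this loader is not used in the original program):

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = iris.data            # 150 x 4 feature matrix
y = iris.target          # class labels already encoded as 0, 1, 2

X_scaler = StandardScaler().fit_transform(X)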

Step 2: diagonalizing the covariance matrix is the process of finding its eigenvalues and eigenvectors.

# find the eigenvalues and eigenvectors
eig_vals, eig_vecs = np.linalg.eig(cov_mat)
print(eig_vals, eig_vecs)

# pair each eigenvalue with its eigenvector
eig_pairs = [(np.abs(eig_vals[i]), eig_vecs[:, i]) for i in range(len(eig_vals))]

# sort the pairs by eigenvalue, largest first
eig_pairs.sort(key=lambda x: x[0], reverse=True)

# percentage of variance explained by each component
tot = sum(eig_vals)
var_exp = [(i / tot) * 100 for i in sorted(eig_vals, reverse=True)]

# cumsum gives the running total of the explained variance
cum_var_exp = np.cumsum(var_exp)

# plot the individual and cumulative explained variance
plt.figure(figsize=(6, 4))
plt.bar(range(4), var_exp, alpha=0.5, align='center', label='individual explained variance')
plt.step(range(4), cum_var_exp, where='mid', label='cumulative explained variance')
plt.ylabel('explained variance ratio')
plt.xlabel('principal components')
plt.legend(loc='best')
plt.tight_layout()
plt.show()
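Since a covariance matrix is symmetric, np.linalg.eigh is a common alternative to np.linalg.eig (a sketch, not part of the original code): it guarantees real eigenvalues and returns them in ascending order, so a single reversal replaces the manual sort.

# eigh is tailored to symmetric matrices such as cov_mat: it returns real
# eigenvalues in ascending order, so one reversal replaces the manual sort
eig_vals, eig_vecs = np.linalg.eigh(cov_mat)
eig_vals = eig_vals[::-1]
eig_vecs = eig_vecs[:, ::-1]
var_exp = eig_vals / eig_vals.sum() * 100
print(var_exp)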

Step 3: project the (centered) data onto the top eigenvectors.

# reduce the 4-D data to 2-D: take the eigenvectors of the two largest
# eigenvalues and stack them as columns.
# np.hstack merges the two vectors; reshape(4, 1) turns each row vector
# into a column, equivalent to a transpose
matrix_w = np.hstack((eig_pairs[0][1].reshape(4, 1), eig_pairs[1][1].reshape(4, 1)))

# project the data: (150 x 4) dot (4 x 2) = (150 x 2)
become_x_scaler = X_scaler.dot(matrix_w)
print(become_x_scaler)

plt.figure(figsize=(6, 4))
color = np.array(['red', 'green', 'blue'])
# scatter plot of the projected data, one color per class
plt.scatter(become_x_scaler[:, 0], become_x_scaler[:, 1], c=color[df['class']])
plt.show()
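As a sanity check, sklearn.decomposition.PCA should produce the same projection up to the sign of each component, a well-known ambiguity of eigenvectors (a hedged sketch; this comparison is not part of the original program):

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaler)
# proportion of variance kept by the first two components
print(pca.explained_variance_ratio_)
# the projections should agree up to the sign of each column
print(np.allclose(np.abs(X_pca), np.abs(become_x_scaler)))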
