A very common problem is that real-world data is high-dimensional, and too many dimensions make a model extremely complex. The usual compromise is to reduce the dimensionality first, and then cluster, classify, or regress. Dimensionality reduction aims to reduce the number of dimensions (selecting the best features) without losing accuracy.
PCA is the most common dimensionality reduction algorithm; it looks for a linearly uncorrelated subset of features (the principal components). Related methods include LDA (linear discriminant analysis) and MDS (multidimensional scaling). The references at the end of this post are highly recommended; they give a good expert comparison of these methods.
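As a side note, the core idea of PCA can be sketched in plain NumPy (this sketch is mine, not part of the mlpy API used below): center the data, take its SVD, and project onto the top-k right singular vectors. The resulting component scores are linearly uncorrelated, which is exactly the "linearly uncorrelated feature subset" mentioned above.

```python
import numpy as np

def pca_project(x, k=2):
    """Project n x d data onto its top-k principal components via SVD."""
    x_centered = x - x.mean(axis=0)            # center each feature
    u, s, vt = np.linalg.svd(x_centered, full_matrices=False)
    return x_centered @ vt[:k].T               # scores in the top-k component space

# Illustrative random data (hypothetical, just to show the shapes)
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 5))
z = pca_project(x, k=2)
print(z.shape)  # (100, 2)
```

The two score columns of `z` are orthogonal by construction, which is what "linearly unrelated" means here.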
Below, the mlpy module's PCA and LibSvm classes are used to reduce the wine data and classify it. Note: the call mlpy.LibSvm.learn(z, y) raises an error, and I do not yet know how to fix it; this post is only shared as a record. If anyone knows the cause, please let me know.
# -*- coding: utf-8 -*-
"""
Created on Fri Oct 09:54:54 2018

@author: Luove
"""

import numpy as np
import matplotlib.pyplot as plt
import mlpy
from matplotlib import cm

filepath = r'D:\Analyze\Python matlab\python\datalib Py\wine.data'

def getdata():
    list1 = [line.strip().split(',') for line in open(filepath, 'r').readlines()]
    # first field is the label, the remaining 13 fields are the attributes
    return [[float(v) for v in row[1:14]] for row in list1], [row[0] for row in list1]

matrix, labels = getdata()

x1 = []; y1 = []
x2 = []; y2 = []
x3 = []; y3 = []
x = 0; y = 1  # column indices of the alcohol and malic acid attributes
for n, elem in enumerate(matrix):
    if int(labels[n]) == 1:          # str converted to int
        x1.append(matrix[n][x])      # alcohol attribute values for this class
        y1.append(matrix[n][y])
    elif int(labels[n]) == 2:
        x2.append(matrix[n][x])
        y2.append(matrix[n][y])
    elif int(labels[n]) == 3:
        x3.append(matrix[n][x])
        y3.append(matrix[n][y])

plt.scatter(x1, y1, s=50, c='green', label='Class 1')    # s controls point size
plt.scatter(x2, y2, s=100, c='red', label='Class 2')
plt.scatter(x3, y3, s=200, c='darkred', label='Class 3')
plt.title('Wine features', fontsize=14)
plt.xlabel('x axis')
plt.ylabel('y axis')
plt.legend()
plt.grid(True, linestyle='--', color='0.0')  # color='0.5' is a grayscale value in [0, 1]: higher is lighter, lower is darker
plt.show()

# Dimensionality reduction: PCA (principal component analysis); MDS (multidimensional scaling)
wine = np.loadtxt(filepath, delimiter=',')   # first column is the label, the rest are attribute columns
x, y = wine[:, 1:6], wine[:, 0].astype(np.int)
x.shape
y.shape

pca = mlpy.PCA()             # instantiate
pca.learn(x)                 # feed in the data
z = pca.transform(x, k=2)    # reduce to 2 dimensions
z.shape
print(cm.cmap_d.keys())
plt.scatter(z[:, 0], z[:, 1], c=y, s=50, cmap=cm.Reds)
plt.xlabel('First component')
plt.ylabel('Second component')
plt.show()

svm = mlpy.LibSvm(kernel_type='linear', gamma=10)
svm.learn(z, y)   # train on the 2-D scores z, since the grid below is 2-D; this is the call that errors
xmin, xmax = z[:, 0].min() - 0.1, z[:, 0].max() + 0.1
ymin, ymax = z[:, 1].min() - 0.1, z[:, 1].max() + 0.1
xx, yy = np.meshgrid(np.arange(xmin, xmax, 0.01), np.arange(ymin, ymax, 0.01))
grid = np.c_[xx.ravel(), yy.ravel()]   # np.c_ is indexed with [], not called
result = svm.pred(grid)
plt.pcolormesh(xx, yy, result.reshape(xx.shape), cmap=cm.Greys_r)
plt.scatter(z[:, 0], z[:, 1], c=y, s=50, cmap=cm.Reds)
plt.xlabel('First component')
plt.ylabel('Second component')
plt.xlim(xmin, xmax)
plt.ylim(ymin, ymax)
plt.show()
P.S.: when running this example, execution errors at the svm.learn call. If anyone knows how to solve it, please let me know, thanks ~
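If mlpy's LibSvm keeps erroring, the same pipeline can be sketched with scikit-learn instead (an assumption on my part; the original post uses mlpy, and scikit-learn's `SVC` is a different implementation of the same LIBSVM backend). The key point either way: the classifier must be trained on the 2-D PCA scores `z`, not on the original `x`, because the meshgrid it later predicts on has only 2 features. This uses sklearn's bundled copy of the same UCI wine dataset, so no local file path is needed.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.svm import SVC

wine = load_wine()                          # sklearn's bundled copy of the UCI wine data
x, y = wine.data, wine.target

z = PCA(n_components=2).fit_transform(x)    # 13 features -> 2 principal components

# Train on the 2-D scores z, not on x: the meshgrid below is 2-D,
# so the classifier must expect exactly 2 features.
svm = SVC(kernel='linear').fit(z, y)

xmin, xmax = z[:, 0].min() - 0.1, z[:, 0].max() + 0.1
ymin, ymax = z[:, 1].min() - 0.1, z[:, 1].max() + 0.1
xx, yy = np.meshgrid(np.linspace(xmin, xmax, 100), np.linspace(ymin, ymax, 100))
grid = np.c_[xx.ravel(), yy.ravel()]        # note: np.c_ is indexed with [], not called
result = svm.predict(grid)                  # one class label per grid point
```

`result.reshape(xx.shape)` can then be passed to `plt.pcolormesh` exactly as in the mlpy version to draw the decision regions.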
REF:
Machine learning study notes of an "artificial idiot" (1): LDA dimensionality reduction
Multidimensional scaling (MDS) explained in detail
"Machine Learning" support vector machines: SVM principle and derivation
"Practical Data Analysis": the text and the mlpy data files are available at https://github.com/Luove/Data
Further exploration of mlpy: dimensionality reduction, classification, visualization