I wrote this article after studying Zhou Zhihua's machine learning textbook; it covers the programming problems from the end-of-chapter exercises. The code was previously scattered across my answer posts; I am now reorganizing it, pulling the implementation parts into separate articles, and accumulating them gradually. The goal is to write a series on implementing machine learning algorithms in Python.
This article mainly covers:
1. Logistic regression
2. Linear discriminant analysis (LDA)
Python libraries used: NumPy, pandas, matplotlib
Dataset used: the watermelon dataset 3.0α from the machine learning textbook:
Idx | Density | Ratio_sugar | Label
1   | 0.697   | 0.46        | 1
2   | 0.774   | 0.376       | 1
3   | 0.634   | 0.264       | 1
4   | 0.608   | 0.318       | 1
5   | 0.556   | 0.215       | 1
6   | 0.403   | 0.237       | 1
7   | 0.481   | 0.149       | 1
8   | 0.437   | 0.211       | 1
9   | 0.666   | 0.091       | 0
10  | 0.243   | 0.0267      | 0
11  | 0.245   | 0.057       | 0
12  | 0.343   | 0.099       | 0
13  | 0.639   | 0.161       | 0
14  | 0.657   | 0.198       | 0
15  | 0.36    | 0.37        | 0
16  | 0.593   | 0.042       | 0
17  | 0.719   | 0.103       | 0
Logistic regression: this follows the treatment in "Machine Learning in Action". Both gradient ascent and stochastic gradient ascent are implemented, with a few small changes to the code from the book.
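For context (the original post does not spell this out), the update implemented below is standard gradient ascent on the logistic log-likelihood. With design matrix X (one row per sample, including the constant column), label vector y, sigmoid \sigma and learning rate \alpha:

\ell(w) = \sum_i \left[ y_i\, w^{T} x_i - \ln\left(1 + e^{w^{T} x_i}\right) \right]

\nabla_w \ell = X^{T}\left(y - \sigma(Xw)\right)

w \leftarrow w + \alpha\, X^{T}\left(y - \sigma(Xw)\right)

The stochastic variant applies the same update using one randomly chosen sample at a time.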
from numpy import *
import pandas as pd
import matplotlib.pyplot as plt

# read the CSV data and build the design matrix (with a constant column) and the label vector
df = pd.read_csv('watermelon_3a.csv')
m, n = shape(df.values)
df['norm'] = ones(m)
datamat = array(df[['norm', 'density', 'ratio_sugar']].values[:, :])
labelmat = mat(df['label'].values[:]).transpose()

# sigmoid function
def sigmoid(inX):
    return 1.0 / (1 + exp(-inX))

# batch gradient ascent
def gradAscent(dataMat, labelMat):
    m, n = shape(dataMat)
    alpha = 0.1
    maxCycles = 500
    weights = array(ones((n, 1)))
    for k in range(maxCycles):
        h = sigmoid(dot(dataMat, weights))
        error = labelMat - h
        weights = weights + alpha * dot(dataMat.transpose(), error)
    return weights

# stochastic gradient ascent
def randomGradAscent(dataMat, label, numIter=50):
    m, n = shape(dataMat)
    weights = ones(n)
    for j in range(numIter):
        dataIndex = list(range(m))
        for i in range(m):
            alpha = 40 / (1.0 + j + i) + 0.2                  # step size shrinks as training proceeds
            randIndex_index = int(random.uniform(0, len(dataIndex)))
            randIndex = dataIndex[randIndex_index]
            h = sigmoid(sum(dot(dataMat[randIndex], weights)))
            error = label[randIndex] - h
            weights = weights + alpha * error[0, 0] * dataMat[randIndex].transpose()
            del(dataIndex[randIndex_index])                   # visit each sample once per pass
    return weights

# scatter the two classes and draw the fitted decision boundary
def plotBestFit(weights):
    m = shape(datamat)[0]
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(m):
        if labelmat[i] == 1:
            xcord1.append(datamat[i, 1]); ycord1.append(datamat[i, 2])
        else:
            xcord2.append(datamat[i, 1]); ycord2.append(datamat[i, 2])
    plt.figure(1)
    ax = plt.subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    x = arange(0.2, 0.8, 0.1)
    y = array((-weights[0] - weights[1] * x) / weights[2])
    plt.sca(ax)
    plt.plot(x, y)        # randomGradAscent (weights is a flat array)
    # plt.plot(x, y[0])   # gradAscent (weights is a column vector)
    plt.xlabel('density')
    plt.ylabel('ratio_sugar')
    # plt.title('gradAscent logistic regression')
    plt.title('random gradAscent logistic regression')
    plt.show()

# weights = gradAscent(datamat, labelmat)
weights = randomGradAscent(datamat, labelmat)
plotBestFit(weights)
The result obtained by gradient ascent is as follows:
The result obtained by stochastic gradient ascent is as follows:
It can be seen that the two methods give essentially similar results. However, stochastic gradient ascent needs far fewer passes over the data (50 here versus 500 full-batch iterations).
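To reproduce this comparison yourself (this snippet is not part of the original post; it assumes the script above has already been run, so datamat, labelmat and both functions are defined):

# compare the two optimizers defined in the script above
w_batch = gradAscent(datamat, labelmat)        # 500 full-batch passes
w_sgd = randomGradAscent(datamat, labelmat)    # 50 passes of single-sample updates
print(array(w_batch).ravel())                  # [w0, w1, w2] from batch gradient ascent
print(w_sgd)                                   # [w0, w1, w2] from stochastic gradient ascent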
The LDA implementation mainly relies on two formulas from the book: equation 3.39 on p. 62 and equation 3.33 on p. 61. Since these can be computed directly, the code is fairly simple.
The formulas are as follows:
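(The formula images did not survive the transfer; what follows is the standard two-class LDA form that these equation numbers refer to, as I understand them from the book.) With class means \mu_0 and \mu_1:

S_w = \sum_{x \in X_0} (x - \mu_0)(x - \mu_0)^{T} + \sum_{x \in X_1} (x - \mu_1)(x - \mu_1)^{T}    (3.33)

w = S_w^{-1} (\mu_0 - \mu_1)    (3.39)

The code below actually computes w as the row vector (\mu_0 - \mu_1)^{T} S_w^{-1}, which points in the same direction since S_w is symmetric.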
from numpy import *
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('watermelon_3a.csv')

# compute the LDA direction w = (mean0 - mean1) * Sw^-1
def calulate_w():
    df1 = df[df.label == 1]
    df2 = df[df.label == 0]
    X1 = df1.values[:, 1:3]
    X0 = df2.values[:, 1:3]
    mean1 = array([mean(X1[:, 0]), mean(X1[:, 1])])
    mean0 = array([mean(X0[:, 0]), mean(X0[:, 1])])
    m1 = shape(X1)[0]
    sw = zeros(shape=(2, 2))            # within-class scatter matrix
    for i in range(m1):
        xsmean = mat(X1[i, :] - mean1)
        sw += xsmean.transpose() * xsmean
    m0 = shape(X0)[0]
    for i in range(m0):
        xsmean = mat(X0[i, :] - mean0)
        sw += xsmean.transpose() * xsmean
    w = (mean0 - mean1) * (mat(sw).I)   # row vector along the discriminant direction
    return w

# scatter the two classes and draw the line w . x = 0
def plot(w):
    dataMat = array(df[['density', 'ratio_sugar']].values[:, :])
    labelMat = mat(df['label'].values[:]).transpose()
    m = shape(dataMat)[0]
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(m):
        if labelMat[i] == 1:
            xcord1.append(dataMat[i, 0]); ycord1.append(dataMat[i, 1])
        else:
            xcord2.append(dataMat[i, 0]); ycord2.append(dataMat[i, 1])
    plt.figure(1)
    ax = plt.subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    x = arange(-0.2, 0.8, 0.1)
    y = array((-w[0, 0] * x) / w[0, 1])
    plt.sca(ax)
    plt.plot(x, y)
    plt.xlabel('density')
    plt.ylabel('ratio_sugar')
    plt.title('LDA')
    plt.show()

w = calulate_w()
plot(w)
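As an optional sanity check (not part of the original post; it assumes scikit-learn is installed), fitting scikit-learn's LinearDiscriminantAnalysis on the same two features should give a coefficient vector along essentially the same direction, up to sign and scale (the sign flips because the code above uses mean0 - mean1):

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
import pandas as pd

df = pd.read_csv('watermelon_3a.csv')
X = df[['density', 'ratio_sugar']].values
y = df['label'].values
lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.coef_)   # direction should match w from calulate_w() up to sign and scale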
The results are as follows:
The corresponding W value is:
[ -6.62487509e-04, -9.36728168e-01]
Because of the way the data are distributed, the LDA separation is not very pronounced here. So I modified a few of the label=0 samples and reran the program, getting the result below:
The separation is now obvious; the corresponding w value is:
[-0.60311161,-0.67601433]
Python implementations of machine learning algorithms (1): logistic regression and linear discriminant analysis (LDA)