Python implementations of machine learning algorithms (1): logistic regression and linear discriminant analysis (LDA)



This article is based on the programming exercises I wrote after studying Zhou Zhihua's machine learning textbook. The code used to sit inside my exercise-answer posts; I am now reorganizing it, pulling the parts that need actual implementations out on their own, and will accumulate them gradually. The aim is a series on implementing machine learning algorithms.


This article mainly includes:

1. Logistic regression

2. Python libraries used:

    • NumPy
    • Matplotlib
    • Pandas
Data set used: watermelon data set 3.0α from the machine learning textbook:
Idx Density Ratio_sugar Label
1 0.697 0.46 1
2 0.774 0.376 1
3 0.634 0.264 1
4 0.608 0.318 1
5 0.556 0.215 1
6 0.403 0.237 1
7 0.481 0.149 1
8 0.437 0.211 1
9 0.666 0.091 0
10 0.243 0.267 0
11 0.245 0.057 0
12 0.343 0.099 0
13 0.639 0.161 0
14 0.657 0.198 0
15 0.36 0.37 0
16 0.593 0.042 0
17 0.719 0.103 0
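
For convenience, the table above can be saved as the watermelon_3a.csv file that the code below reads. This is a minimal sketch I am adding here, not part of the original post; the column names (idx, density, ratio_sugar, label) are inferred from how the code indexes the DataFrame.

import pandas as pd

# rows copied from the table above: (idx, density, ratio_sugar, label)
rows = [
    (1, 0.697, 0.460, 1), (2, 0.774, 0.376, 1), (3, 0.634, 0.264, 1),
    (4, 0.608, 0.318, 1), (5, 0.556, 0.215, 1), (6, 0.403, 0.237, 1),
    (7, 0.481, 0.149, 1), (8, 0.437, 0.211, 1), (9, 0.666, 0.091, 0),
    (10, 0.243, 0.267, 0), (11, 0.245, 0.057, 0), (12, 0.343, 0.099, 0),
    (13, 0.639, 0.161, 0), (14, 0.657, 0.198, 0), (15, 0.360, 0.370, 0),
    (16, 0.593, 0.042, 0), (17, 0.719, 0.103, 0),
]
df = pd.DataFrame(rows, columns=['idx', 'density', 'ratio_sugar', 'label'])
df.to_csv('watermelon_3a.csv', index=False)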


Logistic regression: this follows the treatment in "Machine Learning in Action". Both the gradient ascent method and the stochastic gradient ascent method are implemented, with a few small changes to the program from the book.
# -*- coding: utf-8 -*-
from numpy import *
import pandas as pd
import matplotlib.pyplot as plt

# Read in the csv data and add a constant bias column
df = pd.read_csv('watermelon_3a.csv')
m, n = shape(df.values)
df['norm'] = ones((m, 1))
dataMat = array(df[['norm', 'density', 'ratio_sugar']].values[:, :])
labelMat = mat(df['label'].values[:]).transpose()

# Sigmoid function
def sigmoid(inX):
    return 1.0 / (1 + exp(-inX))

# Gradient ascent algorithm
def gradAscent(dataMat, labelMat):
    m, n = shape(dataMat)
    alpha = 0.1
    maxCycles = 500
    weights = array(ones((n, 1)))
    for k in range(maxCycles):
        a = dot(dataMat, weights)
        h = sigmoid(a)
        error = labelMat - h
        weights = weights + alpha * dot(dataMat.transpose(), error)
    return weights

# Stochastic gradient ascent
def randomGradAscent(dataMat, label, numIter=50):
    m, n = shape(dataMat)
    weights = ones(n)
    for j in range(numIter):
        dataIndex = list(range(m))
        for i in range(m):
            # step size decreases as training proceeds
            alpha = 40 / (1.0 + j + i) + 0.2
            # pick one sample at random, without replacement within a pass
            randIndex_index = int(random.uniform(0, len(dataIndex)))
            randIndex = dataIndex[randIndex_index]
            h = sigmoid(sum(dot(dataMat[randIndex], weights)))
            error = label[randIndex] - h
            weights = weights + alpha * error[0, 0] * dataMat[randIndex].transpose()
            del(dataIndex[randIndex_index])
    return weights

# Plot the samples and the decision boundary
def plotBestFit(weights):
    m = shape(dataMat)[0]
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(m):
        if labelMat[i] == 1:
            xcord1.append(dataMat[i, 1])
            ycord1.append(dataMat[i, 2])
        else:
            xcord2.append(dataMat[i, 1])
            ycord2.append(dataMat[i, 2])
    plt.figure(1)
    ax = plt.subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    x = arange(0.2, 0.8, 0.1)
    # decision boundary: w0 + w1*x + w2*y = 0
    y = array((-weights[0] - weights[1] * x) / weights[2])
    plt.sca(ax)
    plt.plot(x, y)        # randomGradAscent
    # plt.plot(x, y[0])   # gradAscent
    plt.xlabel('density')
    plt.ylabel('ratio_sugar')
    # plt.title('gradAscent logistic regression')
    plt.title('random gradAscent logistic regression')
    plt.show()

# weights = gradAscent(dataMat, labelMat)
weights = randomGradAscent(dataMat, labelMat)
plotBestFit(weights)
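
As a quick sanity check (an addition of mine, not part of the original post), the learned weights can be scored on the training set by thresholding the sigmoid output at 0.5. This sketch reuses the dataMat, labelMat, sigmoid, and weights defined above, and works for the output of either training function.

def training_accuracy(weights):
    # P(label=1) for every sample, then threshold at 0.5
    probs = sigmoid(dot(dataMat, weights))
    preds = (array(probs) > 0.5).astype(int).flatten()
    actual = array(labelMat).flatten()
    return mean(preds == actual)

print('training accuracy: %.3f' % training_accuracy(weights))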

The result obtained by the gradient ascent method is as follows:
The result obtained by the stochastic gradient ascent method is as follows:
It can be seen that the two methods give essentially similar decision boundaries; however, the stochastic gradient ascent method requires far fewer iterations.

The LDA implementation mainly follows equations 3.39 (p. 62) and 3.33 (p. 61) of the textbook. Since w can be computed directly from these closed-form expressions, the code is fairly simple.
The formulas are as follows:

S_w = \sum_{x \in X_0} (x - \mu_0)(x - \mu_0)^T + \sum_{x \in X_1} (x - \mu_1)(x - \mu_1)^T    (3.33)

w = S_w^{-1} (\mu_0 - \mu_1)    (3.39)
# -*- coding: utf-8 -*-
from numpy import *
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('watermelon_3a.csv')

# Compute w = Sw^-1 (mu0 - mu1) directly from eqs. 3.33 and 3.39
def calculate_w():
    df1 = df[df.label == 1]
    df2 = df[df.label == 0]
    X1 = df1.values[:, 1:3]
    X0 = df2.values[:, 1:3]
    mean1 = array([mean(X1[:, 0]), mean(X1[:, 1])])
    mean0 = array([mean(X0[:, 0]), mean(X0[:, 1])])
    m1 = shape(X1)[0]
    sw = zeros(shape=(2, 2))
    # within-class scatter matrix, accumulated over both classes
    for i in range(m1):
        xsmean = mat(X1[i, :] - mean1)
        sw += xsmean.transpose() * xsmean
    m0 = shape(X0)[0]
    for i in range(m0):
        xsmean = mat(X0[i, :] - mean0)
        sw += xsmean.transpose() * xsmean
    w = (mean0 - mean1) * (mat(sw).I)
    return w

# Plot the samples and the direction found by LDA
def plot(w):
    dataMat = array(df[['density', 'ratio_sugar']].values[:, :])
    labelMat = mat(df['label'].values[:]).transpose()
    m = shape(dataMat)[0]
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(m):
        if labelMat[i] == 1:
            xcord1.append(dataMat[i, 0])
            ycord1.append(dataMat[i, 1])
        else:
            xcord2.append(dataMat[i, 0])
            ycord2.append(dataMat[i, 1])
    plt.figure(1)
    ax = plt.subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    x = arange(-0.2, 0.8, 0.1)
    y = array((-w[0, 0] * x) / w[0, 1])
    plt.sca(ax)
    plt.plot(x, y)
    plt.xlabel('density')
    plt.ylabel('ratio_sugar')
    plt.title('LDA')
    plt.show()

w = calculate_w()
plot(w)
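
Purely as an illustration (my addition, not from the original post), the samples can also be projected onto w to view the one-dimensional separation LDA optimizes. This assumes df, w, and the numpy star-import from the code above are in scope.

# project each sample onto the LDA direction w
X = df[['density', 'ratio_sugar']].values
proj = X.dot(array(w).flatten())
for lbl in (1, 0):
    print('label %d projections:' % lbl, around(proj[df['label'].values == lbl], 3))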

The results are as follows:

The corresponding w value is:

[ -6.62487509e-04, -9.36728168e-01]

Because of how this data set happens to be distributed, the effect of LDA is not obvious. So I changed the values of several label=0 samples and re-ran the program; the result is as follows:


The separation is now obvious, and the corresponding w value is:

[-0.60311161,-0.67601433]



