First, an introduction to the Perceptron
The Perceptron is an artificial neural network invented by Frank Rosenblatt in 1957 at the Cornell Aeronautical Laboratory. It can be regarded as the simplest form of feedforward neural network, and it is a binary linear classifier. Rosenblatt also gave the corresponding perceptron learning algorithms; the methods commonly used to train a perceptron are the least squares method and gradient descent. For example, the perceptron can use gradient descent to minimize a loss function, find a separating hyperplane that divides the training data linearly, and thereby obtain the perceptron model. The perceptron is a simple abstraction of the biological nerve cell, whose structure can be broadly divided into dendrites, synapses, the cell body, and the axon. A single nerve cell can be seen as a machine with only two states: 'yes' when excited and 'no' when not. The state of the cell depends on the amount of signal received from other nerve cells and on the strength (inhibitory or excitatory) of the synapses. When the sum of the signals exceeds a certain threshold, the cell body becomes excited and generates an electrical pulse, which travels along the axon and passes through synapses to other neurons. To simulate this behavior, the perceptron introduces the corresponding concepts of weights (synapses), a bias (threshold), and an activation function (cell body).
In the field of artificial neural networks, the Perceptron is also referred to as a single-layer artificial neural network, to distinguish it from the more complex multilayer perceptron. As a linear classifier, the (single-layer) Perceptron is the simplest form of feedforward artificial neural network. Despite its simple structure, the perceptron can learn and solve quite complex problems. Its main intrinsic flaw is that it cannot handle linearly inseparable problems.
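To make the weight/bias/activation analogy concrete, here is a minimal sketch of a single perceptron unit in Python (the function name and the numbers are illustrative, not part of the original article): the weights play the role of synapses, the bias the firing threshold, and the step activation the cell body's all-or-nothing response.
import numpy as np

def perceptron_unit(x, w, b):
    # one artificial neuron: fire (1) if the weighted input sum reaches the threshold, else 0
    return 1 if np.dot(w, x) + b >= 0 else 0

x = np.array([0.5, -0.2])   # input signals from other neurons
w = np.array([2.0, 1.0])    # synapse strengths (weights)
b = -0.1                    # bias, i.e. the negative of the firing threshold
print(perceptron_unit(x, w, b))  # 2.0*0.5 + 1.0*(-0.2) - 0.1 = 0.7 >= 0, so the unit fires: 1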
Second, the principle of the Perceptron
The principle of the perceptron algorithm is roughly the same as that of the linear regression algorithm, except that the prediction function h and the weight update rule are different, and the perceptron algorithm is applied to binary classification.
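In symbols: linear regression predicts h(x) = w·x directly, while the perceptron thresholds it, h(x) = sign(w·x), and moves the weights only on misclassified samples. A minimal sketch of the classic prediction function and update rule follows (note that the article's own code below uses the logistic sigmoid expit as a smooth substitute for sign):
import numpy as np

def h(x, w):
    # perceptron hypothesis: a thresholded output, unlike linear regression's raw inner product
    return 1 if np.dot(w, x) >= 0 else 0

def perceptron_step(w, x, y, learning_rate=0.1):
    # classic perceptron rule: the weights change only when the prediction h(x, w) is wrong
    return w + learning_rate * (y - h(x, w)) * x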
Introduction to the dataset
The breast cancer dataset contains 569 instances. Each instance includes a diagnostic class and 30 attributes that help with prediction, such as radius (the mean distance from the center to points on the perimeter) and texture (the standard deviation of gray-scale values). The classes are WDBC-Malignant and WDBC-Benign. 70% of the dataset is used as the training set and 30% as the test set; both the training set and the test set include the features and the diagnostic class.
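For readers without the Breast_cancer_data.csv file used below, the same Wisconsin breast cancer data also ships with scikit-learn; here is a minimal sketch of an equivalent 70/30 split (an alternative to the loader implemented below, not the article's own code):
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()  # 569 instances, 30 numeric features, binary target
train_x, test_x, train_y, test_y = train_test_split(
    data.data, data.target, test_size=0.30, random_state=0)
print(train_x.shape, test_x.shape)  # (398, 30) (171, 30)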
Third, code implementation and result analysis of the perceptron algorithm
Code implementation:
import pandas as pd  # read the data with pandas
import matplotlib.pyplot as plt
import numpy as np
from scipy.special import expit  # the logistic sigmoid, used here as a smooth substitute for sign()
def loaddataset():
    df = pd.read_csv("Breast_cancer_data.csv")
    dataarray = np.array(df)
    testratio = 0.3
    datasize = dataarray.shape[0]
    testnum = int(testratio * datasize)
    trainnum = datasize - testnum
    # column 1 holds the diagnosis label; columns 2 onward hold the 30 numeric features
    # (values such as 31.48 are read as str and must be converted to float)
    train_x = np.array(dataarray[0:trainnum, 2:], dtype=float)
    test_x = np.array(dataarray[trainnum:, 2:], dtype=float)
    train_y = dataarray[0:trainnum, 1]
    test_y = dataarray[trainnum:, 1]
    # encode the labels: benign 'B' -> 1, malignant 'M' -> 0
    for i in range(trainnum):
        if train_y[i] == 'B':
            train_y[i] = 1
        else:
            train_y[i] = 0
    for i in range(testnum):
        if test_y[i] == 'B':
            test_y[i] = 1
        else:
            test_y[i] = 0
    return train_x, test_x, train_y.astype(float), test_y.astype(float)
# def sign(inner_product):
#     return 1 if inner_product >= 0 else 0
# learn the model: fit the parameter vector theta
def train_model(train_x, train_y, theta, learning_rate, iteration):
    m = train_x.shape[0]
    n = train_x.shape[1]
    j_theta = np.zeros((iteration, 1))  # column vector recording the loss at every iteration
    train_x = np.insert(train_x, 0, values=1, axis=1)  # prepend a column of ones, acting as x0 for the bias
    for i in range(iteration):  # iterate
        # dot is the inner product; sum collapses the column of per-sample losses to a single number
        j_theta[i] = sum((train_y[:, np.newaxis] - expit(np.dot(train_x, theta))) ** 2) / 2.0
        for j in range(n + 1):  # update theta for every column, including the bias column x0
            theta[j] = theta[j] + learning_rate * np.dot(
                (train_x[:, j].T)[np.newaxis],
                train_y[:, np.newaxis] - expit(np.dot(train_x, theta)))  # .T is the transpose
    x_iteration = np.linspace(0, iteration, num=iteration)
    plt.plot(x_iteration, j_theta)  # plot the loss J against the iteration number
    plt.show()
    return theta
def predict(test_x, test_y, theta):  # theta is assumed to be the already-learned parameter vector
    errorcount = 0
    m = test_x.shape[0]
    test_x = np.insert(test_x, 0, values=1, axis=1)  # prepend x0 = 1 for the bias term
    h_theta = expit(np.dot(test_x, theta))
    for i in range(m):
        if h_theta[i] > 0.5:
            h_theta[i] = 1
        else:
            h_theta[i] = 0
        if h_theta[i] != test_y[i]:  # test_y[i] must already be 0 or 1 so it is comparable with h_theta[i]
            errorcount += 1
    error_rate = float(errorcount) / m
    print("error_rate", error_rate)
# the standardization (z-score) method of feature scaling; note: prefer NumPy's matrix operations
def standardization(x):  # x is the data matrix
    x_average = x.mean(axis=0)  # mean of each column (a 1*n row)
    sigma = x.std(axis=0)       # standard deviation of each column
    # an m*n matrix minus a 1*n row broadcasts: the row is copied across all m rows; the divide is element-wise
    x_result = (x - x_average) / sigma
    return x_result
# the rescaling (min-max) method of feature scaling
def rescaling(x):
    x_min = x.min(axis=0)  # minimum of each column
    x_max = x.max(axis=0)  # maximum of each column
    x_result = (x - x_min) / (x_max - x_min)  # broadcasting scales every column to [0, 1]
    return x_result
if __name__ == '__main__':
    train_x, test_x, train_y, test_y = loaddataset()
    # uncomment one pair of lines below to apply feature scaling (used to produce Table 3)
    # train_x = standardization(train_x)
    # test_x = standardization(test_x)
    # train_x = rescaling(train_x)
    # test_x = rescaling(test_x)
    n = test_x.shape[1] + 1  # +1 for the bias column x0
    theta = np.zeros((n, 1))
    # theta = np.random.rand(n, 1)  # alternatively, a random n*1 initialisation
    theta_new = train_model(train_x, train_y, theta, learning_rate=0.001, iteration=1000)  # error rate 0.017 with rescaling
    predict(test_x, test_y, theta_new)
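As a sanity check, the same split (reusing train_x, train_y, test_x, test_y from loaddataset()) could be fed to scikit-learn's built-in Perceptron; a minimal sketch for comparison follows (this is not the article's method, and its exact error rate may differ from Table 3):
from sklearn.linear_model import Perceptron
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()                    # the library equivalent of rescaling()
train_x_s = scaler.fit_transform(train_x)
test_x_s = scaler.transform(test_x)        # reuse the training-set minima and maxima

clf = Perceptron(max_iter=1000, eta0=0.001)
clf.fit(train_x_s, train_y.astype(int))
print("error_rate", 1 - clf.score(test_x_s, test_y.astype(int)))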
Results display and analysis:
Experiment: classifying the breast cancer dataset with the perceptron
The experiment used 1000 iterations and a learning rate of 0.001, and compared the effect of the two feature-scaling methods, as shown in Table 3.
Table 3 Classification error rate under each feature-scaling method

Feature scaling method | Standardization | Rescaling
Classification error rate | 0.182 | 0.017
The loss J under the standardization method, as the number of iterations grows, is shown in Figure 1:
Figure 1 The standardization method
The loss J under the rescaling method, as the number of iterations grows, is shown in Figure 2:
Figure 2 The rescaling method
Figure 1 shows that, with standardization, this choice of iteration count and learning rate does not drive the loss to a good value; in Figure 2 the loss function decreases steadily and then levels off. Comparing the two, the rescaling method works better than the standardization method on this task.