In my day-to-day research, I try to use idle evenings to learn one machine learning algorithm at a time. Today I came across a few good articles on genetic algorithms, and I am summarizing my notes here.
1. Neural network fundamentals
Figure 1. Artificial neuron model
x1~xn are the input signals from other neurons, wij is the connection weight from neuron j to neuron i, and θ is a threshold, also called the bias. The relationship between the output of neuron i and its inputs is expressed as:
$$net_i = \sum_{j=1}^{n} w_{ij} x_j - \theta, \qquad y_i = f(net_i)$$
Here yi is the output of neuron i, the function f is called the activation function or transfer function, and net is called the net activation. If the threshold is regarded as the weight wi0 of an extra input x0 of neuron i (for example x0 = -1 and wi0 = θ), the formula above simplifies to:
$$net_i = \sum_{j=0}^{n} w_{ij} x_j, \qquad y_i = f(net_i)$$
If the input vector is denoted X and the weight vector W:
X = [x0, x1, x2, ..., xn]
W = [wi0, wi1, wi2, ..., win]^T
then the output of the neuron can be expressed as a vector product:
$$net_i = XW, \qquad y_i = f(net_i) = f(XW)$$
If the net activation of the neuron is positive, the neuron is said to be active or excited (it fires); if the net activation is negative, the neuron is said to be suppressed.
The "thresholded weighted sum" neuron model in Figure 1 is called the M-P model (McCulloch-Pitts model), and is also known as a processing element (PE) of the neural network.
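The neuron model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the threshold is subtracted from the weighted sum, the folded form uses x0 = -1 with w0 = θ, and the function names and the `step` activation are hypothetical choices, not part of the original article.

```python
def neuron_output(inputs, weights, theta, activation):
    """Output of one neuron: y = f(sum_j w_j * x_j - theta)."""
    net = sum(w * x for w, x in zip(weights, inputs)) - theta
    return activation(net)


def neuron_output_folded(inputs, weights, theta, activation):
    """Same neuron, with the threshold folded in as weight w0 = theta on x0 = -1."""
    x = [-1.0] + list(inputs)     # X = [x0, x1, ..., xn]
    w = [theta] + list(weights)   # W = [w0, w1, ..., wn]
    net = sum(wj * xj for wj, xj in zip(w, x))  # net = X W
    return activation(net)


def step(net):
    """Illustrative activation: fire (1) when the net activation is positive."""
    return 1 if net > 0 else 0


x = [1.0, 0.0, 1.0]
w = [0.5, -0.4, 0.3]
print(neuron_output(x, w, 0.6, step))         # weighted sum 0.8 exceeds the threshold 0.6
print(neuron_output_folded(x, w, 0.6, step))  # same neuron in folded form
```

Both forms compute the same net activation, which is why the threshold can be treated as just another weight during learning.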
2. Common activation functions
Choosing the activation function is an important step in constructing a neural network. The commonly used activation functions are briefly introduced below.
(1) Linear function
(2) Ramp function
(3) Threshold function
The three activation functions above are linear or piecewise linear. Two commonly used nonlinear activation functions are described next.
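(4) Sigmoid function (S-shaped function)
$$f(x) = \frac{1}{1 + e^{-\lambda x}}$$
where λ > 0 controls the steepness. Its derivative:
$$f'(x) = \lambda f(x)\,[1 - f(x)]$$
(5) Bipolar sigmoid function
$$f(x) = \frac{2}{1 + e^{-\lambda x}} - 1$$
Its derivative:
$$f'(x) = \frac{\lambda\,[1 - f(x)^2]}{2}$$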
The graphs of the sigmoid function and the bipolar sigmoid function are shown below:
Figure 3. Graphs of the sigmoid function and the bipolar sigmoid function
The main difference between the sigmoid function and the bipolar sigmoid function is the range: the bipolar sigmoid takes values in (-1, 1), while the sigmoid takes values in (0, 1).
Because both functions are differentiable (with continuous derivatives), they are suitable for BP neural networks (the BP algorithm requires a differentiable activation function).
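As a rough sketch, the five activation functions might be written as follows. The parameterizations of the linear, ramp and threshold functions (slope `k`, offset `c`, saturation at ±k·c) are assumptions made for illustration, since the text gives no formulas for them; the two sigmoid forms follow the standard definitions with steepness `lam`.

```python
import math


def linear(x, k=1.0, c=0.0):
    """Assumed form: f(x) = k*x + c."""
    return k * x + c


def ramp(x, k=1.0, c=1.0):
    """Assumed form: linear on [-c, c], saturating at +/- k*c outside it (k, c > 0)."""
    return max(-k * c, min(k * c, k * x))


def threshold(x, c=0.0):
    """Assumed form: 1 when x >= c, otherwise 0."""
    return 1.0 if x >= c else 0.0


def sigmoid(x, lam=1.0):
    return 1.0 / (1.0 + math.exp(-lam * x))


def sigmoid_prime(x, lam=1.0):
    y = sigmoid(x, lam)
    return lam * y * (1.0 - y)


def bipolar_sigmoid(x, lam=1.0):
    return 2.0 / (1.0 + math.exp(-lam * x)) - 1.0


def bipolar_sigmoid_prime(x, lam=1.0):
    y = bipolar_sigmoid(x, lam)
    return lam * (1.0 - y * y) / 2.0


for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x), bipolar_sigmoid(x))
```

Both derivatives are expressed through the function value itself, which is convenient when the output of a neuron has already been computed.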
3. Neural network model
A neural network is a network formed by interconnecting a large number of neurons. According to how the neurons are interconnected, common network structures fall into the following 3 categories:
(1) Feedforward neural networks
Feedforward networks are also called forward networks. In this kind of network, feedback signals exist only during training; during classification, data can only travel forward until it reaches the output layer, with no backward feedback between layers, hence the name feedforward network. The perceptron and the BP neural network are both feedforward networks.
Figure 4 shows a 3-layer feedforward neural network, in which the first layer consists of the input units, the second layer is called the hidden layer, and the third layer is called the output layer (the input units are not neurons, so the figure contains 2 layers of neurons).
Figure 4. Feedforward Neural Networks
For a 3-layer feedforward neural network N, let X be the input vector of the network, W1~W3 the connection weights of each layer, and F1~F3 the activation functions of the 3 layers.
Then the output of the first layer of the network is:
O1 = F1(XW1)
The output of the second layer is:
O2 = F2(F1(XW1)W2)
The output of the output layer is:
O3 = F3(F2(F1(XW1)W2)W3)
If the activation functions F1~F3 are all linear, the network output O3 is just a linear function of the input X. Therefore, to approximate higher-order functions, a suitable nonlinear function should be chosen as the activation function.
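The layer-by-layer computation, and the point about linear activations, can be checked with a small sketch. The weight values, the helper names `matvec` and `forward`, and the choice of a sigmoid as the nonlinear activation are all assumptions made for illustration.

```python
import math


def matvec(x, W):
    """Row vector x (length n) times matrix W (n rows, m columns)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def forward(x, weights, activations):
    """Compute O3 = F3(F2(F1(X W1) W2) W3) one layer at a time."""
    out = x
    for W, f in zip(weights, activations):
        out = [f(v) for v in matvec(out, W)]
    return out


def identity(v):
    return v


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


# Hypothetical weights for a network with 2 inputs, two 2-unit layers and 1 output.
W1 = [[0.5, -1.0], [1.5, 2.0]]
W2 = [[1.0, 0.0], [-0.5, 0.5]]
W3 = [[2.0], [-1.0]]
weights = [W1, W2, W3]

x = [1.0, 2.0]
x_doubled = [2.0, 4.0]

# With linear activations, doubling the input doubles the output.
print(forward(x, weights, [identity] * 3), forward(x_doubled, weights, [identity] * 3))

# With sigmoid activations, the output is no longer proportional to the input.
print(forward(x, weights, [sigmoid] * 3), forward(x_doubled, weights, [sigmoid] * 3))
```

With identity activations the three layers collapse into a single matrix product, so stacking them adds no expressive power; the nonlinear version does not collapse this way.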
(2) Feedback neural networks
A feedback neural network has feedback connections from output to input, and its structure is much more complex than that of a feedforward network. Typical feedback neural networks include the Elman network and the Hopfield network.
Figure 5. Feedback Neural Network
(3) Self-organizing networks (SOM, self-organizing neural networks)
A self-organizing neural network is an unsupervised learning network. It changes the parameters and structure of the network automatically, in a self-organizing and adaptive way, by searching for the intrinsic regularities and essential properties of the samples.
Figure 6. Self-Organizing Network
4. How neural networks work
The operation of a neural network is divided into two states: learning and working.
(1) Learning state of neural networks
Network learning mainly means using a learning algorithm to adjust the connection weights between neurons so that the network output better matches reality. Learning algorithms fall into two categories: supervised learning and unsupervised learning.
A supervised learning algorithm feeds a training set into the network and adjusts the connection weights according to the difference between the network's actual output and the expected output. The main steps of a supervised learning algorithm are:
1) Take a sample (Ai, Bi) from the sample set;
2) Compute the actual output O of the network;
3) Compute D = Bi - O;
4) Adjust the weight matrix W according to D;
5) Repeat the above for each sample until the error over the entire sample set does not exceed the specified range.
The BP algorithm is an excellent supervised learning algorithm.
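The five steps of a supervised learning algorithm can be sketched as a generic loop. Everything specific here is an assumption for illustration: the `predict` and `adjust` callables, the one-weight linear model, the learning constant 0.1, and the stopping tolerance. Note also that this sketch measures each error during the pass, before the corresponding update, which is a simplification of step 5.

```python
def train_supervised(samples, w, predict, adjust, tol=1e-3, max_epochs=1000):
    """Repeat steps 1-4 over the sample set until every error is within tol."""
    for epoch in range(max_epochs):
        worst = 0.0
        for a, b in samples:        # 1) take a sample (Ai, Bi)
            o = predict(w, a)       # 2) actual output O of the network
            d = b - o               # 3) error D = Bi - O
            w = adjust(w, a, d)     # 4) adjust W according to D
            worst = max(worst, abs(d))
        if worst <= tol:            # 5) stop once the whole set is within range
            return w, epoch + 1
    return w, max_epochs


# Hypothetical one-weight linear model o = w * a, learning the mapping b = 2a.
def predict(w, a):
    return w * a


def adjust(w, a, d):
    return w + 0.1 * d * a          # error-proportional correction


samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w, epochs = train_supervised(samples, 0.0, predict, adjust)
print(round(w, 3), epochs)
```

Keeping the update rule as a separate callable makes it easy to swap in a different rule without touching the loop.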
Unsupervised learning extracts the statistical characteristics contained in the sample set and stores them in the network in the form of connection weights between neurons.
The Hebb learning rule is a classical unsupervised learning algorithm.
(2) Working state of neural networks
In the working state the connection weights between neurons are fixed, and the network is used as a classifier or predictor.
The Hebb learning rule and the Delta learning rule are briefly introduced below.
(3) Unsupervised learning algorithm: Hebb learning rule
The core idea of the Hebb algorithm is that when two neurons are excited at the same time, the connection between them is strengthened; otherwise it is weakened.
To understand the Hebb algorithm, it helps to briefly describe the conditioned reflex experiment. In Pavlov's conditioned reflex experiment, a bell is rung each time before a dog is fed, and over time the dog associates the bell with food. Afterwards, the dog drools when the bell rings even if no food is given.
Inspired by this experiment, Hebb's theory holds that the connection between neurons excited at the same time is strengthened. For example, if one neuron is excited when the bell rings while the presence of food excites another nearby neuron, the connection between the two neurons is strengthened, thereby recording that the two things are related. Conversely, if two neurons are consistently not excited at the same time, the connection between them grows weaker.
The Hebb learning rule can be expressed as:
$$w_{ij}(t+1) = w_{ij}(t) + a\,y_i\,y_j$$
where wij is the connection weight from neuron j to neuron i, yi and yj are the outputs of the two neurons, and a is a constant that sets the learning speed. If yi and yj are activated at the same time, i.e. both are positive, wij increases. If yi is activated while yj is suppressed, i.e. yi is positive and yj is negative, wij decreases.
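A single Hebbian update is small enough to write out directly. The sketch below assumes the update adds a·yi·yj to the current weight, which matches the sign behaviour described above; the function name and the value a = 0.1 are illustrative.

```python
def hebb_update(w_ij, y_i, y_j, a=0.1):
    """One Hebbian step: the weight changes by a * y_i * y_j."""
    return w_ij + a * y_i * y_j


w = 0.5
print(hebb_update(w, 1.0, 1.0))    # both neurons excited: the weight grows
print(hebb_update(w, 1.0, -1.0))   # one excited, one suppressed: the weight shrinks
```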
(4) Supervised learning algorithm: Delta learning rule
The Delta learning rule is a simple supervised learning algorithm that adjusts the connection weights according to the difference between a neuron's actual output and its desired output. Its mathematical form is:
$$w_{ij}(t+1) = w_{ij}(t) + a\,(d_i - y_i)\,x_j(t)$$
where wij is the connection weight from neuron j to neuron i, di is the desired output of neuron i, yi is the actual output of neuron i, and xj is the state of neuron j: xj is 1 if neuron j is active, and 0 or -1 if it is suppressed (depending on the activation function). a is a constant representing the learning speed. Assuming xj is 1, if di is greater than yi then wij increases, and if di is smaller than yi then wij decreases.
In short, the Delta rule says: if the actual output of the neuron is larger than the desired output, reduce the weights of all connections with positive input and increase the weights of all connections with negative input. Conversely, if the actual output is smaller than the desired output, increase the weights of all connections with positive input and reduce the weights of all connections with negative input. The size of the increase or decrease is given by the formula above.
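The behaviour described above can be illustrated with one update step for a single neuron. This sketch assumes each weight changes by a·(d − y)·xj; the function name, the weights, and the learning constant are hypothetical.

```python
def delta_update(weights, inputs, desired, actual, a=0.1):
    """One Delta-rule step: w_j <- w_j + a * (d - y) * x_j for every input j."""
    return [w + a * (desired - actual) * x for w, x in zip(weights, inputs)]


weights = [0.2, -0.4]
inputs = [1.0, -1.0]   # one positive input and one negative input

# Actual output larger than desired (d - y < 0):
# the weight on the positive input drops, the weight on the negative input rises.
print(delta_update(weights, inputs, desired=0.0, actual=1.0))

# Actual output smaller than desired (d - y > 0): the opposite happens.
print(delta_update(weights, inputs, desired=1.0, actual=0.0))
```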
(5) Supervised learning algorithm: BP algorithm
Feedforward neural networks trained with the BP learning algorithm are usually called BP networks.
Figure 8. Three-layer BP neural network structure
BP networks have strong nonlinear mapping ability: a 3-layer BP neural network can approximate any nonlinear function (by Kolmogorov's theorem). A typical 3-layer BP neural network model is shown in the figure above.
Explaining the learning algorithm of the BP network takes considerable space, so I plan to cover it in the next article.
5. Example
The Hebb learning rule represents purely feedforward, unsupervised learning. The simple example below illustrates Hebb learning on a simple network, first with a binary activation function and then with a continuous one.
Assume the network has the following initial weight vector:
W1 = [1, -1, 0, 0.5]^T
The input is
X = [x1, x2, x3, x4]^T
and the training set consists of the following three input vectors:
X1 = [1, -2, 1.5, 0]^T;
X2 = [1, -0.5, -2, -1.5]^T;
X3 = [0, 1, -1, 1.5]^T
The learning constant is set to η = 1. Because the initial weights are nonzero, the network has evidently been trained beforehand. Bipolar binary neurons are used first,
so f(net) = sgn(net).
The learning process consists of the following steps:
Step 1: applying input X1 to the network produces the net activation net1:
net1 = (W1)^T X1 = [1, -1, 0, 0.5] [1, -2, 1.5, 0]^T = 3
The updated weights are
W2 = W1 + sgn(net1) X1 = W1 + X1 = [1, -1, 0, 0.5]^T + [1, -2, 1.5, 0]^T = [2, -3, 1.5, 0.5]^T
where the index on W denotes the current adjustment step.
Step 2: X2 is used as the input and Step 1 is repeated. Here net2 = (W2)^T X2 = -0.25, so W3 = W2 + sgn(net2) X2 = W2 - X2 = [1, -2.5, 3.5, 2]^T
Step 3: X3 is used as the input and Step 1 is repeated. Here net3 = (W3)^T X3 = -3, so W4 = W3 + sgn(net3) X3 = W3 - X3 = [1, -3.5, 4.5, 0.5]^T
As these steps show, learning with a discrete f(net) and η = 1 amounts to adding the entire input pattern vector to the weight vector, or subtracting it from the weight vector. With a continuous f(net), the weight increment or decrement is scaled down to a fraction of the input pattern.
Next, consider Hebb learning with a continuous bipolar activation function f(net), starting again from input X1 and the initial weights W1.
As in Step 1, we compute the neuron output and the updated weights, with λ = 1. The only difference from the previous case is that instead of sgn(net) the activation function is now
f(net) = 2 / [1 + exp(-λ net)] - 1
The calculation gives
f(net1) = 0.905, f(net2) = -0.077, f(net3) = -0.932
W2 = [1.905, -2.81, 1.357, 0.5]^T, W3 = [1.828, -2.772, 1.512, 0.616]^T, W4 = [1.828, -3.70, 2.44, -0.783]^T
Comparing the discrete and continuous activation functions shows that with a continuous activation function the weight adjustments are scaled down, but they are generally in the same direction.
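The whole worked example can be reproduced with a short script, which is one way to check the signs and values listed above. The sketch assumes the update W ← W + η·f(net)·X applied once per input vector in order, and it assumes sgn(0) = +1, a case the example never reaches; the function names are illustrative.

```python
import math


def sgn(net):
    """Bipolar binary activation (treating net == 0 as +1 is an assumption)."""
    return 1.0 if net >= 0 else -1.0


def bipolar_sigmoid(net, lam=1.0):
    return 2.0 / (1.0 + math.exp(-lam * net)) - 1.0


def hebb_train(w, patterns, f, eta=1.0):
    """Apply W <- W + eta * f(W . X) * X once for each pattern, in order."""
    history = []
    for x in patterns:
        net = sum(wi * xi for wi, xi in zip(w, x))
        out = f(net)
        w = [wi + eta * out * xi for wi, xi in zip(w, x)]
        history.append((net, out, w))
    return history


w1 = [1.0, -1.0, 0.0, 0.5]
patterns = [
    [1.0, -2.0, 1.5, 0.0],
    [1.0, -0.5, -2.0, -1.5],
    [0.0, 1.0, -1.0, 1.5],
]

for name, f in (("discrete", sgn), ("continuous", bipolar_sigmoid)):
    print(name)
    for net, out, w in hebb_train(w1, patterns, f):
        print(f"  net={net:.3f}  f(net)={out:.3f}  w={[round(v, 3) for v in w]}")
```

Printing the net activation alongside each weight vector makes it easy to see that both variants move the weights in the same direction at every step, with smaller steps in the continuous case.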
Article from http://www.cnblogs.com/heaad/archive/2011/03/07/1976443.html