Neural Network (optimization algorithm)

Source: Internet
Author: User
Article reproduced from: http://www.52analysis.com/R/1627.html

An artificial neural network (ANN), or simply neural network, is a mathematical or computational model that mimics the structure and function of biological neural networks. A neural network performs its computation through a large number of interconnected artificial neurons. In most cases an artificial neural network can change its internal structure in response to external information, making it an adaptive system. Modern neural networks are non-linear statistical data modeling tools, often used to model complex relationships between inputs and outputs, or to discover patterns in data.

An artificial neural network simulates intelligent human behavior in the following four respects:

Physical structure: artificial neurons simulate the function of biological neurons.

Computational simulation: neurons in the human brain perform local computation and storage, and form a system through their connections. An artificial neural network likewise contains a large number of neurons with local processing capability, and can process information in large-scale parallel fashion.

Storage and operation: both the human brain and an artificial neural network realize memory through the connection strengths between neurons, which provides strong support for generalization, analogy, and extrapolation.

Training: like the human brain, an artificial neural network uses training and learning processes suited to its own structural characteristics to acquire relevant knowledge automatically from experience.

A neural network is an operational model composed of a large number of interconnected nodes (also called "neurons" or "units"). Each node represents a particular output function, called the activation (excitation) function. Each connection between two nodes carries a weight for the signal passing through it, which serves as the memory of the artificial neural network. The output of the network depends on its connection topology, the weight values, and the activation functions. The network itself is usually an approximation of some algorithm or function in nature, or the expression of a logical strategy.




First, the perceptron

A perceptron is equivalent to a single layer of a neural network: it consists of a linear combiner followed by a binary threshold unit:



A single-layer perceptron forming an ANN system:

The perceptron takes a real-valued vector as input and computes a linear combination of these inputs; if the result is greater than a threshold, it outputs 1, otherwise it outputs -1.

The perceptron function can be written as sign(w·x); sometimes an offset b is added, giving sign(w·x + b).
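As a minimal sketch in Python (the helper name `perceptron_output` and the sample numbers are illustrative, not from the original article):

```python
import numpy as np

def perceptron_output(w, x, b=0.0):
    """Return 1 if w·x + b > 0, else -1 (a binary threshold unit)."""
    return 1 if np.dot(w, x) + b > 0 else -1

# Hypothetical weight vector and input:
w = np.array([0.5, -0.2])
x = np.array([1.0, 2.0])
print(perceptron_output(w, x))        # 0.5*1 - 0.2*2 = 0.1 > 0, so 1
print(perceptron_output(w, x, b=-1))  # 0.1 - 1 = -0.9 <= 0, so -1
```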

Learning a perceptron means choosing suitable values w0, ..., wn. The candidate space H that perceptron learning considers is therefore the set of all possible real-valued weight vectors.

Algorithm training steps:

1. Define the variables and parameters: x (input vector), w (weight vector), b (offset), y (actual output), d (expected output), a (learning-rate parameter).

2. Initialize: n = 0, w = 0.

3. Input the training samples and specify the expected output for each: class A is 1, class B is -1 (equivalently, 0 and 1 may be used, as in the code below).

4. Compute the actual output y = sign(w·x + b).

5. Update the weight vector: w(n+1) = w(n) + a[d - y(n)]x(n), where 0 < a ≤ 1.

6. Check convergence: if the convergence condition is satisfied, the algorithm is finished; otherwise return to step 3.

Note that the learning rate a should not be too large, for the sake of the stability of the weights, nor too small, so that the errors are actually reflected in the weight corrections; in the end, this is an empirical matter.
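The six steps above can be sketched in Python on a made-up linearly separable toy set (the data points and variable names here are purely illustrative):

```python
import numpy as np

# Toy linearly separable data: class A (d = 1) vs class B (d = -1).
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
d = np.array([1, 1, -1, -1])

a = 0.1            # step 1: learning rate
w = np.zeros(2)    # step 2: initialize weights
b = 0.0            # offset

for epoch in range(100):          # step 6: repeat until convergence
    errors = 0
    for x_i, d_i in zip(X, d):    # step 3: input training samples
        y = 1 if np.dot(w, x_i) + b >= 0 else -1   # step 4: actual output
        if y != d_i:
            w = w + a * (d_i - y) * x_i            # step 5: update weights
            b = b + a * (d_i - y)
            errors += 1
    if errors == 0:               # converged: a full pass with no mistakes
        break

print(w, b)
```

On separable data like this, the loop terminates with every point classified correctly.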

As the preceding account suggests, the perceptron converges for linearly separable examples, but cannot achieve a correct classification for linearly inseparable problems. This is very similar in spirit to the support vector machine discussed earlier, though the way the separating line is chosen differs: for linearly separable examples, the support vector machine finds the "best" separating line, while the single-layer perceptron merely finds a workable one.

Take the iris dataset as an example. Since the single-layer perceptron is a binary classifier, we divide the iris data into two classes, "setosa" versus the rest (the latter two species are treated as the second class), and then classify the data according to two features: petal length and petal width.

Run the following code:

# Perceptron training:
a <- 0.2
w <- rep(0, 3)
iris1 <- t(as.matrix(iris[, 3:4]))
d <- c(rep(0, 50), rep(1, 100))
e <- rep(0, 150)
p <- rbind(rep(1, 150), iris1)
max <- 100000
eps <- rep(0, 100000)
i <- 0
repeat {
  v <- w %*% p
  y <- ifelse(sign(v) >= 0, 1, 0)
  e <- d - y
  eps[i + 1] <- sum(abs(e)) / length(e)
  if (eps[i + 1] < 0.01) {
    print("finish:")
    print(w)
    break
  }
  w <- w + a * (d - y) %*% t(p)
  i <- i + 1
  if (i > max) {
    print("max time loop")
    print(eps[i])
    print(y)
    break
  }
}

# Plotting:
plot(Petal.Length ~ Petal.Width, xlim = c(0, 3), ylim = c(0, 8),
     data = iris[iris$Species == "virginica", ])
data1 <- iris[iris$Species == "versicolor", ]
points(data1$Petal.Width, data1$Petal.Length, col = 2)
data2 <- iris[iris$Species == "setosa", ]
points(data2$Petal.Width, data2$Petal.Length, col = 3)
x <- seq(0, 3, 0.01)
# Decision boundary w[1] + w[2]*length + w[3]*width = 0, solved for length:
y <- -(w[1] + w[3] * x) / w[2]
lines(x, y, col = 4)

# Plot the mean absolute error of each iteration:
plot(1:i, eps[1:i], type = "o")

The results of the classification are as follows:



This is the result after running the training 7 times. Compared with the support vector machine seen earlier, the single-layer perceptron's classification is clearly less reliable and somewhat weak.

We can also try cross-validation, and we can see that the cross-validation results are not ideal either.

Second, the linear neural network

When the training samples are linearly separable, the perceptron rule can successfully find a weight vector, but it will not converge if the samples are not linearly separable. To overcome this deficiency, another rule was devised, called the delta rule.

If the training samples are not linearly separable, the delta rule converges to the best approximation of the target concept.

The key idea of the delta rule is to use gradient descent to search the hypothesis space of possible weight vectors for the vector that best fits the training samples.

We describe the algorithm as follows:

1. Define the variables and parameters: x (input vector), w (weight vector), b (offset), y (actual output), d (expected output), a (learning-rate parameter). (To keep the description simple, the bias can be folded into the weight vector.)

2. Initialize w = 0.

3. Input a sample and compute the actual output and the error: e(n) = d - x·w(n).
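As a sketch of the delta rule, the error-driven update can be applied in batch form by gradient descent on the squared error (the synthetic data, seed, and names here are made up for illustration):

```python
import numpy as np

# Delta (LMS) rule: minimize E(w) = 1/2 * sum((d - x·w)^2) by gradient descent.
rng = np.random.default_rng(0)
X = np.hstack([np.ones((50, 1)), rng.uniform(-1, 1, (50, 2))])  # bias folded in
true_w = np.array([0.5, 2.0, -1.0])
d = X @ true_w + rng.normal(0, 0.05, 50)   # noisy synthetic targets

a = 0.05                           # learning rate
w = np.zeros(3)                    # step 2: initialize w = 0
for _ in range(2000):
    e = d - X @ w                  # step 3: error e(n) = d - x·w(n)
    w = w + a * X.T @ e / len(d)   # gradient step toward the best fit

print(w)  # approaches true_w even though no hard threshold is used
```

Unlike the perceptron rule, this converges toward the best-fitting weights whether or not the data are separable.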
