The Perceptron Algorithm in Artificial Neural Networks


The perceptron is the most basic unit of an artificial neural network: it has multiple inputs and a single output. Although our ultimate goal is to learn large networks of interconnected units, we first need to study the individual neural unit.

The main flow of the perceptron algorithm:

First take n inputs and weight each input value; then check whether the weighted sum of the inputs reaches a threshold v. If it does, the sign function outputs 1; otherwise it outputs -1.

To unify the expression, we set the threshold v above to -w0 and add a variable x0 = 1, so that w0x0 + w1x1 + w2x2 + ... + wnxn > 0 can be used instead of w1x1 + w2x2 + ... + wnxn > v. So we have:

o(x1, ..., xn) = sign(w0x0 + w1x1 + ... + wnxn), which outputs 1 when the weighted sum is positive and -1 otherwise.

From the above formula, once the weight vector is determined, the perceptron can be used to classify samples.
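As a concrete illustration, here is a minimal Python sketch of this decision function; the function name and the use of NumPy are our own choices, not from the article:

```python
import numpy as np

def perceptron_output(w, x):
    """Perceptron decision function: the sign of the weighted sum.

    w: weight vector (w0, w1, ..., wn), where w0 plays the role of -v (the threshold).
    x: input vector (x1, ..., xn); the constant input x0 = 1 is prepended automatically.
    """
    x = np.concatenate(([1.0], np.asarray(x, dtype=float)))  # add x0 = 1
    return 1 if np.dot(w, x) > 0 else -1
```

For example, with w = (-0.5, 1.0, 1.0) the input x = (1, 1) gives a weighted sum of 1.5 > 0, so the output is 1.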

So how do we obtain the perceptron's weights? Different methods are needed depending on whether the training set is linearly separable:

1. Linearly separable training set: the perceptron training rule

To obtain acceptable weights, we usually start from random weights, then repeatedly adjust them using the training set, and finally obtain weights that classify all the samples correctly.

The specific algorithm process is as follows:

A) Initialize the weight vector w = (w0, w1, ..., wn), assigning a random value to each component.

B) For each training sample, first compute its predicted output:

o = sign(w0x0 + w1x1 + ... + wnxn)

C) When the predicted value is not equal to the true value, modify the weight vector with the following formula:

wi ← wi + Δwi, where Δwi = η(t - o)xi

Meaning of each symbol: η represents the learning rate, t the target output of the sample, and o the perceptron output.

D) Repeat B) and C) until no sample in the training set is misclassified.
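To make the steps concrete, here is a minimal Python sketch of the perceptron training rule, assuming the samples are given as a NumPy array X (one row per sample, without the constant x0) and the targets as a vector t of +1/-1 values; names such as train_perceptron and eta are our own, not from the article:

```python
import numpy as np

def train_perceptron(X, t, eta=0.1, max_epochs=1000, seed=0):
    """Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i on misclassified samples."""
    rng = np.random.default_rng(seed)
    X = np.hstack([np.ones((len(X), 1)), X])   # A) prepend the constant input x0 = 1
    w = rng.uniform(-0.5, 0.5, X.shape[1])     # A) random initial weights
    for _ in range(max_epochs):
        errors = 0
        for x_i, t_i in zip(X, t):
            o = 1 if np.dot(w, x_i) > 0 else -1   # B) predicted (thresholded) output
            if o != t_i:                          # C) update only on mistakes
                w += eta * (t_i - o) * x_i
                errors += 1
        if errors == 0:                           # D) stop when every sample is classified correctly
            return w
    return w  # may not converge if the data are not linearly separable
```

On a linearly separable training set this loop terminates with weights that classify every sample correctly; otherwise it stops at max_epochs, which matches the convergence caveat discussed below.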

Algorithm Analysis:

If a sample is misclassified, say the target output t is -1 while the perceptron output o is 1, then to make the perceptron output -1 we need to decrease w · x. When xi > 0, that means decreasing wi, and adding η(t - o)xi to the original wi does exactly that, since (t - o) < 0 and xi > 0.
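For concreteness, a small worked example with hypothetical numbers: take η = 0.1, t = -1, o = 1, and xi = 2; then Δwi = η(t - o)xi = 0.1 × (-2) × 2 = -0.4, so wi decreases by 0.4 and w · x is pushed toward a negative value, as required.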

By gradually adjusting the values of w, the perceptron will eventually converge to weights that classify the whole training set correctly, but only if the training set is linearly separable. If the training set is not linearly separable, the above process will not converge and will loop forever.

2. Linearly non-separable training set: the delta rule (also called the increment rule, LMS rule, Adaline rule, or Widrow-Hoff rule)

Because in real situations the training set is not guaranteed to be linearly separable, how do we train the perceptron when it is not? In this case, we use the delta rule, which converges to a best-fit approximation of the target.

The key idea of the delta rule is to use gradient descent to search the hypothesis space of possible weight vectors for the weights that best fit the training samples [1]. Specifically, a loss function is defined, and at every step the weights are moved in the direction of the negative gradient of the loss function, until the loss function reaches a minimum. We define the training error function as:

E(w) = (1/2) Σd∈D (td - od)²

where D is the training set, td is the target output for sample d, and od is the output for sample d.
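A minimal Python sketch of this error function, assuming the unthresholded linear output od = w · xd used by the delta rule (the function name is our own):

```python
import numpy as np

def training_error(w, X, t):
    """E(w) = 1/2 * sum over the training set of (t_d - o_d)^2, with o_d = w . x_d."""
    X = np.hstack([np.ones((len(X), 1)), X])  # prepend the constant input x0 = 1
    o = X @ w                                 # unthresholded linear outputs
    return 0.5 * np.sum((t - o) ** 2)
```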

The stochastic gradient descent algorithm process is as follows:

1) Initialize the weight vector w, taking a random value for each component.

2) For each training sample, do the following:

A) Compute the sample's output o through the perceptron (linear unit).

B) Modify the weight vector w according to the output: wi ← wi + η(t - o)xi.

3) Repeat step 2; the algorithm terminates when the error on the training samples falls below a set threshold.
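A minimal Python sketch of this stochastic (incremental) gradient descent procedure for the linear unit, under the same array conventions as above; names such as train_delta_rule and the stopping tolerance tol are our own:

```python
import numpy as np

def train_delta_rule(X, t, eta=0.01, max_epochs=100, tol=1e-3, seed=0):
    """Delta rule / LMS: w_i <- w_i + eta * (t - o) * x_i after every sample,
    where o = w . x is the unthresholded linear output."""
    rng = np.random.default_rng(seed)
    X = np.hstack([np.ones((len(X), 1)), X])   # 1) prepend the constant input x0 = 1
    w = rng.uniform(-0.5, 0.5, X.shape[1])     # 1) random initial weights
    for _ in range(max_epochs):
        for x_d, t_d in zip(X, t):             # 2) visit each training sample
            o = np.dot(w, x_d)                 #    A) linear output
            w += eta * (t_d - o) * x_d         #    B) incremental weight update
        error = 0.5 * np.sum((t - X @ w) ** 2)
        if error < tol:                        # 3) stop when the training error is small enough
            break
    return w
```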

Conditions for the algorithm: the error (loss) function must be differentiable with respect to the weight vector, and the hypothesis space must contain continuously parameterized hypotheses.

Possible problem: if the error surface has multiple local minima, reaching the global optimum is not guaranteed.

How is the weight-update formula in step 2 derived? See the derivation of the gradient descent rule in the next section.

Two differences:

1) Perceptron training rule vs. delta rule (increment rule)

The key difference is that the perceptron training rule updates the weights according to the error of the perceptron's thresholded output, while the increment rule updates the weights according to the error of the unthresholded linear combination of the inputs.

The weight-update formulas look the same, but the o in them differs: in the perceptron rule, o is the thresholded output o = sign(w · x), while in the increment rule, o is the output of the linear unit, o = w · x.
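A short illustration of this difference, with hypothetical values:

```python
import numpy as np

w = np.array([-0.5, 1.0, 1.0])
x = np.array([1.0, 0.2, 0.1])            # x0 = 1 followed by the two inputs
o_linear = np.dot(w, x)                  # delta rule uses this unthresholded value: -0.2
o_threshold = 1 if o_linear > 0 else -1  # perceptron rule uses this thresholded value: -1
```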

2) (Standard) gradient descent vs. stochastic gradient descent

Standard gradient descent makes one pass through the training samples, accumulates the weight-vector increment computed for each sample, and only then adds the accumulated sum to the weight vector; stochastic gradient descent updates the weights after every individual training sample. Both eventually yield a weight vector with a low value of the loss function.

3. Derivation of the gradient descent rule

The core of the gradient descent algorithm is to move, at every step, in the direction in which the loss function decreases most steeply; this steepest-descent direction is the opposite of the gradient of the loss function with respect to the weight vector:

∇E(w) = (∂E/∂w0, ∂E/∂w1, ..., ∂E/∂wn)

To calculate this gradient vector, we compute each component individually:

∂E/∂wi = ∂/∂wi [(1/2) Σd∈D (td - od)²] = Σd∈D (td - od)(-xi,d)

The weight update at each step is therefore:

Δwi = -η ∂E/∂wi = η Σd∈D (td - od) xi,d

Updating the weights in this way is called (standard) gradient descent: it computes a sum over all training samples and only then updates the weights, so every update requires a full pass through the training set, which makes the method slow and inefficient.

To speed up these slow updates, there is the stochastic gradient descent algorithm, whose weight-update formula is:

Δwi = η(t - o)xi

It updates the weights after every training sample, so the weights are adjusted more flexibly and converge at a faster rate.
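A minimal Python sketch of the standard (batch) gradient descent update derived above, for contrast with the per-sample update sketched in section 2; the function name and parameters are our own choices:

```python
import numpy as np

def train_batch_gradient_descent(X, t, eta=0.01, max_epochs=100, seed=0):
    """Standard gradient descent: accumulate delta_w_i = eta * sum_d (t_d - o_d) * x_{i,d}
    over the whole training set, then apply the update once per pass."""
    rng = np.random.default_rng(seed)
    X = np.hstack([np.ones((len(X), 1)), X])   # prepend the constant input x0 = 1
    w = rng.uniform(-0.5, 0.5, X.shape[1])     # random initial weights
    for _ in range(max_epochs):
        o = X @ w                              # linear outputs for all samples
        delta_w = eta * (t - o) @ X            # accumulated update over the whole training set
        w += delta_w                           # a single update per pass through the data
    return w
```

The stochastic version in section 2 applies the same per-sample increment immediately instead of accumulating it over the whole pass.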

References

[1] Tom M. Mitchell, Machine Learning. Electronic version: http://pan.baidu.com/s/1sjGBlEX
