Neural Networks, Part One: Introduction, Example, Code


A general overview of neural networks and neural network models is not covered carefully here; for a detailed introduction to neural networks and their models, see the Stanford course material by Andrew Ng. This article mainly presents the concrete derivation of the backpropagation algorithm, together with a simple example and the corresponding Python code.

One: The derivation of the backpropagation algorithm

Inspired by the neural networks of the human brain, artificial neural networks have appeared in many different versions over their history, the most famous of which is the backpropagation algorithm of the 1980s. Backpropagation is used to train multilayer feedforward neural networks.

Suppose we have a fixed training set \(\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}\) containing \(m\) samples. We can solve for the neural network's parameters with batch gradient descent. Specifically, for a single sample \((x, y)\), the cost function is

\[ J(W, b; x, y) = \tfrac{1}{2}\, \big\| h_{W,b}(x) - y \big\|^2 , \]

where \(y\) is the true label value and \(h_{W,b}(x)\) is the predicted value. (Note: the variable names here follow those in the neural network overview and model introduction.)
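As a quick illustration, here is a minimal Python sketch of this single-sample cost. The function and variable names (single_sample_cost, h, y) are assumptions for the example, not names from the original article.

```python
import numpy as np

# A minimal sketch of the single-sample cost J(W, b; x, y) = 1/2 ||h(x) - y||^2.
def single_sample_cost(h, y):
    """Squared-error cost for one training example: h is the network's
    prediction h_{W,b}(x), y is the true label vector."""
    return 0.5 * np.sum((h - y) ** 2)

# Example: a prediction of [0.8, 0.2] against the true label [1, 0].
print(single_sample_cost(np.array([0.8, 0.2]), np.array([1.0, 0.0])))  # 0.04
```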

Given a dataset containing \(m\) samples, we can define the overall cost function as

\[ J(W, b) = \frac{1}{m} \sum_{i=1}^{m} J\big(W, b; x^{(i)}, y^{(i)}\big) \;+\; \frac{\lambda}{2} \sum_{l} \sum_{i} \sum_{j} \big( W_{ji}^{(l)} \big)^2 . \]

The first term in the formula above is an average sum-of-squares error term. The second term is a regularization term (weight decay), which aims to shrink the magnitude of the weights and helps prevent overfitting.
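A sketch of the overall cost under the same assumptions follows; predictions, labels, and weights are illustrative argument names, with weights holding one matrix \(W^{(l)}\) per layer.

```python
import numpy as np

# The average single-sample error over the m samples plus the weight-decay
# term (lambda/2) * sum of all squared weights.
def overall_cost(predictions, labels, weights, lam):
    """predictions/labels: lists of m vectors; weights: list of W^(l) matrices."""
    m = len(predictions)
    error_term = sum(0.5 * np.sum((h - y) ** 2)
                     for h, y in zip(predictions, labels)) / m
    reg_term = (lam / 2.0) * sum(np.sum(W_l ** 2) for W_l in weights)
    return error_term + reg_term
```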

Each iteration of gradient descent updates the parameters with the following formulas:

\[ W_{ij}^{(l)} := W_{ij}^{(l)} - \alpha \, \frac{\partial J(W, b)}{\partial W_{ij}^{(l)}} , \qquad b_{i}^{(l)} := b_{i}^{(l)} - \alpha \, \frac{\partial J(W, b)}{\partial b_{i}^{(l)}} . \]

where \(\alpha\) is the learning rate. The key step is computing the partial derivatives above. The backpropagation algorithm, described next, is an efficient way to compute these partial derivatives.
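The update rule maps to code directly. Below is a minimal sketch of one gradient-descent step, assuming grad_W and grad_b already hold the partial derivatives for each layer:

```python
# One gradient-descent iteration over all parameters; W, b, grad_W, grad_b are
# lists of per-layer numpy arrays, alpha is the learning rate.
def gradient_descent_step(W, b, grad_W, grad_b, alpha):
    for l in range(len(W)):
        W[l] -= alpha * grad_W[l]   # W^(l) := W^(l) - alpha * dJ/dW^(l)
        b[l] -= alpha * grad_b[l]   # b^(l) := b^(l) - alpha * dJ/db^(l)
    return W, b
```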


First, we show how backpropagation computes \(\frac{\partial}{\partial W_{ij}^{(l)}} J(W, b; x, y)\) and \(\frac{\partial}{\partial b_{i}^{(l)}} J(W, b; x, y)\), the partial derivatives of the cost function for a single sample. Once we have these, the partial derivatives of the overall cost function follow directly:

\[ \frac{\partial J(W, b)}{\partial W_{ij}^{(l)}} = \frac{1}{m} \sum_{k=1}^{m} \frac{\partial J\big(W, b; x^{(k)}, y^{(k)}\big)}{\partial W_{ij}^{(l)}} + \lambda W_{ij}^{(l)} , \qquad \frac{\partial J(W, b)}{\partial b_{i}^{(l)}} = \frac{1}{m} \sum_{k=1}^{m} \frac{\partial J\big(W, b; x^{(k)}, y^{(k)}\big)}{\partial b_{i}^{(l)}} . \]
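In code, this decomposition amounts to averaging the per-sample gradients and adding the regularization gradient \(\lambda W\) for the weights. A minimal sketch for a single layer, with assumed argument names:

```python
# sample_grads_W / sample_grads_b: lists of m per-sample gradient arrays for
# one layer; W_l: that layer's weight matrix. The bias gradient gets no
# regularization term, matching the formulas above.
def total_gradient(sample_grads_W, sample_grads_b, W_l, lam):
    m = len(sample_grads_W)
    grad_W = sum(sample_grads_W) / m + lam * W_l
    grad_b = sum(sample_grads_b) / m
    return grad_W, grad_b
```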




The idea of the backpropagation algorithm is as follows. Given a sample \((x, y)\), we first run a "forward pass" to compute every activation value in the network, including the output value \(h_{W,b}(x)\). Then, for each node \(i\) in layer \(l\), we compute a residual \(\delta_i^{(l)}\), which indicates how much that node contributed to the error in the final output. For an output node, we can compute the residual directly from the gap between the network's activation value and the true value; we write it \(\delta_i^{(n_l)}\), where \(n_l\) denotes the output layer. What do we do for hidden units? For those, we compute a weighted average of the residuals of the nodes in layer \(l+1\) that take the node's output as input.

Seeing this, the first question is what the residual actually is and how to obtain it. The residual is the partial derivative of the cost function with respect to \(z\): \(\delta_i^{(l)} = \partial J / \partial z_i^{(l)}\). For a neuron connected to many neurons in the previous layer, the outputs of those neurons are weighted and summed, and the result of that addition is \(z\); in other words, \(z\) is the true input of the neuron, and the residual represents the sensitivity of the final cost function to that input. A large residual means that a slight perturbation of the input produces a large change in the final loss; a small residual means the node contributes little. Since \(z\) is a function of the weights \(w\) (here \(z = wa + b\)), the chain rule carries this sensitivity from \(z\) over to the weights' contribution to the cost function.
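The chain-rule argument can be checked numerically. The sketch below does so for a single sigmoid neuron with made-up values, comparing the analytic \(\partial J / \partial w = \delta \cdot a\) against a finite-difference approximation:

```python
import numpy as np

# For a single sigmoid neuron with z = w*a + b and squared-error cost, the
# residual is delta = dJ/dz, and dJ/dw = delta * a. All values are made up.
f = lambda z: 1.0 / (1.0 + np.exp(-z))   # sigmoid activation

w, b, a, y = 0.5, 0.1, 0.8, 1.0          # arbitrary weight, bias, input, label
z = w * a + b
out = f(z)

delta = (out - y) * out * (1.0 - out)    # dJ/dz = (a_out - y) * f'(z)
analytic = delta * a                     # dJ/dw = dJ/dz * dz/dw

eps = 1e-6                               # finite-difference approximation
J = lambda w_: 0.5 * (f(w_ * a + b) - y) ** 2
numeric = (J(w + eps) - J(w)) / eps

print(analytic, numeric)                 # the two agree to ~6 decimal places
```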

The details of the backpropagation algorithm are given below. The final goal is to solve for \(\partial J / \partial W_{ij}^{(l)}\) and \(\partial J / \partial b_i^{(l)}\); the problem is converted into computing residuals, from which the partial derivatives follow.

1. Perform a feedforward pass, using the forward propagation formulas to obtain the activation values of every layer up to the output layer \(n_l\).

2. For each output unit \(i\) in layer \(n_l\), compute the residual according to

\[ \delta_i^{(n_l)} = \frac{\partial J}{\partial z_i^{(n_l)}} = -\big(y_i - a_i^{(n_l)}\big)\, f'\big(z_i^{(n_l)}\big). \]

3. For each hidden layer \(l = n_l - 1, n_l - 2, \ldots, 2\), the residual of node \(i\) in layer \(l\) is computed from the residuals of layer \(l+1\):

\[ \delta_i^{(l)} = \Big( \sum_{j} W_{ji}^{(l)}\, \delta_j^{(l+1)} \Big) f'\big(z_i^{(l)}\big). \]

(Translator's note: this successive derivation backwards from the forward process is the original meaning of "back propagation".)

4. Compute the required partial derivatives. Since \(z = Wa + b\), we obtain

\[ \frac{\partial J(W, b; x, y)}{\partial W_{ij}^{(l)}} = a_j^{(l)}\, \delta_i^{(l+1)}, \qquad \frac{\partial J(W, b; x, y)}{\partial b_i^{(l)}} = \delta_i^{(l+1)}. \]

If the activation \(f\) is the sigmoid function, its derivative is \(f'(z) = f(z)\,(1 - f(z))\); if \(f\) is the tanh function, its derivative is \(f'(z) = 1 - f(z)^2\).
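Putting the four steps together, here is a Python sketch of single-sample backpropagation with sigmoid activations; backprop_single and its argument conventions are assumptions for this illustration, not the article's original code:

```python
import numpy as np

# Single-sample backpropagation following the four steps above:
# delta^(nl) = -(y - a) * f'(z) at the output, residuals propagated backwards,
# and dJ/dW^(l) = delta^(l+1) (a^(l))^T, dJ/db^(l) = delta^(l+1).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # f'(z) = f(z)(1 - f(z)) for the sigmoid

def backprop_single(x, y, W, b):
    # Step 1: feedforward pass, storing every z and every activation.
    activations, zs = [x], []
    for W_l, b_l in zip(W, b):
        z = W_l @ activations[-1] + b_l
        zs.append(z)
        activations.append(sigmoid(z))

    # Step 2: residual of the output layer.
    delta = -(y - activations[-1]) * sigmoid_prime(zs[-1])

    grad_W, grad_b = [None] * len(W), [None] * len(b)
    grad_W[-1] = np.outer(delta, activations[-2])
    grad_b[-1] = delta

    # Step 3: propagate residuals backwards through the hidden layers.
    for l in range(len(W) - 2, -1, -1):
        delta = (W[l + 1].T @ delta) * sigmoid_prime(zs[l])
        # Step 4: partial derivatives for layer l.
        grad_W[l] = np.outer(delta, activations[l])
        grad_b[l] = delta
    return grad_W, grad_b
```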

The full backpropagation algorithm for one batch iteration can then be expressed as follows:

1. Perform a feedforward pass, using the forward propagation formulas to obtain the activation values up to the output layer.

2. For the output layer \(n_l\), compute \(\delta^{(n_l)} = -\big(y - a^{(n_l)}\big) \bullet f'\big(z^{(n_l)}\big)\).

3. For each layer \(l = n_l - 1, \ldots, 2\), compute \(\delta^{(l)} = \big( (W^{(l)})^{T} \delta^{(l+1)} \big) \bullet f'\big(z^{(l)}\big)\).

4. Compute the required derivative values: \(\nabla_{W^{(l)}} J = \delta^{(l+1)} \big(a^{(l)}\big)^{T}\) and \(\nabla_{b^{(l)}} J = \delta^{(l+1)}\).

5. Finally, update the parameters:

\[ W^{(l)} := W^{(l)} - \alpha \Big[ \frac{1}{m} \Delta W^{(l)} + \lambda W^{(l)} \Big], \qquad b^{(l)} := b^{(l)} - \alpha \Big[ \frac{1}{m} \Delta b^{(l)} \Big], \]

where \(\Delta W^{(l)}\) and \(\Delta b^{(l)}\) accumulate the single-sample gradients over the \(m\) samples.
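A corresponding sketch of one batch iteration, reusing backprop_single from the previous block:

```python
import numpy as np

# Accumulate Delta W / Delta b over the m samples, then apply
# W := W - alpha * (Delta W / m + lambda * W). Assumes backprop_single above.
def batch_iteration(samples, labels, W, b, alpha, lam):
    m = len(samples)
    acc_W = [np.zeros_like(W_l) for W_l in W]
    acc_b = [np.zeros_like(b_l) for b_l in b]
    for x, y in zip(samples, labels):
        grad_W, grad_b = backprop_single(x, y, W, b)
        for l in range(len(W)):
            acc_W[l] += grad_W[l]
            acc_b[l] += grad_b[l]
    for l in range(len(W)):
        W[l] -= alpha * (acc_W[l] / m + lam * W[l])  # weight-decay update
        b[l] -= alpha * (acc_b[l] / m)               # no decay on biases
    return W, b
```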

The update loop terminates when any of the following conditions holds:

1. The weight updates fall below a certain threshold.
2. The prediction error rate falls below a certain threshold.
3. A preset maximum number of iterations is reached, and the loop exits.
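Below is a sketch of a training loop that wires these three conditions together; it reuses sigmoid, backprop_single, and batch_iteration from the sketches above, and all thresholds and hyperparameters are illustrative assumptions:

```python
import numpy as np

def train(samples, labels, W, b, alpha=0.5, lam=0.0,
          weight_tol=1e-6, error_tol=1e-4, max_iters=10000):
    def predict(x):
        a = x
        for W_l, b_l in zip(W, b):
            a = sigmoid(W_l @ a + b_l)
        return a

    for it in range(max_iters):                     # condition 3: cycle cap
        old_W = [W_l.copy() for W_l in W]
        W, b = batch_iteration(samples, labels, W, b, alpha, lam)

        # Condition 1: the largest weight update falls below a threshold.
        max_update = max(np.max(np.abs(W_l - o_l))
                         for W_l, o_l in zip(W, old_W))
        # Condition 2: the average prediction error falls below a threshold.
        error = np.mean([0.5 * np.sum((predict(x) - y) ** 2)
                         for x, y in zip(samples, labels)])
        if max_update < weight_tol or error < error_tol:
            break
    return W, b
```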

Two: A simple example

As shown in the figure (not reproduced here), the original article walks through a neural network model with concrete numeric values, in which the weights \(w\) are randomly initialized to distinct values, and then applies the formulas derived above step by step.
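Since the figure is unavailable, the sketch below substitutes a tiny 2-2-1 network with randomly initialized, distinct weights and runs one forward/backward pass using backprop_single from the earlier sketch; the layer sizes and the sample values are assumptions, not the article's:

```python
import numpy as np

# A stand-in for the article's figure: a small network with random weights.
rng = np.random.default_rng(0)

sizes = [2, 2, 1]                                    # input, hidden, output
W = [rng.uniform(-0.5, 0.5, size=(sizes[l + 1], sizes[l]))
     for l in range(len(sizes) - 1)]
b = [np.zeros(sizes[l + 1]) for l in range(len(sizes) - 1)]

x, y = np.array([0.05, 0.10]), np.array([1.0])       # assumed sample
grad_W, grad_b = backprop_single(x, y, W, b)          # from the earlier sketch
print(grad_W[0].shape, grad_W[1].shape)               # (2, 2) and (1, 2)
```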
