PyTorch (vi) -- Understanding gradient backpropagation (backpropagate)

Tags: alternation, generator, pytorch, generative adversarial networks

Series directory
(1) Data processing
(2) Building and customizing the network
(3) Testing your own pictures with a trained model
(4) Processing video data
(5) Modifying the PyTorch source code to add a ConvLSTM layer
(6) Understanding gradient backpropagation (backpropagate)
(total) PyTorch encounters a fascinating bug; PyTorch learning and use (vi): alternating multiple networks

Recently I used PyTorch to build a generative adversarial network (GAN). Because a GAN consists of two networks, updating its parameters involves alternating between the two. The idea is simple: the generator produces new data, and the discriminator decides whether a sample is real or generated. Training the discriminator makes it distinguish real data from generated data accurately, while training the generator makes the discriminator unable to tell them apart. See generative adversarial networks.

During training, the discriminator is updated before the generator:
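For reference, here is a minimal sketch of that alternation (the names gen, dis, noise, real_data and the layer sizes are assumptions for illustration, not code from this article); the detach() call in the discriminator step is explained in the sections below:

import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

# hypothetical generator, discriminator and optimizers
gen = nn.Linear(5, 10)
dis = nn.Sequential(nn.Linear(10, 1), nn.Sigmoid())
opt_g = optim.Adam(gen.parameters(), lr=1e-3)
opt_d = optim.Adam(dis.parameters(), lr=1e-3)
criterion = nn.BCELoss()

real_data = Variable(torch.randn(8, 10))
noise = Variable(torch.randn(8, 5))
ones = Variable(torch.ones(8, 1))
zeros = Variable(torch.zeros(8, 1))

# (1) update the discriminator: real data scored as real, generated data as fake
dis.zero_grad()
gen_data = gen(noise)
d_loss = criterion(dis(real_data), ones) + criterion(dis(gen_data.detach()), zeros)
d_loss.backward()
opt_d.step()

# (2) update the generator: make the discriminator score generated data as real
gen.zero_grad()
g_loss = criterion(dis(gen_data), ones)
g_loss.backward()
opt_g.step()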

PyTorch gradient backpropagation

In PyTorch, data passed into a network for computation must be of the Variable type. A Variable wraps a Tensor, holds its gradient, and keeps a reference to the Function that created it. In other words, it records the gradient and the computation graph for each layer of the network, which is what makes backpropagation of the gradient possible. The computation graph can be represented as follows (figure from Deep Learning with PyTorch: A 60 Minute Blitz):

Then, starting from the final loss, the gradient of each layer can be obtained recursively, and the weight update can be performed.
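As a small illustration (assuming a PyTorch version that still uses Variable, roughly 0.2 to 0.3; in very old versions the attribute is called creator instead of grad_fn):

import torch
from torch.autograd import Variable

x = Variable(torch.ones(2, 1), requires_grad=True)  # leaf Variable: no creating Function
y = x * 3 + 1                                        # y records the Functions that created it
print(x.grad_fn)    # None, x is a leaf
print(y.grad_fn)    # reference to the Function that produced y
y.sum().backward()  # backpropagate from a scalar loss
print(x.grad)       # d(sum(y))/dx, a Variable filled with 3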

There are three main steps to backpropagating the gradient:
(1) Zero the gradients: net.zero_grad()
(2) Backpropagate to compute the gradients: loss.backward()
(3) Update the parameters: optimizer.step()
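Put together, one training step looks roughly like this (the network, data and learning rate are hypothetical):

import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

net = nn.Linear(4, 1)                       # hypothetical tiny network
optimizer = optim.SGD(net.parameters(), lr=0.1)
criterion = nn.MSELoss()

x = Variable(torch.randn(8, 4))
target = Variable(torch.randn(8, 1))

net.zero_grad()              # (1) zero the gradients
output = net(x)
loss = criterion(output, target)
loss.backward()              # (2) backpropagate to compute the gradients
optimizer.step()             # (3) update the parameters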

Note: for a given input, after the network computes the output, computing the gradient is a recursive process from the output back to the input. Once this recursion finishes, the buffers of the graph are freed, so using the same output for a second gradient computation raises an error, as follows:

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Please specify retain_variables=True when calling backward for the first time.
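A minimal sketch of how the error arises and how to avoid it (hypothetical tensors; in very old PyTorch versions the keyword is retain_variables, in later ones retain_graph):

import torch
from torch.autograd import Variable

x = Variable(torch.ones(2, 2), requires_grad=True)
y = (x * 2).sum()

y.backward()                    # first backward pass: OK, buffers are freed afterwards
# y.backward()                  # second backward pass: raises the RuntimeError above

y2 = (x * 2).sum()
y2.backward(retain_graph=True)  # keep the buffers so a second backward is allowed
y2.backward()                   # OK now; gradients accumulate in x.grad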

Now look at the weight-update procedure of the GAN above. In step 1 you need both real data and generated data to update the discriminator, but the generated data comes from the generator and is passed into the discriminator for the computation. Therefore, when the gradient is backpropagated for the discriminator loss, the gradient is also computed through the generator and the buffers of the generator's graph are freed, so an error occurs when the generator is updated in step 2.

Since the generator's gradient is not needed in step 1, gen_data.detach() is used as the input when updating the discriminator with generated data. detach() cuts the current graph at that point and returns a new Variable that no longer carries the generator's history.
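A small illustration of what detach() returns (the Variable gen_data here is a stand-in for the generator output, not the article's actual variable):

import torch
from torch.autograd import Variable

gen_data = Variable(torch.randn(4, 3), requires_grad=True) * 2  # stands in for generator output
detached = gen_data.detach()

print(gen_data.grad_fn)        # has a grad_fn: still connected to the graph that produced it
print(detached.grad_fn)        # None: the detached Variable is cut off from that graph
print(detached.requires_grad)  # False: backward through it will not reach the generator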

Network Test

Build two networks, A and B. The output of network A is fed into network B; B is updated first, and A is updated afterwards. This is similar to the adversarial setup: B needs the result computed by A, so if B's loss is backpropagated carelessly, A's gradient is also computed and A's graph is freed, which breaks the later update of A. The code is as follows:

import torch
import torch.nn as nn
from torch.autograd import Variable
import torch.optim as optim

class A(nn.Module):
    def __init__(self):
        super(A, self).__init__()
        self.fc = nn.Linear(1, 10)

    def forward(self, x):
        return self.fc(x)

class B(nn.Module):
    def __init__(self):
        super(B, self).__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x):
        return self.fc(x)

a_net = A()
b_net = B()
criterion = nn.MSELoss()
optimizer_a = optim.Adam(a_net.parameters(), 0.1)
optimizer_b = optim.Adam(b_net.parameters(), 0.1)

input = torch.FloatTensor(2, 1)
label = torch.FloatTensor(2, 10).fill_(1)
label2 = torch.FloatTensor(2, 1).fill_(1)
input = Variable(input)
label = Variable(label)
label2 = Variable(label2)

# update B net
b_net.zero_grad()
output1 = a_net(input)
loss1 = criterion(output1, label)   # computed but not used below
output2 = b_net(output1.detach())   # detach so the backward pass stops at output1
loss2 = criterion(output2, label2)
loss2.backward()
optimizer_b.step()

# update A net
a_net.zero_grad()
output3 = b_net(output1)            # output1 still carries A's graph
loss3 = criterion(output3, label2)
loss3.backward()
optimizer_a.step()

If output2 = b_net(output1.detach()) is changed to output2 = b_net(output1), the RuntimeError shown above occurs: loss2.backward() has already backpropagated through (and freed) the graph that produced output1, and loss3.backward() tries to traverse it a second time.

Therefore, backpropagation computes the gradient recursively from the loss value back through every network that produced the output; when controlling which network gets updated, pay attention to which Variable is passed in.

Note: when loss3.backward() is called, loss3 has passed through both network A and network B, so the gradients of both A and B are computed; optimizer_a.step() then updates only network A with its gradient. If you want to keep the forward graph after a backward pass, call loss.backward(retain_graph=True).

Computing the gradient multiple times on the same network, and custom weight initialization

When the gradient is computed several times on the same network, the final gradient is the sum of all those gradients. You can use torch.nn.Module.apply() to initialize the weights.
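A minimal sketch of weight initialization with apply() (the init_weights function and the network here are hypothetical):

import torch
import torch.nn as nn

def init_weights(m):
    # hypothetical initializer, called on every submodule by Module.apply()
    if isinstance(m, nn.Linear):
        m.weight.data.normal_(0.0, 0.02)
        m.bias.data.fill_(0.0)

net = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 1))
net.apply(init_weights)   # recursively applies init_weights to net and all its submodules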

For the two points above, see PyTorch (total) -- fascinating bugs encountered with PyTorch and their record.
