Sometimes we pre-train a network on another task (such as classification), then fix the convolutional layers as an image feature extractor and use the current task's data to train only the fully connected layers. So how do we, in PyTorch, fix the bottom layers and update only the upper layers during training? In other words, during backpropagation we only want gradients to be computed down to the topmost convolutional layer; for the convolutional layers themselves, we want neither gradient computation nor parameter updates.
We know that all operands in the network are Variable objects, and Variable has two flags that can be used for this purpose: requires_grad and volatile.
requires_grad=False
When the user manually defines a Variable, its requires_grad flag defaults to False. For layers defined in an nn.Module, the requires_grad flag of the associated Variables defaults to True.
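These defaults can be checked directly (a minimal sketch; the layer shapes here are arbitrary):

```python
import torch
import torch.nn as nn
from torch.autograd import Variable

# A manually defined Variable does not require gradients by default.
x = Variable(torch.randn(2, 3))
print(x.requires_grad)  # False

# Parameters of a layer defined via nn.Module require gradients by default.
layer = nn.Linear(3, 4)
print(layer.weight.requires_grad)  # True
```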
In the computation graph, if any input has requires_grad=True, then the output also has requires_grad=True. The output has requires_grad=False only if all inputs have requires_grad=False.
>>> x = Variable(torch.randn(2, 3), requires_grad=True)
>>> y = Variable(torch.randn(2, 3), requires_grad=False)
>>> z = Variable(torch.randn(2, 3), requires_grad=False)
>>> out1 = x + y
>>> out1.requires_grad
True
>>> out2 = y + z
>>> out2.requires_grad
False
If you want to fix the bottom of the network during training, you can set requires_grad to False on the parameters of the corresponding sub-graph. The gradients of these parameters are then not computed during the backward pass:
model = torchvision.models.resnet18(pretrained=True)
for param in model.parameters():  # nn.Module has a member function parameters()
    param.requires_grad = False
# Replace the last fully-connected layer
# Parameters of newly constructed modules have requires_grad=True by default
model.fc = nn.Linear(512, 100)  # resnet18 has self.fc as the last layer of the forward pass; 512 is its input size, 100 is an example class count
# Optimize only the classifier
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)  # the optimizer is used to update network parameters; by default all parameters passed to it are updated
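The freeze-and-fine-tune pattern above can be checked end to end with a tiny stand-in network (TinyNet here is a hypothetical two-layer model, used instead of resnet18 so no pretrained weights need to be downloaded):

```python
import torch
import torch.nn as nn
import torch.optim as optim

class TinyNet(nn.Module):
    def __init__(self):
        super(TinyNet, self).__init__()
        self.features = nn.Linear(8, 4)  # stands in for the pretrained backbone
        self.fc = nn.Linear(4, 2)        # newly constructed classifier head

    def forward(self, x):
        return self.fc(self.features(x))

model = TinyNet()
for param in model.features.parameters():
    param.requires_grad = False          # freeze the backbone

# Only the classifier's parameters are handed to the optimizer.
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)

frozen_before = model.features.weight.clone()
fc_before = model.fc.weight.clone()

out = model(torch.randn(5, 8))
out.sum().backward()
optimizer.step()

# The frozen backbone is untouched; the classifier head has been updated.
print(torch.equal(model.features.weight, frozen_before))  # True
print(torch.equal(model.fc.weight, fc_before))            # False
```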
volatile=True
A Variable with volatile=True behaves similarly to one with requires_grad=False, but volatile is even more powerful: if any input has volatile=True, the output also has volatile=True. volatile=True is recommended for the inference (testing) stage of a model; setting volatile=True on the input alone is enough to perform inference with minimal memory usage, since no intermediate state is saved.
>>> regular_input = Variable(torch.randn(5, 5))
>>> volatile_input = Variable(torch.randn(5, 5), volatile=True)
>>> model = torchvision.models.resnet18(pretrained=True)
>>> model(regular_input).requires_grad  # True, because the intermediate-layer Variables have requires_grad=True by default
True
>>> model(volatile_input).requires_grad  # False, because the output's volatile is True (equivalent to requires_grad=False)
False
>>> model(volatile_input).volatile
True
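Note that volatile was removed in PyTorch 0.4; in later versions the torch.no_grad() context manager plays the same role for inference. A minimal sketch (the layer shapes are arbitrary):

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 2)

# Inside torch.no_grad(), no graph is built and no intermediate
# state is saved, just as with a volatile input.
with torch.no_grad():
    out = model(torch.randn(4, 3))

print(out.requires_grad)  # False
```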