PyTorch detach and detach_
PyTorch's Variable object has two methods, detach and detach_. This article mainly describes what these two methods do and what they can be used for.
detach
The official documentation describes this method as follows: it returns a new Variable, detached from the current graph. The returned Variable will never require a gradient. If the Variable being detached has volatile=True, the detached Variable is also volatile. There is also a note: the returned Variable and the Variable being detached point to the same tensor.
import torch
from torch.nn import init
from torch.autograd import Variable

t1 = torch.FloatTensor([1., 2.])
v1 = Variable(t1)
t2 = torch.FloatTensor([2., 3.])
v2 = Variable(t2)
v3 = v1 + v2
v3_detached = v3.detach()
v3_detached.data.add_(t1)  # modify the value of the tensor inside v3_detached
print(v3, v3_detached)     # v3 changes as well
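
As a quick check of the "never requires a gradient" claim above, here is a minimal sketch (using the same old Variable API as the rest of this article; the variable names are chosen only for illustration):

x = Variable(torch.FloatTensor([1., 2.]), requires_grad=True)
y = x * 2
y_detached = y.detach()
print(y.requires_grad)           # True: y is part of the graph
print(y_detached.requires_grad)  # False: the detached copy never requires a gradient
print(y_detached.grad_fn)        # None, so backpropagation stops here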
# detach source
def detach(self):
    result = NoGrad()(self)  # this is needed, because it merges version counters
    result._grad_fn = None
    return result
detach_
The official documentation explains that this method detaches the Variable from the graph that created it, making it a leaf node.
This can also be seen from the source code: it sets the Variable's grad_fn to None, so when backpropagation reaches this Variable it cannot find a grad_fn and does not propagate any further back. It also sets requires_grad to False. This second step does not seem strictly necessary, but since the source code is written that way, you can manually set requires_grad back to True if a gradient is needed (see the sketch after the source below).
# detach_ source
def detach_(self):
    """Detaches the Variable from the graph that created it, making it a leaf."""
    self._grad_fn = None
    self.requires_grad = False
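
Following the point above about requires_grad being set to False, here is a small sketch of detaching in place and then turning the gradient back on manually (the names x, y, z are illustrative only):

x = Variable(torch.FloatTensor([1., 2.]), requires_grad=True)
y = x * 2
y.detach_()                 # y becomes a leaf: grad_fn is None, requires_grad is False
print(y.grad_fn, y.requires_grad)
y.requires_grad = True      # manually re-enable the gradient if it is needed
z = (y * 3).sum()
z.backward()
print(y.grad)               # gradient now accumulates on y itself
print(x.grad)               # None: the link back to x was cut by detach_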
What can they be used for?
Suppose we have two networks A and B, related as y = A(x) and z = B(y). We want z.backward() to compute gradients for the parameters of network B, but we do not want gradients for the parameters of network A. We can do this:
# y = A(x), z = B(y): compute gradients for B's parameters but not for A's
# First method
y = A(x)
z = B(y.detach())
z.backward()

# Second method
y = A(x)
y.detach_()
z = B(y)
z.backward()
In this case, both detach and detach_ work. But if you also want to backpropagate through y into A later, only the first method works, because the second method has already detached A's output from the graph.
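
To make the difference concrete, here is a minimal runnable sketch of the first method. The networks A and B below are small nn.Linear layers chosen purely as stand-ins for illustration:

import torch
import torch.nn as nn
from torch.autograd import Variable

A = nn.Linear(3, 3)   # stand-in for network A
B = nn.Linear(3, 1)   # stand-in for network B
x = Variable(torch.randn(2, 3))

# First method: feed B a detached copy, keep y itself attached to A
y = A(x)
z = B(y.detach()).sum()
z.backward()
print(B.weight.grad is not None)  # True: B received gradients
print(A.weight.grad)              # None: nothing flowed back into A

# y is still connected to A, so a separate loss on y can still reach A
y.sum().backward()
print(A.weight.grad is not None)  # True: gradients reach A through y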