Yesterday (April 25), Facebook released PyTorch 0.4.0, which brings a number of updates and changes, such as Windows support and the merging of Variable and Tensor; see the article "PyTorch Major Update" for details.
This article is a migration guide describing the code changes you need to make when moving from an earlier version to the new one:
Merging of Tensors and Variables
Support for 0-dimensional (scalar) Tensors
Deprecation of the volatile flag
dtypes, devices, and NumPy-style Tensor creation functions
Writing device-agnostic code
▌ Merging the Tensor and Variable classes
In the new version, torch.autograd.Variable and torch.Tensor belong to the same class. More precisely, torch.Tensor can track history and behaves like the old Variable; the Variable wrapper still works as before, but it returns an object of type torch.Tensor. This means your code no longer needs Variable wrappers anywhere.
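As a quick illustration (a minimal sketch of the behaviour described above), wrapping a tensor in Variable now simply gives you back a torch.Tensor:
>>> from torch.autograd import Variable
>>> v = Variable(torch.ones(1))   # old-style wrapping still works...
>>> isinstance(v, torch.Tensor)   # ...but the result is just a Tensor
True
>>> type(v)
<class 'torch.Tensor'>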
Change in type() for Tensors
Note that type() of a tensor no longer reflects its data type. Use isinstance() or x.type() instead:
>>> x = torch.DoubleTensor([1, 1, 1])
>>> print(type(x))  # was torch.DoubleTensor
<class 'torch.Tensor'>
>>> print(x.type())  # OK: 'torch.DoubleTensor'
'torch.DoubleTensor'
>>> print(isinstance(x, torch.DoubleTensor))  # OK: True
True
When autograd starts tracking history
requires_grad, the central flag for autograd, is now an attribute on Tensors. Let's see how this change is reflected in code. Autograd uses the same rules previously applied to Variables: it starts tracking history as soon as any input Tensor of an operation has requires_grad=True. The code looks like this:
>>> x = torch.ones(1)  # create a tensor with requires_grad=False (default)
>>> x.requires_grad
False
>>> y = torch.ones(1)  # another tensor with requires_grad=False
>>> z = x + y
>>> # both inputs have requires_grad=False, so does the output
>>> z.requires_grad
False
>>> # then autograd won't track this computation. let's verify!
>>> z.backward()
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
>>>
>>> # now create a tensor with requires_grad=True
>>> w = torch.ones(1, requires_grad=True)
>>> w.requires_grad
True
>>> # add to the previous result that has requires_grad=False
>>> total = w + z
>>> # the total sum now requires grad!
>>> total.requires_grad
True
>>> # autograd can compute the gradients as well
>>> total.backward()
>>> w.grad
tensor([1.])
>>> # and no computation is wasted to compute gradients for x, y and z, which don't require grad
>>> z.grad == x.grad == y.grad == None
True
Manipulating requires_grad
In addition to setting the attribute directly, you can change the flag in place with my_tensor.requires_grad_(requires_grad=True), or, as in the example above, pass it as an argument at creation time (the default is False). The code is as follows:
>>> existing_tensor.requires_grad_()
>>> existing_tensor.requires_grad
True
>>> my_tensor = torch.zeros(3, 4, requires_grad=True)
>>> my_tensor.requires_grad
True
About .data
.data was the primary way to get the underlying Tensor from a Variable. After the merge, calling y = x.data still has similar semantics: y is a Tensor that shares the same data as x, has requires_grad=False, and is disconnected from the computation history of x.
However, .data can be unsafe in some cases. Any change to x.data is not tracked by autograd, so if x is needed in the backward pass, the computed gradients will be incorrect. A safer alternative is x.detach(), which also returns a Tensor that shares data and has requires_grad=False, but whose in-place changes will be reported by autograd if x is needed in the backward pass.
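A minimal sketch of the difference (hypothetical tensors; the behaviour follows the description above):
>>> x = torch.ones(2, requires_grad=True)
>>> y = x.sigmoid()        # sigmoid saves its output for the backward pass
>>> y.data.zero_()         # NOT tracked: backward would silently compute wrong gradients
>>> y.detach().zero_()     # tracked: backward will raise an error instead of being silently wrong
>>> y.sum().backward()     # RuntimeError: a variable needed for gradient computation was modified in place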
▌ Support for 0-dimensional (scalar) Tensors
In previous versions, indexing into a Tensor vector (1-D tensor) returned a Python number, but indexing into a Variable vector returned a vector of size (1,). The reduction functions behaved similarly: tensor.sum() returned a Python number, while variable.sum() returned a vector of size (1,).
Fortunately, the new version of PyTorch introduces proper scalar (0-dimensional tensor) support. Scalars can be created with the new torch.tensor function (explained in more detail later; for now, think of it as the PyTorch equivalent of numpy.array), as follows:
>>> torch.tensor(3.1416)  # create a scalar directly
tensor(3.1416)
>>> torch.tensor(3.1416).size()  # scalar is 0-dimensional
torch.Size([])
>>> torch.tensor([3]).size()  # compare to a vector of size 1
torch.Size([1])
>>>
>>> vector = torch.arange(2, 6)  # this is a vector
>>> vector
tensor([2., 3., 4., 5.])
>>> vector.size()
torch.Size([4])
>>> vector[3]  # indexing into a vector gives a scalar
tensor(5.)
>>> vector[3].item()  # .item() gives the value as a Python number
5.0
>>> mysum = torch.tensor([2, 3]).sum()
>>> mysum
tensor(5)
>>> mysum.size()
torch.Size([])
Accumulating losses
Consider the total_loss += loss.data[0] pattern widely used before PyTorch 0.4.0. Here loss was a Variable wrapping a tensor of size (1,), but in 0.4.0 loss is a 0-dimensional scalar. Indexing into a scalar is meaningless (the current version raises a warning, and 0.5.0 will raise a hard error): use loss.item() to get the Python number from a scalar.
Note that if you fail to convert the loss to a Python number when accumulating it, your program's memory usage may grow. This is because the right-hand side of the expression above used to be a Python float, whereas it is now a 0-dimensional tensor. The total loss would then accumulate tensors together with their gradient history, keeping large autograd graphs alive much longer than necessary.
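A minimal sketch of the updated accumulation pattern (model, criterion, optimizer and loader are hypothetical placeholders):
total_loss = 0.0
for input, target in loader:           # hypothetical data loader
    output = model(input)              # hypothetical model
    loss = criterion(output, target)   # loss is now a 0-dimensional tensor
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    total_loss += loss.item()          # convert to a Python float before accumulating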
▌ Deprecation of the volatile flag
In the new version, the volatile flag is deprecated and no longer has any effect. Previously, any computation involving a Variable with volatile=True was not tracked by autograd. This has been replaced by a more flexible set of context managers, including torch.no_grad(), torch.set_grad_enabled(grad_mode), and so on. The code is as follows:
>>> x = torch.zeros(1, requires_grad=True)
>>> with torch.no_grad():
...     y = x * 2
>>> y.requires_grad
False
>>>
>>> is_train = False
>>> with torch.set_grad_enabled(is_train):
...     y = x * 2
>>> y.requires_grad
False
>>> torch.set_grad_enabled(True)  # this can also be used as a function
>>> y = x * 2
>>> y.requires_grad
True
>>> torch.set_grad_enabled(False)
>>> y = x * 2
>>> y.requires_grad
False
▌ dtypes, devices and NumPy-style creation functions
In previous versions of PyTorch, the data type (e.g. float vs double), device type (cpu vs cuda) and layout (dense vs sparse) were specified together as a "tensor type". For example, torch.cuda.sparse.DoubleTensor was the Tensor type for double-precision data living on CUDA devices with a COO sparse layout.
The new version introduces the torch.dtype, torch.device and torch.layout classes to manage these properties better, together with NumPy-style creation functions.
torch.dtype
The following is the complete list of available torch.dtypes (data types) and their corresponding tensor types.
Use torch.set_default_dtype and torch.get_default_dtype to manipulate the default dtype for floating-point tensors.
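A short sketch of dtypes in practice (assuming a stock build where the default dtype is torch.float32):
>>> torch.get_default_dtype()
torch.float32
>>> torch.tensor([1.0, 2.0]).dtype  # floating-point literals use the default dtype
torch.float32
>>> torch.tensor([1.0, 2.0], dtype=torch.float64).dtype
torch.float64
>>> torch.set_default_dtype(torch.float64)
>>> torch.tensor([1.0, 2.0]).dtype
torch.float64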
torch.device
torch.device contains a device type ('cpu' or 'cuda') and an optional device ordinal (id). It can be initialized as torch.device('{device_type}') or torch.device('{device_type}:{device_ordinal}').
If the device ordinal is not given, the object represents the current device for that device type; for example, torch.device('cuda') is equivalent to torch.device('cuda:X'), where X is the result of torch.cuda.current_device().
The device of a tensor can be obtained by accessing its device property.
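A brief sketch (the last line assumes a machine with at least one CUDA device):
>>> torch.device('cpu')
device(type='cpu')
>>> torch.device('cuda:0')
device(type='cuda', index=0)
>>> torch.zeros(3).device
device(type='cpu')
>>> torch.zeros(3, device=torch.device('cuda:0')).device
device(type='cuda', index=0)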
torch.layout
torch.layout represents the data layout of a tensor. In the new version, torch.strided (dense tensors) and torch.sparse_coo (sparse tensors in COO format) are supported.
The layout of a tensor can be obtained by accessing its layout property.
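For example (a sketch; the sparse tensor is built with the pre-0.4 style constructor, which still works):
>>> torch.zeros(2, 3).layout
torch.strided
>>> i = torch.LongTensor([[0, 1], [2, 0]])
>>> v = torch.FloatTensor([3., 4.])
>>> torch.sparse.FloatTensor(i, v, torch.Size([2, 3])).layout
torch.sparse_coo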
Create a tensor
In the new version, tensor creation functions accept dtype, device, layout and requires_grad options to specify the desired properties of the returned Tensor. The code is as follows:
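A minimal sketch of such calls under the 0.4.0 API (the second line assumes a CUDA device is available):
x = torch.ones(2, 3, dtype=torch.float64, requires_grad=True)  # dtype and grad flag set at creation
y = torch.zeros(5, device=torch.device('cuda:0'))              # explicit device object
z = torch.randn(4, 4, dtype=torch.float32, device='cpu')       # device can also be given as a string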