PyTorch is a Python-based deep learning library. The PyTorch source code has few levels of abstraction, a clear structure, and a moderate amount of code. Compared with the heavily engineered TensorFlow, PyTorch is an easy-to-start, excellent deep learning framework.
For learning PyTorch systematically, the official project provides a very good introductory tutorial as well as deep learning examples, and enthusiastic users have shared more concise examples.
1. Overview
Unlike low-level libraries such as Theano and TensorFlow, or high-level wrappers such as Keras and Sonnet, PyTorch is a self-contained deep learning library (Figure 1).
Figure 1: Comparison of several deep learning libraries
As shown in Figure 2, PyTorch consists of three main functional blocks, from the lower layer to the upper layer.
Figure 2. PyTorch's main functional modules
1.1 Tensor computation engine (tensor computation)
The tensor computation engine is similar to NumPy and MATLAB; its basic object is the Tensor (analogous to NumPy's ndarray or a MATLAB array). Besides implementations of common CPU-based operations, PyTorch also provides efficient GPU implementations, which is critical for deep learning.
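For example, a minimal sketch of basic tensor operations on the CPU and, when a GPU is available, on the GPU (the shapes here are arbitrary):

import torch

a = torch.randn(3, 4)            # a CPU tensor, analogous to a NumPy ndarray
b = torch.randn(4, 5)
c = torch.mm(a, b)               # matrix multiplication on the CPU

if torch.cuda.is_available():    # the same computation on the GPU
    c_gpu = torch.mm(a.cuda(), b.cuda())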
1.2 Automatic differentiation mechanism (autograd)
As deep learning models become more and more complex, support for automatic differentiation is essential in a deep learning framework. PyTorch uses a dynamic differentiation mechanism; frameworks taking a similar approach include Chainer and DyNet. In contrast, Theano and TensorFlow use static automatic differentiation.
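A minimal sketch of the dynamic mechanism: the graph is built on the fly as operations execute, and backward() walks it to produce gradients (the expression below is arbitrary):

import torch
from torch.autograd import Variable

x = Variable(torch.ones(2, 2), requires_grad=True)
y = (x * x + 3 * x).sum()   # the graph is recorded as these operations run
y.backward()                # gradients are computed by traversing that graph
print(x.grad)               # dy/dx = 2 * x + 3, i.e. 5 for each element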
1.3 High-level neural network library (nn)
PyTorch also provides a high-level neural network module for common network structures such as fully connected layers, convolutions, RNNs, and so on. It also provides common objective functions, optimizers, and parameter-initialization methods.
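For instance, a small MLP with a loss function and an optimizer, built entirely from these predefined components (layer sizes and hyperparameters below are arbitrary):

import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 2))
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

x = Variable(torch.randn(32, 10))
target = Variable((torch.rand(32) * 2).long())   # random 0/1 class labels

loss = criterion(model(x), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()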
Here, we focus on how to define custom network structures.
2. Custom Module
Figure 3. PyTorch Module
Module is the basic way PyTorch organizes a neural network. A Module holds the model's parameters together with its computation logic, while Function carries the actual computation and defines the forward and backward logic.
Module is the base class for any neural network; all models in PyTorch must be subclasses of Module. Modules can be nested to form a tree structure: a Module achieves nesting simply by holding other Modules as attributes.
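A small sketch of such nesting, using the built-in nn.Linear (the names and sizes below are arbitrary):

import torch.nn as nn

class SubNet(nn.Module):
    def __init__(self):
        super(SubNet, self).__init__()
        self.fc = nn.Linear(10, 10)

    def forward(self, x):
        return self.fc(x)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.sub = SubNet()          # assigning a Module as an attribute nests it
        self.out = nn.Linear(10, 2)

    def forward(self, x):
        return self.out(self.sub(x))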
Note: as of this writing (04/2018), this part of the PyTorch interface is not yet stable; the explanation below may be inconsistent with the latest version, or even incorrect. Until the interface finally stabilizes, this content will not be updated; please refer directly to the latest PyTorch source code.
The following uses the simplest MLP network structure as an example to describe how to implement a custom network structure. The complete code can be found in the repo.
2.1 Function
Note: to support higher-order derivatives (i.e. gradients of gradients), PyTorch 0.2 introduced a new way of defining a Function. If higher-order gradients are not needed, the old method still works.
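For reference, a minimal sketch of the newer style, which uses static methods and an explicit ctx object (the class name ReLUF2 is made up here; the logic mirrors the ReLU example below, and the current docs should be checked for details):

import torch
from torch.autograd import Function

class ReLUF2(Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

# new-style Functions are invoked through .apply rather than by instantiation:
# y = ReLUF2.apply(x)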
Function is the core class of PyTorch's automatic differentiation mechanism. A Function has no parameters, i.e. it is stateless: in the forward pass it only receives inputs and returns the corresponding outputs; in the backward pass it receives the gradients of the outputs and returns the corresponding gradients of the inputs.
Here we only focus on how to define a custom Function. The full definition of Function can be found in the source code; the following is a simplified snippet:
class Function(object):

    def forward(self, *input):
        raise NotImplementedError

    def backward(self, *grad_output):
        raise NotImplementedError
Both the inputs and outputs of forward and backward are Tensor objects.
A Function object is callable, that is, it can be invoked with (). The inputs and outputs of such a call are Variable objects. The following code example implements a ReLU activation function and calls it:
import torch
from torch.autograd import Function

class ReLUF(Function):

    def forward(self, input):
        self.save_for_backward(input)

        output = input.clamp(min=0)
        return output

    def backward(self, output_grad):
        input, = self.saved_tensors

        input_grad = output_grad.clone()
        input_grad[input < 0] = 0
        return input_grad

# Test
if __name__ == "__main__":
    from torch.autograd import Variable

    torch.manual_seed(1111)
    a = torch.randn(2, 3)

    va = Variable(a, requires_grad=True)
    vb = ReLUF()(va)
    print(va.data, vb.data)

    vb.backward(torch.ones(va.size()))
    print(vb.grad.data, va.grad.data)
If backward needs to use the inputs of forward, the required inputs must be explicitly saved in forward. In the code above, forward uses self.save_for_backward to temporarily save the input, and backward then retrieves it from self.saved_tensors (a Python tuple).
Obviously, the inputs of forward should correspond to the outputs of backward, and the outputs of forward should correspond to the inputs of backward.
Because a Function may need to stash input tensors, it is recommended not to reuse Function objects, in order to avoid problems with memory being released too early. As shown in the sample code, a new ReLUF object is created on each forward call, rather than creating one at initialization time and calling it repeatedly in forward.
2.2 Module
Similar to Function, a Module object is also callable, and its inputs and outputs are likewise Variables. The difference is that a Module can have parameters. A Module therefore contains two main parts: parameters and computation logic (Function calls). Since the ReLU activation function has no parameters, here we use the most basic fully connected layer as an example of how to write a custom Module.
The computation logic of the fully connected layer is defined by the following Function:
import torch
from torch.autograd import Function

class LinearF(Function):

    def forward(self, input, weight, bias=None):
        self.save_for_backward(input, weight, bias)

        output = torch.mm(input, weight.t())
        if bias is not None:
            output += bias.unsqueeze(0).expand_as(output)

        return output

    def backward(self, grad_output):
        input, weight, bias = self.saved_tensors

        grad_input = grad_weight = grad_bias = None
        if self.needs_input_grad[0]:
            grad_input = torch.mm(grad_output, weight)
        if self.needs_input_grad[1]:
            grad_weight = torch.mm(grad_output.t(), input)
        if bias is not None and self.needs_input_grad[2]:
            grad_bias = grad_output.sum(0).squeeze(0)

        if bias is not None:
            return grad_input, grad_weight, grad_bias
        else:
            return grad_input, grad_weight
needs_input_grad is a tuple of bools with the same length as the number of forward arguments; it indicates whether each input requires a gradient, so that unnecessary computation can be avoided for inputs that do not need gradients.
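As a side note, the hand-written backward can be sanity-checked numerically with torch.autograd.gradcheck; a sketch (the shapes are arbitrary, and double precision is needed for the numerical comparison):

import torch
from torch.autograd import Variable, gradcheck

inputs = (
    Variable(torch.randn(20, 10).double(), requires_grad=True),  # input
    Variable(torch.randn(5, 10).double(), requires_grad=True),   # weight
    Variable(torch.randn(5).double(), requires_grad=True),       # bias
)
# wrap in a lambda so a fresh LinearF object is created for each call
print(gradcheck(lambda x, w, b: LinearF()(x, w, b), inputs, eps=1e-6, atol=1e-4))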
With the Function (here, LinearF) defining the basic computation logic, the Module only needs to allocate memory for the parameters at initialization time and, when called, pass the parameters to the corresponding Function object. The code is as follows:
import torch
import torch.nn as nn

class Linear(nn.Module):

    def __init__(self, in_features, out_features, bias=True):
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = nn.Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = nn.Parameter(torch.Tensor(out_features))
        else:
            self.register_parameter('bias', None)

    def forward(self, input):
        return LinearF()(input, self.weight, self.bias)
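A quick usage sketch of this custom layer; note that torch.Tensor allocates uninitialized memory, so the parameters are filled in before the layer is used (the sizes and the init scheme below are arbitrary):

import torch
from torch.autograd import Variable

fc = Linear(3, 4)
fc.weight.data.normal_(0, 0.1)   # parameters start out uninitialized
fc.bias.data.zero_()

x = Variable(torch.randn(2, 3))
y = fc(x)                        # invokes forward, i.e. LinearF()(x, weight, bias)
print(y.size())                  # torch.Size([2, 4])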
It is important to note that a parameter's memory is held by a Tensor, but the Tensor needs to be wrapped in a Parameter object. Parameter is a special subclass of Variable; the only difference is that a Parameter has requires_grad=True by default. Variable is the core class of the automatic differentiation mechanism and is not covered here; see the tutorial.
3. Custom Recurrent Neural Network (RNN)
Let us try to define a more complex Module ourselves: an RNN. Here we only define the most basic vanilla RNN (Figure 4), whose basic computation formula is as follows:
$h_t = \mathrm{ReLU}(W \cdot x_t + U \cdot h_{t-1})$
Figure 4. RNN (source)
Implementing the more complex LSTM, GRU, or other variants is very similar.
3.1 Defining the cell
import math

import torch
from torch.nn import Module, Parameter

class RNNCell(Module):

    def __init__(self, input_size, hidden_size):
        super(RNNCell, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size

        self.weight_ih = Parameter(torch.Tensor(hidden_size, input_size))
        self.weight_hh = Parameter(torch.Tensor(hidden_size, hidden_size))
        self.bias_ih = Parameter(torch.Tensor(hidden_size))
        self.bias_hh = Parameter(torch.Tensor(hidden_size))

        self.reset_parameters()

    def reset_parameters(self):
        stdv = 1.0 / math.sqrt(self.hidden_size)
        for weight in self.parameters():
            weight.data.uniform_(-stdv, stdv)

    def forward(self, input, h):
        output = LinearF()(input, self.weight_ih, self.bias_ih) + \
                 LinearF()(h, self.weight_hh, self.bias_hh)
        output = ReLUF()(output)
        return output
3.2 Defining the complete RNN
import torch
from torch.nn import Module

class RNN(Module):

    def __init__(self, input_size, hidden_size):
        super(RNN, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.cell = RNNCell(input_size, hidden_size)

    def forward(self, inputs, initial_state):
        time_steps = inputs.size(1)
        state = initial_state
        outputs = []
        for t in range(time_steps):
            state = self.cell(inputs[:, t, :], state)
            outputs.append(state)
        return outputs
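A minimal sketch of running the RNN defined above on random data (batch size 4, 5 time steps, input size 3, and hidden size 6 are arbitrary choices):

import torch
from torch.autograd import Variable

rnn = RNN(3, 6)
inputs = Variable(torch.randn(4, 5, 3))   # (batch, time, features)
h0 = Variable(torch.zeros(4, 6))          # initial hidden state
outputs = rnn(inputs, h0)
print(len(outputs))                       # 5, one hidden state per time step
print(outputs[-1].size())                 # torch.Size([4, 6])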
The complete runnable code can be found in the repo.
Discussion
PyTorch's Module structure is inherited from Torch and has also been borrowed by Keras (the functional API). In some earlier deep learning frameworks such as Caffe, a network is composed of layers arranged in some topology. In (Py)Torch there is no distinction between a layer and a network: everything is a callable Module. The inputs and outputs of a Module call are Tensors (wrapped in Variables), so users can construct arbitrary directed acyclic graph (DAG) structures very naturally.
At the same time, PyTorch's autograd mechanism is only thinly encapsulated, so it is relatively easy to customize backpropagation or modify gradients. This is very important for some algorithms.
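One common way to modify gradients, shown here as a small illustrative sketch, is a hook registered on a Variable (register_hook lets you inspect or rewrite the gradient as it flows back):

import torch
from torch.autograd import Variable

x = Variable(torch.randn(3), requires_grad=True)
x.register_hook(lambda grad: grad * 0.5)   # scale the gradient flowing into x
y = (x * 2).sum()
y.backward()
print(x.grad)                              # 2 * 0.5 = 1.0 for each element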
In summary, PyTorch is a very elegant deep learning framework, especially well suited to implementing custom algorithms.