Stanford CS231n 2017, the Newest Edition of the Course: Fei-Fei Li on Deep Learning Framework Implementation and Comparison. By Zhuzhibosmith, June 19, 2017, 13:37
Stanford's course CS231n (Convolutional Neural Networks for Visual Recognition) is widely regarded in academia as an essential foundation course in deep learning and computer vision. This April the course ran again: the new CS231n Spring 2017 is still led by Fei-Fei Li and brings plenty of fresh content. What Heart of the Machine shares today is the eighth lecture, Deep Learning Software. Its main contents are: a comparison of CPUs and GPUs, an introduction to deep learning frameworks, TensorFlow and PyTorch examples, and a comparison of the major deep learning frameworks.
I. CPU and GPU
CPU: fewer cores;
but each core is faster and more capable;
better suited to sequential tasks.
GPU: more cores;
but each core is slower and weaker;
better suited to parallel tasks.
II. A Brief Introduction to Deep Learning Frameworks
Last year we only had Caffe, Torch, Theano, and TensorFlow to choose from; this year a whole series of new frameworks has been added on top of that basis: Caffe2, PyTorch, PaddlePaddle, CNTK, MXNet, and so on; you could call it "letting a hundred flowers bloom." The most commonly used frameworks today are PyTorch and TensorFlow, with Caffe and Caffe2 next.
The key points of a deep learning framework are:
(1) it makes it easy to build large computational graphs;
(2) it makes it easy to compute gradients in those computational graphs;
(3) it can run efficiently on GPUs (cuDNN, cuBLAS, etc.).
III. A Simple TensorFlow Example
Below we walk through a simple example of training a neural network in TensorFlow: training a two-layer network on random data, with ReLU as the activation function.
A. Defining the computational graph
1. Create placeholders for the input x, the weights w1 and w2, and the targets y:
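The original code figure is not reproduced in the article; a minimal sketch in the TensorFlow 1.x API (the sizes N, D, H are assumptions) might look like:

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API

# batch size N, input/output dimension D, hidden dimension H (assumed values)
N, D, H = 64, 1000, 100
x = tf.placeholder(tf.float32, shape=(N, D))
y = tf.placeholder(tf.float32, shape=(N, D))
w1 = tf.placeholder(tf.float32, shape=(D, H))
w2 = tf.placeholder(tf.float32, shape=(H, D))
```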
2. Define the forward pass: this computes the prediction for y and the loss. In fact, no computation happens here; we are only building the graph.
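A sketch of this step, continuing the code above:

```python
# build graph nodes only; nothing is computed yet
h = tf.maximum(tf.matmul(x, w1), 0)                      # hidden layer with ReLU
y_pred = tf.matmul(h, w2)                                # prediction for y
diff = y_pred - y
loss = tf.reduce_mean(tf.reduce_sum(diff ** 2, axis=1))  # L2 loss
```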
3. Tell TensorFlow to compute the gradients of the loss with respect to w1 and w2. Still no computation happens; this only adds to the graph.
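For example:

```python
# adds gradient nodes to the graph; still nothing is computed
grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])
```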
B. Running the graph
Now that we have finished building the graph, we enter a session and run parts of it.
Create numpy arrays that will be fed into the placeholders defined above:
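A sketch, with the same assumed sizes as above:

```python
# concrete numpy values for each placeholder
values = {x: np.random.randn(N, D),
          w1: np.random.randn(D, H),
          w2: np.random.randn(H, D),
          y: np.random.randn(N, D)}
```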
Run the graph: feed numpy arrays for x, y, w1, and w2, and get back the loss and the gradients of w1 and w2 as numpy arrays:
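Roughly:

```python
with tf.Session() as sess:
    # run the graph once; returns concrete numpy arrays
    out = sess.run([loss, grad_w1, grad_w2], feed_dict=values)
    loss_val, grad_w1_val, grad_w2_val = out
```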
Train the network: run the graph repeatedly, using the gradients to update the weights:
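A sketch of the training loop (the learning rate is an assumption):

```python
learning_rate = 1e-5
with tf.Session() as sess:
    for t in range(50):
        out = sess.run([loss, grad_w1, grad_w2], feed_dict=values)
        loss_val, grad_w1_val, grad_w2_val = out
        # update the weight arrays in place on the CPU
        values[w1] -= learning_rate * grad_w1_val
        values[w2] -= learning_rate * grad_w2_val
```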
Change w1 and w2 from placeholder() to Variable():
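For example:

```python
# the weights now live inside the graph, initialized from a random normal
w1 = tf.Variable(tf.random_normal((D, H)))
w2 = tf.Variable(tf.random_normal((H, D)))
```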
Add assign operations that update w1 and w2 (as part of the graph):
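A sketch:

```python
learning_rate = 1e-5
# the assign ops are themselves graph nodes
new_w1 = w1.assign(w1 - learning_rate * grad_w1)
new_w2 = w2.assign(w2 - learning_rate * grad_w2)
```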
Run the graph once to initialize w1 and w2, then run it for many training iterations:
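A sketch (only the data placeholders are fed now, since the weights are Variables inside the graph):

```python
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # initialize w1, w2 once
    values = {x: np.random.randn(N, D), y: np.random.randn(N, D)}
    for t in range(50):
        loss_val, = sess.run([loss], feed_dict=values)
```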
The complete code is as follows:
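The code figure is missing from the article; a reconstruction of the whole program up to this point (deliberately still containing the bug discussed next) might be:

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x

N, D, H = 64, 1000, 100
x = tf.placeholder(tf.float32, shape=(N, D))
y = tf.placeholder(tf.float32, shape=(N, D))
w1 = tf.Variable(tf.random_normal((D, H)))
w2 = tf.Variable(tf.random_normal((H, D)))

h = tf.maximum(tf.matmul(x, w1), 0)
y_pred = tf.matmul(h, w2)
diff = y_pred - y
loss = tf.reduce_mean(tf.reduce_sum(diff ** 2, axis=1))
grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])

learning_rate = 1e-5
new_w1 = w1.assign(w1 - learning_rate * grad_w1)
new_w2 = w2.assign(w2 - learning_rate * grad_w2)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    values = {x: np.random.randn(N, D), y: np.random.randn(N, D)}
    for t in range(50):
        # note: new_w1/new_w2 are never fetched, so the weights never change
        loss_val, = sess.run([loss], feed_dict=values)
```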
But there is a problem: the loss does not decrease. This is because the assign operations are never actually executed.
We therefore add a dummy node to the graph that depends on the updates, and tell the graph to compute that dummy node:
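A sketch:

```python
updates = tf.group(new_w1, new_w2)  # dummy node depending on both assign ops

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    values = {x: np.random.randn(N, D), y: np.random.randn(N, D)}
    for t in range(50):
        # fetching `updates` forces the assign ops to run
        loss_val, _ = sess.run([loss, updates], feed_dict=values)
```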
We can instead use an optimizer to compute the gradients and update the weights; remember to execute the optimizer's output:
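For example:

```python
optimizer = tf.train.GradientDescentOptimizer(1e-5)
updates = optimizer.minimize(loss)  # computes gradients and updates the weights

# inside the session loop, fetch the optimizer's output as well:
loss_val, _ = sess.run([loss, updates], feed_dict=values)
```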
Use a predefined common loss function:
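For instance:

```python
loss = tf.losses.mean_squared_error(y_pred, y)  # predefined L2 loss
```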
Initialize using Xavier; tf.layers automatically sets up the weights and biases for us:
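A sketch in the TF 1.x layers API:

```python
init = tf.contrib.layers.xavier_initializer()
h = tf.layers.dense(inputs=x, units=H,
                    activation=tf.nn.relu, kernel_initializer=init)
y_pred = tf.layers.dense(inputs=h, units=D, kernel_initializer=init)
```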
C. High-level wrapper: Keras
Keras can be thought of as a layer sitting on top of TensorFlow that makes common tasks simpler (it also supports a Theano backend).
Define the model as a sequence of layers:
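A sketch using the Keras Sequential API (sizes assumed as before):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation

N, D, H = 64, 1000, 100
model = Sequential()
model.add(Dense(H, input_dim=D))  # first linear layer
model.add(Activation('relu'))
model.add(Dense(D))               # second linear layer
```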
Define the optimizer object:
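For example:

```python
from keras.optimizers import SGD

optimizer = SGD(lr=1e0)
```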
Build the model and explicitly specify the loss function:
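For example:

```python
model.compile(loss='mean_squared_error', optimizer=optimizer)
```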
Training the model then takes just one line of code:
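For example (the argument names follow the Keras 2 API):

```python
x, y = np.random.randn(N, D), np.random.randn(N, D)
history = model.fit(x, y, epochs=50, batch_size=N, verbose=0)
```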
In addition to Keras, there are other high-level wrappers available to use; the lecture also mentions TFLearn, TensorLayer, tf.layers, TF-Slim, tf.contrib.learn, and Sonnet.
IV. A PyTorch Example
PyTorch is a deep learning framework launched by Facebook that is widely used in both industry and academia. It has three levels of abstraction:
Tensor: an imperative multidimensional array object (ndarray) that can run on the GPU;
Variable: a node in a computational graph, which stores data and gradients;
Module: a neural network layer, which may store state or learnable weights.
The rough equivalences between the PyTorch and TensorFlow abstractions: a PyTorch Tensor plays the role of a NumPy array; a Variable corresponds to a TensorFlow Tensor, Variable, or placeholder; and a Module corresponds to a wrapper such as tf.layers, TF-Slim, or TFLearn.
A. Tensors in PyTorch
Tensors in PyTorch are just like NumPy arrays, except that they can run on the GPU.
Here we set up a two-layer network using PyTorch tensors.
Here's a step-by-step explanation:
1. Create random tensors for the data and the weights:
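A sketch in the 2017-era PyTorch API (sizes assumed):

```python
import torch

dtype = torch.FloatTensor  # CPU float tensor

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in).type(dtype)
y = torch.randn(N, D_out).type(dtype)
w1 = torch.randn(D_in, H).type(dtype)
w2 = torch.randn(H, D_out).type(dtype)
```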
2. Set up the forward pass: compute the predictions and the loss:
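Continuing the sketch:

```python
h = x.mm(w1)                      # first linear layer
h_relu = h.clamp(min=0)           # ReLU
y_pred = h_relu.mm(w2)            # second linear layer
loss = (y_pred - y).pow(2).sum()  # squared error loss
```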
3. Set up the backward pass: compute the gradients:
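A sketch of the hand-written backward pass:

```python
grad_y_pred = 2.0 * (y_pred - y)
grad_w2 = h_relu.t().mm(grad_y_pred)
grad_h_relu = grad_y_pred.mm(w2.t())
grad_h = grad_h_relu.clone()
grad_h[h < 0] = 0                 # backprop through the ReLU
grad_w1 = x.t().mm(grad_h)
```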
4. Perform gradient descent on the weights:
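For example (the learning rate is an assumption):

```python
learning_rate = 1e-6
w1 -= learning_rate * grad_w1
w2 -= learning_rate * grad_w2
```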
5. To run on a GPU, set the tensors to a CUDA data type:
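For example:

```python
dtype = torch.cuda.FloatTensor  # tensors created with this dtype live on the GPU
```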
B. Autograd in PyTorch
PyTorch Tensors and Variables have the same API. Variables remember how they were created (for the sake of backpropagation).
Again, a step-by-step explanation:
1. We do not want gradients of the loss with respect to the data, but we do want gradients with respect to the weights. The corresponding settings are shown below:
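A sketch (pre-0.4 PyTorch, where Variables are explicit):

```python
import torch
from torch.autograd import Variable

N, D_in, H, D_out = 64, 1000, 100, 10
# no gradients needed with respect to the data...
x = Variable(torch.randn(N, D_in), requires_grad=False)
y = Variable(torch.randn(N, D_out), requires_grad=False)
# ...but we do want gradients with respect to the weights
w1 = Variable(torch.randn(D_in, H), requires_grad=True)
w2 = Variable(torch.randn(H, D_out), requires_grad=True)
```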
2. The forward pass here looks just like the Tensor version, but note that everything is now a Variable:
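Continuing the sketch:

```python
y_pred = x.mm(w1).clamp(min=0).mm(w2)
loss = (y_pred - y).pow(2).sum()
```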
3. Compute the gradients of the loss with respect to w1 and w2 (zeroing the gradients first):
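A sketch:

```python
# zero out any stale gradients, then backpropagate
if w1.grad is not None: w1.grad.data.zero_()
if w2.grad is not None: w2.grad.data.zero_()
loss.backward()   # fills in w1.grad and w2.grad
```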
4. Apply the gradients to the weights:
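A sketch:

```python
learning_rate = 1e-6
w1.data -= learning_rate * w1.grad.data
w2.data -= learning_rate * w2.grad.data
```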
C. Defining a new Autograd function
Define your own autograd function by writing the forward and backward passes for tensors:
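A sketch using the 2017-era (pre-0.4) autograd.Function API; a ReLU is the natural example here:

```python
import torch

class ReLU(torch.autograd.Function):
    def forward(self, x):
        self.save_for_backward(x)
        return x.clamp(min=0)

    def backward(self, grad_y):
        x, = self.saved_tensors
        grad_input = grad_y.clone()
        grad_input[x < 0] = 0   # zero the gradient where the input was negative
        return grad_input
```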
The new autograd function can then be used in the forward pass:
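For example:

```python
relu = ReLU()
y_pred = relu(x.mm(w1)).mm(w2)
```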
D. The nn module in PyTorch
PyTorch's nn package provides a higher-level wrapper for working with neural nets, similar to Keras. The complete code is as follows:
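The code figure is missing from the article; a reconstruction in the pre-0.4 API (sizes assumed) might be:

```python
import torch
from torch.autograd import Variable

N, D_in, H, D_out = 64, 1000, 100, 10
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out))

# the model is a sequence of layers
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out))

# a commonly used loss function
loss_fn = torch.nn.MSELoss(size_average=False)

learning_rate = 1e-4
for t in range(500):
    y_pred = model(x)            # forward: feed data to the model
    loss = loss_fn(y_pred, y)    # feed the prediction to the loss
    model.zero_grad()
    loss.backward()              # backward: compute all gradients
    for param in model.parameters():
        # gradient step on each model parameter
        param.data -= learning_rate * param.grad.data
```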
A step-by-step explanation:
Define our model as a sequence of layers:
Also define a commonly used loss function:
Forward pass: feed the data to the model, and feed the predictions to the loss function:
Backward pass: compute all the gradients:
Make a gradient step on each model parameter:
Here we instead add an optimizer:
After the gradients are computed, the optimizer updates all of the parameters, as the sketch below shows:
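A sketch of the optimizer variant (Adam with an assumed learning rate), replacing the manual update in the complete code above:

```python
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for t in range(500):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()   # update all parameters after the gradients are computed
```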
E. Neural networks in PyTorch: defining a new model
A Module in PyTorch is a neural net layer. Note that its inputs and outputs are Variables, and that a module can contain weights (stored as Variables) or other modules; you can define your own modules using autograd. The detailed code is as follows:
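The code figure is missing; a reconstruction (pre-0.4 API, sizes assumed) might be:

```python
import torch
from torch.autograd import Variable

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        # two submodules, set up in the initializer
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        # forward pass via submodules and autograd ops; no backward needed
        h_relu = self.linear1(x).clamp(min=0)
        return self.linear2(h_relu)

N, D_in, H, D_out = 64, 1000, 100, 10
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out))

# construct and train an instance of the model
model = TwoLayerNet(D_in, H, D_out)
criterion = torch.nn.MSELoss(size_average=False)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
    y_pred = model(x)
    loss = criterion(y_pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```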
A step-by-step explanation:
1. Define our whole model as a single module:
2. Set up two submodules in the initializer (a parent module can contain child modules):
3. Define the forward pass using the submodules and autograd ops on Variables; there is no need to define the backward pass, because autograd handles it:
4. Construct and train an instance of the model:
F. DataLoaders in PyTorch
A DataLoader wraps a Dataset and provides minibatching, shuffling, and multithreading for you; when you need to load custom data, just write your own Dataset class.
Iterate over the loader to form minibatches; the loader hands you Tensors, so you need to wrap them in Variables:
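A sketch (reusing the model, loss, and optimizer from above; the batch size is an assumption):

```python
from torch.utils.data import TensorDataset, DataLoader

dataset = TensorDataset(x.data, y.data)   # wrap the tensors in a Dataset
loader = DataLoader(dataset, batch_size=8, shuffle=True)

for epoch in range(10):
    for x_batch, y_batch in loader:
        # the loader yields Tensors, so wrap them in Variables
        x_var, y_var = Variable(x_batch), Variable(y_batch)
        loss = criterion(model(x_var), y_var)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```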
Note: torchvision makes it especially easy to use pretrained models.
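For example:

```python
import torchvision.models as models

# one line downloads and builds a pretrained ResNet-101
resnet101 = models.resnet101(pretrained=True)
```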
G. A brief comparison of Torch and PyTorch
In the lecture's summary: Torch is written in Lua and lacks autograd, but is more mature and stable; PyTorch is written in Python and supports autograd, but is newer and still changing.
Conclusion: try to use PyTorch for your new projects.
V. A Brief Introduction to Caffe2
VI. The Deep Learning Framework Debate: Which Is Better?
Which deep learning framework to choose really depends on what we are doing. After consulting the relevant literature, we can roughly draw the following conclusions (for reference only): PyTorch and Torch are better suited to academic research, while TensorFlow, Caffe, and Caffe2 are better suited to deployment in industrial production. Caffe is suited to working with static graphs, Torch and PyTorch to dynamic graphs, and TensorFlow is useful in both cases. TensorFlow and Caffe2 can also be used on mobile.
Appendix: the main reference is cs231n_2017_lecture8; the slides (PPT) can be downloaded directly at http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture8.pdf
Other references:
http://203.187.160.132:9011/dl.ee.cuhk.edu.hk/c3pr90ntc0td/slides/tutorial-caffe.pdf
http://203.187.160.132:9011/dl.ee.cuhk.edu.hk/c3pr90ntc0td/slides/dl_in_action.pdf