1. CPU vs. GPU:
A CPU has few cores, each of which is powerful, so it is better at serial tasks. A GPU has thousands of cores, each of which is weak, and has its own memory (several GB), which makes it ideal for parallel tasks. The most typical GPU workload is matrix operations.
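A minimal sketch of why matrix multiplication parallelizes so well (no GPU assumed; pure Python stand-in): every output element depends only on one row of A and one column of B, so all elements can be computed independently, one GPU thread each.

```python
def matmul(A, B):
    """Naive matrix multiply; each C[i][j] is an independent dot product,
    which is why a GPU can assign one thread per output element."""
    m, k = len(A), len(A[0])
    n = len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```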
GPU programming: 1) CUDA, which runs only on NVIDIA hardware; 2) OpenCL, which is similar to CUDA and has the advantage of running on any platform, but is relatively slow. For deep learning you can call off-the-shelf libraries without having to write CUDA code yourself.
Using cuDNN is several times faster than not using it.
The bottleneck in deep learning is often not GPU computation but the communication between the GPU and the data (reading from disk). Solutions: 1) read all data into RAM; 2) use an SSD instead of an HDD; 3) prefetch data ahead of time with CPU multithreading.
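Point 3 above can be sketched with a background thread and a bounded queue: the CPU loads the next batch while the GPU consumes the current one. This is a minimal stdlib-only sketch; `load_batch` is a hypothetical stand-in for a real disk read.

```python
import queue
import threading

def load_batch(i):
    # placeholder for an expensive disk read (the real bottleneck)
    return [i] * 4

def prefetcher(num_batches, buffer_size=2):
    """Yield batches loaded by a background thread, overlapping I/O with compute."""
    q = queue.Queue(maxsize=buffer_size)

    def worker():
        for i in range(num_batches):
            q.put(load_batch(i))   # blocks when the buffer is full
        q.put(None)                # sentinel: no more batches

    threading.Thread(target=worker, daemon=True).start()
    while True:
        batch = q.get()
        if batch is None:
            break
        yield batch

print(list(prefetcher(3)))  # [[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]]
```

In practice this is what `num_workers` in a framework's data loader does for you.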
2. Deep Learning Frameworks: Caffe (UC Berkeley) / Caffe2 (Facebook), Torch (NYU, Facebook) / PyTorch (Facebook), Theano (U Montreal) / TensorFlow (Google), Paddle (Baidu), CNTK (Microsoft), MXNet (Amazon).
Frameworks are divided into static-graph (TensorFlow, Caffe2) and dynamic-graph (PyTorch) styles. TensorFlow is a very safe choice. PyTorch is best suited for research. TensorFlow and Caffe2 are better suited to actually deploying applications.
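The static/dynamic distinction can be illustrated in plain Python (no framework assumed): a static graph is described symbolically once and then fed data, while a dynamic graph is just ordinary Python code rebuilt on every run, so data-dependent control flow is trivial.

```python
# Static style: the computation is declared up front as data, then executed.
def build_static_graph():
    ops = [("mul", "x", "w"), ("add", "_", "b")]   # symbolic description, "_" = previous result
    def run(feed):
        acc = None
        for op, a, b in ops:
            lhs = feed[a] if a != "_" else acc
            acc = lhs * feed[b] if op == "mul" else lhs + feed[b]
        return acc
    return run

graph = build_static_graph()             # "compile" once
print(graph({"x": 2, "w": 3, "b": 1}))   # 7

# Dynamic style: the Python code itself is the graph, rebuilt each call.
def dynamic_forward(x, w, b):
    y = x * w
    if y > 5:          # data-dependent branching, awkward in static graphs
        y = y + b
    return y

print(dynamic_forward(2, 3, 1))  # 7
```

Static graphs can be optimized and serialized for deployment; dynamic graphs are easier to debug, which matches the research vs. production split above.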
CS231n Spring Lecture 8 Lecture Notes