By default, Caffe is built against the ATLAS library, but that BLAS implementation does not use multiple CPU cores; OpenBLAS is needed to accelerate Caffe with multi-core parallel computation. This article explains how to use OpenBLAS.
After building Caffe against OpenBLAS with the default settings, the "ldd build/tools/caffe" command shows that the binary links to the single-threaded version of OpenBLAS, as follows:
$ ldd build/tools/caffe | grep openblas
libopenblas.so.0 => /lib64/libopenblas.so.0 (0x00007f1fe656f000)
When Caffe uses the multithreaded version of OpenBLAS, the output looks similar to the following; the trailing "p" in the .so file name marks the multithreaded version.
$ ldd build/tools/caffe | grep openblas
libopenblasp.so.0 => /lib64/libopenblasp.so.0 (0x00007f0854b90000)
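The check above can be wrapped in a small helper so it is easy to repeat after each rebuild. This is only a sketch: the function name blas_variant is made up for illustration, and it assumes the library naming described here (libopenblas.so for single-threaded, libopenblasp.so for multithreaded), which may differ on other distributions.

```shell
#!/bin/sh
# Hypothetical helper: classify the OpenBLAS variant named in `ldd` output.
# Reads the ldd output on stdin and prints one of:
#   multithreaded, single-threaded, none
blas_variant() {
    awk '
        # libopenblasp.so always wins, whatever order the lines come in.
        /libopenblasp\.so/ { found = "multithreaded" }
        # libopenblas.so only counts if no multithreaded library was seen.
        /libopenblas\.so/  { if (found == "") found = "single-threaded" }
        END { print (found == "" ? "none" : found) }
    '
}

# Usage (assumes Caffe was built in ./build):
#   ldd build/tools/caffe | blas_variant
```

The function only parses text, so it can be tried on saved ldd output without a Caffe build at hand.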
Let's see how to compile Caffe against the multithreaded version of OpenBLAS.
Compiling
First, edit the "Makefile.config" file and change
BLAS := atlas
to
BLAS := open
In the same file, set the BLAS_INCLUDE and BLAS_LIB parameters as follows:
BLAS_INCLUDE := /usr/include/openblas
BLAS_LIB := /usr/lib64/libopenblasp.so
Then, edit the "Makefile" file and change
LIBRARIES += openblas
to
LIBRARIES += openblasp
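The two substitutions can also be scripted. The following is a minimal sketch, not part of Caffe: patch_caffe_for_openblasp is a hypothetical helper, it assumes the lines appear in the files exactly in the forms shown above, and it leaves BLAS_INCLUDE and BLAS_LIB to be set by hand.

```shell
#!/bin/sh
# Hypothetical helper: apply the BLAS and LIBRARIES edits described above.
#   $1 = path to Makefile.config, $2 = path to Makefile
patch_caffe_for_openblasp() {
    config=$1
    makefile=$2
    # Keep backups so the change is easy to undo.
    cp "$config" "$config.bak" && cp "$makefile" "$makefile.bak" || return 1
    # Makefile.config: select OpenBLAS instead of ATLAS.
    sed 's/^BLAS[[:space:]]*:=[[:space:]]*atlas[[:space:]]*$/BLAS := open/' \
        "$config.bak" > "$config"
    # Makefile: link against the multithreaded library (openblasp).
    sed 's/^\([[:space:]]*LIBRARIES[[:space:]]*+=[[:space:]]*\)openblas[[:space:]]*$/\1openblasp/' \
        "$makefile.bak" > "$makefile"
}

# Usage, from the Caffe root directory:
#   patch_caffe_for_openblasp Makefile.config Makefile
```

Running the helper twice is harmless for the files themselves, but the second run overwrites the .bak copies with the already patched versions.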
After modifying the two files above, recompile Caffe:
$ make clean
$ make all
$ make test
$ make runtest
Once the compilation is complete, check the caffe binary with ldd again; it now links to the multithreaded version of OpenBLAS, as follows:
$ ldd build/tools/caffe | grep openblas
libopenblasp.so.0 => /lib64/libopenblasp.so.0 (0x00007f0854b90000)
Test
We run a training job to verify the result. To make Caffe use a specific number of CPU cores, set the OPENBLAS_NUM_THREADS environment variable, as follows:
$ export OPENBLAS_NUM_THREADS=2
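If the thread count should apply to a single command rather than the whole shell session, the variable can be set for that command only. A minimal sketch follows; run_with_threads is a hypothetical wrapper name, not something provided by Caffe or OpenBLAS.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with OPENBLAS_NUM_THREADS set only
# for that command, leaving the calling shell's environment untouched.
#   $1 = thread count, remaining arguments = command to run
run_with_threads() {
    n=$1
    shift
    case $n in
        ''|0|*[!0-9]*)
            echo "thread count must be a positive integer" >&2
            return 2
            ;;
    esac
    env OPENBLAS_NUM_THREADS="$n" "$@"
}

# Usage, from the Caffe root directory:
#   run_with_threads 2 ./examples/mnist/train_lenet.sh
```

This makes it easy to compare runs with different thread counts without re-exporting the variable each time.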
Next, download the training data and prepare it by running the following commands in the Caffe root directory:
$ ./data/mnist/get_mnist.sh
$ ./examples/mnist/create_mnist.sh
To train in CPU mode, edit the examples/mnist/lenet_solver.prototxt file and change
solver_mode: GPU
to
solver_mode: CPU
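The same edit can be made from the command line. This is a sketch under the assumption that the solver file contains a single solver_mode line in the form shown above; set_solver_mode is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: switch solver_mode in a solver prototxt file.
#   $1 = path to the solver file, $2 = CPU or GPU
set_solver_mode() {
    file=$1
    mode=$2
    case $mode in
        CPU|GPU) ;;
        *) echo "mode must be CPU or GPU" >&2; return 2 ;;
    esac
    tmp=$(mktemp) || return 1
    # Rewrite only the value after "solver_mode:", keeping indentation.
    sed "s/^\([[:space:]]*solver_mode:[[:space:]]*\)[A-Za-z]*/\1$mode/" \
        "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Copy the contents back so the original file keeps its permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Usage, from the Caffe root directory:
#   set_solver_mode examples/mnist/lenet_solver.prototxt CPU
```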
Run the training model:
$ ./examples/mnist/train_lenet.sh
After the training program starts, use the top command to observe the CPU usage of the process. Because OPENBLAS_NUM_THREADS was set to 2 above, the CPU utilization of the process is about 200%.
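For a non-interactive reading, ps can report the same figure. The sketch below uses a hypothetical helper, cpu_percent. On Linux, ps reports CPU usage averaged over the lifetime of the process rather than the instantaneous value top shows, so the number approaches 200% only after training has run for a while.

```shell
#!/bin/sh
# Hypothetical helper: print the CPU usage (in percent) of one process.
#   $1 = process ID
cpu_percent() {
    # "pcpu=" selects the CPU column and suppresses the header line;
    # tr strips the padding spaces ps adds.
    ps -o pcpu= -p "$1" | tr -d ' '
}

# Usage (assumes the training process is named "caffe" and that pgrep
# is available; -o picks the oldest matching process):
#   cpu_percent "$(pgrep -o caffe)"
```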