A server is typically equipped with multiple GPUs, and by default, when a deep learning training task starts, TensorFlow allocates almost all of the memory on every GPU. As a result, the server can run only a single task, even though that task may not need nearly so many resources, which amounts to a waste of resources.
The following solutions are available for this issue.
First, directly set the visible GPUs
Write a launch script that sets the CUDA_VISIBLE_DEVICES environment variable before starting the training program:
export CUDA_VISIBLE_DEVICES=0
python model.py
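If editing the launch command is inconvenient, the same restriction can be applied from inside the Python program, as long as the variable is set before TensorFlow initializes the GPU. A minimal sketch, assuming the script only needs GPU 0:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only GPU 0 to this process

import tensorflow as tf  # must be imported after the variable is set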
Second, set a memory limit for each GPU
import tensorflow as tf

# Let each process use at most about one third of each GPU's memory
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
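For reference, TensorFlow 2.x replaces tf.GPUOptions with the tf.config API. A rough sketch of a comparable cap is shown below; note that it sets a fixed megabyte limit rather than a fraction, and the 4096 MB figure is only an assumed placeholder to adjust for your hardware:

import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    # Cap the first visible GPU at roughly 4096 MB for this process
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=4096)])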
Third, use GPU memory incrementally
import tensorflow as tf

# Start with a small allocation and grow it as the model needs more memory
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
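In TensorFlow 2.x the same incremental behaviour is available through tf.config; a minimal sketch, assuming a TF 2.x installation and that it runs before any GPU has been initialized:

import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
for gpu in gpus:
    # Grow the allocation on demand instead of grabbing all memory up front
    tf.config.experimental.set_memory_growth(gpu, True)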
Remember that TensorFlow claims the full memory of all visible GPUs by default; the settings above are what allow several tasks to share the same server.