1. Install Ganglia. Use a 3.1.x release, because the GPU monitoring module only supports the 3.1.x series.
apt-get install ganglia*
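Since only the 3.1.x series is supported, it may be worth confirming which version apt actually installed. The following is a minimal sketch, assuming that `gmond --version` prints the version as the last field of its first output line; the helper name `is_supported_version` is made up for this illustration.

```shell
# Hypothetical helper: succeeds only for version strings in the 3.1.x series.
is_supported_version() {
    case "$1" in
        3.1|3.1.*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Only query gmond if it is actually installed on this machine.
if command -v gmond >/dev/null 2>&1; then
    # Assumption: the version is the last whitespace-separated field of line 1.
    installed=$(gmond --version 2>/dev/null | awk 'NR == 1 { print $NF }')
    if is_supported_version "$installed"; then
        echo "gmond $installed is in the supported 3.1.x series"
    else
        echo "gmond $installed is outside the 3.1.x series" >&2
    fi
fi
```

Matching on the `3.1.` prefix rather than on `3.1` alone keeps a release such as 3.10.x from being accepted by mistake.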
2. Download and install the pyNVML and NVML modules. Download address: https://github.com/ganglia/gmond_python_modules/tree/master/gpu
Install pyNVML. The installation documentation asks for Python 2.5 or earlier, but in practice it builds fine with the Python 2.7.3 that ships with the system, so there is no need to change the Python environment.
cd ~/nvidia/nvidia-ml-py-*
python setup.py install
Install the NVML module for Ganglia:
2.1 Copy the Python module to the Ganglia module directory
mkdir -p /usr/lib/ganglia/python_modules
cp python_modules/* /usr/lib/ganglia/python_modules
2.2 Copy the configuration files and the front-end graph files to the corresponding Ganglia directories
mkdir -p /etc/ganglia/conf.d
cp conf.d/* /etc/ganglia/conf.d
cp graph.d/* /usr/share/ganglia-webfrontend/graph.d/
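The copy steps in 2.1 and 2.2 all follow the same pattern: make sure the target directory exists, then copy the files into it. A small sketch of that pattern is shown here; `install_files` is a hypothetical helper name, not something shipped with Ganglia. It uses `cp -r`, so unlike a plain `cp dir/*` it would also copy any subdirectories.

```shell
# Hypothetical helper: copy the contents of a source directory into a target
# directory, creating the target first so the call is safe to repeat.
install_files() {
    src=$1
    dest=$2
    [ -d "$src" ] || { echo "missing source directory: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    # "$src"/. copies the directory's contents rather than the directory itself.
    cp -r "$src"/. "$dest"/
}

# Example calls mirroring the commands above (run as root from the gpu directory):
#   install_files python_modules /usr/lib/ganglia/python_modules
#   install_files conf.d         /etc/ganglia/conf.d
#   install_files graph.d        /usr/share/ganglia-webfrontend/graph.d
```

Because `mkdir -p` does not fail when the directory already exists, the helper can be run again after an interrupted installation.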
2.3 Patch the web front end
cp ganglia_web.patch /usr/share/ganglia-webfrontend/
cp ganglia_web.patch /usr/share/ganglia-webfrontend/templates/default/
cd /usr/share/ganglia-webfrontend/
cp host_view.php host_view.php.bak
patch < ganglia_web.patch
cd /usr/share/ganglia-webfrontend/templates/default/
cp host_view.tpl host_view.tpl.bak
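One thing to watch with the backup commands above: if the step is repeated, a plain `cp` overwrites the pristine `.bak` copy with the already-patched file. The sketch below only creates the backup when none exists yet; `backup_once` is a hypothetical helper name used for illustration.

```shell
# Hypothetical helper: create FILE.bak only if it does not exist yet, so the
# pristine copy is never replaced by an already-patched file.
backup_once() {
    file=$1
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    if [ -e "$file.bak" ]; then
        echo "keeping existing backup $file.bak"
    else
        # -p preserves the mode and timestamps of the original file.
        cp -p "$file" "$file.bak"
    fi
}

# Example (as root):
#   cd /usr/share/ganglia-webfrontend/ && backup_once host_view.php
```

GNU patch also accepts `--dry-run`, which can be used to check that the patch applies cleanly before the file is modified.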
3. Copy the server's /etc/ganglia/gmond.conf file to /etc/ganglia/ on the client, and create a new modpython.conf file in the conf.d directory with the following content:
modules {
  module {
    name = "python_module"
    path = "/usr/lib/ganglia/modpython.so"
    params = "/usr/lib/ganglia/python_modules"
  }
}
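When the file has to be created on many clients, writing it from a script avoids typing mistakes in the paths and the lowercase section names. A minimal sketch, assuming the configuration lives in /etc/ganglia/conf.d as described above; `write_modpython_conf` is a hypothetical helper name.

```shell
# Hypothetical helper: write the modpython.conf content to the given path.
write_modpython_conf() {
    target=$1
    # The quoted 'EOF' keeps the shell from expanding anything inside the block.
    cat > "$target" <<'EOF'
modules {
  module {
    name = "python_module"
    path = "/usr/lib/ganglia/modpython.so"
    params = "/usr/lib/ganglia/python_modules"
  }
}
EOF
}

# Example (as root):
#   write_modpython_conf /etc/ganglia/conf.d/modpython.conf
```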
4. Start the service
service ganglia-monitor start
5. Note: the steps above describe the client configuration. The server is configured with the same steps; it just needs a few additional services to be started. If the GPU graphs do not show up, run the following commands:
python /usr/lib/ganglia/python_modules/nvidia.py
service ganglia-monitor restart
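Another way to narrow down missing graphs is to ask gmond which metrics it has registered. The sketch below relies on `gmond -m`, which prints the list of supported metrics and exits. It assumes that the metrics provided by the NVIDIA module have names beginning with `gpu`, and `has_gpu_metrics` is a hypothetical helper name.

```shell
# Hypothetical helper: read a metric list on stdin and succeed if any line
# starts with "gpu" (assumed naming convention of the NVIDIA module).
has_gpu_metrics() {
    grep -Eq '^[[:space:]]*gpu'
}

# Only query gmond if it is actually installed on this machine.
if command -v gmond >/dev/null 2>&1; then
    if gmond -m 2>/dev/null | has_gpu_metrics; then
        echo "GPU metrics are registered with gmond"
    else
        echo "no GPU metrics found; check nvidia.py and modpython.conf" >&2
    fi
fi
```

If no GPU metrics are listed, the problem is on the gmond side (the Python module or its configuration) rather than in the web front end.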
6. The steps above are collected in a script that is hosted on the server, so that clients can be deployed in bulk. The script is as follows:
#!/bin/bash
# Fetch and unpack the installation bundle from the server
cd ~
wget 192.168.87.102/nvidia.zip
unzip nvidia.zip
# Switch to the bundled apt sources and install Ganglia
cp ~/nvidia/sources.list /etc/apt/
apt-get update
apt-get -y install ganglia*
# Install pyNVML
cd ~/nvidia/nvidia-ml-py-*
python setup.py install
# Install the graph definitions and the already patched web front-end files
cp ~/nvidia/graph.d/* /usr/share/ganglia-webfrontend/graph.d/
cp ~/nvidia/host_view.php /usr/share/ganglia-webfrontend/
cp ~/nvidia/host_view.tpl /usr/share/ganglia-webfrontend/templates/default/
# Install the Python module and the configuration files
mkdir -p /usr/lib/ganglia/python_modules
cp ~/nvidia/python_modules/* /usr/lib/ganglia/python_modules
mkdir -p /etc/ganglia/conf.d
cp ~/nvidia/conf.d/* /etc/ganglia/conf.d
cp ~/nvidia/gmond.conf /etc/ganglia/
service ganglia-monitor restart
# Clean up the downloaded files
rm -rf ~/nvidia*
rm -rf ~/gpu*
Run the script on each client:
wget 192.168.87.102/gpu.sh && chmod +x gpu.sh && sh gpu.sh
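For more than a handful of clients, the one-line command above can be pushed out over SSH instead of being typed on every machine. This is only a sketch under several assumptions: root can log in to each client over SSH without a password prompt, and the client host names are listed one per line in a file (called `clients.txt` here). The helper names `deploy_command` and `deploy_all` are hypothetical.

```shell
# Hypothetical helper: print the one-line deployment command for a given server.
deploy_command() {
    server=$1
    printf 'wget %s/gpu.sh && chmod +x gpu.sh && sh gpu.sh\n' "$server"
}

# Hypothetical helper: run the deployment on every host listed in a file
# (one host name per line; blank lines and lines starting with # are skipped).
deploy_all() {
    hosts_file=$1
    server=$2
    while IFS= read -r host; do
        case "$host" in
            ''|'#'*) continue ;;
        esac
        echo "deploying to $host"
        # -n stops ssh from consuming the rest of the hosts file on stdin.
        ssh -n "root@$host" "$(deploy_command "$server")" \
            || echo "deployment failed on $host" >&2
    done < "$hosts_file"
}

# Example: deploy_all clients.txt 192.168.87.102
```

A failed host is reported on stderr and the loop continues, so one unreachable client does not block the rest.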
The following figure shows the detail page for a monitored GPU server.