ubuntu12.04 GPU monitoring with the NVML module via ganglia

Source: Internet
Author: User
Tags mkdir zip

Reprinted from: Click to open link


1. Install ganglia, where the 3.1* version is installed, because the module that monitors the GPU only supports the 3.1* version series

Apt-get Install ganglia*

2. Download and install the PYNVML and nvml modules, download the address Https://github.com/ganglia/gmond_python_modules/tree/master/gpu

Install PYNVML, the installation documentation requires Python 2.5 or earlier, in fact, the system comes with Python version 2.7.3 can be compiled, there is no need to change the Python environment

CD ~/nvidia/nvidia-ml-py-*

Installing NVML

2.1 Copy the Python module to the Ganglia module directory

Mkdir/usr/lib/ganglia/python_modules
CP python_modules/*/usr/lib/ganglia/python_modules

2.2 Copying configuration files and front-end graphics presentation files to ganglia related directories

MKDIR/ETC/GANGLIA/CONF.D
CP conf.d/*/ETC/GANGLIA/CONF.D
CP graph.d/*/usr/share/ganglia-webfrontend/ graph.d/

2.3 Patching a Web page

CP ganglia_web.patch/usr/share/ganglia-webfrontend/
CP ganglia_web.patch/usr/share/ganglia-webfrontend/ templates/default/
cd/usr/share/ganglia-webfrontend/
cp host_view.php host_view.php.bak
Patch < Ganglia_web.patch 
cd/usr/share/ganglia-webfrontend/templates/default/
CP host_view.tpl Host_view.tpl.bak

3. Copy the server-side/etc/ganglia/gmond.conf file to the client/etc/ganglia/and create a new modpython.conf file in the CONF.D directory, as follows

Modules {
  Module {
    name = "Python_module"
    path = "/usr/lib/ganglia/modpython.so"
    params = "/usr/lib/ Ganglia/python_modules " 
  }

}

4. Start the service

Service Ganglia-monitor Start

5. Note: The above is the client configuration, the server-side configuration to go through the above steps, just need to open a few services, if the GPU is not visible to monitor the graphics, please run the following command

python/usr/lib/ganglia/python_moudles/nvidia.py
Service Ganglia-monitor Restart

6. The script of the above steps is placed on the server side, the client is deployed in bulk, the script is as follows

#!/bin/bash
CD ~ wget 192.168.87.102/nvidia.zip unzip nvidia.zip CP ~/nvidia/sources.list/etc/apt/ Apt-get Update apt-get-y install ganglia* cd ~/nvidia/nvidia-ml-py-* python setup.py install CP ~/nvidia/ graph.d/*/usr/share/ganglia-webfrontend/graph.d/ CP ~/nvidia/host_view.php/usr/share/ganglia-webfrontend/ CP ~/nvidia/host_view.tpl/usr/share/ganglia-webfrontend/templates/default/ mkdir/usr/lib/ganglia/ Python_modules cp ~/nvidia/python_modules/*/usr/lib/ganglia/python_modules MKDIR/ETC/GANGLIA/CONF.D CP ~/nvidia/conf.d/*/ETC/GANGLIA/CONF.D CP ~/nvidia/gmond.conf/etc/ganglia/ Service Ganglia-monitor Restart rm-rf ~/nvidia* rm-rf ~/gpu*

Execute script

wget 192.168.87.102/gpu.sh && chmod +x gpu.sh && sh gpu.sh

The following figure is the detailed interface of the monitored GPU server

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.