forked from thomas-krenn/check_gpu_sensor_v1
Tontonitch/check_gpu_sensor_v1
check_gpu_sensor
=================
check_gpu_sensor: Nagios/Icinga plugin to check GPU sensors via NVML
Copyright (C) 2011-2013 Thomas-Krenn.AG,
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; either version 3 of the License, or (at your option) any later
version.
This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
details.
You should have received a copy of the GNU General Public License along with
this program; if not, see <http://www.gnu.org/licenses/>.
HOWTO:
------
http://www.thomas-krenn.com/en/wiki/GPU_Sensor_Monitoring_Plugin
Current Updates:
----------------
https://github.com/thomas-krenn/check_gpu_sensor_v1.git
Example Output:
---------------
Warning - Tesla K20 [persistenceMode = Warning]|ECCL2AggSgl=0;1;2;
ECCTexAggSgl=0;1;2; memUtilRate=0 PWRUsage=31.23;150;200; ECCRegAggSgl=0;1;2;
SMClock=705 ECCL1AggSgl=0;1;2; GPUTemperature=38;85;100; memClock=2600
usedMemory=0.24;95;99; fanSpeed=30;80;95; graphicsClock=705 GPUUtilRate=0
ECCMemAggSgl=0;1;2;
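The output line above can be post-processed by splitting it at the "|" into status text and performance data. The following is a minimal sketch, assuming the usual Nagios performance data layout of label=value;warn;crit; tokens separated by spaces; the function name parse_plugin_output is hypothetical and not part of the plugin, and the sample is an abridged copy of the example output.

```python
def parse_plugin_output(line):
    """Split one plugin output line into status text and sensor readings."""
    status, _, perfdata = line.partition("|")
    sensors = {}
    for token in perfdata.split():
        label, _, rest = token.partition("=")
        fields = rest.split(";")

        def num(i):
            # Missing or empty fields (e.g. no thresholds) become None.
            return float(fields[i]) if len(fields) > i and fields[i] else None

        sensors[label] = {"value": num(0), "warn": num(1), "crit": num(2)}
    return status.strip(), sensors


# Abridged sample taken from the example output above.
sample = (
    "Warning - Tesla K20 [persistenceMode = Warning]|ECCL2AggSgl=0;1;2; "
    "memUtilRate=0 PWRUsage=31.23;150;200; GPUTemperature=38;85;100;"
)
status, sensors = parse_plugin_output(sample)
```

Sensors without thresholds, such as memUtilRate, end up with warn and crit set to None.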
Requirements:
-------------
o NVIDIA display driver
* Currently the proprietary NVIDIA driver from the official website
(under Ubuntu 10.04, cf. http://www.thomas-krenn.com/de/wiki/CUDA_Installation)
as well as the driver from the repositories (with Ubuntu 11.10 and 12.04)
have been tested successfully.
o NVML perl bindings
* The bindings are required to access the NVML library via Perl. Fetch
a copy of the bindings from CPAN:
* http://search.cpan.org/~nvbinding/
For more information about the NVML library also visit
http://developer.nvidia.com/nvidia-management-library-nvml.
o Perl
o Required Perl modules (should be installed by default under Ubuntu)
* strict
* warnings
* Getopt::Long
* use feature qw(switch)
The switch feature is available with Perl 5.10.1. The embedded Perl interpreter
(ePN) of Icinga 1.6.1 reported errors with Switch-Case statements, so the
plugin uses given-when instead.
NOT installed by default:
* Config::IniFiles
On Ubuntu you can use
$ sudo apt-get install libconfig-inifiles-perl
o Nagios or Icinga
Installation hints:
-------------------
On Ubuntu, when using the driver from the repositories, the file "Makefile.PL"
in the nvml package must be modified. Add the parameter
'-L/usr/lib/nvidia-current' or '-L/usr/lib/nvidia-experimental-304' (for
experimental drivers) to 'LIBS' to get rid of the following warning when
calling 'perl Makefile.PL':
Note (probably harmless): No library found for -lnvidia-ml
Notes on sensor warnings:
-------------------------
* clocks_throttle_reason_unknown
With driver versions before r19, this throttle reason can also be reported
during a PState or clock change. Recheck the throttle reasons with:
$ nvidia-smi -i 0 -q | grep -i throttle -A 5
Clocks Throttle Reasons
Idle : Not Active
User Defined Clocks : Active
SW Power Cap : Not Active
HW Slowdown : Not Active
Unknown : Not Active
* fanSpeed
Currently, the "Fan speed" sensor reports the speed that the fan control
algorithm is trying to reach, not a measured speed. Don't rely on this sensor
to determine whether the fan is actually working properly.
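For the clocks_throttle_reason_unknown note above, the recheck with nvidia-smi can be automated by scanning its output for reasons marked "Active". This is only a sketch: it assumes the "Clocks Throttle Reasons" block is laid out exactly as in the sample shown in that note, and the function name active_throttle_reasons is hypothetical.

```python
def active_throttle_reasons(smi_text):
    """Return the names of all throttle reasons reported as 'Active'."""
    reasons = []
    in_block = False
    for raw in smi_text.splitlines():
        line = raw.strip()
        if line.startswith("Clocks Throttle Reasons"):
            in_block = True
            continue
        if in_block:
            if ":" not in line:
                break  # end of the throttle reasons block
            name, _, state = line.partition(":")
            if state.strip() == "Active":
                reasons.append(name.strip())
    return reasons


# Sample copied from the nvidia-smi output shown in the note.
sample_smi = """Clocks Throttle Reasons
Idle : Not Active
User Defined Clocks : Active
SW Power Cap : Not Active
HW Slowdown : Not Active
Unknown : Not Active
"""
active = active_throttle_reasons(sample_smi)
```

In this sample only "User Defined Clocks" is active, so a warning for the unknown throttle reason would not be confirmed by the recheck.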
Configuration:
--------------
* Querying only specific sensors (for performance data):
./check_gpu_sensor -d 0 -T GPUTemperature,fanSpeed
Warning - Tesla K20 [persistenceMode = Warning]|fanSpeed=30;80;95; GPUTemperature=38;85;100;
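When the plugin is called from a script rather than directly from Nagios or Icinga, the call above can be assembled programmatically. The sketch below uses only the -d and -T options shown in the example; the helper names build_check_command and run_check are hypothetical, and the default plugin path is an assumption about where the script is installed.

```python
import subprocess


def build_check_command(device, sensors, plugin="./check_gpu_sensor"):
    """Build the argument list for querying only the given sensors."""
    return [plugin, "-d", str(device), "-T", ",".join(sensors)]


def run_check(device, sensors):
    """Run the plugin and return (exit code, first line of output).

    Assumes the plugin follows the usual Nagios convention of
    0 = OK, 1 = Warning, 2 = Critical, 3 = Unknown.
    """
    proc = subprocess.run(
        build_check_command(device, sensors),
        capture_output=True,
        text=True,
    )
    lines = proc.stdout.splitlines()
    return proc.returncode, lines[0] if lines else ""


cmd = build_check_command(0, ["GPUTemperature", "fanSpeed"])
```

Passing the arguments as a list avoids shell quoting issues with the comma-separated sensor names.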