Nvidia SMI

The NVIDIA System Management Interface (nvidia-smi) is a command-line utility, built on top of the NVIDIA Management Library (NVML), intended to aid in the management and monitoring of NVIDIA GPU devices.

This utility allows administrators to query GPU device state and, with the appropriate privileges, to modify GPU device state. It is targeted at the Tesla™, GRID™, Quadro™, and Titan X products, though limited support is also available on other NVIDIA GPUs.

nvidia-smi ships with the NVIDIA GPU display drivers on Linux, and with 64-bit Windows Server 2008 R2 and Windows 7. It can report query information as XML or human-readable plain text to either standard output or a file. For more details, refer to the nvidia-smi documentation.

Dependency

This monitor depends on nvidia-smi. Use the [Nvidia Driver Installation] dependency first to make sure nvidia-smi is present on the system.

Supported Platforms

  • linux-x64
  • linux-arm64

Supported Queries

Currently, the supported queries are --query-gpu and --query-c2c. Please create a feature request if you need other queries.
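As a rough illustration of what a --query-gpu invocation looks like, the sketch below builds the nvidia-smi command line for a list of metric names and optionally runs it. The helper names (build_query_gpu_command, run_query_gpu) are hypothetical, and the choice of --format=csv,noheader,nounits is an assumption made for easy parsing; this page does not state how the monitor itself invokes nvidia-smi.

```python
import shutil
import subprocess
from typing import List, Optional


def build_query_gpu_command(fields: List[str]) -> List[str]:
    """Build an nvidia-smi --query-gpu command for the given metric names."""
    if not fields:
        raise ValueError("at least one metric name is required")
    return [
        "nvidia-smi",
        "--query-gpu=" + ",".join(fields),
        # csv output without the header row and without unit suffixes
        "--format=csv,noheader,nounits",
    ]


def run_query_gpu(fields: List[str]) -> Optional[str]:
    """Run the query and return raw stdout, or None if nvidia-smi is absent."""
    if shutil.which("nvidia-smi") is None:
        return None  # driver tools not installed on this machine
    result = subprocess.run(
        build_query_gpu_command(fields),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, build_query_gpu_command(["utilization.gpu", "memory.used"]) yields a command that asks for GPU utilization and used memory, one line per GPU.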

nvidia-smi Output Description

The following section describes the various counters/metrics that are available with the nvidia-smi toolset.

Metric Name | Description
--- | ---
utilization.gpu | GPU utilization percentage.
utilization.memory | GPU memory utilization percentage.
temperature.gpu | GPU temperature in degrees Celsius.
temperature.memory | GPU memory temperature in degrees Celsius.
power.draw.average | Average GPU power draw in watts.
clocks.gr | GPU graphics clock in MHz.
clocks.sm | GPU SM clock in MHz.
clocks.video | GPU video clock in MHz.
clocks.mem | GPU memory clock in MHz.
memory.total | Total GPU memory in MiB.
memory.free | Free GPU memory in MiB.
memory.used | Used GPU memory in MiB.
power.draw.instant | Instantaneous GPU power draw in watts.
pcie.link.gen.gpucurrent | Current PCIe link generation.
pcie.link.width.current | Current PCIe link width.
ecc.errors.corrected.volatile.device_memory | Volatile device memory corrected ECC errors.
ecc.errors.corrected.volatile.dram | Volatile DRAM corrected ECC errors.
ecc.errors.corrected.volatile.sram | Volatile SRAM corrected ECC errors.
ecc.errors.corrected.volatile.total | Volatile total corrected ECC errors.
ecc.errors.corrected.aggregate.device_memory | Aggregate device memory corrected ECC errors.
ecc.errors.corrected.aggregate.dram | Aggregate DRAM corrected ECC errors.
ecc.errors.corrected.aggregate.sram | Aggregate SRAM corrected ECC errors.
ecc.errors.corrected.aggregate.total | Aggregate total corrected ECC errors.
ecc.errors.uncorrected.volatile.device_memory | Volatile device memory uncorrected ECC errors.
ecc.errors.uncorrected.volatile.dram | Volatile DRAM uncorrected ECC errors.
ecc.errors.uncorrected.volatile.sram | Volatile SRAM uncorrected ECC errors.
ecc.errors.uncorrected.volatile.total | Volatile total uncorrected ECC errors.
ecc.errors.uncorrected.aggregate.device_memory | Aggregate device memory uncorrected ECC errors.
ecc.errors.uncorrected.aggregate.dram | Aggregate DRAM uncorrected ECC errors.
ecc.errors.uncorrected.aggregate.sram | Aggregate SRAM uncorrected ECC errors.
ecc.errors.uncorrected.aggregate.total | Aggregate total uncorrected ECC errors.
GPU 0: C2C Link 0 Speed | C2C link speed in GB/s.
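To make the mapping from metric names to values concrete, here is a minimal sketch that turns one comma-separated output row into a dictionary keyed by metric name. It assumes the output was produced in CSV form without header or units, one row per GPU; the function name parse_gpu_row and the sample row are hypothetical, and any value that is not numeric (for instance a placeholder printed for an unsupported counter) is recorded as None rather than guessed.

```python
from typing import Dict, List, Optional


def parse_gpu_row(fields: List[str], line: str) -> Dict[str, Optional[float]]:
    """Map one CSV row of query output onto the requested metric names."""
    values = [value.strip() for value in line.split(",")]
    if len(values) != len(fields):
        raise ValueError(
            f"expected {len(fields)} values but found {len(values)}"
        )
    metrics: Dict[str, Optional[float]] = {}
    for name, raw in zip(fields, values):
        try:
            metrics[name] = float(raw)
        except ValueError:
            # Non-numeric placeholder: the counter is unavailable on this GPU.
            metrics[name] = None
    return metrics


# Hypothetical sample row, not captured from a real device.
FIELDS = ["utilization.gpu", "temperature.gpu", "memory.used", "temperature.memory"]
SAMPLE_ROW = "37, 54, 1024, [N/A]"
sample_metrics = parse_gpu_row(FIELDS, SAMPLE_ROW)
```

Keeping unavailable counters as None, instead of dropping them or substituting zero, lets downstream code distinguish "not reported" from a real reading of 0.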

Example

The following is a minimal profile that runs NvidiaSmiMonitor.

{
    "Description": "Default Monitors",
    "Parameters": {
        "MonitorFrequency": "00:01:00",
        "MonitorWarmupPeriod": "00:01:00"
    },
    "Actions": [],
    "Dependencies": [],
    "Monitors": [
        {
            "Type": "NvidiaSmiMonitor",
            "Parameters": {
                "Scenario": "CaptureNvidiaSmiCounters",
                "MonitorFrequency": "$.Parameters.MonitorFrequency",
                "MonitorWarmupPeriod": "$.Parameters.MonitorWarmupPeriod"
            }
        }
    ]
}
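In the profile, the monitor's MonitorFrequency and MonitorWarmupPeriod point at the profile-level Parameters through "$.Parameters.<name>" references, and the values use an hh:mm:ss form. The sketch below illustrates one way to read that relationship: it loads the profile, substitutes the references, and converts a time value to a duration. This is only an illustration under the assumption of simple name substitution; the helpers resolve_monitor_parameters and parse_timespan are hypothetical and do not represent the tool's actual resolution logic.

```python
import json
from datetime import timedelta
from typing import Any, Dict, List

PROFILE_TEXT = """
{
    "Description": "Default Monitors",
    "Parameters": {
        "MonitorFrequency": "00:01:00",
        "MonitorWarmupPeriod": "00:01:00"
    },
    "Actions": [],
    "Dependencies": [],
    "Monitors": [
        {
            "Type": "NvidiaSmiMonitor",
            "Parameters": {
                "Scenario": "CaptureNvidiaSmiCounters",
                "MonitorFrequency": "$.Parameters.MonitorFrequency",
                "MonitorWarmupPeriod": "$.Parameters.MonitorWarmupPeriod"
            }
        }
    ]
}
"""

REFERENCE_PREFIX = "$.Parameters."


def parse_timespan(text: str) -> timedelta:
    """Convert an hh:mm:ss string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def resolve_monitor_parameters(profile: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Replace "$.Parameters.<name>" references with profile-level values."""
    top_level = profile.get("Parameters", {})
    resolved = []
    for monitor in profile.get("Monitors", []):
        parameters = {}
        for key, value in monitor.get("Parameters", {}).items():
            if isinstance(value, str) and value.startswith(REFERENCE_PREFIX):
                # Assumed behavior: look the name up in the top-level block.
                value = top_level[value[len(REFERENCE_PREFIX):]]
            parameters[key] = value
        resolved.append(parameters)
    return resolved


profile = json.loads(PROFILE_TEXT)
monitor_parameters = resolve_monitor_parameters(profile)[0]
frequency = parse_timespan(monitor_parameters["MonitorFrequency"])
```

Read this way, changing the two profile-level values is enough to adjust how often the monitor samples and how long it waits before the first sample.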