DCGMI Workload Profiles

The following profiles run DCGMI for qualifying GPUs.

Workload Details

QUAL-GPU-DCGMI.json

DCGM is part of the Nvidia GPU Deployment Kit and is designed to work with Nvidia's Tesla GPU accelerators, which are commonly used in data centers for high-performance computing and other GPU-accelerated workloads. This profile is designed to identify general/broad regressions when compared against a baseline by validating few tests as part of Active health checks.

Supported Platform/Architectures
- linux-x64
Supports Disconnected Scenarios
- No. Internet connection required.
Dependencies
The dependencies defined in the 'Dependencies' section of the profile itself are required in order to run the workload operations effectively.
- Internet connection.
- This monitor has dependency on Nvidia Driver Installation and nvidia-dcgm installation [DCGMI installation - Version 3.1].
Additional information on components that exist within the 'Dependencies' section of the profile can be found in the following locations:
- Installing Dependencies
Profile Parameters

The following parameters can be optionally supplied on the command line to change this default behavior.

Parameter Purpose Default value
Username Mandatory. User which needs to be created in container to run MLPerf benchmarks. null
Level Optional. Which level of tests to run 4
Profile Runtimes
The runtime is dependent on the value of "Level" parameter.

Level value Runtime
4 1-2 hour
3 30 min
2 2 min
1 few seconds

Timings Documentation for DCGMI diag

Parameter	Purpose	Default value
Username	Mandatory. User which needs to be created in container to run MLPerf benchmarks.	null
Level	Optional. Which level of tests to run	4

Level value	Runtime
4	1-2 hour
3	30 min
2	2 min
1	few seconds

Usage Examples

The following section provides a few basic examples of how to use the monitor profile.

# Execute the monitor profile
VirtualClient.exe --profile=QUAL-GPU-DCGMI.json --system=Demo --timeout=1440 --packageStore="{BlobConnectionString|SAS Uri}"


VirtualClient.exe --profile=QUAL-GPU-DCGMI.json --system=Demo --timeout=1440 --packageStore="{BlobConnectionString|SAS Uri}" --parameters=Level=1,,,Username=testusername

DCGMI Workload Profiles

QUAL-GPU-DCGMI.json​

QUAL-GPU-DCGMI.json