
DCGMI Workload Profiles

The following profiles run DCGMI for qualifying GPUs.

QUAL-GPU-DCGMI.json

DCGM (NVIDIA Data Center GPU Manager) is part of the NVIDIA GPU Deployment Kit and is designed to work with NVIDIA Tesla GPU accelerators, which are commonly used in data centers for high-performance computing and other GPU-accelerated workloads. This profile is designed to identify broad regressions against a baseline by running a small set of tests as part of active health checks.

  • Supported Platform/Architectures

    • linux-x64
  • Supports Disconnected Scenarios

    • No. Internet connection required.
  • Dependencies
    The dependencies defined in the 'Dependencies' section of the profile itself are required in order to run the workload operations effectively.

    • Internet connection.
    • This monitor depends on an NVIDIA driver installation and an nvidia-dcgm installation [DCGMI installation - Version 3.1].
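A quick pre-flight check can confirm that both dependencies are present before running the profile. The following is a minimal sketch, not part of the profile itself: it assumes the NVIDIA driver exposes `nvidia-smi` and the DCGM installation exposes `dcgmi` on the `PATH`. The helper name `find_missing_tools` is hypothetical.

```python
import shutil

# Command-line tools assumed to be installed by the dependencies listed above:
# nvidia-smi ships with the NVIDIA driver, dcgmi ships with DCGM.
REQUIRED_TOOLS = ("nvidia-smi", "dcgmi")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the PATH.

    The `which` argument is injectable so the check can be exercised
    without a GPU machine.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

An empty result only shows that the binaries are discoverable. It does not verify the installed DCGM version.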

    Additional information on components that exist within the 'Dependencies' section of the profile can be found in the following locations:

  • Profile Parameters

    The following parameters can be supplied on the command line to override the profile's default behavior.

    | Parameter | Purpose | Default value |
    |-----------|---------|---------------|
    | Username | Mandatory. The user to create in the container to run MLPerf benchmarks. | null |
    | Level | Optional. The level of tests to run. | 4 |
  • Profile Runtimes
    The runtime depends on the value of the "Level" parameter.

    | Level value | Runtime |
    |-------------|---------|
    | 4 | 1-2 hours |
    | 3 | 30 min |
    | 2 | 2 min |
    | 1 | a few seconds |

    Timings Documentation for DCGMI diag
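Because the runtime varies so widely by level, it can help to check that the chosen `--timeout` leaves room for at least one full run. The sketch below encodes the upper bounds from the runtime table above. It assumes `--timeout` is expressed in minutes and rounds the "few seconds" of level 1 up to one minute. The helper name `timeout_covers_level` is hypothetical.

```python
# Upper-bound runtime per diagnostic level, in minutes, from the runtime table.
# Level 1 ("a few seconds") is rounded up to 1 minute as an assumption.
LEVEL_MAX_RUNTIME_MINUTES = {1: 1, 2: 2, 3: 30, 4: 120}


def timeout_covers_level(level: int, timeout_minutes: int) -> bool:
    """Return True if the timeout allows one full run at the given level."""
    if level not in LEVEL_MAX_RUNTIME_MINUTES:
        valid = sorted(LEVEL_MAX_RUNTIME_MINUTES)
        raise ValueError(f"Level must be one of {valid}, got {level}")
    return timeout_minutes >= LEVEL_MAX_RUNTIME_MINUTES[level]


if __name__ == "__main__":
    # 1440 minutes is the timeout used in the usage examples.
    print(timeout_covers_level(4, 1440))
    # A 60-minute timeout is shorter than the 2-hour upper bound of level 4.
    print(timeout_covers_level(4, 60))
```

The listed runtimes are estimates, so treat the check as a rough lower bound on the timeout rather than a guarantee.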

  • Usage Examples

    The following section provides a few basic examples of how to use the monitor profile.

    # Execute the monitor profile
    VirtualClient.exe --profile=QUAL-GPU-DCGMI.json --system=Demo --timeout=1440 --packageStore="{BlobConnectionString|SAS Uri}"

    # Execute the monitor profile with the Level and Username parameters overridden
    VirtualClient.exe --profile=QUAL-GPU-DCGMI.json --system=Demo --timeout=1440 --packageStore="{BlobConnectionString|SAS Uri}" --parameters=Level=1,,,Username=testusername
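The second command passes several parameters in one `--parameters` argument, separated by `,,,`. When the command is generated by a script, the argument can be assembled as in the following minimal sketch. The function names `join_parameters` and `build_command` are hypothetical, and only the options shown in the examples above are used.

```python
# Delimiter between parameters, as it appears in the usage example.
PARAMETER_DELIMITER = ",,,"


def join_parameters(parameters: dict) -> str:
    """Render a dict as the value of --parameters, e.g. Level=1,,,Username=x."""
    return PARAMETER_DELIMITER.join(
        f"{name}={value}" for name, value in parameters.items()
    )


def build_command(profile, system, timeout, package_store, parameters=None):
    """Build the argument list for a run of the given profile."""
    args = [
        "VirtualClient.exe",
        f"--profile={profile}",
        f"--system={system}",
        f"--timeout={timeout}",
        f"--packageStore={package_store}",
    ]
    if parameters:
        args.append(f"--parameters={join_parameters(parameters)}")
    return args


if __name__ == "__main__":
    command = build_command(
        "QUAL-GPU-DCGMI.json",
        "Demo",
        1440,
        "{BlobConnectionString|SAS Uri}",  # placeholder, as in the examples
        {"Level": 1, "Username": "testusername"},
    )
    print(" ".join(command))
```

Returning a list rather than a single string lets the result be passed directly to `subprocess.run` without shell quoting concerns.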