
MLPerf Workload Profiles

The following profiles run customer-representative or benchmarking scenarios using the MLPerf workload.

PERF-GPU-MLPERF.json

Runs the MLPerf benchmark workload to test GPU performance.

warning

This workload is supported ONLY for systems that contain Nvidia GPU hardware components. See the documentation above for more specifics.

  • Workload Profile

  • Supported Platform/Architectures

    • linux-x64
  • Supports Disconnected Scenarios

    • No. Internet connection required.
  • Dependencies
    The dependencies defined in the 'Dependencies' section of the profile itself must be met in order for the workload operations to run effectively.

    • Internet connection.
    • The VM must run on hardware containing Nvidia GPU cards/components.

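    Before launching the profile, it can help to confirm that the system actually exposes Nvidia GPUs. A minimal check using standard tools (not part of the profile itself):

    # List Nvidia devices on the PCI bus (works even before the driver is installed)
    lspci | grep -i nvidia

    # Confirm the installed driver can reach the GPUs
    nvidia-smi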

  • Profile Parameters
    The following parameters can optionally be supplied on the command line to modify the behavior of the workload (see the usage examples below).

    | Parameter | Purpose | Default Value |
    |-----------|---------|---------------|
    | Username | Optional. User to create inside the container to run the MLPerf benchmarks. | testuser |
    | DiskFilter | Optional. Filter that selects the disk used to download the benchmarks. Because the benchmark data is around 800 GB, the default filter selects disks larger than 1000 GB. | SizeGreaterThan:1000gb |
    | LinuxCudaVersion | Optional. Version of CUDA to install. | 12.4 |
    | LinuxDriverVersion | Optional. Version of the GPU driver to install. | 550 |
    | LinuxLocalRunFile | Optional. Link to download the specified CUDA and GPU driver versions. | https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_550.54.14_linux.run |
    | RequireCustomSystemSupport | Optional. Enables additional A100_PCIe_40GBx8 system support that is not included in the MLPerf GitHub repository. Systems already supported by the MLPerf GitHub repository remain supported. | true |
  • Profile Runtimes
    See the 'Metadata' section of the profile for estimated runtimes. These timings represent the length of time required to run a single round of profile actions and can be used to determine the minimum runtime (e.g. the --timeout) required for the Virtual Client to produce results. They are often estimates based on the number of system cores.

  • Usage Examples
    The following section provides a few basic examples of how to use the workload profile.

    # Execute the workload profile
    sudo ./VirtualClient --profile=PERF-GPU-MLPERF.json --system=Demo --timeout=1440 --packageStore="{BlobConnectionString|SAS Uri}"
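    Individual profile parameters can also be overridden on the command line. The following is a sketch that assumes the Virtual Client's ",,," delimiter for passing multiple parameters; the parameter names come from the 'Profile Parameters' table above.

    # Execute the workload profile with overridden CUDA/driver versions
    sudo ./VirtualClient --profile=PERF-GPU-MLPERF.json --system=Demo --timeout=1440 --packageStore="{BlobConnectionString|SAS Uri}" --parameters="LinuxCudaVersion=12.4,,,LinuxDriverVersion=550"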

PERF-GPU-MLPERF-TRAINING-NVIDIA.json

Runs the MLPerf Training benchmark workload to test GPU performance.

warning

This workload is supported ONLY for systems that contain Nvidia GPU hardware components. See the documentation above for more specifics.

  • Workload Profile

  • Supported Platform/Architectures

    • linux-x64
  • Supports Disconnected Scenarios

    • No. Internet connection required.
  • Storage Requirements

    • Min OS Disk Size : 256GB
    • Min Data Disk Size : 8TB (1 Disk)
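    To confirm the disks meet these minimums before a run, a quick check such as the following can be used (a sketch; lsblk is assumed to be available on the system):

    # Show disk sizes; confirm the OS disk is >= 256GB and one data disk is >= 8TB
    lsblk -o NAME,SIZE,TYPE,MOUNTPOINT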
  • Dependencies
    The dependencies defined in the 'Dependencies' section of the profile itself must be met in order for the workload operations to run effectively.

    • Internet connection.
    • The VM must run on hardware containing Nvidia GPU cards/components.


  • Profile Parameters
    The following parameters can optionally be supplied on the command line to modify the behavior of the workload (see the usage examples below).

    | Parameter | Purpose | Default Value |
    |-----------|---------|---------------|
    | Username | Optional. User to create inside the container to run the MLPerf benchmarks. | |
    | DiskFilter | Optional. Filter that selects the disk used to download the training data. | BiggestSize |
    | LinuxCudaVersion | Optional. Version of CUDA to install. | 12.0 |
    | LinuxDriverVersion | Optional. Version of the GPU driver to install. | 525 |
    | LinuxLocalRunFile | Optional. Link to download the specified CUDA and GPU driver versions. | https://developer.download.nvidia.com/compute/cuda/12.0.0/local_installers/cuda_12.0.0_525.60.13_linux.run |
  • MLPerfTrainingExecutor Parameters

    | Parameter | Purpose | Default Value |
    |-----------|---------|---------------|
    | Model | Optional. The type of model to run for training. Supported models are listed in mlperf.md. | bert |
    | Username | Optional. Username for running docker. If null, the logged-in user is used. | null |
    | BatchSize | Optional. Batch size for the data chunks in the training model. | 40 |
    | Implementation | Optional. Implementation available for a given model/benchmark (e.g., for bert). | pytorch-22.09 |
    | ContainerName | Optional. Name for the docker container of the model. | language_model |
    | DataPath | Optional. Folder name for the training data (/mlperftraining0/{DataPath}). | mlperf-training-data-bert.1.0.0 |
    | GPUNum | Optional. Number of GPUs to stress. | 8 |
    | ConfigFile | Optional. Configuration for running the workload. See the model's implementation for all supported config files. | config_DGXA100_1x8x56x1.sh |
    | PackageName | Required. Package name for MLPerf training. | |
  • MLPerf Training Changes

    • Changed "nvidia-docker" to "docker" inside the run_with_docker.sh file (see the sketch below).
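    For reference, the substitution amounts to a one-line edit of that script. A sketch of applying it manually (the location of run_with_docker.sh within the MLPerf training implementation is assumed):

    # Replace the deprecated nvidia-docker wrapper with plain docker
    sed -i 's/nvidia-docker/docker/g' run_with_docker.sh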
  • Profile Runtimes
    See the 'Metadata' section of the profile for estimated runtimes. These timings represent the length of time required to run a single round of profile actions and can be used to determine the minimum runtime (e.g. the --timeout) required for the Virtual Client to produce results. They are often estimates based on the number of system cores.

  • Usage Examples
    The following section provides a few basic examples of how to use the workload profile.

    # Execute the workload profile
    sudo ./VirtualClient --profile=PERF-GPU-MLPERF-TRAINING-NVIDIA.json --system=Demo --timeout=1440 --packageStore="{BlobConnectionString|SAS Uri}"
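    As with the benchmark profile above, individual parameters can be overridden on the command line. The following is a sketch that assumes the Virtual Client's ",,," delimiter for passing multiple parameters; the parameter names come from the parameter tables above.

    # Execute the training profile for BERT on 8 GPUs
    sudo ./VirtualClient --profile=PERF-GPU-MLPERF-TRAINING-NVIDIA.json --system=Demo --timeout=1440 --packageStore="{BlobConnectionString|SAS Uri}" --parameters="Model=bert,,,GPUNum=8"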