MLPerf Workload Profiles
The following profiles run customer-representative or benchmarking scenarios using the MLPerf workload.
PERF-GPU-MLPERF.json
Runs the MLPerf benchmark workload to test GPU performance.
This workload is supported ONLY for systems that contain Nvidia GPU hardware components. See the documentation above for more specifics.
Supported Platform/Architectures
- linux-x64
Supports Disconnected Scenarios
- No. Internet connection required.
Dependencies
The dependencies defined in the 'Dependencies' section of the profile itself are required in order to run the workload operations effectively.
- Internet connection.
- The VM must run on hardware containing Nvidia GPU cards/components.
Additional information on the components defined in the 'Dependencies' section of the profile can be found in the dependency package documentation.
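As a quick sanity check before launching the profile, the presence of NVIDIA GPU hardware can be confirmed from the shell. This is a minimal sketch, assuming lspci is available on the Linux system and nvidia-smi is present once the NVIDIA driver has been installed:

```bash
# Confirm that NVIDIA GPU hardware is visible to the OS.
lspci | grep -i nvidia

# If the NVIDIA driver is already installed, list the detected GPUs.
nvidia-smi -L
```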
Profile Parameters
The following parameters can be optionally supplied on the command line to modify the behaviors of the workload.

| Parameter | Purpose | Default Value |
|-----------|---------|---------------|
| Username | Optional. User that will be created in the container to run the MLPerf benchmarks. | testuser |
| DiskFilter | Optional. Filter used to select the disk on which the benchmark data is downloaded. Because the benchmark data is roughly 800 GB, the default filter selects disks larger than 1000 GB. | SizeGreaterThan:1000gb |
| LinuxCudaVersion | Optional. Version of CUDA to install. | 12.4 |
| LinuxDriverVersion | Optional. Version of the GPU driver to install. | 550 |
| LinuxLocalRunFile | Optional. Link used to download the specified CUDA and GPU driver versions. | https://developer.download.nvidia.com/compute/cuda/12.0.0/local_installers/cuda_12.4.0_550.54.14_linux.run |
| RequireCustomSystemSupport | Optional. Enables additional A100_PCIe_40GBx8 system support that is not provided by the MLPerf GitHub repository. Systems already supported by the MLPerf GitHub repository remain supported. | true |
Profile Runtimes
See the 'Metadata' section of the profile for estimated runtimes. These timings represent the length of time required to run a single round of profile actions and can be used to determine the minimum runtime required for the Virtual Client to produce results. They are often estimates based on the number of system cores.
Usage Examples
The following section provides a few basic examples of how to use the workload profile.

```bash
# Execute the workload profile
VirtualClient.exe --profile=PERF-GPU-MLPERF.json --system=Demo --timeout=1440 --packageStore="{BlobConnectionString|SAS Uri}"
```
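The profile parameters listed above can also be overridden on the command line. The sketch below assumes the standard Virtual Client `--parameters` option with `,,,`-delimited key/value pairs; confirm the exact syntax against the Virtual Client command-line documentation.

```bash
# Execute the workload profile, overriding selected profile parameters.
# The --parameters syntax shown here is an assumption; verify it for your Virtual Client version.
VirtualClient.exe --profile=PERF-GPU-MLPERF.json --system=Demo --timeout=1440 --packageStore="{BlobConnectionString|SAS Uri}" --parameters="LinuxCudaVersion=12.4,,,LinuxDriverVersion=550,,,DiskFilter=SizeGreaterThan:1000gb"
```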
PERF-GPU-MLPERF-TRAINING-NVIDIA.json
Runs the MLPerf Training benchmark workload to test GPU performance.
This workload is supported ONLY for systems that contain Nvidia GPU hardware components. See the documentation above for more specifics.
Supported Platform/Architectures
- linux-x64
Supports Disconnected Scenarios
- No. Internet connection required.
Storage Requirements
- Min OS Disk Size: 256 GB
- Min Data Disk Size: 8 TB (1 disk)
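A quick way to verify that the machine meets these disk-size requirements is to list the block devices before running the profile. This is a minimal sketch using standard Linux tooling:

```bash
# List block devices with sizes to confirm an OS disk of at least 256 GB
# and a data disk of at least 8 TB are present.
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
```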
Dependencies
The dependencies defined in the 'Dependencies' section of the profile itself are required in order to run the workload operations effectively.
- Internet connection.
- The VM must run on hardware containing Nvidia GPU cards/components.
Additional information on the components defined in the 'Dependencies' section of the profile can be found in the dependency package documentation.
Profile Parameters
The following parameters can be optionally supplied on the command line to modify the behaviors of the workload.

| Parameter | Purpose | Default Value |
|-----------|---------|---------------|
| Username | Optional. User that will be created in the container to run the MLPerf benchmarks. | |
| DiskFilter | Optional. Filter used to select the disk on which the training data is downloaded. | BiggestSize |
| LinuxCudaVersion | Optional. Version of CUDA to install. | 12.0 |
| LinuxDriverVersion | Optional. Version of the GPU driver to install. | 525 |
| LinuxLocalRunFile | Optional. Link used to download the specified CUDA and GPU driver versions. | https://developer.download.nvidia.com/compute/cuda/12.0.0/local_installers/cuda_12.0.0_525.60.13_linux.run |
MLPerfTrainingExecutor Parameters
| Parameter | Purpose | Default Value |
|-----------|---------|---------------|
| Model | Optional. The model to run for training; supported models are listed in mlperf.md. | bert |
| Username | Optional. Username used for running docker. When null, the logged-in user is selected. | null |
| BatchSize | Optional. Batch size for the data chunks in the training model. | 40 |
| Implementation | Optional. Implementation available for the given model/benchmark (for example, for bert). | pytorch-22.09 |
| ContainerName | Optional. Name for the docker container used by the model. | language_model |
| DataPath | Optional. Folder name for the training data (/mlperftraining0/{DataPath}). | mlperf-training-data-bert.1.0.0 |
| GPUNum | Optional. Number of GPUs to stress. | 8 |
| ConfigFile | Optional. Configuration for running the workload. See the implementation for a given model for all supported config files. | config_DGXA100_1x8x56x1.sh |
| PackageName | Required. Package name for MLPerf training. | |
MLPerf Training Changes
- Changed "nvidia-docker" to "docker" inside the run_with_docker.sh file.
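The change described above amounts to a simple text substitution in the MLPerf launch script. If it ever needs to be reproduced manually, something like the following sketch would do it; the path to run_with_docker.sh depends on where the MLPerf repository was cloned:

```bash
# Replace the deprecated nvidia-docker wrapper with plain docker in the MLPerf launch script.
sed -i 's/nvidia-docker/docker/g' ./run_with_docker.sh
```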
Profile Runtimes
See the 'Metadata' section of the profile for estimated runtimes. These timings represent the length of time required to run a single round of profile actions and can be used to determine the minimum runtime required for the Virtual Client to produce results. They are often estimates based on the number of system cores.
Usage Examples
The following section provides a few basic examples of how to use the workload profile.

```bash
# Execute the workload profile
sudo ./VirtualClient --profile=PERF-GPU-MLPERF-TRAINING-NVIDIA.json --system=Demo --timeout=1440 --packageStore="{BlobConnectionString|SAS Uri}"
```
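The MLPerfTrainingExecutor parameters listed above can likewise be overridden on the command line. The sketch below assumes the standard Virtual Client `--parameters` option with `,,,`-delimited key/value pairs; confirm the exact syntax against the Virtual Client command-line documentation.

```bash
# Execute the workload profile, overriding selected MLPerf training parameters.
# The --parameters syntax shown here is an assumption; verify it for your Virtual Client version.
sudo ./VirtualClient --profile=PERF-GPU-MLPERF-TRAINING-NVIDIA.json --system=Demo --timeout=1440 --packageStore="{BlobConnectionString|SAS Uri}" --parameters="Model=bert,,,GPUNum=8,,,ConfigFile=config_DGXA100_1x8x56x1.sh"
```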