MLPerf

MLPerf is a consortium of AI leaders from academia, research labs, and industry whose mission is to “build fair and useful benchmarks” that provide unbiased evaluations of training and inference performance for hardware, software, and services—all conducted under prescribed conditions. To stay on the cutting edge of industry trends, MLPerf continues to evolve, holding new tests at regular intervals and adding new workloads that represent the state of the art in AI.

System Requirements

This is a GPU-specific workload and requires high-performance graphics cards to run. It is recommended that the system under test have a high-performing NVIDIA (e.g. M60 or higher) or AMD (e.g. MI25 or higher) graphics card.

Supported Hardware Systems

The following section defines the hardware systems/SKUs on which the MLPerf workload will run effectively in cloud environments. These hardware systems contain GPU components for which the MLPerf workload is designed to test.

  • Datacenter systems MLPerf Inference

    • A100-SXM4-40GBx8
    • A100-SXM-80GBx8 (NVIDIA DGX A100, 80GB variant)
    • A100-SXM-80GBx4 (NVIDIA DGX Station A100, "Red October", 80GB variant)
    • A100-PCIex8 (80GB variant)
    • A2x2
    • A30x8
  • Edge Systems MLPerf Inference

    • A100-SXM-80GBx1
    • A100-PCIex1 (80 GB variant)
    • A30x1
    • A2x1
    • Orin
    • Xavier NX
  • Supported Config Files for MLPerf BERT Training (config_{system}_{nodes}x{gpus per node}x{local batch size}x{gradient accumulation}.sh)

    • config_A30_1x2x224x14.sh
    • config_DGXA100_1x4x56x2.sh
    • config_DGXA100_1x8x56x1.sh
    • config_DGXA100_4gpu_common.sh
    • config_DGXA100_512x8x2x1_pack.sh
    • config_DGXA100_8x8x48x1.sh
    • config_DGXA100_common.sh
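The naming convention above can be decoded mechanically. The helper below is a hypothetical sketch (not part of MLPerf or the Virtual Client) that parses the `{nodes}x{gpus per node}x{local batch size}x{gradient accumulation}` segment out of a config file name; the `_common` files intentionally do not follow the pattern and yield no match.

```python
import re

def parse_bert_config_name(filename):
    # Hypothetical helper: decode config_{system}_{nodes}x{gpus per node}
    # x{local batch size}x{gradient accumulation}.sh file names.
    match = re.match(
        r"config_(?P<system>[^_]+)_(?P<nodes>\d+)x(?P<gpus>\d+)x(?P<batch>\d+)x(?P<grad_accum>\d+)",
        filename,
    )
    if match is None:
        return None  # e.g. the "*_common.sh" files do not encode a topology
    parts = match.groupdict()
    return {k: (v if k == "system" else int(v)) for k, v in parts.items()}

# 1 node, 8 GPUs per node, local batch size 56, gradient accumulation 1
print(parse_bert_config_name("config_DGXA100_1x8x56x1.sh"))
```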

Additional details on whether a system is supported can be found in the documentation; for each benchmark, check its respective implementation folder: https://github.com/mlcommons/training_results_v2.1/tree/main/NVIDIA/benchmarks https://github.com/mlcommons/inference_results_v4.1/tree/master/closed/NVIDIA

For systems that are not already included by MLPerf, add the config information to the appropriate __init__.py file under GPUConfigFiles.
For example, for A100-SXM4-40GBx8, the 3d-unet SingleStream file contains the following section, which is copied during initialization:

@ConfigRegistry.register(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
class A100_SXM4_40GBx8(A100_SXM4_40GBx1):
    system = KnownSystem.A100_SXM4_40GBx8
    gpu_batch_size = {'3d-unet': 8}
    start_from_device = True
    end_on_device = True
    single_stream_expected_latency_ns = 520000000

What is Being Measured?

This workload measures GPU performance across a wide range of inference models. Support for training models is being integrated; currently only the BERT training benchmark is supported.

  • Training Benchmarks

    • bert
    • dlrm (not supported yet)
    • maskrcnn (not supported yet)
    • minigo (not supported yet)
    • resnet (not supported yet)
    • rnnt (not supported yet)
    • ssd (not supported yet)
    • unet3d (not supported yet)
  • Inference Benchmarks

    • bert
    • 3d-unet
    • dlrm-v2 (not supported yet)
    • gptj (not supported yet)
    • llama2-70b (not supported yet)
    • mixtral-8x7b (not supported yet)
    • resnet50 (not supported yet)
    • retinanet (not supported yet)
    • stable-diffusion-xl (not supported yet)

Workload Metrics MLPerf Inference

The following metrics are examples of those captured by the Virtual Client when running the MLPerf Inference workload.

| Scenario | Metric Name | Example Value | Unit |
|----------|-------------|---------------|------|
| bert | PerformanceMode_p99 | 1.0 | VALID/INVALID |
| bert | latency_ns_p99 | 525066834 | |
| bert | samples_per_second_p99 | 25.2768 | |
| bert | AccuracyMode_p99 | 1.0 | PASS/FAIL |
| bert | AccuracyValue_p99 | 0.86112 | |
| bert | ThresholdValue_p99 | 0.853083 | |
| bert | AccuracyThresholdRatio_p99 | 1.00831923740128 | PASS/FAIL |
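The relationship between the accuracy metrics above can be sketched as follows. This is an illustrative assumption, not the Virtual Client's actual implementation: the run passes accuracy mode when the measured accuracy meets or exceeds the benchmark threshold, and the ratio metric expresses how far above (or below) the threshold the result landed.

```python
def summarize_accuracy(accuracy_value, threshold_value):
    # Hypothetical sketch: PASS when the measured accuracy meets or
    # exceeds the benchmark's accuracy threshold.
    return {
        "AccuracyMode": "PASS" if accuracy_value >= threshold_value else "FAIL",
        "AccuracyThresholdRatio": accuracy_value / threshold_value,
    }

# Using the example values from the table above
result = summarize_accuracy(0.86112, 0.853083)
```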

Workload Metrics MLPerf Training

| Scenario | Metric Name | Example Value (min) | Example Value (max) | Example Value (avg) | Unit |
|----------|-------------|---------------------|---------------------|---------------------|------|
| training-mlperf-bert-batchsize-45-gpu-8 | eval_mlm_accuracy | 0.650552854 | 0.672552854 | 0.662552854 | % |
| training-mlperf-bert-batchsize-45-gpu-8 | e2e_time | 1071.040571 | 1078.040571 | 1074.040571 | s |
| training-mlperf-bert-batchsize-45-gpu-8 | training_sequences_per_second | 2288.463615 | 2300.463615 | 2295.463615 | |
| training-mlperf-bert-batchsize-45-gpu-8 | final_loss | 0 | 0 | 0 | |
| training-mlperf-bert-batchsize-45-gpu-8 | raw_train_time | 1053.982237 | 1070.982237 | 1063.982237 | s |
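The min/max/avg columns above summarize a metric over multiple samples. A minimal sketch of that aggregation, using assumed sample data rather than real benchmark results:

```python
def aggregate(samples):
    # Summarize a list of per-run metric samples as min/max/avg,
    # mirroring the three example-value columns in the table above.
    return {
        "min": min(samples),
        "max": max(samples),
        "avg": sum(samples) / len(samples),
    }

# Assumed e2e_time samples (seconds) for illustration only
e2e_times = [1071.040571, 1078.040571, 1073.040571]
stats = aggregate(e2e_times)
```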