MLPerf

MLPerf is a consortium of AI leaders from academia, research labs, and industry whose mission is to “build fair and useful benchmarks” that provide unbiased evaluations of training and inference performance for hardware, software, and services—all conducted under prescribed conditions. To stay on the cutting edge of industry trends, MLPerf continues to evolve, holding new tests at regular intervals and adding new workloads that represent the state of the art in AI.

System Requirements

This is a GPU-specific workload and requires high-performance graphics cards to run. It is recommended that the system under test have a high-performing NVIDIA (e.g. M60 or higher) or AMD (e.g. MI25 or higher) graphics card.

Supported Hardware Systems

The following section defines the hardware systems/SKUs on which the MLPerf workload will run effectively in cloud environments. These hardware systems contain GPU components for which the MLPerf workload is designed to test.

  • Datacenter systems MLPerf Inference

    • A100-SXM4-40GBx8
    • A100-SXM-80GBx8 (NVIDIA DGX A100, 80GB variant)
    • A100-SXM-80GBx4 (NVIDIA DGX Station A100, "Red October", 80GB variant)
    • A100-PCIex8 (80GB variant)
    • A2x2
    • A30x8
  • Edge Systems MLPerf Inference

    • A100-SXM-80GBx1
    • A100-PCIex1 (80 GB variant)
    • A30x1
    • A2x1
    • Orin
    • Xavier NX
  • Supported Config Files for MLPerf BERT Training (config_{system}_{nodes}x{gpus per node}x{local batch size}x{gradient accumulation}.sh)

    • config_A30_1x2x224x14.sh
    • config_DGXA100_1x4x56x2.sh
    • config_DGXA100_1x8x56x1.sh
    • config_DGXA100_4gpu_common.sh
    • config_DGXA100_512x8x2x1_pack.sh
    • config_DGXA100_8x8x48x1.sh
    • config_DGXA100_common.sh
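The naming convention above can be decoded mechanically. The helper below is a hypothetical sketch (not part of MLPerf or the Virtual Client) that parses the `{nodes}x{gpus per node}x{local batch size}x{gradient accumulation}` segment out of a config file name; the `_common` files intentionally do not follow the pattern and yield no match.

```python
import re

def parse_bert_config_name(filename):
    # Hypothetical helper: decode config_{system}_{nodes}x{gpus per node}
    # x{local batch size}x{gradient accumulation}.sh file names.
    match = re.match(
        r"config_(?P<system>[^_]+)_(?P<nodes>\d+)x(?P<gpus>\d+)x(?P<batch>\d+)x(?P<grad_accum>\d+)",
        filename,
    )
    if match is None:
        return None  # e.g. the "*_common.sh" files do not encode a topology
    parts = match.groupdict()
    return {k: (v if k == "system" else int(v)) for k, v in parts.items()}

# 1 node, 8 GPUs per node, local batch size 56, gradient accumulation 1
print(parse_bert_config_name("config_DGXA100_1x8x56x1.sh"))
```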

Additional details on whether a system is supported can be found in the documentation; for each benchmark, check its respective implementation folder: https://github.com/mlcommons/training_results_v2.1/tree/main/NVIDIA/benchmarks https://github.com/mlcommons/inference_results_v4.1/tree/master/closed/NVIDIA

For systems that are not already included by MLPerf, add the config information to the appropriate __init__.py file under GPUConfigFiles.
For example, for A100-SXM4-40GBx8, the 3d-unet SingleStream file contains the following section, which is copied during initialization:

@ConfigRegistry.register(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
class A100_SXM4_40GBx8(A100_SXM4_40GBx1):
    system = KnownSystem.A100_SXM4_40GBx8
    gpu_batch_size = {'3d-unet': 8}
    start_from_device = True
    end_on_device = True
    single_stream_expected_latency_ns = 520000000

What is Being Measured?

This workload measures GPU performance across a wide range of inference models. Support for training models is being integrated; currently only the BERT training benchmark is supported.

  • Training Benchmarks

    • bert
    • dlrm (not supported yet)
    • maskrcnn (not supported yet)
    • minigo (not supported yet)
    • resnet (not supported yet)
    • rnnt (not supported yet)
    • ssd (not supported yet)
    • unet3d (not supported yet)
  • Inference Benchmarks

    • bert
    • 3d-unet
    • dlrm-v2 (not supported yet)
    • gptj (not supported yet)
    • llama2-70b (not supported yet)
    • mixtral-8x7b (not supported yet)
    • resnet50 (not supported yet)
    • retinanet (not supported yet)
    • stable-diffusion-xl (not supported yet)

Workload Metrics MLPerf Inference

The following metrics are examples of those captured by the Virtual Client when running the MLPerf Inference workload.

| Scenario | Metric Name | Example Value | Unit |
|----------|-------------|---------------|------|
| bert | PerformanceMode_p99 | 1.0 | VALID/INVALID |
| bert | latency_ns_p99 | 525066834 | |
| bert | samples_per_second_p99 | 25.2768 | |
| bert | AccuracyMode_p99 | 1.0 | PASS/FAIL |
| bert | AccuracyValue_p99 | 0.86112 | |
| bert | ThresholdValue_p99 | 0.853083 | |
| bert | AccuracyThresholdRatio_p99 | 1.00831923740128 | PASS/FAIL |
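The relationship between the accuracy metrics above can be sketched as follows. This is an illustrative assumption, not the Virtual Client's actual implementation: the run passes accuracy mode when the measured accuracy meets or exceeds the benchmark threshold, and the ratio metric expresses how far above (or below) the threshold the result landed.

```python
def summarize_accuracy(accuracy_value, threshold_value):
    # Hypothetical sketch: PASS when the measured accuracy meets or
    # exceeds the benchmark's accuracy threshold.
    return {
        "AccuracyMode": "PASS" if accuracy_value >= threshold_value else "FAIL",
        "AccuracyThresholdRatio": accuracy_value / threshold_value,
    }

# Using the example values from the table above
result = summarize_accuracy(0.86112, 0.853083)
```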

Workload Metrics MLPerf Training

| Scenario | Metric Name | Example Value (min) | Example Value (max) | Example Value (avg) | Unit |
|----------|-------------|---------------------|---------------------|---------------------|------|
| training-mlperf-bert-batchsize-45-gpu-8 | eval_mlm_accuracy | 0.650552854 | 0.672552854 | 0.662552854 | % |
| training-mlperf-bert-batchsize-45-gpu-8 | e2e_time | 1071.040571 | 1078.040571 | 1074.040571 | s |
| training-mlperf-bert-batchsize-45-gpu-8 | training_sequences_per_second | 2288.463615 | 2300.463615 | 2295.463615 | |
| training-mlperf-bert-batchsize-45-gpu-8 | final_loss | 0 | 0 | 0 | |
| training-mlperf-bert-batchsize-45-gpu-8 | raw_train_time | 1053.982237 | 1070.982237 | 1063.982237 | s |
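The min/max/avg columns above summarize a metric over multiple samples. A minimal sketch of that aggregation, using assumed sample data rather than real benchmark results:

```python
def aggregate(samples):
    # Summarize a list of per-run metric samples as min/max/avg,
    # mirroring the three example-value columns in the table above.
    return {
        "min": min(samples),
        "max": max(samples),
        "avg": sum(samples) / len(samples),
    }

# Assumed e2e_time samples (seconds) for illustration only
e2e_times = [1071.040571, 1078.040571, 1073.040571]
stats = aggregate(e2e_times)
```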