

PyTorch Model Benchmarks#

model-benchmarks#

Introduction#

Run training or inference tasks in single or half precision for deep learning models in the following categories:

  • GPT: gpt2-small, gpt2-medium, gpt2-large and gpt2-xl
  • LLAMA: llama2-7b, llama2-13b, llama2-70b
  • MoE: mixtral-8x7b, mixtral-8x22b
  • BERT: bert-base and bert-large
  • LSTM
  • CNN, listed in torchvision.models, including:
    • resnet: resnet18, resnet34, resnet50, resnet101, resnet152
    • resnext: resnext50_32x4d, resnext101_32x8d
    • wide_resnet: wide_resnet50_2, wide_resnet101_2
    • densenet: densenet121, densenet169, densenet201, densenet161
    • vgg: vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19_bn, vgg19
    • mnasnet: mnasnet0_5, mnasnet0_75, mnasnet1_0, mnasnet1_3
    • mobilenet: mobilenet_v2
    • shufflenet: shufflenet_v2_x0_5, shufflenet_v2_x1_0, shufflenet_v2_x1_5, shufflenet_v2_x2_0
    • squeezenet: squeezenet1_0, squeezenet1_1
    • others: alexnet, googlenet, inception_v3

For inference, the supported percentiles are 50th, 90th, 95th, 99th, and 99.9th.
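As an illustration, a tail-latency percentile over recorded step times can be computed with the nearest-rank method (a sketch only; not necessarily the exact estimator SuperBench uses):

```python
def percentile(step_times, p):
    """Nearest-rank percentile of a list of step times (illustrative sketch)."""
    s = sorted(step_times)
    # nearest-rank index for the p-th percentile, clamped to a valid range
    k = max(0, min(len(s) - 1, round(p / 100 * len(s)) - 1))
    return s[k]

latencies_ms = list(range(1, 101))   # 100 synthetic step times in ms
p50 = percentile(latencies_ms, 50)   # median
p999 = percentile(latencies_ms, 99.9)  # 99.9th-percentile tail latency
```

With 100 samples the 99.9th percentile resolves to the slowest observed step, which is why tail percentiles need many iterations to be meaningful.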

New: fp8_hybrid and fp8_e4m3 precisions are now supported for BERT models.

New: Deterministic training support. SuperBench now supports deterministic training to ensure reproducibility across runs, using fixed seeds and deterministic algorithms. To enable deterministic training, use the following flags:

  • Flags:
    • --enable_determinism: Enables deterministic computation for reproducible results.
    • --deterministic_seed <seed>: Sets the seed for reproducibility (default: 42).
    • --check_frequency <steps>: How often to record deterministic metrics (default: 100).
  • Environment variables (set automatically by SuperBench when --enable_determinism is used):
    • CUBLAS_WORKSPACE_CONFIG=:4096:8: Ensures deterministic behavior in cuBLAS. This can be overridden by setting it manually before running SuperBench.
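In a PyTorch script, the combined effect of these settings can be sketched as follows (a simplified illustration, not SuperBench's actual implementation; `enable_determinism` is a hypothetical helper):

```python
import os
import random

def enable_determinism(seed: int = 42) -> None:
    """Sketch of what --enable_determinism arranges (assumed, simplified)."""
    # SuperBench sets this automatically; setdefault() lets a manually
    # exported value take precedence, as described above.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    random.seed(seed)  # seed the Python-level RNG
    # A real PyTorch run would additionally call (real torch APIs):
    #   torch.manual_seed(seed)
    #   torch.cuda.manual_seed_all(seed)
    #   torch.use_deterministic_algorithms(True)

enable_determinism(seed=42)
```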

Comparing Deterministic Results

To compare deterministic results between runs, use the standard result analysis workflow:

  1. Run the benchmark with the `--enable_determinism` flag.
  2. Generate a baseline: `sb result generate-baseline --data-file results.jsonl --summary-rule-file rules.yaml`
  3. Compare future runs: `sb result diagnosis --data-file new-results.jsonl --rule-file diagnosis-rule.yaml --baseline-file baseline.json`

This allows configurable tolerance for floating-point differences via YAML rules.
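Because each diagnosis `criteria` field is a Python lambda expressed as a string, a tolerance can be introduced by relaxing the comparison applied to a metric's deviation from baseline (the `tolerant` variant below is hypothetical, not a shipped rule):

```python
# criteria string as used in the deterministic_rule snippet in this document
strict = eval("lambda x: x != 0")
# hypothetical relaxed criteria that tolerates tiny floating-point drift
tolerant = eval("lambda x: abs(x) > 1e-6")

drift = 1e-9                     # tiny difference between run and baseline
strict_flags = strict(drift)     # strict rule flags any nonzero deviation
tolerant_flags = tolerant(drift) # relaxed rule treats it as within tolerance
```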

Configuration Parameter Validation

When determinism is enabled, benchmark configuration parameters (batch_size, num_steps, deterministic_seed, etc.) are automatically recorded in the results file as `deterministic_config_*` metrics. The diagnosis rules enforce exact matching of these parameters between runs to ensure valid comparisons:

If any configuration parameter differs between runs, the diagnosis will flag it as a failure, ensuring you only compare runs with identical configurations.
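The exact-match check can be pictured like this (illustrative logic only; `configs_match` is not a SuperBench API, and the metric names follow the `deterministic_config_*` convention described above):

```python
def configs_match(run_a: dict, run_b: dict) -> bool:
    """Return True only if all deterministic_config_* metrics are identical."""
    keys = {k for k in list(run_a) + list(run_b)
            if k.startswith("deterministic_config_")}
    return all(run_a.get(k) == run_b.get(k) for k in keys)

baseline = {"deterministic_config_batch_size": 32,
            "deterministic_config_deterministic_seed": 42}
same = dict(baseline)                                        # identical config
changed = dict(baseline, deterministic_config_batch_size=64)  # mismatched config
```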

Summary Rule Snippet for Determinism

Include the following rule in your summary rule file (used with `sb result summary` or `sb result generate-baseline --summary-rule-file`) to surface deterministic metrics in the results summary:

```yaml
superbench:
  rules:
    model-benchmarks-deterministic:
      statistics:
        - mean
      categories: Deterministic
      metrics:
        - model-benchmarks:.*/deterministic_loss.*
        - model-benchmarks:.*/deterministic_act_mean.*
        - model-benchmarks:.*/deterministic_check_count.*
        - model-benchmarks:.*/deterministic_step.*
        - model-benchmarks:.*/deterministic_config_.*
        - model-benchmarks:.*/return_code.*
```

This groups all deterministic outputs (loss fingerprints, activation means, check counts, step numbers, configuration parameters, and return codes) under the Deterministic category.

Diagnosis Rule Snippet for Determinism

Include the following rules in your diagnosis rule file (used with `sb result diagnosis` or `sb result generate-baseline --diagnosis-rule-file`) to detect Silent Data Corruption (SDC) and validate configuration consistency:

```yaml
superbench:
  rules:
    deterministic_rule:
      function: variance
      criteria: "lambda x: x != 0"
      categories: SDC-Fingerprint
      metrics:
        - model-benchmarks:.*/deterministic_loss.*
        - model-benchmarks:.*/deterministic_act_mean.*
        - model-benchmarks:.*/deterministic_check_count.*
    deterministic_config_rule:
      function: variance
      criteria: "lambda x: x != 0"
      categories: SDC-Config
      metrics:
        - model-benchmarks:.*/deterministic_config_batch_size.*
        - model-benchmarks:.*/deterministic_config_num_steps.*
        - model-benchmarks:.*/deterministic_config_num_warmup.*
        - model-benchmarks:.*/deterministic_config_deterministic_seed.*
        - model-benchmarks:.*/deterministic_config_check_frequency.*
        - model-benchmarks:.*/deterministic_config_seq_len.*
        - model-benchmarks:.*/deterministic_config_hidden_size.*
        - model-benchmarks:.*/deterministic_config_num_classes.*
        - model-benchmarks:.*/deterministic_config_input_size.*
        - model-benchmarks:.*/deterministic_config_num_layers.*
        - model-benchmarks:.*/deterministic_config_num_hidden_layers.*
        - model-benchmarks:.*/deterministic_config_num_attention_heads.*
        - model-benchmarks:.*/deterministic_config_intermediate_size.*
    deterministic_failure_rule:
      function: failure_check
      criteria: "lambda x: x != 0"
      categories: SDC-Failed
      metrics:
        - model-benchmarks:.*/return_code
```
  • SDC-Fingerprint (deterministic_rule): Flags any node where loss, activation mean, or check count has any variance from baseline (x != 0), indicating a potential SDC issue.
  • SDC-Config (deterministic_config_rule): Ensures all determinism configuration parameters (seed, batch size, sequence length, hidden size, etc.) are identical across nodes; any mismatch means the comparison is invalid.
  • SDC-Failed (deterministic_failure_rule): Uses failure_check to catch nodes where the determinism benchmark failed to run or returned a non-zero exit code.

For complete rule files covering all benchmark categories (micro-benchmarks, NCCL, GPU copy bandwidth, NVBandwidth, etc.), refer to the rule file documentation in Result Summary and Data Diagnosis.

Metrics#

| Name | Unit | Description |
|------|------|-------------|
| model-benchmarks/pytorch-${model_name}/${precision}_train_step_time | time (ms) | The average training step time with fp32/fp16 precision. |
| model-benchmarks/pytorch-${model_name}/${precision}_train_throughput | throughput (samples/s) | The average training throughput per GPU with fp32/fp16 precision. |
| model-benchmarks/pytorch-${model_name}/${precision}_inference_step_time | time (ms) | The average inference step time with fp32/fp16 precision. |
| model-benchmarks/pytorch-${model_name}/${precision}_inference_throughput | throughput (samples/s) | The average inference throughput with fp32/fp16 precision. |
| model-benchmarks/pytorch-${model_name}/${precision}_inference_step_time_${percentile} | time (ms) | The ${percentile} percentile inference step time with fp32/fp16 precision. |
| model-benchmarks/pytorch-${model_name}/${precision}_inference_throughput_${percentile} | throughput (samples/s) | The ${percentile} percentile inference throughput with fp32/fp16 precision. |
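For intuition, the step-time and throughput metrics are related by the per-step batch size (an assumed relation for illustration; the batch size value below is an example, not a benchmark default):

```python
def throughput_samples_per_s(batch_size: int, step_time_ms: float) -> float:
    """Samples per second implied by a batch size and an average step time."""
    # step_time_ms is in milliseconds, so convert to seconds
    return batch_size * 1000.0 / step_time_ms

# Example: 32 samples processed every 100 ms
tp = throughput_samples_per_s(32, 100.0)
```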

Megatron Model Benchmarks#

megatron-gpt#

Introduction#

Run GPT pretraining tasks in float32, float16, or bfloat16 precision with Megatron-LM or Megatron-DeepSpeed.

Tip: `batch_size` in this benchmark is the global batch size; the batch size on each GPU instance is `micro_batch_size`.
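The standard Megatron-LM relation between the two can be sketched as follows (gradient accumulation included; the numbers are example values):

```python
def global_batch_size(micro_batch_size: int,
                      data_parallel_size: int,
                      grad_accum_steps: int) -> int:
    """Global batch = micro batch x data-parallel ranks x accumulation steps."""
    return micro_batch_size * data_parallel_size * grad_accum_steps

# Example: micro batch 4 on each of 8 data-parallel ranks,
# with 2 gradient-accumulation steps per optimizer update
gbs = global_batch_size(4, 8, 2)
```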

Metrics#

| Name | Unit | Description |
|------|------|-------------|
| megatron-gpt/${precision}_train_step_time | time (ms) | The average training step time per iteration. |
| megatron-gpt/${precision}_train_throughput | throughput (samples/s) | The average training throughput per iteration. |
| megatron-gpt/${precision}_train_tflops | tflops/s | The average training TFLOPS per iteration. |
| megatron-gpt/${precision}_train_mem_allocated | GB | The average GPU memory allocated per iteration. |
| megatron-gpt/${precision}_train_max_mem_allocated | GB | The average maximum GPU memory allocated per iteration. |