Releasing SuperBench v0.3
We are very happy to announce that SuperBench 0.3.0 is officially released today!
You can install and try SuperBench by following the Getting Started Tutorial.
## SuperBench 0.3.0 Release Notes
### SuperBench Framework

#### Runner

- Implement MPI mode.
#### Benchmarks

- Support Docker benchmark.
### Single-node Validation
#### Micro Benchmarks

**Memory (Tool: NVIDIA/AMD Bandwidth Test Tool)**

| Metrics | Unit | Description |
|---------|------|-------------|
| H2D_Mem_BW_GPU | GB/s | host-to-GPU bandwidth for each GPU |
| D2H_Mem_BW_GPU | GB/s | GPU-to-host bandwidth for each GPU |

**IBLoopback (Tool: PerfTest - Standard RDMA Test Tool)**

| Metrics | Unit | Description |
|---------|------|-------------|
| IB_Write | MB/s | The IB write loopback throughput with different message sizes |
| IB_Read | MB/s | The IB read loopback throughput with different message sizes |
| IB_Send | MB/s | The IB send loopback throughput with different message sizes |

**NCCL/RCCL (Tool: NCCL/RCCL Tests)**

| Metrics | Unit | Description |
|---------|------|-------------|
| NCCL_AllReduce | GB/s | The NCCL AllReduce performance with different message sizes |
| NCCL_AllGather | GB/s | The NCCL AllGather performance with different message sizes |
| NCCL_broadcast | GB/s | The NCCL Broadcast performance with different message sizes |
| NCCL_reduce | GB/s | The NCCL Reduce performance with different message sizes |
| NCCL_reduce_scatter | GB/s | The NCCL ReduceScatter performance with different message sizes |

**Disk (Tool: FIO - Standard Disk Performance Tool)**

| Metrics | Unit | Description |
|---------|------|-------------|
| Seq_Read | MB/s | Sequential read performance |
| Seq_Write | MB/s | Sequential write performance |
| Rand_Read | MB/s | Random read performance |
| Rand_Write | MB/s | Random write performance |
| Seq_R/W_Read | MB/s | Read performance in sequential read/write, fixed measurement (read:write = 4:1) |
| Seq_R/W_Write | MB/s | Write performance in sequential read/write (read:write = 4:1) |
| Rand_R/W_Read | MB/s | Read performance in random read/write (read:write = 4:1) |
| Rand_R/W_Write | MB/s | Write performance in random read/write (read:write = 4:1) |

**H2D/D2H SM Transmission Bandwidth (Tool: MSR-A build)**

| Metrics | Unit | Description |
|---------|------|-------------|
| H2D_SM_BW_GPU | GB/s | host-to-GPU bandwidth using GPU kernel for each GPU |
| D2H_SM_BW_GPU | GB/s | GPU-to-host bandwidth using GPU kernel for each GPU |
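To make the bandwidth metrics above concrete, here is a minimal PyTorch sketch of what a host-to-device / device-to-host copy-bandwidth measurement looks like. It is illustrative only: the helper name `copy_bandwidth_gbps`, the buffer size, and the repeat count are arbitrary assumptions, and SuperBench's real numbers come from the NVIDIA/AMD bandwidth test tools, not this snippet.

```python
# Illustrative sketch only: approximates what H2D/D2H memory bandwidth metrics
# measure. Not the NVIDIA/AMD bandwidth test tool that SuperBench actually runs.
import torch

def copy_bandwidth_gbps(size_mb: int = 256, repeats: int = 20) -> tuple:
    """Time host-to-device and device-to-host copies and return (H2D, D2H) in GB/s."""
    assert torch.cuda.is_available(), "requires a CUDA/ROCm-enabled PyTorch build"
    nbytes = size_mb * 1024 * 1024
    # Pinned host memory, since pageable memory typically lowers transfer bandwidth.
    host = torch.empty(nbytes, dtype=torch.uint8, pin_memory=True)
    dev = torch.empty(nbytes, dtype=torch.uint8, device="cuda")

    def timed(fn):
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        torch.cuda.synchronize()
        start.record()
        for _ in range(repeats):
            fn()
        end.record()
        torch.cuda.synchronize()
        seconds = start.elapsed_time(end) / 1000.0  # elapsed_time() returns milliseconds
        return (nbytes * repeats / seconds) / 1e9   # GB/s

    h2d = timed(lambda: dev.copy_(host, non_blocking=True))
    d2h = timed(lambda: host.copy_(dev, non_blocking=True))
    return h2d, d2h

if __name__ == "__main__":
    h2d, d2h = copy_bandwidth_gbps()
    print(f"H2D ~{h2d:.1f} GB/s, D2H ~{d2h:.1f} GB/s")
```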
### AMD GPU Support
#### Docker Image Support

- ROCm 4.2 PyTorch 1.7.0
- ROCm 4.0 PyTorch 1.7.0
#### Micro Benchmarks

**Kernel Launch (Tool: MSR-A build)**

| Metrics | Unit | Description |
|---------|------|-------------|
| Kernel_Launch_Event_Time | Time (ms) | Dispatch latency measured in GPU time using hipEventRecord() |
| Kernel_Launch_Wall_Time | Time (ms) | Dispatch latency measured in CPU time |

**GEMM FLOPS (Tool: AMD rocblas-bench Tool)**

| Metrics | Unit | Description |
|---------|------|-------------|
| FP64 | GFLOPS | FP64 FLOPS without MatrixCore |
| FP32(MC) | GFLOPS | TF32 FLOPS with MatrixCore |
| FP16(MC) | GFLOPS | FP16 FLOPS with MatrixCore |
| BF16(MC) | GFLOPS | BF16 FLOPS with MatrixCore |
| INT8(MC) | GOPS | INT8 FLOPS with MatrixCore |
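As a rough illustration of the difference between the two Kernel Launch metrics (GPU event time vs. CPU wall time), the sketch below launches a tiny kernel in a loop and times it both with device events and with a host timer. The actual measurement ships as a dedicated MSR-A build using hipEventRecord()/cudaEventRecord(); this PyTorch approximation and its loop counts are assumptions for illustration only.

```python
# Illustrative sketch only: event-time vs. wall-time view of kernel launch latency.
# Not the MSR-A tool that produces the Kernel Launch metrics above.
import time
import torch

def kernel_launch_latency(repeats: int = 1000) -> tuple:
    assert torch.cuda.is_available(), "requires a CUDA/ROCm-enabled PyTorch build"
    x = torch.ones(1, device="cuda")
    torch.cuda.synchronize()

    # "Event time": measured on the GPU timeline with device events.
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(repeats):
        x.add_(1)  # tiny kernel, so launch overhead dominates
    end.record()
    torch.cuda.synchronize()
    event_ms = start.elapsed_time(end) / repeats

    # "Wall time": measured on the CPU, including host-side launch overhead.
    t0 = time.perf_counter()
    for _ in range(repeats):
        x.add_(1)
    torch.cuda.synchronize()
    wall_ms = (time.perf_counter() - t0) * 1000 / repeats

    return event_ms, wall_ms

if __name__ == "__main__":
    event_ms, wall_ms = kernel_launch_latency()
    print(f"event ~{event_ms:.4f} ms/launch, wall ~{wall_ms:.4f} ms/launch")
```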
#### E2E Benchmarks

- CNN models -- Use PyTorch torchvision models (see the sketch after this list)
  - ResNet: ResNet-50, ResNet-101, ResNet-152
  - DenseNet: DenseNet-169, DenseNet-201
  - VGG: VGG-11, VGG-13, VGG-16, VGG-19
- BERT -- Use huggingface Transformers
  - BERT
  - BERT Large
- LSTM -- Use PyTorch
- GPT-2 -- Use huggingface Transformers
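For a sense of what the CNN E2E benchmarks exercise, here is a minimal PyTorch sketch that times ResNet-50 training steps on synthetic data. The batch size, optimizer, warm-up scheme, and images/sec formula are assumptions for illustration; they are not SuperBench's exact harness or configuration.

```python
# Illustrative sketch only: times training steps of a torchvision CNN model
# on synthetic data, similar in spirit to the CNN E2E benchmarks.
import time
import torch
import torchvision

def train_step_throughput(batch_size: int = 32, steps: int = 10) -> float:
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torchvision.models.resnet50().to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = torch.nn.CrossEntropyLoss()
    images = torch.randn(batch_size, 3, 224, 224, device=device)
    labels = torch.randint(0, 1000, (batch_size,), device=device)

    model.train()
    # One warm-up step so lazy initialization is not included in the timing.
    criterion(model(images), labels).backward()
    optimizer.step()
    optimizer.zero_grad()
    if device == "cuda":
        torch.cuda.synchronize()

    t0 = time.perf_counter()
    for _ in range(steps):
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    if device == "cuda":
        torch.cuda.synchronize()
    return batch_size * steps / (time.perf_counter() - t0)  # images/sec

if __name__ == "__main__":
    print(f"ResNet-50 training throughput: ~{train_step_throughput():.1f} images/sec")
```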
### Bug Fix

- VGG models failed on A100 GPU with batch_size=128
### Other Improvements

#### Contribution related

- Contribution rules
- System information collection
#### Documentation
- Add release process doc
- Add design documents
- Add developer guide doc for coding style
- Add contribution rules
- Add Docker image list
- Add initial validation results