Releasing SuperBench v0.10

Peng Cheng
SuperBench Team

We are very happy to announce that SuperBench 0.10.0 is officially released today!

You can install and try SuperBench by following the Getting Started Tutorial.

SuperBench 0.10.0 Release Notes

SuperBench Improvements

  • Support monitoring for AMD GPUs.
  • Support ROCm 5.7 and ROCm 6.0 Dockerfiles.
  • Add MSCCL support for NVIDIA GPUs.
  • Fix NUMA domains swap issue in NDv4 topology file.
  • Add NDv5 topology file.
  • Pin NCCL and NCCL-test to 2.18.3 to work around a hang issue in CUDA 12.2.

Micro-benchmark Improvements

  • Add HPL random generator to gemm-flops with ROCm.
  • Add DirectXGPURenderFPS benchmark to measure the FPS of rendering simple frames.
  • Add HWDecoderFPS benchmark to measure the FPS of the hardware decoder.
  • Update Docker image for H100 support.
  • Update MLC version to 3.10 in the CUDA/ROCm Dockerfiles.
  • Bug fix for GPU Burn test.
  • Support INT8 in cuBLASLt function.
  • Add hipBLASLt function benchmark.
  • Support cpu-gpu and gpu-cpu in ib-validation.
  • Support graph mode in NCCL/RCCL benchmarks for latency metrics.
  • Support C++ implementation in distributed inference benchmark.
  • Add O2 option for GPU copy ROCm build.
  • Support different hipBLASLt data types in distributed inference benchmark.
  • Support in-place in NCCL/RCCL benchmark.
  • Support data type option in NCCL/RCCL benchmark.
  • Improve P2P performance with fine-grained GPU memory in GPU-copy test for AMD GPUs.
  • Update hipBLASLt GEMM metric unit to TFLOPS.
  • Support FP8 for hipBLASLt benchmark.
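
For readers comparing GEMM numbers across versions, the following is a minimal sketch of how a TFLOPS figure is conventionally derived from a GEMM's dimensions and its measured time. It is not SuperBench's own code: the function name `gemm_tflops` and its parameters are hypothetical, and it assumes the usual count of 2·m·n·k floating-point operations for an m×k by k×n matrix multiplication.

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Convert a GEMM's shape and elapsed time into TFLOPS.

    Assumes the conventional operation count of 2 * m * n * k
    (one multiply and one add per inner-product term).
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    flops = 2.0 * m * n * k          # total floating-point operations
    return flops / seconds / 1e12    # operations per second, scaled to tera


if __name__ == "__main__":
    # Hypothetical measurement: a 4096 x 4096 x 4096 GEMM taking 1 ms.
    print(f"{gemm_tflops(4096, 4096, 4096, 1e-3):.2f} TFLOPS")
```

Reporting in TFLOPS keeps values for modern GPUs in a readable range, but results recorded under a different unit need to be rescaled before comparison.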

Model Benchmark Improvements

  • Change torch.distributed.launch to torchrun.
  • Support Megatron-LM/Megatron-DeepSpeed GPT pretrain benchmark.
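
For training scripts affected by the move to torchrun, the main practical difference is that torchrun exports each worker's rank information through environment variables such as `LOCAL_RANK`, `RANK`, and `WORLD_SIZE`, whereas `torch.distributed.launch` by default passed the local rank as a command-line argument. The sketch below shows one way a script might read these values; it is an illustration only, not SuperBench's implementation, and the helper name `read_dist_env` is hypothetical.

```python
import os
from typing import Dict


def read_dist_env() -> Dict[str, int]:
    """Read the rank variables that torchrun sets for each worker.

    Falls back to single-process defaults so the script also runs
    when started directly with `python script.py`.
    """
    return {
        "local_rank": int(os.environ.get("LOCAL_RANK", 0)),
        "rank": int(os.environ.get("RANK", 0)),
        "world_size": int(os.environ.get("WORLD_SIZE", 1)),
    }


if __name__ == "__main__":
    env = read_dist_env()
    print(f"rank {env['rank']} of {env['world_size']} "
          f"(local rank {env['local_rank']})")
```

Such a script could then be started with, for example, `torchrun --nproc_per_node=8 script.py`, where `script.py` stands in for the actual entry point.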

Result Analysis

  • Support baseline generation from multiple nodes.