Releasing SuperBench v0.8

Peng Cheng
SuperBench Team

We are very happy to announce that SuperBench version 0.8.0 is officially released today!

You can install and try SuperBench by following the Getting Started Tutorial.

SuperBench 0.8.0 Release Notes

SuperBench Improvements

  • Support SuperBench Executor running on Windows.
  • Remove the fixed RCCL version in the ROCm 5.1.x Dockerfile.
  • Upgrade networkx version to fix installation compatibility issue.
  • Pin setuptools version to v65.7.0.
  • Limit ansible_runner version for Python 3.6.
  • Support cgroup v2 when reading system metrics in the monitor.
  • Fix analyzer bug in Python 3.8 due to a pandas API change.
  • Collect real-time GPU power in monitor.
  • Remove unreachable condition when writing the host list in MPI mode.
  • Upgrade Docker image with CUDA 12.1, NCCL 2.17.1-1, HPC-X v2.14, and MLC 3.10.
  • Fix wrong unit of cpu-memory-bw-latency in the documentation.
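
As an illustration of the cgroup v2 item above, here is a minimal sketch of how a monitor might pick the right memory-usage file under either cgroup version. It is not SuperBench's actual implementation: the function names and the `CGROUP_ROOT` constant are hypothetical, and the sketch assumes the cgroup filesystem is mounted at /sys/fs/cgroup. It relies only on the kernel's documented file names (`cgroup.controllers` and `memory.current` for v2, `memory/memory.usage_in_bytes` for v1).

```python
from pathlib import Path
from typing import Optional

# Assumption: the cgroup filesystem is mounted at the conventional location.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def cgroup_version(root: Path = CGROUP_ROOT) -> int:
    """Return 2 if `root` looks like a unified (v2) hierarchy, else 1.

    The cgroup.controllers file exists only in cgroup v2 hierarchies.
    """
    return 2 if (root / "cgroup.controllers").is_file() else 1


def memory_usage_bytes(root: Path = CGROUP_ROOT) -> Optional[int]:
    """Read current memory usage in bytes, or None if it cannot be read."""
    if cgroup_version(root) == 2:
        path = root / "memory.current"
    else:
        path = root / "memory" / "memory.usage_in_bytes"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # The file may be absent, e.g. in the root cgroup of a v2 host.
        return None
```

Returning `None` instead of raising lets a periodic monitor skip a sample when the file is missing rather than stop collecting.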

Micro-benchmark Improvements

  • Add STREAM benchmark for sustainable memory bandwidth and the corresponding computation rate.
  • Add HPL benchmark (High-Performance Linpack).
  • Support flexible warmup and non-random data initialization in cublas-benchmark.
  • Support error tolerance in micro-benchmark for cuDNN function.
  • Add distributed inference benchmark.
  • Support tensor core precisions (e.g., FP8) and batch/shape range in cuBLASLt GEMM.
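
To make the STREAM item above concrete, the sketch below shows how the triad kernel's memory bandwidth and computation rate are derived from one timed pass. It is illustrative only: the real STREAM benchmark is compiled code, and a pure-Python loop measures interpreter overhead rather than memory bandwidth. The function names are hypothetical. The accounting follows the STREAM convention for triad: 24 bytes (two reads and one write of 8-byte doubles) and 2 floating-point operations per element.

```python
import time
from array import array

BYTES_PER_DOUBLE = 8


def triad(a, b, c, scalar):
    """STREAM triad kernel: a[i] = b[i] + scalar * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_rates(n, seconds):
    """Convert one timed triad pass over n elements into (bytes/s, FLOP/s).

    STREAM counts three arrays touched (2 reads + 1 write) and
    2 FLOPs (one multiply, one add) per element.
    """
    bytes_moved = 3 * BYTES_PER_DOUBLE * n
    flops = 2 * n
    return bytes_moved / seconds, flops / seconds


def run_triad(n=100_000, scalar=3.0, repeats=3):
    """Run the triad `repeats` times and report rates for the fastest pass."""
    a = array("d", [0.0]) * n
    b = array("d", [2.0]) * n
    c = array("d", [1.0]) * n
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        triad(a, b, c, scalar)
        best = min(best, time.perf_counter() - start)
    return a, triad_rates(n, best)
```

Reporting the fastest of several passes mirrors how STREAM reports its best rate, which reduces the influence of one-off system noise.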

Model Benchmark Improvements

  • Fix torch.dist init issue with multiple models.
  • Support TE FP8 in BERT/GPT2 models.
  • Add num_workers configuration to model benchmarks.