Releasing SuperBench v0.12

Guoshuai Zhao
SuperBench Team

We are very happy to announce that SuperBench 0.12.0 is officially released today!

You can install and try SuperBench by following the Getting Started Tutorial.

SuperBench 0.12.0 Release Notes

SuperBench Improvements

  • Optimize the cutlass build process for faster builds and smaller binaries.
  • Improve image build pipeline.
  • Add support for arm64 builds.
  • Upgrade pipeline dependencies.
  • Fix SuperBench installation and code lint issues.
  • Update Flake8 repository.
  • Add support for the latest Python versions.
  • Enhance error handling for pkg_resources imports.
  • Update ROCm image build labels.
  • Add CUDA 12.8 and CUDA 12.9 support.
  • Consolidate multi-architecture Docker images.
  • Upgrade runner OS to latest version.

Micro-benchmark Improvements

  • Add general CPU bandwidth and latency benchmarks.
  • Add nvbandwidth build process and benchmarks.
  • Add architecture support for 10.0 in gemm-flops.
  • Add GPU Stream micro-benchmark.
  • Add FP4 GEMM FLOPS support in cublaslt_gemm benchmark.
  • Add Grace CPU support for CPU Stream benchmark.
  • Revise CPU Stream benchmark.
  • Fix NUMA error on Grace CPU in gpu-copy benchmark.
  • Bump onnxruntime-gpu dependency from 1.10.0 to 1.12.0.
  • Fix stderr message in gpu-copy benchmark.
  • Fix TensorRT inference parsing.
  • Handle N/A values in nvbandwidth benchmark.
  • Avoid unintended nvbandwidth function calls in all benchmarks.
  • Support CUDA arch flag and autotuning in cublaslt GEMM.
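
For readers new to the GEMM entries above: GEMM benchmarks conventionally report throughput in floating-point operations per second. The sketch below shows that conventional calculation for a single m x n x k matrix multiply. It is only an illustration; the function names `gemm_flops` and `to_tflops` are hypothetical and are not taken from SuperBench or cuBLASLt, and the actual benchmarks may measure and aggregate timings differently.

```python
def gemm_flops(m: int, n: int, k: int, seconds: float) -> float:
    """Return achieved FLOP/s for one (m x k) @ (k x n) GEMM.

    Hypothetical helper for illustration; not part of SuperBench.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Each of the m * n output elements needs k multiplies and k adds,
    # so the conventional operation count is 2 * m * n * k.
    return 2.0 * m * n * k / seconds


def to_tflops(flops: float) -> float:
    """Convert FLOP/s to TFLOP/s (1 TFLOP/s = 1e12 FLOP/s)."""
    return flops / 1e12


if __name__ == "__main__":
    # Example: an 8192 x 8192 x 8192 GEMM measured at 10 ms (made-up timing).
    print(f"{to_tflops(gemm_flops(8192, 8192, 8192, 0.01)):.2f} TFLOP/s")
```

The same operation count applies regardless of precision (FP4, FP16, FP32); only the measured time changes, which is why lower-precision GEMMs report higher FLOPS on hardware that accelerates them.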

Model-benchmark Improvements

  • Add LLaMA-2 model benchmarks.
  • Add Mixture of Experts model benchmarks.
  • Add DeepSeek inference benchmark (AMD GPU).
  • Fix typos in documentation and code.

Result Analysis

  • Enhance logging for diagnosis rule baseline errors.

Documentation Updates

  • Update CODEOWNERS file.