Welcome to MSCCL++’s documentation!

MSCCL++ is a GPU-driven communication stack for scalable AI applications. It is designed to be high-performance, scalable, and customizable for distributed GPU applications.

Getting Started

  • Follow the quick start for your platform of choice.

  • Take a look at the tutorials to learn how to write your first mscclpp program.

Design

  • Design doc for those who want to understand the internals of MSCCL++.

  • NCCL over MSCCL++ doc for those who want to understand how to use NCCL over MSCCL++.

Performance

  • We evaluate the performance of MSCCL++ on A100 and H100 GPUs. Here are some performance results for all-reduce operations.

C++ API
