Welcome to MSCCL++’s documentation!
MSCCL++ is a GPU-driven communication stack for scalable AI applications. It is designed to give distributed GPU applications communication that is high-performance, scalable, and customizable.
Getting Started
Follow the quick start for your platform of choice.
Take a look at the tutorials to learn how to write your first mscclpp program.
Design
The design doc is for those who want to understand the internals of MSCCL++.
The NCCL over MSCCL++ doc explains how to use NCCL over MSCCL++.
Performance
We evaluate the performance of MSCCL++ on A100 and H100 GPUs. Here are some performance results for all-reduce operations.