mscclpp
MSCCL++ Python API.
Functions

Return the directory that contains the MSCCL++ headers.
- class mscclpp.Algorithm(id=None, execution_plan=None, native_handle=None, tags=None, constraint=None)
Bases: object

A wrapper for collective communication algorithms.
This class provides a Python interface for collective communication algorithms such as allreduce, allgather, and reduce-scatter. Algorithms can be either DSL-based (defined using MSCCL++ execution plans) or native (implemented in C++/CUDA).
- Parameters:
id (Optional[str])
execution_plan (Optional[CppExecutionPlan])
native_handle (Optional[CppAlgorithm])
tags (Optional[dict])
constraint (Optional[Constraint])
- name
Human-readable name of the algorithm.
- collective
The collective operation this algorithm implements (e.g., “allreduce”).
- message_size_range
Tuple of (min_size, max_size) in bytes for valid message sizes.
- tags
Dictionary of tag names to tag values for algorithm selection hints.
- buffer_mode
The buffer mode supported by this algorithm (IN_PLACE, OUT_OF_PLACE, or ANY).
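Example (a sketch of how these attributes can drive algorithm selection; algorithms and msg_size are placeholder variables, and the "protocol" tag name is hypothetical, not part of this API):

    # Keep only allreduce algorithms whose valid message-size window
    # covers msg_size and that advertise a hypothetical "protocol" tag.
    candidates = [
        a for a in algorithms
        if a.collective == "allreduce"
        and a.message_size_range[0] <= msg_size <= a.message_size_range[1]
        and a.tags.get("protocol") == "LL"  # hypothetical tag name/value
    ]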
- class Constraint(world_size=0, n_ranks_per_node=0)
Bases: object

Constraints that define valid execution environments for the algorithm.
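Example (illustrative values only):

    # Declare the algorithm valid only on 16 ranks total,
    # spread across nodes of 8 ranks each.
    constraint = Algorithm.Constraint(world_size=16, n_ranks_per_node=8)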
- property buffer_mode: CppCollectiveBufferMode
The buffer mode supported by this algorithm (IN_PLACE, OUT_OF_PLACE, or ANY).
- property collective: str
The collective operation this algorithm implements (e.g., “allreduce”, “allgather”).
- classmethod create_from_native_capsule(obj)
Create an Algorithm instance from a PyCapsule object.
- Parameters:
obj – A PyCapsule containing a native algorithm pointer.
- Returns:
A new Algorithm instance wrapping the algorithm from the capsule.
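Example (a hedged sketch; my_ext and its allreduce_capsule function are hypothetical stand-ins for a native extension module that exports a PyCapsule holding an algorithm pointer):

    import my_ext  # hypothetical native extension module

    algo = Algorithm.create_from_native_capsule(my_ext.allreduce_capsule())
    assert algo.is_native_algorithm()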
- classmethod create_from_native_handle(handle)
Create an Algorithm instance from a native C++ algorithm handle.
- Parameters:
handle (CppAlgorithm) – The native C++ algorithm handle.
- Returns:
A new Algorithm instance wrapping the native handle.
- execute(comm, input_buffer, output_buffer, input_size, output_size, dtype, op=CppReduceOp.NOP, stream=0, executor=None, nblocks=0, nthreads_per_block=0, extras=None)
Execute the collective algorithm.
- Parameters:
comm (CppCommunicator) – The communicator to use.
input_buffer (int) – Device pointer to the input buffer.
output_buffer (int) – Device pointer to the output buffer.
input_size (int) – Size of the input buffer in bytes.
output_size (int) – Size of the output buffer in bytes.
dtype (CppDataType) – Data type of the elements.
op (CppReduceOp) – Reduction operation for reduce-type collectives (default: NOP).
stream (int) – CUDA stream to execute on (default: 0).
executor (Optional[CppExecutor]) – The executor for DSL algorithms (required for DSL, optional for native).
nblocks (int) – Number of CUDA blocks (0 for auto-selection).
nthreads_per_block (int) – Number of threads per block (0 for auto-selection).
extras (Optional[Dict[str, int]]) – Additional algorithm-specific parameters.
- Return type:
int
- Returns:
The result code (0 for success).
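Example (a minimal sketch of an in-place allreduce; the objects marked as placeholders below come from your own setup, and a cupy-style GpuBuffer interface is assumed, not confirmed):

    import cupy as cp
    from mscclpp import GpuBuffer

    # Placeholders from your own setup (not constructed here):
    #   comm      – CppCommunicator
    #   executor  – CppExecutor (needed only for DSL algorithms)
    #   dtype     – a CppDataType value matching cp.float32
    #   reduce_op – a CppReduceOp value (e.g. a sum) for allreduce

    nelem = 1 << 20
    buf = GpuBuffer(nelem, dtype=cp.float32)  # device memory, ndarray interface
    buf[:] = 1.0

    # In-place allreduce: the same device pointer is passed as input and output.
    rc = algo.execute(
        comm,
        buf.data.ptr,   # input device pointer
        buf.data.ptr,   # output device pointer (in-place)
        buf.nbytes,     # input size in bytes
        buf.nbytes,     # output size in bytes
        dtype,
        op=reduce_op,
        executor=executor if algo.is_dsl_algorithm() else None,
    )
    assert rc == 0      # 0 indicates success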
- is_dsl_algorithm()
Check if this is a DSL-based algorithm.
- Return type:
bool
- Returns:
True if this algorithm is defined using DSL/execution plan, False otherwise.
- is_native_algorithm()
Check if this is a native C++/CUDA algorithm.
- Return type:
bool
- Returns:
True if this algorithm is implemented natively, False otherwise.
- class mscclpp.GpuBuffer(*args, **kwargs)
Bases: ndarray
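Example (a sketch assuming GpuBuffer follows the cupy-style ndarray construction interface suggested by its base class):

    import cupy as cp
    from mscclpp import GpuBuffer

    # Allocate 2**18 float32 elements (1 MiB) on the current GPU;
    # the result supports the usual ndarray operations.
    buf = GpuBuffer(1 << 18, dtype=cp.float32)
    buf[:] = 0.0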
Modules

MSCCL++ DSL.