mscclpp.language.program

Classes

CollectiveProgram(name, collective, num_ranks)

A program definition for MSCCL++ collective communication operations.

class mscclpp.language.program.CollectiveProgram(name, collective, num_ranks, instances=1, protocol='Simple', instr_fusion=True, auto_sync=True, replication_policy=ReplicationPolicy.interleaved, reuse_resources=False, num_threads_per_block=1024, use_double_scratch_buffer=False, buffer_alignment=16, min_message_size=0, max_message_size=18446744073709551615)

Bases: object

A program definition for MSCCL++ collective communication operations.

CollectiveProgram is the main container for defining and executing collective communication programs with the MSCCL++ DSL. It manages GPU resources, channels, and operations, and serializes the program to JSON for execution.

Parameters:
name

The name of the program.

Type:

str

collective

The collective operation this program implements.

Type:

Collective

num_ranks

The number of ranks participating in the program.

Type:

int

instances

The number of instances to replicate.

Type:

int

protocol

The communication protocol (“Simple” or “LL”).

Type:

str

instr_fusion

Whether to enable instruction fusion optimization.

Type:

bool

replication_policy

The policy for replicating operations.

Type:

ReplicationPolicy

reuse_resources

Whether to reuse resources across instances.

Type:

bool

num_threads_per_block

Number of threads per GPU thread block.

Type:

int

use_double_scratch_buffer

Whether to use double scratch buffering.

Type:

bool

buffer_alignment

Buffer alignment in bytes.

Type:

int

min_message_size

Minimum message size for this program.

Type:

int

max_message_size

Maximum message size for this program. The default, 18446744073709551615 (2**64 - 1), effectively leaves the size unbounded.

Type:

int

buffers

Buffer configurations for each rank.

Type:

list

gpus

List of GPU objects representing each rank.

Type:

List[Gpu]

loop_context

Current pipeline loop context, if any.
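
To make the keyword defaults from the signature concrete, here is a minimal plain-Python sketch. It is not part of MSCCL++: ProgramOptions and accepts are hypothetical names, the field defaults are copied from the class signature above, and treating the two message-size bounds as an inclusive range is an assumption.

```python
from dataclasses import dataclass

# Largest unsigned 64-bit value; equal to the default max_message_size.
UINT64_MAX = 2**64 - 1


@dataclass(frozen=True)
class ProgramOptions:
    """Hypothetical mirror of some CollectiveProgram keyword defaults."""

    instances: int = 1
    protocol: str = "Simple"
    instr_fusion: bool = True
    num_threads_per_block: int = 1024
    buffer_alignment: int = 16
    min_message_size: int = 0
    max_message_size: int = UINT64_MAX

    def accepts(self, message_size: int) -> bool:
        # Assumption: both bounds are inclusive.
        return self.min_message_size <= message_size <= self.max_message_size


# With the defaults, every size from 0 up to 2**64 - 1 is in range.
defaults = ProgramOptions()

# An illustrative cap of 2**20 narrows the range this program would cover.
small_only = ProgramOptions(max_message_size=1 << 20)
```

Under these assumptions, defaults.accepts(n) holds for any n in the unsigned 64-bit range, while small_only rejects anything above 2**20.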

classmethod from_spec(spec)

Initialize a new CollectiveProgram from an algorithm specification.

This classmethod is an alternative constructor: it builds a CollectiveProgram from an AlgoSpec object, which carries the complete algorithm specification, including the Collective instance, protocol parameters, and optimization settings. The collective operation is taken directly from the spec’s collective attribute.

Parameters:

spec (AlgoSpec) – Algorithm specification containing all program parameters and configuration settings, including a Collective instance.

Raises:

AssertionError – If protocol is not “Simple” or “LL”.

Example

>>> from mscclpp.language.utils import AlgoSpec
>>> from mscclpp.language.collectives import AllReduce
>>> from mscclpp.language.program import CollectiveProgram
>>> collective = AllReduce(num_ranks=4, chunk_factor=1, inplace=False)
>>> spec = AlgoSpec(
...     name="my_allreduce",
...     collective=collective,
...     world_size=4,
...     instances=1,
...     protocol="Simple",
...     in_place=False
... )
>>> with CollectiveProgram.from_spec(spec) as prog:
...     # Define communication operations
...     pass
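
The alternative-constructor pattern behind from_spec can be sketched in plain Python without MSCCL++ installed. This is a stand-in, not the library's implementation: SketchSpec and SketchProgram are hypothetical names, the spec carries only a few of the fields used in the example above, and both the mapping of world_size to num_ranks and the place where the protocol check happens are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchSpec:
    """Hypothetical, reduced stand-in for AlgoSpec."""

    name: str
    collective: str  # the real spec holds a Collective instance
    world_size: int
    instances: int = 1
    protocol: str = "Simple"


class SketchProgram:
    """Hypothetical stand-in for CollectiveProgram."""

    def __init__(self, name, collective, num_ranks, instances=1, protocol="Simple"):
        # Mirrors the documented AssertionError for unsupported protocols.
        assert protocol in ("Simple", "LL"), f"unsupported protocol: {protocol!r}"
        self.name = name
        self.collective = collective
        self.num_ranks = num_ranks
        self.instances = instances
        self.protocol = protocol

    @classmethod
    def from_spec(cls, spec):
        # Unpack the spec into the regular constructor arguments.
        return cls(
            spec.name,
            spec.collective,
            spec.world_size,  # assumption: world_size plays the role of num_ranks
            instances=spec.instances,
            protocol=spec.protocol,
        )

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # do not suppress exceptions


spec = SketchSpec(name="my_allreduce", collective="allreduce", world_size=4)
with SketchProgram.from_spec(spec) as prog:
    rank_count = prog.num_ranks

# An unsupported protocol name trips the assertion.
try:
    SketchProgram.from_spec(SketchSpec("bad", "allreduce", 4, protocol="HB"))
    rejected = False
except AssertionError:
    rejected = True
```

Routing from_spec through the ordinary constructor keeps validation in one place, so both construction paths reject the same invalid inputs.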