mscclpp.language.program
Classes
|
A program definition for MSCCL++ collective communication operations. |
- class mscclpp.language.program.CollectiveProgram(name, collective, num_ranks, instances=1, protocol='Simple', instr_fusion=True, auto_sync=True, replication_policy=ReplicationPolicy.interleaved, reuse_resources=False, num_threads_per_block=1024, use_double_scratch_buffer=False, buffer_alignment=16, min_message_size=0, max_message_size=18446744073709551615)
Bases:
objectA program definition for MSCCL++ collective communication operations.
CollectiveProgram serves as the main container for defining and executing collective communication programs using the MSCCL++ DSL. It manages GPU resources, channels, operations, and provides serialization to JSON format for execution.
- Parameters:
name (str)
collective (Collective)
num_ranks (int)
instances (int)
protocol (str)
instr_fusion (bool)
auto_sync (bool)
replication_policy (ReplicationPolicy)
reuse_resources (bool)
num_threads_per_block (int)
use_double_scratch_buffer (bool)
buffer_alignment (int)
min_message_size (int)
max_message_size (int)
- collective
The collective operation this program implements.
- Type:
- replication_policy
The policy for replicating operations.
- Type:
- gpus
List of GPU objects representing each rank.
- Type:
List[Gpu]
- loop_context
Current pipeline loop context, if any.
- classmethod from_spec(spec)
Initialize a new CollectiveProgram from an algorithm specification.
This constructor provides an alternative way to create a CollectiveProgram using an AlgoSpec object, which contains the complete algorithm specification including collective instance, protocol parameters, and optimization settings. The collective operation is directly provided through the spec’s collective attribute.
- Parameters:
spec (AlgoSpec) – Algorithm specification containing all program parameters and configuration settings, including a Collective instance.
- Raises:
AssertionError – If protocol is not “Simple” or “LL”.
Example
>>> from mscclpp.language.utils import AlgoSpec >>> from mscclpp.language.collectives import AllReduce >>> collective = AllReduce(num_ranks=4, chunk_factor=1, inplace=False) >>> spec = AlgoSpec( ... name="my_allreduce", ... collective=collective, ... world_size=4, ... instances=1, ... protocol="Simple", ... in_place=False ... ) >>> with CollectiveProgram.from_spec(spec) as prog: ... # Define communication operations ... pass