mscclpp
MSCCL++ Python API.
Functions

Return the directory that contains the MSCCL++ headers.
- class mscclpp.Algorithm(id=None, execution_plan=None, native_handle=None, tags=None, constraint=None)
Bases: object

A wrapper for collective communication algorithms.
This class provides a Python interface for collective communication algorithms such as allreduce, allgather, and reduce-scatter. Algorithms can be either DSL-based (defined using MSCCL++ execution plans) or native (implemented in C++/CUDA).
- Parameters:
id (Optional[str])
execution_plan (Optional[CppExecutionPlan])
native_handle (Optional[CppAlgorithm])
tags (Optional[dict])
constraint (Optional[Constraint])
- name
Human-readable name of the algorithm.
- collective
The collective operation this algorithm implements (e.g., “allreduce”).
- message_size_range
Tuple of (min_size, max_size) in bytes for valid message sizes.
- tags
Dictionary of tag names to tag values for algorithm selection hints.
- buffer_mode
The buffer mode supported by this algorithm (IN_PLACE, OUT_OF_PLACE, or ANY).
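Example (a sketch of how these attributes can drive algorithm selection; algorithms and msg_size are placeholder variables, and the "protocol" tag name is hypothetical, not part of this API):

    # Keep only allreduce algorithms whose valid message-size window
    # covers msg_size and that advertise a hypothetical "protocol" tag.
    candidates = [
        a for a in algorithms
        if a.collective == "allreduce"
        and a.message_size_range[0] <= msg_size <= a.message_size_range[1]
        and a.tags.get("protocol") == "LL"  # hypothetical tag name/value
    ]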
- class Constraint(world_size=0, n_ranks_per_node=0)
Bases: object

Constraints that define valid execution environments for the algorithm.
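Example (illustrative values only):

    # Declare the algorithm valid only on 16 ranks total,
    # spread across nodes of 8 ranks each.
    constraint = Algorithm.Constraint(world_size=16, n_ranks_per_node=8)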
- property buffer_mode: CppCollectiveBufferMode
The buffer mode supported by this algorithm (IN_PLACE, OUT_OF_PLACE, or ANY).
- property collective: str
The collective operation this algorithm implements (e.g., “allreduce”, “allgather”).
- classmethod create_from_native_capsule(obj)
Create an Algorithm instance from a PyCapsule object.
- Parameters:
obj – A PyCapsule containing a native algorithm pointer.
- Returns:
A new Algorithm instance wrapping the algorithm from the capsule.
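Example (a hedged sketch; my_ext and its allreduce_capsule function are hypothetical stand-ins for a native extension module that exports a PyCapsule holding an algorithm pointer):

    import my_ext  # hypothetical native extension module

    algo = Algorithm.create_from_native_capsule(my_ext.allreduce_capsule())
    assert algo.is_native_algorithm()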
- classmethod create_from_native_handle(handle)
Create an Algorithm instance from a native C++ algorithm handle.
- Parameters:
handle (CppAlgorithm) – The native C++ algorithm handle.
- Returns:
A new Algorithm instance wrapping the native handle.
- execute(comm, input_buffer, output_buffer, input_size, output_size, dtype, op=CppReduceOp.NOP, stream=0, executor=None, nblocks=0, nthreads_per_block=0, extras=None)
Execute the collective algorithm.
- Parameters:
comm (CppCommunicator) – The communicator to use.
input_buffer (int) – Device pointer to the input buffer.
output_buffer (int) – Device pointer to the output buffer.
input_size (int) – Size of the input buffer in bytes.
output_size (int) – Size of the output buffer in bytes.
dtype (CppDataType) – Data type of the elements.
op (CppReduceOp) – Reduction operation for reduce-type collectives (default: NOP).
stream (int) – CUDA stream to execute on (default: 0).
executor (Optional[CppExecutor]) – The executor for DSL algorithms (required for DSL, optional for native).
nblocks (int) – Number of CUDA blocks (0 for auto-selection).
nthreads_per_block (int) – Number of threads per block (0 for auto-selection).
extras (Optional[Dict[str, int]]) – Additional algorithm-specific parameters.
- Return type:
int
- Returns:
The result code (0 for success).
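Example (a minimal sketch of an in-place allreduce; the objects marked as placeholders below come from your own setup, and a cupy-style GpuBuffer interface is assumed, not confirmed):

    import cupy as cp
    from mscclpp import GpuBuffer

    # Placeholders from your own setup (not constructed here):
    #   comm      – CppCommunicator
    #   executor  – CppExecutor (needed only for DSL algorithms)
    #   dtype     – a CppDataType value matching cp.float32
    #   reduce_op – a CppReduceOp value (e.g. a sum) for allreduce

    nelem = 1 << 20
    buf = GpuBuffer(nelem, dtype=cp.float32)  # device memory, ndarray interface
    buf[:] = 1.0

    # In-place allreduce: the same device pointer is passed as input and output.
    rc = algo.execute(
        comm,
        buf.data.ptr,   # input device pointer
        buf.data.ptr,   # output device pointer (in-place)
        buf.nbytes,     # input size in bytes
        buf.nbytes,     # output size in bytes
        dtype,
        op=reduce_op,
        executor=executor if algo.is_dsl_algorithm() else None,
    )
    assert rc == 0      # 0 indicates success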
- is_dsl_algorithm()
Check if this is a DSL-based algorithm.
- Return type:
bool
- Returns:
True if this algorithm is defined using DSL/execution plan, False otherwise.
- is_native_algorithm()
Check if this is a native C++/CUDA algorithm.
- Return type:
bool
- Returns:
True if this algorithm is implemented natively, False otherwise.
- class mscclpp.GpuBuffer(*args, **kwargs)
Bases: ndarray
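Example (a sketch assuming GpuBuffer follows the cupy-style ndarray construction interface suggested by its base class):

    import cupy as cp
    from mscclpp import GpuBuffer

    # Allocate 2**18 float32 elements (1 MiB) on the current GPU;
    # the result supports the usual ndarray operations.
    buf = GpuBuffer(1 << 18, dtype=cp.float32)
    buf[:] = 0.0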
Modules

MSCCL++ DSL.