utils.distributed
Utilities for distributed training.

ranks_already_set
Return True if both the local and global ranks have been set.

fetch_ranks_from_azureml_preprocess
Look up distributed arguments from Azure ML environment variables.
Assumes OpenMPI image.
Notes:
Sets up NCCL environment variables used by Azure ML:
- NCCL_SOCKET_IFNAME
- NCCL_IB_DISABLE

fetch_ranks_from_azureml
Look up distributed arguments from Azure ML environment variables.
Assumes OpenMPI image.
Notes:
Sets up NCCL environment variables used by Azure ML:
- NCCL_SOCKET_IFNAME
- NCCL_IB_DISABLE
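
As a rough illustration of what a lookup on an OpenMPI image might involve, the sketch below reads the per-process variables that OpenMPI's launcher exports (OMPI_COMM_WORLD_RANK, OMPI_COMM_WORLD_LOCAL_RANK, OMPI_COMM_WORLD_SIZE). The function name read_openmpi_ranks and the returned dictionary are hypothetical and are not this module's API; the actual function may read additional Azure ML variables.

```python
import os
from typing import Dict, Mapping, Optional


def read_openmpi_ranks(env: Optional[Mapping[str, str]] = None) -> Dict[str, int]:
    """Hypothetical helper: read the rank information exported by OpenMPI's launcher."""
    env = os.environ if env is None else env
    try:
        return {
            # Rank of this process across all nodes.
            "global_rank": int(env["OMPI_COMM_WORLD_RANK"]),
            # Rank of this process among the processes on the same node.
            "local_rank": int(env["OMPI_COMM_WORLD_LOCAL_RANK"]),
            # Total number of processes in the job.
            "world_size": int(env["OMPI_COMM_WORLD_SIZE"]),
        }
    except KeyError as missing:
        raise RuntimeError(f"Not an OpenMPI run: {missing} is not set") from missing
```

Passing the environment as an argument keeps the helper easy to exercise outside a real MPI job.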

fetch_ranks_from_torch_distributed_launch
Read distributed arguments set by torch.distributed.launch via environment variables.
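
A minimal sketch of reading launcher-provided ranks follows. It assumes the launcher exported RANK, LOCAL_RANK and WORLD_SIZE (torch.distributed.launch exports LOCAL_RANK when run with --use_env; torchrun always does). The names LaunchRanks and read_launch_ranks are hypothetical and only illustrate the idea.

```python
import os
from typing import Mapping, NamedTuple, Optional


class LaunchRanks(NamedTuple):
    """Hypothetical container for the values exported by the launcher."""
    global_rank: int
    local_rank: int
    world_size: int


def read_launch_ranks(env: Optional[Mapping[str, str]] = None) -> LaunchRanks:
    """Hypothetical helper: parse RANK, LOCAL_RANK and WORLD_SIZE from the environment."""
    env = os.environ if env is None else env
    missing = [k for k in ("RANK", "LOCAL_RANK", "WORLD_SIZE") if k not in env]
    if missing:
        # Fail loudly rather than guessing a default rank.
        raise RuntimeError(f"Launcher variables not set: {', '.join(missing)}")
    return LaunchRanks(
        global_rank=int(env["RANK"]),
        local_rank=int(env["LOCAL_RANK"]),
        world_size=int(env["WORLD_SIZE"]),
    )
```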

set_environment_variables_for_nccl_backend
Set the distributed training environment variables for Azure ML OpenMPI runs with the NCCL backend.
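
The sketch below shows one way such a setup step could look. The variable names NCCL_SOCKET_IFNAME and NCCL_IB_DISABLE come from the notes above; the values used here ("^docker0,lo" to exclude those interfaces, "1" to disable InfiniBand), the mirroring of OpenMPI ranks into RANK and WORLD_SIZE, and the function name configure_nccl_env are all assumptions for illustration, not the behaviour of the documented function.

```python
import os
from typing import MutableMapping, Optional


def configure_nccl_env(
    env: Optional[MutableMapping[str, str]] = None,
    socket_ifname: str = "^docker0,lo",  # assumed value: exclude docker0 and loopback
    disable_infiniband: bool = True,     # assumed default
) -> None:
    """Hypothetical helper: set NCCL-related variables before process-group init."""
    env = os.environ if env is None else env
    env["NCCL_SOCKET_IFNAME"] = socket_ifname
    env["NCCL_IB_DISABLE"] = "1" if disable_infiniband else "0"
    # torch.distributed's env:// initialisation reads RANK and WORLD_SIZE,
    # so mirror the OpenMPI values when they are present (assumption).
    if "OMPI_COMM_WORLD_RANK" in env:
        env["RANK"] = env["OMPI_COMM_WORLD_RANK"]
    if "OMPI_COMM_WORLD_SIZE" in env:
        env["WORLD_SIZE"] = env["OMPI_COMM_WORLD_SIZE"]
```

These variables need to be in place before the NCCL process group is initialised, since NCCL reads them at start-up.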

rank_zero_only
Decorate a function so that it executes only on global rank 0; all other ranks wait via torch.distributed.
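
A decorator of this kind can be sketched as follows. This is not the module's implementation: the name rank_zero_only_sketch is hypothetical, the global rank is assumed to be available in the RANK environment variable, and the wait is assumed to be a torch.distributed.barrier() call that is skipped when PyTorch is not installed or no process group has been initialised.

```python
import functools
import os
from typing import Any, Callable, Optional


def _barrier_if_initialized() -> None:
    """Synchronise all ranks, but only when a process group actually exists."""
    try:
        import torch.distributed as dist
    except ImportError:
        return  # PyTorch not installed: nothing to wait on.
    if dist.is_available() and dist.is_initialized():
        dist.barrier()


def rank_zero_only_sketch(fn: Callable[..., Any]) -> Callable[..., Optional[Any]]:
    """Hypothetical decorator: run fn on global rank 0 only; other ranks wait."""
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Optional[Any]:
        result = None
        if int(os.environ.get("RANK", "0")) == 0:  # assumed source of the global rank
            result = fn(*args, **kwargs)
        # Every rank reaches the barrier, so non-zero ranks block until
        # rank 0 has finished running fn.
        _barrier_if_initialized()
        return result
    return wrapper
```

On ranks other than 0 the wrapped function returns None, so this pattern suits side-effecting work such as writing a checkpoint or downloading a dataset once per job.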