utils.distributed
Distributed training utilities.
ranks_already_set
Return True if both the local and global ranks have been set.
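A minimal sketch of such a check follows. It assumes the local and global ranks live in the `LOCAL_RANK` and `RANK` environment variables, which is the convention used by PyTorch launchers but is not confirmed for this module. The name `ranks_set_in_env` is hypothetical. Passing a mapping instead of reading `os.environ` directly keeps the function easy to test.

```python
import os
from typing import Mapping, Optional


def ranks_set_in_env(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: True if both LOCAL_RANK and RANK are present.

    Assumption: ranks are stored under these two environment variable names.
    """
    env = os.environ if env is None else env
    return "LOCAL_RANK" in env and "RANK" in env
```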
fetch_ranks_from_azureml_preprocess
Look up distributed arguments from Azure ML environment variables.
Assumes an OpenMPI-based image.
Notes:
Sets up NCCL environment variables used by Azure ML:
- NCCL_SOCKET_IFNAME
- NCCL_IB_DISABLE
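The rank lookup on an OpenMPI image can be sketched as below. It reads the standard Open MPI variables `OMPI_COMM_WORLD_RANK`, `OMPI_COMM_WORLD_LOCAL_RANK` and `OMPI_COMM_WORLD_SIZE`. `DistributedRanks` and `ranks_from_openmpi` are hypothetical names, not this module's API, and the sketch does not cover the NCCL setup mentioned in the notes.

```python
import os
from typing import Mapping, NamedTuple, Optional


class DistributedRanks(NamedTuple):
    """Hypothetical container for the values a launcher hands to each process."""

    global_rank: int
    local_rank: int
    world_size: int


def ranks_from_openmpi(env: Optional[Mapping[str, str]] = None) -> DistributedRanks:
    """Read ranks exported by mpirun; raises KeyError if not launched via Open MPI."""
    env = os.environ if env is None else env
    return DistributedRanks(
        global_rank=int(env["OMPI_COMM_WORLD_RANK"]),
        local_rank=int(env["OMPI_COMM_WORLD_LOCAL_RANK"]),
        world_size=int(env["OMPI_COMM_WORLD_SIZE"]),
    )
```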
fetch_ranks_from_azureml
Look up distributed arguments from Azure ML environment variables.
Assumes an OpenMPI-based image.
Notes:
Sets up NCCL environment variables used by Azure ML:
- NCCL_SOCKET_IFNAME
- NCCL_IB_DISABLE
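The NCCL part of the setup can be sketched as follows. The values `"^docker0,lo"` (use any network interface except those whose names start with `docker0` or `lo`) and `"1"` (disable the InfiniBand transport) are placeholders chosen for illustration, not necessarily what this module sets. `set_nccl_env_defaults` is a hypothetical name.

```python
import os
from typing import MutableMapping, Optional


def set_nccl_env_defaults(
    env: Optional[MutableMapping[str, str]] = None,
    socket_ifname: str = "^docker0,lo",  # placeholder value (assumption)
    ib_disable: str = "1",  # placeholder value (assumption)
) -> None:
    """Hypothetical helper: fill in NCCL variables only where the user has not set them."""
    env = os.environ if env is None else env
    # A leading "^" tells NCCL to exclude interfaces matching the listed prefixes.
    env.setdefault("NCCL_SOCKET_IFNAME", socket_ifname)
    # "1" turns off NCCL's InfiniBand transport; "0" leaves it enabled.
    env.setdefault("NCCL_IB_DISABLE", ib_disable)
```

Using `setdefault` means values already exported by the job configuration take precedence over the helper's defaults.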
fetch_ranks_from_torch_distributed_launch
Read distributed arguments set by torch.distributed.launch via environment variables.
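A minimal sketch of reading these variables, assuming the launcher exported `RANK`, `LOCAL_RANK`, `WORLD_SIZE`, `MASTER_ADDR` and `MASTER_PORT`. With the legacy `torch.distributed.launch`, `LOCAL_RANK` is only exported as an environment variable when `--use_env` is passed (`torchrun` always exports it); otherwise it arrives as a command-line argument and this sketch raises `KeyError`. `LaunchInfo` and `ranks_from_torch_launch` are hypothetical names.

```python
import os
from typing import Mapping, NamedTuple, Optional


class LaunchInfo(NamedTuple):
    """Hypothetical container for the env:// rendezvous settings."""

    global_rank: int
    local_rank: int
    world_size: int
    master_addr: str
    master_port: int


def ranks_from_torch_launch(env: Optional[Mapping[str, str]] = None) -> LaunchInfo:
    """Read the variables a PyTorch launcher exports for each worker process."""
    env = os.environ if env is None else env
    return LaunchInfo(
        global_rank=int(env["RANK"]),
        local_rank=int(env["LOCAL_RANK"]),
        world_size=int(env["WORLD_SIZE"]),
        master_addr=env["MASTER_ADDR"],
        master_port=int(env["MASTER_PORT"]),
    )
```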
set_environment_variables_for_nccl_backend
Set the distributed training environment variables for Azure ML OpenMPI runs that use the NCCL backend.
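One way to do this is to copy the Open MPI ranks into the variables that PyTorch's `env://` initialization reads (`RANK`, `LOCAL_RANK`, `WORLD_SIZE`, `MASTER_ADDR`, `MASTER_PORT`). The sketch below assumes that multi-node Azure ML runs expose the master node as `AZ_BATCH_MASTER_NODE` in `host:port` form, that single-node runs can rendezvous on `127.0.0.1`, and that port `6105` is free; all three are assumptions, and `export_torch_env_from_openmpi` is a hypothetical name.

```python
import os
from typing import MutableMapping, Optional


def export_torch_env_from_openmpi(
    env: Optional[MutableMapping[str, str]] = None,
    master_port: int = 6105,  # assumed default; any free port works
) -> None:
    """Hypothetical helper: map Open MPI variables onto torch env:// variables."""
    env = os.environ if env is None else env
    env["RANK"] = env["OMPI_COMM_WORLD_RANK"]
    env["LOCAL_RANK"] = env["OMPI_COMM_WORLD_LOCAL_RANK"]
    env["WORLD_SIZE"] = env["OMPI_COMM_WORLD_SIZE"]

    # If every process runs on this node, the job is single-node.
    single_node = int(env["OMPI_COMM_WORLD_LOCAL_SIZE"]) == int(env["OMPI_COMM_WORLD_SIZE"])
    if single_node:
        env["MASTER_ADDR"] = "127.0.0.1"
    else:
        # Assumption: AZ_BATCH_MASTER_NODE looks like "host:port"; keep only the host.
        env["MASTER_ADDR"] = env["AZ_BATCH_MASTER_NODE"].split(":")[0]

    # Respect a MASTER_PORT the user already exported.
    env.setdefault("MASTER_PORT", str(master_port))
```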
rank_zero_only
Decorate a function so that it executes only on global rank 0; all other ranks wait via torch.distributed until rank 0 has finished.
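A sketch of this pattern follows, under stated assumptions: the global rank comes from `torch.distributed.get_rank()` when a process group is initialized and otherwise from the `RANK` environment variable (defaulting to 0), and the barrier is skipped when no process group exists, so the sketch also runs without PyTorch installed. `run_on_rank_zero` is a hypothetical name, and non-zero ranks return `None`.

```python
import functools
import os
from typing import Any, Callable, Optional


def _global_rank() -> int:
    """Prefer torch.distributed's rank when a process group exists, else $RANK."""
    try:
        import torch.distributed as dist
    except ImportError:
        dist = None
    if dist is not None and dist.is_available() and dist.is_initialized():
        return dist.get_rank()
    # Assumption: the launcher exported RANK; default to 0 for single-process runs.
    return int(os.environ.get("RANK", "0"))


def _barrier_if_initialized() -> None:
    """Block until every rank arrives, but only if a process group is active."""
    try:
        import torch.distributed as dist
    except ImportError:
        return
    if dist.is_available() and dist.is_initialized():
        dist.barrier()


def run_on_rank_zero(fn: Callable[..., Any]) -> Callable[..., Optional[Any]]:
    """Hypothetical decorator: run fn on global rank 0 only, then synchronize."""

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Optional[Any]:
        result = fn(*args, **kwargs) if _global_rank() == 0 else None
        # Every rank, rank 0 included, must reach the barrier, so the other
        # ranks wait here until rank 0 has finished running fn.
        _barrier_if_initialized()
        return result

    return wrapper
```

Calling the barrier on every rank, rather than only on the waiting ones, matters: a collective barrier only releases once all ranks in the group have entered it.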