utils.distributed

Distributed training utilities.

def ranks_already_set(args) -> bool

Return True if both local and global ranks have been set.
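The check can be sketched as follows. This is only an illustration: the attribute names `local_rank` and `global_rank`, and the convention that an unset rank is `None` or negative, are assumptions and are not confirmed by the source.

```python
from argparse import Namespace


def ranks_already_set(args) -> bool:
    """Return True if both local and global ranks have been set."""

    # Assumption: an unset rank is either missing, None, or a negative sentinel.
    def is_set(value) -> bool:
        return value is not None and value >= 0

    return is_set(getattr(args, "local_rank", None)) and is_set(
        getattr(args, "global_rank", None)
    )


# Both ranks present and non-negative.
print(ranks_already_set(Namespace(local_rank=0, global_rank=3)))
# Local rank still at the (assumed) sentinel value.
print(ranks_already_set(Namespace(local_rank=-1, global_rank=3)))
```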

def fetch_ranks_from_azureml_preprocess()

Look up distributed arguments from Azure ML environment variables.

Assumes OpenMPI image.

Notes:

Sets up NCCL environment variables used by Azure ML.

def fetch_ranks_from_azureml()

Look up distributed arguments from Azure ML environment variables.

Assumes OpenMPI image.
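Under an Open MPI launcher, each process can read its ranks from the `OMPI_COMM_WORLD_*` environment variables that Open MPI exports. The sketch below shows one way such a lookup might work. The helper name `read_openmpi_ranks` and its return shape are hypothetical and do not reflect the actual implementation of this function.

```python
import os
from typing import Mapping, Optional, Tuple


def read_openmpi_ranks(
    env: Mapping[str, str] = os.environ,
) -> Optional[Tuple[int, int, int]]:
    """Return (local_rank, global_rank, world_size), or None outside Open MPI.

    Hypothetical helper: only the environment variable names come from Open MPI.
    """
    try:
        return (
            int(env["OMPI_COMM_WORLD_LOCAL_RANK"]),  # rank within this node
            int(env["OMPI_COMM_WORLD_RANK"]),        # rank across all nodes
            int(env["OMPI_COMM_WORLD_SIZE"]),        # total number of processes
        )
    except KeyError:
        # At least one variable is missing, so we were not started by Open MPI.
        return None
```

Passing the environment as a parameter keeps the lookup easy to exercise with a plain dictionary instead of the real process environment.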

Notes:

Sets up NCCL environment variables used by Azure ML.

def fetch_ranks_from_torch_distributed_launch()

Read distributed arguments set by torch.distributed.launch via environment variables.
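As a rough illustration, the launcher exports `RANK` and `WORLD_SIZE`, and, depending on the PyTorch version and whether `--use_env` is given, also `LOCAL_RANK` (otherwise the local rank arrives as a `--local_rank` command-line argument). The helper name `read_torch_launch_ranks` below is hypothetical, and the sketch covers only the environment-variable case.

```python
import os
from typing import Mapping, Optional, Tuple


def read_torch_launch_ranks(
    env: Mapping[str, str] = os.environ,
) -> Optional[Tuple[int, int, int]]:
    """Return (local_rank, global_rank, world_size), or None if any is missing.

    Hypothetical helper; assumes the launcher exported LOCAL_RANK as well.
    """
    keys = ("LOCAL_RANK", "RANK", "WORLD_SIZE")
    if not all(key in env for key in keys):
        return None
    local_rank, global_rank, world_size = (int(env[key]) for key in keys)
    return local_rank, global_rank, world_size
```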

def set_environment_variables_for_nccl_backend()

Set the distributed training environment variables for Azure ML OpenMPI runs with the NCCL backend.
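The following is a minimal sketch of what such a setup might involve, not the function's actual body. It maps the Open MPI rank variables onto the `RANK`, `WORLD_SIZE`, `MASTER_ADDR` and `MASTER_PORT` variables read by PyTorch's `env://` initialization, and restricts NCCL's network interfaces through `NCCL_SOCKET_IFNAME`. The use of `AZ_BATCH_MASTER_NODE` as the source of the master address, the fallback port, and the helper name `set_nccl_environment` are all assumptions.

```python
import os
from typing import MutableMapping


def set_nccl_environment(
    env: MutableMapping[str, str] = os.environ,
    default_port: str = "6105",  # assumed fallback port, chosen arbitrarily
) -> None:
    """Populate the variables that torch.distributed's env:// init reads."""
    # Reuse the ranks that Open MPI already assigned to this process.
    env["RANK"] = env["OMPI_COMM_WORLD_RANK"]
    env["WORLD_SIZE"] = env["OMPI_COMM_WORLD_SIZE"]

    # Assumption: the master node is given as "host:port"; fall back to localhost.
    master = env.get("AZ_BATCH_MASTER_NODE", "127.0.0.1:" + default_port)
    address, _, port = master.partition(":")
    env["MASTER_ADDR"] = address
    # Keep a MASTER_PORT that the caller has already chosen.
    env.setdefault("MASTER_PORT", port or default_port)

    # Tell NCCL to skip the Docker bridge and loopback interfaces.
    env["NCCL_SOCKET_IFNAME"] = "^docker0,lo"
```

Accepting the environment as an argument lets the mapping be checked against a plain dictionary before it is applied to `os.environ`.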

def rank_zero_only(fn)

Decorate a function so that it executes only on global rank 0; all other ranks wait via torch.distributed.
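One way such a decorator could be written is sketched below. It assumes the global rank is available in the `RANK` environment variable and that waiting is done with `torch.distributed.barrier()`. Both are assumptions about the real implementation. The `torch` import is guarded only so the sketch also runs where PyTorch is not installed.

```python
import functools
import os

try:
    import torch.distributed as dist
except ImportError:  # PyTorch not installed: run without synchronization
    dist = None


def rank_zero_only(fn):
    """Run fn on global rank 0 only; other ranks skip it and wait."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # Assumption: the global rank is exposed through the RANK variable.
        rank = int(os.environ.get("RANK", "0"))
        result = fn(*args, **kwargs) if rank == 0 else None
        # Every rank meets at the barrier, so non-zero ranks block here
        # until rank 0 has finished running fn.
        if dist is not None and dist.is_available() and dist.is_initialized():
            dist.barrier()
        return result

    return wrapper


@rank_zero_only
def save_checkpoint(path: str) -> str:
    # Placeholder body for illustration; a real function would write a file.
    return "saved to " + path
```

On ranks other than 0 the wrapped function returns `None`, so callers should not rely on its return value unless they broadcast it separately.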