CLI Reference

submit-aml

Submit a job to be run on Azure Machine Learning.

Unrecognized arguments are not parsed by `submit-aml`; they are propagated verbatim to the script.

submit-aml \
    --script run.py \
    --experiment-name "my-experiment" \
    --mount "vindr_dir=VINDR-CXR-V2" \
    --my-script-arg "hello"

Usage:

submit-aml [OPTIONS]

Options:

  -e, --experiment-name TEXT      Name of the Azure ML experiment to which the
                                  job will be submitted. If not provided, the
                                  name of the current directory will be used.
  -r, --run-name TEXT             Display name of the Azure ML run.
  --workspace TEXT                Name of the Azure ML workspace.
  -g, --resource-group TEXT       Name of the Azure ML resource group.
  --subscription TEXT             Subscription ID of the workspace.
  --description TEXT              Description for the Azure ML job. If not
                                  provided, the local command will be used.
  -c, --compute-target TEXT       Name of the Azure ML compute target to run
                                  the job on.
  -i, --docker-image TEXT         Base Docker image to use for the job.
                                  [default:
                                  mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.8-cudnn8-ubuntu22.04]
  --build-context / --no-build-context
                                  Whether to build a Docker context from the
                                  project directory.  [default: build-context]
  --docker-run TEXT               Extra command to run in Docker build before
                                  syncing the environment.
  --aml-environment TEXT          Name of an existing Azure ML environment to
                                  use for the job. If provided, the Docker
                                  image and build context arguments will be
                                  ignored.
  --shared-memory INTEGER         Amount of shared memory for the Docker
                                  container (in GB)  [default: 256]
  -n, --num-nodes INTEGER         Number of nodes to use for the job.
                                  [default: 1]
  -d, --download TEXT             Azure ML dataset or job output folder to
                                  download. To download an Azure ML dataset,
                                  the argument should take the form
                                  'alias=name:version'; for example:
                                  'vindr_dir=VINDR-CXR-V2:1'. If the version
                                  is omitted, the latest one will be used. To
                                  download the output folder of a previous
                                  job, the argument should take the form
                                  'alias=job_dir:<job_id>:<path/in/job/outputs>';
                                  for example:
                                  'checkpoint=job_dir:crusty_hat_43s6lmvb25:outputs/checkpoint-10000'.
                                  The alias can be used to pass input datasets
                                  to the script, e.g., '${{inputs.vindr_dir}}'
                                  or '${{inputs.checkpoint}}'. This option can
                                  be used multiple times.
  -m, --mount TEXT                Azure ML dataset or job output folder to
                                  mount. For an Azure ML dataset, the alias,
                                  name and version should be provided while
                                  for a job output folder, the alias, job ID
                                  and path in the job outputs should be
                                  provided. See the --download option for more
                                  information.
  -o, --output TEXT               Alias, datastore and path to folder into
                                  which outputs will be written, expressed as
                                  "alias=datastore/path/to/dir". For example:
                                  "out_dir=mydatastore/my_dataset". The alias
                                  can be used to pass outputs to the script,
                                  e.g., "${{outputs.out_dir}}". See the
                                  example for more information. This option
                                  can be used multiple times.
  --command-prefix TEXT           Prefix to prepend to the command. For
                                  example, `uv run`.
                                  [default: uv run --no-default-groups]
  --executable TEXT               The executable, e.g., `python`, `'torchrun
                                  --nproc-per-node auto'`, `bash`, or `nvidia-
                                  smi`.  [default: python]
  -s, --script PATH               Path to the script that will be run on Azure
                                  ML.
  --sweep TEXT                    Azure ML hyperparameter for sweep jobs.
                                  Examples: "seed=[0, 1, 2]",
                                  "model/unet=['tiny', 'small']",
                                  "+trainer.max_epochs=[10, 20]",
                                  "model.learning_rate=[1.0e-4, 2.0e-4]". If a
                                  `--sweep-prefix` is passed, the sweep
                                  arguments will be added to the command with
                                  the prefix. The keys are adapted to be
                                  compatible with Azure ML Inputs and will be
                                  available as environment variables in the
                                  job. For the examples above, the environment
                                  variables will be `AZUREML_SWEEP_seed`,
                                  `AZUREML_SWEEP_model_unet`,
                                  `AZUREML_SWEEP_trainer_max_epochs`, and
                                  `AZUREML_SWEEP_model_learning_rate`.
  --sweep-prefix TEXT             Prefix to prepend to the sweep arguments in
                                  the command. If not provided, the sweep
                                  arguments will not be added to the command.
  --max-concurrent-trials INTEGER
                                  Maximum number of concurrent trials for the
                                  sweep job.
  -l, --stream-logs               Wait for completion and stream the logs of
                                  the job.
  --source-dir PATH               Path to the directory containing the source
                                  code for the job. If not provided, the
                                  current directory is used.
  -P, --project-dir PATH          Directory containing a pyproject.toml,
                                  uv.lock and .python-version file. These
                                  files will be used to build the Docker
                                  image. If not provided, the current
                                  directory is used.
  --num-gpus INTEGER              Number of requested GPUs per node. This
                                  should typically match the number of GPUs in
                                  the compute target. If provided, the
                                  `PyTorchDistribution` will be selected.
                                  Otherwise, the `MpiDistribution` will be
                                  used and `--executable` should be set to
                                  `'torchrun --nproc-per-node auto'` for
                                  multi-GPU PyTorch runs. Must not be set for
                                  Lightning jobs. More information at
                                  https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-distributed-gpu?view=azureml-api-2.
  --debug / --no-debug            Install debugpy on AML and run the command
                                  using debugpy. The job will not start until
                                  a remote debugger is attached. More
                                  information at
                                  https://learn.microsoft.com/en-us/azure/machine-learning/how-to-interactive-jobs?view=azureml-api-2&tabs=ui#attach-a-debugger-to-a-job.
                                  [default: no-debug]
  --tensorboard / --no-tensorboard
                                  Enable a TensorBoard interactive service for
                                  the job.  [default: tensorboard]
  --tensorboard-dir PATH          Directory in which the TensorBoard logs are
                                  expected to be stored.  [default:
                                  logs/tensorboard]
  --profiler / --no-profiler      Enable profiling on Azure ML. Needs CUDA >=
                                  12 and PyTorch >= 2.  [default: no-profiler]
  -G, --dependency-group TEXT     Dependency groups to install in the Docker
                                  image. If not provided, no dependency groups
                                  are installed. The groups are defined in the
                                  pyproject.toml file. This option can be used
                                  multiple times.
  --extra TEXT                    Optional dependency groups (extras) to
                                  install in the Docker image. If not
                                  provided, no extras are installed. The
                                  optional groups are defined in the
                                  pyproject.toml file. This option can be used
                                  multiple times.
  --conda-env-file PATH           Path to a conda environment YAML file (e.g.,
                                  environment.yml). If provided, a conda
                                  environment will be used instead of a Docker
                                  build context. Cannot be used together with
                                  --build-context, --aml-environment, or
                                  uv-specific options.
  --only-env                      Exit after instantiating the environment.
                                  This is useful during development so that
                                  the AML environment build runs immediately
                                  and the job starts faster once the script is
                                  ready to be submitted.
  -E, --set TEXT                  Environment variables to set on the job. The
                                  format is `KEY=VALUE`. This option can be
                                  used multiple times.
  -D, --dry-run                   Exit before submitting the job.
  --install-completion            Install completion for the current shell.
  --show-completion               Show completion for the current shell, to
                                  copy it or customize the installation.
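
Examples:

The `--download` and `--mount` forms can be combined in one submission. A sketch reusing the dataset and job ID from the option descriptions above; `--data-dir` and `--resume-from` are hypothetical script arguments that `submit-aml` propagates to the script:

```shell
# Download a dataset (pinned to version 1) and mount a previous job's
# checkpoint folder; the aliases are forwarded to the script as inputs.
submit-aml \
    --script train.py \
    --download "vindr_dir=VINDR-CXR-V2:1" \
    --mount "checkpoint=job_dir:crusty_hat_43s6lmvb25:outputs/checkpoint-10000" \
    --data-dir '${{inputs.vindr_dir}}' \
    --resume-from '${{inputs.checkpoint}}'
```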
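
For `--output`, the alias names a writable folder on a datastore. A sketch; the datastore name, path, and the script's `--save-dir` argument are illustrative:

```shell
# Anything the script writes under ${{outputs.out_dir}} lands in the
# datastore folder given on the right-hand side of the alias.
submit-aml \
    --script train.py \
    --output "out_dir=mydatastore/my_dataset" \
    --save-dir '${{outputs.out_dir}}'
```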
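
A sweep submission might look like the following sketch (script and values are illustrative). As described under `--sweep`, each trial receives its sampled values as environment variables:

```shell
submit-aml \
    --script train.py \
    --sweep "seed=[0, 1, 2]" \
    --sweep "model.learning_rate=[1.0e-4, 2.0e-4]" \
    --max-concurrent-trials 4
# Inside each trial, the sampled values are available as
# AZUREML_SWEEP_seed and AZUREML_SWEEP_model_learning_rate.
```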
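
The two multi-GPU setups described under `--num-gpus` can be sketched as follows; the compute target name is illustrative:

```shell
# PyTorchDistribution: Azure ML launches one process per GPU.
submit-aml \
    --script train.py \
    --compute-target gpu-cluster-4x \
    --num-gpus 4

# MpiDistribution: one process per node; torchrun fans out over local GPUs.
submit-aml \
    --script train.py \
    --compute-target gpu-cluster-4x \
    --executable 'torchrun --nproc-per-node auto'
```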
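
While iterating, `--set` and `--dry-run` can be combined to inspect the generated job without submitting it; the environment variable names below are illustrative:

```shell
# Build the job specification, set two environment variables on it,
# and exit before submission.
submit-aml \
    --script run.py \
    --set "WANDB_MODE=offline" \
    --set "HF_HOME=/tmp/hf" \
    --dry-run
```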