# Examples

## Basic submission
Submit a training script to Azure ML:
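A minimal sketch, assuming the CLI is invoked as `submit-aml` with the script path as its first argument:

```shell
# Submit train.py as an Azure ML job using the tool's defaults
submit-aml train.py
```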
With a specific experiment name and run name:
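A sketch assuming hypothetical `--experiment-name` and `--run-name` flags (the exact flag names may differ):

```shell
submit-aml train.py --experiment-name my-experiment --run-name baseline-run
```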
Pass extra arguments to the script after --:
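Everything after the `--` separator is forwarded to the training script unchanged; the script arguments shown here (`--epochs`, `--lr`) are illustrative:

```shell
submit-aml train.py -- --epochs 10 --lr 3e-4
```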
## Choosing a compute target
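A sketch assuming a hypothetical `--compute` flag that selects a compute target by name:

```shell
# Run on a named GPU cluster instead of the default compute
submit-aml train.py --compute gpu-cluster
```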
## Multi-node training

### MPI (default)
When `--num-gpus` is not set, MPI distribution is used:
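For example, requesting multiple nodes without `--num-gpus` (the `--num-nodes` flag name is an assumption):

```shell
# Two nodes, MPI distribution (no --num-gpus given)
submit-aml train.py --num-nodes 2
```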
### PyTorch distributed
Set `--num-gpus` to enable `PyTorchDistribution`:
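A sketch of such a run; `--num-gpus` is documented here, while `--num-nodes` is an assumed flag name:

```shell
# 4 processes per node on each of 2 nodes
submit-aml train.py --num-gpus 4 --num-nodes 2
```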
This configures 4 processes per node across 2 nodes (8 GPUs total).
## Sweep jobs
Run a grid sweep over hyperparameters:
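One possible shape for this, assuming a hypothetical repeatable `--sweep` flag that takes `name=value1,value2` lists:

```shell
# Grid over 2 learning rates x 2 batch sizes = 4 trials
submit-aml train.py --sweep lr=0.01,0.1 --sweep batch_size=32,64
```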
Limit concurrent trials:
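A sketch assuming a hypothetical `--max-concurrent-trials` flag:

```shell
# Run at most 2 trials at a time
submit-aml train.py --sweep lr=0.01,0.1,1.0 --max-concurrent-trials 2
```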
## Data mounting
Datasets are passed to the job as `Input` objects.
Mount a dataset:

```python
submit_to_aml(
    script_path="train.py",
    datasets_mount=["data=MY-DATASET:2"],
)
```

Download a dataset:

```python
submit_to_aml(
    script_path="train.py",
    datasets_download=["data=MY-DATASET"],
)
```

Use outputs from a previous job:

```python
submit_to_aml(
    script_path="evaluate.py",
    datasets_mount=["checkpoint=job_dir:my-training-job:models/best.pth"],
)
```
Configure an output datastore:
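A sketch assuming a hypothetical `--output-datastore` flag that names a registered datastore for job outputs:

```shell
submit-aml train.py --output-datastore my-blob-datastore
```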
## Environment management

### Docker build context (default)
By default, `submit-aml` builds a Docker context from your project's `pyproject.toml`, `uv.lock`, and `.python-version`:
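With no environment flags, a plain invocation picks those files up from the current project (sketch):

```shell
# Builds the job environment from pyproject.toml, uv.lock,
# and .python-version in the working directory
submit-aml train.py
```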
### Custom Docker image
You can use any image, including ones from the Azure ML containers repo:
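A sketch assuming a hypothetical `--docker-image` flag; the image tag shown is illustrative:

```shell
submit-aml train.py --docker-image mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.8-cudnn8-ubuntu22.04
```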
### Existing Azure ML environment
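To reuse an environment already registered in the workspace, assuming a hypothetical `--environment` flag taking Azure ML's `name:version` form:

```shell
submit-aml train.py --environment my-registered-env:1
```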
### Conda environment
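A sketch assuming a hypothetical `--conda-env` flag that points at a conda specification file:

```shell
submit-aml train.py --conda-env environment.yml
```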
### Setting environment variables
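Assuming a hypothetical repeatable `--env` flag that takes `KEY=VALUE` pairs:

```shell
submit-aml train.py --env WANDB_MODE=offline --env LOG_LEVEL=debug
```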
## Debugging

Enable remote debugging with `debugpy`:
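A sketch assuming a hypothetical `--debug` flag:

```shell
# Job waits for a debugger to attach on port 5678
submit-aml train.py --debug
```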
This installs `debugpy`, starts the script with a debug listener on port 5678, and adds a VS Code service to the job so you can attach remotely.
## Dry run
Preview the job configuration without submitting:
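Assuming a hypothetical `--dry-run` flag:

```shell
# Print the resolved job configuration and exit without submitting
submit-aml train.py --dry-run
```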
## Stream logs
Submit and wait for the job to complete, streaming logs:
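Assuming a hypothetical `--stream` flag:

```shell
# Block until the job finishes, printing logs as they arrive
submit-aml train.py --stream
```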