Local Setup: run a sample benchmark pipeline on AzureML
Objectives - By following this tutorial, you will be able to set up resources in Azure and run the pipelines in this repo.
Requirements - To follow this tutorial, you first need to:

- install the local python requirements (see the example command below),
- provision Azure resources and have a working AzureML workspace.
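Installing the requirements typically looks like the following; the `requirements.txt` path is an assumption, adjust it to wherever the repo keeps its requirements file:

```bash
python -m pip install -r requirements.txt
```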
A. Edit config files to point to your AzureML workspace
To submit the benchmark pipelines to AzureML, you need to edit a few configuration files with the right references to connect to your AzureML resources.
- Edit the file under conf/aml/custom.yaml to match your AzureML workspace references (you can verify these values with the sketch after this list):

  ```yaml
  # @package _group_
  subscription_id: TODO
  resource_group: TODO
  workspace_name: TODO
  tenant: TODO
  auth: "interactive"
  ```
- Edit the file under conf/compute/custom.yaml to match the names of your compute targets in AzureML (check below for reference). If you haven't created a gpu cluster, you can leave the gpu lines of the config file as they are:

  ```yaml
  # @package _group_
  linux_cpu: "cpu-cluster"
  linux_gpu: "linux-gpu-nv6"
  windows_cpu: "win-cpu"
  ```
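To check that these values actually connect, here is a minimal sketch assuming the azureml-core (SDK v1) package is installed; the TODO placeholders stand for the same values you put in conf/aml/custom.yaml:

```python
# Minimal connection check (sketch), assuming the azureml-core SDK v1 package.
# Replace the TODO placeholders with the values from conf/aml/custom.yaml.
from azureml.core import Workspace
from azureml.core.authentication import InteractiveLoginAuthentication

# auth: "interactive" in the config maps to an interactive browser login
auth = InteractiveLoginAuthentication(tenant_id="TODO")

ws = Workspace(
    subscription_id="TODO",
    resource_group="TODO",
    workspace_name="TODO",
    auth=auth,
)
print(f"Connected to workspace: {ws.name}")
# The names in conf/compute/custom.yaml should appear in this list:
print(f"Compute targets: {list(ws.compute_targets)}")
```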
Note: the configs in this repo assume you use custom as the name of your aml/compute config. If in the future you have multiple aml/compute configs (ex: myotheraml.yaml), use the arguments aml=myotheraml compute=myotheraml to override when you run a pipeline, as shown below.
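For instance, to run the sample pipeline from section B with those alternate configs (the override arguments are appended to the command, per the note above):

```bash
python pipelines/azureml/pipelines/data_generation.py --exp-config pipelines/azureml/conf/experiments/data-generation.yaml aml=myotheraml compute=myotheraml
```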
B. Verify your setup: run a sample pipeline in your workspace
Running a pipeline consists of launching a python script with a pipeline configuration file.
For instance, when you run:
```bash
python pipelines/azureml/pipelines/data_generation.py --exp-config pipelines/azureml/conf/experiments/data-generation.yaml
```
The python script will build a pipeline based on the collection of manual scripts, each running in its own python environment. The parameters for each script are provided from the configuration file in conf/experiments/data-generation.yaml:
```yaml
# This experiment generates multiple synthetic datasets for regression
# with varying number of features
#
# to execute:
# > python src/pipelines/azureml/data_generation.py --exp-config conf/experiments/data-generation.yaml
defaults:
  - aml: custom
  - compute: custom

### CUSTOM PARAMETERS ###
experiment:
  name: "data_generation_dev"
  description: "something interesting to say about this"

data_generation_config:
  # name of your particular benchmark
  benchmark_name: "benchmark-dev" # override this with a unique name

  # DATA
  tasks:
    - task: "regression"
      train_samples: 100000
      test_samples: 10000
      inferencing_samples: 10000
      n_features: 10
      n_informative: 10
    - task: "lambdarank"
      train_samples: 100
      test_samples: 100
      inferencing_samples: 100
      n_features: 10
      n_informative: 13
      n_label_classes: 5
      docs_per_query: 10
      train_partitions: 7
    - task: "classification"
      train_samples: 100
      test_samples: 100
      inferencing_samples: 100
      n_features: 10
      n_informative: 13
      n_label_classes: 3

  register_outputs: false
  register_outputs_prefix: "data-synthetic" # "{prefix}-{task}-{n_features}cols-{samples}samples-{train|test|inference}"
```
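You can also inspect such a config file outside of the pipeline script. Here is a minimal sketch, assuming the omegaconf package that typically backs these hydra-style configs; note it reads the raw file and does not resolve the hydra defaults list:

```python
# Sketch: read the raw experiment config with omegaconf.
# This loads the file as-is; it does not resolve the hydra "defaults" list.
from omegaconf import OmegaConf

config = OmegaConf.load("pipelines/azureml/conf/experiments/data-generation.yaml")

print(config.experiment.name)  # data_generation_dev
for task in config.data_generation_config.tasks:
    print(task.task, task.train_samples, task.n_features)
```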
Running the python command should open a browser to your workspace, showing the experiment view.
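To get an intuition for what the task parameters above mean, here is an illustrative local equivalent of the "regression" task, assuming a generation step in the spirit of scikit-learn's make_regression; this is a sketch for intuition, not the pipeline's actual code:

```python
from sklearn.datasets import make_regression

# 100000 train samples with 10 features, all informative, as in the config
X_train, y_train = make_regression(
    n_samples=100000,   # train_samples
    n_features=10,      # n_features
    n_informative=10,   # n_informative
    random_state=42,    # hypothetical seed, for reproducibility only
)
print(X_train.shape)  # (100000, 10)
```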