hi-ml

Setting up Azure

If you already have an AzureML workspace available, you can go straight to the last step.

To set up all your Azure resources, you need to have:

There are two ways to set up all necessary resources, either via the Azure portal or via the Azure Command-line Interface (CLI). We recommend the CLI because all necessary resources can be easily created via a single script.

Creating an AzureML workspace via the Azure Portal

If you prefer to create your workspace via the web UI on the Azure Portal, please follow the steps below.

Creating an AzureML workspace via the Azure Command-line Tools

A purely command-line driven setup is possible via the Azure Command-line Tools. These tools are available for multiple platforms, including Linux, Mac, and Windows.

After downloading the command-line tools, you can run the following command to add the ml extension that is required to create an AzureML workspace:

az extension add --name ml

Documentation

Collecting the necessary information

Find out which Azure data centre locations you can use:

az account list-locations -o table

You will need the location names (second column in the table) to create resources in the right geographical regions. Choosing the right region can be particularly important if your data governance requires the data to be processed inside certain geographical boundaries.

For the storage account, please choose an SKU based on your needs as described here. Most likely, Standard_LRS will be the right SKU for you.
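Azure also restricts storage account names to 3 to 24 characters, using lowercase letters and digits only. Since this guide derives the storage account name from a short prefix followed by data, it can help to check the resulting name before creating anything. The following is a minimal sketch; the function name is_valid_storage_account_name is hypothetical and not part of the Azure CLI or of hi-ml.

```shell
# Hypothetical helper: check a candidate storage account name against
# Azure's naming rules (3 to 24 characters, lowercase letters and digits only).
is_valid_storage_account_name() {
    local name="$1"
    local LC_ALL=C   # make the a-z range match ASCII lowercase only
    case "$name" in
        *[!a-z0-9]*) return 1 ;;   # contains a disallowed character
    esac
    [ "${#name}" -ge 3 ] && [ "${#name}" -le 24 ]
}

# Example: the name that results from the prefix "himl" plus the suffix "data".
if is_valid_storage_account_name "himldata"; then
    echo "himldata is a valid storage account name"
else
    echo "himldata is NOT a valid storage account name"
fi
```

Note that storage account names must additionally be globally unique across Azure, which this local check cannot verify.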

Creating resources

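The script below will create a resource group, a storage account with a blob container for datasets, an AzureML workspace, and an AzureML datastore that points to the container.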

In the script, you will need to replace the values of the following variables:

export location=uksouth     # The Azure location where the resources should be created
export prefix=himl          # The name of the AzureML workspace. This is also the prefix for all other resources.
export container=datasets
export datastorefile=datastore.yaml
az group create \
    --name ${prefix}rg \
    --location ${location}
az storage account create \
    --name ${prefix}data \
    --resource-group ${prefix}rg \
    --location ${location} \
    --sku Standard_LRS
az storage container create \
    --account-name ${prefix}data \
    --name ${container} \
    --auth-mode login
az ml workspace create \
    --resource-group ${prefix}rg \
    --name ${prefix} \
    --location ${location}
key=$(az storage account keys list --resource-group ${prefix}rg --account-name ${prefix}data --query "[0].value" -o tsv)
cat >${datastorefile} <<EOL
\$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: datasets
type: azure_blob
description: Pointing to the ${container} container in the ${prefix}data storage account.
account_name: ${prefix}data
container_name: ${container}
credentials:
  account_key: ${key}
EOL
az ml datastore create --file ${datastorefile} --resource-group ${prefix}rg --workspace-name ${prefix}
rm ${datastorefile}

Note that the datastore will use the storage account key to authenticate. If you want to use a Shared Access Signature (SAS) instead, replace the creation of the datastore config file in the above script with the following commands:

key=$(az storage container generate-sas --account-name ${prefix}data --name ${container} --permissions acdlrw --https-only --expiry 2024-01-01 -o tsv)
cat >${datastorefile} <<EOL
\$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: datasets
type: azure_blob
description: Pointing to the ${container} container in the ${prefix}data storage account.
account_name: ${prefix}data
container_name: ${container}
credentials:
  sas_token: ${key}
EOL

You can adjust the expiry date and the permissions of the SAS token (the script above grants full read/write permission). The expiry date must lie in the future, so replace the fixed date used above with a suitable one. For further options, run az storage container generate-sas --help
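Rather than hard-coding an expiry date, you could compute one relative to today. The sketch below assumes either GNU date (typical on Linux) or BSD date (typical on macOS); the function name sas_expiry_date and the 30-day default are illustrative choices, not something prescribed by the Azure CLI.

```shell
# Hypothetical helper: print the UTC date N days from now as YYYY-MM-DD.
# GNU date accepts "-d '+N days'", BSD date accepts "-v +Nd".
sas_expiry_date() {
    local days="${1:-30}"
    if date --version >/dev/null 2>&1; then
        # GNU date
        date -u -d "+${days} days" +%Y-%m-%d
    else
        # BSD date (macOS)
        date -u -v "+${days}d" +%Y-%m-%d
    fi
}

expiry=$(sas_expiry_date 30)
echo "SAS token expiry: ${expiry}"
```

The value of expiry can then be passed to the --expiry argument of az storage container generate-sas in place of the fixed date.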

Creating compute clusters and permissions

Now that you have created the core AzureML workspace, you need to create a compute cluster.
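A compute cluster can be created in the AzureML UI or from the command line. The following is a sketch of the command-line route using az ml compute create from the ml extension. The cluster name, VM size, and instance counts are illustrative assumptions, and the resource names assume the prefix himl from the script above. The command is assembled in a bash array and only printed, so that you can review it before running it.

```shell
prefix=himl            # assumption: the same prefix used when creating the workspace
cluster=cpu-cluster    # hypothetical cluster name, choose your own

# Assemble the command; values for size and instance counts are examples only.
cmd=(az ml compute create
    --name "${cluster}"
    --type amlcompute
    --size Standard_DS3_v2
    --min-instances 0
    --max-instances 4
    --resource-group "${prefix}rg"
    --workspace-name "${prefix}")

# Print the command for review.
echo "${cmd[@]}"
# To actually create the cluster, run the assembled command:
# "${cmd[@]}"
```

Setting the minimum number of instances to 0 lets the cluster scale down completely when no jobs are running. For the full list of options, run az ml compute create --help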

To adjust permissions, find the AzureML workspace that you just created in the Azure Portal. Add yourself and your team members with “Contributor” permissions to the workspace, following the guidelines here.

Accessing the workspace

The hi-ml toolbox relies on a workspace configuration file called config.json to access the right AzureML workspace. This file can be downloaded from the UI of the workspace. It needs to be placed either in your copy of the hi-ml repository, or in your repository that uses the hi-ml package.

The file config.json should look like this:

{
  "subscription_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "resource_group": "myresourcegroup",
  "workspace_name": "myworkspace"
}
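If you created the workspace with the script in this guide, the same file can also be written from the command line instead of being downloaded. This is a minimal sketch: the function name write_workspace_config is hypothetical, the subscription ID shown is a placeholder, and the resource group and workspace names assume the prefix himl.

```shell
# Hypothetical helper: print a config.json body for the given
# subscription ID, resource group, and workspace name.
write_workspace_config() {
    printf '{\n  "subscription_id": "%s",\n  "resource_group": "%s",\n  "workspace_name": "%s"\n}\n' \
        "$1" "$2" "$3"
}

prefix=himl   # assumption: the same prefix used when creating the resources
# Placeholder value; "az account show --query id -o tsv" prints the ID
# of the subscription you are currently logged in to.
subscription_id="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

write_workspace_config "${subscription_id}" "${prefix}rg" "${prefix}" > config.json
cat config.json
```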

As an alternative to keeping the config.json file in your repository, you can specify the necessary information in environment variables. The environment variables are:

When accessing the workspace, the hi-ml toolbox will first look for the config.json file. If it is not found, it will fall back to the environment variables. For details, see the documentation of the get_workspace function in readthedocs.