If you already have an AzureML workspace available, you can go straight to the last step.
To set up all your Azure resources, you need to have:
There are two ways to set up all necessary resources, either via the Azure portal or via the Azure Command-line Interface (CLI). We recommend the CLI because all necessary resources can be easily created via a single script.
If you prefer to create your workspace via the web UI on the Azure Portal, please follow the steps below.
A purely command-line driven setup is possible via the Azure Command-line Tools. These tools are available for multiple platforms, including Linux, Mac, and Windows.
After downloading the command-line tools, you can run the following command to add the ml extension that is required to create an AzureML workspace:
az extension add --name ml
Find out which Azure data centre locations you can use:
az account list-locations -o table
You will need the location names (second column in the table) to create resources in the right geographical regions. Choosing the right region can be particularly important if your data governance requires the data to be processed inside certain geographical boundaries.
For the storage account, please choose a SKU based on your needs as described here. Most likely, Standard_LRS will be the right SKU for you.
The script below will create a resource group, a storage account with a blob container called datasets, an AzureML workspace, and an AzureML datastore that points to that container.
In the script, you will need to replace the values of the following variables:
location - the location of the Azure datacenter you want to use.
prefix - the prefix you want to use for the resources you create. This will also be the name of the AzureML workspace.
export location=uksouth # The Azure location where the resources should be created
export prefix=himl # The name of the AzureML workspace. This is also the prefix for all other resources.
export container=datasets
export datastorefile=datastore.yaml
az group create \
--name ${prefix}rg \
--location ${location}
az storage account create \
--name ${prefix}data \
--resource-group ${prefix}rg \
--location ${location} \
--sku Standard_LRS
az storage container create \
--account-name ${prefix}data \
--name ${container} \
--auth-mode login
az ml workspace create \
--resource-group ${prefix}rg \
--name ${prefix} \
--location ${location}
key=$(az storage account keys list --resource-group ${prefix}rg --account-name ${prefix}data --query "[0].value" -o tsv)
cat >${datastorefile} <<EOL
\$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: datasets
type: azure_blob
description: Pointing to the ${container} container in the ${prefix}data storage account.
account_name: ${prefix}data
container_name: ${container}
credentials:
account_key: ${key}
EOL
az ml datastore create --file ${datastorefile} --resource-group ${prefix}rg --workspace-name ${prefix}
rm ${datastorefile}
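Because the script names the storage account ${prefix}data, the chosen prefix has to be compatible with the Azure rules for storage account names (3 to 24 characters, lowercase letters and digits only). The following is a minimal sketch of a check you could run before the script; the helper name valid_prefix is hypothetical and not part of the Azure CLI or of hi-ml.

```shell
# Hypothetical helper: succeeds if "<prefix>data" is a valid storage account name.
valid_prefix() {
  # Storage account names must be 3-24 characters, lowercase letters and digits only.
  [ -n "$1" ] && printf '%s' "${1}data" | LC_ALL=C grep -Eq '^[a-z0-9]{3,24}$'
}

prefix=himl
if valid_prefix "${prefix}"; then
  echo "prefix '${prefix}' is usable"
else
  echo "prefix '${prefix}' would give an invalid storage account name" >&2
fi
```

The check only covers the storage account name, which is the most restrictive of the names derived from the prefix in the script.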
Note that the datastore will use the storage account key to authenticate. If you want to use Shared Access Signature (SAS) instead, replace the creation of the datastore config file in the above script with the following command:
key=$(az storage container generate-sas --account-name ${prefix}data --name ${container} --permissions acdlrw --https-only --expiry 2024-01-01 -o tsv)
cat >${datastorefile} <<EOL
\$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: datasets
type: azure_blob
description: Pointing to the ${container} container in the ${prefix}data storage account.
account_name: ${prefix}data
container_name: ${container}
credentials:
sas_token: ${key}
EOL
You can adjust the expiry date and the permissions of the SAS token (the command above grants full read/write permissions). For further options, run az storage container generate-sas --help.
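Rather than hard-coding the expiry date, you may prefer to compute it relative to today. The sketch below assumes a lifetime of 30 days, which is an arbitrary illustrative choice, and tries the GNU date syntax first before falling back to the BSD syntax used on macOS.

```shell
# Compute a date 30 days from now in YYYY-MM-DD form (UTC).
# GNU date (Linux) understands -d; BSD date (macOS) understands -v.
expiry=$(date -u -d "+30 days" +%Y-%m-%d 2>/dev/null || date -u -v+30d +%Y-%m-%d)
echo "SAS token expiry: ${expiry}"
```

The resulting value can then be passed to the generate-sas command as --expiry "${expiry}" in place of the fixed date.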
Now that you have created the core AzureML workspace, you need to create a compute cluster.
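If you want to stay on the command line, a compute cluster can be created with the same ml extension. The following is a sketch only: the wrapper function create_cluster is hypothetical, and the cluster name, VM size, and node count in the example call are illustrative values that you should adapt to your quota and workload.

```shell
# Hypothetical wrapper around `az ml compute create`.
# $1: cluster name, $2: VM size, $3: maximum number of nodes
create_cluster() {
  az ml compute create \
    --name "$1" \
    --type AmlCompute \
    --size "$2" \
    --min-instances 0 \
    --max-instances "$3" \
    --resource-group "${prefix}rg" \
    --workspace-name "${prefix}"
}

# Example call, once `prefix` is set as in the setup script:
# create_cluster cpu-cluster Standard_DS3_v2 4
```

Setting the minimum number of instances to 0 lets the cluster scale down completely when no jobs are running.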
To adjust permissions, find the AzureML workspace that you just created in the Azure Portal. Add yourself and your team members with “Contributor” permissions to the workspace, following the guidelines here.
The hi-ml toolbox relies on a workspace configuration file called config.json to access the right AzureML workspace.
This file can be downloaded from the UI of the workspace. It needs to be placed either in your copy of the hi-ml repository, or in your repository that uses the hi-ml package.
Copy the downloaded config.json file to the root folder of your repository. The file config.json should look like this:
{
"subscription_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"resource_group": "myresourcegroup",
"workspace_name": "myworkspace"
}
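If you created the workspace with the CLI script, you can also write this file by hand instead of downloading it. The sketch below uses placeholder values: with the script above, the resource group would be ${prefix}rg and the workspace name ${prefix}, and the subscription ID of the active account can be printed with az account show --query id -o tsv.

```shell
# Placeholder values: replace them with those of your own subscription and workspace.
subscription_id="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
resource_group="myresourcegroup"
workspace_name="myworkspace"

# Write config.json into the current directory (overwrites an existing file).
cat >config.json <<EOL
{
    "subscription_id": "${subscription_id}",
    "resource_group": "${resource_group}",
    "workspace_name": "${workspace_name}"
}
EOL
```

Run this from the root folder of your repository so that the file ends up where the toolbox expects it.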
As an alternative to keeping the config.json file in your repository, you can specify the necessary information in environment variables. The environment variables are:
HIML_SUBSCRIPTION_ID: The subscription ID of the AzureML workspace, taken from the subscription_id field in the config.json file.
HIML_RESOURCE_GROUP: The resource group of the AzureML workspace, taken from the resource_group field in the config.json file.
HIML_WORKSPACE_NAME: The name of the AzureML workspace, taken from the workspace_name field in the config.json file.
When accessing the workspace, the hi-ml toolbox will first look for the config.json file. If it is not found, it will fall back to the environment variables. For details, see the documentation of the get_workspace function in readthedocs.
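One way to populate these variables is to read them from an existing config.json. The sketch below is an illustration, not part of hi-ml: the helper names json_field and export_himl_vars are hypothetical, and the sed expression only handles simple string fields like those in the config.json example, so a proper JSON parser is the safer choice for anything more complex.

```shell
# Hypothetical helper: print the string value of field $2 from the JSON file $1.
json_field() {
  sed -n 's/.*"'"$2"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Hypothetical helper: export the three HIML_* variables from the config file $1.
export_himl_vars() {
  HIML_SUBSCRIPTION_ID=$(json_field "$1" subscription_id)
  HIML_RESOURCE_GROUP=$(json_field "$1" resource_group)
  HIML_WORKSPACE_NAME=$(json_field "$1" workspace_name)
  export HIML_SUBSCRIPTION_ID HIML_RESOURCE_GROUP HIML_WORKSPACE_NAME
}

# Example call:
# export_himl_vars config.json
```

Since the toolbox looks for config.json first, the variables only take effect where that file is absent, for example on a build agent.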