Flow run management in Azure#


Requirements - In order to benefit from this tutorial, you will need:

  • an Azure account with an active subscription

  • an Azure Machine Learning workspace

  • a Python environment with the dependent packages installed (see step 0)

Learning Objectives - By the end of this tutorial, you should be able to:

  • create a run with remote data

  • create a run which references another run's inputs

  • manage runs via run.yaml

  • create a run with connection override

Motivations - This guide will walk you through flow run management capabilities in Azure.

0. Install dependent packages#

%pip install -r ../../requirements.txt

1. Connect to Azure Machine Learning Workspace#

The workspace is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

1.1 Import the required libraries#

from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from azure.ai.ml.entities import Data
from azure.core.exceptions import ResourceNotFoundError

from promptflow.azure import PFClient
from promptflow.entities import Run

1.2 Configure credential#

We are using DefaultAzureCredential to get access to the workspace. DefaultAzureCredential should be capable of handling most Azure SDK authentication scenarios.

If it does not work for you, refer to the configure credential example and the azure-identity reference doc for more available credentials.

try:
    credential = DefaultAzureCredential()
    # Check if the given credential can get a token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to InteractiveBrowserCredential in case DefaultAzureCredential does not work.
    credential = InteractiveBrowserCredential()

1.3 Get a handle to the workspace#

We use a config file to connect to the workspace. The Azure ML workspace should be configured with a compute cluster. Check this notebook for how to configure a workspace.

# Get a handle to workspace
pf = PFClient.from_config(credential=credential)
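
If a config.json file is not present next to the notebook, you can construct the client explicitly instead. A minimal sketch, assuming you substitute your own subscription id, resource group, and workspace name (the placeholder values below are not real):

# Alternative: construct the client explicitly (placeholder values, not real)
# pf = PFClient(
#     credential=credential,
#     subscription_id="<SUBSCRIPTION_ID>",
#     resource_group_name="<RESOURCE_GROUP>",
#     workspace_name="<WORKSPACE_NAME>",
# )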

1.4 Create necessary connections#

A connection helps securely store and manage secret keys or other sensitive credentials required for interacting with LLMs and other external tools, for example Azure Content Safety.

In this notebook, we will use the flow web-classification, which uses the connection open_ai_connection inside. We need to set up the connection if we haven't added it before.

Prepare your Azure OpenAI resource by following this instruction, and get your api_key if you don't have one.

Please go to the workspace portal, click Prompt flow -> Connections -> Create, then follow the instructions to create your own connections. Learn more on connections.
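
Once the connection is created, you can verify from code that it is visible to the client. A minimal sketch, assuming your connection is named open_ai_connection (adjust the name to your own):

conn_name = "open_ai_connection"  # assumed connection name; change to yours
try:
    conn = pf.connections.get(name=conn_name)
    print(f"Found connection: {conn.name}")
except Exception:
    print(f"Connection '{conn_name}' not found, please create it in the workspace portal first.")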

2. Create run with remote data#

Instead of relying on local files, there may be situations where you want to reuse data that's already available in your workspace when submitting a flow. The following code cells show how to create a flow run with remote data.

2.1 Create or update remote data#

data_name, data_version = "flow_run_test_data", "1"

try:
    # Reuse the data asset if it already exists in the workspace.
    data = pf.ml_client.data.get(name=data_name, version=data_version)
except ResourceNotFoundError:
    # Otherwise, register the local data file as a new data asset.
    data = Data(
        name=data_name,
        version=data_version,
        path="../../flows/standard/web-classification/data.jsonl",
        type="uri_file",
    )
    data = pf.ml_client.data.create_or_update(data)

2.2 Prepare remote data id#

data_id = f"azureml:{data.name}:{data.version}"
print(data_id)
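
Besides the azureml:<name>:<version> shorthand used above, a datastore path can also serve as remote data. A sketch with a hypothetical path (the directory segment is a placeholder):

# Alternative (hypothetical path): reference a file on a workspace datastore directly
# data_id = "azureml://datastores/workspaceblobstore/paths/<path-to>/data.jsonl"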

2.3 Create a flow run with remote data#

The following code cell shows how to create a flow run that consumes the remote data. You can also customize the runtime resources or the run identity via the commented fields in the cell.

# create run
run = Run(
    # local flow file
    flow="../../flows/standard/web-classification",
    # remote data
    data=data_id,
    # to customize runtime instance type and compute instance, you can provide them in resources
    # resources={
    #     "instance_type": "STANDARD_DS11_V2",
    #     "compute": "my_compute_instance"
    # }
    # to customize identity, you can provide them in identity
    # identity={
    #     "type": "managed",
    # }
)

base_run = pf.runs.create_or_update(run=run)

2.4 Stream the flow run to make sure it runs successfully#

pf.runs.stream(base_run)
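
After the stream finishes, you can pull the run results for a quick sanity check. A minimal sketch, assuming the run succeeded; get_details returns a pandas DataFrame of the run's inputs and outputs:

# Inspect the completed run's inputs and outputs (a sketch)
details = pf.runs.get_details(base_run)
print(details.head())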

3. Create a flow run which uses an existing run's inputs#

When running a flow with an existing run, you can reference either its inputs or outputs in the column mapping. The following code cell shows how to reference a run's inputs in column mapping.

run = Run(
    # local flow file
    flow="../../flows/standard/web-classification",
    # the run to reference; here we use the run submitted in the previous section
    run=base_run,
    column_mapping={
        # reference another run's input data columns
        "url": "${run.inputs.url}",
        "answer": "${run.inputs.answer}",
        "evidence": "${run.inputs.evidence}",
    },
)

base_run = pf.runs.create_or_update(
    run=run,
)

pf.runs.stream(base_run)
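
Referencing a run's outputs works the same way. A minimal sketch, assuming an evaluation flow exists at ../../flows/evaluation/eval-classification-accuracy and that category is the web-classification flow's output column (both are assumptions; adjust to your repo):

# A sketch: reference the previous run's outputs instead of its inputs
eval_run = Run(
    # assumed evaluation flow path; adjust to a flow available in your repo
    flow="../../flows/evaluation/eval-classification-accuracy",
    run=base_run,
    column_mapping={
        # ground truth comes from the referenced run's input data
        "groundtruth": "${run.inputs.answer}",
        # prediction comes from the referenced run's outputs
        "prediction": "${run.outputs.category}",
    },
)
eval_base_run = pf.runs.create_or_update(run=eval_run)
pf.runs.stream(eval_base_run)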

4. Create a flow run with connection override#

Sometimes you may want to switch the connection or deployment name used inside a flow when submitting it. Connection override provides an easy way to do this without changing the original flow.dag.yaml. In the following code cell, we will submit the flow web-classification and override its connection open_ai_connection with azure_open_ai_connection. Please make sure the connection azure_open_ai_connection exists in your workspace.

run = Run(
    # local flow file
    flow="../../flows/standard/web-classification",
    data="../../flows/standard/web-classification/data.jsonl",
    # override connection for node classify_with_llm & summarize_text_content
    connections={
        "classify_with_llm": {"connection": "azure_open_ai_connection"},
        "summarize_text_content": {"connection": "azure_open_ai_connection"},
    },
)

base_run = pf.runs.create_or_update(
    run=run,
)

pf.runs.stream(base_run)
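
The same override dictionary can also switch the deployment a node uses. A minimal sketch, assuming a deployment named gpt-35-turbo exists under the target Azure OpenAI resource (the deployment name is illustrative):

# A sketch: override the deployment name alongside the connection (name illustrative)
# connections={
#     "classify_with_llm": {
#         "connection": "azure_open_ai_connection",
#         "deployment_name": "gpt-35-turbo",
#     },
# }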