Stream chat with flex flow#

Authored by:  Avatar AvatarOpen on GitHub

Learning Objectives - Upon completing this tutorial, you should be able to:

  • Write LLM application using class based flex flow.

  • Use AzureOpenAIModelConfiguration as class init parameter.

  • Use prompty to stream completions.

  • Convert the application into a flow and batch run against multi lines of data.

  • Use classed base flow to evaluate the main flow and learn how to do aggregation.

0. Install dependent packages#

%%capture --no-stderr
%pip install -r ./requirements.txt

1. Trace your application with promptflow#

Assume we already have a python program, which leverage prompty.

with open("flow.py") as fin:
    print(fin.read())

When stream=true is configured in the parameters of a prompt whose output format is text, promptflow sdk will return a generator type, which item is the content of each chunk.

Reference openai doc on how to do it using plain python code: how_to_stream_completions

with open("chat.prompty") as fin:
    print(fin.read())

Create necessary connections#

Connection helps securely store and manage secret keys or other sensitive credentials required for interacting with LLM and other external tools for example Azure Content Safety.

Above prompty uses connection open_ai_connection inside, we need to set up the connection if we haven’t added it before. After created, it’s stored in local db and can be used in any flow.

Prepare your Azure OpenAI resource follow this instruction and get your api_key if you don’t have one.

from promptflow.client import PFClient
from promptflow.connections import AzureOpenAIConnection, OpenAIConnection

# client can help manage your runs and connections.
pf = PFClient()
try:
    conn_name = "open_ai_connection"
    conn = pf.connections.get(name=conn_name)
    print("using existing connection")
except:
    # Follow https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource?pivots=web-portal to create an Azure OpenAI resource.
    connection = AzureOpenAIConnection(
        name=conn_name,
        api_key="<your_AOAI_key>",
        api_base="<your_AOAI_endpoint>",
        api_type="azure",
    )

    # use this if you have an existing OpenAI account
    # connection = OpenAIConnection(
    #     name=conn_name,
    #     api_key="<user-input>",
    # )

    conn = pf.connections.create_or_update(connection)
    print("successfully created connection")

print(conn)

Visualize trace by using start_trace#

Note we add @trace in the my_llm_tool function, re-run below cell will collect a trace in trace UI.

from promptflow.tracing import start_trace
from promptflow.core import AzureOpenAIModelConfiguration

from flow import ChatFlow

# create a chatFlow obj with connection
config = AzureOpenAIModelConfiguration(
    connection="open_ai_connection", azure_deployment="gpt-4o"
)
chat_flow = ChatFlow(config)

# start a trace session, and print a url for user to check trace
start_trace()

# run the flow as function, which will be recorded in the trace
result = chat_flow(question="What is ChatGPT? Please explain with detailed statement")
# note the type is generator object as we enabled stream in prompty
result
import time

# print result in stream manner
for r in result:
    print(r, end="")
    # For better animation effects
    time.sleep(0.01)
result = chat_flow(question="What is ChatGPT? Please explain with consise statement")
answer = "".join(result)
answer

Eval the result#

%load_ext autoreload
%autoreload 2

import paths  # add the code_quality module to the path
from check_list import EvalFlow

eval_flow = EvalFlow(config)
# evaluate answer agains a set of statement
eval_result = eval_flow(
    answer=answer,
    statements={
        "correctness": "It contains a detailed explanation of ChatGPT.",
        "consise": "It is a consise statement.",
    },
)
eval_result

2. Batch run the function as flow with multi-line data#

Batch run with a data file (with multiple lines of test data)#

from promptflow.client import PFClient

pf = PFClient()
data = "./data.jsonl"  # path to the data file
# create run with the flow function and data
base_run = pf.run(
    flow=chat_flow,
    data=data,
    column_mapping={
        "question": "${data.question}",
        "chat_history": "${data.chat_history}",
    },
    stream=True,
)
details = pf.get_details(base_run)
details.head(10)

3. Evaluate your flow#

Then you can use an evaluation method to evaluate your flow. The evaluation methods are also flows which usually using LLM assert the produced output matches certain expectation.

Run evaluation on the previous batch run#

The base_run is the batch run we completed in step 2 above, for web-classification flow with “data.jsonl” as input.

eval_run = pf.run(
    flow=eval_flow,
    data="./data.jsonl",  # path to the data file
    run=base_run,  # specify base_run as the run you want to evaluate
    column_mapping={
        "answer": "${run.outputs.output}",
        "statements": "${data.statements}",
    },
    stream=True,
)
details = pf.get_details(eval_run)
details.head(10)
import json

metrics = pf.get_metrics(eval_run)
print(json.dumps(metrics, indent=4))
pf.visualize([base_run, eval_run])

Next steps#

By now you’ve successfully run your chat flow and did evaluation on it. That’s great!

You can check out more examples:

  • Stream Chat: demonstrates how to create a chatbot that runs in streaming mode.