Stream chat with async flex flow#
Learning Objectives - Upon completing this tutorial, you should be able to:
Write an LLM application using a class-based flex flow.
Use AzureOpenAIModelConfiguration as a class init parameter.
Use prompty to stream completions.
Convert the application into an async flow and batch run it against multiple lines of data.
Use a class-based flow to evaluate the main flow and learn how to do aggregation.
0. Install dependent packages#
%%capture --no-stderr
%pip install -r ./requirements.txt
1. Trace your application with promptflow#
Assume we already have a Python program that leverages prompty.
with open("flow.py") as fin:
    print(fin.read())
When stream=true is configured in the parameters of a prompty whose output format is text, the promptflow SDK returns a generator whose items are the content of each chunk.
See the OpenAI documentation for how to do this in plain Python code: how_to_stream_completions
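To make the generator behaviour concrete, here is a minimal sketch that does not call any LLM. The function fake_stream is a hypothetical stand-in for a streamed completion, and the chunk size is arbitrary. It only illustrates that the caller receives text piece by piece and has to join the pieces to get the full answer.

```python
from typing import Iterator


def fake_stream(text: str, chunk_size: int = 8) -> Iterator[str]:
    """Hypothetical stand-in for a streamed completion: yields text in small chunks."""
    for start in range(0, len(text), chunk_size):
        yield text[start : start + chunk_size]


def consume(stream: Iterator[str]) -> str:
    """Print each chunk as it arrives and return the joined answer."""
    parts = []
    for chunk in stream:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()
    return "".join(parts)


answer_text = consume(fake_stream("ChatGPT is a conversational language model."))
```

The same consume loop should work with any iterator of string chunks, which is why streaming output can be printed incrementally and still collected at the end.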
with open("chat.prompty") as fin:
    print(fin.read())
Create necessary connections#
Connections help securely store and manage secret keys or other sensitive credentials required for interacting with LLMs and other external tools, for example Azure Content Safety.
The prompty above uses the connection open_ai_connection, so we need to set up the connection if we haven’t added it before. Once created, it’s stored in a local db and can be used in any flow.
Prepare your Azure OpenAI resource by following this instruction and get your api_key if you don’t have one.
from promptflow.client import PFClient
from promptflow.connections import AzureOpenAIConnection, OpenAIConnection

# client can help manage your runs and connections.
pf = PFClient()
conn_name = "open_ai_connection"
try:
    conn = pf.connections.get(name=conn_name)
    print("using existing connection")
except Exception:
    # Follow https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource?pivots=web-portal to create an Azure OpenAI resource.
    connection = AzureOpenAIConnection(
        name=conn_name,
        api_key="<your_AOAI_key>",
        api_base="<your_AOAI_endpoint>",
        api_type="azure",
    )

    # use this if you have an existing OpenAI account
    # connection = OpenAIConnection(
    #     name=conn_name,
    #     api_key="<user-input>",
    # )

    conn = pf.connections.create_or_update(connection)
    print("successfully created connection")

print(conn)
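Because a connection holds secrets, you may prefer not to paste the key into the notebook. The following is a minimal sketch, not part of the tutorial's code, that collects the connection fields from environment variables. The helper azure_connection_kwargs and the variable names AZURE_OPENAI_API_KEY and AZURE_OPENAI_ENDPOINT are assumptions chosen for illustration.

```python
import os
from typing import Dict, Mapping


def azure_connection_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect connection fields from environment variables, failing early if one is missing."""
    required = {
        "api_key": "AZURE_OPENAI_API_KEY",  # assumed variable name
        "api_base": "AZURE_OPENAI_ENDPOINT",  # assumed variable name
    }
    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    kwargs = {field: env[var] for field, var in required.items()}
    kwargs["api_type"] = "azure"
    return kwargs


# Demonstration with a fake environment so the sketch runs without real credentials.
example_kwargs = azure_connection_kwargs(
    {"AZURE_OPENAI_API_KEY": "fake-key", "AZURE_OPENAI_ENDPOINT": "https://example.invalid"}
)
print(sorted(example_kwargs))
```

In the notebook, the returned dictionary could then be passed on as AzureOpenAIConnection(name="open_ai_connection", **azure_connection_kwargs()), which uses the same three fields as the cell above.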
Visualize trace by using start_trace#
Note that we add @trace to the my_llm_tool function; re-running the cell below will collect a trace in the trace UI.
from promptflow.tracing import start_trace
from promptflow.core import AzureOpenAIModelConfiguration
from flow import ChatFlow
# create a chatFlow obj with connection
config = AzureOpenAIModelConfiguration(
    connection="open_ai_connection", azure_deployment="gpt-4o"
)
chat_flow = ChatFlow(config)
# start a trace session, and print a url for user to check trace
start_trace()
# run the flow as function, which will be recorded in the trace
result = chat_flow(question="What is ChatGPT? Please explain with detailed statement")
# note the type is async generator object as we enabled stream in prompty
result
import asyncio

# print result in stream manner
async for output in result:
    print(output, end="")
    await asyncio.sleep(0.01)
result = chat_flow(question="What is ChatGPT? Please explain with a concise statement")

answer = ""
async for output in result:
    answer += output
answer
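The cells above use async for at the top level, which works in a notebook because an event loop is already running. In a plain Python script the loop has to be started explicitly. The sketch below shows one way to do that, assuming a hypothetical fake_async_stream in place of the real flow output.

```python
import asyncio
from typing import AsyncIterator


async def fake_async_stream(text: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for the flow's streamed output: yields one word at a time."""
    for word in text.split(" "):
        await asyncio.sleep(0)  # hand control back to the event loop, as a network wait would
        yield word + " "


async def collect(stream: AsyncIterator[str]) -> str:
    """Join every chunk of an async stream into one string."""
    chunks = [chunk async for chunk in stream]
    return "".join(chunks).strip()


# A script has no running event loop, so asyncio.run drives the coroutine to completion.
collected = asyncio.run(collect(fake_async_stream("streamed one chunk at a time")))
print(collected)
```

Inside a notebook, await collect(...) would be used instead of asyncio.run, since asyncio.run cannot be called while a loop is already running.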
Eval the result#
%load_ext autoreload
%autoreload 2
import paths # add the code_quality module to the path
from check_list import EvalFlow
eval_flow = EvalFlow(config)
# evaluate answer against a set of statements
eval_result = eval_flow(
    answer=answer,
    statements={
        "correctness": "It contains a detailed explanation of ChatGPT.",
        "consise": "It is a concise statement.",
    },
)
eval_result
2. Batch run the function as flow with multi-line data#
Batch run with a data file (with multiple lines of test data)#
from promptflow.client import PFClient
pf = PFClient()
data = "./data.jsonl" # path to the data file
# create run with the flow function and data
base_run = pf.run(
    flow=chat_flow,
    data=data,
    column_mapping={
        "question": "${data.question}",
        "chat_history": "${data.chat_history}",
    },
    stream=True,
)
details = pf.get_details(base_run)
details.head(10)
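The batch run reads one JSON object per line from the data file, and the column names must match those referenced in column_mapping. The actual contents of data.jsonl are not shown in this tutorial, so the rows below are hypothetical; the sketch only demonstrates the JSON Lines layout using the standard library.

```python
import json
import os
import tempfile

# Hypothetical rows: the keys match the columns used by the batch and evaluation runs.
rows = [
    {
        "question": "What is ChatGPT?",
        "chat_history": [],
        "statements": {"correctness": "It explains what ChatGPT is."},
    },
    {
        "question": "What is prompt flow?",
        "chat_history": [],
        "statements": {"correctness": "It explains what prompt flow is."},
    },
]

# Write one JSON document per line into a temporary file.
path = os.path.join(tempfile.mkdtemp(), "data.jsonl")
with open(path, "w", encoding="utf-8") as fout:
    for row in rows:
        fout.write(json.dumps(row) + "\n")

# Read the file back the way a line-oriented consumer would.
with open(path, encoding="utf-8") as fin:
    loaded = [json.loads(line) for line in fin if line.strip()]

print(len(loaded), "lines loaded")
```

Each line becomes one row of the run, which is why the details table has one entry per line of the file.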
3. Evaluate your flow#
Then you can use an evaluation method to evaluate your flow. Evaluation methods are also flows, which usually use an LLM to assert that the produced output matches certain expectations.
Run evaluation on the previous batch run#
The base_run is the batch run we completed in step 2 above, for the chat flow with “data.jsonl” as input.
eval_run = pf.run(
    flow=eval_flow,
    data="./data.jsonl",  # path to the data file
    run=base_run,  # specify base_run as the run you want to evaluate
    column_mapping={
        "answer": "${run.outputs.output}",
        "statements": "${data.statements}",
    },
    stream=True,
)
details = pf.get_details(eval_run)
details.head(10)
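The column_mapping strings such as ${run.outputs.output} and ${data.statements} tell the run where each flow input comes from: the base run's output or the data file. The resolver below is a simplified illustration of that idea, not promptflow's implementation; resolve_mapping and the sample values are hypothetical.

```python
import re
from typing import Any, Dict

# Matches a whole-string reference such as "${data.question}".
_REFERENCE = re.compile(r"^\$\{([\w.]+)\}$")


def resolve_mapping(mapping: Dict[str, str], sources: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve "${a.b.c}" references against nested dicts; other values pass through."""
    resolved = {}
    for name, expression in mapping.items():
        match = _REFERENCE.match(expression)
        if match is None:
            resolved[name] = expression  # literal value, not a reference
            continue
        value: Any = sources
        for key in match.group(1).split("."):
            value = value[key]  # walk one level down per dotted segment
        resolved[name] = value
    return resolved


# Hypothetical sources for a single line: one data row plus the base run's output.
line_sources = {
    "data": {"statements": {"correctness": "It explains ChatGPT."}},
    "run": {"outputs": {"output": "ChatGPT is a chatbot."}},
}
eval_inputs = resolve_mapping(
    {"answer": "${run.outputs.output}", "statements": "${data.statements}"},
    line_sources,
)
print(eval_inputs)
```

Read this way, the evaluation run pairs each line of data.jsonl with the answer the base run produced for that same line.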
import json
metrics = pf.get_metrics(eval_run)
print(json.dumps(metrics, indent=4))
pf.visualize([base_run, eval_run])
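The metrics printed above come from aggregating the per-line evaluation results. The aggregation code in check_list.py is not shown in this tutorial, so the following is only a sketch of the general idea: a standalone, hypothetical aggregate_scores function that averages each score over all lines. The score names and values are made up.

```python
from typing import Dict, List


def aggregate_scores(line_results: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each score name over all lines that reported it."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for result in line_results:
        for name, score in result.items():
            totals[name] = totals.get(name, 0.0) + score
            counts[name] = counts.get(name, 0) + 1
    return {f"average_{name}": totals[name] / counts[name] for name in totals}


# Hypothetical per-line scores, one dict per evaluated answer.
metrics_sketch = aggregate_scores(
    [
        {"correctness": 5, "clarity": 3},
        {"correctness": 4, "clarity": 5},
    ]
)
print(metrics_sketch)
```

In a class-based flow this logic would normally live on the flow class so that the run can compute metrics once all lines have finished; the standalone function here just keeps the sketch self-contained.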
Next steps#
By now you’ve successfully run your chat flow and evaluated it. That’s great!
You can check out more examples:
Stream Chat: demonstrates how to create a chatbot that runs in streaming mode.