Copilot Studio Response Analysis Tool :computer:
Purpose:
Provide a lightweight, developer-friendly tool to:
- Measure Copilot Agent response-time performance and correlate it with output size/tokens. :telescope:
- Get metrics (Mean, Median, Max, Min, Standard Deviation) and visual charts to understand Copilot Agent response time trends and variability. :movie_camera:
- Aggregate real-time metrics, charts, and tables to spot spikes, drift, and outliers across a single conversation. :bar_chart:
- Trace planner steps, tool invocations, and arguments to view and validate dynamic plan composition. :computer:
- Each planner step includes Thought, Tool, and Arguments, which together explain why the agent chose a path and how it executed it.
- Comparing planner steps across queries highlights tool calls and reasoning.
- Automatically generate a CSV file containing all queries, their responses, and corresponding response times. :floppy_disk:
Interpretation:
1. Statistics Tab :mag_right:
Provides an overview of Copilot Agent performance metrics, including response time summaries (Mean, Median, Max, Min), variability (Standard Deviation), token correlation, and visual charts for trends and distribution.
*Response Statistics:* :chart_with_upwards_trend:
| Metric | Description | Purpose |
| :--- | :--- | :--- |
| Mean | The average response time across all responses. | Gives an overall sense of typical performance but can be skewed by very high or low values. |
| Median | The middle value when all response times are arranged in ascending order. | Represents the central tendency and is less affected by outliers than the mean. Useful for understanding the "typical" response time. |
| Max | The longest response time recorded during the test run. | Highlights the worst-case latency, which is critical for identifying performance bottlenecks. |
| Min | The shortest response time recorded during the test run. | Shows the best-case performance and can indicate the system's potential under optimal conditions. |
| Standard Deviation | Measures how much response times vary from the average. | Helps assess consistency: low SD means stable performance, high SD indicates fluctuating response times. |
| Token Correlation | The correlation between response time and the number of tokens in the response. | Indicates orchestrator efficiency: a high correlation suggests that longer responses drive slower performance. |
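The summary metrics above can be reproduced with the Python standard library. The sample measurements below are hypothetical:

```python
import statistics

# Hypothetical per-query measurements: (response_time_seconds, token_count)
samples = [(1.2, 180), (0.9, 140), (2.4, 410), (1.1, 175), (3.0, 520)]
times = [t for t, _ in samples]
tokens = [n for _, n in samples]

print("Mean:", statistics.mean(times))
print("Median:", statistics.median(times))
print("Max:", max(times), "Min:", min(times))
print("Std Dev:", statistics.stdev(times))

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Correlation between response time and output tokens
print("Token correlation:", round(pearson(times, tokens), 3))
```

A correlation near 1.0, as in this sample, means output length largely explains the latency; a low correlation points to other bottlenecks (tool calls, orchestration overhead).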
*Response Time Analysis:* :chart_with_downwards_trend:
| Chart | Description |
| :--- | :--- |
| Line Chart | Shows how Copilot Agent response time changes across individual queries to identify spikes or trends. |
| Box Plot | Summarizes the overall response time distribution, highlighting consistency and outliers for performance benchmarking. |
2. Data Tab :calendar:
Displays detailed per-query information, including the user’s prompt, Copilot Agent response, response time, output size, and step-by-step planner actions with tools and arguments—used for debugging and performance analysis.
Query Response / Time Data:
A per‑query ledger linking the user prompt, the agent’s full reply, its latency, and output size to diagnose performance.
| Metric | Description |
|---|---|
| Serial | Sequence number of the query in the run. |
| Query | Exact user prompt/utterance. |
| Response | Agent’s generated reply (full text). |
| Min | Latency to produce the response (typically in seconds or ms). |
| Char | Output length indicator (character count). |
LLM Planner Steps Data:
A step‑by‑step trace of the agent’s planning, tools, and arguments that explains how each response was produced.
| Metric | Description |
|---|---|
| Serial | Sequence number matching the query above. |
| Query | Exact user prompt/utterance. |
| PlannerStep | Named step decided by orchestrator. |
| Thought | Model's internal reasoning summary for the step (high-level). |
| Tool | Action or connector invoked. |
| Arguments | Parameters passed / received. |
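To illustrate how the planner-step rows support comparison across queries (the rows, tool names, and argument values below are hypothetical):

```python
from collections import defaultdict

# Hypothetical planner-step rows, keyed by the fields in the table above
steps = [
    {"Serial": 1, "Query": "What is my leave balance?", "PlannerStep": "Step 1",
     "Thought": "Need HR data", "Tool": "HRConnector", "Arguments": {"user": "alice"}},
    {"Serial": 1, "Query": "What is my leave balance?", "PlannerStep": "Step 2",
     "Thought": "Summarize result", "Tool": "GenerateAnswer", "Arguments": {}},
    {"Serial": 2, "Query": "Book a meeting room", "PlannerStep": "Step 1",
     "Thought": "Need calendar", "Tool": "CalendarConnector", "Arguments": {"room": "4A"}},
]

# Group tool calls by query serial to compare plan composition across queries
tools_by_query = defaultdict(list)
for s in steps:
    tools_by_query[s["Serial"]].append(s["Tool"])

for serial, tools in sorted(tools_by_query.items()):
    print(f"Query {serial}: {' -> '.join(tools)}")
```

Grouping by `Serial` makes it easy to spot when two similar queries take different tool paths, which is exactly the comparison the Data tab supports visually.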
Prerequisite:
To set up this sample, you will need the following:
- Python version 3.9 or higher
- An Agent Created in Microsoft Copilot Studio or access to an existing Agent.
- Ability to create an Application Identity in Azure for a Public Client/Native App Registration, or access to an existing Public Client/Native App registration with the `CopilotStudio.Copilots.Invoke` API permission assigned.
Authentication:
The Copilot Studio Response Analysis Tool requires a user token to operate. This sample uses an interactive user flow to obtain the token for the application ID created above; other authentication flows are also supported.
[!Important] The token is cached on the user's machine in `.local_token_cache.json`.
Step 1. Create an Agent in Copilot Studio.
- Create an Agent in Copilot Studio
- Publish your newly created Copilot
- Go to Settings => Advanced => Metadata and copy the following values; you will need them later:
- Schema name
- Environment Id
Step 2. Create an Application Registration in Entra ID.
This step will require permissions to create application identities in your Azure tenant. For this sample, you will create a Native Client Application Identity, which does not have secrets.
- Open https://portal.azure.com
- Navigate to Entra Id
- Create a new App Registration in Entra ID
- Provide a Name
- Choose “Accounts in this organizational directory only”
- In the “Select a Platform” list, choose “Public Client/native (mobile & desktop)”
- In the Redirect URI box, type `http://localhost` (note: use HTTP, not HTTPS)
- Then click Register.
- In your newly created application:
  - On the Overview page, note down for use later when configuring the example application:
    - The Application (client) ID
    - The Directory (tenant) ID
  - Go to API Permissions in the `Manage` section
  - Click Add Permission
  - In the side panel that appears, click the `APIs my organization uses` tab
  - Search for `Power Platform API`. If you do not see `Power Platform API`, see the note at the bottom of this section.
  - In the Delegated permissions list, choose `CopilotStudio` and check `CopilotStudio.Copilots.Invoke`
  - Click `Add Permissions`
  - (Optional) Click `Grant Admin consent for copilotsdk`
If you do not see `Power Platform API` in the list of APIs your organization uses, you need to add the Power Platform API to your tenant. To do that, go to Power Platform API Authentication and follow the instructions in Step 2 to add the Power Platform Admin API to your tenant.
Step 3. Configure the Copilot Studio Response Analysis Tool.
With the above information, you can now run the Copilot Studio Response Analysis Tool sample.
- Open the `env.TEMPLATE` file and rename it to `.env`.
- Configure the values based on what was recorded during the setup phase.
```
COPILOTSTUDIOAGENT__ENVIRONMENTID="" # Environment ID of the environment with the Copilot Studio app.
COPILOTSTUDIOAGENT__SCHEMANAME=""    # Schema name of the Copilot to use.
COPILOTSTUDIOAGENT__TENANTID=""      # Tenant ID of the App Registration used to log in; this should be in the same tenant as the Copilot.
COPILOTSTUDIOAGENT__AGENTAPPID=""    # App ID of the App Registration used to log in; this should be in the same tenant as the Copilot Studio environment.
```
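The values above can be read at startup along these lines. This is a minimal standard-library sketch, assuming the `.env` file has already been loaded into the process environment (the sample may use `python-dotenv` for that step); the `load_settings` helper is illustrative, not the tool's actual code:

```python
import os

# The four settings the .env template above defines
REQUIRED = [
    "COPILOTSTUDIOAGENT__ENVIRONMENTID",
    "COPILOTSTUDIOAGENT__SCHEMANAME",
    "COPILOTSTUDIOAGENT__TENANTID",
    "COPILOTSTUDIOAGENT__AGENTAPPID",
]

def load_settings() -> dict:
    """Read the required settings from the environment, failing fast if any is missing."""
    missing = [name for name in REQUIRED if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing settings in .env: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED}
```

Failing fast on missing settings gives a clearer error than an authentication failure deep inside the SDK.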
- Run `pip install -r requirements.txt` to install all dependencies.
- List test utterances sequentially in the `/data/input.txt` file. Mark the end of the file with “exit”.
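Parsing the utterance file might look like the following sketch; the `read_utterances` helper and its stop-at-“exit” behavior are assumptions about how the tool consumes the file:

```python
from pathlib import Path

def read_utterances(path: str) -> list[str]:
    """Read one utterance per line, stopping at the 'exit' sentinel."""
    utterances = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.lower() == "exit":
            break     # sentinel marks the end of the test set
        utterances.append(line)
    return utterances
```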
Step 4. Run the Copilot Studio Response Analysis Tool.
- Run the Copilot Studio Response Analysis Tool with `python -m src.main`. This should prompt you to log in and connect to the Copilot Studio hosted agent.
- The command displays the local URL hosting the app.
- Copy the URL into a browser and load the application.
- Click the `Start Test Run` button on the console. This initiates a test run over the test utterances: the tool uses the M365 Agents SDK to send each utterance in the `/data/input.txt` file and receives the responses and logs for presentation.
[!Important] Cross-check that the test utterances are listed sequentially in the `/data/input.txt` file.
If the tool is properly set up, `Process Status` displays the current state of processing, including the number of utterances analyzed and conversation identifiers. The `Start Test Run` button is disabled until the session completes.
Utterances in the data file can be updated or altered after each session and the session re-executed.
After each test run, the tool automatically generates a CSV file containing all queries, their responses, and corresponding response times. The file is stored in the `/data/` directory for easy access.
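As a sketch, the generated file can be post-processed with the standard library. The `ResponseTime` column name below is an assumption about the CSV layout; adjust it to the actual header the tool writes:

```python
import csv
import statistics

def summarize_run(csv_path: str) -> dict:
    """Compute summary latency stats from a test-run CSV.

    Assumes a 'ResponseTime' column holding seconds (hypothetical header;
    adjust to the columns in the generated file).
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        times = [float(row["ResponseTime"]) for row in csv.DictReader(f)]
    return {
        "count": len(times),
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        "max": max(times),
        "min": min(times),
    }
```

This makes it easy to diff summary statistics between test runs, for example before and after a topic or prompt change.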
If any utterances appear to be missing in the result, restart the tool and start a new session.