PytestAgentsSDK

This project provides a sample test harness for evaluating Copilot Studio agents using Pytest and DeepEval. It uses the Microsoft 365 Agents SDK to communicate with Copilot Studio and focuses on semantic evaluation of agent responses using DeepEval’s GEval metric.

Features

  • Multi-turn conversation testing against a Copilot Studio agent
  • Semantic response evaluation using DeepEval’s GEval metric
  • Loads test cases from a CSV file
  • Custom HTML reporting with detailed metadata (user input, actual and expected output, score, reason)
  • Authentication via MSAL, supporting “Authenticate with Microsoft” in Copilot Studio
  • Easily extensible for use with additional metrics and long-term result tracking using DeepEval and Pytest plugins

Setup

1. Clone the repository

git clone https://github.com/microsoft/CopilotStudioSamples.git
cd CopilotStudioSamples/FunctionalTesting/PytestAgentsSDK

2. Create and activate a virtual environment

python3 -m venv .venv
source .venv/bin/activate  # On Windows use `.venv\Scripts\activate`

3. Install required dependencies

pip install -r requirements.txt

4. Create an app registration

You will need to register an application in Azure for the SDK to authenticate with Copilot Studio:

  • Create a single-tenant app registration in Azure
  • Under Authentication → Platform configurations, click Add a platform, and select Mobile and desktop applications
  • Add these redirect URIs:
    • msal40347a26-35bb-48f3-bdc4-7f4f209aecb1://auth (MSAL only)
    • http://localhost
  • Under API permissions, click Add a permission
    • Choose APIs my organization uses, then search for Power Platform API
    • Choose Delegated permissions, then add CopilotStudio.Copilots.Invoke

Note: If the Power Platform API doesn't appear in the list, its visibility in your tenant may be stale. Run the refresh script described in the Microsoft documentation to make it available.

5. Authentication and Agent details

Create a .env file (you can copy from .env.template) and populate it with your MSAL and Copilot Studio agent configuration:

APP_CLIENT_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
TENANT_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
ENVIRONMENT_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
AGENT_IDENTIFIER=cr26e_dMyAgent  # This is the schema name, found under Settings > Advanced > Metadata
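The harness presumably loads these values with a library such as python-dotenv, but the format is simple enough to illustrate with a stdlib-only sketch (the `parse_env` helper below is hypothetical, not part of the sample). Note that it also strips inline comments like the one on the `AGENT_IDENTIFIER` line above:

```python
# Minimal .env parser sketch (illustrative only; the real harness likely
# relies on a library such as python-dotenv). Handles blank lines,
# full-line comments, and inline comments.

def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines into a dict, ignoring comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop an inline comment, then surrounding whitespace and quotes.
        value = value.split("#", 1)[0].strip().strip('"').strip("'")
        config[key.strip()] = value
    return config

sample = """
APP_CLIENT_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
AGENT_IDENTIFIER=cr26e_dMyAgent  # This is the schema name
"""
print(parse_env(sample)["AGENT_IDENTIFIER"])  # cr26e_dMyAgent
```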

6. Configure Azure OpenAI or OpenAI details

You can use either OpenAI or Azure OpenAI with DeepEval.

To configure Azure OpenAI using the DeepEval CLI:

deepeval set-azure-openai \
    --openai-endpoint=<endpoint> \                     # e.g. https://example-resource.openai.azure.com/
    --openai-api-key=<api_key> \
    --openai-model-name=<model_name> \                 # e.g. gpt-4o
    --deployment-name=<deployment_name> \              # e.g. Test Deployment
    --openai-api-version=<openai_api_version>          # e.g. 2025-01-01-preview

These values will be stored in a local .deepeval configuration file.

Alternatively, if you’re using OpenAI (not Azure), set the following environment variable:

export OPENAI_API_KEY=<your-openai-key>

7. Publish and set agent authentication

Before running tests, ensure that your Copilot Studio agent is:

  • Published with your latest changes
  • Configured to use “Authenticate with Microsoft” (under Settings → Security → Authentication)

8. Prepare Test Cases (CSV Input)

Before running the tests, populate the CSV file at input/test_cases.csv with your test cases.

The CSV file must contain two columns:

  • input_text: The message sent to the Copilot Studio agent
  • expected_output: The ideal response you’d expect from the agent

Example:

input_text,expected_output
What is the capital of France?,"The capital of France is Paris, which is known for its historical landmarks like the Eiffel Tower and the Louvre Museum."
Who wrote 'Hamlet'?,"William Shakespeare wrote the play 'Hamlet', which is considered one of the greatest works of English literature."
What is the chemical symbol for water?,H2O is the chemical symbol for water.
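Because most expected outputs contain commas, those fields must be wrapped in double quotes to keep the file two-column. Python’s built-in csv module unquotes them automatically when reading; a minimal loading sketch (the `load_test_cases` helper is illustrative, not necessarily the harness’s own code):

```python
# Sketch: load CSV test cases with the stdlib csv module. Quoted fields
# containing commas are handled transparently by csv.DictReader.
import csv
import io

SAMPLE_CSV = '''input_text,expected_output
Who wrote 'Hamlet'?,"William Shakespeare wrote the play 'Hamlet', which is considered one of the greatest works of English literature."
'''

def load_test_cases(stream):
    """Return one dict per row, keyed by input_text / expected_output."""
    return list(csv.DictReader(stream))

cases = load_test_cases(io.StringIO(SAMPLE_CSV))
print(cases[0]["input_text"])  # Who wrote 'Hamlet'?
```

In real use you would pass the file instead, e.g. `load_test_cases(open("input/test_cases.csv", newline=""))`.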

Running the Tests

From the PytestAgentsSDK directory, run:

pytest tests/multi_turn_eval_openai.py --html=reports/multi_turn_eval_openai.html --self-contained-html

This will:

  • Start a conversation with your Copilot Studio agent
  • Send test questions and capture responses
  • Evaluate the responses using DeepEval
  • Generate a self-contained HTML report in the reports/ folder
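The per-case loop above can be sketched as a plain function. The names below are hypothetical stand-ins: `send_fn` represents the Agents SDK conversation client and `score_fn` represents DeepEval’s GEval metric, neither of which is shown here.

```python
# Hypothetical sketch of the evaluation loop; send_fn and score_fn stand in
# for the Agents SDK client and the GEval metric. The 0.7 threshold is an
# illustrative default, not necessarily the harness's value.

def evaluate_case(case: dict, send_fn, score_fn, threshold: float = 0.7) -> dict:
    """Run one CSV test case and return the metadata the HTML report shows."""
    actual = send_fn(case["input_text"])               # ask the agent
    score, reason = score_fn(actual, case["expected_output"])
    return {
        "input": case["input_text"],
        "expected": case["expected_output"],
        "actual": actual,
        "score": score,
        "reason": reason,
        "passed": score >= threshold,                  # semantic threshold
    }

# Stubbed example run (no agent or LLM involved):
result = evaluate_case(
    {"input_text": "ping", "expected_output": "pong"},
    send_fn=lambda text: "pong",
    score_fn=lambda actual, expected: (1.0 if actual == expected else 0.0, "exact match"),
)
print(result["passed"])  # True
```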

Output

The HTML report includes:

  • Pass/Fail status based on semantic threshold
  • User message and expected answer
  • Actual response from the agent
  • DeepEval score
  • Explanation for the result
  • Conversation ID (for debugging)