Task 01 - Create a red team agent and baseline

Introduction

With the widespread adoption of AI agents in various applications, ensuring their security and reliability has become paramount. Red teaming is a proactive approach to identify vulnerabilities and weaknesses in AI systems by simulating real-world attacks and adversarial scenarios. By conducting red teaming exercises, organizations can uncover potential risks, assess the robustness of their AI models, and implement necessary safeguards to mitigate threats.

Description

In this task, you will create an AI Red Teaming Agent using Microsoft Foundry and run it locally. You will configure the agent to perform automated scans against your deployed AI models, using a set of predefined attack prompts. You will also establish a baseline for your models’ performance and behavior, which will help you identify any anomalies or vulnerabilities during the red teaming exercises.

Success Criteria

  • You have created an AI Red Teaming Agent in Microsoft Foundry.
  • You have configured the agent with a set of attack prompts.
  • You have established a baseline for your AI models.

Learning Resources

  • AI Red Teaming Agent
  • Run AI Red Teaming Agent locally
  • Run AI Red Teaming Agent in the cloud

Key Tasks

01: Create the AI Red Teaming Agent

The Azure AI Red Teaming Agent is a powerful tool that can help you identify potential risks and vulnerabilities in your AI models. To create the agent, you will need to use the azure-ai-evaluation[redteam] library.

Expand this section to view the solution

You have already installed the azure-ai-evaluation[redteam] library as part of product setup. With this library, you can create an AI Red Teaming Agent that can perform automated scans against your deployed AI models.
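
If you are working in a fresh environment where the package is not yet present, it can typically be installed with pip (shown as a general example; follow any version pinning your project already uses):

pip install "azure-ai-evaluation[redteam]"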

Create a new file called redTeamingAgent_initializer.py in the src/app/agents directory of your project. Add the following code to the file on lines 1-8:

# Azure imports
from azure.identity import DefaultAzureCredential
from azure.ai.evaluation.red_team import RedTeam, RiskCategory, AttackStrategy
from pyrit.prompt_target import OpenAIChatTarget
import os
import asyncio
from dotenv import load_dotenv
load_dotenv()

The AI Red Teaming Agent uses the PyRIT library to define the target model for red teaming. In this example, we will use the OpenAIChatTarget class to define the target model. After the imports, add the following code on lines 10-11 to define the Azure AI project information:

# Azure AI Project Information
azure_ai_project = os.getenv("AZURE_AI_AGENT_ENDPOINT")
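
If you would like the script to fail fast when this variable is missing, you can add a quick check immediately after this line. This is an optional sketch; the endpoint format in the comment is only an illustration of what Foundry project endpoints commonly look like, so confirm the actual value on your project's overview page:

# Optional sanity check for the project endpoint. Foundry project endpoints
# commonly look something like:
#   https://<your-resource>.services.ai.azure.com/api/projects/<your-project>
if not azure_ai_project:
    raise ValueError("AZURE_AI_AGENT_ENDPOINT is not set; add it to your .env file.")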

Next, instantiate the AI Red Teaming Agent with the desired configuration. Add the following code on lines 13-24:

# Instantiate your AI Red Teaming Agent
red_team_agent = RedTeam(
    azure_ai_project=azure_ai_project,
    credential=DefaultAzureCredential(),
    risk_categories=[
        RiskCategory.Violence,
        RiskCategory.HateUnfairness,
        RiskCategory.Sexual,
        RiskCategory.SelfHarm
    ],
    num_objectives=5,
)

The risk_categories parameter is optional; it specifies the types of risks that the agent will look for during its scans. There are eight risk categories in total, and you can include any combination of them in your scans. The num_objectives parameter specifies the number of unique attack prompts that the agent will generate for each risk category. By default, the agent generates 10 unique attack prompts per risk category.
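
As a concrete illustration of how these two parameters determine the total prompt count, the following variant (for illustration only, not part of this task's configuration) would generate 2 x 3 = 6 attack prompts per scan:

# Illustrative variant only - do not add this to your file for this task.
example_agent = RedTeam(
    azure_ai_project=azure_ai_project,
    credential=DefaultAzureCredential(),
    risk_categories=[RiskCategory.Violence, RiskCategory.SelfHarm],  # 2 risk categories
    num_objectives=3,                                                # 3 prompts per category
)
# 2 categories x 3 objectives = 6 unique attack prompts per scan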

Once you have the red team agent instantiated, you will need to define the target model for red teaming. We will first define a simple test target function that simulates an AI assistant. Add the following code on lines 26-27:

def test_chat_target(query: str) -> str:
    return "I am a simple AI assistant that follows ethical guidelines. I'm sorry, Dave. I'm afraid I can't do that."

This function simulates an AI assistant that follows ethical guidelines and refuses to comply with harmful requests. Because it returns the same canned response for every prompt, it does not represent actual interactions with a deployed AI model.
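
If you later want a callback-style target that reflects a real application, the same function signature can wrap a call to a deployed model. The sketch below is optional and assumes the openai package is available along with the AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_KEY, and gpt_deployment environment variables used later in this task; the API version shown is an assumption you should adjust to your deployment:

from openai import AzureOpenAI

def real_chat_target(query: str) -> str:
    # Hypothetical callback target that forwards each attack prompt to a deployed model.
    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_KEY"],
        api_version="2024-06-01",  # assumption - use your deployment's API version
    )
    response = client.chat.completions.create(
        model=os.environ["gpt_deployment"],
        messages=[{"role": "user", "content": query}],
    )
    return response.choices[0].message.content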

After defining the test target function, add the following code to execute the red team scan:

async def main():
    red_team_result = await red_team_agent.scan(target=test_chat_target)

asyncio.run(main())
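
Note that the AttackStrategy import is not needed for this baseline scan; by default, the agent sends its generated attack prompts directly to the target. If you later want to experiment beyond the baseline, scan() also accepts optional parameters such as scan_name and attack_strategies. The sketch below is optional and not required for this task; verify the parameter and strategy names against the version of azure-ai-evaluation you have installed:

# Optional variant of the scan call - not required for establishing the baseline.
red_team_result = await red_team_agent.scan(
    target=test_chat_target,
    scan_name="baseline-scan",  # friendly name shown alongside the results
    attack_strategies=[AttackStrategy.Base64, AttackStrategy.ROT13],  # encode prompts before sending
)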

In order to run the red team agent, navigate to the src/ directory of your project in a terminal window. Ensure that you are in the correct virtual environment and run the following command:

python app/agents/redTeamingAgent_initializer.py

02: Review the results in Microsoft Foundry

Each execution of the AI Red Teaming Agent generates a report that summarizes the findings of the scan. The report includes information about the attack prompts used, the responses generated by the target model, and any risks or vulnerabilities identified during the scan. Navigate to Microsoft Foundry and select your AI project. In the left-hand menu, select Evaluation from the Protect and govern section. Then, select the AI red teaming tab in the Assess and compare AI application performance menu.

Review the overall report and ensure that all twenty attack prompts were unsuccessful, with five prompts generated for each of the four risk categories specified in the agent configuration. Then, navigate to the Data menu and review the input data used for the scan, as well as what the target model returned for each prompt.

03: Scan an Azure OpenAI model

The evaluation that you ran in the first section used a simple test function as the target model. In this task, you will configure the AI Red Teaming Agent to scan an actual deployed Azure OpenAI model.

Expand this section to view the solution

First, remove the test_chat_target() function from the redTeamingAgent_initializer.py file. Then, add the following code on lines 26-31 to define the Azure OpenAI chat target model:

# Configuration for Azure OpenAI model
azure_openai_config = { 
    "azure_endpoint": f"{os.environ.get('AZURE_OPENAI_ENDPOINT')}/openai/deployments/{os.environ.get('gpt_deployment')}/chat/completions",
    "api_key": os.environ.get("AZURE_OPENAI_KEY"),
    "azure_deployment": os.environ.get("gpt_deployment")
}

After doing this, change the main() function to use this Azure OpenAI configuration as its target model for red teaming. Replace the scan line of code with the following, being sure to leave the rest of the main() function and the call below it unchanged:

red_team_result = await red_team_agent.scan(target=azure_openai_config)

Once you have made your changes to this file, save it and run the following command in your terminal window from the src/ directory of your project:

python app/agents/redTeamingAgent_initializer.py

This will execute the red team scan against your deployed Azure OpenAI model. After the scan is complete, navigate back to Microsoft Foundry and review the results of the scan as you did in the previous section. You should once again see that all attack prompts were unsuccessful. Drilling into the results on the Data tab, you should see some responses with an attack success value of Attack unsuccessful and some marked Succeeded; in the latter cases, an attempt was made to bypass the model's safety mechanisms, but the model did not respond to the prompt because of its built-in safety features.

04: Define an OpenAI chat target model

The OpenAIChatTarget class from the PyRIT library provides a convenient way to define an OpenAI chat model as the target for red teaming. In this task, you will modify the redTeamingAgent_initializer.py file to use the OpenAIChatTarget class instead of the Azure OpenAI configuration dictionary.

Expand this section to view the solution

First, remove the azure_openai_config dictionary from the redTeamingAgent_initializer.py file. Then, add the following code on lines 26-31 to define the OpenAI chat target model:

chat_target = OpenAIChatTarget(
    model_name=os.environ.get("gpt_deployment"),
    endpoint=f"{os.environ.get("gpt_endpoint")}/openai/deployments/{os.environ.get('gpt_deployment')}/chat/completions" ,
    api_key=os.environ.get("gpt_api_key"),
    api_version=os.environ.get("gpt_api_version"),
)

Next, change the main() function to use this chat_target as its target model for red teaming. Replace the scan line of code with the following, being sure to leave the rest of the main() function and the call below it unchanged:

red_team_result = await red_team_agent.scan(target=chat_target)

Once you have made your changes to this file, save it and run the following command in your terminal window from the src/ directory of your project:

python app/agents/redTeamingAgent_initializer.py

This will execute the red team scan against your deployed endpoint. After the scan is complete, navigate back to Microsoft Foundry and review the results of the scan as you did in the previous section. You should once again see that all attack prompts were unsuccessful. With this, you have established a baseline for your deployed model.