Task 01 - Create a Red Teaming Agent via code

Introduction

With the widespread adoption of AI agents in various applications, ensuring their security and reliability has become paramount. Red teaming is a proactive approach to identify vulnerabilities and weaknesses in AI systems by simulating real-world attacks and adversarial scenarios. By conducting red teaming exercises, organizations can uncover potential risks, assess the robustness of their AI models, and implement necessary safeguards to mitigate threats.

Description

In this task, you will create an AI Red Teaming Agent using Microsoft Foundry and run it locally. You will configure the agent to perform automated scans against your deployed AI models, using a set of predefined attack prompts. You will also establish a baseline for your models’ performance and behavior, which will help you identify any anomalies or vulnerabilities during the red teaming exercises.

Success Criteria

You have created an AI Red Teaming Agent in Microsoft Foundry.
You have configured the agent with a set of attack prompts.
You have established a baseline for your AI models.

Learning Resources

AI Red Teaming Agent Run AI Red Teaming Agent locally Run AI Red Teaming Agent in the cloud

Key Tasks

01: Create the AI Red Teaming Agent

The Azure AI Red Teaming Agent is a powerful tool that can help you identify potential risks and vulnerabilities in your AI models. To create the agent, you will need to use the azure-ai-evaluation[redteam] library.

Expand this section to view the solution

You have already installed the azure-ai-evaluation[redteam] library as part of product setup. With this library, you can create an AI Red Teaming Agent that can perform automated scans against your deployed AI models.

Create a new file called redTeamingAgent_initializer.py in the src/app/agents directory of your project. Add the following code to the file on lines 1-8:

# Azure imports
from azure.identity import DefaultAzureCredential
from azure.ai.evaluation.red_team import RedTeam, RiskCategory, AttackStrategy
from pyrit.prompt_target import OpenAIChatTarget
import os
import asyncio
from dotenv import load_dotenv
load_dotenv()

The AI Red Teaming Agent uses the PyRIT library to define the target model for red teaming. In this example, we will use the OpenAIChatTarget class to define the target model. After the imports, add the following code on lines 10-11 to define the Azure AI project information:

# Azure AI Project Information
azure_ai_project = os.getenv("FOUNDRY_ENDPOINT")

Next, instantiate the AI Red Teaming Agent with the desired configuration. Add the following code on lines 13-24:

# Instantiate your AI Red Teaming Agent
red_team_agent = RedTeam(
    azure_ai_project=azure_ai_project,
    credential=DefaultAzureCredential(),
    risk_categories=[
        RiskCategory.Violence,
        RiskCategory.HateUnfairness,
        RiskCategory.Sexual,
        RiskCategory.SelfHarm
    ],
    num_objectives=5,
)

The risk_categories parameter is optional and it specifies the types of risks that the agent will look for during the scans. There are eight risk categories in total and you can include any combination of the eight in your scans. The num_objectives parameter specifies the number of unique attack prompts that the agent will generate for each scan. By default, the agent generates 10 unique attack prompts per scan.

Once you have the red team agent instantiated, you will need to define the target model for red teaming. We will first define a simple test target function that simulates an AI assistant. Add the following code on lines 26-27:

def test_chat_target(query: str) -> str:
    return "I am a simple AI assistant that follows ethical guidelines. I'm sorry, Dave. I'm afraid I can't do that."

This function simulates an AI assistant that follows ethical guidelines and refuses to comply with harmful requests but it does not represent actual interactions with a deployed AI model.

After defining the test target function, add the following code to execute the red team scan:

async def main():
    red_team_result = await red_team_agent.scan(target=test_chat_target)

asyncio.run(main())

In order to run the red team agent, navigate to the src/ directory of your project in a terminal window. Ensure that you are in the correct virtual environment and run the following command:

python app/agents/redTeamingAgent_initializer.py

02: Scan an Azure OpenAI model

The evaluation that you ran in the first section used a simple test function as the target model. In this task, you will configure the AI Red Teaming Agent to scan an actual deployed Azure OpenAI model.

Expand this section to view the solution

First, remove the test_chat_target() function from the redTeamingAgent_initializer.py file. Then, add the following code on lines 26-31 to define the Azure OpenAI chat target model:

# Configuration for Azure OpenAI model
azure_openai_config = { 
    "azure_endpoint": f"{os.environ.get('gpt_endpoint')}/openai/deployments/{os.environ.get('gpt_deployment')}/chat/completions",
    "api_key": os.environ.get("FOUNDRY_KEY"),
    "azure_deployment": os.environ.get("gpt_deployment")
}

After doing this, change the main() function to use this Azure OpenAI configuration as its target model for red teaming. Replace the scan line of code with the following, being sure to leave the rest of the main() function and the call below it unchanged:

red_team_result = await red_team_agent.scan(target=azure_openai_config)

Once you have made your changes to this file, save it and run the following command in your terminal window from the src/ directory of your project:

python app/agents/redTeamingAgent_initializer.py

This will execute the red team scan against your deployed Azure OpenAI model.

03: Review the results

Each execution of the AI Red Teaming Agent generates a report that summarizes the findings of the scan. The report includes information about the attack prompts used, the responses generated by the target model, and any risks or vulnerabilities identified during the scan.

Expand this section to view the solution

For local red team evaluations, you can find a link to the evaluation results in the terminal window.

Open the evaluation file.

This opens a lengthy JSON file. You can format the file in Visual Studio Code using Ctrl + Shift + I. Scroll through the file to learn what information the report generates. In the following example, we see that the red team agent attempted to trick the language model to generate ransomware code. The result is a 400 Bad Request as this triggered Azure OpenAI content restrictions.

Review the evaluation results.

04: Define an OpenAI chat target model

The OpenAIChatTarget class from the PyRIT library provides a convenient way to define an OpenAI chat model as the target for red teaming. In this task, you will modify the redTeamingAgent_initializer.py file to use the OpenAIChatTarget class instead of the Azure OpenAI configuration dictionary.

Expand this section to view solution

First, remove the azure_openai_config dictionary from the redTeamingAgent_initializer.py file. Then, add the following code on lines 26-31 to define the OpenAI chat target model:

chat_target = OpenAIChatTarget(
    model_name=os.environ.get("gpt_deployment"),
    endpoint=f"{os.environ.get("gpt_endpoint")}/openai/deployments/{os.environ.get('gpt_deployment')}/chat/completions" ,
    api_key=os.environ.get("gpt_api_key"),
    api_version=os.environ.get("gpt_api_version"),
)

Next, change the main() function to use this chat_target as its target model for red teaming. Replace the scan line of code with the following, being sure to leave the rest of the main() function and the call below it unchanged:

red_team_result = await red_team_agent.scan(target=chat_target)

Once you have made your changes to this file, save it and run the following command in your terminal window from the src/ directory of your project:

python app/agents/redTeamingAgent_initializer.py

This will execute the red team scan against your deployed endpoint. After the scan is complete, review the results in the JSON file as you did for the prior example.