Red Team
LLM red teaming is a way to find vulnerabilities in AI systems before they’re deployed by using simulated adversarial inputs. GenAIScript provides built-in support for PromptFoo Red Team.
Adding Red Teaming to scripts
Add `redteam` to the `script` function to enable red teaming.
```js
script({
    redteam: {
        purpose: "You are a malicious user.",
    },
})
def("FILE", env.files)
$`Extract keywords from <FILE>`
```
The `purpose` property is used to guide the attack generation process. It should be as clear and specific as possible.
Include the following information (see the sketch after this list):
- Who the user is and their relationship to the company
- What data the user has access to
- What data the user does not have access to
- What actions the user can perform
- What actions the user cannot perform
- What systems the agent has access to
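For illustration, a more detailed purpose might look like the following sketch. The company name, data sources, and permissions are hypothetical and should be replaced with the details of your own deployment.

```js
// A minimal sketch of a detailed purpose; "Contoso" and the permissions below are illustrative only.
script({
    redteam: {
        purpose: `The user is an external customer of Contoso's support portal.
They can read public product documentation and their own ticket history.
They must not access other customers' tickets, internal pricing data, or employee records.
They can open and update their own support tickets but cannot issue refunds.
The agent has access to the ticketing system and the public knowledge base.`,
    },
})
def("FILE", env.files)
$`Extract keywords from <FILE>`
```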
Plugins
Plugins are Promptfoo’s modular system for testing a variety of risks and vulnerabilities in LLM models and LLM-powered applications.
If not specified, GenAIScript will let PromptFoo use the default
set of plugins.
This example loads the OWASP Top 10 for Large Language Model Applications plugins.
```js
script({
    redteam: {
        plugins: "owasp:llm",
    },
})
```
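If you want to target specific risk categories rather than a whole collection, a sketch like the one below may help; it assumes the `plugins` option also accepts an array of individual Promptfoo plugin identifiers such as `"harmful"` and `"pii"`.

```js
// Sketch assuming an array of Promptfoo plugin identifiers is accepted
// in place of a single collection alias like "owasp:llm".
script({
    redteam: {
        plugins: ["harmful", "pii"],
    },
})
```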
Strategies
Strategies are attack techniques that systematically probe LLM applications for vulnerabilities. While plugins generate adversarial inputs, strategies determine how these inputs are delivered to maximize attack success rates.
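A minimal sketch of selecting strategies is shown below; it assumes the `redteam` options accept a `strategies` array mirroring Promptfoo strategy identifiers such as `"jailbreak"` and `"prompt-injection"`.

```js
// Sketch assuming a `strategies` array that mirrors Promptfoo strategy identifiers.
script({
    redteam: {
        purpose: "You are a malicious user.",
        strategies: ["jailbreak", "prompt-injection"],
    },
})
```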
Configuration
There are limitations on which providers can run the red team process (which requires LLM access).
- The grader requires an OpenAI or Azure OpenAI provider.
- By default, remote generation is disabled (via the `PROMPTFOO_DISABLE_REDTEAM_REMOTE_GENERATION` variable). If you need to run with this service enabled, use the `promptfoo` CLI with the generated redteam configuration file.