Red Team
LLM red teaming is a way to find vulnerabilities in AI systems before they’re deployed by using simulated adversarial inputs. GenAIScript provides built-in support for PromptFoo Red Team.
Adding Red Teaming to scripts
Add `redteam` to the `script` function to enable red teaming.
```js
script({
    redteam: {
        purpose: "You are a malicious user.",
    },
})
def("FILE", env.files)
$`Extract keywords from <FILE>`
```
The `purpose` property is used to guide the attack generation process. It should be as clear and specific as possible.
Include the following information (see the sketch after this list):
- Who the user is and their relationship to the company
- What data the user has access to
- What data the user does not have access to
- What actions the user can perform
- What actions the user cannot perform
- What systems the agent has access to
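For illustration, a more detailed purpose might look like the following sketch. The company name, data sources, and permissions are hypothetical and should be replaced with the details of your own deployment.

```js
// A minimal sketch of a detailed purpose; "Contoso" and the permissions below are illustrative only.
script({
    redteam: {
        purpose: `The user is an external customer of Contoso's support portal.
They can read public product documentation and their own ticket history.
They must not access other customers' tickets, internal pricing data, or employee records.
They can open and update their own support tickets but cannot issue refunds.
The agent has access to the ticketing system and the public knowledge base.`,
    },
})
def("FILE", env.files)
$`Extract keywords from <FILE>`
```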
Plugins
Plugins are Promptfoo’s modular system for testing a variety of risks and vulnerabilities in LLM models and LLM-powered applications.
If not specified, GenAIScript will let PromptFoo use the default
set of plugins.
This example loads the OWASP Top 10 for Large Language Model Applications plugins.
```js
script({
    redteam: {
        plugins: "owasp:llm",
    },
})
```
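If you want to target specific risk categories rather than a whole collection, a sketch like the one below may help; it assumes the `plugins` option also accepts an array of individual Promptfoo plugin identifiers such as `"harmful"` and `"pii"`.

```js
// Sketch assuming an array of Promptfoo plugin identifiers is accepted
// in place of a single collection alias like "owasp:llm".
script({
    redteam: {
        plugins: ["harmful", "pii"],
    },
})
```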
Strategies
Strategies are attack techniques that systematically probe LLM applications for vulnerabilities. While plugins generate adversarial inputs, strategies determine how these inputs are delivered to maximize attack success rates.
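A minimal sketch of selecting strategies is shown below; it assumes the `redteam` options accept a `strategies` array mirroring Promptfoo strategy identifiers such as `"jailbreak"` and `"prompt-injection"`.

```js
// Sketch assuming a `strategies` array that mirrors Promptfoo strategy identifiers.
script({
    redteam: {
        purpose: "You are a malicious user.",
        strategies: ["jailbreak", "prompt-injection"],
    },
})
```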
Configuration
There are limitations on which providers can run the red team process (which requires LLM access).
- The grader requires an OpenAI or Azure OpenAI provider.
- By default, remote generation is disabled (via the `PROMPTFOO_DISABLE_REDTEAM_REMOTE_GENERATION` variable). If you need to run with this service enabled, use the `promptfoo` CLI with the generated redteam configuration file.