# Evals with multiple models
GenAIScript allows you to evaluate multiple models in a single script against multiple tests. This is useful when you want to compare the performance of different models on the same input.
GenAIScript leverages PromptFoo to evaluate the outputs of the models.
In this example, we will evaluate the performance of three models on a summarization script:
```js
const file = def("FILE", env.files)
$`Summarize ${file} in one sentence.`
```
## Defining tests
First, add one or more tests in the `tests` field of the `script` function.
```js
script({
  tests: {
    files: "markdown.md",
    keywords: "markdown",
  },
})
// ...
```
In this case, we add a simple keyword assertion, but you can find many other options in the tests reference.
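The `tests` field also accepts an array, so one script can be checked against several fixtures. Here is a minimal sketch; the second fixture (`rust.md`) is hypothetical, and the `rubrics` and `facts` fields are LLM-judged assertions whose exact names you should confirm in the tests reference:

```js
script({
  tests: [
    // the keyword assertion from above
    { files: "markdown.md", keywords: "markdown" },
    // hypothetical second fixture with LLM-judged assertions;
    // verify these option names against the tests reference
    {
      files: "rust.md",
      rubrics: "is a one-sentence summary",
      facts: "the document is about Rust",
    },
  ],
})
```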
## Defining test models
Next, add the list of model identifiers or model aliases you want to test against.
```js
script({
  // ...
  testModels: [
    "azure_ai_inference:gpt-4o",
    "azure_ai_inference:gpt-4o-mini",
    "azure_ai_inference:deepseek-r1",
  ],
})
// ...
```
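Putting the pieces together, the whole evaluation fits in a single script file. A sketch of the complete script, assuming it is saved as `summarizer.genai.mjs` to match the `genaiscript test summarizer` invocation below:

```js
script({
  tests: { files: "markdown.md", keywords: "markdown" },
  testModels: [
    "azure_ai_inference:gpt-4o",
    "azure_ai_inference:gpt-4o-mini",
    "azure_ai_inference:deepseek-r1",
  ],
})

// the prompt under test: summarize each input file in one sentence
const file = def("FILE", env.files)
$`Summarize ${file} in one sentence.`
```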
## Running tests
Tests can be run using the `genaiscript` CLI or in Visual Studio Code (see testing scripts).
```sh
genaiscript test summarizer
```
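This runs every test against each entry in `testModels`, so the three models above are evaluated on the same inputs and assertions.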
Next, open the PromptFoo dashboard to see the results of the tests.
```sh
genaiscript test view
```