
Evals with Multiple Models

GenAIScript lets you evaluate multiple models against multiple tests in a single script. This is useful when you want to compare how different models perform on the same input.

GenAIScript leverages PromptFoo to evaluate the outputs of the models.

In this example, we will evaluate the performance of three models on a summarization script.

summarizer.genai.js
// include the input files in the prompt under the FILE variable
const file = def("FILE", env.files)
// ask the model for a one-sentence summary of the file contents
$`Summarize ${file} in one sentence.`
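
If you only want certain files summarized, def accepts filtering options. The snippet below is a sketch: the endsWith option is taken from the def reference and may differ across versions.

summarizer.genai.js
// sketch: restrict the summary to markdown files among the inputs
// (the endsWith filter option is an assumption from the def reference)
const file = def("FILE", env.files, { endsWith: ".md" })
$`Summarize ${file} in one sentence.`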

Defining tests

First, add one or more tests in the tests field of the script function.

script({
  tests: { files: "markdown.md", keywords: "markdown" },
})
...

In this case, we add a simple keyword assertion, but you can find many other assertion options in the tests reference.
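
The tests field also accepts an array, so a single script can carry several test cases. The sketch below assumes the array form from the tests reference; notes.md is a hypothetical second input file.

script({
  tests: [
    // each test pairs input files with an expected keyword
    { files: "markdown.md", keywords: "markdown" },
    // hypothetical second test case, for illustration only
    { files: "notes.md", keywords: "notes" },
  ],
})
...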

Defining test models

Next, add the list of model identifiers or model aliases you want to test against in the testModels field.

script({
  ...,
  testModels: [
    "azure_ai_inference:gpt-4o",
    "azure_ai_inference:gpt-4o-mini",
    "azure_ai_inference:deepseek-r1",
  ],
})
...
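
Putting the pieces together, the complete summarizer.genai.js combines the prompt with the tests and the models under test:

summarizer.genai.js
script({
  // test input and expected keyword from the example above
  tests: { files: "markdown.md", keywords: "markdown" },
  // models to evaluate against each test
  testModels: [
    "azure_ai_inference:gpt-4o",
    "azure_ai_inference:gpt-4o-mini",
    "azure_ai_inference:deepseek-r1",
  ],
})
const file = def("FILE", env.files)
$`Summarize ${file} in one sentence.`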

Running tests

Tests can be run using the genaiscript CLI or in Visual Studio Code (see testing scripts).

genaiscript test summarizer

Next, open the PromptFoo dashboard to see the results of the tests.

genaiscript test view
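
For convenience, you can wrap both commands in npm scripts. The package.json snippet below is only a sketch; the test:genai and test:view script names are our own choice.

package.json
{
  "scripts": {
    "test:genai": "genaiscript test summarizer",
    "test:view": "genaiscript test view"
  }
}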