Tests
You can define tests for your LLM scripts to evaluate the quality of the LLM output over time and across model types.
The tests are executed by promptfoo, a tool for evaluating LLM output quality.
Defining tests
The tests are declared in the script function of your script file.
You may define a single test or an array of tests.
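For example, a test declaration might look like this (a minimal sketch; the title, file path, and criteria are illustrative):

```js
script({
  title: "summarize code",
  tests: {
    // file loaded into env.files when the test runs
    files: "src/example.ts",
    // graded by a language model against this requirement
    rubrics: "is a concise summary of the source file",
    // reference answer for the factuality check
    facts: "The summary mentions the exported functions.",
  },
})
```

Each property is described below.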
files
files takes a list of file paths (relative to the workspace) and populates the env.files variable while running the test.
You can provide multiple files by passing an array of strings.
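For instance (the paths are illustrative):

```js
script({
  tests: {
    // both files are populated into env.files during the test
    files: ["src/a.ts", "src/b.ts"],
  },
})
```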
rubrics
rubrics checks whether the LLM output matches the given requirements, using a language model to grade the output based on the rubric (see llm-rubric).
You can specify multiple rubrics by passing an array of strings.
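A sketch with multiple rubrics (the rubric text is illustrative):

```js
script({
  tests: {
    files: "src/example.ts",
    // each rubric is graded independently by the llm-rubric assertion
    rubrics: [
      "is valid TypeScript code",
      "does not contain TODO comments",
    ],
  },
})
```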
facts
facts checks factual consistency (see factuality).
You can specify multiple facts by passing an array of strings.
Given a completion A and reference answer B, the grader evaluates whether A is a subset of B, A is a superset of B, A and B are equivalent, A and B disagree, or A and B differ but the differences do not matter from the perspective of factuality.
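A sketch with multiple facts (the reference answers are illustrative):

```js
script({
  tests: {
    files: "src/example.ts",
    // reference answers compared to the output by the factuality grader
    facts: [
      "The output names the file's main exported function.",
      "The output states the purpose of the file.",
    ],
  },
})
```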
asserts
asserts defines other assertions based on promptfoo assertions and metrics:
- icontains (not-icontains): output contains substring, case insensitive
- equals (not-equals): output equals string
- starts-with (not-starts-with): output starts with string
- contains-all (not-contains-all): output contains all substrings
- contains-any (not-contains-any): output contains any substring
- icontains-all (not-icontains-all): output contains all substrings, case insensitive
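For example, assuming promptfoo's assertion object shape of type/value pairs (the values are illustrative):

```js
script({
  tests: {
    files: "src/example.ts",
    asserts: [
      // output must contain "function", case insensitive
      { type: "icontains", value: "function" },
      // output must not start with an apology
      { type: "not-starts-with", value: "I'm sorry" },
      // output must contain every listed substring
      { type: "contains-all", value: ["import", "export"] },
    ],
  },
})
```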
transform
By default, the asserts are executed on the raw LLM output.
However, you can use a JavaScript expression to select the part of the output to test.
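A sketch, assuming promptfoo's transform semantics where output holds the raw text (the expression is illustrative):

```js
script({
  tests: {
    files: "src/example.ts",
    asserts: [
      {
        type: "icontains",
        value: "function",
        // test only the first line of the output
        transform: "output.split('\\n')[0]",
      },
    ],
  },
})
```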
Running tests
You can run tests from Visual Studio Code or from the command line. In both cases, genaiscript generates a promptfoo configuration file and executes promptfoo on it.
Visual Studio Code
- Open the script to test
- Right-click in the editor and select Run GenAIScript Tests in the context menu
- The promptfoo web view will automatically open and refresh with the test results.
Command line
Run the test command with the script file as argument.
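For example (the script file name is illustrative):

```sh
npx genaiscript test summarize-code.genai.mjs
```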
You can specify additional models to test against by passing the --models option.
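For instance (the model identifiers are illustrative; check genaiscript test --help for the exact option syntax):

```sh
npx genaiscript test summarize-code.genai.mjs --models openai:gpt-4o ollama:phi3
```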