Glossary
-
Prompt Under Test (PUT) - like Program Under Test; the prompt
-
Model Under Test (MUT) - Model which we are testing against with specific temperature, etc example: gpt-4o-mini
-
Model Used by PromptPex (MPP) - gpt-4o
-
Input Specification (IS) - Extracting input constraints of PUT using MPP (input_spec)
-
Output Rules (OR) - Extracting output constraints of PUT using MPP (rules_global)
-
Inverse Output Rules (IOR) - Inverse of the generated Output Rules
-
Output Rules Groundedness (ORG) - Checks if OR is grounded in PUT using MPP (check_rule_grounded)
-
Prompt Under Test Intent (PUTI) - Extracting the exact task from PUT using MMP (extract_intent)
-
Test Scenario (TS) - Set of additional input constraint variations not captured in the prompt.
-
PromptPex Tests (PPT) - Test cases generated for PUT with MPP using IS and OR (test)
-
Baseline Tests (BT) - Zero shot test cases generated for PUT with MPP (baseline_test)
-
Test Expansion (TE) - Expanding the test cases from examples and generally telling the LLM to make them more complex (test_expansion)
-
Test Validity (TV) - Checking if PPT and BT meets the constraints in IS using MPP (check_violation_with_input_spec)
-
Spec Agreement (SA) - Result generated for PPT and BT on PUTI + OR with MPP (evaluate_test_coverage)
-
Test Output (TO) - Result generated for PPT and BT on PUT with each MUT (the template is PUT)
-
Test Non-Compliance (TNC) - Checking if TO meets the constraints in PUT using MPP (check_violation_with_system_prompt)
-
Ground Truth Model (GTM) - Model used to generate the ground truth for the tests.
-
Ground Truth Eval Models (GTMEs) - Models used to evaluate the ground truth for the tests.
-
Ground Truth Eval Metrics (GTEMT) - Metric used to evaluate the ground truth for the tests.
-
PromptPex Tests with Ground Truth (PPGT) - Tests that include model-generated ground truth.
- Every node is created by an LLM call (aside from the PUT).
- Rounded nodes can be edited by the user.
- Square nodes are evaluations.
- Diamond nodes are outputs.
- Lines represent data dependencies.
- Bolded lines are the minimum path to generate tests.