Glossary

  • Prompt Under Test (PUT) - The prompt being tested, by analogy with Program Under Test.

  • Model Under Test (MUT) - The model the tests are run against, with a specific configuration such as temperature. Example: gpt-4o-mini

  • Model Used by PromptPex (MPP) - The model PromptPex uses for its own LLM calls (gpt-4o)

  • Input Specification (IS) - Extracting input constraints of PUT using MPP (input_spec)

  • Output Rules (OR) - Extracting output constraints of PUT using MPP (rules_global)

  • Inverse Output Rules (IOR) - Inverse of the generated Output Rules

  • Output Rules Groundedness (ORG) - Checks if OR is grounded in PUT using MPP (check_rule_grounded)

  • Prompt Under Test Intent (PUTI) - Extracting the exact task from PUT using MPP (extract_intent)

  • Test Scenario (TS) - Set of additional input constraint variations not captured in the prompt.

  • PromptPex Tests (PPT) - Test cases generated for PUT with MPP using IS and OR (test)

  • Baseline Tests (BT) - Zero shot test cases generated for PUT with MPP (baseline_test)

  • Test Expansion (TE) - Expanding the test cases from examples, with the LLM instructed to make them more complex (test_expansion)

  • Test Validity (TV) - Checking if PPT and BT meet the constraints in IS using MPP (check_violation_with_input_spec)

  • Spec Agreement (SA) - Result generated for PPT and BT on PUTI + OR with MPP (evaluate_test_coverage)

  • Test Output (TO) - Result generated for PPT and BT on PUT with each MUT (the template is PUT)

  • Test Non-Compliance (TNC) - Checking if TO meets the constraints in PUT using MPP (check_violation_with_system_prompt)

  • Ground Truth Model (GTM) - Model used to generate the ground truth for the tests.

  • Ground Truth Eval Models (GTMEs) - Models used to evaluate the ground truth for the tests.

  • Ground Truth Eval Metrics (GTEMT) - Metrics used to evaluate the ground truth for the tests.

  • PromptPex Tests with Ground Truth (PPGT) - Tests that include model-generated ground truth.


  • Every node is created by an LLM call (aside from the PUT).
  • Rounded nodes can be edited by the user.
  • Square nodes are evaluations.
  • Diamond nodes are outputs.
  • Lines represent data dependencies.
  • Bolded lines are the minimum path to generate tests.
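
The data dependencies described above can be sketched as a small graph. The edges below are an assumption read off the wording of the glossary entries (for example, PPT is "generated for PUT with MPP using IS and OR"), not an authoritative copy of the PromptPex graph, and the helper names `required_nodes` and `build_order` are hypothetical. The sketch shows how the minimum set of nodes needed to produce a given artifact could be derived from such a table.

```python
from graphlib import TopologicalSorter

# node -> nodes it depends on. Assumption: inferred from the glossary text,
# not taken from the PromptPex source.
DEPENDS_ON = {
    "PUT": set(),
    "IS": {"PUT"},
    "OR": {"PUT"},
    "IOR": {"OR"},
    "ORG": {"OR", "PUT"},
    "PUTI": {"PUT"},
    "PPT": {"PUT", "IS", "OR"},
    "BT": {"PUT"},
    "TV": {"PPT", "BT", "IS"},
    "SA": {"PPT", "BT", "PUTI", "OR"},
    "TO": {"PPT", "BT", "PUT"},
    "TNC": {"TO", "PUT"},
}


def required_nodes(target):
    """Return the target plus every node it transitively depends on."""
    seen = set()
    stack = [target]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(DEPENDS_ON[node])
    return seen


def build_order(target):
    """Return one valid order in which to produce the nodes needed for target."""
    needed = required_nodes(target)
    # Restrict the graph to the needed nodes before sorting.
    subgraph = {node: DEPENDS_ON[node] & needed for node in needed}
    return list(TopologicalSorter(subgraph).static_order())


if __name__ == "__main__":
    # Under the assumed edges, generating PPT needs only PUT, IS and OR.
    print(sorted(required_nodes("PPT")))
    print(build_order("TNC"))
```

Under these assumed edges, `required_nodes("PPT")` contains only PUT, IS, OR and PPT, which corresponds to the minimum path to generate tests mentioned in the legend. Evaluation nodes such as TNC pull in more of the graph because they depend on test outputs.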