## Custom Metrics
You can provide custom metrics for the test result evaluation step. A metric can be qualitative (`ok`, `err`, `unknown`) or quantitative (e.g. a score from `0` to `100`, where `100` is good); a quantitative sketch follows the qualitative example below.
A metric should be defined in a `.metric.prompty` file placed in the same folder as the prompt under test.
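For example, the layout might look like this (the file names here are hypothetical):

```
my-prompts/
├── chat.prompty             # prompt under test
└── english.metric.prompty   # custom metric applied to each test result
```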
```md
---
name: Custom Test Result Evaluation
description: |
  A template for a custom evaluation of the results.
tags:
  - unlisted
inputs:
  prompt:
    type: string
    description: The prompt to be evaluated.
  intent:
    type: string
    description: The extracted intent of the prompt.
  inputSpec:
    type: string
    description: The input specification for the prompt.
  rules:
    type: string
    description: The rules to be applied for the test generation.
  input:
    type: string
    description: The input to be used with the prompt.
  output:
    type: string
    description: The output from the model execution.
---

system:

## Task

You are a chatbot that helps users evaluate the performance of a model.
Your task is to evaluate the <OUTPUT> provided based on the <CRITERIA>.

<CRITERIA>
The <OUTPUT> is in English.
</CRITERIA>

## Output

**Binary Decision on Evaluation**: You are required to make a binary decision based on your evaluation:

- Return 'OK' if <OUTPUT> is compliant with <CRITERIA>.
- Return 'ERR' if <OUTPUT> is **not** compliant with <CRITERIA> or if you are unable to confidently answer.

user:
<OUTPUT>
{{output}}
</OUTPUT>
```
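For a quantitative metric, the **Output** section asks the model for a score instead of a binary decision. The following is a minimal sketch, not a verbatim template: the criterion, the 0-to-100 instruction, and the trimmed input list are illustrative assumptions; the frontmatter otherwise follows the same shape as the example above.

```md
---
name: Custom Quantitative Evaluation
description: |
  A sketch of a quantitative (0 to 100) evaluation of the results.
tags:
  - unlisted
# Illustrative sketch: only the one input this prompt uses is declared here;
# reuse the full input list from the example above if your setup expects it.
inputs:
  output:
    type: string
    description: The output from the model execution.
---

system:

## Task

You are a chatbot that helps users evaluate the performance of a model.
Your task is to score how well the <OUTPUT> provided satisfies the <CRITERIA>.

<CRITERIA>
The <OUTPUT> directly addresses the user's request.
</CRITERIA>

## Output

**Score**: Return a single integer from 0 (does not satisfy <CRITERIA>) to 100 (fully satisfies <CRITERIA>). Do not return any other text.

user:
<OUTPUT>
{{output}}
</OUTPUT>
```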