# How to Add a Custom Model Evaluator
This document describes how to add a custom model evaluator to Olive.

Olive implements its evaluation logic in the [`OliveEvaluator`](https://github.com/microsoft/Olive/blob/main/olive/evaluator/olive_evaluator.py#L59) class. Users can extend this class and override its `evaluate` method to implement their own custom logic.
Arguments to the `OliveEvaluator.evaluate` method:

- `model: OliveModelHandler`: Model to evaluate.
- `metrics: List[Metric]`: List of metrics to evaluate.
- `device: Device`: Target device to evaluate on.
- `execution_providers: Union[str, List[str]]`: Execution provider(s) to use on the target device.
Here is an example of how to subclass [`OliveEvaluator`](https://github.com/microsoft/Olive/blob/main/olive/evaluator/olive_evaluator.py#L59):
```python
from typing import List, Union

from olive.evaluator import (
    Metric,
    MetricResult,
    OliveEvaluator,
    flatten_metric_result,
)
from olive.evaluator.registry import Registry
from olive.hardware import Device
from olive.model.handler import OliveModelHandler


@Registry.register("my_custom_evaluator")
def my_custom_evaluator(**kwargs):
    return MyCustomEvaluator(**kwargs)


class MyCustomEvaluator(OliveEvaluator):
    def evaluate(
        self,
        model: OliveModelHandler,
        metrics: List[Metric],
        device: Device = Device.CPU,
        execution_providers: Union[str, List[str]] = None,
    ) -> MetricResult:
        # Your custom evaluation logic goes here: compute a MetricResult
        # for each metric, keyed by the metric's name.
        results = {}
        # ... populate results ...

        # Olive expects the return value to be flattened
        return flatten_metric_result(results)
```
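As a quick sanity check, the registered factory can be called directly. The sketch below only exercises the factory and class defined above and assumes the evaluator needs no required constructor arguments in your Olive version; how Olive resolves the registered name through `Registry` internally is not shown here.

```python
# The name passed to Registry.register ("my_custom_evaluator") is the handle
# Olive uses to look up this factory; calling the factory yields the evaluator.
# Assumption: MyCustomEvaluator/OliveEvaluator need no required constructor
# arguments in your Olive version.
evaluator = my_custom_evaluator()
assert isinstance(evaluator, MyCustomEvaluator)
```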
For detailed information on the supported metric types, refer to `Metric`. `MetricResult` holds the final result of the evaluation. Consider the following metrics configuration:
```json
{
    "metrics": [
        {
            "name": "accuracy",
            "type": "accuracy",
            "sub_types": [
                {"name": "acc", "priority": 1, "goal": {"type": "max-degradation", "value": 0.01}},
                {"name": "f1"},
                {"name": "auroc"}
            ]
        },
        {
            "name": "latency",
            "type": "latency",
            "sub_types": [
                {"name": "avg", "priority": 2, "goal": {"type": "percent-min-improvement", "value": 20}},
                {"name": "max"},
                {"name": "min"}
            ]
        }
    ]
}
```
The snippet below (inside your `evaluate` method) demonstrates how to build the final result from the computed metric values; here `results` maps each metric name to a dict of its sub-metric values:

```python
final_metric_results = {}
for metric_name, eval_results in results.items():
    sub_metrics = {}
    for sub_metric_name, sub_metric_result in eval_results.items():
        sub_metrics[sub_metric_name] = SubMetricResult(
            value=sub_metric_result, priority=-1, higher_is_better=True
        )
    final_metric_results[metric_name] = MetricResult.parse_obj(sub_metrics)
return flatten_metric_result(final_metric_results)
```
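To make this concrete, here is a small, self-contained sketch that builds `MetricResult` objects from hand-made values and flattens them. The `raw_results` values are made up for illustration, the import location of `SubMetricResult` is assumed to match that of `MetricResult`, and the exact key format of the flattened result (e.g. `"latency-avg"`) may differ between Olive versions.

```python
# Illustrative sketch: build MetricResult objects from hand-made values and
# flatten them into a single result.
# Assumption: SubMetricResult is importable from the same module as
# MetricResult and flatten_metric_result.
from olive.evaluator import MetricResult, SubMetricResult, flatten_metric_result

# Hypothetical raw values, keyed as metric name -> {sub-metric name: value}.
raw_results = {
    "accuracy": {"acc": 0.91, "f1": 0.89, "auroc": 0.95},
    "latency": {"avg": 42.0, "max": 55.1, "min": 38.7},
}

final_metric_results = {}
for metric_name, eval_results in raw_results.items():
    sub_metrics = {
        name: SubMetricResult(value=value, priority=-1, higher_is_better=True)
        for name, value in eval_results.items()
    }
    final_metric_results[metric_name] = MetricResult.parse_obj(sub_metrics)

flattened = flatten_metric_result(final_metric_results)
# The flattened result is keyed per metric/sub-metric pair; the exact key
# format (e.g. "latency-avg") depends on the Olive version.
print(flattened)
```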
`LMEvaluator` in [olive_evaluator.py](https://github.com/microsoft/Olive/blob/main/olive/evaluator/olive_evaluator.py#L1068) is a good example that demonstrates evaluating a Hugging Face model using the [lm_eval](https://github.com/EleutherAI/lm-evaluation-harness) harness.