benchmarks#
onnxrt_backend_dev.bench_run#
BenchmarkError#
get_machine#
run_benchmark#
- onnxrt_backend_dev.bench_run.run_benchmark(script_name: str, configs: List[Dict[str, str | int | float]], verbose: int = 0) → List[Dict[str, str | int | float | Tuple[int, int]]] [source]#
Runs a script multiple times and extracts information from the output following the pattern
:<metric>,<value>;
(see the sketch after this entry).
- Parameters:
script_name – python script to run
configs – list of execution to do
verbose – use tqdm to follow the progress
- Returns:
the collected metrics, one dictionary per configuration
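A minimal sketch of the expected script output, assuming a hypothetical script dummy_bench.py and made-up metric names (walltime, iterations); only the :<metric>,<value>; pattern itself comes from the documentation above.

# dummy_bench.py -- hypothetical script measured by run_benchmark.
# Every ":<metric>,<value>;" pair printed to stdout is picked up as one metric.
import time

start = time.perf_counter()
for _ in range(10):          # placeholder workload
    sum(i * i for i in range(10_000))
elapsed = time.perf_counter() - start

print(f":walltime,{elapsed};")
print(":iterations,10;")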
scripts#
onnxrt_backend_dev.llama.dort_bench#
The script runs a few iterations of a dummy llama model.
See Measure LLAMA speed. The script can be called multiple times by
function onnxrt_backend_dev.bench_run.run_benchmark()
to collect figures for several configurations (see the sketch after the option list below).
python -m onnxrt_backend_dev.llama.dort_bench --help
options:
-h, --help show this help message and exit
-r REPEAT, --repeat REPEAT
number of times to repeat the measure, default is 5
-w WARMUP, --warmup WARMUP
number of iterations to warmup, includes the settings, default is 5
--backend BACKEND 'ort' or 'inductor', default is ort
--device DEVICE 'cpu' or 'cuda', default is cpu
--num_hidden_layers NUM_HIDDEN_LAYERS
number of hidden layers, default is 1
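A hedged sketch of driving this script through onnxrt_backend_dev.bench_run.run_benchmark(); it assumes that each configuration key maps one-to-one to one of the command-line options listed above and that script_name accepts the module path, neither of which is confirmed by the help output itself.

from onnxrt_backend_dev.bench_run import run_benchmark

# Each dictionary is one execution of dort_bench; the keys are assumed to be
# forwarded as --backend, --device, --num_hidden_layers, ... (assumption).
configs = [
    {"backend": "ort", "device": "cpu", "num_hidden_layers": 1},
    {"backend": "inductor", "device": "cpu", "num_hidden_layers": 1},
]

results = run_benchmark("onnxrt_backend_dev.llama.dort_bench", configs, verbose=1)
for r in results:
    print(r)   # one dictionary of parsed metrics per configuration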