benchmarks#

onnxrt_backend_dev.bench_run#

BenchmarkError#

class onnxrt_backend_dev.bench_run.BenchmarkError[source]#

get_machine#

onnxrt_backend_dev.bench_run.get_machine() → Dict[str, str | int | float | Tuple[int, int]][source]#

Returns the machine specification.
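
A minimal usage sketch; the exact keys of the returned dictionary are not listed in this excerpt and depend on the machine the call runs on:

from onnxrt_backend_dev.bench_run import get_machine

# collect the machine specification, e.g. to store it next to benchmark results
machine = get_machine()
for key, value in sorted(machine.items()):
    print(key, value)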

run_benchmark#

onnxrt_backend_dev.bench_run.run_benchmark(script_name: str, configs: List[Dict[str, str | int | float]], verbose: int = 0) → List[Dict[str, str | int | float | Tuple[int, int]]][source]#

Runs a script multiple times and extracts information from the output following the pattern :<metric>,<value>;.

Parameters:
  • script_name – Python script to run

  • configs – list of configurations, one dictionary per execution

  • verbose – use tqdm to follow the progress

Returns:

the collected values, one dictionary per execution
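
For illustration, a benchmarked script only needs to print its metrics following that pattern. A minimal, hypothetical script could look like this (the metric names are made up for the example):

# dummy_bench.py -- hypothetical benchmarked script
import time

begin = time.perf_counter()
# ... the measured workload would run here ...
duration = time.perf_counter() - begin

# every metric is printed following the pattern :<metric>,<value>;
# so that run_benchmark can extract it from the output
print(f":time,{duration};")
print(":iteration,1;")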

scripts#

onnxrt_backend_dev.llama.dort_bench#

The script runs a few iterations of a dummy llama model. See Measure LLAMA speed. The script can be called multiple times by function onnxrt_backend_dev.bench_run.run_benchmark() to collect multiple measurements (a sketch follows the option list below).

python -m onnxrt_backend_dev.llama.dort_bench --help
options:
    -h, --help            show this help message and exit
    -r REPEAT, --repeat REPEAT
                            number of times to repeat the measure, default is 5
    -w WARMUP, --warmup WARMUP
                            number of iterations to warmup, includes the settings, default is 5
    --backend BACKEND     'ort' or 'inductor', default is ort
    --device DEVICE       'cpu' or 'cuda', default is cpu
    --num_hidden_layers NUM_HIDDEN_LAYERS
                            number of hidden layers, default is 1
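
A hedged sketch of driving this script with onnxrt_backend_dev.bench_run.run_benchmark(). It assumes the configuration keys map to the command-line options listed above, and that script_name accepts the module name; neither detail is spelled out in this excerpt.

from onnxrt_backend_dev.bench_run import get_machine, run_benchmark

# one dictionary per execution; keys assumed to match the CLI options above
configs = [
    dict(backend="ort", device="cpu", num_hidden_layers=1),
    dict(backend="inductor", device="cpu", num_hidden_layers=1),
]

print(get_machine())  # machine specification, useful to store with the results
data = run_benchmark("onnxrt_backend_dev.llama.dort_bench", configs, verbose=1)
for row in data:
    print(row)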