benchmarks#
onnxrt_backend_dev.bench_run#
BenchmarkError#
get_machine#
run_benchmark#
- onnxrt_backend_dev.bench_run.run_benchmark(script_name: str, configs: List[Dict[str, str | int | float]], verbose: int = 0) → List[Dict[str, str | int | float | Tuple[int, int]]] [source]#
Runs a script multiple times and extracts information from the output following the pattern
:<metric>,<value>;
(see the sketch after this entry).
- Parameters:
script_name – python script to run
configs – list of execution to do
verbose – use tqdm to follow the progress
- Returns:
the collected metrics, one dictionary per configuration
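A minimal sketch of the expected script output, assuming a hypothetical script dummy_bench.py and made-up metric names (walltime, iterations); only the :<metric>,<value>; pattern itself comes from the documentation above.

# dummy_bench.py -- hypothetical script measured by run_benchmark.
# Every ":<metric>,<value>;" pair printed to stdout is picked up as one metric.
import time

start = time.perf_counter()
for _ in range(10):          # placeholder workload
    sum(i * i for i in range(10_000))
elapsed = time.perf_counter() - start

print(f":walltime,{elapsed};")
print(":iterations,10;")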
scripts#
onnxrt_backend_dev.llama.dort_bench#
The script runs a few iterations of a dummy llama model.
See Measure LLAMA speed. The script can be called multiple times by
function onnxrt_backend_dev.bench_run.run_benchmark()
to collect figures for several configurations (see the sketch after the option list below).
python -m onnxrt_backend_dev.llama.dort_bench --help
options:
-h, --help show this help message and exit
-r REPEAT, --repeat REPEAT
number of times to repeat the measure, default is 5
-w WARMUP, --warmup WARMUP
number of iterations to warmup, includes the settings, default is 5
--backend BACKEND 'ort' or 'inductor', default is ort
--device DEVICE 'cpu' or 'cuda', default is cpu
--num_hidden_layers NUM_HIDDEN_LAYERS
number of hidden layers, default is 1
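A hedged sketch of driving this script through onnxrt_backend_dev.bench_run.run_benchmark(); it assumes that each configuration key maps one-to-one to one of the command-line options listed above and that script_name accepts the module path, neither of which is confirmed by the help output itself.

from onnxrt_backend_dev.bench_run import run_benchmark

# Each dictionary is one execution of dort_bench; the keys are assumed to be
# forwarded as --backend, --device, --num_hidden_layers, ... (assumption).
configs = [
    {"backend": "ort", "device": "cpu", "num_hidden_layers": 1},
    {"backend": "inductor", "device": "cpu", "num_hidden_layers": 1},
]

results = run_benchmark("onnxrt_backend_dev.llama.dort_bench", configs, verbose=1)
for r in results:
    print(r)   # one dictionary of parsed metrics per configuration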