Passes

The following passes are available in Olive.

Each pass is listed below with a description and its configuration options.

OnnxConversion

Convert a PyTorch model to an ONNX model using torch.onnx.export.

Input: PyTorchModel

Output: ONNXModel | CompositeOnnxModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

target_opset

The version of the default (ai.onnx) opset to target.

type: int

default_value: 14

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
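
As an illustration of how these options are typically supplied, the sketch below shows what an OnnxConversion entry in an Olive workflow configuration might look like, written as a Python dict (the same structure is usually expressed in JSON). The surrounding "type"/"config" layout is an assumption; the parameter names and default values come from the list above:

    # Hypothetical pass entry for OnnxConversion; only parameters documented
    # above are used, and the values shown are the documented defaults.
    onnx_conversion_pass = {
        "type": "OnnxConversion",
        "config": {
            "target_opset": 14,              # ai.onnx opset version to target
            "save_as_external_data": False,  # keep tensor data inside the .onnx file
        },
    }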

OnnxModelOptimizer

Optimize ONNX model by fusing nodes.

Input: ONNXModel

Output: ONNXModel

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None

OrtTransformersOptimization

Optimize transformer-based models in scenarios where ONNX Runtime does not apply the optimization at load time. It is based on onnxruntime.transformers.optimizer.

Input: ONNXModel

Output: ONNXModel

model_type

Transformer based model type, including bert (exported by PyTorch), gpt2 (exported by PyTorch), bert_tf (BERT exported by tf2onnx), bert_keras (BERT exported by keras2onnx), and unet/vae/clip (stable diffusion).

type: str

required: True

num_heads

Number of attention heads.

type: int

default_value: 0

searchable_values: None

hidden_size

Number of hidden nodes.

type: int

default_value: 0

searchable_values: None

optimization_options

Optimization options that turn on/off some fusions.

type: Dict[str, Any] | onnxruntime.transformers.fusion_options.FusionOptions

default_value: None

searchable_values: None

opt_level

Graph optimization level of ONNX Runtime: 0 - disable all (default), 1 - basic, 2 - extended, 99 - all.

type: Any

default_value: None

searchable_values: None

use_gpu

Flag for GPU inference.

type: bool

default_value: False

searchable_values: None

only_onnxruntime

Whether to use only onnxruntime to optimize the model, without Python fusion.

type: bool

default_value: False

searchable_values: None

float16

Whether half-precision float will be used.

type: bool

default_value: False

searchable_values: None

input_int32

Whether int32 tensors will be used as input.

type: bool

default_value: False

searchable_values: None

keep_io_types

Keep input and output tensors in their original data type

type: bool

default_value: True

searchable_values: None

force_fp32_ops

Operators that are forced to run in float32

type: List[str]

default_value: None

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
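
For example, a hedged sketch of an OrtTransformersOptimization entry (the "type"/"config" structure is assumed; parameter names are taken from the list above, and model_type is the only required option):

    # Hypothetical configuration for optimizing a BERT-style model.
    transformers_opt_pass = {
        "type": "OrtTransformersOptimization",
        "config": {
            "model_type": "bert",  # required: bert, gpt2, bert_tf, bert_keras, unet, vae, or clip
            "num_heads": 12,       # number of attention heads (documented default is 0)
            "hidden_size": 768,    # hidden dimension (documented default is 0)
            "float16": False,      # set True to convert to half precision
        },
    }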

OrtPerfTuning

Optimize ONNX Runtime inference settings.

Input: ONNXModel

Output: ONNXModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

data_config

Data config for calibration, required if quant_mode is ‘static’. If not provided, a default DataConfig will be used.

type: olive.data.config.DataConfig | str

default_value: None

searchable_values: None

data_dir

Directory of sample inference data.

type: pathlib.Path | str

default_value: None

searchable_values: None

dataloader_func

Dataloader function to load data from given data_dir with given batch size.

type: Callable | str

default_value: None

searchable_values: None

batch_size

Batch size for inference.

type: int

default_value: None

searchable_values: None

input_names

Input names list for ONNX model.

type: list

default_value: None

searchable_values: None

input_shapes

Input shapes list for ONNX model.

type: list

default_value: None

searchable_values: None

input_types

Input types list for ONNX model.

type: list

default_value: None

searchable_values: None

device

Device selected for tuning process.

type: str

default_value: cpu

searchable_values: None

cpu_cores

CPU cores used for thread tuning.

type: int

default_value: None

searchable_values: None

io_bind

Whether to enable IOBinding search for ONNX Runtime inference.

type: bool

default_value: False

searchable_values: None

providers_list

List of execution providers to use when executing the ONNX model.

type: list

default_value: None

searchable_values: None

execution_mode_list

List of execution modes (parallelism between operators) to test.

type: list

default_value: None

searchable_values: None

opt_level_list

Optimization level list for ONNX model.

type: list

default_value: None

searchable_values: None

trt_fp16_enable

Whether to enable FP16 mode for the TensorRT execution provider.

type: bool

default_value: False

searchable_values: None

intra_thread_num_list

List of intra-op thread counts to test.

type: list

default_value: [None]

searchable_values: None

inter_thread_num_list

List of inter-op thread counts to test.

type: list

default_value: [None]

searchable_values: None

extra_session_config

Extra customized session options during tuning process.

type: Dict[str, Any]

default_value: None

searchable_values: None
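
A hedged sketch of an OrtPerfTuning entry is shown below; the "type"/"config" structure is assumed, and the execution provider name, input names, shapes, and types are illustrative placeholders only:

    # Hypothetical inference-tuning configuration.
    perf_tuning_pass = {
        "type": "OrtPerfTuning",
        "config": {
            "device": "cpu",
            "cpu_cores": 4,                               # cores considered during thread tuning
            "io_bind": True,                              # also search with IOBinding enabled
            "providers_list": ["CPUExecutionProvider"],   # illustrative provider name
            "input_names": ["input_ids"],                 # hypothetical model input
            "input_shapes": [[1, 128]],
            "input_types": ["int64"],
        },
    }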

OnnxFloatToFloat16

Converts a model to float16. It is based on onnxconverter-common.convert_float_to_float16. See https://onnxruntime.ai/docs/performance/model-optimizations/float16.html#float16-conversion

Input: ONNXModel

Output: ONNXModel

min_positive_val

Constant values will be clipped against this value

type: float

default_value: 1e-07

searchable_values: None

max_finite_val

Constant values will be clipped against this value

type: float

default_value: 10000.0

searchable_values: None

keep_io_types

Whether model inputs/outputs should be left as float32

type: bool

default_value: False

searchable_values: None

disable_shape_infer

Skips running onnx shape/type inference.

type: bool

default_value: False

searchable_values: None

op_block_list

List of op types to leave as float32

type: List[str]

default_value: None

searchable_values: None

node_block_list

List of node names to leave as float32

type: List[str]

default_value: None

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
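
A hedged sketch of an OnnxFloatToFloat16 entry (structure assumed; parameter names from the list above, with the op_block_list values given purely as an illustration):

    # Hypothetical float16 conversion that keeps model inputs/outputs in
    # float32 and leaves two example op types in float32.
    fp16_pass = {
        "type": "OnnxFloatToFloat16",
        "config": {
            "keep_io_types": True,
            "op_block_list": ["Softmax", "Resize"],  # example op types left in float32
        },
    }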

OrtMixedPrecision

Convert model to mixed precision.

Input: ONNXModel

Output: ONNXModel

op_block_list

List of op types to leave as float32

type: List[str]

default_value: [‘SimplifiedLayerNormalization’, ‘SkipSimplifiedLayerNormalization’, ‘Relu’, ‘Add’]

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None

OnnxDynamicQuantization

ONNX Dynamic Quantization Pass

Input: ONNXModel

Output: ONNXModel

data_config

Data config for calibration, required if quant_mode is ‘static’. If not provided, a default DataConfig will be used.

type: olive.data.config.DataConfig | str

default_value: None

searchable_values: None

quant_mode

dynamic quantization mode

type: str

default_value: dynamic

searchable_values: None

weight_type

Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.

type: str

default_value: QInt8

searchable_values: Categorical([‘QInt8’, ‘QUInt8’])

op_types_to_quantize

List of operator types to quantize. If None, all quantizable.

type: list

default_value: None

searchable_values: None

nodes_to_quantize

List of node names to quantize. If None, all quantizable.

type: list

default_value: None

searchable_values: None

nodes_to_exclude

List of node names to exclude from quantization. If None, no nodes are excluded.

type: list

default_value: None

searchable_values: None

per_channel

Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default_value: False

searchable_values: Categorical([True, False])

reduce_range

Quantize weights with 7-bits. It may improve the accuracy for some models running on non-VNNI machine, especially for per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default_value: False

searchable_values: Categorical([True, False])

optimize_model

Deprecating Soon in ONNX! Optimize model before quantization. NOT recommended, optimization will change the computation graph, making debugging of quantization loss difficult.

type: bool

default_value: False

searchable_values: Categorical([True, False])

quant_preprocess

Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing

type: bool

default_value: True

searchable_values: Categorical([True, False])

extra.Sigmoid.nnapi

type: bool

default_value: False

searchable_values: None

ActivationSymmetric

symmetrize calibration data for activations

type: bool

default_value: False

searchable_values: None

WeightSymmetric

symmetrize calibration data for weights

type: bool

default_value: True

searchable_values: None

EnableSubgraph

If enabled, subgraphs will be quantized. Dynamic mode is currently supported.

type: bool

default_value: False

searchable_values: None

ForceQuantizeNoInputCheck

By default, some latent operators such as MaxPool and Transpose do not quantize if their input is not already quantized. Set to True to force such operators to always quantize their input and thus generate quantized output. This behavior can still be disabled per node using nodes_to_exclude.

type: bool

default_value: False

searchable_values: None

MatMulConstBOnly

If enabled, only MatMul with const B will be quantized.

type: bool

default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘dynamic’,): True, (‘static’,): False}, default: OLIVE_INVALID_PARAM_VALUE)

searchable_values: None

extra_options

Key value pair dictionary for extra_options in quantization. Please refer to https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/quantize.py for details about the supported options. If an option is one of [‘extra.Sigmoid.nnapi’, ‘ActivationSymmetric’, ‘WeightSymmetric’, ‘EnableSubgraph’, ‘ForceQuantizeNoInputCheck’, ‘MatMulConstBOnly’], it will be overwritten by the corresponding config parameter value.

type: dict

default_value: None

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
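
A hedged sketch of an OnnxDynamicQuantization entry; the "type"/"config" structure is assumed, the parameter names come from the list above, and the op_types_to_quantize values are illustrative:

    # Hypothetical dynamic quantization configuration.
    dynamic_quant_pass = {
        "type": "OnnxDynamicQuantization",
        "config": {
            "weight_type": "QUInt8",                     # or "QInt8" (the documented default)
            "per_channel": False,
            "reduce_range": False,
            "op_types_to_quantize": ["MatMul", "Gemm"],  # example op types; None quantizes all quantizable ops
        },
    }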

OnnxStaticQuantization

ONNX Static Quantization Pass

Input: ONNXModel

Output: ONNXModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

data_config

Data config for calibration, required if quant_mode is ‘static’. If not provided, a default DataConfig will be used.

type: olive.data.config.DataConfig | str

default_value: None

searchable_values: None

quant_mode

static quantization mode

type: str

default_value: static

searchable_values: None

weight_type

Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.

type: str

default_value: QInt8

searchable_values: Categorical([‘QInt8’, ‘QUInt8’])

op_types_to_quantize

List of operator types to quantize. If None, all quantizable.

type: list

default_value: None

searchable_values: None

nodes_to_quantize

List of node names to quantize. If None, all quantizable.

type: list

default_value: None

searchable_values: None

nodes_to_exclude

List of node names to exclude from quantization. If None, no nodes are excluded.

type: list

default_value: None

searchable_values: None

per_channel

Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default_value: False

searchable_values: Categorical([True, False])

reduce_range

Quantize weights with 7-bits. It may improve the accuracy for some models running on non-VNNI machine, especially for per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default_value: False

searchable_values: Categorical([True, False])

optimize_model

Deprecating Soon in ONNX! Optimize model before quantization. NOT recommended, optimization will change the computation graph, making debugging of quantization loss difficult.

type: bool

default_value: False

searchable_values: Categorical([True, False])

quant_preprocess

Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing

type: bool

default_value: True

searchable_values: Categorical([True, False])

data_dir

Path to the directory containing the dataset. For local data, it is required if quant_mode is ‘static’.

type: pathlib.Path | str

default_value: None

searchable_values: None

batch_size

Batch size for calibration, required if quant_mode is ‘static’.

type: int

default_value: 1

searchable_values: None

dataloader_func

Function/function name to generate dataloader for calibration, required if quant_mode is ‘static’

type: Callable | str

default_value: None

searchable_values: None

calibrate_method

Current calibration methods supported are MinMax and Entropy. Please use CalibrationMethod.MinMax or CalibrationMethod.Entropy as options.

type: str

default_value: MinMax

searchable_values: Categorical([‘MinMax’, ‘Entropy’, ‘Percentile’])

quant_format

QOperator format quantizes the model with quantized operators directly. QDQ format quantizes the model by inserting QuantizeLinear/DeQuantizeLinear nodes on the tensors.

type: str

default_value: QDQ

searchable_values: Categorical([‘QOperator’, ‘QDQ’])

activation_type

Quantization data type of activation. Please refer to https://onnxruntime.ai/docs/performance/quantization.html for more details on data type selection

type: str

default_value: QInt8

searchable_values: Conditional(parents: (‘quant_format’, ‘weight_type’), support: {(‘QDQ’, ‘QInt8’): Categorical([‘QInt8’]), (‘QDQ’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘QOperator’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘QOperator’, ‘QInt8’): Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>])}, default: Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>]))

extra.Sigmoid.nnapi

type: bool

default_value: False

searchable_values: None

ActivationSymmetric

symmetrize calibration data for activations

type: bool

default_value: False

searchable_values: None

WeightSymmetric

symmetrize calibration data for weights

type: bool

default_value: True

searchable_values: None

EnableSubgraph

If enabled, subgraphs will be quantized. Dynamic mode is currently supported.

type: bool

default_value: False

searchable_values: None

ForceQuantizeNoInputCheck

By default, some latent operators such as MaxPool and Transpose do not quantize if their input is not already quantized. Set to True to force such operators to always quantize their input and thus generate quantized output. This behavior can still be disabled per node using nodes_to_exclude.

type: bool

default_value: False

searchable_values: None

MatMulConstBOnly

If enabled, only MatMul with const B will be quantized.

type: bool

default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘dynamic’,): True, (‘static’,): False}, default: OLIVE_INVALID_PARAM_VALUE)

searchable_values: None

extra_options

Key value pair dictionary for extra_options in quantization. Please refer to https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/quantize.py for details about the supported options. If an option is one of [‘extra.Sigmoid.nnapi’, ‘ActivationSymmetric’, ‘WeightSymmetric’, ‘EnableSubgraph’, ‘ForceQuantizeNoInputCheck’, ‘MatMulConstBOnly’], it will be overwritten by the corresponding config parameter value.

type: dict

default_value: None

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
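
Static quantization needs calibration data, so a user_script typically provides the dataloader_func. The sketch below is hypothetical: the file name, function name, data path, input name, and the return value (a generator of input dicts) are assumptions chosen for illustration, and the exact dataloader interface Olive expects is not specified here:

    # user_script.py (hypothetical)
    import numpy as np

    def calib_dataloader(data_dir, batch_size, *args, **kwargs):
        # Yields dicts mapping model input names to numpy arrays; the real
        # interface expected by Olive may differ from this sketch.
        for _ in range(8):
            yield {"input_ids": np.random.randint(0, 100, size=(batch_size, 128), dtype=np.int64)}

    # Hypothetical pass entry referencing the function above by name.
    static_quant_pass = {
        "type": "OnnxStaticQuantization",
        "config": {
            "user_script": "user_script.py",
            "dataloader_func": "calib_dataloader",
            "data_dir": "data/calibration",   # hypothetical path
            "quant_format": "QDQ",
            "activation_type": "QInt8",
            "weight_type": "QInt8",
            "calibrate_method": "MinMax",
        },
    }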

OnnxQuantization

Quantize an ONNX model with onnxruntime, searching for the best static/dynamic quantization parameters at the same time.

Input: ONNXModel

Output: ONNXModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

data_config

Data config for calibration, required if quant_mode is ‘static’. If not provided, a default DataConfig will be used.

type: olive.data.config.DataConfig | str

default_value: None

searchable_values: None

quant_mode

ONNX quantization mode: ‘dynamic’ for dynamic quantization, ‘static’ for static quantization.

type: str

default_value: static

searchable_values: Categorical([‘dynamic’, ‘static’])

weight_type

Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.

type: str

default_value: QInt8

searchable_values: Categorical([‘QInt8’, ‘QUInt8’])

op_types_to_quantize

List of operator types to quantize. If None, all quantizable.

type: list

default_value: None

searchable_values: None

nodes_to_quantize

List of node names to quantize. If None, all quantizable.

type: list

default_value: None

searchable_values: None

nodes_to_exclude

List of node names to exclude from quantization. If None, no nodes are excluded.

type: list

default_value: None

searchable_values: None

per_channel

Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default_value: False

searchable_values: Categorical([True, False])

reduce_range

Quantize weights with 7-bits. It may improve the accuracy for some models running on non-VNNI machine, especially for per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default_value: False

searchable_values: Categorical([True, False])

optimize_model

Deprecating Soon in ONNX! Optimize model before quantization. NOT recommended, optimization will change the computation graph, making debugging of quantization loss difficult.

type: bool

default_value: False

searchable_values: Categorical([True, False])

quant_preprocess

Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing

type: bool

default_value: True

searchable_values: Categorical([True, False])

data_dir

Path to the directory containing the dataset. For local data, it is required if quant_mode is ‘static’.

type: pathlib.Path | str

default_value: None

searchable_values: None

batch_size

Batch size for calibration, required if quant_mode is ‘static’.

type: int

default_value: 1

searchable_values: None

dataloader_func

Function/function name to generate dataloader for calibration, required if quant_mode is ‘static’

type: Callable | str

default_value: None

searchable_values: None

calibrate_method

Current calibration methods supported are MinMax and Entropy. Please use CalibrationMethod.MinMax or CalibrationMethod.Entropy as options.

type: str

default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘static’,): ‘MinMax’, (‘dynamic’,): <SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>}, default: OLIVE_INVALID_PARAM_VALUE)

searchable_values: Conditional(parents: (‘quant_mode’,), support: {(‘static’,): Categorical([‘MinMax’, ‘Entropy’, ‘Percentile’])}, default: Categorical([<SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>]))

quant_format

QOperator format quantizes the model with quantized operators directly. QDQ format quantizes the model by inserting QuantizeLinear/DeQuantizeLinear nodes on the tensors.

type: str

default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘static’,): ‘QDQ’, (‘dynamic’,): <SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>}, default: OLIVE_INVALID_PARAM_VALUE)

searchable_values: Conditional(parents: (‘quant_mode’,), support: {(‘static’,): Categorical([‘QOperator’, ‘QDQ’])}, default: Categorical([<SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>]))

activation_type

Quantization data type of activation. Please refer to https://onnxruntime.ai/docs/performance/quantization.html for more details on data type selection

type: str

default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘static’,): ‘QInt8’, (‘dynamic’,): <SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>}, default: OLIVE_INVALID_PARAM_VALUE)

searchable_values: Conditional(parents: (‘quant_mode’, ‘quant_format’, ‘weight_type’), support: {(‘static’, ‘QDQ’, ‘QInt8’): Categorical([‘QInt8’]), (‘static’, ‘QDQ’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘static’, ‘QOperator’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘static’, ‘QOperator’, ‘QInt8’): Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>])}, default: Categorical([<SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>]))

extra.Sigmoid.nnapi

type: bool

default_value: False

searchable_values: None

ActivationSymmetric

symmetrize calibration data for activations

type: bool

default_value: False

searchable_values: None

WeightSymmetric

symmetrize calibration data for weights

type: bool

default_value: True

searchable_values: None

EnableSubgraph

If enabled, subgraphs will be quantized. Dynamic mode is currently supported.

type: bool

default_value: False

searchable_values: None

ForceQuantizeNoInputCheck

By default, some latent operators such as MaxPool and Transpose do not quantize if their input is not already quantized. Set to True to force such operators to always quantize their input and thus generate quantized output. This behavior can still be disabled per node using nodes_to_exclude.

type: bool

default_value: False

searchable_values: None

MatMulConstBOnly

If enabled, only MatMul with const B will be quantized.

type: bool

default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘dynamic’,): True, (‘static’,): False}, default: OLIVE_INVALID_PARAM_VALUE)

searchable_values: None

extra_options

Key value pair dictionary for extra_options in quantization. Please refer to https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/quantize.py for details about the supported options. If an option is one of [‘extra.Sigmoid.nnapi’, ‘ActivationSymmetric’, ‘WeightSymmetric’, ‘EnableSubgraph’, ‘ForceQuantizeNoInputCheck’, ‘MatMulConstBOnly’], it will be overwritten by the corresponding config parameter value.

type: dict

default_value: None

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
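
Because several of the parameters above define searchable_values, OnnxQuantization is typically given only the inputs it cannot search for (such as calibration data) and the rest is left to Olive's search. The sketch below is hypothetical; the "type"/"config" structure, script name, and function name are assumptions:

    # Hypothetical entry that leaves quant_mode, weight_type, per_channel, etc.
    # to Olive's search space and only pins the calibration inputs.
    quantization_pass = {
        "type": "OnnxQuantization",
        "config": {
            "user_script": "user_script.py",        # hypothetical script
            "dataloader_func": "calib_dataloader",  # hypothetical function name
            "data_dir": "data/calibration",
        },
    }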

IncDynamicQuantization

Intel® Neural Compressor Dynamic Quantization Pass

Input: ONNXModel

Output: ONNXModel

data_config

Data config for calibration, required if quant_mode is ‘static’. If not provided, a default DataConfig will be used.

type: olive.data.config.DataConfig | str

default_value: None

searchable_values: None

approach

dynamic quantization mode

type: str

default_value: dynamic

searchable_values: None

device

Intel® Neural Compressor quantization device. Support ‘cpu’ and ‘gpu’.

type: str

default_value: cpu

searchable_values: None

backend

Backend for model execution. Support ‘default’, ‘onnxrt_trt_ep’, ‘onnxrt_cuda_ep’

type: str

default_value: default

searchable_values: None

domain

Model domain. Support ‘auto’, ‘cv’, ‘object_detection’, ‘nlp’ and ‘recommendation_system’. Intel® Neural Compressor Adaptor will use specific quantization settings for different domains automatically, and explicitly specified quantization settings will override the automatic setting. If users set domain as auto, automatic detection for domain will be executed.

type: str

default_value: auto

searchable_values: None

recipes

Recipes for Intel® Neural Compressor quantization. Supported keys:
‘smooth_quant’: whether to do smooth quant;
‘smooth_quant_args’: parameters for smooth_quant;
‘fast_bias_correction’: whether to do fast bias correction;
‘weight_correction’: whether to do weight correction;
‘gemm_to_matmul’: whether to convert Gemm to MatMul and Add, only valid for ONNX models;
‘graph_optimization_level’: supports ‘DISABLE_ALL’, ‘ENABLE_BASIC’, ‘ENABLE_EXTENDED’, ‘ENABLE_ALL’, only valid for ONNX models;
‘first_conv_or_matmul_quantization’: whether to quantize the first conv or matmul;
‘last_conv_or_matmul_quantization’: whether to quantize the last conv or matmul;
‘pre_post_process_quantization’: whether to quantize the ops in preprocess and postprocess;
‘add_qdq_pair_to_weight’: whether to add a QDQ pair for weights, only valid for onnxrt_trt_ep;
‘optypes_to_exclude_output_quant’: do not quantize the output of the specified op types;
‘dedicated_qdq_pair’: whether to use dedicated QDQ pairs, only valid for onnxrt_trt_ep.

type: dict

default_value: {}

searchable_values: None

reduce_range

Whether to use 7-bit quantization.

type: bool

default_value: False

searchable_values: Categorical([True, False])

quant_level

Intel® Neural Compressor allows users to choose different tuning processes by specifying the quantization level (quant_level). Currently 3 quant_levels are supported: 0 is the conservative strategy, 1 is the basic or user-specified strategy, and auto (the default) is the combination of 0 and 1. Please refer to https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-process and https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-algorithms for more details.

type: str

default_value: auto

searchable_values: None

excluded_precisions

Precisions to be excluded. The default value is an empty list. Intel® Neural Compressor enables mixed precision with fp32 + bf16 (only when device is ‘gpu’ and backend is ‘onnxrt_cuda_ep’) + int8 by default. To disable the bf16 data type, specify excluded_precisions = [‘bf16’].

type: list

default_value: []

searchable_values: None

use_distributed_tuning

Intel® Neural Compressor provides distributed tuning to speed up the tuning process by leveraging the multi-node cluster. Prerequisites: A working MPI implementation and installed mpi4py.

type: bool

default_value: False

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None

IncStaticQuantization

Intel® Neural Compressor Static Quantization Pass

Input: ONNXModel

Output: ONNXModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

data_config

Data config for calibration, required if quant_mode is ‘static’. If not provided, a default DataConfig will be used.

type: olive.data.config.DataConfig | str

default_value: None

searchable_values: None

approach

static quantization mode

type: str

default_value: static

searchable_values: None

device

Intel® Neural Compressor quantization device. Support ‘cpu’ and ‘gpu’.

type: str

default_value: cpu

searchable_values: None

backend

Backend for model execution. Support ‘default’, ‘onnxrt_trt_ep’, ‘onnxrt_cuda_ep’

type: str

default_value: default

searchable_values: None

domain

Model domain. Support ‘auto’, ‘cv’, ‘object_detection’, ‘nlp’ and ‘recommendation_system’. Intel® Neural Compressor Adaptor will use specific quantization settings for different domains automatically, and explicitly specified quantization settings will override the automatic setting. If users set domain as auto, automatic detection for domain will be executed.

type: str

default_value: auto

searchable_values: None

recipes

Recipes for Intel® Neural Compressor quantization. Supported keys:
‘smooth_quant’: whether to do smooth quant;
‘smooth_quant_args’: parameters for smooth_quant;
‘fast_bias_correction’: whether to do fast bias correction;
‘weight_correction’: whether to do weight correction;
‘gemm_to_matmul’: whether to convert Gemm to MatMul and Add, only valid for ONNX models;
‘graph_optimization_level’: supports ‘DISABLE_ALL’, ‘ENABLE_BASIC’, ‘ENABLE_EXTENDED’, ‘ENABLE_ALL’, only valid for ONNX models;
‘first_conv_or_matmul_quantization’: whether to quantize the first conv or matmul;
‘last_conv_or_matmul_quantization’: whether to quantize the last conv or matmul;
‘pre_post_process_quantization’: whether to quantize the ops in preprocess and postprocess;
‘add_qdq_pair_to_weight’: whether to add a QDQ pair for weights, only valid for onnxrt_trt_ep;
‘optypes_to_exclude_output_quant’: do not quantize the output of the specified op types;
‘dedicated_qdq_pair’: whether to use dedicated QDQ pairs, only valid for onnxrt_trt_ep.

type: dict

default_value: {}

searchable_values: None

reduce_range

Whether to use 7-bit quantization.

type: bool

default_value: False

searchable_values: Categorical([True, False])

quant_level

Intel® Neural Compressor allows users to choose different tuning processes by specifying the quantization level (quant_level). Currently 3 quant_levels are supported: 0 is the conservative strategy, 1 is the basic or user-specified strategy, and auto (the default) is the combination of 0 and 1. Please refer to https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-process and https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-algorithms for more details.

type: str

default_value: auto

searchable_values: None

excluded_precisions

Precisions to be excluded. The default value is an empty list. Intel® Neural Compressor enables mixed precision with fp32 + bf16 (only when device is ‘gpu’ and backend is ‘onnxrt_cuda_ep’) + int8 by default. To disable the bf16 data type, specify excluded_precisions = [‘bf16’].

type: list

default_value: []

searchable_values: None

use_distributed_tuning

Intel® Neural Compressor provides distributed tuning to speed up the tuning process by leveraging the multi-node cluster. Prerequisites: A working MPI implementation and installed mpi4py.

type: bool

default_value: False

searchable_values: None

data_dir

Path to the directory containing the dataset. For local data, it is required if approach is ‘static’.

type: pathlib.Path | str

default_value: None

searchable_values: None

batch_size

Batch size for calibration, required if approach is ‘static’.

type: int

default_value: 1

searchable_values: None

dataloader_func

Function/function name to generate dataloader for calibration, required if approach is ‘static’

type: Callable | str

required: True

quant_format

Quantization format. Support ‘QDQ’ and ‘QOperator’.

type: str

default_value: QOperator

searchable_values: Categorical([‘QOperator’, ‘QDQ’])

calibration_sampling_size

Number of calibration samples.

type: list | int

default_value: [100]

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
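
A hedged sketch of an IncStaticQuantization entry (structure assumed; the script name, dataloader function name, and data path are hypothetical, and the recipe key shown comes from the list above):

    # Hypothetical Intel Neural Compressor static quantization entry.
    inc_static_pass = {
        "type": "IncStaticQuantization",
        "config": {
            "user_script": "user_script.py",
            "dataloader_func": "calib_dataloader",
            "data_dir": "data/calibration",
            "quant_format": "QOperator",
            "calibration_sampling_size": [100],
            "recipes": {"smooth_quant": False},  # example recipe key from the list above
        },
    }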

IncQuantization

Quantize ONNX model with Intel® Neural Compressor.

Input: ONNXModel

Output: ONNXModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

data_config

Data config for calibration, required if quant_mode is ‘static’. If not provided, a default DataConfig will be used.

type: olive.data.config.DataConfig | str

default_value: None

searchable_values: None

approach

Intel® Neural Compressor Quantization mode. ‘dynamic’ for dynamic quantization, ‘static’ for static quantization.

type: str

default_value: static

searchable_values: Categorical([‘dynamic’, ‘static’])

device

Intel® Neural Compressor quantization device. Support ‘cpu’ and ‘gpu’.

type: str

default_value: cpu

searchable_values: None

backend

Backend for model execution. Support ‘default’, ‘onnxrt_trt_ep’, ‘onnxrt_cuda_ep’

type: str

default_value: default

searchable_values: None

domain

Model domain. Support ‘auto’, ‘cv’, ‘object_detection’, ‘nlp’ and ‘recommendation_system’. Intel® Neural Compressor Adaptor will use specific quantization settings for different domains automatically, and explicitly specified quantization settings will override the automatic setting. If users set domain as auto, automatic detection for domain will be executed.

type: str

default_value: auto

searchable_values: None

recipes

Recipes for Intel® Neural Compressor quantization. Supported keys:
‘smooth_quant’: whether to do smooth quant;
‘smooth_quant_args’: parameters for smooth_quant;
‘fast_bias_correction’: whether to do fast bias correction;
‘weight_correction’: whether to do weight correction;
‘gemm_to_matmul’: whether to convert Gemm to MatMul and Add, only valid for ONNX models;
‘graph_optimization_level’: supports ‘DISABLE_ALL’, ‘ENABLE_BASIC’, ‘ENABLE_EXTENDED’, ‘ENABLE_ALL’, only valid for ONNX models;
‘first_conv_or_matmul_quantization’: whether to quantize the first conv or matmul;
‘last_conv_or_matmul_quantization’: whether to quantize the last conv or matmul;
‘pre_post_process_quantization’: whether to quantize the ops in preprocess and postprocess;
‘add_qdq_pair_to_weight’: whether to add a QDQ pair for weights, only valid for onnxrt_trt_ep;
‘optypes_to_exclude_output_quant’: do not quantize the output of the specified op types;
‘dedicated_qdq_pair’: whether to use dedicated QDQ pairs, only valid for onnxrt_trt_ep.

type: dict

default_value: {}

searchable_values: None

reduce_range

Whether to use 7-bit quantization.

type: bool

default_value: False

searchable_values: Categorical([True, False])

quant_level

Intel® Neural Compressor allows users to choose different tuning processes by specifying the quantization level (quant_level). Currently 3 quant_levels are supported: 0 is the conservative strategy, 1 is the basic or user-specified strategy, and auto (the default) is the combination of 0 and 1. Please refer to https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-process and https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-algorithms for more details.

type: str

default_value: auto

searchable_values: None

excluded_precisions

Precisions to be excluded. The default value is an empty list. Intel® Neural Compressor enables mixed precision with fp32 + bf16 (only when device is ‘gpu’ and backend is ‘onnxrt_cuda_ep’) + int8 by default. To disable the bf16 data type, specify excluded_precisions = [‘bf16’].

type: list

default_value: []

searchable_values: None

use_distributed_tuning

Intel® Neural Compressor provides distributed tuning to speed up the tuning process by leveraging the multi-node cluster. Prerequisites: A working MPI implementation and installed mpi4py.

type: bool

default_value: False

searchable_values: None

data_dir

Path to the directory containing the dataset. For local data, it is required if approach is ‘static’.

type: pathlib.Path | str

default_value: None

searchable_values: None

batch_size

Batch size for calibration, required if approach is ‘static’.

type: int

default_value: 1

searchable_values: None

dataloader_func

Function/function name to generate dataloader for calibration, required if approach is ‘static’

type: Callable | str

required: True

quant_format

Quantization format. Support ‘QDQ’ and ‘QOperator’.

type: str

default_value: QOperator

searchable_values: Conditional(parents: (‘approach’,), support: {(‘static’,): Categorical([‘QOperator’, ‘QDQ’])}, default: Categorical([‘default’]))

calibration_sampling_size

Number of calibration samples.

type: list | int

default_value: [100]

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None

AppendPrePostProcessingOps

Add pre/post-processing nodes to the input model.

Input: ONNXModel

Output: ONNXModel

pre

List of pre-processing commands to add.

type: List[str]

default_value: None

searchable_values: None

post

List of post-processing commands to add.

type: List[str]

default_value: None

searchable_values: None

tool_command

Composited tool commands to invoke.

type: str

default_value: None

searchable_values: None

tool_command_args

Arguments to pass to tool command.

type: Dict[str, Any]

default_value: None

searchable_values: None

target_opset

The version of the default (ai.onnx) opset to target.

type: int

default_value: 16

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
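
A hedged sketch of an AppendPrePostProcessingOps entry; the "type"/"config" structure is assumed, and the tool_command name and tool_command_args shown are purely illustrative placeholders rather than guaranteed command names:

    # Hypothetical pre/post-processing insertion entry.
    pre_post_pass = {
        "type": "AppendPrePostProcessingOps",
        "config": {
            "tool_command": "superresolution",              # illustrative command name only
            "tool_command_args": {"output_format": "png"},  # illustrative arguments only
            "target_opset": 16,
        },
    }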

InsertBeamSearch

Insert Beam Search Op.

Input: OliveModel

Output: ONNXModel

no_repeat_ngram_size

If set to int > 0, all ngrams of that size can only occur once.

type: int

default_value: 3

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None

QuantizationAwareTraining

Run quantization aware training on PyTorch model.

Input: PyTorchModel

Output: PyTorchModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

train_data_dir

Directory of training data.

type: str

default_value: None

searchable_values: None

val_data_dir

Directory of validation data.

type: str

default_value: None

searchable_values: None

train_dataloader_func

Dataloader function to load training data from given train_data_dir with given train_batch_size.

type: Callable | str

default_value: None

searchable_values: None

training_loop_func

Customized training loop function.

type: Callable | str

default_value: None

searchable_values: None

ptl_module

LightningModule for PyTorch Lightning trainer. It is a way of encapsulating all the logic related to the training, validation, and testing of a PyTorch model. Please refer to https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html for more details.

type: Callable | str

default_value: None

searchable_values: None

ptl_data_module

LightningDataModule for PyTorch Lightning trainer. It is a way of encapsulating all the data-related logic for training, validation, and testing of a PyTorch model. Please refer to https://pytorch-lightning.readthedocs.io/en/stable/data/datamodule.html for more details.

type: Callable | str

default_value: None

searchable_values: None

train_batch_size

Batch size for training.

type: int

default_value: None

searchable_values: None

num_epochs

Maximum number of epochs for training.

type: int

default_value: None

searchable_values: None

num_steps

Maximum number of steps for training.

type: int

default_value: -1

searchable_values: None

do_validate

Whether to perform one evaluation epoch over the validation set after training.

type: bool

default_value: False

searchable_values: None

modules_to_fuse

List of list of module names to fuse.

type: List[List[str]]

default_value: None

searchable_values: None

qconfig_func

Customized function to create a QConfig for QAT. Please refer to https://pytorch.org/docs/stable/generated/torch.quantization.qconfig.QConfig.html for details.

type: Callable | str

default_value: None

searchable_values: None

logger

Logger for training.

type: pytorch_lightning.loggers.logger.Logger | Iterable[pytorch_lightning.loggers.logger.Logger] | Callable | bool

default_value: False

searchable_values: None

gpus

Number of GPUs to use.

type: int

default_value: None

searchable_values: None

seed

Random seed for training.

type: int

default_value: None

searchable_values: None

checkpoint_path

Path to save checkpoints.

type: str

default_value: None

searchable_values: None
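
QAT relies on user-provided training hooks. The sketch below is a hypothetical user script and pass entry: the file name, function name, in-memory toy dataset, and return type (a torch DataLoader) are assumptions chosen for illustration:

    # user_script.py (hypothetical)
    import torch
    from torch.utils.data import DataLoader, TensorDataset

    def train_dataloader(data_dir, batch_size, *args, **kwargs):
        # Toy in-memory dataset standing in for data loaded from data_dir.
        x = torch.randn(256, 3, 32, 32)
        y = torch.randint(0, 10, (256,))
        return DataLoader(TensorDataset(x, y), batch_size=batch_size, shuffle=True)

    # Hypothetical pass entry referencing the function above by name.
    qat_pass = {
        "type": "QuantizationAwareTraining",
        "config": {
            "user_script": "user_script.py",
            "train_dataloader_func": "train_dataloader",
            "train_batch_size": 32,
            "num_epochs": 3,
            "do_validate": False,
        },
    }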

OpenVINOConversion

Converts a PyTorch, ONNX or TensorFlow model to an OpenVINO model.

Input: PyTorchModel | ONNXModel

Output: OpenVINOModel

input

Input can be set by passing a list of tuples. Each tuple should contain input name and optionally input type or input shape.

type: List[Tuple]

default_value: None

searchable_values: None

input_shape

Input shape(s) that should be fed to an input node(s) of the model. Shape is defined as a comma-separated list of integer numbers enclosed in parentheses or square brackets, for example [1,3,227,227].

type: List[int]

default_value: None

searchable_values: None

extra_config

Extra configurations for OpenVINO model conversion. extra_config can be set by passing a dictionary where key is the parameter name, and the value is the parameter value. Please check ‘mo’ command usage instruction for available parameters: https://docs.openvino.ai/latest/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html

type: Dict

default_value: None

searchable_values: None

OpenVINOQuantization

Post-training quantization for OpenVINO model. Please refer to https://docs.openvino.ai/latest/pot_introduction.html for more details.

Input: OpenVINOModel

Output: OpenVINOModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

engine_config

Specific config for openvino.tools.pot.IEEngine. ‘engine_config’ can be set by passing a dictionary, for example engine_config: {‘device’: ‘CPU’}

type: Dict

required: True

dataloader_func

A callable function or a str of the function name from ‘user_script’ for the instance of the dataloader.

type: Callable | str

default_value: None

searchable_values: None

data_dir

Dataset path. ‘data_dir’ can be a str or pathlib.Path.

type: pathlib.Path | str

default_value: None

searchable_values: None

batch_size

Batch size for the dataloader.

type: int

default_value: 1

searchable_values: None

metric_func

A callable function or a str of the function name from ‘user_script’ for Metric instance to calculate the accuracy metric of the model.

type: Callable | str

default_value: None

searchable_values: None

algorithms

A list defining optimization algorithms and their parameters included in the optimization pipeline. The order in which they are applied to the model in the optimization pipeline is determined by the order in the list. For example: algorithms: [{‘name’: ‘DefaultQuantization’, ‘params’: {‘preset’: ‘performance’, ‘stat_subset_size’: 500},}]

type: List[Dict]

required: True
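
A hedged sketch of an OpenVINOQuantization entry; engine_config and algorithms are the two required options, and the values shown mirror the examples given in the descriptions above. The "type"/"config" structure, dataloader function name, and data path are assumptions:

    # Hypothetical OpenVINO post-training quantization entry.
    ov_quant_pass = {
        "type": "OpenVINOQuantization",
        "config": {
            "user_script": "user_script.py",
            "dataloader_func": "ov_dataloader",   # hypothetical function name
            "data_dir": "data/calibration",
            "engine_config": {"device": "CPU"},
            "algorithms": [
                {
                    "name": "DefaultQuantization",
                    "params": {"preset": "performance", "stat_subset_size": 500},
                }
            ],
        },
    }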

SNPEConversion

Convert ONNX or TensorFlow model to SNPE DLC. Uses snpe-tensorflow-to-dlc or snpe-onnx-to-dlc tools from the SNPE SDK.

Input: ONNXModel | TensorFlowModel

Output: SNPEModel

input_names

List of input names.

type: List[str]

required: True

input_shapes

List of input shapes. Must be the same length as input_names.

type: List[List[int]]

required: True

output_names

List of output names.

type: List[str]

required: True

output_shapes

List of output shapes. Must be the same length as output_names.

type: List[List[int]]

required: True

input_types

List of input types. If not None, it must be a list of the same length as input_names. List members can be None to use default value. Refer to olive.snpe.constants.InputType for valid values.

type: List[str | None]

default_value: None

searchable_values: None

input_layouts

List of input layouts. If not None, it must be a list of the same length as input_names. List members can be None to use the inferred value. Refer to olive.snpe.constants.InputLayout for valid values.

type: List[str | None]

default_value: None

searchable_values: None

extra_args

Extra arguments to pass to the SNPE conversion tool. Refer to snpe-onnx-to-dlc and snpe-tensorflow-to-dlc at https://developer.qualcomm.com/sites/default/files/docs/snpe/tools.html for additional arguments. Must be a dictionary of the form: {‘arg_name’: ‘arg_value’}.

type: str

default_value: None

searchable_values: None
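
A hedged sketch of an SNPEConversion entry; the "type"/"config" structure is assumed, and all four required list parameters are shown with hypothetical names and shapes for a single-input image model:

    # Hypothetical SNPE DLC conversion entry.
    snpe_conversion_pass = {
        "type": "SNPEConversion",
        "config": {
            "input_names": ["input"],            # hypothetical input name
            "input_shapes": [[1, 224, 224, 3]],  # one shape per input name
            "output_names": ["output"],          # hypothetical output name
            "output_shapes": [[1, 1000]],        # one shape per output name
        },
    }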

SNPEQuantization

Quantize SNPE model. Uses snpe-dlc-quantize tool from the SNPE SDK.

Input: SNPEModel

Output: SNPEModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

data_dir

Path to the data directory.

type: str

required: True

dataloader_func

Function or function name to create a dataloader for quantization. The function should take the data directory as an argument and return an olive.snpe.SNPEDataLoader object.

type: Callable[[str], olive.snpe.data_loader.SNPEDataLoader] | str

required: True

use_enhanced_quantizer

Use the enhanced quantizer feature when quantizing the model. Uses an algorithm to determine optimal range instead of min and max range of data. It can be useful for quantizing models that have long tails in the distribution of the data being quantized.

type: bool

default_value: False

searchable_values: Categorical([True, False])

enable_htp

Pack HTP information in quantized DLC.

type: bool

default_value: False

searchable_values: Categorical([True, False])

htp_socs

List of SoCs to generate HTP Offline cache for.

type: List[str]

default_value: None

searchable_values: None

extra_args

Extra arguments to pass to the SNPE quantization tool. Refer to https://developer.qualcomm.com/sites/default/files/docs/snpe/tools.html#tools_snpe-dlc-quantize for additional arguments. Must be a dictionary of the form: {‘arg_name’: ‘arg_value’}.

type: str

default_value: None

searchable_values: None

SNPEtoONNXConversion

Convert an SNPE DLC to ONNX for use with the SNPE Execution Provider. Creates an ONNX graph with the SNPE DLC as a node.

Input: SNPEModel

Output: ONNXModel

target_device

Target device for the ONNX model. Refer to olive.snpe.SNPEDevice for valid values.

type: str

default_value: cpu

searchable_values: None

target_opset

Target ONNX opset version.

type: int

default_value: 12

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None

VitisAIQuantization

Quantize an ONNX model with onnxruntime, searching for the best vai_q_onnx quantization parameters at the same time.

Input: ONNXModel

Output: ONNXModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

quant_mode

ONNX quantization mode. ‘static’ for Vitis AI quantization.

type: str

default_value: static

searchable_values: Categorical([‘static’])

data_dir

Path to the directory containing the dataset.

type: pathlib.Path | str

default_value: None

searchable_values: None

batch_size

Batch size for calibration, required.

type: int

default_value: 1

searchable_values: None

dataloader_func

Function/function name to generate a dataloader for calibration; required.

type: Callable | str

required: True

weight_type

Data type for quantizing weights, used in vai_q_onnx quantization. ‘QInt8’ for signed 8-bit integer.

type: str

default_value: QInt8

searchable_values: Categorical([‘QInt8’])

input_nodes

Start nodes for quantization. If None, all quantizable nodes are included.

type: list

default_value: None

searchable_values: None

output_nodes

End nodes for quantization. If None, all quantizable nodes are included.

type: list

default_value: None

searchable_values: None

op_types_to_quantize

List of operator types to quantize. If None, all quantizable.

type: list

default_value: None

searchable_values: None

nodes_to_quantize

List of node names to quantize. If None, all quantizable.

type: list

default_value: None

searchable_values: None

nodes_to_exclude

List of node names to exclude from quantization. If None, no nodes are excluded.

type: list

default_value: None

searchable_values: None

optimize_model

Deprecating Soon in ONNX! Optimize model before quantization. NOT recommended, optimization will change the computation graph, making debugging of quantization loss difficult.

type: bool

default_value: False

searchable_values: Categorical([True, False])

use_external_data_format

Option used for large (>2GB) models. Set to False by default.

type: bool

default_value: False

searchable_values: None

quant_preprocess

Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing

type: bool

default_value: True

searchable_values: Categorical([True, False])

calibrate_method

Current calibration methods supported are NonOverflow and MinMSE. Please use NonOverflow or MinMSE as options.

type: str

default_value: MinMSE

searchable_values: Categorical([‘NonOverflow’, ‘MinMSE’])

quant_format

QDQ format quantizes the model by inserting QuantizeLinear/DeQuantizeLinear nodes on the tensors.

type: str

default_value: QDQ

searchable_values: Categorical([‘QDQ’])

activation_type

Quantization data type of activation.

type: str

default_value: QInt8

searchable_values: Conditional(parents: (‘quant_format’, ‘weight_type’), support: {(‘QDQ’, ‘QInt8’): Categorical([‘QInt8’])}, default: Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>]))

ActivationSymmetric

symmetrize calibration data for activations

type: bool

default_value: True

searchable_values: None

WeightSymmetric

symmetrize calibration data for weights

type: bool

default_value: True

searchable_values: None

AddQDQPairToWeight

Keeps the weight in floating point and inserts both QuantizeLinear and DeQuantizeLinear nodes for the weight.

type: bool

default_value: True

searchable_values: None

extra_options

Key value pair dictionary for extra_options in quantization. If an option is one of [‘ActivationSymmetric’, ‘WeightSymmetric’, ‘AddQDQPairToWeight’], it will be overwritten by the corresponding config parameter value.

type: dict

default_value: None

searchable_values: None
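
A hedged sketch of a VitisAIQuantization entry; the "type"/"config" structure is assumed, the script name, dataloader function name, and data path are hypothetical, and dataloader_func is required:

    # Hypothetical Vitis AI quantization entry.
    vitis_quant_pass = {
        "type": "VitisAIQuantization",
        "config": {
            "user_script": "user_script.py",
            "dataloader_func": "calib_dataloader",
            "data_dir": "data/calibration",
            "calibrate_method": "MinMSE",   # or "NonOverflow"
            "weight_type": "QInt8",
            "activation_type": "QInt8",
        },
    }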

OptimumConversion

Convert an Optimum model to an ONNX model using the Optimum export function.

Input: OptimumModel

Output: ONNXModel | CompositeOnnxModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

target_opset

The version of the default (ai.onnx) opset to target.

type: int

default_value: 14

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None

OptimumMerging

Merges a decoder_model with its decoder_with_past_model via the Optimum library.

Input: CompositeOnnxModel

Output: ONNXModel | CompositeOnnxModel

execution_provider

Target execution provider. This parameter will be removed when accelerators/targets are visible to passes.

type: str

default_value: None

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None