Passes

The following passes are available in Olive.

Each pass is listed below with a description and its configuration options.

OnnxConversion

Convert a PyTorch model to an ONNX model using torch.onnx.export.

Input: PyTorchModel

Output: ONNXModel | CompositeOnnxModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

target_opset

The version of the default (ai.onnx) opset to target.

type: int

default_value: 14

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
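
As an illustration of how these options are typically supplied, the sketch below shows what an OnnxConversion entry in an Olive workflow configuration might look like, written as a Python dict (the same structure is usually expressed in JSON). The surrounding "type"/"config" layout is an assumption; the parameter names and default values come from the list above:

    # Hypothetical pass entry for OnnxConversion; only parameters documented
    # above are used, and the values shown are the documented defaults.
    onnx_conversion_pass = {
        "type": "OnnxConversion",
        "config": {
            "target_opset": 14,              # ai.onnx opset version to target
            "save_as_external_data": False,  # keep tensor data inside the .onnx file
        },
    }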

OnnxModelOptimizer

Optimize ONNX model by fusing nodes.

Input: ONNXModel

Output: ONNXModel

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None

OrtTransformersOptimization

Optimize transformer-based models in scenarios where ONNX Runtime does not apply the optimization at load time. It is based on onnxruntime.transformers.optimizer.

Input: ONNXModel

Output: ONNXModel

model_type

Transformer based model type, including bert (exported by PyTorch), gpt2 (exported by PyTorch), bert_tf (BERT exported by tf2onnx), bert_keras (BERT exported by keras2onnx), and unet/vae/clip (stable diffusion).

type: str

required: True

num_heads

Number of attention heads.

type: int

default_value: 0

searchable_values: None

hidden_size

Number of hidden nodes.

type: int

default_value: 0

searchable_values: None

optimization_options

Optimization options that turn on/off some fusions.

type: Dict[str, Any] | onnxruntime.transformers.fusion_options.FusionOptions

default_value: None

searchable_values: None

opt_level

Graph optimization level of ONNX Runtime: 0 - disable all (default), 1 - basic, 2 - extended, 99 - all.

type: Any

default_value: None

searchable_values: None

use_gpu

Flag for GPU inference.

type: bool

default_value: False

searchable_values: None

only_onnxruntime

Whether to use only onnxruntime to optimize the model, without Python fusion.

type: bool

default_value: False

searchable_values: None

float16

Whether half-precision float will be used.

type: bool

default_value: False

searchable_values: None

input_int32

Whether int32 tensors will be used as input.

type: bool

default_value: False

searchable_values: None

keep_io_types

Keep input and output tensors in their original data type

type: bool

default_value: True

searchable_values: None

force_fp32_ops

Operators that are forced to run in float32

type: List[str]

default_value: None

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
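
For example, a hedged sketch of an OrtTransformersOptimization entry (the "type"/"config" structure is assumed; parameter names are taken from the list above, and model_type is the only required option):

    # Hypothetical configuration for optimizing a BERT-style model.
    transformers_opt_pass = {
        "type": "OrtTransformersOptimization",
        "config": {
            "model_type": "bert",  # required: bert, gpt2, bert_tf, bert_keras, unet, vae, or clip
            "num_heads": 12,       # number of attention heads (documented default is 0)
            "hidden_size": 768,    # hidden dimension (documented default is 0)
            "float16": False,      # set True to convert to half precision
        },
    }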

OrtPerfTuning

Optimize ONNX Runtime inference settings.

Input: ONNXModel

Output: ONNXModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

data_config

Data config for calibration, required if quant_mode is ‘static’. If not provided, a default DataConfig will be used.

type: olive.data.config.DataConfig | str

default_value: None

searchable_values: None

data_dir

Directory of sample inference data.

type: pathlib.Path | str

default_value: None

searchable_values: None

dataloader_func

Dataloader function to load data from given data_dir with given batch size.

type: Callable | str

default_value: None

searchable_values: None

batch_size

Batch size for inference.

type: int

default_value: None

searchable_values: None

input_names

Input names list for ONNX model.

type: list

default_value: None

searchable_values: None

input_shapes

Input shapes list for ONNX model.

type: list

default_value: None

searchable_values: None

input_types

Input types list for ONNX model.

type: list

default_value: None

searchable_values: None

device

Device selected for tuning process.

type: str

default_value: cpu

searchable_values: None

cpu_cores

CPU cores used for thread tuning.

type: int

default_value: None

searchable_values: None

io_bind

Whether to enable IOBinding search for ONNX Runtime inference.

type: bool

default_value: False

searchable_values: None

providers_list

List of execution providers to use when executing the ONNX model.

type: list

default_value: None

searchable_values: None

execution_mode_list

List of execution modes (parallelism between operators) to test.

type: list

default_value: None

searchable_values: None

opt_level_list

Optimization level list for ONNX model.

type: list

default_value: None

searchable_values: None

trt_fp16_enable

Whether to enable FP16 mode for the TensorRT execution provider.

type: bool

default_value: False

searchable_values: None

intra_thread_num_list

List of intra-op thread counts to test.

type: list

default_value: [None]

searchable_values: None

inter_thread_num_list

List of inter-op thread counts to test.

type: list

default_value: [None]

searchable_values: None

extra_session_config

Extra customized session options during tuning process.

type: Dict[str, Any]

default_value: None

searchable_values: None
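
A hedged sketch of an OrtPerfTuning entry is shown below; the "type"/"config" structure is assumed, and the execution provider name, input names, shapes, and types are illustrative placeholders only:

    # Hypothetical inference-tuning configuration.
    perf_tuning_pass = {
        "type": "OrtPerfTuning",
        "config": {
            "device": "cpu",
            "cpu_cores": 4,                               # cores considered during thread tuning
            "io_bind": True,                              # also search with IOBinding enabled
            "providers_list": ["CPUExecutionProvider"],   # illustrative provider name
            "input_names": ["input_ids"],                 # hypothetical model input
            "input_shapes": [[1, 128]],
            "input_types": ["int64"],
        },
    }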

OnnxFloatToFloat16

Converts a model to float16. It is based on onnxconverter-common.convert_float_to_float16. See https://onnxruntime.ai/docs/performance/model-optimizations/float16.html#float16-conversion

Input: ONNXModel

Output: ONNXModel

min_positive_val

Constant values will be clipped against this value

type: float

default_value: 1e-07

searchable_values: None

max_finite_val

Constant values will be clipped against this value

type: float

default_value: 10000.0

searchable_values: None

keep_io_types

Whether model inputs/outputs should be left as float32

type: bool

default_value: False

searchable_values: None

disable_shape_infer

Skips running onnx shape/type inference.

type: bool

default_value: False

searchable_values: None

op_block_list

List of op types to leave as float32

type: List[str]

default_value: None

searchable_values: None

node_block_list

List of node names to leave as float32

type: List[str]

default_value: None

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
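
A hedged sketch of an OnnxFloatToFloat16 entry (structure assumed; parameter names from the list above, with the op_block_list values given purely as an illustration):

    # Hypothetical float16 conversion that keeps model inputs/outputs in
    # float32 and leaves two example op types in float32.
    fp16_pass = {
        "type": "OnnxFloatToFloat16",
        "config": {
            "keep_io_types": True,
            "op_block_list": ["Softmax", "Resize"],  # example op types left in float32
        },
    }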

OrtMixedPrecision

Convert model to mixed precision.

Input: ONNXModel

Output: ONNXModel

op_block_list

List of op types to leave as float32

type: List[str]

default_value: [‘SimplifiedLayerNormalization’, ‘SkipSimplifiedLayerNormalization’, ‘Relu’, ‘Add’]

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None

OnnxDynamicQuantization

ONNX Dynamic Quantization Pass

Input: ONNXModel

Output: ONNXModel

data_config

Data config for calibration, required if quant_mode is ‘static’. If not provided, a default DataConfig will be used.

type: olive.data.config.DataConfig | str

default_value: None

searchable_values: None

quant_mode

dynamic quantization mode

type: str

default_value: dynamic

searchable_values: None

weight_type

Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.

type: str

default_value: QInt8

searchable_values: Categorical([‘QInt8’, ‘QUInt8’])

op_types_to_quantize

List of operator types to quantize. If None, all quantizable.

type: list

default_value: None

searchable_values: None

nodes_to_quantize

List of node names to quantize. If None, all quantizable.

type: list

default_value: None

searchable_values: None

nodes_to_exclude

List of node names to exclude from quantization. If None, no nodes are excluded.

type: list

default_value: None

searchable_values: None

per_channel

Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default_value: False

searchable_values: Categorical([True, False])

reduce_range

Quantize weights with 7-bits. It may improve the accuracy for some models running on non-VNNI machine, especially for per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default_value: False

searchable_values: Categorical([True, False])

optimize_model

Deprecating Soon in ONNX! Optimize model before quantization. NOT recommended, optimization will change the computation graph, making debugging of quantization loss difficult.

type: bool

default_value: False

searchable_values: Categorical([True, False])

quant_preprocess

Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing

type: bool

default_value: True

searchable_values: Categorical([True, False])

extra.Sigmoid.nnapi

type: bool

default_value: False

searchable_values: None

ActivationSymmetric

symmetrize calibration data for activations

type: bool

default_value: False

searchable_values: None

WeightSymmetric

symmetrize calibration data for weights

type: bool

default_value: True

searchable_values: None

EnableSubgraph

If enabled, subgraphs will be quantized. Dynamic mode is currently supported.

type: bool

default_value: False

searchable_values: None

ForceQuantizeNoInputCheck

By default, some latent operators such as MaxPool and Transpose do not quantize if their input is not already quantized. Set to True to force such operators to always quantize their input and thus generate quantized output. This behavior can still be disabled per node using nodes_to_exclude.

type: bool

default_value: False

searchable_values: None

MatMulConstBOnly

If enabled, only MatMul with const B will be quantized.

type: bool

default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘dynamic’,): True, (‘static’,): False}, default: OLIVE_INVALID_PARAM_VALUE)

searchable_values: None

extra_options

Key value pair dictionary for extra_options in quantization. Please refer to https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/quantize.py for details about the supported options. If an option is one of [‘extra.Sigmoid.nnapi’, ‘ActivationSymmetric’, ‘WeightSymmetric’, ‘EnableSubgraph’, ‘ForceQuantizeNoInputCheck’, ‘MatMulConstBOnly’], it will be overwritten by the corresponding config parameter value.

type: dict

default_value: None

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
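
A hedged sketch of an OnnxDynamicQuantization entry; the "type"/"config" structure is assumed, the parameter names come from the list above, and the op_types_to_quantize values are illustrative:

    # Hypothetical dynamic quantization configuration.
    dynamic_quant_pass = {
        "type": "OnnxDynamicQuantization",
        "config": {
            "weight_type": "QUInt8",                     # or "QInt8" (the documented default)
            "per_channel": False,
            "reduce_range": False,
            "op_types_to_quantize": ["MatMul", "Gemm"],  # example op types; None quantizes all quantizable ops
        },
    }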

OnnxStaticQuantization

ONNX Static Quantization Pass

Input: ONNXModel

Output: ONNXModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

data_config

Data config for calibration, required if quant_mode is ‘static’. If not provided, a default DataConfig will be used.

type: olive.data.config.DataConfig | str

default_value: None

searchable_values: None

quant_mode

static quantization mode

type: str

default_value: static

searchable_values: None

weight_type

Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.

type: str

default_value: QInt8

searchable_values: Categorical([‘QInt8’, ‘QUInt8’])

op_types_to_quantize

List of operator types to quantize. If None, all quantizable.

type: list

default_value: None

searchable_values: None

nodes_to_quantize

List of node names to quantize. If None, all quantizable.

type: list

default_value: None

searchable_values: None

nodes_to_exclude

List of node names to exclude from quantization. If None, no nodes are excluded.

type: list

default_value: None

searchable_values: None

per_channel

Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default_value: False

searchable_values: Categorical([True, False])

reduce_range

Quantize weights with 7-bits. It may improve the accuracy for some models running on non-VNNI machine, especially for per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default_value: False

searchable_values: Categorical([True, False])

optimize_model

Deprecating Soon in ONNX! Optimize model before quantization. NOT recommended, optimization will change the computation graph, making debugging of quantization loss difficult.

type: bool

default_value: False

searchable_values: Categorical([True, False])

quant_preprocess

Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing

type: bool

default_value: True

searchable_values: Categorical([True, False])

data_dir

Path to the directory containing the dataset. For local data, it is required if quant_mode is ‘static’.

type: pathlib.Path | str

default_value: None

searchable_values: None

batch_size

Batch size for calibration, required if quant_mode is ‘static’.

type: int

default_value: 1

searchable_values: None

dataloader_func

Function/function name to generate dataloader for calibration, required if quant_mode is ‘static’

type: Callable | str

default_value: None

searchable_values: None

calibrate_method

Current calibration methods supported are MinMax and Entropy. Please use CalibrationMethod.MinMax or CalibrationMethod.Entropy as options.

type: str

default_value: MinMax

searchable_values: Categorical([‘MinMax’, ‘Entropy’, ‘Percentile’])

quant_format

QOperator format quantizes the model with quantized operators directly. QDQ format quantizes the model by inserting QuantizeLinear/DeQuantizeLinear nodes on the tensors.

type: str

default_value: QDQ

searchable_values: Categorical([‘QOperator’, ‘QDQ’])

activation_type

Quantization data type of activation. Please refer to https://onnxruntime.ai/docs/performance/quantization.html for more details on data type selection

type: str

default_value: QInt8

searchable_values: Conditional(parents: (‘quant_format’, ‘weight_type’), support: {(‘QDQ’, ‘QInt8’): Categorical([‘QInt8’]), (‘QDQ’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘QOperator’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘QOperator’, ‘QInt8’): Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>])}, default: Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>]))

extra.Sigmoid.nnapi

type: bool

default_value: False

searchable_values: None

ActivationSymmetric

symmetrize calibration data for activations

type: bool

default_value: False

searchable_values: None

WeightSymmetric

symmetrize calibration data for weights

type: bool

default_value: True

searchable_values: None

EnableSubgraph

If enabled, subgraphs will be quantized. Dynamic mode is currently supported.

type: bool

default_value: False

searchable_values: None

ForceQuantizeNoInputCheck

By default, some latent operators such as MaxPool and Transpose do not quantize if their input is not already quantized. Set to True to force such operators to always quantize their input and thus generate quantized output. This behavior can still be disabled per node using nodes_to_exclude.

type: bool

default_value: False

searchable_values: None

MatMulConstBOnly

If enabled, only MatMul with const B will be quantized.

type: bool

default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘dynamic’,): True, (‘static’,): False}, default: OLIVE_INVALID_PARAM_VALUE)

searchable_values: None

extra_options

Key value pair dictionary for extra_options in quantization. Please refer to https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/quantize.py for details about the supported options. If an option is one of [‘extra.Sigmoid.nnapi’, ‘ActivationSymmetric’, ‘WeightSymmetric’, ‘EnableSubgraph’, ‘ForceQuantizeNoInputCheck’, ‘MatMulConstBOnly’], it will be overwritten by the corresponding config parameter value.

type: dict

default_value: None

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
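
Static quantization needs calibration data, so a user_script typically provides the dataloader_func. The sketch below is hypothetical: the file name, function name, data path, input name, and the return value (a generator of input dicts) are assumptions chosen for illustration, and the exact dataloader interface Olive expects is not specified here:

    # user_script.py (hypothetical)
    import numpy as np

    def calib_dataloader(data_dir, batch_size, *args, **kwargs):
        # Yields dicts mapping model input names to numpy arrays; the real
        # interface expected by Olive may differ from this sketch.
        for _ in range(8):
            yield {"input_ids": np.random.randint(0, 100, size=(batch_size, 128), dtype=np.int64)}

    # Hypothetical pass entry referencing the function above by name.
    static_quant_pass = {
        "type": "OnnxStaticQuantization",
        "config": {
            "user_script": "user_script.py",
            "dataloader_func": "calib_dataloader",
            "data_dir": "data/calibration",   # hypothetical path
            "quant_format": "QDQ",
            "activation_type": "QInt8",
            "weight_type": "QInt8",
            "calibrate_method": "MinMax",
        },
    }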

OnnxQuantization

Quantize an ONNX model with onnxruntime, searching for the best static/dynamic quantization parameters at the same time.

Input: ONNXModel

Output: ONNXModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

data_config

Data config for calibration, required if quant_mode is ‘static’. If not provided, a default DataConfig will be used.

type: olive.data.config.DataConfig | str

default_value: None

searchable_values: None

quant_mode

ONNX quantization mode: ‘dynamic’ for dynamic quantization, ‘static’ for static quantization.

type: str

default_value: static

searchable_values: Categorical([‘dynamic’, ‘static’])

weight_type

Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.

type: str

default_value: QInt8

searchable_values: Categorical([‘QInt8’, ‘QUInt8’])

op_types_to_quantize

List of operator types to quantize. If None, all quantizable.

type: list

default_value: None

searchable_values: None

nodes_to_quantize

List of node names to quantize. If None, all quantizable.

type: list

default_value: None

searchable_values: None

nodes_to_exclude

List of node names to exclude from quantization. If None, no nodes are excluded.

type: list

default_value: None

searchable_values: None

per_channel

Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default_value: False

searchable_values: Categorical([True, False])

reduce_range

Quantize weights with 7-bits. It may improve the accuracy for some models running on non-VNNI machine, especially for per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default_value: False

searchable_values: Categorical([True, False])

optimize_model

Deprecating Soon in ONNX! Optimize model before quantization. NOT recommended, optimization will change the computation graph, making debugging of quantization loss difficult.

type: bool

default_value: False

searchable_values: Categorical([True, False])

quant_preprocess

Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing

type: bool

default_value: True

searchable_values: Categorical([True, False])

data_dir

Path to the directory containing the dataset. For local data, it is required if quant_mode is ‘static’.

type: pathlib.Path | str

default_value: None

searchable_values: None

batch_size

Batch size for calibration, required if quant_mode is ‘static’.

type: int

default_value: 1

searchable_values: None

dataloader_func

Function/function name to generate dataloader for calibration, required if quant_mode is ‘static’

type: Callable | str

default_value: None

searchable_values: None

calibrate_method

Current calibration methods supported are MinMax and Entropy. Please use CalibrationMethod.MinMax or CalibrationMethod.Entropy as options.

type: str

default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘static’,): ‘MinMax’, (‘dynamic’,): <SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>}, default: OLIVE_INVALID_PARAM_VALUE)

searchable_values: Conditional(parents: (‘quant_mode’,), support: {(‘static’,): Categorical([‘MinMax’, ‘Entropy’, ‘Percentile’])}, default: Categorical([<SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>]))

quant_format

QOperator format quantizes the model with quantized operators directly. QDQ format quantizes the model by inserting QuantizeLinear/DeQuantizeLinear nodes on the tensors.

type: str

default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘static’,): ‘QDQ’, (‘dynamic’,): <SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>}, default: OLIVE_INVALID_PARAM_VALUE)

searchable_values: Conditional(parents: (‘quant_mode’,), support: {(‘static’,): Categorical([‘QOperator’, ‘QDQ’])}, default: Categorical([<SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>]))

activation_type

Quantization data type of activation. Please refer to https://onnxruntime.ai/docs/performance/quantization.html for more details on data type selection

type: str

default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘static’,): ‘QInt8’, (‘dynamic’,): <SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>}, default: OLIVE_INVALID_PARAM_VALUE)

searchable_values: Conditional(parents: (‘quant_mode’, ‘quant_format’, ‘weight_type’), support: {(‘static’, ‘QDQ’, ‘QInt8’): Categorical([‘QInt8’]), (‘static’, ‘QDQ’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘static’, ‘QOperator’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘static’, ‘QOperator’, ‘QInt8’): Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>])}, default: Categorical([<SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>]))

extra.Sigmoid.nnapi

type: bool

default_value: False

searchable_values: None

ActivationSymmetric

symmetrize calibration data for activations

type: bool

default_value: False

searchable_values: None

WeightSymmetric

symmetrize calibration data for weights

type: bool

default_value: True

searchable_values: None

EnableSubgraph

If enabled, subgraphs will be quantized. Dynamic mode is currently supported.

type: bool

default_value: False

searchable_values: None

ForceQuantizeNoInputCheck

By default, some latent operators such as MaxPool and Transpose do not quantize if their input is not already quantized. Set to True to force such operators to always quantize their input and thus generate quantized output. This behavior can still be disabled per node using nodes_to_exclude.

type: bool

default_value: False

searchable_values: None

MatMulConstBOnly

If enabled, only MatMul with const B will be quantized.

type: bool

default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘dynamic’,): True, (‘static’,): False}, default: OLIVE_INVALID_PARAM_VALUE)

searchable_values: None

extra_options

Key value pair dictionary for extra_options in quantization. Please refer to https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/quantize.py for details about the supported options. If an option is one of [‘extra.Sigmoid.nnapi’, ‘ActivationSymmetric’, ‘WeightSymmetric’, ‘EnableSubgraph’, ‘ForceQuantizeNoInputCheck’, ‘MatMulConstBOnly’], it will be overwritten by the corresponding config parameter value.

type: dict

default_value: None

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
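
Because several of the parameters above define searchable_values, OnnxQuantization is typically given only the inputs it cannot search for (such as calibration data) and the rest is left to Olive's search. The sketch below is hypothetical; the "type"/"config" structure, script name, and function name are assumptions:

    # Hypothetical entry that leaves quant_mode, weight_type, per_channel, etc.
    # to Olive's search space and only pins the calibration inputs.
    quantization_pass = {
        "type": "OnnxQuantization",
        "config": {
            "user_script": "user_script.py",        # hypothetical script
            "dataloader_func": "calib_dataloader",  # hypothetical function name
            "data_dir": "data/calibration",
        },
    }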

IncDynamicQuantization

Intel® Neural Compressor Dynamic Quantization Pass

Input: ONNXModel

Output: ONNXModel

data_config

Data config for calibration, required if quant_mode is ‘static’. If not provided, a default DataConfig will be used.

type: olive.data.config.DataConfig | str

default_value: None

searchable_values: None

approach

dynamic quantization mode

type: str

default_value: dynamic

searchable_values: None

device

Intel® Neural Compressor quantization device. Support ‘cpu’ and ‘gpu’.

type: str

default_value: cpu

searchable_values: None

backend

Backend for model execution. Support ‘default’, ‘onnxrt_trt_ep’, ‘onnxrt_cuda_ep’

type: str

default_value: default

searchable_values: None

domain

Model domain. Support ‘auto’, ‘cv’, ‘object_detection’, ‘nlp’ and ‘recommendation_system’. Intel® Neural Compressor Adaptor will use specific quantization settings for different domains automatically, and explicitly specified quantization settings will override the automatic setting. If users set domain as auto, automatic detection for domain will be executed.

type: str

default_value: auto

searchable_values: None

recipes

Recipes for Intel® Neural Compressor quantization. Supported keys:
‘smooth_quant’: whether to do smooth quant;
‘smooth_quant_args’: parameters for smooth_quant;
‘fast_bias_correction’: whether to do fast bias correction;
‘weight_correction’: whether to do weight correction;
‘gemm_to_matmul’: whether to convert Gemm to MatMul and Add, only valid for ONNX models;
‘graph_optimization_level’: supports ‘DISABLE_ALL’, ‘ENABLE_BASIC’, ‘ENABLE_EXTENDED’, ‘ENABLE_ALL’, only valid for ONNX models;
‘first_conv_or_matmul_quantization’: whether to quantize the first conv or matmul;
‘last_conv_or_matmul_quantization’: whether to quantize the last conv or matmul;
‘pre_post_process_quantization’: whether to quantize the ops in preprocess and postprocess;
‘add_qdq_pair_to_weight’: whether to add a QDQ pair for weights, only valid for onnxrt_trt_ep;
‘optypes_to_exclude_output_quant’: do not quantize the output of the specified op types;
‘dedicated_qdq_pair’: whether to use dedicated QDQ pairs, only valid for onnxrt_trt_ep.

type: dict

default_value: {}

searchable_values: None

reduce_range

Whether to use 7-bit quantization.

type: bool

default_value: False

searchable_values: Categorical([True, False])

quant_level

Intel® Neural Compressor allows users to choose different tuning processes by specifying the quantization level (quant_level). Currently 3 quant_levels are supported: 0 is the conservative strategy, 1 is the basic or user-specified strategy, and auto (the default) is the combination of 0 and 1. Please refer to https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-process and https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-algorithms for more details.

type: str

default_value: auto

searchable_values: None

excluded_precisions

Precisions to be excluded. The default value is an empty list. Intel® Neural Compressor enables mixed precision with fp32 + bf16 (only when device is ‘gpu’ and backend is ‘onnxrt_cuda_ep’) + int8 by default. To disable the bf16 data type, specify excluded_precisions = [‘bf16’].

type: list

default_value: []

searchable_values: None

use_distributed_tuning

Intel® Neural Compressor provides distributed tuning to speed up the tuning process by leveraging the multi-node cluster. Prerequisites: A working MPI implementation and installed mpi4py.

type: bool

default_value: False

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None

IncStaticQuantization

Intel® Neural Compressor Static Quantization Pass

Input: ONNXModel

Output: ONNXModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

data_config

Data config for calibration, required if quant_mode is ‘static’. If not provided, a default DataConfig will be used.

type: olive.data.config.DataConfig | str

default_value: None

searchable_values: None

approach

static quantization mode

type: str

default_value: static

searchable_values: None

device

Intel® Neural Compressor quantization device. Support ‘cpu’ and ‘gpu’.

type: str

default_value: cpu

searchable_values: None

backend

Backend for model execution. Support ‘default’, ‘onnxrt_trt_ep’, ‘onnxrt_cuda_ep’

type: str

default_value: default

searchable_values: None

domain

Model domain. Support ‘auto’, ‘cv’, ‘object_detection’, ‘nlp’ and ‘recommendation_system’. Intel® Neural Compressor Adaptor will use specific quantization settings for different domains automatically, and explicitly specified quantization settings will override the automatic setting. If users set domain as auto, automatic detection for domain will be executed.

type: str

default_value: auto

searchable_values: None

recipes

Recipes for Intel® Neural Compressor quantization. Supported keys:
‘smooth_quant’: whether to do smooth quant;
‘smooth_quant_args’: parameters for smooth_quant;
‘fast_bias_correction’: whether to do fast bias correction;
‘weight_correction’: whether to do weight correction;
‘gemm_to_matmul’: whether to convert Gemm to MatMul and Add, only valid for ONNX models;
‘graph_optimization_level’: supports ‘DISABLE_ALL’, ‘ENABLE_BASIC’, ‘ENABLE_EXTENDED’, ‘ENABLE_ALL’, only valid for ONNX models;
‘first_conv_or_matmul_quantization’: whether to quantize the first conv or matmul;
‘last_conv_or_matmul_quantization’: whether to quantize the last conv or matmul;
‘pre_post_process_quantization’: whether to quantize the ops in preprocess and postprocess;
‘add_qdq_pair_to_weight’: whether to add a QDQ pair for weights, only valid for onnxrt_trt_ep;
‘optypes_to_exclude_output_quant’: do not quantize the output of the specified op types;
‘dedicated_qdq_pair’: whether to use dedicated QDQ pairs, only valid for onnxrt_trt_ep.

type: dict

default_value: {}

searchable_values: None

reduce_range

Whether to use 7-bit quantization.

type: bool

default_value: False

searchable_values: Categorical([True, False])

quant_level

Intel® Neural Compressor allows users to choose different tuning processes by specifying the quantization level (quant_level). Currently 3 quant_levels are supported: 0 is the conservative strategy, 1 is the basic or user-specified strategy, and auto (the default) is the combination of 0 and 1. Please refer to https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-process and https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-algorithms for more details.

type: str

default_value: auto

searchable_values: None

excluded_precisions

Precisions to be excluded. The default value is an empty list. Intel® Neural Compressor enables mixed precision with fp32 + bf16 (only when device is ‘gpu’ and backend is ‘onnxrt_cuda_ep’) + int8 by default. To disable the bf16 data type, specify excluded_precisions = [‘bf16’].

type: list

default_value: []

searchable_values: None

use_distributed_tuning

Intel® Neural Compressor provides distributed tuning to speed up the tuning process by leveraging the multi-node cluster. Prerequisites: A working MPI implementation and installed mpi4py.

type: bool

default_value: False

searchable_values: None

data_dir

Path to the directory containing the dataset. For local data, it is required if approach is ‘static’.

type: pathlib.Path | str

default_value: None

searchable_values: None

batch_size

Batch size for calibration, required if approach is ‘static’.

type: int

default_value: 1

searchable_values: None

dataloader_func

Function/function name to generate dataloader for calibration, required if approach is ‘static’

type: Callable | str

required: True

quant_format

Quantization format. Support ‘QDQ’ and ‘QOperator’.

type: str

default_value: QOperator

searchable_values: Categorical([‘QOperator’, ‘QDQ’])

calibration_sampling_size

Number of calibration samples.

type: list | int

default_value: [100]

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
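
A hedged sketch of an IncStaticQuantization entry (structure assumed; the script name, dataloader function name, and data path are hypothetical, and the recipe key shown comes from the list above):

    # Hypothetical Intel Neural Compressor static quantization entry.
    inc_static_pass = {
        "type": "IncStaticQuantization",
        "config": {
            "user_script": "user_script.py",
            "dataloader_func": "calib_dataloader",
            "data_dir": "data/calibration",
            "quant_format": "QOperator",
            "calibration_sampling_size": [100],
            "recipes": {"smooth_quant": False},  # example recipe key from the list above
        },
    }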

IncQuantization

Quantize ONNX model with Intel® Neural Compressor.

Input: ONNXModel

Output: ONNXModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

data_config

Data config for calibration, required if quant_mode is ‘static’. If not provided, a default DataConfig will be used.

type: olive.data.config.DataConfig | str

default_value: None

searchable_values: None

approach

Intel® Neural Compressor Quantization mode. ‘dynamic’ for dynamic quantization, ‘static’ for static quantization.

type: str

default_value: static

searchable_values: Categorical([‘dynamic’, ‘static’])

device

Intel® Neural Compressor quantization device. Support ‘cpu’ and ‘gpu’.

type: str

default_value: cpu

searchable_values: None

backend

Backend for model execution. Support ‘default’, ‘onnxrt_trt_ep’, ‘onnxrt_cuda_ep’

type: str

default_value: default

searchable_values: None

domain

Model domain. Support ‘auto’, ‘cv’, ‘object_detection’, ‘nlp’ and ‘recommendation_system’. Intel® Neural Compressor Adaptor will use specific quantization settings for different domains automatically, and explicitly specified quantization settings will override the automatic setting. If users set domain as auto, automatic detection for domain will be executed.

type: str

default_value: auto

searchable_values: None

recipes

Recipes for Intel® Neural Compressor quantization. Supported keys:
‘smooth_quant’: whether to do smooth quant;
‘smooth_quant_args’: parameters for smooth_quant;
‘fast_bias_correction’: whether to do fast bias correction;
‘weight_correction’: whether to do weight correction;
‘gemm_to_matmul’: whether to convert Gemm to MatMul and Add, only valid for ONNX models;
‘graph_optimization_level’: supports ‘DISABLE_ALL’, ‘ENABLE_BASIC’, ‘ENABLE_EXTENDED’, ‘ENABLE_ALL’, only valid for ONNX models;
‘first_conv_or_matmul_quantization’: whether to quantize the first conv or matmul;
‘last_conv_or_matmul_quantization’: whether to quantize the last conv or matmul;
‘pre_post_process_quantization’: whether to quantize the ops in preprocess and postprocess;
‘add_qdq_pair_to_weight’: whether to add a QDQ pair for weights, only valid for onnxrt_trt_ep;
‘optypes_to_exclude_output_quant’: do not quantize the output of the specified op types;
‘dedicated_qdq_pair’: whether to use dedicated QDQ pairs, only valid for onnxrt_trt_ep.

type: dict

default_value: {}

searchable_values: None

reduce_range

Whether to use 7-bit quantization.

type: bool

default_value: False

searchable_values: Categorical([True, False])

quant_level

Intel® Neural Compressor allows users to choose different tuning processes by specifying the quantization level (quant_level). Currently 3 quant_levels are supported: 0 is the conservative strategy, 1 is the basic or user-specified strategy, and auto (the default) is the combination of 0 and 1. Please refer to https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-process and https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-algorithms for more details.

type: str

default_value: auto

searchable_values: None

excluded_precisions

Precisions to be excluded. The default value is an empty list. Intel® Neural Compressor enables mixed precision with fp32 + bf16 (only when device is ‘gpu’ and backend is ‘onnxrt_cuda_ep’) + int8 by default. To disable the bf16 data type, specify excluded_precisions = [‘bf16’].

type: list

default_value: []

searchable_values: None

use_distributed_tuning

Intel® Neural Compressor provides distributed tuning to speed up the tuning process by leveraging the multi-node cluster. Prerequisites: A working MPI implementation and installed mpi4py.

type: bool

default_value: False

searchable_values: None

data_dir

Path to the directory containing the dataset. For local data, it is required if approach is ‘static’.

type: pathlib.Path | str

default_value: None

searchable_values: None

batch_size

Batch size for calibration, required if approach is ‘static’.

type: int

default_value: 1

searchable_values: None

dataloader_func

Function/function name to generate dataloader for calibration, required if approach is ‘static’

type: Callable | str

required: True

quant_format

Quantization format. Support ‘QDQ’ and ‘QOperator’.

type: str

default_value: QOperator

searchable_values: Conditional(parents: (‘approach’,), support: {(‘static’,): Categorical([‘QOperator’, ‘QDQ’])}, default: Categorical([‘default’]))

calibration_sampling_size

Number of calibration samples.

type: list | int

default_value: [100]

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None

AppendPrePostProcessingOps

Add pre/post-processing nodes to the input model.

Input: ONNXModel

Output: ONNXModel

pre

List of pre-processing commands to add.

type: List[str]

default_value: None

searchable_values: None

post

List of post-processing commands to add.

type: List[str]

default_value: None

searchable_values: None

tool_command

Composited tool commands to invoke.

type: str

default_value: None

searchable_values: None

tool_command_args

Arguments to pass to tool command.

type: Dict[str, Any]

default_value: None

searchable_values: None

target_opset

The version of the default (ai.onnx) opset to target.

type: int

default_value: 16

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
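
A hedged sketch of an AppendPrePostProcessingOps entry; the "type"/"config" structure is assumed, and the tool_command name and tool_command_args shown are purely illustrative placeholders rather than guaranteed command names:

    # Hypothetical pre/post-processing insertion entry.
    pre_post_pass = {
        "type": "AppendPrePostProcessingOps",
        "config": {
            "tool_command": "superresolution",              # illustrative command name only
            "tool_command_args": {"output_format": "png"},  # illustrative arguments only
            "target_opset": 16,
        },
    }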

InsertBeamSearch

Insert Beam Search Op.

Input: OliveModel

Output: ONNXModel

no_repeat_ngram_size

If set to int > 0, all ngrams of that size can only occur once.

type: int

default_value: 3

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None

QuantizationAwareTraining

Run quantization aware training on PyTorch model.

Input: PyTorchModel

Output: PyTorchModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

train_data_dir

Directory of training data.

type: str

default_value: None

searchable_values: None

val_data_dir

Directory of validation data.

type: str

default_value: None

searchable_values: None

train_dataloader_func

Dataloader function to load training data from given train_data_dir with given train_batch_size.

type: Callable | str

default_value: None

searchable_values: None

training_loop_func

Customized training loop function.

type: Callable | str

default_value: None

searchable_values: None

ptl_module

LightningModule for PyTorch Lightning trainer. It is a way of encapsulating all the logic related to the training, validation, and testing of a PyTorch model. Please refer to https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html for more details.

type: Callable | str

default_value: None

searchable_values: None

ptl_data_module

LightningDataModule for PyTorch Lightning trainer. It is a way of encapsulating all the data-related logic for training, validation, and testing of a PyTorch model. Please refer to https://pytorch-lightning.readthedocs.io/en/stable/data/datamodule.html for more details.

type: Callable | str

default_value: None

searchable_values: None

train_batch_size

Batch size for training.

type: int

default_value: None

searchable_values: None

num_epochs

Maximum number of epochs for training.

type: int

default_value: None

searchable_values: None

num_steps

Maximum number of steps for training.

type: int

default_value: -1

searchable_values: None

do_validate

Whether to perform one evaluation epoch over the validation set after training.

type: bool

default_value: False

searchable_values: None

modules_to_fuse

List of list of module names to fuse.

type: List[List[str]]

default_value: None

searchable_values: None

qconfig_func

Customized function to create a QConfig for QAT. Please refer to https://pytorch.org/docs/stable/generated/torch.quantization.qconfig.QConfig.html for details.

type: Callable | str

default_value: None

searchable_values: None

logger

Logger for training.

type: pytorch_lightning.loggers.logger.Logger | Iterable[pytorch_lightning.loggers.logger.Logger] | Callable | bool

default_value: False

searchable_values: None

gpus

Number of GPUs to use.

type: int

default_value: None

searchable_values: None

seed

Random seed for training.

type: int

default_value: None

searchable_values: None

checkpoint_path

Path to save checkpoints.

type: str

default_value: None

searchable_values: None
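
QAT relies on user-provided training hooks. The sketch below is a hypothetical user script and pass entry: the file name, function name, in-memory toy dataset, and return type (a torch DataLoader) are assumptions chosen for illustration:

    # user_script.py (hypothetical)
    import torch
    from torch.utils.data import DataLoader, TensorDataset

    def train_dataloader(data_dir, batch_size, *args, **kwargs):
        # Toy in-memory dataset standing in for data loaded from data_dir.
        x = torch.randn(256, 3, 32, 32)
        y = torch.randint(0, 10, (256,))
        return DataLoader(TensorDataset(x, y), batch_size=batch_size, shuffle=True)

    # Hypothetical pass entry referencing the function above by name.
    qat_pass = {
        "type": "QuantizationAwareTraining",
        "config": {
            "user_script": "user_script.py",
            "train_dataloader_func": "train_dataloader",
            "train_batch_size": 32,
            "num_epochs": 3,
            "do_validate": False,
        },
    }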

OpenVINOConversion

Converts a PyTorch, ONNX or TensorFlow model to an OpenVINO model.

Input: PyTorchModel | ONNXModel

Output: OpenVINOModel

input

Input can be set by passing a list of tuples. Each tuple should contain input name and optionally input type or input shape.

type: List[Tuple]

default_value: None

searchable_values: None

input_shape

Input shape(s) that should be fed to an input node(s) of the model. Shape is defined as a comma-separated list of integer numbers enclosed in parentheses or square brackets, for example [1,3,227,227].

type: List[int]

default_value: None

searchable_values: None

extra_config

Extra configurations for OpenVINO model conversion. extra_config can be set by passing a dictionary where key is the parameter name, and the value is the parameter value. Please check ‘mo’ command usage instruction for available parameters: https://docs.openvino.ai/latest/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html

type: Dict

default_value: None

searchable_values: None

OpenVINOQuantization

Post-training quantization for OpenVINO model. Please refer to https://docs.openvino.ai/latest/pot_introduction.html for more details.

Input: OpenVINOModel

Output: OpenVINOModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

engine_config

Specific config for openvino.tools.pot.IEEngine. ‘engine_config’ can be set by passing a dictionary, for example engine_config: {‘device’: ‘CPU’}

type: Dict

required: True

dataloader_func

A callable function or a str of the function name from ‘user_script’ for the instance of the dataloader.

type: Callable | str

default_value: None

searchable_values: None

data_dir

Dataset path. ‘data_dir’ can be a str or pathlib.Path.

type: pathlib.Path | str

default_value: None

searchable_values: None

batch_size

Batch size for the dataloader.

type: int

default_value: 1

searchable_values: None

metric_func

A callable function or a str of the function name from ‘user_script’ for Metric instance to calculate the accuracy metric of the model.

type: Callable | str

default_value: None

searchable_values: None

algorithms

A list defining optimization algorithms and their parameters included in the optimization pipeline. The order in which they are applied to the model in the optimization pipeline is determined by the order in the list. For example: algorithms: [{‘name’: ‘DefaultQuantization’, ‘params’: {‘preset’: ‘performance’, ‘stat_subset_size’: 500},}]

type: List[Dict]

required: True
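
A hedged sketch of an OpenVINOQuantization entry; engine_config and algorithms are the two required options, and the values shown mirror the examples given in the descriptions above. The "type"/"config" structure, dataloader function name, and data path are assumptions:

    # Hypothetical OpenVINO post-training quantization entry.
    ov_quant_pass = {
        "type": "OpenVINOQuantization",
        "config": {
            "user_script": "user_script.py",
            "dataloader_func": "ov_dataloader",   # hypothetical function name
            "data_dir": "data/calibration",
            "engine_config": {"device": "CPU"},
            "algorithms": [
                {
                    "name": "DefaultQuantization",
                    "params": {"preset": "performance", "stat_subset_size": 500},
                }
            ],
        },
    }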

SNPEConversion

Convert ONNX or TensorFlow model to SNPE DLC. Uses snpe-tensorflow-to-dlc or snpe-onnx-to-dlc tools from the SNPE SDK.

Input: ONNXModel | TensorFlowModel

Output: SNPEModel

input_names

List of input names.

type: List[str]

required: True

input_shapes

List of input shapes. Must be the same length as input_names.

type: List[List[int]]

required: True

output_names

List of output names.

type: List[str]

required: True

output_shapes

List of output shapes. Must be the same length as output_names.

type: List[List[int]]

required: True

input_types

List of input types. If not None, it must be a list of the same length as input_names. List members can be None to use default value. Refer to olive.snpe.constants.InputType for valid values.

type: List[str | None]

default_value: None

searchable_values: None

input_layouts

List of input layouts. If not None, it must be a list of the same length as input_names. List members can be None to use the inferred value. Refer to olive.snpe.constants.InputLayout for valid values.

type: List[str | None]

default_value: None

searchable_values: None

extra_args

Extra arguments to pass to the SNPE conversion tool. Refer to snpe-onnx-to-dlc and snpe-tensorflow-to-dlc at https://developer.qualcomm.com/sites/default/files/docs/snpe/tools.html for additional arguments. Must be a dictionary of the form: {‘arg_name’: ‘arg_value’}.

type: str

default_value: None

searchable_values: None
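
A hedged sketch of an SNPEConversion entry; the "type"/"config" structure is assumed, and all four required list parameters are shown with hypothetical names and shapes for a single-input image model:

    # Hypothetical SNPE DLC conversion entry.
    snpe_conversion_pass = {
        "type": "SNPEConversion",
        "config": {
            "input_names": ["input"],            # hypothetical input name
            "input_shapes": [[1, 224, 224, 3]],  # one shape per input name
            "output_names": ["output"],          # hypothetical output name
            "output_shapes": [[1, 1000]],        # one shape per output name
        },
    }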

SNPEQuantization

Quantize SNPE model. Uses snpe-dlc-quantize tool from the SNPE SDK.

Input: SNPEModel

Output: SNPEModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

data_dir

Path to the data directory.

type: str

required: True

dataloader_func

Function or function name to create a dataloader for quantization. The function should take the data directory as an argument and return an olive.snpe.SNPEDataLoader object.

type: Callable[[str], olive.snpe.data_loader.SNPEDataLoader] | str

required: True

use_enhanced_quantizer

Use the enhanced quantizer feature when quantizing the model. Uses an algorithm to determine optimal range instead of min and max range of data. It can be useful for quantizing models that have long tails in the distribution of the data being quantized.

type: bool

default_value: False

searchable_values: Categorical([True, False])

enable_htp

Pack HTP information in quantized DLC.

type: bool

default_value: False

searchable_values: Categorical([True, False])

htp_socs

List of SoCs to generate HTP Offline cache for.

type: List[str]

default_value: None

searchable_values: None

extra_args

Extra arguments to pass to the SNPE quantization tool. Refer to https://developer.qualcomm.com/sites/default/files/docs/snpe/tools.html#tools_snpe-dlc-quantize for additional arguments. Must be a dictionary of the form: {‘arg_name’: ‘arg_value’}.

type: str

default_value: None

searchable_values: None

SNPEtoONNXConversion

Convert an SNPE DLC to ONNX for use with the SNPE Execution Provider. Creates an ONNX graph with the SNPE DLC as a node.

Input: SNPEModel

Output: ONNXModel

target_device

Target device for the ONNX model. Refer to olive.snpe.SNPEDevice for valid values.

type: str

default_value: cpu

searchable_values: None

target_opset

Target ONNX opset version.

type: int

default_value: 12

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None

VitisAIQuantization

Quantize an ONNX model with onnxruntime, searching for the best vai_q_onnx quantization parameters at the same time.

Input: ONNXModel

Output: ONNXModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

quant_mode

ONNX quantization mode. ‘static’ for Vitis AI quantization.

type: str

default_value: static

searchable_values: Categorical([‘static’])

data_dir

Path to the directory containing the dataset.

type: pathlib.Path | str

default_value: None

searchable_values: None

batch_size

Batch size for calibration, required.

type: int

default_value: 1

searchable_values: None

dataloader_func

Function/function name to generate a dataloader for calibration; required.

type: Callable | str

required: True

weight_type

Data type for quantizing weights, used in vai_q_onnx quantization. ‘QInt8’ for signed 8-bit integer.

type: str

default_value: QInt8

searchable_values: Categorical([‘QInt8’])

input_nodes

Start nodes for quantization. If None, all quantizable nodes are included.

type: list

default_value: None

searchable_values: None

output_nodes

End nodes for quantization. If None, all quantizable nodes are included.

type: list

default_value: None

searchable_values: None

op_types_to_quantize

List of operator types to quantize. If None, all quantizable.

type: list

default_value: None

searchable_values: None

nodes_to_quantize

List of node names to quantize. If None, all quantizable.

type: list

default_value: None

searchable_values: None

nodes_to_exclude

List of node names to exclude from quantization. If None, no nodes are excluded.

type: list

default_value: None

searchable_values: None

optimize_model

Deprecating Soon in ONNX! Optimize model before quantization. NOT recommended, optimization will change the computation graph, making debugging of quantization loss difficult.

type: bool

default_value: False

searchable_values: Categorical([True, False])

use_external_data_format

Option used for large (>2GB) models. Set to False by default.

type: bool

default_value: False

searchable_values: None

quant_preprocess

Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing

type: bool

default_value: True

searchable_values: Categorical([True, False])

calibrate_method

Current calibration methods supported are NonOverflow and MinMSE. Please use NonOverflow or MinMSE as options.

type: str

default_value: MinMSE

searchable_values: Categorical([‘NonOverflow’, ‘MinMSE’])

quant_format

QDQ format quantizes the model by inserting QuantizeLinear/DeQuantizeLinear nodes on the tensors.

type: str

default_value: QDQ

searchable_values: Categorical([‘QDQ’])

activation_type

Quantization data type of activation.

type: str

default_value: QInt8

searchable_values: Conditional(parents: (‘quant_format’, ‘weight_type’), support: {(‘QDQ’, ‘QInt8’): Categorical([‘QInt8’])}, default: Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>]))

ActivationSymmetric

symmetrize calibration data for activations

type: bool

default_value: True

searchable_values: None

WeightSymmetric

symmetrize calibration data for weights

type: bool

default_value: True

searchable_values: None

AddQDQPairToWeight

Keeps the weight in floating point and inserts both QuantizeLinear and DeQuantizeLinear nodes for the weight.

type: bool

default_value: True

searchable_values: None

extra_options

Key value pair dictionary for extra_options in quantization. If an option is one of [‘ActivationSymmetric’, ‘WeightSymmetric’, ‘AddQDQPairToWeight’], it will be overwritten by the corresponding config parameter value.

type: dict

default_value: None

searchable_values: None
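
A hedged sketch of a VitisAIQuantization entry; the "type"/"config" structure is assumed, the script name, dataloader function name, and data path are hypothetical, and dataloader_func is required:

    # Hypothetical Vitis AI quantization entry.
    vitis_quant_pass = {
        "type": "VitisAIQuantization",
        "config": {
            "user_script": "user_script.py",
            "dataloader_func": "calib_dataloader",
            "data_dir": "data/calibration",
            "calibrate_method": "MinMSE",   # or "NonOverflow"
            "weight_type": "QInt8",
            "activation_type": "QInt8",
        },
    }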

OptimumConversion

Convert an Optimum model to an ONNX model using the Optimum export function.

Input: OptimumModel

Output: ONNXModel | CompositeOnnxModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

target_opset

The version of the default (ai.onnx) opset to target.

type: int

default_value: 14

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None

OptimumMerging

Merges a decoder_model with its decoder_with_past_model via the Optimum library.

Input: CompositeOnnxModel

Output: ONNXModel | CompositeOnnxModel

execution_provider

Target execution provider. This parameter will be removed when accelerators/targets are visible to passes.

type: str

default_value: None

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None