Passes

The following passes are available in Olive.

Each pass is followed by a description of the pass and a list of the pass’s configuration options.

OnnxConversion

Convert a PyTorch model to an ONNX model using torch.onnx.export.

Input: PyTorchModel

Output: ONNXModel

script_dir

Directory containing user script dependencies.

type: str

default: None

default_search: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default: None

default_search: None

input_names

List of input names.

type: List[str]

required: True

input_shapes

List of input shapes. Must be provided if input_tensor_func is not provided. Used to create dummy inputs for the model during ONNX export.

type: List[List[int]]

default: None

default_search: None

input_types

List of input types. If provided, must be the same length as input_shapes. Otherwise, defaults to float32 for all inputs. Used with input_shapes to create dummy inputs for the model during ONNX export.

type: List[str]

default: None

default_search: None

input_tensor_func

Function (taking no arguments) that creates dummy inputs for the model. Can be a function (local use) or the name of a function to be imported from the user script. If provided, input_shapes and input_types will be ignored. Refer to ‘args’ at https://pytorch.org/docs/stable/onnx.html#torch.onnx.export for more details.

type: Callable | str

default: None

default_search: None
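The precedence between input_tensor_func, input_shapes and input_types described above can be sketched in plain Python. The helper name plan_dummy_inputs and its return structure are hypothetical and chosen only for illustration. This is not Olive's implementation, and it describes the dummy inputs rather than creating tensors.

```python
from typing import Callable, List, Optional


def plan_dummy_inputs(
    input_shapes: Optional[List[List[int]]] = None,
    input_types: Optional[List[str]] = None,
    input_tensor_func: Optional[Callable] = None,
) -> dict:
    """Describe where dummy inputs would come from (hypothetical helper)."""
    if input_tensor_func is not None:
        # When a function is given, input_shapes and input_types are ignored.
        return {"source": "input_tensor_func", "specs": None}
    if input_shapes is None:
        raise ValueError("input_shapes is required when input_tensor_func is not provided")
    if input_types is None:
        # No types given: every input defaults to float32.
        input_types = ["float32"] * len(input_shapes)
    if len(input_types) != len(input_shapes):
        raise ValueError("input_types must be the same length as input_shapes")
    return {"source": "input_shapes", "specs": list(zip(input_shapes, input_types))}


print(plan_dummy_inputs(input_shapes=[[1, 3, 224, 224]]))
```

Raising early on a length mismatch keeps the error close to the configuration mistake instead of surfacing it later during export.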

output_names

List of output names.

type: List[str]

required: True

dynamic_axes

Dynamic axes for the model. Refer to ‘dynamic_axes’ at https://pytorch.org/docs/stable/onnx.html#torch.onnx.export for more details.

type: dict

default: None

default_search: None

target_opset

The version of the default (ai.onnx) opset to target.

type: int

default: 14

default_search: None
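As a rough illustration, the OnnxConversion options could be collected into a Python dictionary like the one below. The values (tensor names, shapes, the "batch" axis label) are made up, and how the dictionary is handed to Olive is not covered on this page, so the surrounding structure is left out. The helper unknown_dynamic_axes is hypothetical. It reflects the torch.onnx.export convention that dynamic_axes keys refer to names listed in input_names or output_names.

```python
# Hypothetical option values for OnnxConversion; the option names come from this page.
onnx_conversion_config = {
    "input_names": ["input"],
    "input_shapes": [[1, 3, 224, 224]],
    "input_types": ["float32"],
    "output_names": ["output"],
    # Same layout as the dynamic_axes argument of torch.onnx.export:
    # tensor name -> {axis index: axis label}.
    "dynamic_axes": {"input": {0: "batch"}, "output": {0: "batch"}},
    "target_opset": 14,
}


def unknown_dynamic_axes(config: dict) -> list:
    """Return dynamic_axes keys that are not declared input or output names."""
    declared = set(config["input_names"]) | set(config["output_names"])
    dynamic_axes = config.get("dynamic_axes") or {}
    return sorted(name for name in dynamic_axes if name not in declared)


print(unknown_dynamic_axes(onnx_conversion_config))
```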

OnnxModelOptimizer

Optimize ONNX model by fusing nodes.

Input: ONNXModel

Output: ONNXModel

OrtTransformersOptimization

Optimize transformer-based models in scenarios where ONNX Runtime does not apply the optimization at load time. It is based on onnxruntime.transformers.optimizer.

Input: ONNXModel

Output: ONNXModel

model_type

Transformer-based model type, including bert (exported by PyTorch), gpt2 (exported by PyTorch), bert_tf (BERT exported by tf2onnx), and bert_keras (BERT exported by keras2onnx).

type: str

required: True

num_heads

Number of attention heads.

type: int

default: 0

default_search: None

hidden_size

Number of hidden nodes.

type: int

default: 0

default_search: None

optimization_options

Optimization options that turn on/off some fusions.

type: Any

default: None

default_search: None

opt_level

Graph optimization level of ONNX Runtime: 0 - disable all (default), 1 - basic, 2 - extended, 99 - all.

type: Any

default: None

default_search: None
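The opt_level values listed above can be kept in a small lookup table so that a configuration is checked before it is used. The helper describe_opt_level is hypothetical. What happens when opt_level is left at None is not stated on this page, so the sketch only reports that the value is unset.

```python
# Level meanings as listed in the opt_level description above.
OPT_LEVELS = {0: "disable all", 1: "basic", 2: "extended", 99: "all"}


def describe_opt_level(opt_level=None) -> str:
    """Return the documented meaning of an opt_level value (hypothetical helper)."""
    if opt_level is None:
        return "not set"
    if opt_level not in OPT_LEVELS:
        raise ValueError(f"opt_level must be one of {sorted(OPT_LEVELS)}, got {opt_level!r}")
    return OPT_LEVELS[opt_level]


for level in (None, 0, 1, 2, 99):
    print(level, "->", describe_opt_level(level))
```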

use_gpu

Flag for GPU inference.

type: bool

default: False

default_search: None

only_onnxruntime

Whether to use only onnxruntime to optimize the model, with no Python fusion.

type: bool

default: False

default_search: None

float16

Whether half-precision float will be used.

type: bool

default: False

default_search: None

input_int32

Whether int32 tensors will be used as input.

type: bool

default: False

default_search: None

use_external_data_format

Whether to use the external data format to store a large model (>2GB).

type: bool

default: False

default_search: None

OrtPerfTuning

Optimize ONNX Runtime inference settings.

Input: ONNXModel

Output: ONNXModel

script_dir

Directory containing user script dependencies.

type: str

default: None

default_search: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default: None

default_search: None

data_dir

Directory of sample inference data.

type: pathlib.Path | str

default: None

default_search: None

dataloader_func

Dataloader function to load data from given data_dir with given batch size.

type: Callable | str

required: True

batch_size

Batch size for inference.

type: int

required: True

device

Device selected for tuning process.

type: str

default: cpu

default_search: None

cpu_cores

CPU cores used for thread tuning.

type: int

default: None

default_search: None

io_bind

Whether to enable IOBinding for ONNX Runtime inference.

type: bool | List[bool]

default: False

default_search: None

providers_list

List of execution providers to use when running the ONNX model.

type: list

default: None

default_search: None

execution_mode_list

List of execution modes (parallelism between operators) to try.

type: list

default: None

default_search: None

opt_level_list

Optimization level list for ONNX model.

type: list

default: None

default_search: None

trt_fp16_enable

Whether to enable FP16 mode for the TensorRT execution provider.

type: bool

default: False

default_search: None

intra_thread_num_list

List of intra-thread numbers to test.

type: list

default: [None]

default_search: None

inter_thread_num_list

List of inter-thread numbers to test.

type: list

default: [None]

default_search: None

extra_session_config

Extra customized session options during tuning process.

type: Dict[str, Any]

default: None

default_search: None
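Several OrtPerfTuning options are lists of candidate values. One natural reading is that tuning tries combinations of them, and the sketch below enumerates the full Cartesian product to show how quickly the number of runs grows. Whether Olive enumerates every combination or prunes some is not stated on this page. All values below are illustrative placeholders, and enumerate_candidates is a hypothetical helper.

```python
from itertools import product

# Option names follow this page; the candidate values are placeholders.
tuning_options = {
    "providers_list": ["CPUExecutionProvider"],
    "execution_mode_list": [0, 1],
    "opt_level_list": [0, 1, 2, 99],
    "intra_thread_num_list": [None, 4],
    "inter_thread_num_list": [None],
    "io_bind": [False, True],
}


def enumerate_candidates(options: dict) -> list:
    """Return every combination of the candidate values as a list of dicts."""
    keys = list(options)
    value_lists = [options[key] for key in keys]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


candidates = enumerate_candidates(tuning_options)
print(len(candidates), "combinations to benchmark")
```

Because the count is the product of the list lengths, trimming even one list shortens a tuning run considerably.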

OnnxDynamicQuantization

ONNX Dynamic Quantization Pass

Input: ONNXModel

Output: ONNXModel

quant_mode

Dynamic quantization mode.

type: str

default: dynamic

default_search: None

weight_type

Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.

type: str

default: QInt8

default_search: Categorical([‘QInt8’, ‘QUInt8’])

op_types_to_quantize

List of operator types to quantize. If None, all quantizable operator types are quantized.

type: list

default: None

default_search: None

nodes_to_quantize

List of node names to quantize. If None, all quantizable nodes are quantized.

type: list

default: None

default_search: None

nodes_to_exclude

List of node names to exclude from quantization. If None, no nodes are excluded.

type: list

default: None

default_search: None

per_channel

Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default: False

default_search: Categorical([True, False])

reduce_range

Quantize weights with 7 bits. It may improve the accuracy for some models running on non-VNNI machines, especially in per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default: False

default_search: Categorical([True, False])
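To make the effect of reduce_range concrete, the arithmetic below compares the nominal value range of an 8-bit integer with that of a 7-bit integer, for the signed (QInt8) and unsigned (QUInt8) cases. This is plain bit-width arithmetic under the assumption of standard two's-complement and unsigned encodings. The exact clamping limits onnxruntime applies when reduce_range is enabled may differ slightly, and int_range is a hypothetical helper.

```python
def int_range(bits: int, signed: bool) -> tuple:
    """Return (min, max) for a two's-complement or unsigned integer of the given width."""
    if signed:
        return (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return (0, 2 ** bits - 1)


for name, signed in (("QInt8", True), ("QUInt8", False)):
    full = int_range(8, signed)
    reduced = int_range(7, signed)
    print(f"{name}: 8-bit range {full}, 7-bit range {reduced}")
```

Halving the number of representable levels is the trade-off: the smaller range leaves headroom in integer accumulation on some hardware, at the cost of coarser weight resolution.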

optimize_model

To be deprecated soon in ONNX. Optimize the model before quantization. Not recommended: optimization changes the computation graph, which makes quantization loss difficult to debug.

type: bool

default: False

default_search: Categorical([True, False])

use_external_data_format

Option used for large (>2GB) models. Set to False by default.

type: bool

default: False

default_search: None

quant_preprocess

Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing

type: bool

default: True

default_search: Categorical([True, False])

OnnxStaticQuantization

ONNX Static Quantization Pass

Input: ONNXModel

Output: ONNXModel

script_dir

Directory containing user script dependencies.

type: str

default: None

default_search: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default: None

default_search: None

quant_mode

Static quantization mode.

type: str

default: static

default_search: None

weight_type

Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.

type: str

default: QInt8

default_search: Categorical([‘QInt8’, ‘QUInt8’])

op_types_to_quantize

List of operator types to quantize. If None, all quantizable operator types are quantized.

type: list

default: None

default_search: None

nodes_to_quantize

List of node names to quantize. If None, all quantizable nodes are quantized.

type: list

default: None

default_search: None

nodes_to_exclude

List of node names to exclude from quantization. If None, no nodes are excluded.

type: list

default: None

default_search: None

per_channel

Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default: False

default_search: Categorical([True, False])

reduce_range

Quantize weights with 7 bits. It may improve the accuracy for some models running on non-VNNI machines, especially in per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default: False

default_search: Categorical([True, False])

optimize_model

To be deprecated soon in ONNX. Optimize the model before quantization. Not recommended: optimization changes the computation graph, which makes quantization loss difficult to debug.

type: bool

default: False

default_search: Categorical([True, False])

use_external_data_format

Option used for large (>2GB) models. Set to False by default.

type: bool

default: False

default_search: None

quant_preprocess

Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing

type: bool

default: True

default_search: Categorical([True, False])

data_dir

Path to the directory containing the dataset. For local data, it is required if quant_mode is ‘static’.

type: pathlib.Path | str

default: None

default_search: None

batch_size

Batch size for calibration, required if quant_mode is ‘static’.

type: int

default: 1

default_search: None

dataloader_func

Function or function name that generates the dataloader for calibration. Required if quant_mode is ‘static’.

type: Callable | str

required: True

calibrate_method

The calibration methods currently supported are MinMax and Entropy. Use CalibrationMethod.MinMax or CalibrationMethod.Entropy as options.

type: str

default: MinMax

default_search: Categorical([‘MinMax’, ‘Entropy’, ‘Percentile’])

quant_format

QOperator format quantizes the model with quantized operators directly. QDQ format quantizes the model by inserting QuantizeLinear/DeQuantizeLinear on the tensor.

type: str

default: QDQ

default_search: Categorical([‘QOperator’, ‘QDQ’])

activation_type

Quantization data type of activation. Please refer to https://onnxruntime.ai/docs/performance/quantization.html for more details on data type selection.

type: str

default: QInt8

default_search: Conditional(parents: (‘quant_format’,), support: {(‘QDQ’,): Categorical([‘QInt8’, ‘QUInt8’]), (‘QOperator’,): Categorical([‘QInt8’])}, default: Categorical([None]))
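The Conditional search entry for activation_type says that the allowed values depend on quant_format. Read as a lookup table, it can be sketched as below. The table contents are copied from the default_search entry above. The helper allowed_activation_types is hypothetical and does not reproduce how Olive evaluates conditional search spaces internally.

```python
# Support table copied from the default_search entry for activation_type.
ACTIVATION_TYPES_BY_FORMAT = {
    "QDQ": ["QInt8", "QUInt8"],
    "QOperator": ["QInt8"],
}


def allowed_activation_types(quant_format: str) -> list:
    """Return the activation types searched for a given quant_format."""
    if quant_format not in ACTIVATION_TYPES_BY_FORMAT:
        raise ValueError(f"unknown quant_format: {quant_format!r}")
    # Return a copy so callers cannot modify the table.
    return list(ACTIVATION_TYPES_BY_FORMAT[quant_format])


for fmt in ("QDQ", "QOperator"):
    print(fmt, "->", allowed_activation_types(fmt))
```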

OnnxQuantization

Quantize an ONNX model with onnxruntime, searching for the best parameters for static and dynamic quantization at the same time.

Input: ONNXModel

Output: ONNXModel

script_dir

Directory containing user script dependencies.

type: str

default: None

default_search: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default: None

default_search: None

quant_mode

Onnx Quantization mode. ‘dynamic’ for dynamic quantization, ‘static’ for static quantization.

type: str

default: static

default_search: Categorical([‘dynamic’, ‘static’])

weight_type

Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.

type: str

default: QInt8

default_search: Categorical([‘QInt8’, ‘QUInt8’])

op_types_to_quantize

List of operator types to quantize. If None, all quantizable operator types are quantized.

type: list

default: None

default_search: None

nodes_to_quantize

List of node names to quantize. If None, all quantizable nodes are quantized.

type: list

default: None

default_search: None

nodes_to_exclude

List of node names to exclude from quantization. If None, no nodes are excluded.

type: list

default: None

default_search: None

per_channel

Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default: False

default_search: Categorical([True, False])

reduce_range

Quantize weights with 7 bits. It may improve the accuracy for some models running on non-VNNI machines, especially in per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default: False

default_search: Categorical([True, False])

optimize_model

To be deprecated soon in ONNX. Optimize the model before quantization. Not recommended: optimization changes the computation graph, which makes quantization loss difficult to debug.

type: bool

default: False

default_search: Categorical([True, False])

use_external_data_format

Option used for large (>2GB) models. Set to False by default.

type: bool

default: False

default_search: None

quant_preprocess

Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing

type: bool

default: True

default_search: Categorical([True, False])

data_dir

Path to the directory containing the dataset. For local data, it is required if quant_mode is ‘static’.

type: pathlib.Path | str

default: None

default_search: None

batch_size

Batch size for calibration, required if quant_mode is ‘static’.

type: int

default: 1

default_search: None

dataloader_func

Function or function name that generates the dataloader for calibration. Required if quant_mode is ‘static’.

type: Callable | str

required: True

calibrate_method

The calibration methods currently supported are MinMax and Entropy. Use CalibrationMethod.MinMax or CalibrationMethod.Entropy as options.

type: str

default: MinMax

default_search: Conditional(parents: (‘quant_mode’,), support: {(‘static’,): Categorical([‘MinMax’, ‘Entropy’, ‘Percentile’])}, default: Categorical([‘Invalid’]))

quant_format

QOperator format quantizes the model with quantized operators directly. QDQ format quantizes the model by inserting QuantizeLinear/DeQuantizeLinear on the tensor.

type: str

default: QDQ

default_search: Conditional(parents: (‘quant_mode’,), support: {(‘static’,): Categorical([‘QOperator’, ‘QDQ’])}, default: Categorical([‘Invalid’]))

activation_type

Quantization data type of activation. Please refer to https://onnxruntime.ai/docs/performance/quantization.html for more details on data type selection.

type: str

default: QInt8

default_search: Conditional(parents: (‘quant_mode’, ‘quant_format’), support: {(‘static’, ‘QDQ’): Categorical([‘QInt8’, ‘QUInt8’]), (‘static’, ‘QOperator’): Categorical([‘QInt8’])}, default: Categorical([‘Invalid’]))
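In OnnxQuantization, calibrate_method, quant_format and activation_type are all conditional on quant_mode, and activation_type also depends on quant_format. The sketch below enumerates the joint search points implied by the default_search entries above for these four options only. Other searchable options such as weight_type and per_channel are left out. The function name and the dictionary layout are hypothetical, and 'Invalid' is the fallback value shown in the entries above.

```python
CALIBRATE_METHODS = ["MinMax", "Entropy", "Percentile"]
ACTIVATION_TYPES = {"QDQ": ["QInt8", "QUInt8"], "QOperator": ["QInt8"]}


def enumerate_search_points() -> list:
    """List the (quant_mode, calibrate_method, quant_format, activation_type) points."""
    points = [
        # 'dynamic' has no support entry, so each conditional falls back to 'Invalid'.
        {
            "quant_mode": "dynamic",
            "calibrate_method": "Invalid",
            "quant_format": "Invalid",
            "activation_type": "Invalid",
        }
    ]
    for calibrate_method in CALIBRATE_METHODS:
        for quant_format, activation_types in ACTIVATION_TYPES.items():
            for activation_type in activation_types:
                points.append(
                    {
                        "quant_mode": "static",
                        "calibrate_method": calibrate_method,
                        "quant_format": quant_format,
                        "activation_type": activation_type,
                    }
                )
    return points


points = enumerate_search_points()
print(len(points), "search points")
```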

QuantizationAwareTraining

Run quantization aware training on PyTorch model.

Input: PyTorchModel

Output: PyTorchModel

script_dir

Directory containing user script dependencies.

type: str

default: None

default_search: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default: None

default_search: None

train_data_dir

Directory of training data.

type: str

default: None

default_search: None

val_data_dir

Directory of validation data.

type: str

default: None

default_search: None

train_dataloader_func

Dataloader function to load training data from given train_data_dir with given train_batch_size.

type: Callable | str

default: None

default_search: None

training_loop_func

Customized training loop function.

type: Callable | str

default: None

default_search: None

ptl_module

LightningModule for PyTorch Lightning trainer. It is a way of encapsulating all the logic related to the training, validation, and testing of a PyTorch model. Please refer to https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html for more details.

type: Callable | str

default: None

default_search: None

ptl_data_module

LightningDataModule for PyTorch Lightning trainer. It is a way of encapsulating all the data-related logic for training, validation, and testing of a PyTorch model. Please refer to https://pytorch-lightning.readthedocs.io/en/stable/data/datamodule.html for more details.

type: Callable | str

default: None

default_search: None

train_batch_size

Batch size for training.

type: int

default: None

default_search: None

num_epochs

Maximum number of epochs for training.

type: int

default: None

default_search: None

num_steps

Maximum number of steps for training.

type: int

default: -1

default_search: None

do_validate

Whether to perform one evaluation epoch over the validation set after training.

type: bool

default: False

default_search: None

modules_to_fuse

List of lists of module names to fuse.

type: List[List[str]]

default: None

default_search: None
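A modules_to_fuse value is a nested list in which each inner list names the modules of one fusion group. The sketch below shows such a value with made-up module names, together with two sanity checks. The checks (at least two modules per group, no module named twice) are assumptions about what a sensible value looks like, not validation that Olive is documented to perform, and check_fusion_groups is a hypothetical helper.

```python
# Hypothetical module names; the nested-list layout follows the List[List[str]] type above.
modules_to_fuse = [["conv1", "bn1", "relu1"], ["conv2", "bn2"]]


def check_fusion_groups(groups: list) -> list:
    """Return a list of problems found in a modules_to_fuse value; empty if none."""
    problems = []
    seen = set()
    for index, group in enumerate(groups):
        if len(group) < 2:
            problems.append(f"group {index} has fewer than two modules")
        for name in group:
            if name in seen:
                problems.append(f"module {name!r} appears in more than one place")
            seen.add(name)
    return problems


print(check_fusion_groups(modules_to_fuse))
```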

input_shapes

List of input shapes. It is used to create dummy input for PyTorch model tracing.

type: List[List[int]]

required: True

input_types

List of input types. It is used to create dummy input for PyTorch model tracing.

type: List[str]

default: None

default_search: None

qconfig_func

Customized function to create a QConfig for QAT. Please refer to https://pytorch.org/docs/stable/generated/torch.quantization.qconfig.QConfig.html for details.

type: Callable | str

default: None

default_search: None

logger

Logger for training.

type: pytorch_lightning.loggers.logger.Logger | Iterable[pytorch_lightning.loggers.logger.Logger] | Callable | bool

default: False

default_search: None

gpus

Number of GPUs to use.

type: int

default: None

default_search: None

seed

Random seed for training.

type: int

default: None

default_search: None

OpenVINOConversion

Converts a PyTorch, ONNX or TensorFlow model to an OpenVINO model.

Input: PyTorchModel | ONNXModel

Output: OpenVINOModel

input

Input can be set by passing a list of tuples. Each tuple should contain input name and optionally input type or input shape.

type: List[Tuple]

default: None

default_search: None

input_shape

Input shape(s) that should be fed to an input node(s) of the model. Shape is defined as a comma-separated list of integer numbers enclosed in parentheses or square brackets, for example [1,3,227,227].

type: List[int]

default: None

default_search: None

extra_config

Extra configurations for OpenVINO model conversion. extra_config can be set by passing a dictionary where the key is the parameter name and the value is the parameter value. Please check the ‘mo’ command usage instructions for available parameters: https://docs.openvino.ai/latest/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html

type: Dict

default: None

default_search: None

OpenVINOQuantization

Post-training quantization for OpenVINO model. Please refer to https://docs.openvino.ai/latest/pot_introduction.html for more details.

Input: OpenVINOModel

Output: OpenVINOModel

script_dir

Directory containing user script dependencies.

type: str

default: None

default_search: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default: None

default_search: None

engine_config

Specific config for openvino.tools.pot.IEEngine. ‘engine_config’ can be set by passing a dictionary, for example engine_config = {'device': 'CPU'}

type: Dict

required: True

dataloader_func

A callable function or a str of the function name from ‘user_script’ for the instance of the dataloader.

type: Callable | str

default: None

default_search: None

data_dir

Dataset path. ‘data_dir’ can be a str or pathlib.Path.

type: pathlib.Path | str

default: None

default_search: None

batch_size

Batch size for the dataloader.

type: int

default: 1

default_search: None

metric_func

A callable function or a str of the function name from ‘user_script’ for Metric instance to calculate the accuracy metric of the model.

type: Callable | str

default: None

default_search: None

algorithms

A list defining optimization algorithms and their parameters included in the optimization pipeline. The order in which they are applied to the model in the optimization pipeline is determined by the order in the list. Example: algorithms = [{'name': 'DefaultQuantization', 'params': {'preset': 'performance', 'stat_subset_size': 500}}]

type: List[Dict]

required: True
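Putting the two required OpenVINOQuantization options together, a configuration could look like the dictionary below. The engine_config and algorithms values are the examples from the option descriptions above, written with straight quotes so they are valid Python. The missing_required helper is hypothetical, and how the dictionary is passed to Olive is not covered on this page.

```python
# Values taken from the examples in the option descriptions above.
openvino_quantization_config = {
    "engine_config": {"device": "CPU"},
    "algorithms": [
        {
            "name": "DefaultQuantization",
            "params": {"preset": "performance", "stat_subset_size": 500},
        }
    ],
    "batch_size": 1,
}

# Options marked "required: True" for this pass.
REQUIRED_OPTIONS = ("engine_config", "algorithms")


def missing_required(config: dict) -> list:
    """Return the required option names that are absent from config."""
    return [name for name in REQUIRED_OPTIONS if name not in config]


print(missing_required(openvino_quantization_config))
```

Because algorithms is an ordered list, adding a second entry after DefaultQuantization would apply it after the first, as the option description states.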

SNPEConversion

Convert ONNX or TensorFlow model to SNPE DLC. Uses snpe-tensorflow-to-dlc or snpe-onnx-to-dlc tools from the SNPE SDK.

Input: ONNXModel | TensorFlowModel

Output: SNPEModel

input_names

List of input names.

type: List[str]

required: True

input_shapes

List of input shapes. Must be the same length as input_names.

type: List[List[int]]

required: True

output_names

List of output names.

type: List[str]

required: True

output_shapes

List of output shapes. Must be the same length as output_names.

type: List[List[int]]

required: True

input_types

List of input types. If not None, it must be a list of the same length as input_names. List members can be None to use default value. Refer to olive.snpe.constants.InputType for valid values.

type: List[str | None]

default: None

default_search: None

input_layouts

List of input layouts. If not None, it must be a list of the same length as input_names. List members can be None to use the inferred value. Refer to olive.snpe.constants.InputLayout for valid values.

type: List[str | None]

default: None

default_search: None

extra_args

Extra arguments to pass to the snpe conversion tool. Refer to snpe-onnx-to-dlc and snpe-tensorflow-to-dlc at https://developer.qualcomm.com/sites/default/files/docs/snpe/tools.html for additional arguments. Must be a dictionary of the form: {'arg_name': 'arg_value'}.

type: str

default: None

default_search: None
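The length rules for the SNPEConversion options (shapes match names, and optional lists match input_names when they are not None) can be checked up front. The sketch below is a hypothetical helper with made-up tensor names and shapes. It uses None members for input_layouts rather than guessing values from olive.snpe.constants.

```python
def check_snpe_conversion_lengths(config: dict) -> list:
    """Return a list of length problems in an SNPEConversion config; empty if none."""
    problems = []
    n_inputs = len(config["input_names"])
    n_outputs = len(config["output_names"])
    if len(config["input_shapes"]) != n_inputs:
        problems.append("input_shapes must be the same length as input_names")
    if len(config["output_shapes"]) != n_outputs:
        problems.append("output_shapes must be the same length as output_names")
    for optional in ("input_types", "input_layouts"):
        value = config.get(optional)
        # None means the option is unset; individual members may also be None.
        if value is not None and len(value) != n_inputs:
            problems.append(f"{optional} must be the same length as input_names")
    return problems


# Hypothetical example values.
snpe_conversion_config = {
    "input_names": ["image"],
    "input_shapes": [[1, 224, 224, 3]],
    "output_names": ["logits"],
    "output_shapes": [[1, 1000]],
    "input_layouts": [None],
}
print(check_snpe_conversion_lengths(snpe_conversion_config))
```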

SNPEQuantization

Quantize SNPE model. Uses snpe-dlc-quantize tool from the SNPE SDK.

Input: SNPEModel

Output: SNPEModel

script_dir

Directory containing user script dependencies.

type: str

default: None

default_search: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default: None

default_search: None

data_dir

Path to the data directory.

type: str

required: True

dataloader_func

Function or function name to create the dataloader for quantization. The function should take the data directory as an argument and return an olive.snpe.SNPEDataLoader object.

type: Callable[[str], olive.snpe.data_loader.SNPEDataLoader] | str

required: True

use_enhanced_quantizer

Use the enhanced quantizer feature when quantizing the model. Uses an algorithm to determine optimal range instead of min and max range of data. It can be useful for quantizing models that have long tails in the distribution of the data being quantized.

type: bool

default: False

default_search: Categorical([True, False])

enable_htp

Pack HTP information in quantized DLC.

type: bool

default: False

default_search: Categorical([True, False])

htp_socs

List of SoCs to generate HTP Offline cache for.

type: List[str]

default: None

default_search: None

extra_args

Extra arguments to pass to the snpe quantization tool. Refer to https://developer.qualcomm.com/sites/default/files/docs/snpe/tools.html#tools_snpe-dlc-quantize for additional arguments. Must be a dictionary of the form: {'arg_name': 'arg_value'}.

type: str

default: None

default_search: None

SNPEtoONNXConversion

Convert an SNPE DLC to ONNX for use with the SNPE Execution Provider. Creates an ONNX graph with the SNPE DLC as a node.

Input: SNPEModel

Output: ONNXModel

target_device

Target device for the ONNX model. Refer to olive.snpe.SNPEDevice for valid values.

type: str

default: cpu

default_search: None

target_opset

Target ONNX opset version.

type: int

default: 12

default_search: None