Passes¶

The following passes are available in Olive.

Each pass is followed by a description of the pass and a list of the pass’s configuration options.

OnnxConversion¶

Convert a PyTorch model to an ONNX model using torch.onnx.export.

Input: PyTorchModel

Output: ONNXModel

script_dir¶

Directory containing user script dependencies.

type: str

default: None

default_search: None

user_script¶

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default: None

default_search: None

input_names¶

List of input names.

type: List[str]

required: True

input_shapes¶

List of input shapes. Must be provided if input_tensor_func is not provided. Used to create dummy inputs for the model during ONNX export.

type: List[List[int]]

default: None

default_search: None

input_types¶

List of input types. If provided, must be the same length as input_shapes. Otherwise, defaults to float32 for all inputs. Used with input_shapes to create dummy inputs for the model during ONNX export.

type: List[str]

default: None

default_search: None
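The pairing rule between input_shapes and input_types can be sketched in plain Python. This is only an illustration of the rule described above, not Olive's implementation; the helper name resolve_dummy_input_specs is hypothetical.

```python
from typing import List, Optional, Tuple


def resolve_dummy_input_specs(
    input_shapes: List[List[int]],
    input_types: Optional[List[str]] = None,
) -> List[Tuple[List[int], str]]:
    """Pair every input shape with a type name (hypothetical helper)."""
    if input_types is None:
        # No types given: every input defaults to float32.
        input_types = ["float32"] * len(input_shapes)
    if len(input_types) != len(input_shapes):
        raise ValueError("input_types must be the same length as input_shapes")
    return list(zip(input_shapes, input_types))


# Two inputs, no explicit types: both fall back to float32.
specs = resolve_dummy_input_specs([[1, 3, 224, 224], [1, 10]])
print(specs)
```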

input_tensor_func¶

Function that takes no arguments and creates dummy inputs for the model. Can be a function (local use) or the name of a function to be imported from the user script. If provided, input_shapes and input_types are ignored. Refer to ‘args’ at https://pytorch.org/docs/stable/onnx.html#torch.onnx.export for more details.

type: Callable | str

default: None

default_search: None

output_names¶

List of output names.

type: List[str]

required: True

dynamic_axes¶

Dynamic axes for the model. Refer to ‘dynamic_axes’ at https://pytorch.org/docs/stable/onnx.html#torch.onnx.export for more details.

type: dict

default: None

default_search: None

target_opset¶

The version of the default (ai.onnx) opset to target.

type: int

default: 14

default_search: None
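As a rough sketch, the OnnxConversion options above could be collected in a Python dictionary like the one below. The "type"/"config" wrapper keys, the input and output names, the shapes, and the type strings are assumptions made for illustration; only the option names come from this page.

```python
import json

# Hypothetical OnnxConversion pass entry. The "type"/"config" wrapper and all
# names, shapes, and type strings are illustrative assumptions.
onnx_conversion_pass = {
    "type": "OnnxConversion",
    "config": {
        "input_names": ["input_ids", "attention_mask"],
        "input_shapes": [[1, 128], [1, 128]],
        "input_types": ["int64", "int64"],
        "output_names": ["logits"],
        # Same structure as the dynamic_axes argument of torch.onnx.export:
        # tensor name -> {axis index: axis name}.
        "dynamic_axes": {
            "input_ids": {0: "batch", 1: "sequence"},
            "attention_mask": {0: "batch", 1: "sequence"},
            "logits": {0: "batch"},
        },
        "target_opset": 14,
    },
}

config = onnx_conversion_pass["config"]
# input_types, when given, must match input_shapes in length.
assert len(config["input_types"]) == len(config["input_shapes"])

# json.dumps writes the integer axis keys as strings.
print(json.dumps(onnx_conversion_pass, indent=2))
```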

OnnxModelOptimizer¶

Optimize ONNX model by fusing nodes.

Input: ONNXModel

Output: ONNXModel

OrtTransformersOptimization¶

Optimize transformer-based models in scenarios where ONNX Runtime does not apply the optimization at load time. It is based on onnxruntime.transformers.optimizer.

Input: ONNXModel

Output: ONNXModel

model_type¶

Transformer-based model type, including bert (exported by PyTorch), gpt2 (exported by PyTorch), bert_tf (BERT exported by tf2onnx), and bert_keras (BERT exported by keras2onnx).

type: str

required: True

num_heads¶

Number of attention heads.

type: int

default: 0

default_search: None

hidden_size¶

Number of hidden nodes.

type: int

default: 0

default_search: None

optimization_options¶

Optimization options that turn on/off some fusions.

type: Any

default: None

default_search: None

opt_level¶

Graph optimization level of ONNX Runtime: 0 - disable all (default), 1 - basic, 2 - extended, 99 - all.

type: Any

default: None

default_search: None

use_gpu¶

Flag for GPU inference.

type: bool

default: False

default_search: None

only_onnxruntime¶

Whether to use only onnxruntime to optimize the model, with no Python fusion.

type: bool

default: False

default_search: None

float16¶

Whether half-precision float will be used.

type: bool

default: False

default_search: None

input_int32¶

Whether int32 tensors will be used as input.

type: bool

default: False

default_search: None

use_external_data_format¶

Whether to use the external data format to store a large model (>2GB).

type: bool

default: False

default_search: None
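A minimal sketch of an OrtTransformersOptimization option set, with a small check on opt_level against the values listed in its description. The concrete values (a BERT-base-sized model with 12 heads and hidden size 768) and the helper describe_opt_level are illustrative assumptions, not part of Olive.

```python
# opt_level values as listed in the opt_level description on this page.
OPT_LEVELS = {0: "disable all", 1: "basic", 2: "extended", 99: "all"}

# Illustrative option values for a BERT-base-sized model.
transformers_optimization = {
    "model_type": "bert",
    "num_heads": 12,
    "hidden_size": 768,
    "opt_level": 1,
    "use_gpu": False,
    "float16": False,
}


def describe_opt_level(options: dict) -> str:
    """Return a readable name for opt_level (hypothetical helper)."""
    level = options.get("opt_level")
    if level is None:
        return "not set"
    if level not in OPT_LEVELS:
        raise ValueError(f"opt_level must be one of {sorted(OPT_LEVELS)}")
    return OPT_LEVELS[level]


print(describe_opt_level(transformers_optimization))
```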

OrtPerfTuning¶

Optimize ONNX Runtime inference settings.

Input: ONNXModel

Output: ONNXModel

script_dir¶

Directory containing user script dependencies.

type: str

default: None

default_search: None

user_script¶

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default: None

default_search: None

data_dir¶

Directory of sample inference data.

type: pathlib.Path | str

default: None

default_search: None

dataloader_func¶

Dataloader function to load data from given data_dir with given batch size.

type: Callable | str

required: True

batch_size¶

Batch size for inference.

type: int

required: True

device¶

Device selected for tuning process.

type: str

default: cpu

default_search: None

cpu_cores¶

CPU cores used for thread tuning.

type: int

default: None

default_search: None

io_bind¶

Whether to enable IOBinding for ONNX Runtime inference.

type: bool | List[bool]

default: False

default_search: None

providers_list¶

List of execution providers to try when running the ONNX model.

type: list

default: None

default_search: None

execution_mode_list¶

List of execution modes (parallelism between operators) to try.

type: list

default: None

default_search: None

opt_level_list¶

Optimization level list for ONNX model.

type: list

default: None

default_search: None

trt_fp16_enable¶

Whether to enable FP16 mode for the TensorRT execution provider.

type: bool

default: False

default_search: None

intra_thread_num_list¶

List of intra thread numbers to test.

type: list

default: [None]

default_search: None

inter_thread_num_list¶

List of inter thread numbers to test.

type: list

default: [None]

default_search: None

extra_session_config¶

Extra customized session options during tuning process.

type: Dict[str, Any]

default: None

default_search: None
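The list-valued options above describe candidate settings for tuning. One way to picture how such lists combine is a Cartesian product, sketched below. Whether the pass evaluates every combination, and how execution modes and optimization levels are encoded, are assumptions here; the values are illustrative only.

```python
from itertools import product

# Illustrative candidate values; the encodings are assumptions.
providers_list = ["CPUExecutionProvider"]
execution_mode_list = [0, 1]      # assumed: 0 = sequential, 1 = parallel
opt_level_list = [0, 1, 2, 99]    # assumed: ONNX Runtime graph optimization levels
io_bind = [False, True]

# Every combination of the candidate values is one configuration to measure.
candidates = [
    {"provider": p, "execution_mode": m, "opt_level": o, "io_bind": b}
    for p, m, o, b in product(
        providers_list, execution_mode_list, opt_level_list, io_bind
    )
]

print(len(candidates))
print(candidates[0])
```

The number of candidates grows multiplicatively with each list, so short lists keep the tuning run manageable.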

OnnxDynamicQuantization¶

ONNX Dynamic Quantization Pass

Input: ONNXModel

Output: ONNXModel

quant_mode¶

Dynamic quantization mode.

type: str

default: dynamic

default_search: None

weight_type¶

Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.

type: str

default: QInt8

default_search: Categorical([‘QInt8’, ‘QUInt8’])

op_types_to_quantize¶

List of operator types to quantize. If None, all quantizable operator types are quantized.

type: list

default: None

default_search: None

nodes_to_quantize¶

List of node names to quantize. If None, all quantizable nodes are quantized.

type: list

default: None

default_search: None

nodes_to_exclude¶

List of node names to exclude from quantization. If None, no nodes are excluded.

type: list

default: None

default_search: None

per_channel¶

Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default: False

default_search: Categorical([True, False])

reduce_range¶

Quantize weights with 7 bits. It may improve accuracy for some models running on non-VNNI machines, especially in per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default: False

default_search: Categorical([True, False])

optimize_model¶

Deprecating Soon in ONNX! Optimize model before quantization. NOT recommended, optimization will change the computation graph, making debugging of quantization loss difficult.

type: bool

default: False

default_search: Categorical([True, False])

use_external_data_format¶

Option used for large (>2GB) models. Set to False by default.

type: bool

default: False

default_search: None

quant_preprocess¶

Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing

type: bool

default: True

default_search: Categorical([True, False])
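The default_search entries above define the values a search may try for each option. As an illustration, the sketch below enumerates every combination of the Categorical options listed for OnnxDynamicQuantization. It models each Categorical as a plain Python list, which is an assumption about representation and not Olive's search API.

```python
from itertools import product

# Categorical default_search values copied from the options above,
# modeled as plain lists.
search_space = {
    "weight_type": ["QInt8", "QUInt8"],
    "per_channel": [True, False],
    "reduce_range": [True, False],
    "optimize_model": [True, False],
    "quant_preprocess": [True, False],
}

names = list(search_space)
# One dict per search point: option name -> chosen value.
points = [dict(zip(names, values)) for values in product(*search_space.values())]

# The documented defaults form one of the search points.
defaults = {
    "weight_type": "QInt8",
    "per_channel": False,
    "reduce_range": False,
    "optimize_model": False,
    "quant_preprocess": True,
}

print(len(points))
print(defaults in points)
```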

OnnxStaticQuantization¶

ONNX Static Quantization Pass

Input: ONNXModel

Output: ONNXModel

script_dir¶

Directory containing user script dependencies.

type: str

default: None

default_search: None

user_script¶

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default: None

default_search: None

quant_mode¶

Static quantization mode.

type: str

default: static

default_search: None

weight_type¶

Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.

type: str

default: QInt8

default_search: Categorical([‘QInt8’, ‘QUInt8’])

op_types_to_quantize¶

List of operator types to quantize. If None, all quantizable operator types are quantized.

type: list

default: None

default_search: None

nodes_to_quantize¶

List of node names to quantize. If None, all quantizable nodes are quantized.

type: list

default: None

default_search: None

nodes_to_exclude¶

List of node names to exclude from quantization. If None, no nodes are excluded.

type: list

default: None

default_search: None

per_channel¶

Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default: False

default_search: Categorical([True, False])

reduce_range¶

Quantize weights with 7 bits. It may improve accuracy for some models running on non-VNNI machines, especially in per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default: False

default_search: Categorical([True, False])

optimize_model¶

Deprecating Soon in ONNX! Optimize model before quantization. NOT recommended, optimization will change the computation graph, making debugging of quantization loss difficult.

type: bool

default: False

default_search: Categorical([True, False])

use_external_data_format¶

Option used for large (>2GB) models. Set to False by default.

type: bool

default: False

default_search: None

quant_preprocess¶

Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing

type: bool

default: True

default_search: Categorical([True, False])

data_dir¶

Path to the directory containing the dataset. For local data, it is required if quant_mode is ‘static’.

type: pathlib.Path | str

default: None

default_search: None

batch_size¶

Batch size for calibration, required if quant_mode is ‘static’.

type: int

default: 1

default_search: None

dataloader_func¶

Function or function name that generates the dataloader for calibration. Required if quant_mode is ‘static’.

type: Callable | str

required: True

calibrate_method¶

The calibration methods currently supported are MinMax and Entropy. Use CalibrationMethod.MinMax or CalibrationMethod.Entropy as options.

type: str

default: MinMax

default_search: Categorical([‘MinMax’, ‘Entropy’, ‘Percentile’])

quant_format¶

QOperator format quantizes the model with quantized operators directly. QDQ format quantizes the model by inserting QuantizeLinear/DeQuantizeLinear on the tensor.

type: str

default: QDQ

default_search: Categorical([‘QOperator’, ‘QDQ’])

activation_type¶

Quantization data type of activation. Please refer to https://onnxruntime.ai/docs/performance/quantization.html for more details on data type selection

type: str

default: QInt8

default_search: Conditional(parents: (‘quant_format’,), support: {(‘QDQ’,): Categorical([‘QInt8’, ‘QUInt8’]), (‘QOperator’,): Categorical([‘QInt8’])}, default: Categorical([None]))
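The Conditional entry above makes the choices for activation_type depend on the value of quant_format. A minimal sketch of that lookup, using a plain dictionary keyed by the parent value, follows. The constant and function names are hypothetical and this is not Olive's search implementation.

```python
from typing import List, Optional

# Support table copied from the Conditional entry above:
# parent values (quant_format,) -> allowed activation_type values.
ACTIVATION_TYPE_SUPPORT = {
    ("QDQ",): ["QInt8", "QUInt8"],
    ("QOperator",): ["QInt8"],
}
# Choices used when the parent value has no entry in the support table.
ACTIVATION_TYPE_DEFAULT: List[Optional[str]] = [None]


def activation_type_choices(quant_format: str) -> List[Optional[str]]:
    """Return the activation_type choices for a given quant_format."""
    return ACTIVATION_TYPE_SUPPORT.get((quant_format,), ACTIVATION_TYPE_DEFAULT)


print(activation_type_choices("QDQ"))
print(activation_type_choices("QOperator"))
```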

OnnxQuantization¶

Quantize an ONNX model with onnxruntime. This pass can search for the best parameters for static and dynamic quantization at the same time.

Input: ONNXModel

Output: ONNXModel

script_dir¶

Directory containing user script dependencies.

type: str

default: None

default_search: None

user_script¶

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default: None

default_search: None

quant_mode¶

ONNX quantization mode. ‘dynamic’ for dynamic quantization, ‘static’ for static quantization.

type: str

default: static

default_search: Categorical([‘dynamic’, ‘static’])

weight_type¶

Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.

type: str

default: QInt8

default_search: Categorical([‘QInt8’, ‘QUInt8’])

op_types_to_quantize¶

List of operator types to quantize. If None, all quantizable operator types are quantized.

type: list

default: None

default_search: None

nodes_to_quantize¶

List of node names to quantize. If None, all quantizable nodes are quantized.

type: list

default: None

default_search: None

nodes_to_exclude¶

List of node names to exclude from quantization. If None, no nodes are excluded.

type: list

default: None

default_search: None

per_channel¶

Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default: False

default_search: Categorical([True, False])

reduce_range¶

Quantize weights with 7 bits. It may improve accuracy for some models running on non-VNNI machines, especially in per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default: False

default_search: Categorical([True, False])

optimize_model¶

Deprecating Soon in ONNX! Optimize model before quantization. NOT recommended, optimization will change the computation graph, making debugging of quantization loss difficult.

type: bool

default: False

default_search: Categorical([True, False])

use_external_data_format¶

Option used for large (>2GB) models. Set to False by default.

type: bool

default: False

default_search: None

quant_preprocess¶

Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing

type: bool

default: True

default_search: Categorical([True, False])

data_dir¶

Path to the directory containing the dataset. For local data, it is required if quant_mode is ‘static’.

type: pathlib.Path | str

default: None

default_search: None

batch_size¶

Batch size for calibration, required if quant_mode is ‘static’.

type: int

default: 1

default_search: None

dataloader_func¶

Function or function name that generates the dataloader for calibration. Required if quant_mode is ‘static’.

type: Callable | str

required: True

calibrate_method¶

The calibration methods currently supported are MinMax and Entropy. Use CalibrationMethod.MinMax or CalibrationMethod.Entropy as options.

type: str

default: MinMax

default_search: Conditional(parents: (‘quant_mode’,), support: {(‘static’,): Categorical([‘MinMax’, ‘Entropy’, ‘Percentile’])}, default: Categorical([‘Invalid’]))

quant_format¶

QOperator format quantizes the model with quantized operators directly. QDQ format quantizes the model by inserting QuantizeLinear/DeQuantizeLinear on the tensor.

type: str

default: QDQ

default_search: Conditional(parents: (‘quant_mode’,), support: {(‘static’,): Categorical([‘QOperator’, ‘QDQ’])}, default: Categorical([‘Invalid’]))

activation_type¶

Quantization data type of activation. Please refer to https://onnxruntime.ai/docs/performance/quantization.html for more details on data type selection

type: str

default: QInt8

default_search: Conditional(parents: (‘quant_mode’, ‘quant_format’), support: {(‘static’, ‘QDQ’): Categorical([‘QInt8’, ‘QUInt8’]), (‘static’, ‘QOperator’): Categorical([‘QInt8’])}, default: Categorical([‘Invalid’]))
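In OnnxQuantization, three options are conditional on quant_mode, and activation_type also depends on quant_format. The sketch below walks the combined space of those conditional options, with ‘Invalid’ standing in wherever a parent combination is unsupported. The conditional helper is hypothetical and only mirrors the support tables listed above.

```python
def conditional(support: dict, default: list):
    """Build a lookup: parent values -> allowed choices (hypothetical helper)."""
    def choices(*parents):
        return support.get(tuple(parents), default)
    return choices


# Support tables copied from the default_search entries above.
calibrate_method = conditional(
    {("static",): ["MinMax", "Entropy", "Percentile"]}, ["Invalid"]
)
quant_format = conditional({("static",): ["QOperator", "QDQ"]}, ["Invalid"])
activation_type = conditional(
    {("static", "QDQ"): ["QInt8", "QUInt8"], ("static", "QOperator"): ["QInt8"]},
    ["Invalid"],
)

# Walk the conditional part of the space, parents first.
points = []
for mode in ["dynamic", "static"]:
    for method in calibrate_method(mode):
        for fmt in quant_format(mode):
            for act in activation_type(mode, fmt):
                points.append((mode, method, fmt, act))

print(len(points))
print(points[0])
```

With quant_mode set to ‘dynamic’, all three conditional options collapse to the single ‘Invalid’ placeholder, so the static-only options add no extra search points in that mode.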

QuantizationAwareTraining¶

Run quantization aware training on PyTorch model.

Input: PyTorchModel

Output: PyTorchModel

script_dir¶

Directory containing user script dependencies.

type: str

default: None

default_search: None

user_script¶

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default: None

default_search: None

train_data_dir¶

Directory of training data.

type: str

default: None

default_search: None

val_data_dir¶

Directory of validation data.

type: str

default: None

default_search: None

train_dataloader_func¶

Dataloader function to load training data from given train_data_dir with given train_batch_size.

type: Callable | str

default: None

default_search: None

training_loop_func¶

Customized training loop function.

type: Callable | str

default: None

default_search: None

ptl_module¶

LightningModule for PyTorch Lightning trainer. It is a way of encapsulating all the logic related to the training, validation, and testing of a PyTorch model. Please refer to https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html for more details.

type: Callable | str

default: None

default_search: None

ptl_data_module¶

LightningDataModule for PyTorch Lightning trainer. It is a way of encapsulating all the data-related logic for training, validation, and testing of a PyTorch model. Please refer to https://pytorch-lightning.readthedocs.io/en/stable/data/datamodule.html for more details.

type: Callable | str

default: None

default_search: None

train_batch_size¶

Batch size for training.

type: int

default: None

default_search: None

num_epochs¶

Maximum number of epochs for training.

type: int

default: None

default_search: None

num_steps¶

Maximum number of steps for training.

type: int

default: -1

default_search: None

do_validate¶

Whether to perform one evaluation epoch over the validation set after training.

type: bool

default: False

default_search: None

modules_to_fuse¶

List of lists of module names to fuse.

type: List[List[str]]

default: None

default_search: None

input_shapes¶

List of input shapes. It is used to create dummy input for PyTorch model tracing.

type: List[List[int]]

required: True

input_types¶

List of input types. It is used to create dummy input for PyTorch model tracing.

type: List[str]

default: None

default_search: None

qconfig_func¶

Customized function to create a QConfig for QAT. Please refer to https://pytorch.org/docs/stable/generated/torch.quantization.qconfig.QConfig.html for details.

type: Callable | str

default: None

default_search: None

logger¶

Logger for training.

type: pytorch_lightning.loggers.logger.Logger | Iterable[pytorch_lightning.loggers.logger.Logger] | Callable | bool

default: False

default_search: None

gpus¶

Number of GPUs to use.

type: int

default: None

default_search: None

seed¶

Random seed for training.

type: int

default: None

default_search: None
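The sketch below shows how a function defined in the user script might be referenced by name from the QuantizationAwareTraining options. The (data_dir, batch_size) signature follows the train_dataloader_func description above, but the function body, file names, and paths are hypothetical. A real loader would more likely return a PyTorch DataLoader than a generator of file names.

```python
from typing import Iterator, List


def create_train_dataloader(data_dir: str, batch_size: int) -> Iterator[List[str]]:
    """Hypothetical loader: yields batches of sample paths under data_dir."""
    samples = [f"{data_dir}/sample_{i}.bin" for i in range(10)]
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


# Illustrative option values; the function is referenced by name and would be
# imported from user_script.
qat_options = {
    "user_script": "user_script.py",
    "train_data_dir": "data/train",
    "train_dataloader_func": "create_train_dataloader",
    "train_batch_size": 4,
    "num_epochs": 1,
    "input_shapes": [[1, 3, 32, 32]],  # required option
}

batches = list(
    create_train_dataloader(
        qat_options["train_data_dir"], qat_options["train_batch_size"]
    )
)
print([len(batch) for batch in batches])
```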

OpenVINOConversion¶

Convert a PyTorch, ONNX, or TensorFlow model to an OpenVINO model.

Input: PyTorchModel | ONNXModel

Output: OpenVINOModel

input¶

Input can be set by passing a list of tuples. Each tuple should contain input name and optionally input type or input shape.

type: List[Tuple]

default: None

default_search: None

input_shape¶

Input shape(s) that should be fed to an input node(s) of the model. Shape is defined as a comma-separated list of integer numbers enclosed in parentheses or square brackets, for example [1,3,227,227].

type: List[int]

default: None

default_search: None

extra_config¶

Extra configurations for OpenVINO model conversion. extra_config can be set by passing a dictionary where key is the parameter name, and the value is the parameter value. Please check ‘mo’ command usage instruction for available parameters: https://docs.openvino.ai/latest/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html

type: Dict

default: None

default_search: None

OpenVINOQuantization¶

Post-training quantization for OpenVINO model. Please refer to https://docs.openvino.ai/latest/pot_introduction.html for more details.

Input: OpenVINOModel

Output: OpenVINOModel

script_dir¶

Directory containing user script dependencies.

type: str

default: None

default_search: None

user_script¶

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default: None

default_search: None

engine_config¶

Specific config for openvino.tools.pot.IEEngine. ‘engine_config’ can be set by passing a dictionary, for example engine_config = {‘device’: ‘CPU’}

type: Dict

required: True

dataloader_func¶

A callable, or the name of a function in ‘user_script’, that creates the dataloader instance.

type: Callable | str

default: None

default_search: None

data_dir¶

Dataset path. ‘data_dir’ can be a str or pathlib.Path.

type: pathlib.Path | str

default: None

default_search: None

batch_size¶

Batch size for the dataloader.

type: int

default: 1

default_search: None

metric_func¶

A callable, or the name of a function in ‘user_script’, that creates the Metric instance used to calculate the accuracy metric of the model.

type: Callable | str

default: None

default_search: None

algorithms¶

A list of optimization algorithms and their parameters to include in the optimization pipeline. The algorithms are applied to the model in list order. Example: algorithms = [{‘name’: ‘DefaultQuantization’, ‘params’: {‘preset’: ‘performance’, ‘stat_subset_size’: 500}}]

type: List[Dict]

required: True
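Putting the two required options together with the examples given in their descriptions, an OpenVINOQuantization option set might look like the sketch below. The user_script path, the dataloader function name, and the data directory are hypothetical placeholders.

```python
# engine_config and algorithms use the example values from their descriptions;
# the remaining values are hypothetical placeholders.
openvino_quantization = {
    "engine_config": {"device": "CPU"},
    "algorithms": [
        {
            "name": "DefaultQuantization",
            "params": {"preset": "performance", "stat_subset_size": 500},
        },
    ],
    "user_script": "user_script.py",
    "dataloader_func": "create_dataloader",
    "data_dir": "data",
    "batch_size": 1,
}

# Both required options must be present.
required = ("engine_config", "algorithms")
missing = [name for name in required if name not in openvino_quantization]

# Algorithms are applied in list order.
algorithm_order = [algo["name"] for algo in openvino_quantization["algorithms"]]

print(missing)
print(algorithm_order)
```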

SNPEConversion¶

Convert ONNX or TensorFlow model to SNPE DLC. Uses snpe-tensorflow-to-dlc or snpe-onnx-to-dlc tools from the SNPE SDK.

Input: ONNXModel | TensorFlowModel

Output: SNPEModel

input_names¶

List of input names.

type: List[str]

required: True

input_shapes¶

List of input shapes. Must be the same length as input_names.

type: List[List[int]]

required: True

output_names¶

List of output names.

type: List[str]

required: True

output_shapes¶

List of output shapes. Must be the same length as output_names.

type: List[List[int]]

required: True

input_types¶

List of input types. If not None, it must be a list of the same length as input_names. List members can be None to use default value. Refer to olive.snpe.constants.InputType for valid values.

type: List[str | None]

default: None

default_search: None

input_layouts¶

List of input layouts. If not None, it must be a list of the same length as input_names. List members can be None to use the inferred value. Refer to olive.snpe.constants.InputLayout for valid values.

type: List[str | None]

default: None

default_search: None

extra_args¶

Extra arguments to pass to the SNPE conversion tool. Refer to snpe-onnx-to-dlc and snpe-tensorflow-to-dlc at https://developer.qualcomm.com/sites/default/files/docs/snpe/tools.html for additional arguments. Must be a dictionary of the form: {‘arg_name’: ‘arg_value’}.

type: str

default: None

default_search: None
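The length constraints stated for the SNPEConversion options can be checked with a few lines of Python. This is a sketch of the rules as described above, not Olive's own validation; the function name, tensor names, and shapes are hypothetical.

```python
def check_snpe_conversion_options(options: dict) -> None:
    """Raise ValueError if the documented length rules are violated."""
    n_inputs = len(options["input_names"])
    if len(options["input_shapes"]) != n_inputs:
        raise ValueError("input_shapes must be the same length as input_names")
    if len(options["output_shapes"]) != len(options["output_names"]):
        raise ValueError("output_shapes must be the same length as output_names")
    # Optional lists: None means "not set"; members may also be None.
    for name in ("input_types", "input_layouts"):
        values = options.get(name)
        if values is not None and len(values) != n_inputs:
            raise ValueError(f"{name} must be the same length as input_names")


# Illustrative values; tensor names and shapes are placeholders.
snpe_conversion = {
    "input_names": ["input"],
    "input_shapes": [[1, 224, 224, 3]],
    "output_names": ["output"],
    "output_shapes": [[1, 1000]],
    "input_types": None,      # not set
    "input_layouts": [None],  # one entry per input, None = inferred layout
}

check_snpe_conversion_options(snpe_conversion)
print("options are consistent")
```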

SNPEQuantization¶

Quantize SNPE model. Uses snpe-dlc-quantize tool from the SNPE SDK.

Input: SNPEModel

Output: SNPEModel

script_dir¶

Directory containing user script dependencies.

type: str

default: None

default_search: None

user_script¶

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default: None

default_search: None

data_dir¶

Path to the data directory.

type: str

required: True

dataloader_func¶

Function or function name to create the dataloader for quantization. The function should take the data directory as an argument and return an olive.snpe.SNPEDataLoader object.

type: Callable[[str], olive.snpe.data_loader.SNPEDataLoader] | str

required: True

use_enhanced_quantizer¶

Use the enhanced quantizer feature when quantizing the model. Uses an algorithm to determine optimal range instead of min and max range of data. It can be useful for quantizing models that have long tails in the distribution of the data being quantized.

type: bool

default: False

default_search: Categorical([True, False])

enable_htp¶

Pack HTP information in quantized DLC.

type: bool

default: False

default_search: Categorical([True, False])

htp_socs¶

List of SoCs to generate HTP Offline cache for.

type: List[str]

default: None

default_search: None

extra_args¶

Extra arguments to pass to the snpe-dlc-quantize tool. Refer to https://developer.qualcomm.com/sites/default/files/docs/snpe/tools.html#tools_snpe-dlc-quantize for additional arguments. Must be a dictionary of the form: {‘arg_name’: ‘arg_value’}.

type: str

default: None

default_search: None

SNPEtoONNXConversion¶

Convert a SNPE DLC to ONNX for use with the SNPE Execution Provider. Creates an ONNX graph with the SNPE DLC as a node.

Input: SNPEModel

Output: ONNXModel

target_device¶

Target device for the ONNX model. Refer to olive.snpe.SNPEDevice for valid values.

type: str

default: cpu

default_search: None

target_opset¶

Target ONNX opset version.

type: int

default: 12

default_search: None