Passes

The following passes are available in Olive.

Each pass is followed by a description of the pass and a list of the pass’s configuration options.
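
Passes are specified in the passes section of an Olive workflow configuration, each entry naming a pass type (one of the pass names below) and a config made up of the options listed for that pass. The snippet below is a minimal sketch of that layout, not part of the original reference; the entry key "conversion" and the option values are illustrative assumptions.

    # Minimal sketch of the "passes" section of an Olive workflow configuration,
    # written as a Python dict (it would normally live in the workflow JSON).
    # The entry key "conversion" and the option values are illustrative.
    workflow_passes = {
        "conversion": {
            "type": "OnnxConversion",   # one of the pass names documented below
            "config": {
                "target_opset": 13,     # option documented under OnnxConversion
            },
        },
    }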

OnnxConversion

Convert a PyTorch model to an ONNX model using torch.onnx.export on CPU.

Input: PyTorchModel

Output: ONNXModel | CompositeOnnxModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

target_opset

The version of the default (ai.onnx) opset to target.

type: int

default_value: 13

searchable_values: None

use_dynamo_exporter

Whether to use the dynamo_export API to export the ONNX model.

type: bool

default_value: False

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
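
A hedged example, not taken from the original reference, of how an OnnxConversion pass entry might look; the option names come from the list above and the values are assumptions.

    # Hypothetical OnnxConversion pass entry for an Olive workflow config.
    onnx_conversion = {
        "type": "OnnxConversion",
        "config": {
            "target_opset": 13,            # default shown above
            "use_dynamo_exporter": False,  # use torch.onnx.export
            "save_as_external_data": False,
        },
    }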

DeviceSpecificOnnxConversion

Convert a PyTorch model to an ONNX model using torch.onnx.export on a specific hardware device.

Input: PyTorchModel

Output: ONNXModel | CompositeOnnxModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

target_opset

The version of the default (ai.onnx) opset to target.

type: int

default_value: 13

searchable_values: None

use_dynamo_exporter

Whether to use the dynamo_export API to export the ONNX model.

type: bool

default_value: False

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None

OnnxModelOptimizer

Optimize ONNX model by fusing nodes.

Input: ONNXModel

Output: ONNXModel

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
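
A minimal sketch (an assumption, not from the reference) of an OnnxModelOptimizer entry; since all options have defaults, the config can be left empty.

    # Hypothetical OnnxModelOptimizer entry; all options keep their defaults.
    onnx_model_optimizer = {
        "type": "OnnxModelOptimizer",
        "config": {},
    }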

OrtTransformersOptimization

Optimize transformer-based models in scenarios where ONNX Runtime does not apply the optimization at load time. It is based on onnxruntime.transformers.optimizer.

Input: ONNXModel

Output: ONNXModel

model_type

Transformer based model type, including bert (exported by PyTorch), gpt2 (exported by PyTorch), bert_tf (BERT exported by tf2onnx), bert_keras (BERT exported by keras2onnx), and unet/vae/clip (stable diffusion).

type: str

default_value: None

searchable_values: None

num_heads

Number of attention heads.

type: int

default_value: 0

searchable_values: None

hidden_size

Number of hidden nodes.

type: int

default_value: 0

searchable_values: None

optimization_options

Optimization options that turn on/off some fusions.

type: Dict[str, Any] | onnxruntime.transformers.fusion_options.FusionOptions

default_value: None

searchable_values: None

opt_level

Graph optimization level of ONNX Runtime: 0 - disable all (default), 1 - basic, 2 - extended, 99 - all.

type: Any

default_value: None

searchable_values: Categorical([0, 1, 2, 99])

use_gpu

Flag for GPU inference.

type: bool

default_value: False

searchable_values: None

only_onnxruntime

Whether to use only ONNX Runtime to optimize the model, with no Python fusion.

type: bool

default_value: False

searchable_values: Categorical([True, False])

float16

Whether half-precision float will be used.

type: bool

default_value: False

searchable_values: None

input_int32

Whether int32 tensors will be used as input.

type: bool

default_value: False

searchable_values: None

keep_io_types

Keep input and output tensors in their original data type

type: bool

default_value: True

searchable_values: None

force_fp32_ops

Operators that are forced to run in float32

type: List[str]

default_value: None

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
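
A sketch of how an OrtTransformersOptimization entry might be configured; the option names are from the list above, while the model_type, num_heads and hidden_size values are assumptions for a typical BERT-base model.

    # Hypothetical OrtTransformersOptimization entry for a BERT-base model.
    transformers_optimization = {
        "type": "OrtTransformersOptimization",
        "config": {
            "model_type": "bert",
            "num_heads": 12,       # illustrative for BERT-base
            "hidden_size": 768,    # illustrative for BERT-base
            "float16": False,
            "use_gpu": False,
        },
    }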

OrtPerfTuning

Optimize ONNX Runtime inference settings.

Input: ONNXModel

Output: ONNXModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

data_dir

Directory of sample inference data.

type: str | pathlib.Path | olive.resource_path.ResourcePath | olive.resource_path.ResourcePathConfig | None

default_value: None

searchable_values: None

dataloader_func

Dataloader function to load data from given data_dir with given batch size.

type: Callable | str

default_value: None

searchable_values: None

batch_size

Batch size for inference.

type: int

default_value: None

searchable_values: None

data_config

Data config to load data for computing latency.

type: olive.data.config.DataConfig | Dict

default_value: None

searchable_values: None

input_names

Input names list for ONNX model.

type: list

default_value: None

searchable_values: None

input_shapes

Input shapes list for ONNX model.

type: list

default_value: None

searchable_values: None

input_types

Input types list for ONNX model.

type: list

default_value: None

searchable_values: None

device

Device selected for tuning process.

type: str

default_value: cpu

searchable_values: None

cpu_cores

CPU cores used for thread tuning.

type: int

default_value: None

searchable_values: None

io_bind

Whether to enable IOBinding search for ONNX Runtime inference.

type: bool

default_value: False

searchable_values: None

enable_cuda_graph

Whether to enable CUDA Graph for the CUDA execution provider.

type: bool

default_value: False

searchable_values: None

providers_list

List of execution providers to use for executing the ONNX model.

type: list

default_value: None

searchable_values: None

execution_mode_list

List of execution modes (parallelism between operators) to test.

type: list

default_value: None

searchable_values: None

opt_level_list

Optimization level list for ONNX model.

type: list

default_value: None

searchable_values: None

trt_fp16_enable

Whether to enable FP16 mode for the TensorRT execution provider.

type: bool

default_value: False

searchable_values: None

intra_thread_num_list

List of intra-op thread counts to test.

type: list

default_value: [None]

searchable_values: None

inter_thread_num_list

List of inter-op thread counts to test.

type: list

default_value: [None]

searchable_values: None

extra_session_config

Extra customized session options during tuning process.

type: Dict[str, Any]

default_value: None

searchable_values: None
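
A sketch (with assumed values) of an OrtPerfTuning entry that tunes session settings using a user-provided dataloader; the script path and the function name create_dataloader are hypothetical and would need to exist in the referenced user script.

    # Hypothetical OrtPerfTuning entry; "create_dataloader" is assumed to be
    # defined in the referenced user script.
    perf_tuning = {
        "type": "OrtPerfTuning",
        "config": {
            "user_script": "user_script.py",
            "dataloader_func": "create_dataloader",
            "data_dir": "data",
            "batch_size": 1,
            "io_bind": False,
        },
    }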

OnnxFloatToFloat16

Converts a model to float16. It is based on onnxconverter-common.convert_float_to_float16. See https://onnxruntime.ai/docs/performance/model-optimizations/float16.html#float16-conversion

Input: ONNXModel

Output: ONNXModel

min_positive_val

Constant values below this threshold will be clipped to it during float16 conversion.

type: float

default_value: 1e-07

searchable_values: None

max_finite_val

Constant values above this threshold will be clipped to it during float16 conversion.

type: float

default_value: 10000.0

searchable_values: None

keep_io_types

Whether model inputs/outputs should be left as float32

type: bool

default_value: False

searchable_values: None

disable_shape_infer

Skips running onnx shape/type inference.

type: bool

default_value: False

searchable_values: None

op_block_list

List of op types to leave as float32

type: List[str]

default_value: None

searchable_values: None

node_block_list

List of node names to leave as float32

type: List[str]

default_value: None

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
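
A sketch (values assumed) of an OnnxFloatToFloat16 entry that keeps the model inputs/outputs in float32 and leaves a couple of op types in float32; the op names in op_block_list are illustrative.

    # Hypothetical OnnxFloatToFloat16 entry.
    float_to_float16 = {
        "type": "OnnxFloatToFloat16",
        "config": {
            "keep_io_types": True,
            "op_block_list": ["Resize", "NonMaxSuppression"],  # illustrative op types
        },
    }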

OrtMixedPrecision

Convert model to mixed precision.

Input: ONNXModel

Output: ONNXModel

op_block_list

List of op types to leave as float32

type: List[str]

default_value: [‘SimplifiedLayerNormalization’, ‘SkipSimplifiedLayerNormalization’, ‘Relu’, ‘Add’]

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
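
A sketch of an OrtMixedPrecision entry; the op_block_list value simply repeats the default listed above.

    # Hypothetical OrtMixedPrecision entry using the documented default block list.
    mixed_precision = {
        "type": "OrtMixedPrecision",
        "config": {
            "op_block_list": [
                "SimplifiedLayerNormalization",
                "SkipSimplifiedLayerNormalization",
                "Relu",
                "Add",
            ],
        },
    }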

OnnxDynamicQuantization

ONNX Dynamic Quantization Pass

Input: ONNXModel

Output: ONNXModel

quant_mode

dynamic quantization mode

type: str

default_value: dynamic

searchable_values: None

weight_type

Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.

type: str

default_value: QInt8

searchable_values: Categorical([‘QInt8’, ‘QUInt8’])

op_types_to_quantize

List of operator types to quantize. If None, all quantizable.

type: list

default_value: None

searchable_values: None

nodes_to_quantize

List of node names to quantize. If None, all quantizable.

type: list

default_value: None

searchable_values: None

nodes_to_exclude

List of node names to exclude from quantization. If None, no nodes are excluded.

type: list

default_value: None

searchable_values: None

per_channel

Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default_value: False

searchable_values: Categorical([True, False])

reduce_range

Quantize weights with 7-bits. It may improve the accuracy for some models running on non-VNNI machine, especially for per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default_value: False

searchable_values: Categorical([True, False])

quant_preprocess

Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing

type: bool

default_value: True

searchable_values: Categorical([True, False])

extra.Sigmoid.nnapi

type: bool

default_value: False

searchable_values: None

ActivationSymmetric

symmetrize calibration data for activations

type: bool

default_value: False

searchable_values: None

WeightSymmetric

symmetrize calibration data for weights

type: bool

default_value: True

searchable_values: None

EnableSubgraph

If enabled, subgraphs will be quantized. Currently only dynamic mode is supported.

type: bool

default_value: False

searchable_values: None

ForceQuantizeNoInputCheck

By default, some latent operators such as MaxPool and Transpose do not quantize if their input is not already quantized. Setting this to True forces such operators to always quantize their input and thus produce quantized output. The True behavior can still be disabled per node using nodes_to_exclude.

type: bool

default_value: False

searchable_values: None

MatMulConstBOnly

If enabled, only MatMul with const B will be quantized.

type: bool

default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘dynamic’,): True, (‘static’,): False}, default: OLIVE_INVALID_PARAM_VALUE)

searchable_values: None

extra_options

Key value pair dictionary for extra_options in quantization. Please refer to https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/quantize.py for details about the supported options. If an option is one of [‘extra.Sigmoid.nnapi’, ‘ActivationSymmetric’, ‘WeightSymmetric’, ‘EnableSubgraph’, ‘ForceQuantizeNoInputCheck’, ‘MatMulConstBOnly’], it will be overwritten by the corresponding config parameter value.

type: dict

default_value: None

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
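
A sketch (assumed values) of an OnnxDynamicQuantization entry that quantizes only MatMul and Gemm weights to signed 8-bit integers.

    # Hypothetical OnnxDynamicQuantization entry.
    dynamic_quantization = {
        "type": "OnnxDynamicQuantization",
        "config": {
            "weight_type": "QInt8",
            "op_types_to_quantize": ["MatMul", "Gemm"],  # illustrative subset
            "per_channel": False,
        },
    }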

OnnxStaticQuantization

ONNX Static Quantization Pass

Input: ONNXModel

Output: ONNXModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

quant_mode

static quantization mode

type: str

default_value: static

searchable_values: None

weight_type

Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.

type: str

default_value: QInt8

searchable_values: Categorical([‘QInt8’, ‘QUInt8’])

op_types_to_quantize

List of operator types to quantize. If None, all quantizable.

type: list

default_value: None

searchable_values: None

nodes_to_quantize

List of node names to quantize. If None, all quantizable.

type: list

default_value: None

searchable_values: None

nodes_to_exclude

List of node names to exclude from quantization. If None, no nodes are excluded.

type: list

default_value: None

searchable_values: None

per_channel

Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default_value: False

searchable_values: Categorical([True, False])

reduce_range

Quantize weights with 7-bits. It may improve the accuracy for some models running on non-VNNI machine, especially for per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default_value: False

searchable_values: Categorical([True, False])

quant_preprocess

Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing

type: bool

default_value: True

searchable_values: Categorical([True, False])

data_dir

Path to the directory containing the dataset. For local data, it is required if quant_mode is ‘static’ and dataloader_func is provided.

type: str | pathlib.Path | olive.resource_path.ResourcePath | olive.resource_path.ResourcePathConfig | None

default_value: None

searchable_values: None

batch_size

Batch size for calibration, only used if dataloader_func is provided.

type: int

default_value: 1

searchable_values: None

dataloader_func

Function/function name to generate dataloader for calibration, required if quant_mode is ‘static’ and data_config is None.

type: Callable | str

default_value: None

searchable_values: None

data_config

Data config for calibration, required if quant_mode is ‘static’ and dataloader_func is None.

type: olive.data.config.DataConfig | Dict

default_value: None

searchable_values: None

calibrate_method

Currently supported calibration methods are MinMax and Entropy; use CalibrationMethod.MinMax or CalibrationMethod.Entropy as options.

type: str

default_value: MinMax

searchable_values: Categorical([‘MinMax’, ‘Entropy’, ‘Percentile’])

quant_format

The QOperator format quantizes the model with quantized operators directly. The QDQ format quantizes the model by inserting QuantizeLinear/DeQuantizeLinear nodes on the tensors.

type: str

default_value: QDQ

searchable_values: Categorical([‘QOperator’, ‘QDQ’])

activation_type

Quantization data type of activation. Please refer to https://onnxruntime.ai/docs/performance/quantization.html for more details on data type selection

type: str

default_value: QInt8

searchable_values: Conditional(parents: (‘quant_format’, ‘weight_type’), support: {(‘QDQ’, ‘QInt8’): Categorical([‘QInt8’]), (‘QDQ’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘QOperator’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘QOperator’, ‘QInt8’): Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>])}, default: Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>]))

extra.Sigmoid.nnapi

type: bool

default_value: False

searchable_values: None

ActivationSymmetric

symmetrize calibration data for activations

type: bool

default_value: False

searchable_values: None

WeightSymmetric

symmetrize calibration data for weights

type: bool

default_value: True

searchable_values: None

EnableSubgraph

If enabled, subgraphs will be quantized. Currently only dynamic mode is supported.

type: bool

default_value: False

searchable_values: None

ForceQuantizeNoInputCheck

By default, some latent operators such as MaxPool and Transpose do not quantize if their input is not already quantized. Setting this to True forces such operators to always quantize their input and thus produce quantized output. The True behavior can still be disabled per node using nodes_to_exclude.

type: bool

default_value: False

searchable_values: None

MatMulConstBOnly

If enabled, only MatMul with const B will be quantized.

type: bool

default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘dynamic’,): True, (‘static’,): False}, default: OLIVE_INVALID_PARAM_VALUE)

searchable_values: None

extra_options

Key value pair dictionary for extra_options in quantization. Please refer to https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/quantize.py for details about the supported options. If an option is one of [‘extra.Sigmoid.nnapi’, ‘ActivationSymmetric’, ‘WeightSymmetric’, ‘EnableSubgraph’, ‘ForceQuantizeNoInputCheck’, ‘MatMulConstBOnly’], it will be overwritten by the corresponding config parameter value.

type: dict

default_value: None

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
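
A sketch of an OnnxStaticQuantization entry that supplies a calibration dataloader from a user script; the script path, the function name calib_dataloader and the data_dir path are assumptions.

    # Hypothetical OnnxStaticQuantization entry with a user-supplied
    # calibration dataloader.
    static_quantization = {
        "type": "OnnxStaticQuantization",
        "config": {
            "user_script": "user_script.py",
            "dataloader_func": "calib_dataloader",  # assumed to exist in user_script.py
            "data_dir": "calibration_data",
            "quant_format": "QDQ",
            "weight_type": "QInt8",
            "activation_type": "QInt8",
            "calibrate_method": "MinMax",
        },
    }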

OnnxQuantization

Quantize an ONNX model with ONNX Runtime, where the best parameters for static or dynamic quantization can be searched for at the same time.

Input: ONNXModel

Output: ONNXModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

quant_mode

ONNX quantization mode: ‘dynamic’ for dynamic quantization, ‘static’ for static quantization.

type: str

default_value: static

searchable_values: Categorical([‘dynamic’, ‘static’])

weight_type

Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.

type: str

default_value: QInt8

searchable_values: Categorical([‘QInt8’, ‘QUInt8’])

op_types_to_quantize

List of operator types to quantize. If None, all quantizable.

type: list

default_value: None

searchable_values: None

nodes_to_quantize

List of node names to quantize. If None, all quantizable.

type: list

default_value: None

searchable_values: None

nodes_to_exclude

List of node names to exclude from quantization. If None, no nodes are excluded.

type: list

default_value: None

searchable_values: None

per_channel

Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default_value: False

searchable_values: Categorical([True, False])

reduce_range

Quantize weights with 7-bits. It may improve the accuracy for some models running on non-VNNI machine, especially for per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default_value: False

searchable_values: Categorical([True, False])

quant_preprocess

Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing

type: bool

default_value: True

searchable_values: Categorical([True, False])

data_dir

Path to the directory containing the dataset. For local data, it is required if quant_mode is ‘static’ and dataloader_func is provided.

type: str | pathlib.Path | olive.resource_path.ResourcePath | olive.resource_path.ResourcePathConfig | None

default_value: None

searchable_values: None

batch_size

Batch size for calibration, only used if dataloader_func is provided.

type: int

default_value: 1

searchable_values: None

dataloader_func

Function/function name to generate dataloader for calibration, required if quant_mode is ‘static’ and data_config is None.

type: Callable | str

default_value: None

searchable_values: None

data_config

Data config for calibration, required if quant_mode is ‘static’ and dataloader_func is None.

type: olive.data.config.DataConfig | Dict

default_value: None

searchable_values: None

calibrate_method

Currently supported calibration methods are MinMax and Entropy; use CalibrationMethod.MinMax or CalibrationMethod.Entropy as options.

type: str

default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘static’,): ‘MinMax’, (‘dynamic’,): <SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>}, default: OLIVE_INVALID_PARAM_VALUE)

searchable_values: Conditional(parents: (‘quant_mode’,), support: {(‘static’,): Categorical([‘MinMax’, ‘Entropy’, ‘Percentile’])}, default: Categorical([<SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>]))

quant_format

The QOperator format quantizes the model with quantized operators directly. The QDQ format quantizes the model by inserting QuantizeLinear/DeQuantizeLinear nodes on the tensors.

type: str

default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘static’,): ‘QDQ’, (‘dynamic’,): <SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>}, default: OLIVE_INVALID_PARAM_VALUE)

searchable_values: Conditional(parents: (‘quant_mode’,), support: {(‘static’,): Categorical([‘QOperator’, ‘QDQ’])}, default: Categorical([<SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>]))

activation_type

Quantization data type of activation. Please refer to https://onnxruntime.ai/docs/performance/quantization.html for more details on data type selection

type: str

default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘static’,): ‘QInt8’, (‘dynamic’,): <SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>}, default: OLIVE_INVALID_PARAM_VALUE)

searchable_values: Conditional(parents: (‘quant_mode’, ‘quant_format’, ‘weight_type’), support: {(‘static’, ‘QDQ’, ‘QInt8’): Categorical([‘QInt8’]), (‘static’, ‘QDQ’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘static’, ‘QOperator’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘static’, ‘QOperator’, ‘QInt8’): Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>])}, default: Categorical([<SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>]))

extra.Sigmoid.nnapi

type: bool

default_value: False

searchable_values: None

ActivationSymmetric

symmetrize calibration data for activations

type: bool

default_value: False

searchable_values: None

WeightSymmetric

symmetrize calibration data for weights

type: bool

default_value: True

searchable_values: None

EnableSubgraph

If enabled, subgraphs will be quantized. Currently only dynamic mode is supported.

type: bool

default_value: False

searchable_values: None

ForceQuantizeNoInputCheck

By default, some latent operators such as MaxPool and Transpose do not quantize if their input is not already quantized. Setting this to True forces such operators to always quantize their input and thus produce quantized output. The True behavior can still be disabled per node using nodes_to_exclude.

type: bool

default_value: False

searchable_values: None

MatMulConstBOnly

If enabled, only MatMul with const B will be quantized.

type: bool

default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘dynamic’,): True, (‘static’,): False}, default: OLIVE_INVALID_PARAM_VALUE)

searchable_values: None

extra_options

Key value pair dictionary for extra_options in quantization. Please refer to https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/quantize.py for details about the supported options. If an option is one of [‘extra.Sigmoid.nnapi’, ‘ActivationSymmetric’, ‘WeightSymmetric’, ‘EnableSubgraph’, ‘ForceQuantizeNoInputCheck’, ‘MatMulConstBOnly’], it will be overwritten by the corresponding config parameter value.

type: dict

default_value: None

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
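
A sketch of an OnnxQuantization entry. Options left unset whose searchable_values are listed above (for example quant_mode, per_channel or reduce_range) can be explored by Olive's search when the engine runs in search mode; the fixed values and function names shown are assumptions.

    # Hypothetical OnnxQuantization entry; unspecified searchable options can
    # be explored by Olive's search strategy.
    onnx_quantization = {
        "type": "OnnxQuantization",
        "config": {
            "user_script": "user_script.py",
            "dataloader_func": "calib_dataloader",  # needed when quant_mode is "static"
            "data_dir": "calibration_data",
            "weight_type": "QInt8",
        },
    }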

IncDynamicQuantization

Intel® Neural Compressor Dynamic Quantization Pass

Input: ONNXModel

Output: ONNXModel

approach

dynamic quantization mode

type: str

default_value: dynamic

searchable_values: None

device

Intel® Neural Compressor quantization device. Supports ‘cpu’ and ‘gpu’.

type: str

default_value: cpu

searchable_values: None

backend

Backend for model execution. Supports ‘default’, ‘onnxrt_trt_ep’ and ‘onnxrt_cuda_ep’.

type: str

default_value: default

searchable_values: None

domain

Model domain. Supports ‘auto’, ‘cv’, ‘object_detection’, ‘nlp’ and ‘recommendation_system’. The Intel® Neural Compressor adaptor automatically applies domain-specific quantization settings; explicitly specified quantization settings override the automatic ones. If the domain is set to ‘auto’, it is detected automatically.

type: str

default_value: auto

searchable_values: None

recipes

Recipes for Intel® Neural Compressor quantization. Supported entries are:
‘smooth_quant’: whether to apply smooth quant;
‘smooth_quant_args’: parameters for smooth_quant;
‘fast_bias_correction’: whether to apply fast bias correction;
‘weight_correction’: whether to apply weight correction;
‘gemm_to_matmul’: whether to convert Gemm to MatMul and Add, only valid for ONNX models;
‘graph_optimization_level’: supports ‘DISABLE_ALL’, ‘ENABLE_BASIC’, ‘ENABLE_EXTENDED’ and ‘ENABLE_ALL’, only valid for ONNX models;
‘first_conv_or_matmul_quantization’: whether to quantize the first conv or matmul;
‘last_conv_or_matmul_quantization’: whether to quantize the last conv or matmul;
‘pre_post_process_quantization’: whether to quantize the ops in pre-processing and post-processing;
‘add_qdq_pair_to_weight’: whether to add a QDQ pair for weights, only valid for onnxrt_trt_ep;
‘optypes_to_exclude_output_quant’: do not quantize the output of the specified op types;
‘dedicated_qdq_pair’: whether to use dedicated QDQ pairs, only valid for onnxrt_trt_ep.

type: dict

default_value: {}

searchable_values: None

reduce_range

Whether to use 7-bit quantization.

type: bool

default_value: False

searchable_values: Categorical([True, False])

quant_level

Intel® Neural Compressor allows users to choose different tuning processes by specifying the quantization level (quant_level). Currently 3 quant_levels are supported: 0 is a conservative strategy, 1 is a basic or user-specified strategy, and auto (default) is the combination of 0 and 1. Please refer to https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-process and https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-algorithms for more details.

type: str

default_value: auto

searchable_values: None

excluded_precisions

Precisions to be excluded. The default value is an empty list. By default, Intel® Neural Compressor enables mixed precision with fp32 + bf16 (only when device is ‘gpu’ and backend is ‘onnxrt_cuda_ep’) + int8. To disable the bf16 data type, specify excluded_precisions = [‘bf16’].

type: list

default_value: []

searchable_values: None

tuning_criterion

Instance of TuningCriterion class. In this class you can set strategy, strategy_kwargs, timeout, max_trials and objective.

type: dict

default_value: {‘strategy’: ‘basic’, ‘strategy_kwargs’: None, ‘timeout’: 0, ‘max_trials’: 5, ‘objective’: ‘performance’}

searchable_values: None

metric

Accuracy metric to generate an evaluation function for Intel® Neural Compressor accuracy aware tuning.

type: olive.evaluator.metric.Metric | None

default_value: None

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None

IncStaticQuantization

Intel® Neural Compressor Static Quantization Pass

Input: ONNXModel

Output: ONNXModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

approach

static quantization mode

type: str

default_value: static

searchable_values: None

device

Intel® Neural Compressor quantization device. Supports ‘cpu’ and ‘gpu’.

type: str

default_value: cpu

searchable_values: None

backend

Backend for model execution. Supports ‘default’, ‘onnxrt_trt_ep’ and ‘onnxrt_cuda_ep’.

type: str

default_value: default

searchable_values: None

domain

Model domain. Supports ‘auto’, ‘cv’, ‘object_detection’, ‘nlp’ and ‘recommendation_system’. The Intel® Neural Compressor adaptor automatically applies domain-specific quantization settings; explicitly specified quantization settings override the automatic ones. If the domain is set to ‘auto’, it is detected automatically.

type: str

default_value: auto

searchable_values: None

recipes

Recipes for Intel® Neural Compressor quantization. Supported entries are:
‘smooth_quant’: whether to apply smooth quant;
‘smooth_quant_args’: parameters for smooth_quant;
‘fast_bias_correction’: whether to apply fast bias correction;
‘weight_correction’: whether to apply weight correction;
‘gemm_to_matmul’: whether to convert Gemm to MatMul and Add, only valid for ONNX models;
‘graph_optimization_level’: supports ‘DISABLE_ALL’, ‘ENABLE_BASIC’, ‘ENABLE_EXTENDED’ and ‘ENABLE_ALL’, only valid for ONNX models;
‘first_conv_or_matmul_quantization’: whether to quantize the first conv or matmul;
‘last_conv_or_matmul_quantization’: whether to quantize the last conv or matmul;
‘pre_post_process_quantization’: whether to quantize the ops in pre-processing and post-processing;
‘add_qdq_pair_to_weight’: whether to add a QDQ pair for weights, only valid for onnxrt_trt_ep;
‘optypes_to_exclude_output_quant’: do not quantize the output of the specified op types;
‘dedicated_qdq_pair’: whether to use dedicated QDQ pairs, only valid for onnxrt_trt_ep.

type: dict

default_value: {}

searchable_values: None

reduce_range

Whether to use 7-bit quantization.

type: bool

default_value: False

searchable_values: Categorical([True, False])

quant_level

Intel® Neural Compressor allows users to choose different tuning processes by specifying the quantization level (quant_level). Currently 3 quant_levels are supported: 0 is a conservative strategy, 1 is a basic or user-specified strategy, and auto (default) is the combination of 0 and 1. Please refer to https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-process and https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-algorithms for more details.

type: str

default_value: auto

searchable_values: None

excluded_precisions

Precisions to be excluded. The default value is an empty list. By default, Intel® Neural Compressor enables mixed precision with fp32 + bf16 (only when device is ‘gpu’ and backend is ‘onnxrt_cuda_ep’) + int8. To disable the bf16 data type, specify excluded_precisions = [‘bf16’].

type: list

default_value: []

searchable_values: None

tuning_criterion

Instance of TuningCriterion class. In this class you can set strategy, strategy_kwargs, timeout, max_trials and objective.

type: dict

default_value: {‘strategy’: ‘basic’, ‘strategy_kwargs’: None, ‘timeout’: 0, ‘max_trials’: 5, ‘objective’: ‘performance’}

searchable_values: None

metric

Accuracy metric to generate an evaluation function for Intel® Neural Compressor accuracy aware tuning.

type: olive.evaluator.metric.Metric | None

default_value: None

searchable_values: None

data_dir

Path to the directory containing the dataset. For local data, it is required if approach is ‘static’ and dataloader_func is provided.

type: str | pathlib.Path | olive.resource_path.ResourcePath | olive.resource_path.ResourcePathConfig | None

default_value: None

searchable_values: None

batch_size

Batch size for calibration, only used if dataloader_func is provided.

type: int

default_value: 1

searchable_values: None

dataloader_func

Function/function name to generate dataloader for calibration, required if approach is ‘static’ and data_config is None.

type: Callable | str

default_value: None

searchable_values: None

data_config

Data config for calibration, required if approach is ‘static’ and dataloader_func is None.

type: olive.data.config.DataConfig | Dict

default_value: None

searchable_values: None

quant_format

Quantization format. Support ‘QDQ’ and ‘QOperator’.

type: str

default_value: QOperator

searchable_values: Categorical([‘QOperator’, ‘QDQ’])

calibration_sampling_size

Number of calibration samples.

type: list | int

default_value: [100]

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None

IncQuantization

Quantize an ONNX model with Intel® Neural Compressor.

Input: ONNXModel

Output: ONNXModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

approach

Intel® Neural Compressor Quantization mode. ‘dynamic’ for dynamic quantization, ‘static’ for static quantization.

type: str

default_value: static

searchable_values: Categorical([‘dynamic’, ‘static’])

device

Intel® Neural Compressor quantization device. Supports ‘cpu’ and ‘gpu’.

type: str

default_value: cpu

searchable_values: None

backend

Backend for model execution. Supports ‘default’, ‘onnxrt_trt_ep’ and ‘onnxrt_cuda_ep’.

type: str

default_value: default

searchable_values: None

domain

Model domain. Supports ‘auto’, ‘cv’, ‘object_detection’, ‘nlp’ and ‘recommendation_system’. The Intel® Neural Compressor adaptor automatically applies domain-specific quantization settings; explicitly specified quantization settings override the automatic ones. If the domain is set to ‘auto’, it is detected automatically.

type: str

default_value: auto

searchable_values: None

recipes

Recipes for Intel® Neural Compressor quantization. Supported entries are:
‘smooth_quant’: whether to apply smooth quant;
‘smooth_quant_args’: parameters for smooth_quant;
‘fast_bias_correction’: whether to apply fast bias correction;
‘weight_correction’: whether to apply weight correction;
‘gemm_to_matmul’: whether to convert Gemm to MatMul and Add, only valid for ONNX models;
‘graph_optimization_level’: supports ‘DISABLE_ALL’, ‘ENABLE_BASIC’, ‘ENABLE_EXTENDED’ and ‘ENABLE_ALL’, only valid for ONNX models;
‘first_conv_or_matmul_quantization’: whether to quantize the first conv or matmul;
‘last_conv_or_matmul_quantization’: whether to quantize the last conv or matmul;
‘pre_post_process_quantization’: whether to quantize the ops in pre-processing and post-processing;
‘add_qdq_pair_to_weight’: whether to add a QDQ pair for weights, only valid for onnxrt_trt_ep;
‘optypes_to_exclude_output_quant’: do not quantize the output of the specified op types;
‘dedicated_qdq_pair’: whether to use dedicated QDQ pairs, only valid for onnxrt_trt_ep.

type: dict

default_value: {}

searchable_values: None

reduce_range

Whether to use 7-bit quantization.

type: bool

default_value: False

searchable_values: Categorical([True, False])

quant_level

Intel® Neural Compressor allows users to choose different tuning processes by specifying the quantization level (quant_level). Currently 3 quant_levels are supported: 0 is a conservative strategy, 1 is a basic or user-specified strategy, and auto (default) is the combination of 0 and 1. Please refer to https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-process and https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-algorithms for more details.

type: str

default_value: auto

searchable_values: None

excluded_precisions

Precisions to be excluded. The default value is an empty list. By default, Intel® Neural Compressor enables mixed precision with fp32 + bf16 (only when device is ‘gpu’ and backend is ‘onnxrt_cuda_ep’) + int8. To disable the bf16 data type, specify excluded_precisions = [‘bf16’].

type: list

default_value: []

searchable_values: None

tuning_criterion

Instance of TuningCriterion class. In this class you can set strategy, strategy_kwargs, timeout, max_trials and objective.

type: dict

default_value: {‘strategy’: ‘basic’, ‘strategy_kwargs’: None, ‘timeout’: 0, ‘max_trials’: 5, ‘objective’: ‘performance’}

searchable_values: None

metric

Accuracy metric to generate an evaluation function for Intel® Neural Compressor accuracy aware tuning.

type: olive.evaluator.metric.Metric | None

default_value: None

searchable_values: None

data_dir

Path to the directory containing the dataset. For local data, it is required if approach is ‘static’ and dataloader_func is provided.

type: str | pathlib.Path | olive.resource_path.ResourcePath | olive.resource_path.ResourcePathConfig | None

default_value: None

searchable_values: None

batch_size

Batch size for calibration, only used if dataloader_func is provided.

type: int

default_value: 1

searchable_values: None

dataloader_func

Function/function name to generate dataloader for calibration, required if approach is ‘static’ and data_config is None.

type: Callable | str

default_value: None

searchable_values: None

data_config

Data config for calibration, required if approach is ‘static’ and dataloader_func is None.

type: olive.data.config.DataConfig | Dict

default_value: None

searchable_values: None

quant_format

Quantization format. Support ‘QDQ’ and ‘QOperator’.

type: str

default_value: QOperator

searchable_values: Conditional(parents: (‘approach’,), support: {(‘static’,): Categorical([‘QOperator’, ‘QDQ’])}, default: Categorical([‘default’]))

calibration_sampling_size

Number of calibration samples.

type: list | int

default_value: [100]

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
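
A sketch of an IncQuantization entry using static quantization with a user-supplied calibration dataloader; the script path, function name and data path are assumptions.

    # Hypothetical IncQuantization entry for static quantization.
    inc_quantization = {
        "type": "IncQuantization",
        "config": {
            "approach": "static",
            "device": "cpu",
            "quant_format": "QOperator",
            "user_script": "user_script.py",
            "dataloader_func": "calib_dataloader",  # assumed to exist in user_script.py
            "data_dir": "calibration_data",
            "calibration_sampling_size": [100],
        },
    }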

AppendPrePostProcessingOps

Add pre/post-processing nodes to the input model.

Input: ONNXModel

Output: ONNXModel

pre

List of pre-processing commands to add.

type: List[Dict[str, Any]]

default_value: None

searchable_values: None

post

List of post-processing commands to add.

type: List[Dict[str, Any]]

default_value: None

searchable_values: None

tool_command

Composited tool commands to invoke.

type: str

default_value: None

searchable_values: None

tool_command_args

Arguments to pass to the tool command or to the PrePostProcessor. If used for the PrePostProcessor, the schema looks like: { “name”: “image”, “data_type”: “uint8”, “shape”: [“num_bytes”] }.

type: Dict[str, Any] | List[olive.passes.onnx.append_pre_post_processing_ops.PrePostProcessorInput]

default_value: None

searchable_values: None

target_opset

The version of the default (ai.onnx) opset to target.

type: int

default_value: 16

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
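
A sketch of an AppendPrePostProcessingOps entry. The tool_command value and its argument are placeholders for whichever composited tool command applies to the model; only the option names and the target_opset default come from the entries above.

    # Hypothetical AppendPrePostProcessingOps entry; the tool command name and
    # its arguments are placeholders, not real values.
    pre_post_processing = {
        "type": "AppendPrePostProcessingOps",
        "config": {
            "tool_command": "<tool_command_name>",           # placeholder
            "tool_command_args": {"<arg_name>": "<value>"},  # placeholder args for the tool
            "target_opset": 16,
        },
    }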

InsertBeamSearch

Insert Beam Search Op.

Input: OliveModel

Output: ONNXModel

no_repeat_ngram_size

If set to int > 0, all ngrams of that size can only occur once.

type: int

default_value: 0

searchable_values: None

use_forced_decoder_ids

Use decoder_input_ids as an extra graph input to the beam search op. Only supported in ORT >= 1.16.0

type: bool

default_value: False

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
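
A sketch of an InsertBeamSearch entry; the option values shown are assumptions.

    # Hypothetical InsertBeamSearch entry.
    insert_beam_search = {
        "type": "InsertBeamSearch",
        "config": {
            "no_repeat_ngram_size": 3,        # illustrative
            "use_forced_decoder_ids": False,  # requires ORT >= 1.16.0 when True
        },
    }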

QLoRA

Run QLoRA fine-tuning on a Hugging Face PyTorch model. See https://arxiv.org/abs/2305.14314 for more details on the method. This pass only supports PyTorchModel with hf_config.

Input: PyTorchModel

Output: PyTorchModel

compute_dtype

The computation data type used by the quantized model. It is also the data type used for the LoRA weights. Should be one of bfloat16, float16 or float32.

type: str

default_value: bfloat16

searchable_values: None

double_quant

Whether to use nested quantization, where the quantization constants from the first quantization are quantized again.

type: bool

default_value: True

searchable_values: None

quant_type

Quantization data type to use. Should be one of fp4 or nf4.

type: str

default_value: nf4

searchable_values: None

lora_r

LoRA r

type: int

default_value: 64

searchable_values: None

lora_alpha

LoRA alpha

type: float

default_value: 16

searchable_values: None

lora_dropout

LoRA dropout

type: float

default_value: 0.0

searchable_values: None

train_data_config

Data config for fine-tuning training. If eval_data_config is not provided and eval_dataset_size is not None, the data will be split into train and eval. Otherwise, the data will be used for training only.

type: olive.data.config.DataConfig | Dict

required: True

eval_data_config

Data config for fine-tuning evaluation. Optional if eval_dataset_size is provided or evaluation is not needed.

type: olive.data.config.DataConfig | Dict

default_value: None

searchable_values: None

eval_dataset_size

Size of the validation dataset. Should be either a positive integer smaller than the number of training samples, or a float in the (0, 1) range. If eval_data_config is provided, this parameter will be ignored.

type: float

default_value: None

searchable_values: None

training_args

Training arguments. If None, will use default arguments. See HFTrainingArguments for more details.

type: olive.passes.pytorch.qlora.HFTrainingArguments | Dict

default_value: None

searchable_values: None
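
A sketch of a QLoRA entry. train_data_config is required; its contents, the eval split size and the training_args values shown are assumptions (training_args maps to the HFTrainingArguments fields documented next).

    # Hypothetical QLoRA entry; the data config contents and training_args
    # values are illustrative.
    qlora = {
        "type": "QLoRA",
        "config": {
            "train_data_config": {
                # placeholder: an olive.data.config.DataConfig dict describing
                # the fine-tuning dataset goes here
            },
            "eval_dataset_size": 0.1,
            "lora_r": 64,
            "lora_alpha": 16,
            "training_args": {
                "per_device_train_batch_size": 1,
                "gradient_accumulation_steps": 16,
                "max_steps": 1000,
                "learning_rate": 2e-4,
            },
        },
    }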

QLoRA HFTrainingArguments

pydantic settings olive.passes.pytorch.qlora.HFTrainingArguments

Training arguments for transformers.Trainer.

Has the same fields as transformers.TrainingArguments with recommended default values for QLoRA fine-tuning.

field seed: int = 42

Random seed for initialization.

field data_seed: int = 42

Random seed to be used with data samplers.

field optim: str = 'paged_adamw_32bit'

The optimizer to use.

field per_device_train_batch_size: int = 1

The batch size per GPU for training.

field per_device_eval_batch_size: int = 1

The batch size per GPU for evaluation.

field gradient_accumulation_steps: int = 16

Number of updates steps to accumulate the gradients for, before performing a backward/update pass.

field max_steps: int = 10000

The total number of training steps to perform.

field weight_decay: float = 0.0

The L2 weight decay rate of AdamW

field learning_rate: float = 0.0002

The initial learning rate for AdamW.

field gradient_checkpointing: bool = True

Use gradient checkpointing. Recommended.

field lr_scheduler_type: str = 'constant'

Learning rate schedule. Constant is a bit better than cosine, and has an advantage for analysis.

field warmup_ratio: float = 0.03

Fraction of steps to do a warmup for.

field logging_steps: int = 10

Number of update steps between two logs.

field evaluation_strategy: str = 'no'

The evaluation strategy to use. Will be forced to ‘no’ if there is no eval dataset.

field eval_steps: float = None

Number of update steps between two evaluations if evaluation_strategy=’steps’. Will default to the same value as logging_steps if not set

field group_by_length: bool = True

Whether or not to group samples of roughly the same length together when batching.

field report_to: str | List[str] = 'none'

The list of integrations to report the results and logs to.

field output_dir: str = None

The output dir for logs and checkpoints. If None, will use a temp dir.

field extra_args: Dict[str, Any] = None

Extra arguments to pass to the trainer. Values can be provided directly to this field as a dict or as keyword arguments to the config. See transformers.TrainingArguments for more details on the available arguments.

create_training_args() → TrainingArguments

QuantizationAwareTraining

Run quantization-aware training on a PyTorch model.

Input: PyTorchModel

Output: PyTorchModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

train_data_dir

Directory of training data.

type: str

default_value: None

searchable_values: None

val_data_dir

Directory of validation data.

type: str

default_value: None

searchable_values: None

train_dataloader_func

Dataloader function to load training data from given train_data_dir with given train_batch_size.

type: Callable | str

default_value: None

searchable_values: None

training_loop_func

Customized training loop function.

type: Callable | str

default_value: None

searchable_values: None

ptl_module

LightningModule for PyTorch Lightning trainer. It is a way of encapsulating all the logic related to the training, validation, and testing of a PyTorch model. Please refer to https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html for more details.

type: Callable | str

default_value: None

searchable_values: None

ptl_data_module

LightningDataModule for PyTorch Lightning trainer. It is a way of encapsulating all the data-related logic for training, validation, and testing of a PyTorch model. Please refer to https://pytorch-lightning.readthedocs.io/en/stable/data/datamodule.html for more details.

type: Callable | str

default_value: None

searchable_values: None

train_batch_size

Batch size for training.

type: int

default_value: None

searchable_values: None

num_epochs

Maximum number of epochs for training.

type: int

default_value: None

searchable_values: None

num_steps

Maximum number of steps for training.

type: int

default_value: -1

searchable_values: None

do_validate

Whether to perform one evaluation epoch over the validation set after training.

type: bool

default_value: False

searchable_values: None

modules_to_fuse

List of list of module names to fuse.

type: List[List[str]]

default_value: None

searchable_values: None

qconfig_func

Customized function to create a QConfig for QAT. Please refer to https://pytorch.org/docs/stable/generated/torch.ao.quantization.qconfig.QConfig.html for details.

type: Callable | str

default_value: None

searchable_values: None

logger

Logger for training.

type: pytorch_lightning.loggers.logger.Logger | Iterable[pytorch_lightning.loggers.logger.Logger] | Callable | bool

default_value: False

searchable_values: None

gpus

Number of GPUs to use.

type: int

default_value: None

searchable_values: None

seed

Random seed for training.

type: int

default_value: None

searchable_values: None

checkpoint_path

Path to save checkpoints.

type: str

default_value: None

searchable_values: None
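A minimal sketch of how QuantizationAwareTraining might be configured as an entry in the passes section of an Olive workflow config, assuming the usual type/config layout. The user-script function name and the module names in modules_to_fuse are hypothetical; the parameter names are the ones documented above.

    # Illustrative QuantizationAwareTraining pass config (values are examples only).
    qat_pass = {
        "type": "QuantizationAwareTraining",
        "config": {
            "user_script": "user_script.py",
            "train_data_dir": "data/train",
            "train_dataloader_func": "create_train_dataloader",  # hypothetical function in user_script
            "train_batch_size": 32,
            "num_epochs": 3,
            "modules_to_fuse": [["conv1", "bn1", "relu1"]],  # hypothetical module names
            "gpus": 1,
            "seed": 0,
        },
    }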

OpenVINOConversion

Converts a PyTorch, ONNX, or TensorFlow model to an OpenVINO model.

Input: PyTorchModel | ONNXModel

Output: OpenVINOModel

input

Input can be set by passing a list of tuples. Each tuple should contain the input name and, optionally, the input type or input shape.

type: List[Tuple]

default_value: None

searchable_values: None

input_shape

Input shape(s) to feed to the input node(s) of the model. A shape is defined as a comma-separated list of integers enclosed in parentheses or square brackets, for example [1,3,227,227].

type: List[int]

default_value: None

searchable_values: None

extra_config

Extra configurations for OpenVINO model conversion. extra_config can be set by passing a dictionary where key is the parameter name, and the value is the parameter value. Please check ‘mo’ command usage instruction for available parameters: https://docs.openvino.ai/latest/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html

type: Dict

default_value: None

searchable_values: None
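A minimal sketch of an OpenVINOConversion pass config under the usual type/config layout. The shape is illustrative, and the extra_config key shown is assumed to be a valid ‘mo’ parameter; consult the Model Optimizer documentation linked above for the authoritative list.

    # Illustrative OpenVINOConversion pass config.
    openvino_conversion = {
        "type": "OpenVINOConversion",
        "config": {
            "input_shape": [1, 3, 224, 224],
            "extra_config": {"compress_to_fp16": True},  # assumption: a valid 'mo' option
        },
    }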

OpenVINOQuantization

Post-training quantization for OpenVINO model. Please refer to https://docs.openvino.ai/latest/pot_introduction.html for more details.

Input: OpenVINOModel

Output: OpenVINOModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

engine_config

Specific config for openvino.tools.pot.IEEngine. ‘engine_config’ can be set by passing a dictionary, for example engine_config: {‘device’: ‘CPU’}

type: Dict

required: True

dataloader_func

Function/function name to generate dataloader for calibration, required if data_config is None.

type: Callable | str

default_value: None

searchable_values: None

data_dir

Path to the directory containing the dataset. For local data, it is required if dataloader_func is provided.

type: str | pathlib.Path | olive.resource_path.ResourcePath | olive.resource_path.ResourcePathConfig | None

default_value: None

searchable_values: None

batch_size

Batch size for calibration.

type: int

default_value: 1

searchable_values: None

data_config

Data config for calibration, required if dataloader_func is None.

type: olive.data.config.DataConfig | Dict

default_value: None

searchable_values: None

metric_func

A callable function, or the name of a function from ‘user_script’, used by the Metric instance to calculate the accuracy metric of the model.

type: Callable | str

default_value: None

searchable_values: None

algorithms

A list defining optimization algorithms and their parameters included in the optimization pipeline. The order in which they are applied to the model in the optimization pipeline is determined by the order in the list. example: algorithms: [{‘name’: ‘DefaultQuantization’, ‘params’: {‘preset’: ‘performance’, ‘stat_subset_size’: 500},}]

type: List[Dict]

required: True
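A minimal sketch of an OpenVINOQuantization pass config. The engine_config and algorithms values follow the examples in the parameter descriptions above; the user-script function name and dataset path are hypothetical.

    # Illustrative OpenVINOQuantization pass config.
    openvino_quantization = {
        "type": "OpenVINOQuantization",
        "config": {
            "user_script": "user_script.py",
            "dataloader_func": "create_calibration_dataloader",  # hypothetical function in user_script
            "data_dir": "data/calibration",
            "engine_config": {"device": "CPU"},
            "algorithms": [
                {
                    "name": "DefaultQuantization",
                    "params": {"preset": "performance", "stat_subset_size": 500},
                }
            ],
        },
    }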

SNPEConversion

Convert ONNX or TensorFlow model to SNPE DLC. Uses snpe-tensorflow-to-dlc or snpe-onnx-to-dlc tools from the SNPE SDK.

Input: ONNXModel | TensorFlowModel

Output: SNPEModel

input_names

List of input names.

type: List[str]

required: True

input_shapes

List of input shapes. Must be the same length as input_names.

type: List[List[int]]

required: True

output_names

List of output names.

type: List[str]

required: True

input_types

List of input types. If not None, it must be a list of the same length as input_names. List members can be None to use default value. Refer to olive.snpe.constants.InputType for valid values.

type: List[str | None]

default_value: None

searchable_values: None

input_layouts

List of input layouts. If not None, it must be a list of the same length as input_names. List members can be None to use inferred value. Refer to olive.snpe.constants.InputLayout for valid values.

type: List[str | None]

default_value: None

searchable_values: None

extra_args

Extra arguments to pass to the SNPE conversion tools. Refer to snpe-onnx-to-dlc and snpe-tensorflow-to-dlc at https://developer.qualcomm.com/sites/default/files/docs/snpe/tools.html for additional arguments. Must be a dictionary of the form: {‘arg_name’: ‘arg_value’}.

type: str

default_value: None

searchable_values: None
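A minimal sketch of an SNPEConversion pass config for a single-input image model. The input/output names and the NHWC shape are illustrative only.

    # Illustrative SNPEConversion pass config.
    snpe_conversion = {
        "type": "SNPEConversion",
        "config": {
            "input_names": ["input"],
            "input_shapes": [[1, 299, 299, 3]],  # same length as input_names
            "output_names": ["output"],
        },
    }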

SNPEQuantization

Quantize SNPE model. Uses snpe-dlc-quantize tool from the SNPE SDK.

Input: SNPEModel

Output: SNPEModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

data_dir

Path to the data directory. Required if data_config is None.

type: str | pathlib.Path | olive.resource_path.ResourcePath | olive.resource_path.ResourcePathConfig | None

default_value: None

searchable_values: None

dataloader_func

Function or function name to create a dataloader for quantization. The function should take the data directory as an argument and return an olive.snpe.SNPEDataLoader or torch.data.DataLoader-like object. Required if data_config is None.

type: Callable | str

default_value: None

searchable_values: None

data_config

Data config for quantization, required if dataloader_func is None.

type: olive.data.config.DataConfig | Dict

required: True

use_enhanced_quantizer

Use the enhanced quantizer feature when quantizing the model. Uses an algorithm to determine optimal range instead of min and max range of data. It can be useful for quantizing models that have long tails in the distribution of the data being quantized.

type: bool

default_value: False

searchable_values: Categorical([True, False])

enable_htp

Pack HTP information in quantized DLC.

type: bool

default_value: False

searchable_values: Categorical([True, False])

htp_socs

List of SoCs to generate HTP Offline cache for.

type: List[str]

default_value: None

searchable_values: None

extra_args

Extra arguments to pass to the snpe-dlc-quantize tool. Refer to https://developer.qualcomm.com/sites/default/files/docs/snpe/tools.html#tools_snpe-dlc-quantize for additional arguments. Must be a dictionary of the form: {‘arg_name’: ‘arg_value’}.

type: str

default_value: None

searchable_values: None
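Since dataloader_func points at a function in user_script, a minimal sketch of such a function is shown below. It follows the contract described above (take the data directory, return a torch DataLoader-like object); the function name, the .npy layout of the calibration data, and the dummy labels are assumptions for illustration only.

    # user_script.py -- illustrative dataloader_func for SNPEQuantization.
    from pathlib import Path

    import numpy as np
    import torch
    from torch.utils.data import DataLoader, TensorDataset

    def create_quant_dataloader(data_dir, batch_size=1):
        # Assumption: pre-processed calibration samples are stored as .npy files
        # of identical shape inside data_dir.
        samples = [np.load(f) for f in sorted(Path(data_dir).glob("*.npy"))]
        tensors = torch.tensor(np.stack(samples), dtype=torch.float32)
        # Labels are not needed for quantization, so dummy zeros are used here.
        labels = torch.zeros(len(tensors))
        return DataLoader(TensorDataset(tensors, labels), batch_size=batch_size)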

SNPEtoONNXConversion

Convert an SNPE DLC model to ONNX for use with the SNPE Execution Provider. Creates an ONNX graph with the SNPE DLC as a node.

Input: SNPEModel

Output: ONNXModel

target_device

Target device for the ONNX model. Refer to olive.snpe.SNPEDevice for valid values.

type: str

default_value: cpu

searchable_values: None

target_opset

Target ONNX opset version.

type: int

default_value: 12

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
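A minimal sketch of an SNPEtoONNXConversion pass config. The values simply restate the documented defaults; valid target_device values are listed in olive.snpe.SNPEDevice.

    # Illustrative SNPEtoONNXConversion pass config.
    snpe_to_onnx = {
        "type": "SNPEtoONNXConversion",
        "config": {
            "target_device": "cpu",
            "target_opset": 12,
        },
    }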

SparseGPT

Run SparseGPT on a Hugging Face PyTorch model. See https://arxiv.org/abs/2301.00774 for more details on the algorithm. This pass only supports PyTorchModel with hf_config. The transformers model type must be one of [bloom, gpt2, gpt_neox, llama, opt].

Input: PyTorchModel

Output: PyTorchModel

sparsity

Target sparsity. This can be a float or a list of two integers. Float is the target sparsity per layer. List [n,m] applies semi-structured (n:m) sparsity patterns. Refer to https://developer.nvidia.com/blog/accelerating-inference-with-sparsity-using-ampere-and-tensorrt/ for more details on 2:4 sparsity pattern.

type: float | List[int]

default_value: None

searchable_values: None

blocksize

Blocksize to use for adaptive mask selection.

type: int

default_value: 128

searchable_values: None

percdamp

Percentage of the average Hessian diagonal to use for dampening. Must be in [0,1].

type: float

default_value: 0.01

searchable_values: None

min_layer

Prune all layers with id >= min_layer.

type: int

default_value: None

searchable_values: None

max_layer

Prune all layers with id < max_layer.

type: int

default_value: None

searchable_values: None

layer_name_filter

Only prune layers whose name contains the given string(s).

type: str | List[str]

default_value: None

searchable_values: None

compute_device

Device to use for performing computations. Can be ‘auto’, ‘cpu’, ‘cuda’, ‘cuda:0’, etc. If ‘auto’, will use cuda if available. Does not affect the final model.

type: str

default_value: auto

searchable_values: None

data_config

Data config to use for pruning weights. All samples in the data are expected to be of the same length, most likely the max sequence length of the model.

type: olive.data.config.DataConfig | Dict

required: True
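A minimal sketch of a SparseGPT pass config applying a 2:4 semi-structured sparsity pattern. The data_config schema is omitted (see olive.data.config.DataConfig); sparsity may alternatively be a float such as 0.5 for unstructured per-layer sparsity.

    # Illustrative SparseGPT pass config.
    sparsegpt = {
        "type": "SparseGPT",
        "config": {
            "sparsity": [2, 4],        # n:m semi-structured sparsity
            "compute_device": "auto",
            "data_config": ...,        # DataConfig or dict defined elsewhere (schema omitted)
        },
    }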

TorchTRTConversion

Convert torch.nn.Linear modules in the transformer layers of a Hugging Face PyTorch model to TensorRT modules with fp16 precision and sparse weights, if applicable. The entire model is saved using torch.save and can be loaded using torch.load. Loading the model requires torch-tensorrt and Olive to be installed. This pass only supports PyTorchModel with hf_config. The transformers model type must be one of [bloom, gpt2, gpt_neox, llama, opt].

Input: PyTorchModel

Output: PyTorchModel

min_layer

Convert all layers with id >= min_layer.

type: int

default_value: None

searchable_values: None

max_layer

Convert all layers with id < max_layer.

type: int

default_value: None

searchable_values: None

layer_name_filter

Only convert layers whose name contains the given string(s).

type: str | List[str]

default_value: None

searchable_values: None

float16

Convert entire model to fp16. If False, only the sparse modules are converted to fp16.

type: bool

default_value: False

searchable_values: None

data_config

Data config to use for compiling module to TensorRT. The batch size of the compiled module is set to the batch size of the first batch of the dataloader.

type: olive.data.config.DataConfig | Dict

required: True
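A minimal sketch of a TorchTRTConversion pass config. The layer range is illustrative, and the data_config schema is again omitted (see olive.data.config.DataConfig).

    # Illustrative TorchTRTConversion pass config.
    torch_trt = {
        "type": "TorchTRTConversion",
        "config": {
            "min_layer": 0,
            "max_layer": 32,      # illustrative; convert layers 0 <= id < 32
            "float16": True,
            "data_config": ...,   # DataConfig or dict defined elsewhere (schema omitted)
        },
    }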

VitisAIQuantization

Quantize an ONNX model with onnxruntime, searching for the best vai_q_onnx quantization parameters at the same time.

Input: ONNXModel

Output: ONNXModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

quant_mode

ONNX quantization mode. Only ‘static’ is supported for Vitis AI quantization.

type: str

default_value: static

searchable_values: Categorical([‘static’])

data_dir

Path to the directory containing the dataset.

type: str | pathlib.Path | olive.resource_path.ResourcePath | olive.resource_path.ResourcePathConfig | None

default_value: None

searchable_values: None

batch_size

Batch size for calibration, required.

type: int

default_value: 1

searchable_values: None

dataloader_func

Function/function name to generate dataloader for calibration. Required.

type: Callable | str

required: True

weight_type

Data type for quantizing weights in vai_q_onnx quantization. ‘QInt8’ for signed 8-bit integer.

type: str

default_value: QInt8

searchable_values: Categorical([‘QInt8’])

input_nodes

Start node that needs quantization. If None, all quantizable.

type: list

default_value: None

searchable_values: None

output_nodes

End node that needs quantization. If None, all quantizable.

type: list

default_value: None

searchable_values: None

op_types_to_quantize

List of operator types to quantize. If None, all quantizable.

type: list

default_value: None

searchable_values: None

nodes_to_quantize

List of node names to quantize. If None, all quantizable.

type: list

default_value: None

searchable_values: None

nodes_to_exclude

List of node names to exclude from quantization. If None, no nodes are excluded.

type: list

default_value: None

searchable_values: None

per_channel

Quantize weights per channel.

type: bool

default_value: False

searchable_values: Categorical([True, False])

optimize_model

Deprecating Soon in ONNX! Optimize model before quantization. NOT recommended, optimization will change the computation graph, making debugging of quantization loss difficult.

type: bool

default_value: False

searchable_values: Categorical([True, False])

use_external_data_format

Option used for large (>2GB) models. Set to True by default.

type: bool

default_value: True

searchable_values: None

quant_preprocess

Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing

type: bool

default_value: True

searchable_values: Categorical([True, False])

calibrate_method

Supported calibration methods are NonOverflow and MinMSE.

type: str

default_value: MinMSE

searchable_values: Categorical([‘NonOverflow’, ‘MinMSE’])

quant_format

Quantization format. ‘QDQ’ quantizes the model by inserting QuantizeLinear/DeQuantizeLinear nodes on the tensors.

type: str

default_value: QDQ

searchable_values: Categorical([‘QDQ’, ‘QOperator’])

need_layer_fusing

Perform layer fusion for conv-relu type operations.

type: bool

default_value: False

searchable_values: Categorical([True, False])

activation_type

Quantization data type of activation.

type: str

default_value: QUInt8

searchable_values: Conditional(parents: (‘quant_format’, ‘weight_type’), support: {(‘QDQ’, ‘QInt8’): Categorical([‘QInt8’]), (‘QDQ’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘QOperator’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘QOperator’, ‘QInt8’): Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>])}, default: Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>]))

enable_dpu

Use QDQ format optimized specifically for DPU.

type: bool

default_value: False

searchable_values: Categorical([True, False])

ActivationSymmetric

Symmetrize calibration data for activations.

type: bool

default_value: False

searchable_values: None

WeightSymmetric

Symmetrize calibration data for weights.

type: bool

default_value: True

searchable_values: None

AddQDQPairToWeight

Keep the weights in floating point and insert both QuantizeLinear and DeQuantizeLinear nodes for the weights.

type: bool

default_value: False

searchable_values: None

extra_options

Key value pair dictionary for extra_options in quantization. If an option is one of [‘ActivationSymmetric’, ‘WeightSymmetric’, ‘AddQDQPairToWeight’], it will be overwritten by the corresponding config parameter value.

type: dict

default_value: None

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
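A minimal sketch of a VitisAIQuantization pass config. The user-script function name and dataset path are hypothetical; the parameter values are taken from the documented choices above (note that with quant_format ‘QDQ’ and weight_type ‘QInt8’, activation_type must also be ‘QInt8’).

    # Illustrative VitisAIQuantization pass config.
    vitis_ai_quantization = {
        "type": "VitisAIQuantization",
        "config": {
            "user_script": "user_script.py",
            "data_dir": "data/calibration",
            "dataloader_func": "create_calibration_dataloader",  # hypothetical function in user_script
            "weight_type": "QInt8",
            "activation_type": "QInt8",
            "quant_format": "QDQ",
            "calibrate_method": "MinMSE",
            "enable_dpu": True,
        },
    }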

OptimumConversion

Convert an Optimum model to an ONNX model using the Optimum export function.

Input: OptimumModel

Output: ONNXModel | CompositeOnnxModel

script_dir

Directory containing user script dependencies.

type: str

default_value: None

searchable_values: None

user_script

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: str

default_value: None

searchable_values: None

target_opset

The version of the default (ai.onnx) opset to target.

type: int

default_value: 14

searchable_values: None

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
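A minimal sketch of an OptimumConversion pass config; the values shown simply restate the documented defaults apart from enabling external data.

    # Illustrative OptimumConversion pass config.
    optimum_conversion = {
        "type": "OptimumConversion",
        "config": {
            "target_opset": 14,
            "save_as_external_data": True,
            "all_tensors_to_one_file": True,
        },
    }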

OptimumMerging

Merges a decoder_model with its decoder_with_past_model via the Optimum library.

Input: CompositeOnnxModel

Output: ONNXModel | CompositeOnnxModel

save_as_external_data

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

searchable_values: None

all_tensors_to_one_file

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

searchable_values: None

external_data_name

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

searchable_values: None
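A minimal sketch of OptimumConversion followed by OptimumMerging in the passes section of an Olive workflow config, assuming the usual type/config layout. OptimumConversion can produce a CompositeOnnxModel (decoder_model plus decoder_with_past_model), which OptimumMerging then merges into a single ONNX model; the pass key names are illustrative.

    # Illustrative passes section chaining OptimumConversion and OptimumMerging.
    passes = {
        "conversion": {"type": "OptimumConversion", "config": {"target_opset": 14}},
        "merging": {"type": "OptimumMerging", "config": {"save_as_external_data": True}},
    }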