Passes#

The following passes are available in Olive.

Each pass is followed by a description of the pass and a list of the pass’s configuration options.

ONNX#

OnnxConversion#

Convert a PyTorch model to ONNX model using torch.onnx.export on CPU.

Input: handler.hf.DistributedHfModelHandler | handler.hf.HfModelHandler | handler.pytorch.PyTorchModelHandler

Output: handler.onnx.DistributedOnnxModelHandler | handler.onnx.ONNXModelHandler

user_script#

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: pathlib.Path | str

default_value: None

search_defaults: None

script_dir#

Directory containing user script dependencies.

type: pathlib.Path | str

default_value: None

search_defaults: None

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

target_opset#

The version of the default (ai.onnx) opset to target.

type: int

default_value: 20

search_defaults: None

use_dynamo_exporter#

Whether to use dynamo_export API to export ONNX model.

type: bool

default_value: False

search_defaults: None

past_key_value_name#

The arguments name to point to past key values. For model loaded from huggingface, it is ‘past_key_values’. Basically, it is used only when use_dynamo_exporter is True.

type: str

default_value: past_key_values

search_defaults: None

device#

The device to use for conversion, e.g., ‘cuda’ or ‘cpu’. If not specified, will use ‘cpu’ for PyTorch model and ‘cuda’ for DistributedHfModel.

type: str

default_value: None

search_defaults: None

torch_dtype#

The dtype to cast the model to before conversion, e.g., ‘float32’ or ‘float16’. If not specified, will use the model as is.

type: str

default_value: None

search_defaults: None

parallel_jobs#

Number of parallel jobs. Defaulted to number of CPUs. Set it to 0 to disable.

type: int

default_value: None

search_defaults: None

merge_adapter_weights#

Whether to merge adapter weights before conversion. After merging, the model structure is consistent with base model. That is useful if you cannot run conversion for some fine-tuned models with adapter weights

type: bool

default_value: False

search_defaults: None

save_metadata_for_token_generation#

Whether to save metadata for token generation or not. Includes config.json, generation_config.json, and tokenizer related files.

type: bool

default_value: False

search_defaults: None

optimize#

Whether to export the model with constant folding and redundancies elimination.

type: bool

default_value: True

search_defaults: None

dynamic#

Whether to export the model with dynamic axes/shapes.

type: bool

default_value: True

search_defaults: None

OnnxOpVersionConversion#

Input: handler.onnx.ONNXModelHandler

Output: handler.onnx.ONNXModelHandler

target_opset#

The version of the default (ai.onnx) opset to target. Default: latest opset version.

type: int

default_value: 22

search_defaults: None

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

OnnxScriptFusion#

Fuse Ops using onnxscript.

Input: handler.onnx.ONNXModelHandler

Output: handler.onnx.ONNXModelHandler

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

OnnxPeepholeOptimizer#

Optimize ONNX model by fusing nodes.

Input: handler.onnx.ONNXModelHandler

Output: handler.onnx.ONNXModelHandler

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

OrtTransformersOptimization#

Use ONNX Transformer Optimizer to optimize transformer based models. Optimize transformer based models in scenarios where ONNX Runtime does not apply the optimization at load time. It is based on onnxruntime.transformers.optimizer.

Input: handler.onnx.ONNXModelHandler

Output: handler.onnx.ONNXModelHandler

model_type#

Transformer based model type, including bert (exported by PyTorch), gpt2 (exported by PyTorch), bert_tf (BERT exported by tf2onnx), bert_keras (BERT exported by keras2onnx), and unet/vae/clip (stable diffusion).

type: str

default_value: None

search_defaults: None

num_heads#

Number of attention heads.

type: int

default_value: 0

search_defaults: None

num_key_value_heads#

Number of key/value attention heads.

type: int

default_value: 0

search_defaults: None

hidden_size#

Number of hidden nodes.

type: int

default_value: 0

search_defaults: None

optimization_options#

Optimization options that turn on/off some fusions.

type: dict[str, Any] | onnxruntime.transformers.fusion_options.FusionOptions

default_value: None

search_defaults: None

opt_level#

Graph optimization level of Onnx Runtime: 0 - disable all (default), 1 - basic, 2 - extended, 99 - all.

type: int

default_value: None

search_defaults: None

use_gpu#

Flag for GPU inference.

type: bool

default_value: False

search_defaults: None

only_onnxruntime#

Whether only use onnxruntime to optimize model, and no python fusion. Disable some optimizers that might cause failure in symbolic shape inference or attention fusion, when opt_level > 1.

type: bool

default_value: False

search_defaults: None

float16#

Whether half-precision float will be used.

type: bool

default_value: False

search_defaults: None

keep_io_types#

Keep input and output tensors in their original data type. Only used when float16 is True.

type: bool

default_value: True

search_defaults: None

force_fp32_ops#

Operators that are forced to run in float32. Only used when float16 is True.

type: list[str]

default_value: None

search_defaults: None

force_fp32_nodes#

Nodes that are forced to run in float32. Only used when float16 is True.

type: list[str]

default_value: None

search_defaults: None

force_fp16_inputs#

Force the conversion of the inputs of some operators to float16, even if ‘convert_float_to_float16` tool prefers it to keep them in float32.

type: dict[str, list[int]]

default_value: None

search_defaults: None

use_gqa#

Replace MultiHeadAttention with GroupQueryAttention. True is only supported when float16 is True.

type: bool

default_value: False

search_defaults: None

input_int32#

Whether int32 tensors will be used as input.

type: bool

default_value: False

search_defaults: None

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

OrtSessionParamsTuning#

Optimize ONNX Runtime inference settings.

Input: handler.onnx.ONNXModelHandler

Output: handler.onnx.ONNXModelHandler

user_script#

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: pathlib.Path | str

default_value: None

search_defaults: None

script_dir#

Directory containing user script dependencies.

type: pathlib.Path | str

default_value: None

search_defaults: None

data_config#

Data config to load data for computing latency.

type: olive.data.config.DataConfig | dict

default_value: None

search_defaults: None

device#

Device selected for tuning process.

type: str

default_value: cpu

search_defaults: None

cpu_cores#

CPU cores used for thread tuning.

type: int

default_value: None

search_defaults: None

io_bind#

Whether enable IOBinding Search for ONNX Runtime inference.

type: bool

default_value: False

search_defaults: None

enable_cuda_graph#

Whether enable CUDA Graph for CUDA execution provider.

type: bool

default_value: False

search_defaults: None

providers_list#

Execution providers framework list to execute the ONNX models.

type: str

default_value: CPUExecutionProvider

search_defaults: Categorical([‘CPUExecutionProvider’])

provider_options_list#

Execution provider options to execute the ONNX models.

type: dict[str, Any]

default_value: {}

search_defaults: Categorical([{}])

execution_mode_list#

Parallelism list between operators.

type: int

default_value: None

search_defaults: Categorical([None])

opt_level_list#

Optimization level list for ONNX model.

type: int

default_value: None

search_defaults: Categorical([None])

trt_fp16_enable#

Whether enable FP16 mode for TensorRT execution provider.

type: bool

default_value: False

search_defaults: None

intra_thread_num_list#

List of intra thread number for test.

type: int

default_value: None

search_defaults: Categorical([None])

inter_thread_num_list#

List of inter thread number for test.

type: int

default_value: None

search_defaults: Categorical([None])

extra_session_config#

Extra customized session options during tuning process.

type: dict[str, Any]

default_value: None

search_defaults: None

force_evaluate_other_eps#

Whether force to evaluate all execution providers which are different with the associated execution provider.

type: bool

default_value: False

search_defaults: None

enable_profiling#

Whether enable profiling for ONNX Runtime inference.

type: bool

default_value: False

search_defaults: None

OnnxFloatToFloat16#

Converts a model to float16. It uses the float16 converter from onnxruntime to convert the model to float16.

Input: handler.onnx.ONNXModelHandler

Output: handler.onnx.ONNXModelHandler

min_positive_val#

Constant values will be clipped against this value

type: float

default_value: 1e-07

search_defaults: None

max_finite_val#

Constant values will be clipped against this value

type: float

default_value: 10000.0

search_defaults: None

keep_io_types#

Whether model inputs/outputs should be left as float32

type: bool

default_value: False

search_defaults: None

use_symbolic_shape_infer#

Use symbolic shape inference instead of onnx shape inference. Defaults to True.

type: bool

default_value: True

search_defaults: None

op_block_list#

List of op types to leave as float32

type: list[str]

default_value: None

search_defaults: None

node_block_list#

List of node names to leave as float32

type: list[str]

default_value: None

search_defaults: None

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

OnnxIODataTypeConverter#

Converts model inputs/outputs from a source dtype to a target dtype based on a name pattern.

Input: handler.onnx.ONNXModelHandler

Output: handler.onnx.ONNXModelHandler

name_pattern#

Only convert inputs/outputs whose name matches this pattern. By defaultlooking for logits names

type: str

default_value: logits

search_defaults: None

source_dtype#

Source data type int value to convert from (default: FLOAT16). Check onnx/onnx details.

type: int

default_value: 10

search_defaults: None

target_dtype#

Target data type int value to convert to (default: FLOAT). Check onnx/onnx details.

type: int

default_value: 1

search_defaults: None

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

OrtMixedPrecision#

Convert model to mixed precision.

Input: handler.onnx.ONNXModelHandler

Output: handler.onnx.ONNXModelHandler

op_block_list#

List of op types to leave as float32

type: list[str]

default_value: [‘SimplifiedLayerNormalization’, ‘SkipSimplifiedLayerNormalization’, ‘Relu’, ‘Add’]

search_defaults: None

atol#

Absolute tolerance for checking float16 conversion

type: float

default_value: 1e-06

search_defaults: None

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

QNNPreprocess#

Preprocess ONNX model for quantization targeting QNN Execution Provider.

Input: handler.onnx.ONNXModelHandler

Output: handler.onnx.ONNXModelHandler

fuse_layernorm#

Whether to fuse ReduceMean sequence into a single LayerNormalization node.

type: bool

default_value: False

search_defaults: None

inputs_to_make_channel_last#

inputs_to_make_channel_last: List of graph input names to transpose to be “channel-last”. For example, if “input0” originally has the shape (N, C, D1, D2, …, Dn), the resulting model will change input0’s shape to (N, D1, D2, …, Dn, C) and add a transpose node after it. Original: input0 (N, C, D1, D2, …, Dn) –> <Nodes> Updated: input0 (N, D1, D2, …, Dn, C) –> Transpose –> input0_chanfirst (N, C, D1, D2, …, Dn) –> <Nodes> This can potentially improve inference latency for QDQ models running on QNN EP because the additional transpose node may allow other transpose nodes inserted during ORT layout transformation to cancel out.

type: list

default_value: None

search_defaults: None

outputs_to_make_channel_last#

List of graph output names to transpose to be “channel-last”. For example, if “output0” originally has the shape (N, C, D1, D2, …, Dn), the resulting model will change output0’s shape to (N, D1, D2, …, Dn, C) and add a transpose node before it. Original: <Nodes> –> output0 (N, C, D1, D2, …, Dn) Updated: <Nodes> –> output0_chanfirst (N, C, D1, D2, …, Dn) –> Transpose –> output0 (N, D1, D2, …, Dn, C) This can potentially improve inference latency for QDQ models running on QNN EP because the additional transpose node may allow other transpose nodes inserted during ORT layout transformation to cancel out.

type: list

default_value: None

search_defaults: None

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

OnnxQuantizationPreprocess#

ONNX Quantization Preprocess Pass. Same as OnnxQuantization quant_preprocess.

Input: handler.onnx.ONNXModelHandler

Output: handler.onnx.ONNXModelHandler

skip_optimization#

Skip model optimization step if true. This may result in ONNX shape inference failure for some models.

type: bool

default_value: False

search_defaults: None

skip_onnx_shape#

Skip ONNX shape inference. Symbolic shape inference is most effective with transformer based models. Skipping all shape inferences may reduce the effectiveness of quantization, as a tensor with unknown shape can not be quantized.

type: bool

default_value: False

search_defaults: None

skip_symbolic_shape#

Skip symbolic shape inference. Symbolic shape inference is most effective with transformer based models. Skipping all shape inferences may reduce the effectiveness of quantization, as a tensor with unknown shape can not be quantized.

type: bool

default_value: False

search_defaults: None

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

MixedPrecisionOverrides#

Qnn mixed precision overrides pass. Pre-processes the model for mixed precision quantization by resolving constraints that each operator has when being converted to QNN operator Constraints refer to situations where certain tensor cannot be quantized to 16 bits standalone but rather neighboring tensors as well in order to have valid operators. Specific problem that arises here is the situation where certain tensor can be input to multiple nodes and each node requires different precision NOTE: This pass handles just initializer tensors as activation tensors are handled by onnxruntime

Input: handler.onnx.ONNXModelHandler

Output: handler.onnx.ONNXModelHandler

overrides_config#

Path/Dict to mixed precision overrides json, with the format of {tensor_name: quant_type}

type: str | dict

required: True

element_wise_binary_ops#

List of element wise binary ops, if not provided defaults to [‘Add’, ‘Sub’, ‘Mul’, ‘Div’]

type: list

default_value: None

search_defaults: None

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

OnnxDynamicQuantization#

ONNX Dynamic Quantization Pass.

Input: handler.onnx.ONNXModelHandler

Output: handler.onnx.ONNXModelHandler

quant_mode#

dynamic quantization mode

type: str

default_value: dynamic

search_defaults: None

weight_type#

Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.

type: str

default_value: QInt8

search_defaults: Categorical([‘QInt8’, ‘QUInt8’])

op_types_to_quantize#

List of operator types to quantize. If None, all quantizable.

type: list

default_value: None

search_defaults: None

op_types_to_exclude#

List of operator types to exclude from quantization. If None, all quantizable. op_types_to_quantize takes precedence over op_types_to_exclude. If both are set, op_types_to_quantize will be used.

type: list

default_value: None

search_defaults: None

nodes_to_quantize#

List of node names to quantize. If None, all quantizable.

type: list

default_value: None

search_defaults: None

nodes_to_exclude#

List of node names to exclude from quantization. If None, all quantizable.

type: list

default_value: None

search_defaults: None

per_channel#

Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default_value: False

search_defaults: Categorical([True, False])

reduce_range#

Quantize weights with 7-bits. It may improve the accuracy for some models running on non-VNNI machine, especially for per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default_value: False

search_defaults: Categorical([True, False])

quant_preprocess#

Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing

type: bool

default_value: True

search_defaults: Categorical([True, False])

activation_symmetric#

Symmetric quantization for activations.

type: bool

default_value: False

search_defaults: None

weight_symmetric#

Symmetric quantization for weights. Defaults to None. If set to None, it is assumed true if weight_type is signed, false otherwise.

type: bool

default_value: None

search_defaults: None

extra_options#

Key value pair dictionary for extra_options in quantization. Please refer to microsoft/onnxruntime for details about the supported options. If an option is one of ActivationSymmetric, WeightSymmetric, MinimumRealRange or TensorQuantOverrides, it will be overwritten by the corresponding config parameter value.

type: dict

default_value: None

search_defaults: None

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

OnnxStaticQuantization#

ONNX Static Quantization Pass.

Input: handler.onnx.ONNXModelHandler

Output: handler.onnx.ONNXModelHandler

quant_mode#

static quantization mode

type: str

default_value: static

search_defaults: None

weight_type#

Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.

type: str

default_value: QInt8

search_defaults: Categorical([‘QInt8’, ‘QUInt8’])

op_types_to_quantize#

List of operator types to quantize. If None, all quantizable.

type: list

default_value: None

search_defaults: None

op_types_to_exclude#

List of operator types to exclude from quantization. If None, all quantizable. op_types_to_quantize takes precedence over op_types_to_exclude. If both are set, op_types_to_quantize will be used.

type: list

default_value: None

search_defaults: None

nodes_to_quantize#

List of node names to quantize. If None, all quantizable.

type: list

default_value: None

search_defaults: None

nodes_to_exclude#

List of node names to exclude from quantization. If None, all quantizable.

type: list

default_value: None

search_defaults: None

per_channel#

Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default_value: False

search_defaults: Categorical([True, False])

reduce_range#

Quantize weights with 7-bits. It may improve the accuracy for some models running on non-VNNI machine, especially for per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default_value: False

search_defaults: Categorical([True, False])

quant_preprocess#

Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing

type: bool

default_value: True

search_defaults: Categorical([True, False])

activation_symmetric#

Symmetric quantization for activations.

type: bool

default_value: False

search_defaults: None

weight_symmetric#

Symmetric quantization for weights. Defaults to None. If set to None, it is assumed true if weight_type is signed, false otherwise.

type: bool

default_value: None

search_defaults: None

data_config#

Data config for calibration, required if quant_mode is ‘static’

type: olive.data.config.DataConfig | dict

default_value: None

search_defaults: None

calibrate_method#

Supported calibration methods are MinMax, Entropy and Percentile.

type: str

default_value: MinMax

search_defaults: Categorical([‘MinMax’, ‘Entropy’, ‘Percentile’])

calibration_providers#

Execution providers to run the session during calibration. Default is None which uses [ “CPUExecutionProvider” ].

type: list

default_value: None

search_defaults: None

quant_format#

QOperator format quantizes the model with quantized operators directly. QDQ format quantize the model by inserting QuantizeLinear/DeQuantizeLinear on the tensor.

type: str

default_value: QDQ

search_defaults: Categorical([‘QOperator’, ‘QDQ’])

activation_type#

Quantization data type of activation. Please refer to https://onnxruntime.ai/docs/performance/quantization.html for more details on data type selection

type: str

default_value: QInt8

search_defaults: Conditional(parents: (‘quant_format’, ‘weight_type’), support: {(‘QDQ’, ‘QInt8’): Categorical([‘QInt8’]), (‘QDQ’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘QOperator’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘QOperator’, ‘QInt8’): Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>])}, default: Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>]))

min_real_range#

Minimum real range for quantization. If set, enforces the minimum range between rmin and rmax.

type: float

default_value: None

search_defaults: None

tensor_quant_overrides#

tensor-level quantization overrides.

type: dict

default_value: None

search_defaults: None

prepare_qdq_config#

Generate a quantization configuration for a full integer QDQ model. Otherwise, only a limited set of operators are quantized. Only supported after onnxruntime 1.21.0 for EPs other than QNN.

type: bool

default_value: True

search_defaults: None

extra_options#

Key value pair dictionary for extra_options in quantization. Please refer to microsoft/onnxruntime for details about the supported options. If an option is one of ActivationSymmetric, WeightSymmetric, MinimumRealRange or TensorQuantOverrides, it will be overwritten by the corresponding config parameter value.

type: dict

default_value: None

search_defaults: None

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

OnnxQuantization#

Quantize ONNX model with static/dynamic quantization techniques.

Input: handler.onnx.ONNXModelHandler

Output: handler.onnx.ONNXModelHandler

quant_mode#

Onnx Quantization mode. ‘dynamic’ for dynamic quantization, ‘static’ for static quantization.

type: str

default_value: static

search_defaults: Categorical([‘dynamic’, ‘static’])

weight_type#

Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.

type: str

default_value: QInt8

search_defaults: Categorical([‘QInt8’, ‘QUInt8’])

op_types_to_quantize#

List of operator types to quantize. If None, all quantizable.

type: list

default_value: None

search_defaults: None

op_types_to_exclude#

List of operator types to exclude from quantization. If None, all quantizable. op_types_to_quantize takes precedence over op_types_to_exclude. If both are set, op_types_to_quantize will be used.

type: list

default_value: None

search_defaults: None

nodes_to_quantize#

List of node names to quantize. If None, all quantizable.

type: list

default_value: None

search_defaults: None

nodes_to_exclude#

List of node names to exclude from quantization. If None, all quantizable.

type: list

default_value: None

search_defaults: None

per_channel#

Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default_value: False

search_defaults: Categorical([True, False])

reduce_range#

Quantize weights with 7-bits. It may improve the accuracy for some models running on non-VNNI machine, especially for per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization

type: bool

default_value: False

search_defaults: Categorical([True, False])

quant_preprocess#

Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing

type: bool

default_value: True

search_defaults: Categorical([True, False])

activation_symmetric#

Symmetric quantization for activations.

type: bool

default_value: False

search_defaults: None

weight_symmetric#

Symmetric quantization for weights. Defaults to None. If set to None, it is assumed true if weight_type is signed, false otherwise.

type: bool

default_value: None

search_defaults: None

data_config#

Data config for calibration, required if quant_mode is ‘static’

type: olive.data.config.DataConfig | dict

default_value: None

search_defaults: None

calibrate_method#

Supported calibration methods are MinMax, Entropy and Percentile.

type: str

default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘static’,): ‘MinMax’, (‘dynamic’,): <SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>}, default: OLIVE_INVALID_PARAM_VALUE)

search_defaults: Conditional(parents: (‘quant_mode’,), support: {(‘static’,): Categorical([‘MinMax’, ‘Entropy’, ‘Percentile’])}, default: Categorical([<SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>]))

calibration_providers#

Execution providers to run the session during calibration. Default is None which uses [ “CPUExecutionProvider” ].

type: list

default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘static’,): None, (‘dynamic’,): <SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>}, default: OLIVE_INVALID_PARAM_VALUE)

search_defaults: None

quant_format#

QOperator format quantizes the model with quantized operators directly. QDQ format quantize the model by inserting QuantizeLinear/DeQuantizeLinear on the tensor.

type: str

default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘static’,): ‘QDQ’, (‘dynamic’,): <SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>}, default: OLIVE_INVALID_PARAM_VALUE)

search_defaults: Conditional(parents: (‘quant_mode’,), support: {(‘static’,): Categorical([‘QOperator’, ‘QDQ’])}, default: Categorical([<SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>]))

activation_type#

Quantization data type of activation. Please refer to https://onnxruntime.ai/docs/performance/quantization.html for more details on data type selection

type: str

default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘static’,): ‘QInt8’, (‘dynamic’,): <SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>}, default: OLIVE_INVALID_PARAM_VALUE)

search_defaults: Conditional(parents: (‘quant_mode’, ‘quant_format’, ‘weight_type’), support: {(‘static’, ‘QDQ’, ‘QInt8’): Categorical([‘QInt8’]), (‘static’, ‘QDQ’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘static’, ‘QOperator’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘static’, ‘QOperator’, ‘QInt8’): Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>])}, default: Categorical([<SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>]))

min_real_range#

Minimum real range for quantization. If set, enforces the minimum range between rmin and rmax.

type: float

default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘static’,): None, (‘dynamic’,): <SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>}, default: OLIVE_INVALID_PARAM_VALUE)

search_defaults: None

tensor_quant_overrides#

tensor-level quantization overrides.

type: dict

default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘static’,): None, (‘dynamic’,): <SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>}, default: OLIVE_INVALID_PARAM_VALUE)

search_defaults: None

prepare_qdq_config#

Generate a quantization configuration for a full integer QDQ model. Otherwise, only a limited set of operators are quantized. Only supported after onnxruntime 1.21.0 for EPs other than QNN.

type: bool

default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘static’,): True, (‘dynamic’,): <SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>}, default: OLIVE_INVALID_PARAM_VALUE)

search_defaults: None

extra_options#

Key value pair dictionary for extra_options in quantization. Please refer to microsoft/onnxruntime for details about the supported options. If an option is one of ActivationSymmetric, WeightSymmetric, MinimumRealRange or TensorQuantOverrides, it will be overwritten by the corresponding config parameter value.

type: dict

default_value: None

search_defaults: None

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

OnnxMatMul4Quantizer#

Quantize ONNX models’ MatMul operations to 4-bit weights.

Input: handler.onnx.ONNXModelHandler

Output: handler.onnx.ONNXModelHandler

block_size#

Block size for quantization. Default value is 32.

type: int

default_value: 32

search_defaults: None

is_symmetric#

Symmetric quantization. Default value is True.

type: bool

default_value: True

search_defaults: None

nodes_to_exclude#

List of node names to exclude from quantization.

type: list

default_value: None

search_defaults: None

nodes_to_include#

List of node names to include in quantization.

type: list

default_value: None

search_defaults: None

op_types_to_quantize#

List of operator types to quantize. Default value is None = [“MatMul”]. Supported op types are: MatMul, Gather.

type: list

default_value: None

search_defaults: None

quant_axes#

op:axis, which axis to quantize for an op. Default is None = {“MatMul”: 0, “Gather”: 1}

type: dict[str, int]

default_value: None

search_defaults: None

accuracy_level#

Accuracy level of the 4-bit quantized MatMul computation. Refer to the MatMulNBits contrib op’s ‘accuracy_level’ attribute for details (microsoft/onnxruntime).

type: olive.passes.onnx.quantization.OnnxMatMul4Quantizer.AccuracyLevel

default_value: None

search_defaults: None

algorithm#

The algorithm used to quantize weight. If None, the default algorithm is used with quant config created from the pass configuration.

type: olive.passes.onnx.quantization.OnnxMatMul4Quantizer.Algorithm

default_value: None

search_defaults: None

weight_only_quant_configs#

If ‘algorithm’ is provided and this is None, the config is constructed from the pass configuration. If provided, the it takes precedence. Refer to microsoft/onnxruntime for details.

type: dict

default_value: None

search_defaults: None

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

data_config#

Data config for calibration, required if quant_mode is ‘static’

type: olive.data.config.DataConfig | dict

default_value: None

search_defaults: None

GraphSurgeries#

ONNX graph surgeries collections. This pass applies a list of surgeries to the ONNX model. Each surgery is a transformation on the ONNX graph. Example: surgeries: { type: “GraphSurgeries”, surgeries: [ { “surgeon”: “RenameInputs”, “old_names”: [“input1”, “input2”] “new_names”: [“renamed_input1”, “renamed_input2”] }, { “surgeon”: “RenameOutputs”, “old_names”: [“output1”, “output2”] “new_names”: [“renamed_output1”, “renamed_output2”] } ] }

Input: handler.onnx.ONNXModelHandler

Output: handler.onnx.ONNXModelHandler

surgeries#

List of surgeries to apply, each with its type and parameters

type: list[dict[str, Any]]

required: True

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

MatMulNBitsToQDQ#

Convert ONNX MatMulNBits nodes to standard ONNX quantized-dequantized (QDQ) format.

Input: handler.onnx.ONNXModelHandler

Output: handler.onnx.ONNXModelHandler

use_transpose_op#

Whether to use a Transpose operator after the DequantizeLinear operator. If False, the weight initializer will be transposed instead. Default is False. True might be more efficient on some EPs such as DirectML.

type: bool

default_value: False

search_defaults: None

use_int4#

Whether to use int4 data type for the quantized weight. Default is False and uses uint4 data type.

type: bool

default_value: False

search_defaults: None

add_zero_point#

Whether to add zero point for symmetric quantized weights, i.e., DQ zero point is 0. Default is False.

type: bool

default_value: False

search_defaults: None

nodes_to_exclude#

List of node names to exclude from the conversion. The node names should be the names of the MatMulNBits nodes. Default is None.

type: list

default_value: None

search_defaults: None

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

DynamicToFixedShape#

Convert dynamic shape to fixed shape for ONNX model.

Input: handler.onnx.ONNXModelHandler

Output: handler.onnx.ONNXModelHandler

dim_param#

Symbolic parameter name. Provide dim_value if specified.

type: list[str]

default_value: None

search_defaults: None

dim_value#

Value to replace dim_param with in the model. Must be > 0.

type: list[int]

default_value: None

search_defaults: None

input_name#

Model input name to replace shape of. Provide input_shape if specified.

type: list[str]

default_value: None

search_defaults: None

input_shape#

Shape to use for input_shape. Provide comma separated list for the shape. All values must be > 0. e.g. [1,3,256,256]

type: list[list[int]]

default_value: None

search_defaults: None

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

IncDynamicQuantization#

Intel® Neural Compressor Dynamic Quantization Pass.

Input: handler.onnx.ONNXModelHandler

Output: handler.onnx.ONNXModelHandler

approach#

dynamic quantization mode

type: str

default_value: dynamic

search_defaults: None

device#

Intel® Neural Compressor quantization device. Support ‘cpu’ and ‘gpu’.

type: str

default_value: cpu

search_defaults: None

backend#

Backend for model execution. Support ‘default’, ‘onnxrt_trt_ep’, ‘onnxrt_cuda_ep’

type: str

default_value: default

search_defaults: None

domain#

Model domain. Support ‘auto’, ‘cv’, ‘object_detection’, ‘nlp’ and ‘recommendation_system’. Intel® Neural Compressor Adaptor will use specific quantization settings for different domains automatically, and explicitly specified quantization settings will override the automatic setting. If users set domain as auto, automatic detection for domain will be executed.

type: str

default_value: auto

search_defaults: None

workspace#

Workspace for Intel® Neural Compressor quantization where intermediate files and tuning history file are stored. Default value is: “./nc_workspace/{}/”.format(datetime.datetime.now().strftime(“%Y-%m-%d_%H-%M-%S”))

type: str

default_value: None

search_defaults: None

recipes#

Recipes for Intel® Neural Compressor quantization, support list is as below. ‘smooth_quant’: whether do smooth quant ‘smooth_quant_args’: parameters for smooth_quant ‘fast_bias_correction’: whether do fast bias correction ‘weight_correction’: whether do weight correction ‘gemm_to_matmul’: whether convert gemm to matmul and add, only valid for onnx models ‘graph_optimization_level’: support ‘DISABLE_ALL’, ‘ENABLE_BASIC’, ‘ENABLE_EXTENDED’, ‘ENABLE_ALL’ only valid for onnx models ‘first_conv_or_matmul_quantization’: whether quantize the first conv or matmul ‘last_conv_or_matmul_quantization’: whether quantize the last conv or matmul ‘pre_post_process_quantization’: whether quantize the ops in preprocessing and postprocessing ‘add_qdq_pair_to_weight’: whether add QDQ pair for weights, only valid for onnxrt_trt_ep ‘optypes_to_exclude_output_quant’: don’t quantize output of specified optypes ‘dedicated_qdq_pair’: whether dedicate QDQ pair, only valid for onnxrt_trt_ep

type: dict

default_value: {}

search_defaults: None

reduce_range#

Whether use 7 bit to quantization.

type: bool

default_value: False

search_defaults: Categorical([True, False])

quant_level#

Intel® Neural Compressor allows users to choose different tuning processes by specifying the quantization level (quant_level). Currently 3 quant_levels are supported. 0 is conservative strategy, 1 is basic or user-specified strategy, auto (default) is the combination of 0 and 1. Please refer to intel/neural-compressor intel/neural-compressor for more details

type: str

default_value: auto

search_defaults: None

excluded_precisions#

Precisions to be excluded, Default value is empty list. Intel® Neural Compressor enable the mixed precision with fp32 + bf16(only when device is ‘gpu’ and backend is ‘onnxrt_cuda_ep’) + int8 by default. If you want to disable bf16 data type, you can specify excluded_precisions = [‘bf16’].

type: list

default_value: []

search_defaults: None

tuning_criterion#

Instance of TuningCriterion class. In this class you can set strategy, strategy_kwargs, timeout, max_trials and objective.

type: dict

default_value: {‘strategy’: ‘basic’, ‘strategy_kwargs’: None, ‘timeout’: 0, ‘max_trials’: 5, ‘objective’: ‘performance’}

search_defaults: None

metric#

Accuracy metric to generate an evaluation function for Intel® Neural Compressor accuracy aware tuning.

type: olive.evaluator.metric.Metric | None

default_value: None

search_defaults: None

weight_only_config#

INC weight only quantization config.

type: dict

default_value: {}

search_defaults: None

op_type_dict#

INC weight only quantization config.

type: dict

default_value: {}

search_defaults: None

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

IncStaticQuantization#

Intel® Neural Compressor Static Quantization Pass.

Input: handler.onnx.ONNXModelHandler

Output: handler.onnx.ONNXModelHandler

approach#

static quantization mode

type: str

default_value: static

search_defaults: None

device#

Intel® Neural Compressor quantization device. Support ‘cpu’ and ‘gpu’.

type: str

default_value: cpu

search_defaults: None

backend#

Backend for model execution. Support ‘default’, ‘onnxrt_trt_ep’, ‘onnxrt_cuda_ep’

type: str

default_value: default

search_defaults: None

domain#

Model domain. Support ‘auto’, ‘cv’, ‘object_detection’, ‘nlp’ and ‘recommendation_system’. Intel® Neural Compressor Adaptor will use specific quantization settings for different domains automatically, and explicitly specified quantization settings will override the automatic setting. If users set domain as auto, automatic detection for domain will be executed.

type: str

default_value: auto

search_defaults: None

workspace#

Workspace for Intel® Neural Compressor quantization where intermediate files and tuning history file are stored. Default value is: “./nc_workspace/{}/”.format(datetime.datetime.now().strftime(“%Y-%m-%d_%H-%M-%S”))

type: str

default_value: None

search_defaults: None

recipes#

Recipes for Intel® Neural Compressor quantization, support list is as below. ‘smooth_quant’: whether do smooth quant ‘smooth_quant_args’: parameters for smooth_quant ‘fast_bias_correction’: whether do fast bias correction ‘weight_correction’: whether do weight correction ‘gemm_to_matmul’: whether convert gemm to matmul and add, only valid for onnx models ‘graph_optimization_level’: support ‘DISABLE_ALL’, ‘ENABLE_BASIC’, ‘ENABLE_EXTENDED’, ‘ENABLE_ALL’ only valid for onnx models ‘first_conv_or_matmul_quantization’: whether quantize the first conv or matmul ‘last_conv_or_matmul_quantization’: whether quantize the last conv or matmul ‘pre_post_process_quantization’: whether quantize the ops in preprocessing and postprocessing ‘add_qdq_pair_to_weight’: whether add QDQ pair for weights, only valid for onnxrt_trt_ep ‘optypes_to_exclude_output_quant’: don’t quantize output of specified optypes ‘dedicated_qdq_pair’: whether dedicate QDQ pair, only valid for onnxrt_trt_ep

type: dict

default_value: {}

search_defaults: None

reduce_range#

Whether use 7 bit to quantization.

type: bool

default_value: False

search_defaults: Categorical([True, False])

quant_level#

Intel® Neural Compressor allows users to choose different tuning processes by specifying the quantization level (quant_level). Currently 3 quant_levels are supported. 0 is conservative strategy, 1 is basic or user-specified strategy, auto (default) is the combination of 0 and 1. Please refer to intel/neural-compressor intel/neural-compressor for more details

type: str

default_value: auto

search_defaults: None

excluded_precisions#

Precisions to be excluded, Default value is empty list. Intel® Neural Compressor enable the mixed precision with fp32 + bf16(only when device is ‘gpu’ and backend is ‘onnxrt_cuda_ep’) + int8 by default. If you want to disable bf16 data type, you can specify excluded_precisions = [‘bf16’].

type: list

default_value: []

search_defaults: None

tuning_criterion#

Instance of TuningCriterion class. In this class you can set strategy, strategy_kwargs, timeout, max_trials and objective.

type: dict

default_value: {‘strategy’: ‘basic’, ‘strategy_kwargs’: None, ‘timeout’: 0, ‘max_trials’: 5, ‘objective’: ‘performance’}

search_defaults: None

metric#

Accuracy metric to generate an evaluation function for Intel® Neural Compressor accuracy aware tuning.

type: olive.evaluator.metric.Metric | None

default_value: None

search_defaults: None

weight_only_config#

INC weight only quantization config.

type: dict

default_value: {‘bits’: 4, ‘group_size’: 4, ‘scheme’: ‘asym’, ‘algorithm’: ‘RTN’}

search_defaults: None

op_type_dict#

INC weight only quantization config.

type: dict

default_value: {}

search_defaults: None

data_config#

Data config for calibration, required if approach is ‘static’.

type: olive.data.config.DataConfig | dict

required: True

quant_format#

Quantization format. Support ‘QDQ’ and ‘QOperator’.

type: str

default_value: QOperator

search_defaults: Categorical([‘QOperator’, ‘QDQ’])

calibration_sampling_size#

Number of calibration sample.

type: list | int

default_value: [100]

search_defaults: None

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

IncQuantization#

Quantize ONNX model with Intel® Neural Compressor.

Input: handler.onnx.ONNXModelHandler

Output: handler.onnx.ONNXModelHandler

approach#

Intel® Neural Compressor Quantization mode. ‘dynamic’ for dynamic quantization, ‘static’ for static quantization, “weight_only” for 4-bits weight-only quantization.

type: str

default_value: static

search_defaults: Categorical([‘dynamic’, ‘static’, ‘weight_only’])

device#

Intel® Neural Compressor quantization device. Support ‘cpu’ and ‘gpu’.

type: str

default_value: cpu

search_defaults: None

backend#

Backend for model execution. Support ‘default’, ‘onnxrt_trt_ep’, ‘onnxrt_cuda_ep’

type: str

default_value: default

search_defaults: None

domain#

Model domain. Support ‘auto’, ‘cv’, ‘object_detection’, ‘nlp’ and ‘recommendation_system’. Intel® Neural Compressor Adaptor will use specific quantization settings for different domains automatically, and explicitly specified quantization settings will override the automatic setting. If users set domain as auto, automatic detection for domain will be executed.

type: str

default_value: auto

search_defaults: None

workspace#

Workspace for Intel® Neural Compressor quantization where intermediate files and tuning history file are stored. Default value is: “./nc_workspace/{}/”.format(datetime.datetime.now().strftime(“%Y-%m-%d_%H-%M-%S”))

type: str

default_value: None

search_defaults: None

recipes#

Recipes for Intel® Neural Compressor quantization, support list is as below. ‘smooth_quant’: whether do smooth quant ‘smooth_quant_args’: parameters for smooth_quant ‘fast_bias_correction’: whether do fast bias correction ‘weight_correction’: whether do weight correction ‘gemm_to_matmul’: whether convert gemm to matmul and add, only valid for onnx models ‘graph_optimization_level’: support ‘DISABLE_ALL’, ‘ENABLE_BASIC’, ‘ENABLE_EXTENDED’, ‘ENABLE_ALL’ only valid for onnx models ‘first_conv_or_matmul_quantization’: whether quantize the first conv or matmul ‘last_conv_or_matmul_quantization’: whether quantize the last conv or matmul ‘pre_post_process_quantization’: whether quantize the ops in preprocessing and postprocessing ‘add_qdq_pair_to_weight’: whether add QDQ pair for weights, only valid for onnxrt_trt_ep ‘optypes_to_exclude_output_quant’: don’t quantize output of specified optypes ‘dedicated_qdq_pair’: whether dedicate QDQ pair, only valid for onnxrt_trt_ep

type: dict

default_value: {}

search_defaults: None

reduce_range#

Whether use 7 bit to quantization.

type: bool

default_value: False

search_defaults: Categorical([True, False])

quant_level#

Intel® Neural Compressor allows users to choose different tuning processes by specifying the quantization level (quant_level). Currently 3 quant_levels are supported. 0 is conservative strategy, 1 is basic or user-specified strategy, auto (default) is the combination of 0 and 1. Please refer to intel/neural-compressor intel/neural-compressor for more details

type: str

default_value: auto

search_defaults: None

excluded_precisions#

Precisions to be excluded, Default value is empty list. Intel® Neural Compressor enable the mixed precision with fp32 + bf16(only when device is ‘gpu’ and backend is ‘onnxrt_cuda_ep’) + int8 by default. If you want to disable bf16 data type, you can specify excluded_precisions = [‘bf16’].

type: list

default_value: []

search_defaults: None

tuning_criterion#

Instance of TuningCriterion class. In this class you can set strategy, strategy_kwargs, timeout, max_trials and objective.

type: dict

default_value: {‘strategy’: ‘basic’, ‘strategy_kwargs’: None, ‘timeout’: 0, ‘max_trials’: 5, ‘objective’: ‘performance’}

search_defaults: None

metric#

Accuracy metric to generate an evaluation function for Intel® Neural Compressor accuracy aware tuning.

type: olive.evaluator.metric.Metric | None

default_value: None

search_defaults: None

weight_only_config#

INC weight only quantization config.

type: dict

default_value: {‘bits’: 4, ‘group_size’: 4, ‘scheme’: ‘asym’, ‘algorithm’: ‘RTN’}

search_defaults: None

op_type_dict#

INC weight only quantization config.

type: dict

default_value: {}

search_defaults: None

data_config#

Data config for calibration, required if approach is ‘static’.

type: olive.data.config.DataConfig | dict

required: True

quant_format#

Quantization format. Support ‘QDQ’ and ‘QOperator’.

type: str

default_value: QOperator

search_defaults: Conditional(parents: (‘approach’,), support: {(‘static’,): Categorical([‘QOperator’, ‘QDQ’])}, default: Categorical([‘default’]))

calibration_sampling_size#

Number of calibration sample.

type: list | int

default_value: [100]

search_defaults: None

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

VitisAIQuantization#

Quantize ONNX model with onnxruntime. We can search for best parameters for vai_q_onnx quantization at same time.

Input: handler.onnx.ONNXModelHandler

Output: handler.onnx.ONNXModelHandler

quant_mode#

Onnx Quantization mode. ‘static’ for vitis ai quantization.

type: str

default_value: static

search_defaults: Categorical([‘static’])

data_config#

Data config for calibration.

type: olive.data.config.DataConfig | dict

required: True

weight_type#

Data type for quantizing weights which is used in vai_q_onnx quantization. ‘QInt8’ for signed 8-bit integer,

type: str

default_value: QInt8

search_defaults: Categorical([‘QInt8’])

input_nodes#

Start node that needs quantization. If None, all quantizable.

type: list

default_value: None

search_defaults: None

output_nodes#

End node that needs quantization. If None, all quantizable.

type: list

default_value: None

search_defaults: None

op_types_to_quantize#

List of operator types to quantize. If None, all quantizable.

type: list

default_value: None

search_defaults: None

nodes_to_quantize#

List of node names to quantize. If None, all quantizable.

type: list

default_value: None

search_defaults: None

nodes_to_exclude#

List of node names to exclude from quantization. If None, all quantizable.

type: list

default_value: None

search_defaults: None

per_channel#

Quantize weights per channel.

type: bool

default_value: False

search_defaults: Categorical([True, False])

optimize_model#

Deprecating Soon in ONNX! Optimize model before quantization. NOT recommended, optimization will change the computation graph, making debugging of quantization loss difficult.

type: bool

default_value: False

search_defaults: Categorical([True, False])

use_external_data_format#

option used for large size (>2GB) model. Set to True by default.

type: bool

default_value: True

search_defaults: None

quant_preprocess#

Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing

type: bool

default_value: True

search_defaults: Categorical([True, False])

calibrate_method#

Current calibration methods supported are NonOverflow and MinMSE, Please use NonOverflow or MinMSE as options.

type: str

default_value: MinMSE

search_defaults: Categorical([‘NonOverflow’, ‘MinMSE’])

quant_format#

QDQ format quantize the model by inserting QuantizeLinear/DeQuantizeLinear on the tensor.

type: str

default_value: QDQ

search_defaults: Categorical([‘QDQ’, ‘QOperator’])

need_layer_fusing#

Perform layer fusion for conv-relu type operations

type: bool

default_value: False

search_defaults: Categorical([True, False])

activation_type#

Quantization data type of activation.

type: str

default_value: QUInt8

search_defaults: Conditional(parents: (‘quant_format’, ‘weight_type’), support: {(‘QDQ’, ‘QInt8’): Categorical([‘QInt8’]), (‘QDQ’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘QOperator’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘QOperator’, ‘QInt8’): Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>])}, default: Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>]))

enable_dpu#

Use QDQ format optimized specifically for DPU.

type: bool

default_value: False

search_defaults: Categorical([True, False])

ActivationSymmetric#

symmetrize calibration data for activations

type: bool

default_value: False

search_defaults: None

WeightSymmetric#

symmetrize calibration data for weights

type: bool

default_value: True

search_defaults: None

AddQDQPairToWeight#

remains floating-point weight and inserts both QuantizeLinear/DeQuantizeLinear nodes to weight

type: bool

default_value: False

search_defaults: None

extra_options#

Key value pair dictionary for extra_options in quantization. If an option is one of [‘ActivationSymmetric’, ‘WeightSymmetric’, ‘AddQDQPairToWeight’], it will be overwritten by the corresponding config parameter value.

type: dict

default_value: None

search_defaults: None

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

AppendPrePostProcessingOps#

Add Pre/Post nodes to the input model.

Input: handler.onnx.ONNXModelHandler

Output: handler.onnx.ONNXModelHandler

pre#

List of pre-processing commands to add.

type: list[dict[str, Any]]

default_value: None

search_defaults: None

post#

List of post-processing commands to add.

type: list[dict[str, Any]]

default_value: None

search_defaults: None

tool_command#

Composited tool commands to invoke.

type: str

default_value: None

search_defaults: None

tool_command_args#

Arguments to pass to tool command or to PrePostProcessor. If it is used for PrePostProcessor, the schema would like: { “name”: “image”, “data_type”: “uint8”, “shape”: [“num_bytes”],

type: dict[str, Any] | list[olive.passes.onnx.append_pre_post_processing_ops.PrePostProcessorInput]

default_value: None

search_defaults: None

target_opset#

The version of the default (ai.onnx) opset to target.

type: int

default_value: 16

search_defaults: None

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

InsertBeamSearch#

Insert Beam Search Op. Only used for whisper models. Uses WhisperBeamSearch contrib op if ORT version >= 1.17.1, else uses BeamSearch contrib op.

Input: handler.base.OliveModelHandler

Output: handler.onnx.ONNXModelHandler

no_repeat_ngram_size#

If set to int > 0, all ngrams of that size can only occur once.

type: int

default_value: 0

search_defaults: None

use_vocab_mask#

Use vocab_mask as an extra graph input to the beam search op. Only supported in ORT >= 1.16.0

type: bool

default_value: False

search_defaults: None

use_prefix_vocab_mask#

Use prefix_vocab_mask as an extra graph input to the beam search op. Only supported in ORT >= 1.16.0

type: bool

default_value: False

search_defaults: None

use_forced_decoder_ids#

Use decoder_input_ids as an extra graph input to the beam search op. Only supported in ORT >= 1.16.0

type: bool

default_value: False

search_defaults: None

use_logits_processor#

Use logits_processor as an extra graph input to the beam search op. Only supported in ORT >= 1.16.0

type: bool

default_value: False

search_defaults: None

use_temperature#

Use temperature as an extra graph input to the beam search op. Only supported in ORT >= 1.17.1

type: bool

default_value: False

search_defaults: None

fp16#

Is the model in fp16 precision.

type: bool

default_value: False

search_defaults: None

use_gpu#

Use GPU for beam search op.

type: bool

default_value: False

search_defaults: None

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

ExtractAdapters#

Extract adapter weights from ONNX model and save them as external weights file. If make_inputs is False, model proto is invalid after this pass as the adapter weights point to non-existent external files. Inference session must be created by first loading the adapter weights using SessionOptions.add_external_initializers. If make_inputs is True, the adapter weights are inputs to the model and must be provided during inference.

Input: handler.onnx.ONNXModelHandler

Output: handler.onnx.ONNXModelHandler

make_inputs#

Convert adapter weights to inputs. If false, the adapter weights will be set as initializers with external data.

type: bool

default_value: True

search_defaults: None

dynamic_lora_r#

Whether the model uses dynamic shape for lora_r. Only used if make_inputs is True. Valid only for float modules.

type: bool

default_value: True

search_defaults: None

optional_inputs#

Create default initializers (empty tensor with lora_r dimension set to 0) for the adapter weights, if inputs not provided during inference. Only used if make_inputs is True. Valid only for float modules.

type: bool

default_value: True

search_defaults: None

save_format#

Format to save the weights in.

type: olive.common.utils.WeightsFileFormat

default_value: onnx_adapter

search_defaults: None

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

SplitModel#

Split an ONNX model into multiple smaller sub-models based on predefined assignments.

Input: handler.onnx.ONNXModelHandler

Output: handler.composite.CompositeModelHandler

split_assignments#

Set split assignments in the format of name1=0;name2=1 etc. Overwrite the one from CaptureSplitInfo pass.

type: dict[str, int] | str

default_value: None

search_defaults: None

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

StaticLLM#

Convert a dynamic shaped LLM into a static shaped LLM. Expects a CompositeModelHandler with at least 3 components: embeddings, transformer layers, and lm_head. transformer layers can be split into multiple components. Each transformer layers component produces two new components: - context model (sequence length = context_length) - iterator model (sequence length = 1) embeddings and lm_head keep their original shapes. The output model has an attribute “llm_pipeline” that contains the mapping of the components with keys: - embeddings: name of the embeddings model - context: list of context model names - iterator: list of iterator model names - lm_head: name of the lm_head model

Input: handler.composite.CompositeModelHandler

Output: handler.composite.CompositeModelHandler

batch_size#

Batch size of the model.

type: int

default_value: 1

search_defaults: None

context_length#

Input length of the context model.

type: int

default_value: 64

search_defaults: None

EPContextBinaryGenerator#

Generate EP specific context binary for the model.

Input: handler.onnx.ONNXModelHandler | handler.composite.CompositeModelHandler

Output: handler.onnx.ONNXModelHandler | handler.composite.CompositeModelHandler

embed_context#

Whether to embed context bin into the model.

type: bool

default_value: False

search_defaults: None

weight_sharing#

Whether to enable weight sharing between the component models. Only applicable to composite models.

type: bool

default_value: False

search_defaults: None

provider_options#

Provider options for the EP.

type: dict

default_value: None

search_defaults: None

session_options#

Session options for the EP.

type: dict

default_value: None

search_defaults: None

disable_cpu_fallback#

Whether to disable CPU fallback.

type: bool

default_value: False

search_defaults: None

ComposeOnnxModels#

Compose multiple ONNX models into a single model. This pass chains multiple ONNX models together by itertively connecting the output of the preceding model to the input of the next model. The final inputs and outputs are the set of all inputs and outputs of the models excluding those used to connect the models together. It also handles llm_pipeline models: - embeddings: the embeddings model is saved as is - context: the context model is composed of all models in the context group - iterator: the iterator model is composed of all models in the iterator group - lm_head: the lm_head model is saved as is

Input: handler.composite.CompositeModelHandler

Output: handler.onnx.ONNXModelHandler | handler.composite.CompositeModelHandler

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

OptimumConversion#

Convert a Hugging Face PyTorch model to ONNX model using the Optimum export function.

Input: handler.hf.HfModelHandler

Output: handler.onnx.ONNXModelHandler | handler.composite.CompositeModelHandler

user_script#

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: pathlib.Path | str

default_value: None

search_defaults: None

script_dir#

Directory containing user script dependencies.

type: pathlib.Path | str

default_value: None

search_defaults: None

target_opset#

The version of the default (ai.onnx) opset to target.

type: int

default_value: 14

search_defaults: None

components#

List of component models to export. E.g. [‘decoder_model’, ‘decoder_with_past_model’]. None means export all components.

type: list[str]

default_value: None

search_defaults: None

fp16#

Whether to use fp16 precision to load torch model and then convert it to onnx.

type: bool

default_value: False

search_defaults: None

device#

The device to use to do the export. Defaults to ‘cpu’.

type: str

default_value: cpu

search_defaults: None

extra_args#

Extra arguments to pass to the optimum.exporters.onnx.main_export function.

type: dict

default_value: None

search_defaults: None

OptimumMerging#

Merges a decoder_model with its decoder_with_past_model via the Optimum library.

Input: handler.composite.CompositeModelHandler

Output: handler.onnx.ONNXModelHandler

strict#

When set, the decoder and decoder_with_past are expected to have strictly the same number of outputs. When False, the decoder is allowed to have more outputs that decoder_with_past, in which case constant outputs are added to match the number of outputs.

type: bool

default_value: True

search_defaults: None

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

ModelBuilder#

Converts a Huggingface generative PyTorch model to ONNX model using the Generative AI builder. See microsoft/onnxruntime-genai

Input: handler.hf.HfModelHandler | handler.onnx.ONNXModelHandler

Output: handler.onnx.ONNXModelHandler

precision#

Precision of model.

type: olive.passes.onnx.model_builder.ModelBuilder.Precision

required: True

metadata_only#

Whether to export the model or generate required metadata only.

type: bool

default_value: False

search_defaults: None

search#

Search options to use for generate loop.

type: dict[str, Any]

default_value: None

search_defaults: None

use_qdq#

Use this option when you want to use quantize-dequantize ops. For example, you will have a quantized MatMul op instead of the MatMulNBits op.

type: bool

default_value: False

search_defaults: None

int4_block_size#

Specify the block_size for int4 quantization. Acceptable values: 16/32/64/128/256.

type: olive.passes.onnx.model_builder.ModelBuilder.BlockSize

default_value: None

search_defaults: None

int4_accuracy_level#

Specify the minimum accuracy level for activation of MatMul in int4 quantization.

type: olive.passes.onnx.model_builder.ModelBuilder.AccuracyLevel

default_value: None

search_defaults: None

int4_op_types_to_quantize#

Specify the op types to quantize for int4 quantization. Default is None (= [ “MatMul” ]). Example: [“MatMul”, “Gemm”]

type: list[str]

default_value: None

search_defaults: None

exclude_embeds#

Remove embedding layer from your ONNX model.

type: bool

default_value: False

search_defaults: None

exclude_lm_head#

Remove language modeling head from your ONNX model.

type: bool

default_value: False

search_defaults: None

enable_cuda_graph#

The model can use CUDA graph capture for CUDA execution provider. If enabled, all nodes being placed on the CUDA EP is the prerequisite for the CUDA graph to be used correctly.

type: bool

default_value: None

search_defaults: None

Pytorch#

CaptureSplitInfo#

Capture the split information of the model layers. Only splits the transformer layers.

Input: handler.hf.HfModelHandler | handler.pytorch.PyTorchModelHandler

Output: handler.hf.HfModelHandler | handler.pytorch.PyTorchModelHandler

num_splits#

Number of splits to divide the model layers into.

type: int

default_value: None

search_defaults: None

block_to_split#

Names of the model blocks to split. Children of the block will be divided into the splits. For supported transformers models, the default value is the transformers layer block name.

type: str | list[str]

default_value: None

search_defaults: None

cost_model#

Path to the cost model csv file. One of num_splits or cost_model is required. Must be a csv with headers module,num_params,num_bytes,num_flops where each row corresponds to the name or a module (with no children), the number of parameters, the number of bytes, and the number of FLOPs(batch_size=1, seqlen=1) the module uses when in the desired precision.

type: str | pathlib.Path

default_value: None

search_defaults: None

unique_embeds_lm_head_splits#

Assign embeddings and lm_head layers to their own splits.

type: bool

default_value: False

search_defaults: None

LoRA#

Run LoRA fine-tuning on a Hugging Face PyTorch model.

Input: handler.hf.HfModelHandler

Output: handler.hf.HfModelHandler

r#

R dimension.

type: int

default_value: 64

search_defaults: Categorical([16, 32, 64])

alpha#

The alpha parameter for scaling.

type: float

default_value: 16

search_defaults: None

lora_dropout#

The dropout probability for Lora layers.

type: float

default_value: 0.05

search_defaults: None

target_modules#

Target modules

type: list[str]

default_value: None

search_defaults: None

modules_to_save#

List of modules apart from LoRA layers to be set as trainable and saved in the final checkpoint.

type: None

default_value: None

search_defaults: None

torch_dtype#

Data type to use for training. Should be one of bfloat16, float16 or float32. If float16 will use fp16 mixed-precision training.

type: str

default_value: bfloat16

search_defaults: None

device_map#

Device map to use to load the model.

type: olive.passes.pytorch.lora.DeviceMap | None

default_value: auto

search_defaults: None

allow_tf32#

Whether or not to allow TF32 on Ampere GPUs. Can be used to speed up training. For more information, see ‘https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices’

type: bool

default_value: True

search_defaults: None

ephemeral_gpu_offload#

Ephemeral GPU offload

type: bool

default_value: False

search_defaults: None

train_data_config#

Data config for fine-tuning training.

type: olive.data.config.DataConfig | dict

required: True

eval_data_config#

Data config for fine-tuning evaluation. Optional if evaluation is not needed.

type: olive.data.config.DataConfig | dict

default_value: None

search_defaults: None

training_args#

Training arguments. If None, will use default arguments. See HFTrainingArguments for more details.

type: olive.passes.pytorch.lora.HFTrainingArguments | dict

default_value: None

search_defaults: None

LoHa#

Run LoHa fine-tuning on a Hugging Face PyTorch model.

Input: handler.hf.HfModelHandler

Output: handler.hf.HfModelHandler

rank_dropout#

The dropout probability for rank dimension during training.

type: float

default_value: 0.0

search_defaults: None

module_dropout#

The dropout probability for disabling modules during training.

type: float

default_value: 0.0

search_defaults: None

use_effective_conv2d#

Use parameter effective decomposition for Conv2d with ksize > 1.

type: bool

default_value: True

search_defaults: None

exclude_modules#

Modules to exclude from tuning.

type: list[str] | str | None

default_value: None

search_defaults: None

init_weights#

Whether to perform initialization of adapter weights.

type: bool

default_value: True

search_defaults: None

layers_to_transform#

The layer indices to transform.

type: list[int]

default_value: None

search_defaults: None

layers_pattern#

The layer pattern name, used only if layers_to_transform is different from None.

type: list[str]

default_value: None

search_defaults: None

rank_pattern#

The mapping from layer names or regexp expression to ranks which are different from the default rank specified by r.

type: dict

default_value: {}

search_defaults: None

alpha_pattern#

The mapping from layer names or regexp expression to alphas which are different from the default alpha specified by alpha.

type: dict

default_value: {}

search_defaults: None

r#

R dimension.

type: int

default_value: 64

search_defaults: Categorical([16, 32, 64])

alpha#

The alpha parameter for scaling.

type: float

default_value: 16

search_defaults: None

lora_dropout#

The dropout probability for Lora layers.

type: float

default_value: 0.05

search_defaults: None

target_modules#

Target modules

type: list[str]

default_value: None

search_defaults: None

modules_to_save#

List of modules apart from LoRA layers to be set as trainable and saved in the final checkpoint.

type: None

default_value: None

search_defaults: None

torch_dtype#

Data type to use for training. Should be one of bfloat16, float16 or float32. If float16 will use fp16 mixed-precision training.

type: str

default_value: bfloat16

search_defaults: None

device_map#

Device map to use to load the model.

type: olive.passes.pytorch.lora.DeviceMap | None

default_value: auto

search_defaults: None

allow_tf32#

Whether or not to allow TF32 on Ampere GPUs. Can be used to speed up training. For more information, see ‘https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices’

type: bool

default_value: True

search_defaults: None

ephemeral_gpu_offload#

Ephemeral GPU offload

type: bool

default_value: False

search_defaults: None

train_data_config#

Data config for fine-tuning training.

type: olive.data.config.DataConfig | dict

required: True

eval_data_config#

Data config for fine-tuning evaluation. Optional if evaluation is not needed.

type: olive.data.config.DataConfig | dict

default_value: None

search_defaults: None

training_args#

Training arguments. If None, will use default arguments. See HFTrainingArguments for more details.

type: olive.passes.pytorch.lora.HFTrainingArguments | dict

default_value: None

search_defaults: None

LoKr#

Run LoKr fine-tuning on a Hugging Face PyTorch model.

Input: handler.hf.HfModelHandler

Output: handler.hf.HfModelHandler

decompose_both#

Perform rank decomposition of left kronecker product matrix.

type: bool

default_value: False

search_defaults: None

decompose_factor#

Kronecker product decomposition factor.

type: int

default_value: -1

search_defaults: None

rank_dropout_scale#

Whether to scale the rank dropout while training.

type: bool

default_value: False

search_defaults: None

rank_dropout#

The dropout probability for rank dimension during training.

type: float

default_value: 0.0

search_defaults: None

module_dropout#

The dropout probability for disabling modules during training.

type: float

default_value: 0.0

search_defaults: None

use_effective_conv2d#

Use parameter effective decomposition for Conv2d with ksize > 1.

type: bool

default_value: True

search_defaults: None

exclude_modules#

Modules to exclude from tuning.

type: list[str] | str | None

default_value: None

search_defaults: None

init_weights#

Whether to perform initialization of adapter weights.

type: bool

default_value: True

search_defaults: None

layers_to_transform#

The layer indices to transform.

type: list[int]

default_value: None

search_defaults: None

layers_pattern#

The layer pattern name, used only if layers_to_transform is different from None.

type: list[str]

default_value: None

search_defaults: None

rank_pattern#

The mapping from layer names or regexp expression to ranks which are different from the default rank specified by r.

type: dict

default_value: {}

search_defaults: None

alpha_pattern#

The mapping from layer names or regexp expression to alphas which are different from the default alpha specified by alpha.

type: dict

default_value: {}

search_defaults: None

r#

R dimension.

type: int

default_value: 64

search_defaults: Categorical([16, 32, 64])

alpha#

The alpha parameter for scaling.

type: float

default_value: 16

search_defaults: None

lora_dropout#

The dropout probability for Lora layers.

type: float

default_value: 0.05

search_defaults: None

target_modules#

Target modules

type: list[str]

default_value: None

search_defaults: None

modules_to_save#

List of modules apart from LoRA layers to be set as trainable and saved in the final checkpoint.

type: None

default_value: None

search_defaults: None

torch_dtype#

Data type to use for training. Should be one of bfloat16, float16 or float32. If float16 will use fp16 mixed-precision training.

type: str

default_value: bfloat16

search_defaults: None

device_map#

Device map to use to load the model.

type: olive.passes.pytorch.lora.DeviceMap | None

default_value: auto

search_defaults: None

allow_tf32#

Whether or not to allow TF32 on Ampere GPUs. Can be used to speed up training. For more information, see ‘https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices’

type: bool

default_value: True

search_defaults: None

ephemeral_gpu_offload#

Ephemeral GPU offload

type: bool

default_value: False

search_defaults: None

train_data_config#

Data config for fine-tuning training.

type: olive.data.config.DataConfig | dict

required: True

eval_data_config#

Data config for fine-tuning evaluation. Optional if evaluation is not needed.

type: olive.data.config.DataConfig | dict

default_value: None

search_defaults: None

training_args#

Training arguments. If None, will use default arguments. See HFTrainingArguments for more details.

type: olive.passes.pytorch.lora.HFTrainingArguments | dict

default_value: None

search_defaults: None

QLoRA#

Run QLoRA fine-tuning on a Hugging Face PyTorch model.

Input: handler.hf.HfModelHandler

Output: handler.hf.HfModelHandler

double_quant#

Whether to use nested quantization where the quantization constants from the first quantization are quantized again.

type: bool

default_value: False

search_defaults: None

quant_type#

Quantization data type to use. Should be one of fp4 or nf4.

type: str

default_value: nf4

search_defaults: None

compute_dtype#

Computation data type for the quantized modules. If not provided, will use the same dtype as torch_dtype

type: str

default_value: None

search_defaults: None

save_quant_config#

Whether to save the output model with the bitsandbytes quantization config. If False, the base model will be in the original precision. If True, the base model will be quantized on load.

type: bool

default_value: True

search_defaults: None

r#

R dimension.

type: int

default_value: 64

search_defaults: Categorical([16, 32, 64])

alpha#

The alpha parameter for scaling.

type: float

default_value: 16

search_defaults: None

lora_dropout#

The dropout probability for Lora layers.

type: float

default_value: 0.05

search_defaults: None

target_modules#

Target modules

type: list[str]

default_value: None

search_defaults: None

modules_to_save#

List of modules apart from LoRA layers to be set as trainable and saved in the final checkpoint.

type: None

default_value: None

search_defaults: None

torch_dtype#

Data type to use for training. Should be one of bfloat16, float16 or float32. If float16 will use fp16 mixed-precision training.

type: str

default_value: bfloat16

search_defaults: None

device_map#

Device map to use to load the model.

type: olive.passes.pytorch.lora.DeviceMap | None

default_value: auto

search_defaults: None

allow_tf32#

Whether or not to allow TF32 on Ampere GPUs. Can be used to speed up training. For more information, see ‘https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices’

type: bool

default_value: True

search_defaults: None

ephemeral_gpu_offload#

Ephemeral GPU offload

type: bool

default_value: False

search_defaults: None

train_data_config#

Data config for fine-tuning training.

type: olive.data.config.DataConfig | dict

required: True

eval_data_config#

Data config for fine-tuning evaluation. Optional if evaluation is not needed.

type: olive.data.config.DataConfig | dict

default_value: None

search_defaults: None

training_args#

Training arguments. If None, will use default arguments. See HFTrainingArguments for more details.

type: olive.passes.pytorch.lora.HFTrainingArguments | dict

default_value: None

search_defaults: None

DoRA#

Run DoRA fine-tuning on a Hugging Face PyTorch model.

Input: handler.hf.HfModelHandler

Output: handler.hf.HfModelHandler

r#

R dimension.

type: int

default_value: 64

search_defaults: Categorical([16, 32, 64])

alpha#

The alpha parameter for scaling.

type: float

default_value: 16

search_defaults: None

lora_dropout#

The dropout probability for Lora layers.

type: float

default_value: 0.05

search_defaults: None

target_modules#

Target modules

type: list[str]

default_value: None

search_defaults: None

modules_to_save#

List of modules apart from LoRA layers to be set as trainable and saved in the final checkpoint.

type: None

default_value: None

search_defaults: None

torch_dtype#

Data type to use for training. Should be one of bfloat16, float16 or float32. If float16 will use fp16 mixed-precision training.

type: str

default_value: bfloat16

search_defaults: None

device_map#

Device map to use to load the model.

type: olive.passes.pytorch.lora.DeviceMap | None

default_value: auto

search_defaults: None

allow_tf32#

Whether or not to allow TF32 on Ampere GPUs. Can be used to speed up training. For more information, see ‘https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices’

type: bool

default_value: True

search_defaults: None

ephemeral_gpu_offload#

Ephemeral GPU offload

type: bool

default_value: False

search_defaults: None

train_data_config#

Data config for fine-tuning training.

type: olive.data.config.DataConfig | dict

required: True

eval_data_config#

Data config for fine-tuning evaluation. Optional if evaluation is not needed.

type: olive.data.config.DataConfig | dict

default_value: None

search_defaults: None

training_args#

Training arguments. If None, will use default arguments. See HFTrainingArguments for more details.

type: olive.passes.pytorch.lora.HFTrainingArguments | dict

default_value: None

search_defaults: None

LoftQ#

Run LoftQ fine-tuning on a Hugging Face PyTorch model.

Input: handler.hf.HfModelHandler

Output: handler.hf.HfModelHandler

loftq_iter#

Number of LoftQ iterations.

type: int

default_value: 1

search_defaults: None

compute_dtype#

Computation data type for the quantized modules. If not provided, will use the same dtype as torch_dtype

type: str

default_value: None

search_defaults: None

save_quant_config#

Whether to save the output model with the bitsandbytes quantization config. If False, the base model will be in the original precision. If True, the base model will be quantized on load.

type: bool

default_value: True

search_defaults: None

r#

R dimension.

type: int

default_value: 64

search_defaults: Categorical([16, 32, 64])

alpha#

The alpha parameter for scaling.

type: float

default_value: 16

search_defaults: None

lora_dropout#

The dropout probability for Lora layers.

type: float

default_value: 0.05

search_defaults: None

target_modules#

Target modules

type: list[str]

default_value: None

search_defaults: None

modules_to_save#

List of modules apart from LoRA layers to be set as trainable and saved in the final checkpoint.

type: None

default_value: None

search_defaults: None

torch_dtype#

Data type to use for training. Should be one of bfloat16, float16 or float32. If float16 will use fp16 mixed-precision training.

type: str

default_value: bfloat16

search_defaults: None

device_map#

Device map to use to load the model.

type: olive.passes.pytorch.lora.DeviceMap | None

default_value: auto

search_defaults: None

allow_tf32#

Whether or not to allow TF32 on Ampere GPUs. Can be used to speed up training. For more information, see ‘https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices’

type: bool

default_value: True

search_defaults: None

ephemeral_gpu_offload#

Ephemeral GPU offload

type: bool

default_value: False

search_defaults: None

train_data_config#

Data config for fine-tuning training.

type: olive.data.config.DataConfig | dict

required: True

eval_data_config#

Data config for fine-tuning evaluation. Optional if evaluation is not needed.

type: olive.data.config.DataConfig | dict

default_value: None

search_defaults: None

training_args#

Training arguments. If None, will use default arguments. See HFTrainingArguments for more details.

type: olive.passes.pytorch.lora.HFTrainingArguments | dict

default_value: None

search_defaults: None

LoRA/QLoRA/LoftQ HFTrainingArguments#

pydantic settings olive.passes.pytorch.lora.HFTrainingArguments[source]#

Training arguments for transformers.Trainer.

Has the same fields as transformers.TrainingArguments with recommended default values for QLoRA fine-tuning.

field optim: str = 'paged_adamw_32bit'#: The optimizer to use.

field learning_rate: float = 0.0002#: The initial learning rate for AdamW.

field lr_scheduler_type: str = 'constant'#: Learning rate schedule. Constant a bit better than cosine, and has advantage for analysis.

field warmup_ratio: float = 0.03#: Fraction of steps to do a warmup for.

field evaluation_strategy: str = None#: The evaluation strategy to use. Forced to ‘no’ if eval_dataset is not provided. Otherwise, ‘steps’ unless set to ‘epoch’.

field overwrite_output_dir: bool = False#: If True, overwrite the content of output_dir. Otherwise, will continue training if output_dir points to a checkpoint directory.

field resume_from_checkpoint: str = None#: The path to a folder with a valid checkpoint for the model. Supercedes any checkpoint found in output_dir.

QuantizationAwareTraining#

Run quantization aware training on PyTorch model.

Input: handler.pytorch.PyTorchModelHandler

Output: handler.pytorch.PyTorchModelHandler

user_script#

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: pathlib.Path | str

default_value: None

search_defaults: None

script_dir#

Directory containing user script dependencies.

type: pathlib.Path | str

default_value: None

search_defaults: None

train_data_config#

Data config for training.

type: olive.data.config.DataConfig | dict

default_value: None

search_defaults: None

val_data_config#

Data config for validation.

type: olive.data.config.DataConfig | dict

default_value: None

search_defaults: None

training_loop_func#

Customized training loop function.

type: Callable | str

default_value: None

search_defaults: None

ptl_module#

LightningModule for PyTorch Lightning trainer. It is a way of encapsulating all the logic related to the training, validation, and testing of a PyTorch model. Please refer to https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html for more details.

type: Callable | str

default_value: None

search_defaults: None

ptl_data_module#

LightningDataModule for PyTorch Lightning trainer. It is a way of encapsulating all the data-related logic for training, validation, and testing of a PyTorch model. Please refer to https://pytorch-lightning.readthedocs.io/en/stable/data/datamodule.html for more details.

type: Callable | str

default_value: None

search_defaults: None

num_epochs#

Maximum number of epochs for training.

type: int

default_value: None

search_defaults: None

num_steps#

Maximum number of steps for training.

type: int

default_value: -1

search_defaults: None

do_validate#

Whether perform one evaluation epoch over the validation set after training.

type: bool

default_value: False

search_defaults: None

modules_to_fuse#

List of list of module names to fuse.

type: list[list[str]]

default_value: None

search_defaults: None

qconfig_func#

Customized function to create a QConfig for QAT. Please refer to https://pytorch.org/docs/stable/generated/torch.ao.quantization.qconfig.QConfig.html for details.

type: Callable | str

default_value: None

search_defaults: None

logger#

Logger for training.

type: pytorch_lightning.loggers.logger.Logger | collections.abc.Iterable[pytorch_lightning.loggers.logger.Logger] | Callable | bool

default_value: False

search_defaults: None

gpus#

Number of GPUs to use.

type: int

default_value: None

search_defaults: None

seed#

Random seed for training.

type: int

default_value: None

search_defaults: None

checkpoint_path#

Path to save checkpoints.

type: str

default_value: None

search_defaults: None

MergeAdapterWeights#

Merge adapter weights into the base model.

Input: handler.hf.HfModelHandler

Output: handler.hf.HfModelHandler

SparseGPT#

Run SparseGPT on a Hugging Face PyTorch model. See https://arxiv.org/abs/2301.00774 for more details on the algorithm. This pass only supports HfModelHandler. The transformers model type must be one of [bloom, gpt2, gpt_neox, llama, opt].

Input: handler.hf.HfModelHandler

Output: handler.hf.HfModelHandler

sparsity#

Target sparsity. This can be a float or a list of two integers. Float is the target sparsity per layer. List [n,m] applies semi-structured (n:m) sparsity patterns. Refer to https://developer.nvidia.com/blog/accelerating-inference-with-sparsity-using-ampere-and-tensorrt/ for more details on 2:4 sparsity pattern.

type: float | list[int]

default_value: None

search_defaults: None

blocksize#

Blocksize to use for adaptive mask selection.

type: int

default_value: 128

search_defaults: None

percdamp#

Percentage of the average Hessian diagonal to use for dampening. Must be in [0,1].

type: float

default_value: 0.01

search_defaults: None

min_layer#

Prune all layers with id >= min_layer.

type: int

default_value: None

search_defaults: None

max_layer#

Prune all layers with id < max_layer.

type: int

default_value: None

search_defaults: None

layer_name_filter#

Only prune layers whose name contains the given string(s).

type: str | list[str]

default_value: None

search_defaults: None

device#

Device to use for performing computations. Can be ‘auto, ‘cpu’, ‘cuda’, ‘cuda:0’, etc. If ‘auto’, will use cuda if available. Does not affect the final model.

type: str

default_value: auto

search_defaults: None

data_config#

Data config to use for pruning weights. All samples in the data are expected to be of the same length, most likely the max sequence length of the model.

type: olive.data.config.DataConfig | dict

required: True

SliceGPT#

Run SliceGPT on a Hugging Face PyTorch model. See https://arxiv.org/pdf/2401.15024.pdf for more details on the algorithm. This pass only supports HfModelHandler.

Input: handler.hf.HfModelHandler

Output: handler.pytorch.PyTorchModelHandler

calibration_data_config#

Data config for Dataset to calibrate and calculate perplexity on.

type: olive.data.config.DataConfig | dict

required: True

calibration_nsamples#

Number of samples of the calibration data to load.

type: int

default_value: 128

search_defaults: None

calibration_batch_size#

Batch size for loading the calibration data.

type: int

default_value: 16

search_defaults: None

seed#

Seed for sampling the calibration data.

type: int

default_value: 42

search_defaults: None

sparsity#

A measure of how much slicing is applied (in the range [0, 1))

type: float

default_value: 0.0

search_defaults: None

round_interval#

Interval for rounding the weights (the best value may depend on your hardware)

type: int

default_value: 8

search_defaults: None

final_orientation#

Final orientation of the sliced weights. Choices are random or pca.

type: str

default_value: random

search_defaults: None

QuaRot#

Rotate model using QuaRot. See https://arxiv.org/pdf/2404.00456 for more details on the algorithm. Only offline weight rotation is supported. Can be followed by a pass such as GPTQ to quantize the rotated model weights. This pass only supports HfModelHandler.

Input: handler.hf.HfModelHandler

Output: handler.hf.HfModelHandler

seed#

Random seed for rotation. Default value is 0.

type: int

default_value: 0

search_defaults: None

rotate_mode#

Rotation method to use. Default value is ‘hadamard’.

type: olive.passes.pytorch.rotate.RotateBase.RotateMode

default_value: hadamard

search_defaults: None

SpinQuant#

Rotate model using SpinQuant. See https://arxiv.org/pdf/2405.16406 for more details on the algorithm. Only offline weight rotation is supported. Can be followed by a pass such as GPTQ to quantize the rotated model weights. This pass only supports HfModelHandler.

Input: handler.hf.HfModelHandler

Output: handler.hf.HfModelHandler

seed#

Random seed for rotation. Default value is 0.

type: int

default_value: 0

search_defaults: None

rotate_mode#

Rotation method to use. Default value is ‘hadamard’.

type: olive.passes.pytorch.rotate.RotateBase.RotateMode

default_value: hadamard

search_defaults: None

a_bits#

Number of bits for dynamic quantization of activations.

type: int

default_value: 16

search_defaults: None

a_symmetric#

Whether to use symmetric quantization for activations.

type: bool

default_value: True

search_defaults: None

a_per_token#

Whether to quantize activations per token. If False, quantize activations per tensor.

type: bool

default_value: True

search_defaults: None

training_args#

Training arguments. If None, will use default arguments.

type: olive.passes.pytorch.rotate.HFTrainingArguments | dict

default_value: None

search_defaults: None

GptqQuantizer#

GPTQ quantization using Hugging Face Optimum and export model with onnxruntime optimized kernel.

Input: handler.hf.HfModelHandler | handler.pytorch.PyTorchModelHandler

Output: handler.pytorch.PyTorchModelHandler

user_script#

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: pathlib.Path | str

default_value: None

search_defaults: None

script_dir#

Directory containing user script dependencies.

type: pathlib.Path | str

default_value: None

search_defaults: None

bits#

quantization bits. Default value is 4

type: int

default_value: 4

search_defaults: None

layers_block_name#

Block name to quantize. For models can’t be auto filled, you can refer this link to fill these parameters. AutoGPTQ/AutoGPTQ

type: str

default_value: None

search_defaults: None

outside_layer_modules#

Names of other nn modules that in the same level as the transformer layer block. Default value is None.

type: list[str]

default_value: None

search_defaults: None

inside_layer_modules#

Names of linear layers in transformer layer module. Default value is None.

type: list[list[str]]

default_value: None

search_defaults: None

group_size#

Block size for quantization. Default value is 128.

type: int

default_value: 128

search_defaults: None

damp_percent#

Damping factor for quantization. Default value is 0.01.

type: float

default_value: 0.01

search_defaults: None

static_groups#

Use static groups for quantization. Default value is False.

type: bool

default_value: False

search_defaults: None

true_sequential#

Use true sequential for quantization. Default value is False.

type: bool

default_value: False

search_defaults: None

desc_act#

Use descriptive activation for quantization. Default value is False.

type: bool

default_value: False

search_defaults: None

sym#

Symmetric quantization. Default value is False.

type: bool

default_value: False

search_defaults: None

data_config#

Data config for quantization. If not provided, wikitest train data will be used for HfModels. Required for PyTorch models.

type: olive.data.config.DataConfig | dict

default_value: None

search_defaults: None

AutoAWQQuantizer#

AWQ quantization.

Input: handler.hf.HfModelHandler

Output: handler.hf.HfModelHandler

user_script#

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: pathlib.Path | str

default_value: None

search_defaults: None

script_dir#

Directory containing user script dependencies.

type: pathlib.Path | str

default_value: None

search_defaults: None

input_model_dtype#

The input model data type.

type: olive.passes.pytorch.autoawq.AutoAWQQuantizer.ModelDtype

default_value: fp16

search_defaults: None

zero_point#

Whether to use zero point quantization to calculate the scales and zeros. If False, it use the symmetric quantization.

type: bool

default_value: True

search_defaults: None

q_group_size#

The group size to use for quantization. Recommended value is 128 and -1 uses per-column quantization.

type: int

default_value: 128

search_defaults: None

w_bit#

The number of bits to quantize to.

type: int

default_value: 4

search_defaults: None

version#

The version of the quantization algorithm to use. gemm is better for big batch_size (e.g. >= 8) otherwise, gemv is better (e.g. < 8 ). gemm models are compatible with Exllama kernels.

type: str

default_value: gemm

search_defaults: None

duo_scaling#

Whether to scale using both w/x(True) or just x(False).

type: bool

default_value: True

search_defaults: None

modules_to_not_convert#

The list of modules to not quantize, useful for quantizing models that explicitly require to have some modules left in their original precision (e.g. Whisper encoder, Llava encoder, Mixtral gate layers). Please refer to AutoAWQ documentation for quantizing HF models.

type: list

default_value: []

search_defaults: None

export_compatible#

If True, this argument avoids real quantization by only applying the scales quantizing down to FP16.

type: bool

default_value: False

search_defaults: None

data_config#

Data config for quantization. If not provided, pile validation data will be used.

type: olive.data.config.DataConfig | dict

default_value: None

search_defaults: None

TorchTRTConversion#

Convert torch.nn.Linear modules in the transformer layers of a HuggingFace PyTorch model to TensorRT modules. The conversion would include fp16 precision and sparse weights, if applicable. The entire model is saved using torch.save and can be loaded using torch.load. Loading the model requires torch-tensorrt and Olive to be installed. This pass only supports HfModelHandler. The transformers model type must be one of [bloom, gpt2, gpt_neox, llama, opt].

Input: handler.hf.HfModelHandler

Output: handler.pytorch.PyTorchModelHandler

min_layer#

Convert all layers with id >= min_layer.

type: int

default_value: None

search_defaults: None

max_layer#

Convert all layers with id < max_layer.

type: int

default_value: None

search_defaults: None

layer_name_filter#

Only convert layers whose name contains the given string(s).

type: str | list[str]

default_value: None

search_defaults: None

float16#

Convert entire model to fp16. If False, only the sparse modules are converted to fp16.

type: bool

default_value: False

search_defaults: None

data_config#

Data config to use for compiling module to TensorRT. The batch size of the compiled module is set to the batch size of the first batch of the dataloader.

type: olive.data.config.DataConfig | dict

required: True

OpenVINO#

OpenVINOConversion#

Converts PyTorch, ONNX or TensorFlow Model to OpenVino Model.

Input: handler.hf.HfModelHandler | handler.pytorch.PyTorchModelHandler | handler.onnx.ONNXModelHandler

Output: handler.openvino.OpenVINOModelHandler

user_script#

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: pathlib.Path | str

default_value: None

search_defaults: None

script_dir#

Directory containing user script dependencies.

type: pathlib.Path | str

default_value: None

search_defaults: None

input_shapes#

Set or override shapes for model inputs.It configures dynamic and static dimensions in model inputsdepending on your inference requirements.Static parameter is required if static models are required.

type: Callable | str | list

default_value: None

search_defaults: None

example_input_func#

Function/function name to generate sample of model input in original framework.For PyTorch it can be torch.Tensor.For Tensorflow it can be tf.Tensor or numpy.ndarray.By default a pytorch float tensor is created.

type: Callable | str

default_value: None

search_defaults: None

compress_to_fp16#

Compress weights in output OpenVINO model to FP16. Default is True.

type: bool

default_value: True

search_defaults: None

extra_configs#

Extra configurations for OpenVINO model conversion. extra_config can be set by passing a dictionary where key is the parameter name, and the value is the parameter value. Please check Conversion Parameters documentation for more details: https://docs.openvino.ai/2023.3/openvino_docs_OV_Converter_UG_Conversion_Options.html

type: dict

default_value: None

search_defaults: None

model_name#

Name of output openVINO model.

type: str

default_value: ov_model

search_defaults: None

static#

Create a static model instead of a dynamic model.Enabled by default.

type: bool

default_value: True

search_defaults: None

OpenVINOIoUpdate#

Converts dynamic OpenVINO Model to static OpenVino Model and updates IO names.

Input: handler.openvino.OpenVINOModelHandler

Output: handler.openvino.OpenVINOModelHandler

user_script#

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: pathlib.Path | str

default_value: None

search_defaults: None

script_dir#

Directory containing user script dependencies.

type: pathlib.Path | str

default_value: None

search_defaults: None

extra_configs#

Extra configurations for OpenVINO model conversion. extra_config can be set by passing a dictionary where key is the parameter name, and the value is the parameter value. Please check Conversion Parameters documentation for more details: https://docs.openvino.ai/2025/openvino-workflow/model-preparation/conversion-parameters.html

type: dict

default_value: None

search_defaults: None

input_shapes#

Reshapes the model with given inputs. It configures dynamic and static dimensions in model inputs depending on your inference requirements. Static parameter is required to be enabled if static dimensions are required.

type: list

default_value: None

search_defaults: None

static#

Create a static model instead of a dynamic model.Enabled by default.

type: bool

default_value: False

search_defaults: None

OpenVINOQuantization#

Input: handler.openvino.OpenVINOModelHandler

Output: handler.openvino.OpenVINOModelHandler

user_script#

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: pathlib.Path | str

default_value: None

search_defaults: None

script_dir#

Directory containing user script dependencies.

type: pathlib.Path | str

default_value: None

search_defaults: None

data_config#

Data config for calibration.

type: olive.data.config.DataConfig | dict

required: True

model_type#

Used to specify quantization scheme required for specific type of the model. ‘TRANSFORMER’ is the only supported special quantization scheme to preserve accuracy after quantization of Transformer models (BERT, DistilBERT, etc.). None is default.

type: olive.passes.openvino.quantization.ModelTypeEnum

default_value: None

search_defaults: None

preset#

Defines quantization scheme for the model. Supported values: ‘PERFORMANCE’, ‘MIXED’.

type: olive.passes.openvino.quantization.PresetEnum

default_value: PERFORMANCE

search_defaults: None

ignored_scope#

This parameter can be used to exclude some layers from the quantization process to preserve the model accuracy. Please refer to https://docs.openvino.ai/2023.3/basic_quantization_flow.html#tune-quantization-parameters.

type: str | list[str]

default_value: None

search_defaults: None

ignored_scope_type#

Defines the type of the ignored scope. Supported values: ‘names’, ‘types’, ‘patterns’.

type: olive.passes.openvino.quantization.IgnoreScopeTypeEnum

default_value: None

search_defaults: None

target_device#

Target device for the model. Supported values: ‘any’, ‘cpu’, ‘gpu’, ‘cpu_spr’, ‘npu’. Default value is the same as the accelerator type of this workflow run.

type: olive.hardware.accelerator.Device

default_value: cpu

search_defaults: None

transform_fn#

Transform function for the input data.

type: Callable | str

default_value: None

search_defaults: None

extra_configs#

Extra configurations for OpenVINO model quantization. Please refer to https://docs.openvino.ai/2023.3/basic_quantization_flow.html#tune-quantization-parameters.

type: list[dict]

default_value: None

search_defaults: None

OpenVINOEncapsulation#

Encapsulates OpenVINO models with onnx context nodes.

Input: handler.openvino.OpenVINOModelHandler

Output: handler.onnx.ONNXModelHandler

target_device#

Device the encapsulated model should run on.Available devices are cpu, gpu, npu.

type: olive.hardware.accelerator.Device

default_value: cpu

search_defaults: None

ov_version#

Name of the OpenVINO version to override in model SDK version.Requires a minimum version of OpenVINO 2025.1

type: str

default_value: None

search_defaults: None

opset_imports#

Opset name and version to be add in the generate context model

type: list

default_value: [[‘com.microsoft.nchwc’, 1], [‘’, 11], [‘ai.onnx.ml’, 5], [‘com.ms.internal.nhwc’, 11], [‘ai.onnx.training’, 1], [‘ai.onnx.preview.training’, 1], [‘com.microsoft.experimental’, 1], [‘com.microsoft’, 1], [‘org.pytorch.aten’, 1]]

search_defaults: None

OpenVINOOptimumConversion#

Convert a Hugging Face PyTorch model to OpenVINO model using the Optimum export function.

Input: handler.hf.HfModelHandler

Output: handler.openvino.OpenVINOModelHandler | handler.composite.CompositeModelHandler

user_script#

Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.

type: pathlib.Path | str

default_value: None

search_defaults: None

script_dir#

Directory containing user script dependencies.

type: pathlib.Path | str

default_value: None

search_defaults: None

components#

List of component models to export. E.g. [‘decoder_model’, ‘decoder_with_past_model’]. None means export all components.

type: list[str]

default_value: None

search_defaults: None

device#

The device to use to do the export. Defaults to ‘cpu’.This is the parameter that is directly passed to Optimum Intel export function in certain cases.

type: olive.hardware.accelerator.Device

default_value: cpu

search_defaults: None

extra_args#

Extra arguments to pass to the optimum.exporters.openvino.main_export function.

type: dict

default_value: None

search_defaults: None

ov_quant_config#

Parameters for optimum OpenVINO quantization. Please refer to https://huggingface.co/docs/optimum/main/intel/openvino/optimization#4-bit

type: dict

default_value: None

search_defaults: None

SNPE#

SNPEConversion#

Convert ONNX or TensorFlow model to SNPE DLC. Uses snpe-tensorflow-to-dlc or snpe-onnx-to-dlc tools from the SNPE SDK.

Input: handler.onnx.ONNXModelHandler | handler.tensorflow.TensorFlowModelHandler

Output: handler.snpe.SNPEModelHandler

input_names#

List of input names.

type: list[str]

required: True

input_shapes#

List of input shapes. Must be the same length as input_names.

type: list[list[int]]

required: True

output_names#

List of output names.

type: list[str]

required: True

input_types#

List of input types. If not None, it must be a list of the same length as input_names. List members can be None to use default value. Refer to olive.platform_sdk.qualcomm.constants.InputType for valid values.

type: list[str | None]

default_value: None

search_defaults: None

input_layouts#

List of input layouts. If not None, it must be a list of the same length as input_names. List members can be None to use inferred value. Refer to olive.platform_sdk.qualcomm.constants.InputLayout for valid values.

type: list[str | None]

default_value: None

search_defaults: None

extra_args#

Extra arguments to pass to snpe conversion tool. Refer to snpe-onnx-to-dlc and snpe-tensorflow-to-dlc at https://developer.qualcomm.com/sites/default/files/docs/snpe/tools.html for more additional arguments. The value is a string that will be passed as is to the tool. e.g.: –enable_cpu_fallback –priority_hint low

type: str

default_value: None

search_defaults: None

SNPEQuantization#

Quantize SNPE model. Uses snpe-dlc-quantize tool from the SNPE SDK.

Input: handler.snpe.SNPEModelHandler

Output: handler.snpe.SNPEModelHandler

data_config#

Data config for quantization

type: olive.data.config.DataConfig | dict

required: True

use_enhanced_quantizer#

Use the enhanced quantizer feature when quantizing the model. Uses an algorithm to determine optimal range instead of min and max range of data. It can be useful for quantizing models that have long tails in the distribution of the data being quantized.

type: bool

default_value: False

search_defaults: Categorical([True, False])

enable_htp#

Pack HTP information in quantized DLC, which is not available in Windows.

type: bool

default_value: False

search_defaults: Categorical([True, False])

htp_socs#

List of SoCs to generate HTP Offline cache for.

type: list[str]

default_value: None

search_defaults: None

extra_args#

Extra arguments to pass to snpe conversion tool. Refer to https://developer.qualcomm.com/sites/default/files/docs/snpe/tools.html#tools_snpe-dlc-quantize for more additional arguments. The value is a string that will be passed as is to the tool. e.g.: –bias_bitwidth 16 –overwrite_cache_records

type: str

default_value: None

search_defaults: None

SNPEtoONNXConversion#

Convert a SNPE DLC to ONNX to use with SNPE Execution Provider. Creates a ONNX graph with the SNPE DLC as a node.

Input: handler.snpe.SNPEModelHandler

Output: handler.onnx.ONNXModelHandler

target_device#

Target device for the ONNX model. Refer to oliveolive.platform_sdk.qualcomm.constants.SNPEDevice for valid values.

type: str

default_value: cpu

search_defaults: None

target_opset#

Target ONNX opset version.

type: int

default_value: 12

search_defaults: None

save_as_external_data#

Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.

type: bool

default_value: False

search_defaults: None

all_tensors_to_one_file#

Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.

type: bool

default_value: True

search_defaults: None

external_data_name#

Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data

type: str

default_value: None

search_defaults: None

size_threshold#

Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.

type: int

default_value: 1024

search_defaults: None

convert_attribute#

Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data

type: bool

default_value: False

search_defaults: None

QNN#

QNNConversion#

Convert ONNX, TensorFlow, or PyTorch model to QNN C++ model. Quantize the model if –input_list is provided as extra_args. Uses qnn-[framework]-converter tool from the QNN SDK.

Input: handler.tensorflow.TensorFlowModelHandler | handler.pytorch.PyTorchModelHandler | handler.onnx.ONNXModelHandler

Output: handler.qnn.QNNModelHandler

input_dim#

The names and dimensions of the network input layers specified in the format [input_name comma-separated-dimensions], for example: [“data 1,224,224,3”] Note that the quotes should always be included in order to handle special characters, spaces, etc. For multiple inputs specify multiple –input_dim on the command line like: [“data 1,224,224,3”, “data2 1,224,224,3”] If –input_dim is not specified, the input dimensions will be inferred from the model. If –input_dim is specified, the input dimensions will be used as-is.

type: list[str]

default_value: None

search_defaults: None

out_node#

The name of the output node. If not specified, the output node will be inferred from the model. If specified, the output node will be used as-is. Example: [“out_1”, “out_2”]

type: list[str]

default_value: None

search_defaults: None

extra_args#

Extra arguments to pass to qnn-[framework]-converter tool, e.g. –show_unconsumed_nodes –custom_io CUSTOM_IO. See the documentation for more details: https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/tools.html

type: str

default_value: None

search_defaults: None

QNNModelLibGenerator#

Compile QNN C++ model source code into QNN model library for a specific target. Uses qnn-model-lib-generator tool from the QNN SDK.

Input: handler.qnn.QNNModelHandler

Output: handler.qnn.QNNModelHandler

lib_targets#

Specifies the targets to build the models for. Default: aarch64-android x86_64-linux-clang

type: str

default_value: None

search_defaults: None

lib_name#

Specifies the name to use for libraries. Default: uses name in <model.bin> if provided, else generic qnn_model.so

type: str

default_value: None

search_defaults: None

QNNContextBinaryGenerator#

Create QNN context binary from a QNN model library using a particular backend. Uses qnn-context-binary-generator tool from the QNN SDK.

Input: handler.qnn.QNNModelHandler | handler.snpe.SNPEModelHandler

Output: handler.qnn.QNNModelHandler

backend#

Path to a QNN backend .so library to create the context binary.

type: str

required: True

binary_file#

Name of the binary file to save the context binary to. Saved in the same path as –output_dir option with .bin as the binary file extension. If not provided, no backend binary is created.

type: str

default_value: None

search_defaults: None

extra_args#

Extra arguments to qnn-context-binary-generator

type: str

default_value: None

search_defaults: None