Passes¶
The following passes are available in Olive.
Each pass is followed by a description of the pass and a list of the pass’s configuration options.
OnnxConversion¶
Convert a PyTorch model to an ONNX model using torch.onnx.export.
Input: PyTorchModel
Output: ONNXModel | CompositeOnnxModel
- script_dir¶
Directory containing user script dependencies.
type: str
default_value: None
searchable_values: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default_value: None
searchable_values: None
- target_opset¶
The version of the default (ai.onnx) opset to target.
type: int
default_value: 14
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data
type: str
default_value: None
searchable_values: None
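Passes are typically declared in an Olive workflow configuration, where each entry names the pass type and its config options. The following is a minimal, illustrative sketch of an OnnxConversion entry, assuming the usual {"type": …, "config": {…}} pass-entry layout, written as a Python dict (the JSON form is equivalent); only options documented above are set.

```python
# Minimal sketch of an OnnxConversion pass entry in an Olive workflow
# config, written as a Python dict (the JSON form is equivalent).
# All values are illustrative.
conversion_pass = {
    "type": "OnnxConversion",
    "config": {
        "target_opset": 14,              # default ai.onnx opset to target
        "save_as_external_data": False,  # keep tensor data inside the .onnx file
    },
}
```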
OnnxModelOptimizer¶
Optimize ONNX model by fusing nodes.
Input: ONNXModel
Output: ONNXModel
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data
type: str
default_value: None
searchable_values: None
OrtTransformersOptimization¶
Optimize transformer-based models in scenarios where ONNX Runtime does not apply the optimization at load time. It is based on onnxruntime.transformers.optimizer.
Input: ONNXModel
Output: ONNXModel
- model_type¶
Transformer-based model type, one of: bert (exported by PyTorch), gpt2 (exported by PyTorch), bert_tf (BERT exported by tf2onnx), bert_keras (BERT exported by keras2onnx), or unet/vae/clip (Stable Diffusion).
type: str
required: True
- num_heads¶
Number of attention heads.
type: int
default_value: 0
searchable_values: None
- hidden_size¶
Number of hidden nodes.
type: int
default_value: 0
searchable_values: None
- optimization_options¶
Optimization options that turn on/off some fusions.
type: Dict[str, Any] | onnxruntime.transformers.fusion_options.FusionOptions
default_value: None
searchable_values: None
- opt_level¶
Graph optimization level of ONNX Runtime: 0 - disable all (default), 1 - basic, 2 - extended, 99 - all.
type: Any
default_value: None
searchable_values: None
- use_gpu¶
Flag for GPU inference.
type: bool
default_value: False
searchable_values: None
- only_onnxruntime¶
Whether to use only ONNX Runtime to optimize the model, with no Python fusion.
type: bool
default_value: False
searchable_values: None
- float16¶
Whether half-precision float will be used.
type: bool
default_value: False
searchable_values: None
- input_int32¶
Whether int32 tensors will be used as input.
type: bool
default_value: False
searchable_values: None
- keep_io_types¶
Keep input and output tensors in their original data type
type: bool
default_value: True
searchable_values: None
- force_fp32_ops¶
Operators that are forced to run in float32
type: List[str]
default_value: None
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data
type: str
default_value: None
searchable_values: None
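A sketch of an OrtTransformersOptimization entry for a BERT-style model, following the same assumed config layout as the OnnxConversion example above. The num_heads and hidden_size values are illustrative (BERT-base); the default of 0 leaves them to be detected from the graph where possible.

```python
# Illustrative OrtTransformersOptimization entry; model_type is the only
# required option.
transformers_opt = {
    "type": "OrtTransformersOptimization",
    "config": {
        "model_type": "bert",  # bert, gpt2, bert_tf, bert_keras, or unet/vae/clip
        "num_heads": 12,       # attention heads for BERT-base
        "hidden_size": 768,    # hidden nodes for BERT-base
        "float16": False,      # set True to use half precision
        "use_gpu": False,
    },
}
```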
OrtPerfTuning¶
Optimize ONNX Runtime inference settings.
Input: ONNXModel
Output: ONNXModel
- script_dir¶
Directory containing user script dependencies.
type: str
default_value: None
searchable_values: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default_value: None
searchable_values: None
- data_config¶
Data config to load sample inference data for tuning. If not provided, a default DataConfig will be used.
type: olive.data.config.DataConfig | str
default_value: None
searchable_values: None
- data_dir¶
Directory of sample inference data.
type: pathlib.Path | str
default_value: None
searchable_values: None
- dataloader_func¶
Dataloader function to load data from given data_dir with given batch size.
type: Callable | str
default_value: None
searchable_values: None
- batch_size¶
Batch size for inference.
type: int
default_value: None
searchable_values: None
- input_names¶
Input names list for ONNX model.
type: list
default_value: None
searchable_values: None
- input_shapes¶
Input shapes list for ONNX model.
type: list
default_value: None
searchable_values: None
- input_types¶
Input types list for ONNX model.
type: list
default_value: None
searchable_values: None
- device¶
Device selected for tuning process.
type: str
default_value: cpu
searchable_values: None
- cpu_cores¶
CPU cores used for thread tuning.
type: int
default_value: None
searchable_values: None
- io_bind¶
Whether to enable IOBinding search for ONNX Runtime inference.
type: bool
default_value: False
searchable_values: None
- providers_list¶
List of execution providers to test for running the ONNX model.
type: list
default_value: None
searchable_values: None
- execution_mode_list¶
List of execution modes (parallelism between operators) to test.
type: list
default_value: None
searchable_values: None
- opt_level_list¶
Optimization level list for ONNX model.
type: list
default_value: None
searchable_values: None
- trt_fp16_enable¶
Whether to enable FP16 mode for the TensorRT execution provider.
type: bool
default_value: False
searchable_values: None
- intra_thread_num_list¶
List of intra-op thread counts to test.
type: list
default_value: [None]
searchable_values: None
- inter_thread_num_list¶
List of inter-op thread counts to test.
type: list
default_value: [None]
searchable_values: None
- extra_session_config¶
Extra customized session options during tuning process.
type: Dict[str, Any]
default_value: None
searchable_values: None
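A sketch of an OrtPerfTuning entry that sweeps thread counts and execution modes over synthesized sample inputs, under the same assumed config layout; the input names, shapes, and grid values are illustrative.

```python
# Illustrative OrtPerfTuning entry. input_names/input_shapes describe the
# model inputs so the pass can build sample inference data; the lists
# below define the tuning search grid.
perf_tuning = {
    "type": "OrtPerfTuning",
    "config": {
        "input_names": ["input_ids"],           # hypothetical input name
        "input_shapes": [[1, 128]],
        "providers_list": ["CPUExecutionProvider"],
        "execution_mode_list": [0, 1],          # assumed to map to ORT ExecutionMode
                                                # (0: sequential, 1: parallel)
        "opt_level_list": [99],
        "intra_thread_num_list": [None, 2, 4],  # None lets ONNX Runtime decide
    },
}
```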
OnnxFloatToFloat16¶
Converts a model to float16. It is based on onnxconverter-common.convert_float_to_float16. See https://onnxruntime.ai/docs/performance/model-optimizations/float16.html#float16-conversion
Input: ONNXModel
Output: ONNXModel
- min_positive_val¶
Smallest allowed positive constant value; smaller positive constants are clipped to it.
type: float
default_value: 1e-07
searchable_values: None
- max_finite_val¶
Largest allowed finite constant value; constants with larger magnitude are clipped to it.
type: float
default_value: 10000.0
searchable_values: None
- keep_io_types¶
Whether model inputs/outputs should be left as float32
type: bool
default_value: False
searchable_values: None
- disable_shape_infer¶
Skips running onnx shape/type inference.
type: bool
default_value: False
searchable_values: None
- op_block_list¶
List of op types to leave as float32
type: List[str]
default_value: None
searchable_values: None
- node_block_list¶
List of node names to leave as float32
type: List[str]
default_value: None
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data
type: str
default_value: None
searchable_values: None
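A sketch of an OnnxFloatToFloat16 entry that keeps the model I/O and a few numerically sensitive op types in float32; the op_block_list entries are illustrative.

```python
# Illustrative OnnxFloatToFloat16 entry.
fp16_pass = {
    "type": "OnnxFloatToFloat16",
    "config": {
        "keep_io_types": True,                  # leave model inputs/outputs as float32
        "op_block_list": ["Softmax", "Range"],  # example op types kept in float32
    },
}
```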
OrtMixedPrecision¶
Convert model to mixed precision.
Input: ONNXModel
Output: ONNXModel
- op_block_list¶
List of op types to leave as float32
type: List[str]
default_value: [‘SimplifiedLayerNormalization’, ‘SkipSimplifiedLayerNormalization’, ‘Relu’, ‘Add’]
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data
type: str
default_value: None
searchable_values: None
OnnxDynamicQuantization¶
ONNX Dynamic Quantization Pass
Input: ONNXModel
Output: ONNXModel
- data_config¶
Data config for calibration, required if quant_mode is ‘static’. If not provided, a default DataConfig will be used.
type: olive.data.config.DataConfig | str
default_value: None
searchable_values: None
- quant_mode¶
Dynamic quantization mode.
type: str
default_value: dynamic
searchable_values: None
- weight_type¶
Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.
type: str
default_value: QInt8
searchable_values: Categorical([‘QInt8’, ‘QUInt8’])
- op_types_to_quantize¶
List of operator types to quantize. If None, all quantizable.
type: list
default_value: None
searchable_values: None
- nodes_to_quantize¶
List of node names to quantize. If None, all quantizable.
type: list
default_value: None
searchable_values: None
- nodes_to_exclude¶
List of node names to exclude from quantization. If None, no nodes are excluded.
type: list
default_value: None
searchable_values: None
- per_channel¶
Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization
type: bool
default_value: False
searchable_values: Categorical([True, False])
- reduce_range¶
Quantize weights with 7-bits. It may improve the accuracy for some models running on non-VNNI machine, especially for per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization
type: bool
default_value: False
searchable_values: Categorical([True, False])
- optimize_model¶
Soon to be deprecated in ONNX Runtime! Optimize the model before quantization. NOT recommended: optimization will change the computation graph, making debugging of quantization loss difficult.
type: bool
default_value: False
searchable_values: Categorical([True, False])
- quant_preprocess¶
Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing
type: bool
default_value: True
searchable_values: Categorical([True, False])
- extra.Sigmoid.nnapi¶
type: bool
default_value: False
searchable_values: None
- ActivationSymmetric¶
Symmetrize calibration data for activations.
type: bool
default_value: False
searchable_values: None
- WeightSymmetric¶
Symmetrize calibration data for weights.
type: bool
default_value: True
searchable_values: None
- EnableSubgraph¶
If enabled, subgraphs will be quantized. Currently only dynamic mode is supported.
type: bool
default_value: False
searchable_values: None
- ForceQuantizeNoInputCheck¶
By default, some latent operators such as MaxPool and Transpose do not quantize if their input is not already quantized. Set to True to force such operators to always quantize their input and thus generate quantized output. This behavior can still be disabled per node using nodes_to_exclude.
type: bool
default_value: False
searchable_values: None
- MatMulConstBOnly¶
If enabled, only MatMul with const B will be quantized.
type: bool
default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘dynamic’,): True, (‘static’,): False}, default: OLIVE_INVALID_PARAM_VALUE)
searchable_values: None
- extra_options¶
Key value pair dictionary for extra_options in quantization. Please refer to https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/quantize.py for details about the supported options. If an option is one of [‘extra.Sigmoid.nnapi’, ‘ActivationSymmetric’, ‘WeightSymmetric’, ‘EnableSubgraph’, ‘ForceQuantizeNoInputCheck’, ‘MatMulConstBOnly’], it will be overwritten by the corresponding config parameter value.
type: dict
default_value: None
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data
type: str
default_value: None
searchable_values: None
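A sketch of an OnnxDynamicQuantization entry, under the same assumed layout. Dynamic quantization needs no calibration data, so the entry only pins the weight type and exclusions; the excluded node name is hypothetical.

```python
# Illustrative OnnxDynamicQuantization entry.
dynamic_quant = {
    "type": "OnnxDynamicQuantization",
    "config": {
        "weight_type": "QInt8",                   # signed 8-bit weights
        "per_channel": False,
        "nodes_to_exclude": ["/lm_head/MatMul"],  # hypothetical node name
    },
}
```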
OnnxStaticQuantization¶
ONNX Static Quantization Pass
Input: ONNXModel
Output: ONNXModel
- script_dir¶
Directory containing user script dependencies.
type: str
default_value: None
searchable_values: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default_value: None
searchable_values: None
- data_config¶
Data config for calibration, required if quant_mode is ‘static’. If not provided, a default DataConfig will be used.
type: olive.data.config.DataConfig | str
default_value: None
searchable_values: None
- quant_mode¶
Static quantization mode.
type: str
default_value: static
searchable_values: None
- weight_type¶
Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.
type: str
default_value: QInt8
searchable_values: Categorical([‘QInt8’, ‘QUInt8’])
- op_types_to_quantize¶
List of operator types to quantize. If None, all quantizable.
type: list
default_value: None
searchable_values: None
- nodes_to_quantize¶
List of node names to quantize. If None, all quantizable.
type: list
default_value: None
searchable_values: None
- nodes_to_exclude¶
List of node names to exclude from quantization. If None, no nodes are excluded.
type: list
default_value: None
searchable_values: None
- per_channel¶
Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization
type: bool
default_value: False
searchable_values: Categorical([True, False])
- reduce_range¶
Quantize weights with 7-bits. It may improve the accuracy for some models running on non-VNNI machine, especially for per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization
type: bool
default_value: False
searchable_values: Categorical([True, False])
- optimize_model¶
Soon to be deprecated in ONNX Runtime! Optimize the model before quantization. NOT recommended: optimization will change the computation graph, making debugging of quantization loss difficult.
type: bool
default_value: False
searchable_values: Categorical([True, False])
- quant_preprocess¶
Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing
type: bool
default_value: True
searchable_values: Categorical([True, False])
- data_dir¶
Path to the directory containing the dataset. For local data, it is required if quant_mode is ‘static’.
type: pathlib.Path | str
default_value: None
searchable_values: None
- batch_size¶
Batch size for calibration, required if quant_mode is ‘static’.
type: int
default_value: 1
searchable_values: None
- dataloader_func¶
Function/function name to generate dataloader for calibration, required if quant_mode is ‘static’
type: Callable | str
default_value: None
searchable_values: None
- calibrate_method¶
Supported calibration methods are MinMax, Entropy, and Percentile. Please use CalibrationMethod.MinMax, CalibrationMethod.Entropy, or CalibrationMethod.Percentile as options.
type: str
default_value: MinMax
searchable_values: Categorical([‘MinMax’, ‘Entropy’, ‘Percentile’])
- quant_format¶
QOperator format quantizes the model with quantized operators directly. QDQ format quantizes the model by inserting QuantizeLinear/DeQuantizeLinear nodes on the tensors.
type: str
default_value: QDQ
searchable_values: Categorical([‘QOperator’, ‘QDQ’])
- activation_type¶
Quantization data type of activation. Please refer to https://onnxruntime.ai/docs/performance/quantization.html for more details on data type selection
type: str
default_value: QInt8
searchable_values: Conditional(parents: (‘quant_format’, ‘weight_type’), support: {(‘QDQ’, ‘QInt8’): Categorical([‘QInt8’]), (‘QDQ’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘QOperator’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘QOperator’, ‘QInt8’): Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>])}, default: Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>]))
- extra.Sigmoid.nnapi¶
type: bool
default_value: False
searchable_values: None
- ActivationSymmetric¶
Symmetrize calibration data for activations.
type: bool
default_value: False
searchable_values: None
- WeightSymmetric¶
Symmetrize calibration data for weights.
type: bool
default_value: True
searchable_values: None
- EnableSubgraph¶
If enabled, subgraphs will be quantized. Currently only dynamic mode is supported.
type: bool
default_value: False
searchable_values: None
- ForceQuantizeNoInputCheck¶
By default, some latent operators such as MaxPool and Transpose do not quantize if their input is not already quantized. Set to True to force such operators to always quantize their input and thus generate quantized output. This behavior can still be disabled per node using nodes_to_exclude.
type: bool
default_value: False
searchable_values: None
- MatMulConstBOnly¶
If enabled, only MatMul with const B will be quantized.
type: bool
default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘dynamic’,): True, (‘static’,): False}, default: OLIVE_INVALID_PARAM_VALUE)
searchable_values: None
- extra_options¶
Key value pair dictionary for extra_options in quantization. Please refer to https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/quantize.py for details about the supported options. If an option is one of [‘extra.Sigmoid.nnapi’, ‘ActivationSymmetric’, ‘WeightSymmetric’, ‘EnableSubgraph’, ‘ForceQuantizeNoInputCheck’, ‘MatMulConstBOnly’], it will be overwritten by the corresponding config parameter value.
type: dict
default_value: None
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data
type: str
default_value: None
searchable_values: None
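Static quantization needs calibration data, usually supplied through user_script and dataloader_func. The sketch below assumes the dataloader function is called with the configured data_dir and batch_size and may yield dicts of model inputs; the exact contract should be checked against the Olive documentation, and all names are illustrative.

```python
# user_script.py -- hypothetical calibration dataloader for static
# quantization. Assumes a dataloader_func(data_dir, batch_size) call
# convention; verify against the Olive docs.
import numpy as np

def calibration_dataloader(data_dir, batch_size):
    # Yield dicts mapping input names to arrays; names and shapes are
    # illustrative for a BERT-style model.
    for _ in range(8):  # 8 calibration batches
        yield {
            "input_ids": np.random.randint(0, 30522, (batch_size, 128), dtype=np.int64)
        }

# Matching pass entry, following the same assumed config layout.
static_quant = {
    "type": "OnnxStaticQuantization",
    "config": {
        "user_script": "user_script.py",
        "dataloader_func": "calibration_dataloader",
        "data_dir": "data/calibration",  # illustrative path
        "quant_format": "QDQ",
        "weight_type": "QInt8",
        "activation_type": "QInt8",      # valid with QDQ + QInt8 weights
    },
}
```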
OnnxQuantization¶
Quantize an ONNX model with ONNX Runtime, searching for the best parameters for static/dynamic quantization at the same time.
Input: ONNXModel
Output: ONNXModel
- script_dir¶
Directory containing user script dependencies.
type: str
default_value: None
searchable_values: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default_value: None
searchable_values: None
- data_config¶
Data config for calibration, required if quant_mode is ‘static’. If not provided, a default DataConfig will be used.
type: olive.data.config.DataConfig | str
default_value: None
searchable_values: None
- quant_mode¶
ONNX quantization mode: ‘dynamic’ for dynamic quantization, ‘static’ for static quantization.
type: str
default_value: static
searchable_values: Categorical([‘dynamic’, ‘static’])
- weight_type¶
Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.
type: str
default_value: QInt8
searchable_values: Categorical([‘QInt8’, ‘QUInt8’])
- op_types_to_quantize¶
List of operator types to quantize. If None, all quantizable.
type: list
default_value: None
searchable_values: None
- nodes_to_quantize¶
List of node names to quantize. If None, all quantizable.
type: list
default_value: None
searchable_values: None
- nodes_to_exclude¶
List of node names to exclude from quantization. If None, no nodes are excluded.
type: list
default_value: None
searchable_values: None
- per_channel¶
Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization
type: bool
default_value: False
searchable_values: Categorical([True, False])
- reduce_range¶
Quantize weights with 7-bits. It may improve the accuracy for some models running on non-VNNI machine, especially for per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization
type: bool
default_value: False
searchable_values: Categorical([True, False])
- optimize_model¶
Soon to be deprecated in ONNX Runtime! Optimize the model before quantization. NOT recommended: optimization will change the computation graph, making debugging of quantization loss difficult.
type: bool
default_value: False
searchable_values: Categorical([True, False])
- quant_preprocess¶
Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing
type: bool
default_value: True
searchable_values: Categorical([True, False])
- data_dir¶
Path to the directory containing the dataset. For local data, it is required if quant_mode is ‘static’.
type: pathlib.Path | str
default_value: None
searchable_values: None
- batch_size¶
Batch size for calibration, required if quant_mode is ‘static’.
type: int
default_value: 1
searchable_values: None
- dataloader_func¶
Function/function name to generate dataloader for calibration, required if quant_mode is ‘static’
type: Callable | str
default_value: None
searchable_values: None
- calibrate_method¶
Supported calibration methods are MinMax, Entropy, and Percentile. Please use CalibrationMethod.MinMax, CalibrationMethod.Entropy, or CalibrationMethod.Percentile as options.
type: str
default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘static’,): ‘MinMax’, (‘dynamic’,): <SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>}, default: OLIVE_INVALID_PARAM_VALUE)
searchable_values: Conditional(parents: (‘quant_mode’,), support: {(‘static’,): Categorical([‘MinMax’, ‘Entropy’, ‘Percentile’])}, default: Categorical([<SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>]))
- quant_format¶
QOperator format quantizes the model with quantized operators directly. QDQ format quantizes the model by inserting QuantizeLinear/DeQuantizeLinear nodes on the tensors.
type: str
default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘static’,): ‘QDQ’, (‘dynamic’,): <SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>}, default: OLIVE_INVALID_PARAM_VALUE)
searchable_values: Conditional(parents: (‘quant_mode’,), support: {(‘static’,): Categorical([‘QOperator’, ‘QDQ’])}, default: Categorical([<SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>]))
- activation_type¶
Quantization data type of activation. Please refer to https://onnxruntime.ai/docs/performance/quantization.html for more details on data type selection
type: str
default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘static’,): ‘QInt8’, (‘dynamic’,): <SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>}, default: OLIVE_INVALID_PARAM_VALUE)
searchable_values: Conditional(parents: (‘quant_mode’, ‘quant_format’, ‘weight_type’), support: {(‘static’, ‘QDQ’, ‘QInt8’): Categorical([‘QInt8’]), (‘static’, ‘QDQ’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘static’, ‘QOperator’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘static’, ‘QOperator’, ‘QInt8’): Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>])}, default: Categorical([<SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>]))
- extra.Sigmoid.nnapi¶
type: bool
default_value: False
searchable_values: None
- ActivationSymmetric¶
Symmetrize calibration data for activations.
type: bool
default_value: False
searchable_values: None
- WeightSymmetric¶
Symmetrize calibration data for weights.
type: bool
default_value: True
searchable_values: None
- EnableSubgraph¶
If enabled, subgraphs will be quantized. Currently only dynamic mode is supported.
type: bool
default_value: False
searchable_values: None
- ForceQuantizeNoInputCheck¶
By default, some latent operators such as MaxPool and Transpose do not quantize if their input is not already quantized. Set to True to force such operators to always quantize their input and thus generate quantized output. This behavior can still be disabled per node using nodes_to_exclude.
type: bool
default_value: False
searchable_values: None
- MatMulConstBOnly¶
If enabled, only MatMul with const B will be quantized.
type: bool
default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘dynamic’,): True, (‘static’,): False}, default: OLIVE_INVALID_PARAM_VALUE)
searchable_values: None
- extra_options¶
Key value pair dictionary for extra_options in quantization. Please refer to https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/quantize.py for details about the supported options. If an option is one of [‘extra.Sigmoid.nnapi’, ‘ActivationSymmetric’, ‘WeightSymmetric’, ‘EnableSubgraph’, ‘ForceQuantizeNoInputCheck’, ‘MatMulConstBOnly’], it will be overwritten by the corresponding config parameter value.
type: dict
default_value: None
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data
type: str
default_value: None
searchable_values: None
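Unlike the fixed-mode passes above, OnnxQuantization exposes quant_mode and the related parameters as searchable. As a sketch, assuming the engine's search strategy is enabled, an entry can pin only the calibration plumbing and leave the searchable parameters to be explored:

```python
# Illustrative OnnxQuantization entry. Parameters with searchable_values
# (quant_mode, weight_type, per_channel, reduce_range, quant_format, ...)
# are left unset so the search strategy can explore them.
searchable_quant = {
    "type": "OnnxQuantization",
    "config": {
        "user_script": "user_script.py",
        "dataloader_func": "calibration_dataloader",  # needed for the static branch
        "data_dir": "data/calibration",
    },
}
```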
IncDynamicQuantization¶
Intel® Neural Compressor Dynamic Quantization Pass
Input: ONNXModel
Output: ONNXModel
- data_config¶
Data config for calibration, required if quant_mode is ‘static’. If not provided, a default DataConfig will be used.
type: olive.data.config.DataConfig | str
default_value: None
searchable_values: None
- approach¶
Dynamic quantization mode.
type: str
default_value: dynamic
searchable_values: None
- device¶
Intel® Neural Compressor quantization device. Supports ‘cpu’ and ‘gpu’.
type: str
default_value: cpu
searchable_values: None
- backend¶
Backend for model execution. Supports ‘default’, ‘onnxrt_trt_ep’, and ‘onnxrt_cuda_ep’.
type: str
default_value: default
searchable_values: None
- domain¶
Model domain. Supports ‘auto’, ‘cv’, ‘object_detection’, ‘nlp’, and ‘recommendation_system’. The Intel® Neural Compressor adaptor automatically applies domain-specific quantization settings; explicitly specified settings override the automatic ones. If domain is set to ‘auto’, the domain will be detected automatically.
type: str
default_value: auto
searchable_values: None
- recipes¶
Recipes for Intel® Neural Compressor quantization. Supported keys: ‘smooth_quant’: whether to apply smooth quant; ‘smooth_quant_args’: parameters for smooth_quant; ‘fast_bias_correction’: whether to apply fast bias correction; ‘weight_correction’: whether to apply weight correction; ‘gemm_to_matmul’: whether to convert Gemm to MatMul and Add, only valid for ONNX models; ‘graph_optimization_level’: one of ‘DISABLE_ALL’, ‘ENABLE_BASIC’, ‘ENABLE_EXTENDED’, ‘ENABLE_ALL’, only valid for ONNX models; ‘first_conv_or_matmul_quantization’: whether to quantize the first conv or matmul; ‘last_conv_or_matmul_quantization’: whether to quantize the last conv or matmul; ‘pre_post_process_quantization’: whether to quantize the ops in pre-processing and post-processing; ‘add_qdq_pair_to_weight’: whether to add a QDQ pair for weights, only valid for onnxrt_trt_ep; ‘optypes_to_exclude_output_quant’: do not quantize the output of the specified op types; ‘dedicated_qdq_pair’: whether to use dedicated QDQ pairs, only valid for onnxrt_trt_ep.
type: dict
default_value: {}
searchable_values: None
- reduce_range¶
Whether to use 7-bit quantization.
type: bool
default_value: False
searchable_values: Categorical([True, False])
- quant_level¶
Intel® Neural Compressor allows users to choose different tuning processes by specifying the quantization level (quant_level). Currently three quant_levels are supported: 0 is a conservative strategy, 1 is a basic or user-specified strategy, and auto (the default) is the combination of 0 and 1. Please refer to https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-process and https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-algorithms for more details.
type: str
default_value: auto
searchable_values: None
- excluded_precisions¶
Precisions to be excluded. The default value is an empty list. By default, Intel® Neural Compressor enables mixed precision with fp32 + bf16 (only when device is ‘gpu’ and backend is ‘onnxrt_cuda_ep’) + int8. To disable the bf16 data type, specify excluded_precisions = [‘bf16’].
type: list
default_value: []
searchable_values: None
- use_distributed_tuning¶
Intel® Neural Compressor provides distributed tuning to speed up the tuning process by leveraging a multi-node cluster. Prerequisites: a working MPI implementation and an installed mpi4py.
type: bool
default_value: False
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data
type: str
default_value: None
searchable_values: None
IncStaticQuantization¶
Intel® Neural Compressor Static Quantization Pass
Input: ONNXModel
Output: ONNXModel
- script_dir¶
Directory containing user script dependencies.
type: str
default_value: None
searchable_values: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default_value: None
searchable_values: None
- data_config¶
Data config for calibration, required if quant_mode is ‘static’. If not provided, a default DataConfig will be used.
type: olive.data.config.DataConfig | str
default_value: None
searchable_values: None
- approach¶
Static quantization mode.
type: str
default_value: static
searchable_values: None
- device¶
Intel® Neural Compressor quantization device. Supports ‘cpu’ and ‘gpu’.
type: str
default_value: cpu
searchable_values: None
- backend¶
Backend for model execution. Supports ‘default’, ‘onnxrt_trt_ep’, and ‘onnxrt_cuda_ep’.
type: str
default_value: default
searchable_values: None
- domain¶
Model domain. Supports ‘auto’, ‘cv’, ‘object_detection’, ‘nlp’, and ‘recommendation_system’. The Intel® Neural Compressor adaptor automatically applies domain-specific quantization settings; explicitly specified settings override the automatic ones. If domain is set to ‘auto’, the domain will be detected automatically.
type: str
default_value: auto
searchable_values: None
- recipes¶
Recipes for Intel® Neural Compressor quantization. Supported keys: ‘smooth_quant’: whether to apply smooth quant; ‘smooth_quant_args’: parameters for smooth_quant; ‘fast_bias_correction’: whether to apply fast bias correction; ‘weight_correction’: whether to apply weight correction; ‘gemm_to_matmul’: whether to convert Gemm to MatMul and Add, only valid for ONNX models; ‘graph_optimization_level’: one of ‘DISABLE_ALL’, ‘ENABLE_BASIC’, ‘ENABLE_EXTENDED’, ‘ENABLE_ALL’, only valid for ONNX models; ‘first_conv_or_matmul_quantization’: whether to quantize the first conv or matmul; ‘last_conv_or_matmul_quantization’: whether to quantize the last conv or matmul; ‘pre_post_process_quantization’: whether to quantize the ops in pre-processing and post-processing; ‘add_qdq_pair_to_weight’: whether to add a QDQ pair for weights, only valid for onnxrt_trt_ep; ‘optypes_to_exclude_output_quant’: do not quantize the output of the specified op types; ‘dedicated_qdq_pair’: whether to use dedicated QDQ pairs, only valid for onnxrt_trt_ep.
type: dict
default_value: {}
searchable_values: None
- reduce_range¶
Whether to use 7-bit quantization.
type: bool
default_value: False
searchable_values: Categorical([True, False])
- quant_level¶
Intel® Neural Compressor allows users to choose different tuning processes by specifying the quantization level (quant_level). Currently three quant_levels are supported: 0 is a conservative strategy, 1 is a basic or user-specified strategy, and auto (the default) is the combination of 0 and 1. Please refer to https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-process and https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-algorithms for more details.
type: str
default_value: auto
searchable_values: None
- excluded_precisions¶
Precisions to be excluded. The default value is an empty list. By default, Intel® Neural Compressor enables mixed precision with fp32 + bf16 (only when device is ‘gpu’ and backend is ‘onnxrt_cuda_ep’) + int8. To disable the bf16 data type, specify excluded_precisions = [‘bf16’].
type: list
default_value: []
searchable_values: None
- use_distributed_tuning¶
Intel® Neural Compressor provides distributed tuning to speed up the tuning process by leveraging a multi-node cluster. Prerequisites: a working MPI implementation and an installed mpi4py.
type: bool
default_value: False
searchable_values: None
- data_dir¶
Path to the directory containing the dataset. For local data, it is required if approach is ‘static’.
type: pathlib.Path | str
default_value: None
searchable_values: None
- batch_size¶
Batch size for calibration, required if approach is ‘static’.
type: int
default_value: 1
searchable_values: None
- dataloader_func¶
Function/function name to generate dataloader for calibration, required if approach is ‘static’
type: Callable | str
required: True
- quant_format¶
Quantization format. Support ‘QDQ’ and ‘QOperator’.
type: str
default_value: QOperator
searchable_values: Categorical([‘QOperator’, ‘QDQ’])
- calibration_sampling_size¶
Number of calibration samples.
type: list | int
default_value: [100]
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data
type: str
default_value: None
searchable_values: None
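A sketch of an IncStaticQuantization entry using the smooth_quant recipe, under the same assumed config layout; the recipe keys follow the list documented above, and the alpha value is illustrative.

```python
# Illustrative IncStaticQuantization entry.
inc_static = {
    "type": "IncStaticQuantization",
    "config": {
        "user_script": "user_script.py",
        "dataloader_func": "calibration_dataloader",  # required
        "data_dir": "data/calibration",
        "quant_format": "QOperator",
        "calibration_sampling_size": [100],
        "recipes": {
            "smooth_quant": True,
            "smooth_quant_args": {"alpha": 0.5},      # illustrative parameter
        },
    },
}
```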
IncQuantization¶
Quantize an ONNX model with Intel® Neural Compressor.
Input: ONNXModel
Output: ONNXModel
- script_dir¶
Directory containing user script dependencies.
type: str
default_value: None
searchable_values: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default_value: None
searchable_values: None
- data_config¶
Data config for calibration, required if quant_mode is ‘static’. If not provided, a default DataConfig will be used.
type: olive.data.config.DataConfig | str
default_value: None
searchable_values: None
- approach¶
Intel® Neural Compressor quantization mode: ‘dynamic’ for dynamic quantization, ‘static’ for static quantization.
type: str
default_value: static
searchable_values: Categorical([‘dynamic’, ‘static’])
- device¶
Intel® Neural Compressor quantization device. Supports ‘cpu’ and ‘gpu’.
type: str
default_value: cpu
searchable_values: None
- backend¶
Backend for model execution. Supports ‘default’, ‘onnxrt_trt_ep’, and ‘onnxrt_cuda_ep’.
type: str
default_value: default
searchable_values: None
- domain¶
Model domain. Supports ‘auto’, ‘cv’, ‘object_detection’, ‘nlp’, and ‘recommendation_system’. The Intel® Neural Compressor adaptor automatically applies domain-specific quantization settings; explicitly specified settings override the automatic ones. If domain is set to ‘auto’, the domain will be detected automatically.
type: str
default_value: auto
searchable_values: None
- recipes¶
Recipes for Intel® Neural Compressor quantization. Supported keys: ‘smooth_quant’: whether to apply smooth quant; ‘smooth_quant_args’: parameters for smooth_quant; ‘fast_bias_correction’: whether to apply fast bias correction; ‘weight_correction’: whether to apply weight correction; ‘gemm_to_matmul’: whether to convert Gemm to MatMul and Add, only valid for ONNX models; ‘graph_optimization_level’: one of ‘DISABLE_ALL’, ‘ENABLE_BASIC’, ‘ENABLE_EXTENDED’, ‘ENABLE_ALL’, only valid for ONNX models; ‘first_conv_or_matmul_quantization’: whether to quantize the first conv or matmul; ‘last_conv_or_matmul_quantization’: whether to quantize the last conv or matmul; ‘pre_post_process_quantization’: whether to quantize the ops in pre-processing and post-processing; ‘add_qdq_pair_to_weight’: whether to add a QDQ pair for weights, only valid for onnxrt_trt_ep; ‘optypes_to_exclude_output_quant’: do not quantize the output of the specified op types; ‘dedicated_qdq_pair’: whether to use dedicated QDQ pairs, only valid for onnxrt_trt_ep.
type: dict
default_value: {}
searchable_values: None
- reduce_range¶
Whether to use 7-bit quantization.
type: bool
default_value: False
searchable_values: Categorical([True, False])
- quant_level¶
Intel® Neural Compressor allows users to choose different tuning processes by specifying the quantization level (quant_level). Currently three quant_levels are supported: 0 is a conservative strategy, 1 is a basic or user-specified strategy, and auto (the default) is the combination of 0 and 1. Please refer to https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-process and https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-algorithms for more details.
type: str
default_value: auto
searchable_values: None
- excluded_precisions¶
Precisions to be excluded. The default value is an empty list. By default, Intel® Neural Compressor enables mixed precision with fp32 + bf16 (only when device is ‘gpu’ and backend is ‘onnxrt_cuda_ep’) + int8. To disable the bf16 data type, specify excluded_precisions = [‘bf16’].
type: list
default_value: []
searchable_values: None
- use_distributed_tuning¶
Intel® Neural Compressor provides distributed tuning to speed up the tuning process by leveraging a multi-node cluster. Prerequisites: a working MPI implementation and an installed mpi4py.
type: bool
default_value: False
searchable_values: None
- data_dir¶
Path to the directory containing the dataset. For local data, it is required if approach is ‘static’.
type: pathlib.Path | str
default_value: None
searchable_values: None
- batch_size¶
Batch size for calibration, required if approach is ‘static’.
type: int
default_value: 1
searchable_values: None
- dataloader_func¶
Function/function name to generate dataloader for calibration, required if approach is ‘static’
type: Callable | str
required: True
- quant_format¶
Quantization format. Support ‘QDQ’ and ‘QOperator’.
type: str
default_value: QOperator
searchable_values: Conditional(parents: (‘approach’,), support: {(‘static’,): Categorical([‘QOperator’, ‘QDQ’])}, default: Categorical([‘default’]))
- calibration_sampling_size¶
Number of calibration samples.
type: list | int
default_value: [100]
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data
type: str
default_value: None
searchable_values: None
AppendPrePostProcessingOps¶
Add pre/post-processing nodes to the input model.
Input: ONNXModel
Output: ONNXModel
- pre¶
List of pre-processing commands to add.
type: List[str]
default_value: None
searchable_values: None
- post¶
List of post-processing commands to add.
type: List[str]
default_value: None
searchable_values: None
- tool_command¶
Composited tool commands to invoke.
type: str
default_value: None
searchable_values: None
- tool_command_args¶
Arguments to pass to tool command.
type: Dict[str, Any]
default_value: None
searchable_values: None
- target_opset¶
The version of the default (ai.onnx) opset to target.
type: int
default_value: 16
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data
type: str
default_value: None
searchable_values: None
InsertBeamSearch¶
Insert Beam Search Op.
Input: OliveModel
Output: ONNXModel
- no_repeat_ngram_size¶
If set to an int > 0, all n-grams of that size can only occur once.
type: int
default_value: 3
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data
type: str
default_value: None
searchable_values: None
QuantizationAwareTraining¶
Run quantization-aware training on a PyTorch model.
Input: PyTorchModel
Output: PyTorchModel
- script_dir¶
Directory containing user script dependencies.
type: str
default_value: None
searchable_values: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default_value: None
searchable_values: None
- train_data_dir¶
Directory of training data.
type: str
default_value: None
searchable_values: None
- val_data_dir¶
Directory of validation data.
type: str
default_value: None
searchable_values: None
- train_dataloader_func¶
Dataloader function to load training data from given train_data_dir with given train_batch_size.
type: Callable | str
default_value: None
searchable_values: None
- training_loop_func¶
Customized training loop function.
type: Callable | str
default_value: None
searchable_values: None
- ptl_module¶
LightningModule for PyTorch Lightning trainer. It is a way of encapsulating all the logic related to the training, validation, and testing of a PyTorch model. Please refer to https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html for more details.
type: Callable | str
default_value: None
searchable_values: None
- ptl_data_module¶
LightningDataModule for PyTorch Lightning trainer. It is a way of encapsulating all the data-related logic for training, validation, and testing of a PyTorch model. Please refer to https://pytorch-lightning.readthedocs.io/en/stable/data/datamodule.html for more details.
type: Callable | str
default_value: None
searchable_values: None
- train_batch_size¶
Batch size for training.
type: int
default_value: None
searchable_values: None
- num_epochs¶
Maximum number of epochs for training.
type: int
default_value: None
searchable_values: None
- num_steps¶
Maximum number of steps for training.
type: int
default_value: -1
searchable_values: None
- do_validate¶
Whether to perform one evaluation epoch over the validation set after training.
type: bool
default_value: False
searchable_values: None
- modules_to_fuse¶
List of lists of module names to fuse.
type: List[List[str]]
default_value: None
searchable_values: None
- qconfig_func¶
Customized function to create a QConfig for QAT. Please refer to https://pytorch.org/docs/stable/generated/torch.quantization.qconfig.QConfig.html for details.
type: Callable | str
default_value: None
searchable_values: None
- logger¶
Logger for training.
type: pytorch_lightning.loggers.logger.Logger | Iterable[pytorch_lightning.loggers.logger.Logger] | Callable | bool
default_value: False
searchable_values: None
- gpus¶
Number of GPUs to use.
type: int
default_value: None
searchable_values: None
- seed¶
Random seed for training.
type: int
default_value: None
searchable_values: None
- checkpoint_path¶
Path to save checkpoints.
type: str
default_value: None
searchable_values: None
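QuantizationAwareTraining drives its training loop from user-script callables. The following is a minimal sketch, assuming train_dataloader_func is called with the configured train_data_dir and train_batch_size; the dummy dataset stands in for real training data.

```python
# user_script.py -- hypothetical QAT training dataloader.
import torch
from torch.utils.data import DataLoader, TensorDataset

def qat_train_dataloader(data_dir, batch_size):
    # Dummy image-classification data standing in for a real dataset.
    x = torch.randn(256, 3, 224, 224)
    y = torch.randint(0, 10, (256,))
    return DataLoader(TensorDataset(x, y), batch_size=batch_size, shuffle=True)

# Matching pass entry, following the same assumed config layout.
qat_pass = {
    "type": "QuantizationAwareTraining",
    "config": {
        "user_script": "user_script.py",
        "train_dataloader_func": "qat_train_dataloader",
        "train_batch_size": 32,
        "num_epochs": 2,
    },
}
```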
OpenVINOConversion¶
Convert a PyTorch, ONNX, or TensorFlow model to an OpenVINO model.
Input: PyTorchModel | ONNXModel
Output: OpenVINOModel
- input¶
Input can be set by passing a list of tuples. Each tuple should contain the input name and, optionally, the input type or input shape.
type: List[Tuple]
default_value: None
searchable_values: None
- input_shape¶
Input shape(s) that should be fed to an input node(s) of the model. Shape is defined as a comma-separated list of integer numbers enclosed in parentheses or square brackets, for example [1,3,227,227].
type: List[int]
default_value: None
searchable_values: None
- extra_config¶
Extra configurations for OpenVINO model conversion. extra_config can be set by passing a dictionary where key is the parameter name, and the value is the parameter value. Please check ‘mo’ command usage instruction for available parameters: https://docs.openvino.ai/latest/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html
type: Dict
default_value: None
searchable_values: None
OpenVINOQuantization¶
Post-training quantization for OpenVINO model. Please refer to https://docs.openvino.ai/latest/pot_introduction.html for more details.
Input: OpenVINOModel
Output: OpenVINOModel
- script_dir¶
Directory containing user script dependencies.
type: str
default_value: None
searchable_values: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default_value: None
searchable_values: None
- engine_config¶
Specific config for openvino.tools.pot.IEEngine. ‘engine_config’ can be set by passing a dictionary, for example engine_config: {‘device’: ‘CPU’}.
type: Dict
required: True
- dataloader_func¶
A callable function or a str of the function name from ‘user_script’ for the instance of the dataloader.
type: Callable | str
default_value: None
searchable_values: None
- data_dir¶
Dataset path. ‘data_dir’ can be a str or pathlib.Path.
type: pathlib.Path | str
default_value: None
searchable_values: None
- batch_size¶
Batch size for the dataloader.
type: int
default_value: 1
searchable_values: None
- metric_func¶
A callable function or a str of the function name from ‘user_script’ for Metric instance to calculate the accuracy metric of the model.
type: Callable | str
default_value: None
searchable_values: None
- algorithms¶
A list defining optimization algorithms and their parameters included in the optimization pipeline. The order in which they are applied to the model in the optimization pipeline is determined by the order in the list. example: algorithms: [{‘name’: ‘DefaultQuantization’, ‘params’: {‘preset’: ‘performance’, ‘stat_subset_size’: 500},}]
type: List[Dict]
required: True
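A sketch of an OpenVINOQuantization entry built from the two required options above; the DefaultQuantization parameters mirror the example given in the algorithms description, and the dataloader function name is hypothetical.

```python
# Illustrative OpenVINOQuantization entry.
ov_quant = {
    "type": "OpenVINOQuantization",
    "config": {
        "engine_config": {"device": "CPU"},  # required
        "user_script": "user_script.py",
        "dataloader_func": "pot_dataloader",  # hypothetical function name
        "data_dir": "data/calibration",
        "algorithms": [                       # required
            {
                "name": "DefaultQuantization",
                "params": {"preset": "performance", "stat_subset_size": 500},
            }
        ],
    },
}
```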
SNPEConversion¶
Convert an ONNX or TensorFlow model to SNPE DLC. Uses the snpe-tensorflow-to-dlc or snpe-onnx-to-dlc tools from the SNPE SDK.
Input: ONNXModel | TensorFlowModel
Output: SNPEModel
- input_names¶
List of input names.
type: List[str]
required: True
- input_shapes¶
List of input shapes. Must be the same length as input_names.
type: List[List[int]]
required: True
- output_names¶
List of output names.
type: List[str]
required: True
- output_shapes¶
List of output shapes. Must be the same length as output_names.
type: List[List[int]]
required: True
- input_types¶
List of input types. If not None, it must be a list of the same length as input_names. List members can be None to use default value. Refer to olive.snpe.constants.InputType for valid values.
type: List[str | None]
default_value: None
searchable_values: None
- input_layouts¶
List of input layouts. If not None, it must be a list of the same length as input_names. List members can be None to use the inferred value. Refer to olive.snpe.constants.InputLayout for valid values.
type: List[str | None]
default_value: None
searchable_values: None
- extra_args¶
Extra arguments to pass to the SNPE conversion tool. Refer to snpe-onnx-to-dlc and snpe-tensorflow-to-dlc at https://developer.qualcomm.com/sites/default/files/docs/snpe/tools.html for additional arguments. Must be a dictionary of the form: {‘arg_name’: ‘arg_value’}.
type: str
default_value: None
searchable_values: None
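A sketch of an SNPEConversion entry, under the same assumed config layout; the names and shapes are illustrative for a single-input image classifier.

```python
# Illustrative SNPEConversion entry; all four list options are required,
# and the shape lists must match their name lists in length.
snpe_conversion = {
    "type": "SNPEConversion",
    "config": {
        "input_names": ["input"],
        "input_shapes": [[1, 224, 224, 3]],
        "output_names": ["output"],
        "output_shapes": [[1, 1000]],
    },
}
```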
SNPEQuantization¶
Quantize SNPE model. Uses snpe-dlc-quantize tool from the SNPE SDK.
Input: SNPEModel
Output: SNPEModel
- script_dir¶
Directory containing user script dependencies.
type: str
default_value: None
searchable_values: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default_value: None
searchable_values: None
- data_dir¶
Path to the data directory.
type: str
required: True
- dataloader_func¶
Function or function name to create a dataloader for quantization. The function should take the data directory as an argument and return an olive.snpe.SNPEDataLoader object.
type: Callable[[str], olive.snpe.data_loader.SNPEDataLoader] | str
required: True
- use_enhanced_quantizer¶
Use the enhanced quantizer feature when quantizing the model. Uses an algorithm to determine optimal range instead of min and max range of data. It can be useful for quantizing models that have long tails in the distribution of the data being quantized.
type: bool
default_value: False
searchable_values: Categorical([True, False])
- enable_htp¶
Pack HTP information in quantized DLC.
type: bool
default_value: False
searchable_values: Categorical([True, False])
- htp_socs¶
List of SoCs to generate HTP Offline cache for.
type: List[str]
default_value: None
searchable_values: None
- extra_args¶
Extra arguments to pass to the SNPE quantization tool. Refer to https://developer.qualcomm.com/sites/default/files/docs/snpe/tools.html#tools_snpe-dlc-quantize for additional arguments. Must be a dictionary of the form: {‘arg_name’: ‘arg_value’}.
type: str
default_value: None
searchable_values: None
SNPEtoONNXConversion¶
Convert an SNPE DLC to ONNX for use with the SNPE execution provider. Creates an ONNX graph with the SNPE DLC as a node.
Input: SNPEModel
Output: ONNXModel
- target_device¶
Target device for the ONNX model. Refer to olive.snpe.SNPEDevice for valid values.
type: str
default_value: cpu
searchable_values: None
- target_opset¶
Target ONNX opset version.
type: int
default_value: 12
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data
type: str
default_value: None
searchable_values: None
VitisAIQuantization¶
Quantize an ONNX model with ONNX Runtime, searching for the best parameters for vai_q_onnx quantization at the same time.
Input: ONNXModel
Output: ONNXModel
- script_dir¶
Directory containing user script dependencies.
type: str
default_value: None
searchable_values: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default_value: None
searchable_values: None
- quant_mode¶
ONNX quantization mode. Only ‘static’ is supported for Vitis AI quantization.
type: str
default_value: static
searchable_values: Categorical([‘static’])
- data_dir¶
Path to the directory containing the dataset.
type: pathlib.Path | str
default_value: None
searchable_values: None
- batch_size¶
Batch size for calibration, required.
type: int
default_value: 1
searchable_values: None
- dataloader_func¶
Function or function name to generate the dataloader for calibration. Required.
type: Callable | str
required: True
- weight_type¶
Data type for quantizing weights, used in vai_q_onnx quantization. ‘QInt8’ for signed 8-bit integer.
type: str
default_value: QInt8
searchable_values: Categorical([‘QInt8’])
- input_nodes¶
List of start nodes for quantization. If None, all quantizable.
type: list
default_value: None
searchable_values: None
- output_nodes¶
List of end nodes for quantization. If None, all quantizable.
type: list
default_value: None
searchable_values: None
- op_types_to_quantize¶
List of operator types to quantize. If None, all quantizable.
type: list
default_value: None
searchable_values: None
- nodes_to_quantize¶
List of node names to quantize. If None, all quantizable.
type: list
default_value: None
searchable_values: None
- nodes_to_exclude¶
List of node names to exclude from quantization. If None, all quantizable.
type: list
default_value: None
searchable_values: None
- optimize_model¶
Soon to be deprecated in ONNX Runtime! Optimize the model before quantization. NOT recommended: optimization will change the computation graph, making debugging of quantization loss difficult.
type: bool
default_value: False
searchable_values: Categorical([True, False])
- use_external_data_format¶
Option used for large (>2GB) models. Set to False by default.
type: bool
default_value: False
searchable_values: None
- quant_preprocess¶
Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing
type: bool
default_value: True
searchable_values: Categorical([True, False])
- calibrate_method¶
Supported calibration methods are NonOverflow and MinMSE. Please use NonOverflow or MinMSE as options.
type: str
default_value: MinMSE
searchable_values: Categorical([‘NonOverflow’, ‘MinMSE’])
- quant_format¶
QDQ format quantizes the model by inserting QuantizeLinear/DeQuantizeLinear nodes on the tensors.
type: str
default_value: QDQ
searchable_values: Categorical([‘QDQ’])
- activation_type¶
Quantization data type of activation.
type: str
default_value: QInt8
searchable_values: Conditional(parents: (‘quant_format’, ‘weight_type’), support: {(‘QDQ’, ‘QInt8’): Categorical([‘QInt8’])}, default: Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>]))
- ActivationSymmetric¶
Symmetrize calibration data for activations.
type: bool
default_value: True
searchable_values: None
- WeightSymmetric¶
Symmetrize calibration data for weights.
type: bool
default_value: True
searchable_values: None
- AddQDQPairToWeight¶
Keeps the weight in floating point and inserts both QuantizeLinear/DeQuantizeLinear nodes for it.
type: bool
default_value: True
searchable_values: None
- extra_options¶
Key value pair dictionary for extra_options in quantization. If an option is one of [‘ActivationSymmetric’, ‘WeightSymmetric’, ‘AddQDQPairToWeight’], it will be overwritten by the corresponding config parameter value.
type: dict
default_value: None
searchable_values: None
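A sketch of a VitisAIQuantization entry, under the same assumed config layout. Per the options above, only static mode with QDQ format and QInt8 types is supported, so the entry mainly wires up calibration; the function name is hypothetical.

```python
# Illustrative VitisAIQuantization entry.
vitis_quant = {
    "type": "VitisAIQuantization",
    "config": {
        "user_script": "user_script.py",
        "dataloader_func": "calibration_dataloader",  # required
        "data_dir": "data/calibration",
        "calibrate_method": "MinMSE",                 # or "NonOverflow"
    },
}
```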
OptimumConversion¶
Convert an Optimum model to an ONNX model using the Optimum export function.
Input: OptimumModel
Output: ONNXModel | CompositeOnnxModel
- script_dir¶
Directory containing user script dependencies.
type: str
default_value: None
searchable_values: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default_value: None
searchable_values: None
- target_opset¶
The version of the default (ai.onnx) opset to target.
type: int
default_value: 14
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data
type: str
default_value: None
searchable_values: None
OptimumMerging¶
Merges a decoder_model with its decoder_with_past_model via the Optimum library.
Input: CompositeOnnxModel
Output: ONNXModel | CompositeOnnxModel
- execution_provider¶
Target execution provider. This parameter will be removed when accelerators/targets are visible to passes.
type: str
default_value: None
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data
type: str
default_value: None
searchable_values: None