Passes¶
The following passes are available in Olive.
Each pass is followed by a description of the pass and a list of the pass’s configuration options.
OnnxConversion¶
Convert a PyTorch model to an ONNX model using torch.onnx.export.
Input: PyTorchModel
Output: ONNXModel
- script_dir¶
Directory containing user script dependencies.
type: str
default: None
default_search: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default: None
default_search: None
- input_names¶
List of input names.
type: List[str]
required: True
- input_shapes¶
List of input shapes. Must be provided if input_tensor_func is not provided. It is used to create dummy inputs for the model during ONNX export.
type: List[List[int]]
default: None
default_search: None
- input_types¶
List of input types. If provided, must be the same length as input_shapes. Otherwise, defaults to float32 for all inputs. Used with input_shapes to create dummy inputs for the model during ONNX export.
type: List[str]
default: None
default_search: None
- input_tensor_func¶
Function (no input) to create dummy inputs for the model. Can be a function (local use) or name of a function to be imported from user script. If provided, input_shapes and input_types will be ignored. Refer to ‘args’ at https://pytorch.org/docs/stable/onnx.html#torch.onnx.export for more details.
type: Callable | str
default: None
default_search: None
- output_names¶
List of output names.
type: List[str]
required: True
- dynamic_axes¶
Dynamic axes for the model. Refer to ‘dynamic_axes’ at https://pytorch.org/docs/stable/onnx.html#torch.onnx.export for more details.
type: dict
default: None
default_search: None
- target_opset¶
The version of the default (ai.onnx) opset to target.
type: int
default: 14
default_search: None
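For illustration, a minimal configuration for this pass might look like the following sketch, shown as a Python dict whose keys mirror the options above; the input/output names, shapes, and axes are hypothetical example values.

```python
# Hypothetical OnnxConversion pass configuration. Keys mirror the options
# documented above; names and shapes are example values only.
onnx_conversion_config = {
    "input_names": ["input_ids", "attention_mask"],  # required
    "input_shapes": [[1, 128], [1, 128]],            # used to build dummy export inputs
    "input_types": ["int64", "int64"],               # same length as input_shapes
    "output_names": ["logits"],                      # required
    "dynamic_axes": {
        "input_ids": {0: "batch_size", 1: "sequence_length"},
        "attention_mask": {0: "batch_size", 1: "sequence_length"},
        "logits": {0: "batch_size"},
    },
    "target_opset": 14,  # default ai.onnx opset version
}
```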
OnnxModelOptimizer¶
Optimize ONNX model by fusing nodes.
Input: ONNXModel
Output: ONNXModel
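This pass exposes no configuration options, so a sketch of its entry reduces to the pass type alone; the "type" key is an assumed convention for selecting a pass by name in a workflow config.

```python
# Minimal sketch: OnnxModelOptimizer takes no pass-specific options.
# The "type" key is an assumed convention, not documented on this page.
onnx_model_optimizer_config = {"type": "OnnxModelOptimizer"}
```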
OrtTransformersOptimization¶
Optimize transformer-based models in scenarios where ONNX Runtime does not apply the optimization at load time. It is based on onnxruntime.transformers.optimizer.
Input: ONNXModel
Output: ONNXModel
- model_type¶
Transformer-based model type, including bert (exported by PyTorch), gpt2 (exported by PyTorch), bert_tf (BERT exported by tf2onnx), bert_keras (BERT exported by keras2onnx).
type: str
required: True
- num_heads¶
Number of attention heads.
type: int
default: 0
default_search: None
- hidden_size¶
Number of hidden nodes.
type: int
default: 0
default_search: None
- optimization_options¶
Optimization options that turn on/off some fusions.
type: Any
default: None
default_search: None
- opt_level¶
Graph optimization level of ONNX Runtime: 0 - disable all (default), 1 - basic, 2 - extended, 99 - all.
type: Any
default: None
default_search: None
- use_gpu¶
Flag for GPU inference.
type: bool
default: False
default_search: None
- only_onnxruntime¶
Whether to use only onnxruntime to optimize the model, with no Python fusion.
type: bool
default: False
default_search: None
- float16¶
Whether half-precision float will be used.
type: bool
default: False
default_search: None
- input_int32¶
Whether int32 tensors will be used as input.
type: bool
default: False
default_search: None
- use_external_data_format¶
Whether to use the external data format to store large models (>2GB).
type: bool
default: False
default_search: None
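As a sketch, a configuration for a BERT-base-style model might look like the following; 12 heads and a hidden size of 768 are example values for that architecture.

```python
# Hypothetical OrtTransformersOptimization configuration for a BERT-base
# style model; num_heads/hidden_size are example values.
transformers_optimization_config = {
    "model_type": "bert",  # required: bert, gpt2, bert_tf, or bert_keras
    "num_heads": 12,
    "hidden_size": 768,
    "use_gpu": False,
    "float16": False,      # set True to use half-precision floats
    "use_external_data_format": False,
}
```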
OrtPerfTuning¶
Optimize ONNX Runtime inference settings.
Input: ONNXModel
Output: ONNXModel
- script_dir¶
Directory containing user script dependencies.
type: str
default: None
default_search: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default: None
default_search: None
- data_dir¶
Directory of sample inference data.
type: pathlib.Path | str
default: None
default_search: None
- dataloader_func¶
Dataloader function to load data from given data_dir with given batch size.
type: Callable | str
required: True
- batch_size¶
Batch size for inference.
type: int
required: True
- device¶
Device selected for tuning process.
type: str
default: cpu
default_search: None
- cpu_cores¶
CPU cores used for thread tuning.
type: int
default: None
default_search: None
- io_bind¶
Whether to enable IOBinding for ONNX Runtime inference.
type: bool | List[bool]
default: False
default_search: None
- providers_list¶
List of execution providers to test for executing the ONNX model.
type: list
default: None
default_search: None
- execution_mode_list¶
List of execution modes (parallelism between operators) to test.
type: list
default: None
default_search: None
- opt_level_list¶
List of graph optimization levels to test for the ONNX model.
type: list
default: None
default_search: None
- trt_fp16_enable¶
Whether to enable FP16 mode for the TensorRT execution provider.
type: bool
default: False
default_search: None
- intra_thread_num_list¶
List of intra-op thread counts to test.
type: list
default: [None]
default_search: None
- inter_thread_num_list¶
List of inter-op thread counts to test.
type: list
default: [None]
default_search: None
- extra_session_config¶
Extra customized session options to apply during the tuning process.
type: Dict[str, Any]
default: None
default_search: None
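A sketch of a user script plus pass configuration follows. The dataloader contract assumed here is that Olive calls the named function with data_dir and batch_size and iterates over what it returns; the file name, function body, and input name are hypothetical.

```python
# user_script.py -- hypothetical user script for OrtPerfTuning.
import numpy as np

def create_dataloader(data_dir, batch_size):
    # Assumed contract: return an iterable of input feeds for inference.
    # Random data shaped like the model input stands in for real samples.
    return [{"input": np.random.rand(batch_size, 3, 224, 224).astype(np.float32)}]

# Pass configuration; keys mirror the options documented above.
perf_tuning_config = {
    "user_script": "user_script.py",
    "dataloader_func": "create_dataloader",  # required
    "batch_size": 1,                         # required
    "device": "cpu",
    "providers_list": ["CPUExecutionProvider"],
    "io_bind": False,
}
```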
OnnxDynamicQuantization¶
ONNX Dynamic Quantization Pass
Input: ONNXModel
Output: ONNXModel
- quant_mode¶
Dynamic quantization mode.
type: str
default: dynamic
default_search: None
- weight_type¶
Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.
type: str
default: QInt8
default_search: Categorical([‘QInt8’, ‘QUInt8’])
- op_types_to_quantize¶
List of operator types to quantize. If None, all quantizable operator types are quantized.
type: list
default: None
default_search: None
- nodes_to_quantize¶
List of node names to quantize. If None, all quantizable nodes are quantized.
type: list
default: None
default_search: None
- nodes_to_exclude¶
List of node names to exclude from quantization. If None, no nodes are excluded.
type: list
default: None
default_search: None
- per_channel¶
Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization
type: bool
default: False
default_search: Categorical([True, False])
- reduce_range¶
Quantize weights with 7 bits. It may improve accuracy for some models running on non-VNNI machines, especially in per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization
type: bool
default: False
default_search: Categorical([True, False])
- optimize_model¶
Soon to be deprecated in ONNX! Optimize the model before quantization. NOT recommended; optimization will change the computation graph, making debugging of quantization loss difficult.
type: bool
default: False
default_search: Categorical([True, False])
- use_external_data_format¶
Whether to use the external data format for large (>2GB) models.
type: bool
default: False
default_search: None
- quant_preprocess¶
Shape inference and model optimization in preparation for quantization. See https://onnxruntime.ai/docs/performance/quantization.html#pre-processing.
type: bool
default: True
default_search: Categorical([True, False])
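A minimal sketch of a dynamic-quantization configuration, using only the options documented above:

```python
# Hypothetical OnnxDynamicQuantization configuration. per_channel and
# reduce_range are searchable (see default_search above), so pinning
# them here is optional.
dynamic_quantization_config = {
    "weight_type": "QInt8",    # or "QUInt8"
    "per_channel": False,
    "reduce_range": False,
    "quant_preprocess": True,  # run shape inference/optimization first
}
```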
OnnxStaticQuantization¶
ONNX Static Quantization Pass
Input: ONNXModel
Output: ONNXModel
- script_dir¶
Directory containing user script dependencies.
type: str
default: None
default_search: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default: None
default_search: None
- quant_mode¶
Static quantization mode.
type: str
default: static
default_search: None
- weight_type¶
Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.
type: str
default: QInt8
default_search: Categorical([‘QInt8’, ‘QUInt8’])
- op_types_to_quantize¶
List of operator types to quantize. If None, all quantizable operator types are quantized.
type: list
default: None
default_search: None
- nodes_to_quantize¶
List of node names to quantize. If None, all quantizable nodes are quantized.
type: list
default: None
default_search: None
- nodes_to_exclude¶
List of node names to exclude from quantization. If None, no nodes are excluded.
type: list
default: None
default_search: None
- per_channel¶
Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization
type: bool
default: False
default_search: Categorical([True, False])
- reduce_range¶
Quantize weights with 7 bits. It may improve accuracy for some models running on non-VNNI machines, especially in per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization
type: bool
default: False
default_search: Categorical([True, False])
- optimize_model¶
Soon to be deprecated in ONNX! Optimize the model before quantization. NOT recommended; optimization will change the computation graph, making debugging of quantization loss difficult.
type: bool
default: False
default_search: Categorical([True, False])
- use_external_data_format¶
Whether to use the external data format for large (>2GB) models.
type: bool
default: False
default_search: None
- quant_preprocess¶
Shape inference and model optimization in preparation for quantization. See https://onnxruntime.ai/docs/performance/quantization.html#pre-processing.
type: bool
default: True
default_search: Categorical([True, False])
- data_dir¶
Path to the directory containing the dataset. For local data, it is required if quant_mode is ‘static’.
type: pathlib.Path | str
default: None
default_search: None
- batch_size¶
Batch size for calibration, required if quant_mode is ‘static’.
type: int
default: 1
default_search: None
- dataloader_func¶
Function or function name to generate a dataloader for calibration. Required if quant_mode is ‘static’.
type: Callable | str
required: True
- calibrate_method¶
Current calibration methods supported are MinMax, Entropy, and Percentile.
type: str
default: MinMax
default_search: Categorical([‘MinMax’, ‘Entropy’, ‘Percentile’])
- quant_format¶
The QOperator format quantizes the model with quantized operators directly. The QDQ format quantizes the model by inserting QuantizeLinear/DeQuantizeLinear nodes on the tensors.
type: str
default: QDQ
default_search: Categorical([‘QOperator’, ‘QDQ’])
- activation_type¶
Quantization data type of activation. Please refer to https://onnxruntime.ai/docs/performance/quantization.html for more details on data type selection.
type: str
default: QInt8
default_search: Conditional(parents: (‘quant_format’,), support: {(‘QDQ’,): Categorical([‘QInt8’, ‘QUInt8’]), (‘QOperator’,): Categorical([‘QInt8’])}, default: Categorical([None]))
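A sketch of a static-quantization setup follows; the calibration dataloader function name and data directory are hypothetical, and the assumed contract is that the named function builds a calibration dataloader from data_dir and batch_size.

```python
# Hypothetical OnnxStaticQuantization configuration.
static_quantization_config = {
    "user_script": "user_script.py",                     # defines the function below
    "dataloader_func": "create_calibration_dataloader",  # required; hypothetical name
    "data_dir": "data/calibration",
    "batch_size": 1,
    "calibrate_method": "MinMax",  # or "Entropy" / "Percentile"
    "quant_format": "QDQ",         # QDQ permits QInt8 or QUInt8 activations
    "weight_type": "QInt8",
    "activation_type": "QInt8",
}
```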
OnnxQuantization¶
Quantize an ONNX model with onnxruntime, searching for the best parameters for static/dynamic quantization at the same time.
Input: ONNXModel
Output: ONNXModel
- script_dir¶
Directory containing user script dependencies.
type: str
default: None
default_search: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default: None
default_search: None
- quant_mode¶
ONNX quantization mode: ‘dynamic’ for dynamic quantization, ‘static’ for static quantization.
type: str
default: static
default_search: Categorical([‘dynamic’, ‘static’])
- weight_type¶
Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.
type: str
default: QInt8
default_search: Categorical([‘QInt8’, ‘QUInt8’])
- op_types_to_quantize¶
List of operator types to quantize. If None, all quantizable operator types are quantized.
type: list
default: None
default_search: None
- nodes_to_quantize¶
List of node names to quantize. If None, all quantizable nodes are quantized.
type: list
default: None
default_search: None
- nodes_to_exclude¶
List of node names to exclude from quantization. If None, no nodes are excluded.
type: list
default: None
default_search: None
- per_channel¶
Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization
type: bool
default: False
default_search: Categorical([True, False])
- reduce_range¶
Quantize weights with 7 bits. It may improve accuracy for some models running on non-VNNI machines, especially in per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization
type: bool
default: False
default_search: Categorical([True, False])
- optimize_model¶
Soon to be deprecated in ONNX! Optimize the model before quantization. NOT recommended; optimization will change the computation graph, making debugging of quantization loss difficult.
type: bool
default: False
default_search: Categorical([True, False])
- use_external_data_format¶
Whether to use the external data format for large (>2GB) models.
type: bool
default: False
default_search: None
- quant_preprocess¶
Shape inference and model optimization in preparation for quantization. See https://onnxruntime.ai/docs/performance/quantization.html#pre-processing.
type: bool
default: True
default_search: Categorical([True, False])
- data_dir¶
Path to the directory containing the dataset. For local data, it is required if quant_mode is ‘static’.
type: pathlib.Path | str
default: None
default_search: None
- batch_size¶
Batch size for calibration, required if quant_mode is ‘static’.
type: int
default: 1
default_search: None
- dataloader_func¶
Function or function name to generate a dataloader for calibration. Required if quant_mode is ‘static’.
type: Callable | str
required: True
- calibrate_method¶
Current calibration methods supported are MinMax, Entropy, and Percentile.
type: str
default: MinMax
default_search: Conditional(parents: (‘quant_mode’,), support: {(‘static’,): Categorical([‘MinMax’, ‘Entropy’, ‘Percentile’])}, default: Categorical([‘Invalid’]))
- quant_format¶
The QOperator format quantizes the model with quantized operators directly. The QDQ format quantizes the model by inserting QuantizeLinear/DeQuantizeLinear nodes on the tensors.
type: str
default: QDQ
default_search: Conditional(parents: (‘quant_mode’,), support: {(‘static’,): Categorical([‘QOperator’, ‘QDQ’])}, default: Categorical([‘Invalid’]))
- activation_type¶
Quantization data type of activation. Please refer to https://onnxruntime.ai/docs/performance/quantization.html for more details on data type selection.
type: str
default: QInt8
default_search: Conditional(parents: (‘quant_mode’, ‘quant_format’), support: {(‘static’, ‘QDQ’): Categorical([‘QInt8’, ‘QUInt8’]), (‘static’, ‘QOperator’): Categorical([‘QInt8’])}, default: Categorical([‘Invalid’]))
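Because this pass searches over quantization parameters, a sketch can pin only the calibration inputs and leave the searchable options (quant_mode, weight_type, per_channel, reduce_range, calibrate_method, quant_format, activation_type) to their default_search spaces.

```python
# Hypothetical OnnxQuantization configuration. Unlisted searchable options
# fall back to the default_search spaces documented above.
quantization_config = {
    "user_script": "user_script.py",
    "dataloader_func": "create_calibration_dataloader",  # required; used when quant_mode is "static"
    "data_dir": "data/calibration",
    "batch_size": 1,
}
```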
QuantizationAwareTraining¶
Run quantization aware training on PyTorch model.
Input: PyTorchModel
Output: PyTorchModel
- script_dir¶
Directory containing user script dependencies.
type: str
default: None
default_search: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default: None
default_search: None
- train_data_dir¶
Directory of training data.
type: str
default: None
default_search: None
- val_data_dir¶
Directory of validation data.
type: str
default: None
default_search: None
- train_dataloader_func¶
Dataloader function to load training data from given train_data_dir with given train_batch_size.
type: Callable | str
default: None
default_search: None
- training_loop_func¶
Customized training loop function.
type: Callable | str
default: None
default_search: None
- ptl_module¶
LightningModule for PyTorch Lightning trainer. It is a way of encapsulating all the logic related to the training, validation, and testing of a PyTorch model. Please refer to https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html for more details.
type: Callable | str
default: None
default_search: None
- ptl_data_module¶
LightningDataModule for PyTorch Lightning trainer. It is a way of encapsulating all the data-related logic for training, validation, and testing of a PyTorch model. Please refer to https://pytorch-lightning.readthedocs.io/en/stable/data/datamodule.html for more details.
type: Callable | str
default: None
default_search: None
- train_batch_size¶
Batch size for training.
type: int
default: None
default_search: None
- num_epochs¶
Maximum number of epochs for training.
type: int
default: None
default_search: None
- num_steps¶
Maximum number of steps for training.
type: int
default: -1
default_search: None
- do_validate¶
Whether to perform one evaluation epoch over the validation set after training.
type: bool
default: False
default_search: None
- modules_to_fuse¶
List of lists of module names to fuse.
type: List[List[str]]
default: None
default_search: None
- input_shapes¶
List of input shapes. It is used to create dummy inputs for PyTorch model tracing.
type: List[List[int]]
required: True
- input_types¶
List of input types. It is used to create dummy inputs for PyTorch model tracing.
type: List[str]
default: None
default_search: None
- qconfig_func¶
Customized function to create a QConfig for QAT. Please refer to https://pytorch.org/docs/stable/generated/torch.quantization.qconfig.QConfig.html for details.
type: Callable | str
default: None
default_search: None
- logger¶
Logger for training.
type: pytorch_lightning.loggers.logger.Logger | Iterable[pytorch_lightning.loggers.logger.Logger] | Callable | bool
default: False
default_search: None
- gpus¶
Number of GPUs to use.
type: int
default: None
default_search: None
- seed¶
Random seed for training.
type: int
default: None
default_search: None
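A sketch of a QAT configuration using a training dataloader from a user script; the function name, module names, and shapes are hypothetical example values.

```python
# Hypothetical QuantizationAwareTraining configuration.
qat_config = {
    "user_script": "user_script.py",
    "train_data_dir": "data/train",
    "train_dataloader_func": "create_train_dataloader",  # hypothetical name
    "train_batch_size": 32,
    "num_epochs": 3,
    "input_shapes": [[1, 3, 224, 224]],  # required: dummy input for tracing
    "input_types": ["float32"],
    "modules_to_fuse": [["conv1", "bn1", "relu1"]],  # example module names
    "seed": 0,
}
```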
OpenVINOConversion¶
Convert a PyTorch, ONNX, or TensorFlow model to an OpenVINO model.
Input: PyTorchModel | ONNXModel
Output: OpenVINOModel
- input¶
Input can be set by passing a list of tuples. Each tuple should contain the input name and, optionally, the input type or input shape.
type: List[Tuple]
default: None
default_search: None
- input_shape¶
Input shape(s) that should be fed to an input node(s) of the model. Shape is defined as a comma-separated list of integer numbers enclosed in parentheses or square brackets, for example [1,3,227,227].
type: List[int]
default: None
default_search: None
- extra_config¶
Extra configurations for OpenVINO model conversion. extra_config can be set by passing a dictionary where key is the parameter name, and the value is the parameter value. Please check ‘mo’ command usage instruction for available parameters: https://docs.openvino.ai/latest/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html
type: Dict
default: None
default_search: None
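A sketch of a conversion configuration; the input name and shape are example values, and the extra_config key shown is assumed to be a valid ‘mo’ parameter from the guide linked above.

```python
# Hypothetical OpenVINOConversion configuration.
openvino_conversion_config = {
    "input": [("input", [1, 3, 227, 227])],  # (name, shape) tuple per input
    "extra_config": {
        "data_type": "FP16",  # assumed 'mo' parameter; see the linked mo guide
    },
}
```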
OpenVINOQuantization¶
Post-training quantization for OpenVINO model. Please refer to https://docs.openvino.ai/latest/pot_introduction.html for more details.
Input: OpenVINOModel
Output: OpenVINOModel
- script_dir¶
Directory containing user script dependencies.
type: str
default: None
default_search: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default: None
default_search: None
- engine_config¶
Specific config for openvino.tools.pot.IEEngine. ‘engine_config’ can be set by passing a dictionary, for example engine_config = {‘device’: ‘CPU’}.
type: Dict
required: True
- dataloader_func¶
A callable function or the name of a function from ‘user_script’ that returns the dataloader instance.
type: Callable | str
default: None
default_search: None
- data_dir¶
Dataset path. ‘data_dir’ can be a str or pathlib.Path.
type: pathlib.Path | str
default: None
default_search: None
- batch_size¶
Batch size for the dataloader.
type: int
default: 1
default_search: None
- metric_func¶
A callable function or the name of a function from ‘user_script’ that returns a Metric instance used to calculate the accuracy metric of the model.
type: Callable | str
default: None
default_search: None
- algorithms¶
A list defining optimization algorithms and their parameters included in the optimization pipeline. The order in which they are applied to the model in the optimization pipeline is determined by the order in the list. example: algorithms = [{‘name’: ‘DefaultQuantization’, ‘params’: {‘preset’: ‘performance’, ‘stat_subset_size’: 500},}]
type: List[Dict]
required: True
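The engine_config and algorithms examples above combine into a configuration sketch like the following; the user-script dataloader function name is hypothetical.

```python
# OpenVINOQuantization configuration assembled from the documented examples.
openvino_quantization_config = {
    "user_script": "user_script.py",
    "dataloader_func": "create_dataloader",  # hypothetical function in user_script
    "data_dir": "data/calibration",
    "engine_config": {"device": "CPU"},      # required
    "algorithms": [                          # required
        {
            "name": "DefaultQuantization",
            "params": {"preset": "performance", "stat_subset_size": 500},
        }
    ],
}
```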
SNPEConversion¶
Convert ONNX or TensorFlow model to SNPE DLC. Uses snpe-tensorflow-to-dlc or snpe-onnx-to-dlc tools from the SNPE SDK.
Input: ONNXModel | TensorFlowModel
Output: SNPEModel
- input_names¶
List of input names.
type: List[str]
required: True
- input_shapes¶
List of input shapes. Must be the same length as input_names.
type: List[List[int]]
required: True
- output_names¶
List of output names.
type: List[str]
required: True
- output_shapes¶
List of output shapes. Must be the same length as output_names.
type: List[List[int]]
required: True
- input_types¶
List of input types. If not None, it must be a list of the same length as input_names. List members can be None to use the default value. Refer to olive.snpe.constants.InputType for valid values.
type: List[str | None]
default: None
default_search: None
- input_layouts¶
List of input layouts. If not None, it must be a list of the same length as input_names. List members can be None to use the inferred value. Refer to olive.snpe.constants.InputLayout for valid values.
type: List[str | None]
default: None
default_search: None
- extra_args¶
Extra arguments to pass to the SNPE conversion tool. Refer to snpe-onnx-to-dlc and snpe-tensorflow-to-dlc at https://developer.qualcomm.com/sites/default/files/docs/snpe/tools.html for additional arguments. Must be a dictionary of the form: {‘arg_name’: ‘arg_value’}.
type: str
default: None
default_search: None
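A sketch of an SNPE conversion configuration; the tensor names and shapes are example values, and the layout string assumes ‘NHWC’ is a valid olive.snpe.constants.InputLayout value.

```python
# Hypothetical SNPEConversion configuration.
snpe_conversion_config = {
    "input_names": ["data"],             # required
    "input_shapes": [[1, 224, 224, 3]],  # required; same length as input_names
    "output_names": ["prob"],            # required
    "output_shapes": [[1, 1000]],        # required; same length as output_names
    "input_layouts": ["NHWC"],           # assumed valid InputLayout value
}
```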
SNPEQuantization¶
Quantize SNPE model. Uses snpe-dlc-quantize tool from the SNPE SDK.
Input: SNPEModel
Output: SNPEModel
- script_dir¶
Directory containing user script dependencies.
type: str
default: None
default_search: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default: None
default_search: None
- data_dir¶
Path to the data directory.
type: str
required: True
- dataloader_func¶
Function or function name to create a dataloader for quantization. The function should take the data directory as an argument and return an olive.snpe.SNPEDataLoader object.
type: Callable[[str], olive.snpe.data_loader.SNPEDataLoader] | str
required: True
- use_enhanced_quantizer¶
Use the enhanced quantizer feature when quantizing the model. It uses an algorithm to determine the optimal range instead of the min and max range of the data. It can be useful for quantizing models that have long tails in the distribution of the data being quantized.
type: bool
default: False
default_search: Categorical([True, False])
- enable_htp¶
Pack HTP information in quantized DLC.
type: bool
default: False
default_search: Categorical([True, False])
- htp_socs¶
List of SoCs to generate HTP Offline cache for.
type: List[str]
default: None
default_search: None
- extra_args¶
Extra arguments to pass to the SNPE quantization tool. Refer to https://developer.qualcomm.com/sites/default/files/docs/snpe/tools.html#tools_snpe-dlc-quantize for additional arguments. Must be a dictionary of the form: {‘arg_name’: ‘arg_value’}.
type: str
default: None
default_search: None
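A sketch of a quantization configuration; the dataloader function (which must return an olive.snpe.SNPEDataLoader) is referenced by a hypothetical name and not implemented here.

```python
# Hypothetical SNPEQuantization configuration.
snpe_quantization_config = {
    "user_script": "user_script.py",
    "dataloader_func": "create_snpe_dataloader",  # required; must return an SNPEDataLoader
    "data_dir": "data/quant",                     # required
    "use_enhanced_quantizer": False,
    "enable_htp": False,
}
```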
SNPEtoONNXConversion¶
Convert an SNPE DLC to ONNX for use with the SNPE execution provider. Creates an ONNX graph with the SNPE DLC as a node.
Input: SNPEModel
Output: ONNXModel
- target_device¶
Target device for the ONNX model. Refer to olive.snpe.SNPEDevice for valid values.
type: str
default: cpu
default_search: None
- target_opset¶
Target ONNX opset version.
type: int
default: 12
default_search: None
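Both options have defaults, so a configuration sketch only needs to list overrides; the values shown simply restate the defaults.

```python
# Minimal SNPEtoONNXConversion configuration restating the defaults.
snpe_to_onnx_config = {
    "target_device": "cpu",  # see olive.snpe.SNPEDevice for valid values
    "target_opset": 12,
}
```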