Passes¶
The following passes are available in Olive.
Each pass is followed by a description of the pass and a list of the pass’s configuration options.
OnnxConversion¶
Convert a PyTorch model to an ONNX model using torch.onnx.export on CPU. A minimal example configuration follows the option list.
Input: handler.pytorch.PyTorchModelHandler
Output: handler.composite.CompositeModelHandler | handler.onnx.DistributedOnnxModelHandler | handler.onnx.ONNXModelHandler
- script_dir¶
Directory containing user script dependencies.
type: str
default_value: None
searchable_values: None
- user_script¶
Path to user script. Values of other parameters that were assigned function or object names will be imported from this script.
type: str
default_value: None
searchable_values: None
- target_opset¶
The version of the default (ai.onnx) opset to target.
type: int
default_value: 13
searchable_values: None
- use_dynamo_exporter¶
Whether to use the dynamo_export API to export the ONNX model.
type: bool
default_value: False
searchable_values: None
- device¶
The device to use for conversion, e.g., ‘cuda’ or ‘cpu’. If not specified, will use ‘cpu’ for PyTorch model and ‘cuda’ for DistributedPyTorchModel.
type: str
default_value: None
searchable_values: None
- torch_dtype¶
The dtype to cast the model to before conversion, e.g., ‘float32’ or ‘float16’. If not specified, will use the model as is.
type: str
default_value: None
searchable_values: None
- parallel_jobs¶
Number of parallel jobs. Defaults to the number of CPUs. Set to 0 to disable.
type: int
default_value: None
searchable_values: None
- merge_components¶
Whether to merge the converted components.
type: bool
default_value: False
searchable_values: None
- merge_adapter_weights¶
Whether to merge adapter weights before conversion. After merging, the model structure is consistent with the base model. This is useful when conversion cannot be run for some fine-tuned models with adapter weights.
type: bool
default_value: False
searchable_values: None
- save_metadata_for_token_generation¶
Whether to save metadata for token generation or not. Includes config.json, generation_config.json, and tokenizer related files.
type: bool
default_value: False
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named <model_path_name>.data.
type: str
default_value: None
searchable_values: None
- size_threshold¶
Effective only if save_as_external_data is True. Threshold for the size of tensor data. A tensor's data is converted to external data only when its size is >= size_threshold. To convert every tensor with raw data to external data, set size_threshold=0.
type: int
default_value: 1024
searchable_values: None
- convert_attribute¶
Effective only if save_as_external_data is True. If true, convert all tensors to external data. If false, convert only non-attribute tensors to external data.
type: bool
default_value: False
searchable_values: None
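As an illustration, here is a minimal sketch of how this pass might be configured in the passes section of an Olive workflow, written as a Python dict; the option values are assumptions for a hypothetical model, not recommended defaults.

```python
# Hypothetical "passes" entry for OnnxConversion in an Olive workflow config.
# Values are illustrative assumptions, not defaults.
conversion_pass = {
    "type": "OnnxConversion",
    "config": {
        "target_opset": 17,               # ai.onnx opset to target (pass default: 13)
        "torch_dtype": "float32",         # cast the model before conversion
        "save_as_external_data": True,    # keep tensor data out of the .onnx file
        "all_tensors_to_one_file": True,  # one <model_path_name>.data file
    },
}
```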
OnnxOpVersionConversion¶
Convert the default (ai.onnx) opset version of an ONNX model to the target opset version. A minimal example configuration follows the option list.
Input: handler.onnx.ONNXModelHandler
Output: handler.onnx.ONNXModelHandler
- target_opset¶
The version of the default (ai.onnx) opset to target. Default: latest opset version.
type: int
default_value: 21
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named <model_path_name>.data.
type: str
default_value: None
searchable_values: None
- size_threshold¶
Effective only if save_as_external_data is True. Threshold for the size of tensor data. A tensor's data is converted to external data only when its size is >= size_threshold. To convert every tensor with raw data to external data, set size_threshold=0.
type: int
default_value: 1024
searchable_values: None
- convert_attribute¶
Effective only if save_as_external_data is True. If true, convert all tensors to external data. If false, convert only non-attribute tensors to external data.
type: bool
default_value: False
searchable_values: None
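A minimal sketch of a pass entry, assuming the rest of the workflow config is defined elsewhere:

```python
# Hypothetical "passes" entry: convert the default (ai.onnx) opset to 21.
opset_conversion_pass = {
    "type": "OnnxOpVersionConversion",
    "config": {"target_opset": 21},
}
```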
OnnxModelOptimizer¶
Optimize an ONNX model by fusing nodes. An example configuration follows the option list.
Input: handler.onnx.ONNXModelHandler
Output: handler.onnx.ONNXModelHandler
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named <model_path_name>.data.
type: str
default_value: None
searchable_values: None
- size_threshold¶
Effective only if save_as_external_data is True. Threshold for the size of tensor data. A tensor's data is converted to external data only when its size is >= size_threshold. To convert every tensor with raw data to external data, set size_threshold=0.
type: int
default_value: 1024
searchable_values: None
- convert_attribute¶
Effective only if save_as_external_data is True. If true, convert all tensors to external data. If false, convert only non-attribute tensors to external data.
type: bool
default_value: False
searchable_values: None
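A minimal sketch of a pass entry; since every option has a usable default, the config can be left empty:

```python
# Hypothetical "passes" entry: node-fusion optimization with default options.
model_optimizer_pass = {
    "type": "OnnxModelOptimizer",
    "config": {},
}
```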
OrtTransformersOptimization¶
Use the ONNX Runtime transformers optimizer to optimize transformer-based models in scenarios where ONNX Runtime does not apply the optimizations at load time. It is based on onnxruntime.transformers.optimizer. An example configuration follows the option list.
Input: handler.onnx.ONNXModelHandler
Output: handler.onnx.ONNXModelHandler
- model_type¶
Transformer based model type, including bert (exported by PyTorch), gpt2 (exported by PyTorch), bert_tf (BERT exported by tf2onnx), bert_keras (BERT exported by keras2onnx), and unet/vae/clip (stable diffusion).
type: str
default_value: None
searchable_values: None
- num_heads¶
Number of attention heads.
type: int
default_value: 0
searchable_values: None
- num_key_value_heads¶
Number of key/value attention heads.
type: int
default_value: 0
searchable_values: None
- hidden_size¶
Number of hidden nodes.
type: int
default_value: 0
searchable_values: None
- optimization_options¶
Optimization options that turn on/off some fusions.
type: Dict[str, Any] | onnxruntime.transformers.fusion_options.FusionOptions
default_value: None
searchable_values: None
- opt_level¶
Graph optimization level of ONNX Runtime: 0 - disable all (default), 1 - basic, 2 - extended, 99 - all.
type: Any
default_value: None
searchable_values: Categorical([0, 1, 2, 99])
- use_gpu¶
Flag for GPU inference.
type: bool
default_value: False
searchable_values: None
- only_onnxruntime¶
Whether to use only ONNX Runtime to optimize the model, with no Python fusion. Disables some optimizers that might cause failure in symbolic shape inference or attention fusion when opt_level > 1.
type: bool
default_value: False
searchable_values: Conditional(parents: (‘opt_level’,), support: {(2,): Categorical([False]), (99,): Categorical([False])}, default: Categorical([True, False]))
- float16¶
Whether half-precision float will be used.
type: bool
default_value: False
searchable_values: None
- keep_io_types¶
Keep input and output tensors in their original data type. Only used when float16 is True.
type: bool
default_value: True
searchable_values: None
- force_fp32_ops¶
Operators that are forced to run in float32. Only used when float16 is True.
type: List[str]
default_value: None
searchable_values: None
- force_fp32_nodes¶
Nodes that are forced to run in float32. Only used when float16 is True.
type: List[str]
default_value: None
searchable_values: None
- force_fp16_inputs¶
Force the conversion of the inputs of some operators to float16, even if the 'convert_float_to_float16' tool prefers to keep them in float32.
type: Dict[str, List[int]]
default_value: None
searchable_values: None
- use_gqa¶
Replace MultiHeadAttention with GroupQueryAttention. True is only supported when float16 is True.
type: bool
default_value: False
searchable_values: None
- input_int32¶
Whether int32 tensors will be used as input.
type: bool
default_value: False
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named <model_path_name>.data.
type: str
default_value: None
searchable_values: None
- size_threshold¶
Effective only if save_as_external_data is True. Threshold for the size of tensor data. A tensor's data is converted to external data only when its size is >= size_threshold. To convert every tensor with raw data to external data, set size_threshold=0.
type: int
default_value: 1024
searchable_values: None
- convert_attribute¶
Effective only if save_as_external_data is True. If true, convert all tensors to external data. If false, convert only non-attribute tensors to external data.
type: bool
default_value: False
searchable_values: None
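A sketch of a pass entry for a hypothetical BERT-base model; model_type, num_heads, and hidden_size are assumptions about that model, not pass defaults:

```python
# Hypothetical "passes" entry: transformer optimization plus fp16 conversion.
transformers_opt_pass = {
    "type": "OrtTransformersOptimization",
    "config": {
        "model_type": "bert",    # exported by PyTorch
        "num_heads": 12,         # BERT-base attention heads (assumed)
        "hidden_size": 768,      # BERT-base hidden nodes (assumed)
        "float16": True,         # use half precision
        "keep_io_types": True,   # keep graph inputs/outputs as float32
        "use_gpu": True,
    },
}
```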
OrtPerfTuning¶
Optimize ONNX Runtime inference settings. An example configuration follows the option list.
Input: handler.onnx.ONNXModelHandler
Output: handler.onnx.ONNXModelHandler
- script_dir¶
Directory containing user script dependencies.
type: str
default_value: None
searchable_values: None
- user_script¶
Path to user script. Values of other parameters that were assigned function or object names will be imported from this script.
type: str
default_value: None
searchable_values: None
- data_dir¶
Directory of sample inference data.
type: str | pathlib.Path | olive.resource_path.ResourcePath | olive.resource_path.ResourcePathConfig | None
default_value: None
searchable_values: None
- dataloader_func¶
Dataloader function to load data from given data_dir with given batch size.
type: Callable | str
default_value: None
searchable_values: None
- dataloader_func_kwargs¶
Keyword arguments for dataloader_func.
type: Dict[str, Any]
default_value: None
searchable_values: None
- batch_size¶
Batch size for inference.
type: int
default_value: None
searchable_values: None
- data_config¶
Data config to load data for computing latency.
type: olive.data.config.DataConfig | Dict
default_value: None
searchable_values: None
- input_names¶
Input names list for ONNX model.
type: list
default_value: None
searchable_values: None
- input_shapes¶
Input shapes list for ONNX model.
type: list
default_value: None
searchable_values: None
- input_types¶
Input types list for ONNX model.
type: list
default_value: None
searchable_values: None
- device¶
Device selected for tuning process.
type: str
default_value: cpu
searchable_values: None
- cpu_cores¶
CPU cores used for thread tuning.
type: int
default_value: None
searchable_values: None
- io_bind¶
Whether to enable IOBinding search for ONNX Runtime inference.
type: bool
default_value: False
searchable_values: None
- enable_cuda_graph¶
Whether to enable CUDA Graph for the CUDA execution provider.
type: bool
default_value: False
searchable_values: None
- providers_list¶
List of execution providers with which to execute the ONNX model.
type: list
default_value: ['CPUExecutionProvider']
searchable_values: None
- execution_mode_list¶
List of execution modes controlling parallelism between operators.
type: list
default_value: None
searchable_values: None
- opt_level_list¶
Optimization level list for ONNX model.
type: list
default_value: None
searchable_values: None
- trt_fp16_enable¶
Whether to enable FP16 mode for the TensorRT execution provider.
type: bool
default_value: False
searchable_values: None
- intra_thread_num_list¶
List of intra-op thread counts to test.
type: list
default_value: [None]
searchable_values: None
- inter_thread_num_list¶
List of inter-op thread counts to test.
type: list
default_value: [None]
searchable_values: None
- extra_session_config¶
Extra customized session options during tuning process.
type: Dict[str, Any]
default_value: None
searchable_values: None
- force_evaluate_other_eps¶
Whether to force evaluation of all execution providers that differ from the associated execution provider.
type: bool
default_value: False
searchable_values: None
- enable_profiling¶
Whether to enable profiling for ONNX Runtime inference.
type: bool
default_value: False
searchable_values: None
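A sketch of a pass entry; the input names, shapes, and types are assumptions about a hypothetical text model:

```python
# Hypothetical "passes" entry: tune ORT session settings on CPU.
perf_tuning_pass = {
    "type": "OrtPerfTuning",
    "config": {
        "input_names": ["input_ids", "attention_mask"],  # assumed model inputs
        "input_shapes": [[1, 128], [1, 128]],
        "input_types": ["int64", "int64"],
        "providers_list": ["CPUExecutionProvider"],
        "io_bind": False,
        "intra_thread_num_list": [None, 2, 4],  # thread counts to try
    },
}
```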
OnnxFloatToFloat16¶
Converts a model to float16. It is based on onnxconverter-common.convert_float_to_float16. See https://onnxruntime.ai/docs/performance/model-optimizations/float16.html#float16-conversion for details. An example configuration follows the option list.
Input: handler.onnx.ONNXModelHandler
Output: handler.onnx.ONNXModelHandler
- min_positive_val¶
Minimum positive value; constant values will be clipped against this value.
type: float
default_value: 1e-07
searchable_values: None
- max_finite_val¶
Maximum finite value; constant values will be clipped against this value.
type: float
default_value: 10000.0
searchable_values: None
- keep_io_types¶
Whether model inputs/outputs should be left as float32
type: bool
default_value: False
searchable_values: None
- disable_shape_infer¶
Skips running onnx shape/type inference.
type: bool
default_value: False
searchable_values: None
- op_block_list¶
List of op types to leave as float32
type: List[str]
default_value: None
searchable_values: None
- node_block_list¶
List of node names to leave as float32
type: List[str]
default_value: None
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named <model_path_name>.data.
type: str
default_value: None
searchable_values: None
- size_threshold¶
Effective only if save_as_external_data is True. Threshold for the size of tensor data. A tensor's data is converted to external data only when its size is >= size_threshold. To convert every tensor with raw data to external data, set size_threshold=0.
type: int
default_value: 1024
searchable_values: None
- convert_attribute¶
Effective only if save_as_external_data is True. If true, convert all tensors to external data. If false, convert only non-attribute tensors to external data.
type: bool
default_value: False
searchable_values: None
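A sketch of a pass entry; the op_block_list entries are illustrative choices of numerically sensitive ops, not a prescribed list:

```python
# Hypothetical "passes" entry: fp16 conversion, keeping selected ops in fp32.
float16_pass = {
    "type": "OnnxFloatToFloat16",
    "config": {
        "keep_io_types": True,                # keep graph I/O as float32
        "op_block_list": ["Softmax", "Pow"],  # assumed fp32-sensitive ops
    },
}
```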
OrtMixedPrecision¶
Convert the model to mixed precision. An example configuration follows the option list.
Input: handler.onnx.ONNXModelHandler
Output: handler.onnx.ONNXModelHandler
- op_block_list¶
List of op types to leave as float32
type: List[str]
default_value: [‘SimplifiedLayerNormalization’, ‘SkipSimplifiedLayerNormalization’, ‘Relu’, ‘Add’]
searchable_values: None
- atol¶
Absolute tolerance for checking float16 conversion
type: float
default_value: 1e-06
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named <model_path_name>.data.
type: str
default_value: None
searchable_values: None
- size_threshold¶
Effective only if save_as_external_data is True. Threshold for the size of tensor data. A tensor's data is converted to external data only when its size is >= size_threshold. To convert every tensor with raw data to external data, set size_threshold=0.
type: int
default_value: 1024
searchable_values: None
- convert_attribute¶
Effective only if save_as_external_data is True. If true, convert all tensors to external data. If false, convert only non-attribute tensors to external data.
type: bool
default_value: False
searchable_values: None
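A minimal sketch of a pass entry, leaving the default op_block_list in place:

```python
# Hypothetical "passes" entry: mixed-precision conversion with defaults.
mixed_precision_pass = {
    "type": "OrtMixedPrecision",
    "config": {"atol": 1e-06},  # absolute tolerance for the fp16 check
}
```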
QNNPreprocess¶
Preprocess an ONNX model for quantization targeting the QNN Execution Provider. An example configuration follows the option list.
Input: handler.onnx.ONNXModelHandler
Output: handler.onnx.ONNXModelHandler
- fuse_layernorm¶
Whether to fuse ReduceMean sequence into a single LayerNormalization node.
type: bool
default_value: False
searchable_values: None
- inputs_to_make_channel_last¶
List of graph input names to transpose to be "channel-last". For example, if "input0" originally has the shape (N, C, D1, D2, ..., Dn), the resulting model will change input0's shape to (N, D1, D2, ..., Dn, C) and add a transpose node after it. Original: input0 (N, C, D1, D2, ..., Dn) -> <Nodes>. Updated: input0 (N, D1, D2, ..., Dn, C) -> Transpose -> input0_chanfirst (N, C, D1, D2, ..., Dn) -> <Nodes>. This can potentially improve inference latency for QDQ models running on QNN EP because the additional transpose node may allow other transpose nodes inserted during ORT layout transformation to cancel out.
type: list
default_value: None
searchable_values: None
- outputs_to_make_channel_last¶
List of graph output names to transpose to be "channel-last". For example, if "output0" originally has the shape (N, C, D1, D2, ..., Dn), the resulting model will change output0's shape to (N, D1, D2, ..., Dn, C) and add a transpose node before it. Original: <Nodes> -> output0 (N, C, D1, D2, ..., Dn). Updated: <Nodes> -> output0_chanfirst (N, C, D1, D2, ..., Dn) -> Transpose -> output0 (N, D1, D2, ..., Dn, C). This can potentially improve inference latency for QDQ models running on QNN EP because the additional transpose node may allow other transpose nodes inserted during ORT layout transformation to cancel out.
type: list
default_value: None
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named <model_path_name>.data.
type: str
default_value: None
searchable_values: None
- size_threshold¶
Effective only if save_as_external_data is True. Threshold for the size of tensor data. A tensor's data is converted to external data only when its size is >= size_threshold. To convert every tensor with raw data to external data, set size_threshold=0.
type: int
default_value: 1024
searchable_values: None
- convert_attribute¶
Effective only if save_as_external_data is True. If true, convert all tensors to external data. If false, convert only non-attribute tensors to external data.
type: bool
default_value: False
searchable_values: None
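A sketch of a pass entry for a hypothetical NCHW image model; the input name "pixel_values" is an assumption:

```python
# Hypothetical "passes" entry: prepare a model for QNN EP quantization.
qnn_preprocess_pass = {
    "type": "QNNPreprocess",
    "config": {
        "fuse_layernorm": True,                           # fuse ReduceMean sequences
        "inputs_to_make_channel_last": ["pixel_values"],  # assumed NCHW input
    },
}
```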
OnnxDynamicQuantization¶
ONNX Dynamic Quantization Pass. An example configuration follows the option list.
Input: handler.onnx.ONNXModelHandler
Output: handler.onnx.ONNXModelHandler
- quant_mode¶
Dynamic quantization mode.
type: str
default_value: dynamic
searchable_values: None
- weight_type¶
Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.
type: str
default_value: QInt8
searchable_values: Categorical([‘QInt8’, ‘QUInt8’])
- op_types_to_quantize¶
List of operator types to quantize. If None, all quantizable operator types are quantized.
type: list
default_value: None
searchable_values: None
- nodes_to_quantize¶
List of node names to quantize. If None, all quantizable nodes are quantized.
type: list
default_value: None
searchable_values: None
- nodes_to_exclude¶
List of node names to exclude from quantization. If None, no nodes are excluded.
type: list
default_value: None
searchable_values: None
- per_channel¶
Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization
type: bool
default_value: False
searchable_values: Categorical([True, False])
- reduce_range¶
Quantize weights with 7-bits. It may improve the accuracy for some models running on non-VNNI machine, especially for per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization
type: bool
default_value: False
searchable_values: Categorical([True, False])
- quant_preprocess¶
Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing
type: bool
default_value: True
searchable_values: Categorical([True, False])
- extra.Sigmoid.nnapi¶
type: bool
default_value: False
searchable_values: None
- ActivationSymmetric¶
Symmetrize calibration data for activations.
type: bool
default_value: False
searchable_values: None
- WeightSymmetric¶
Symmetrize calibration data for weights.
type: bool
default_value: True
searchable_values: None
- EnableSubgraph¶
If enabled, subgraphs will be quantized. Currently only dynamic mode is supported.
type: bool
default_value: False
searchable_values: None
- ForceQuantizeNoInputCheck¶
By default, some latent operators such as MaxPool and Transpose do not quantize if their input is not already quantized. Set to True to force such operators to always quantize their input and thus generate quantized output. This behavior can still be disabled per node using nodes_to_exclude.
type: bool
default_value: False
searchable_values: None
- MatMulConstBOnly¶
If enabled, only MatMul ops with a constant B input will be quantized.
type: bool
default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘dynamic’,): True, (‘static’,): False}, default: OLIVE_INVALID_PARAM_VALUE)
searchable_values: None
- extra_options¶
Key value pair dictionary for extra_options in quantization. Please refer to https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/quantize.py for details about the supported options. If an option is one of [‘extra.Sigmoid.nnapi’, ‘ActivationSymmetric’, ‘WeightSymmetric’, ‘EnableSubgraph’, ‘ForceQuantizeNoInputCheck’, ‘MatMulConstBOnly’], it will be overwritten by the corresponding config parameter value.
type: dict
default_value: None
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named <model_path_name>.data.
type: str
default_value: None
searchable_values: None
- size_threshold¶
Effective only if save_as_external_data is True. Threshold for the size of tensor data. A tensor's data is converted to external data only when its size is >= size_threshold. To convert every tensor with raw data to external data, set size_threshold=0.
type: int
default_value: 1024
searchable_values: None
- convert_attribute¶
Effective only if save_as_external_data is True. If true, convert all tensors to external data. If false, convert only non-attribute tensors to external data.
type: bool
default_value: False
searchable_values: None
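A sketch of a pass entry with the searchable parameters pinned to concrete values:

```python
# Hypothetical "passes" entry: dynamic int8 quantization.
dynamic_quant_pass = {
    "type": "OnnxDynamicQuantization",
    "config": {
        "weight_type": "QInt8",
        "per_channel": False,
        "reduce_range": False,
    },
}
```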
OnnxStaticQuantization¶
ONNX Static Quantization Pass. An example configuration follows the option list.
Input: handler.onnx.ONNXModelHandler
Output: handler.onnx.ONNXModelHandler
- script_dir¶
Directory containing user script dependencies.
type: str
default_value: None
searchable_values: None
- user_script¶
Path to user script. Values of other parameters that were assigned function or object names will be imported from this script.
type: str
default_value: None
searchable_values: None
- quant_mode¶
Static quantization mode.
type: str
default_value: static
searchable_values: None
- weight_type¶
Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.
type: str
default_value: QInt8
searchable_values: Categorical([‘QInt8’, ‘QUInt8’])
- op_types_to_quantize¶
List of operator types to quantize. If None, all quantizable operator types are quantized.
type: list
default_value: None
searchable_values: None
- nodes_to_quantize¶
List of node names to quantize. If None, all quantizable nodes are quantized.
type: list
default_value: None
searchable_values: None
- nodes_to_exclude¶
List of node names to exclude from quantization. If None, no nodes are excluded.
type: list
default_value: None
searchable_values: None
- per_channel¶
Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization
type: bool
default_value: False
searchable_values: Categorical([True, False])
- reduce_range¶
Quantize weights with 7-bits. It may improve the accuracy for some models running on non-VNNI machine, especially for per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization
type: bool
default_value: False
searchable_values: Categorical([True, False])
- quant_preprocess¶
Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing
type: bool
default_value: True
searchable_values: Categorical([True, False])
- data_dir¶
Path to the directory containing the dataset. For local data, it is required if quant_mode is ‘static’ and dataloader_func is provided.
type: str | pathlib.Path | olive.resource_path.ResourcePath | olive.resource_path.ResourcePathConfig | None
default_value: None
searchable_values: None
- batch_size¶
Batch size for calibration, only used if dataloader_func is provided.
type: int
default_value: 1
searchable_values: None
- dataloader_func¶
Function/function name to generate dataloader for calibration, required if quant_mode is ‘static’ and data_config is None.
type: Callable | str
default_value: None
searchable_values: None
- dataloader_func_kwargs¶
Keyword arguments for dataloader_func.
type: Dict[str, Any]
default_value: None
searchable_values: None
- data_config¶
Data config for calibration, required if quant_mode is ‘static’ and dataloader_func is None.
type: olive.data.config.DataConfig | Dict
default_value: None
searchable_values: None
- calibrate_method¶
Currently supported calibration methods are MinMax and Entropy. Use CalibrationMethod.MinMax or CalibrationMethod.Entropy as options. Percentile is not supported for onnxruntime==1.16.0; avoid setting or searching it.
type: str
default_value: MinMax
searchable_values: Categorical([‘MinMax’, ‘Entropy’, ‘Percentile’])
- quant_format¶
The QOperator format quantizes the model with quantized operators directly. The QDQ format quantizes the model by inserting QuantizeLinear/DeQuantizeLinear nodes on the tensors.
type: str
default_value: QDQ
searchable_values: Categorical([‘QOperator’, ‘QDQ’])
- activation_type¶
Quantization data type of activation. Please refer to https://onnxruntime.ai/docs/performance/quantization.html for more details on data type selection
type: str
default_value: QInt8
searchable_values: Conditional(parents: (‘quant_format’, ‘weight_type’), support: {(‘QDQ’, ‘QInt8’): Categorical([‘QInt8’]), (‘QDQ’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘QOperator’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘QOperator’, ‘QInt8’): Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>])}, default: Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>]))
- prepare_qnn_config¶
Whether to generate a suitable quantization config for the input model. Should be set to True if model is targeted for QNN EP.
type: bool
default_value: False
searchable_values: None
- extra.Sigmoid.nnapi¶
type: bool
default_value: False
searchable_values: None
- ActivationSymmetric¶
Symmetrize calibration data for activations.
type: bool
default_value: False
searchable_values: None
- WeightSymmetric¶
Symmetrize calibration data for weights.
type: bool
default_value: True
searchable_values: None
- EnableSubgraph¶
If enabled, subgraphs will be quantized. Currently only dynamic mode is supported.
type: bool
default_value: False
searchable_values: None
- ForceQuantizeNoInputCheck¶
By default, some latent operators such as MaxPool and Transpose do not quantize if their input is not already quantized. Set to True to force such operators to always quantize their input and thus generate quantized output. This behavior can still be disabled per node using nodes_to_exclude.
type: bool
default_value: False
searchable_values: None
- MatMulConstBOnly¶
If enabled, only MatMul ops with a constant B input will be quantized.
type: bool
default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘dynamic’,): True, (‘static’,): False}, default: OLIVE_INVALID_PARAM_VALUE)
searchable_values: None
- extra_options¶
Key value pair dictionary for extra_options in quantization. Please refer to https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/quantize.py for details about the supported options. If an option is one of [‘extra.Sigmoid.nnapi’, ‘ActivationSymmetric’, ‘WeightSymmetric’, ‘EnableSubgraph’, ‘ForceQuantizeNoInputCheck’, ‘MatMulConstBOnly’], it will be overwritten by the corresponding config parameter value.
type: dict
default_value: None
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named <model_path_name>.data.
type: str
default_value: None
searchable_values: None
- size_threshold¶
Effective only if save_as_external_data is True. Threshold for the size of tensor data. A tensor's data is converted to external data only when its size is >= size_threshold. To convert every tensor with raw data to external data, set size_threshold=0.
type: int
default_value: 1024
searchable_values: None
- convert_attribute¶
Effective only if save_as_external_data is True. If true, convert all tensors to external data. If false, convert only non-attribute tensors to external data.
type: bool
default_value: False
searchable_values: None
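A sketch of a pass entry; user_script.py, create_calib_dataloader, and data/calib are hypothetical names for the calibration plumbing:

```python
# Hypothetical "passes" entry: static QDQ quantization with calibration data.
static_quant_pass = {
    "type": "OnnxStaticQuantization",
    "config": {
        "user_script": "user_script.py",               # hypothetical script
        "dataloader_func": "create_calib_dataloader",  # hypothetical function in it
        "data_dir": "data/calib",                      # hypothetical dataset dir
        "calibrate_method": "MinMax",
        "quant_format": "QDQ",
        "activation_type": "QInt8",
        "weight_type": "QInt8",
    },
}
```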
OnnxQuantization¶
Quantize an ONNX model with static/dynamic quantization techniques. An example configuration follows the option list.
Input: handler.onnx.ONNXModelHandler
Output: handler.onnx.ONNXModelHandler
- script_dir¶
Directory containing user script dependencies.
type: str
default_value: None
searchable_values: None
- user_script¶
Path to user script. Values of other parameters that were assigned function or object names will be imported from this script.
type: str
default_value: None
searchable_values: None
- quant_mode¶
ONNX quantization mode: 'dynamic' for dynamic quantization, 'static' for static quantization.
type: str
default_value: static
searchable_values: Categorical([‘dynamic’, ‘static’])
- weight_type¶
Data type for quantizing weights which is used both in dynamic and static quantization. ‘QInt8’ for signed 8-bit integer, ‘QUInt8’ for unsigned 8-bit integer.
type: str
default_value: QInt8
searchable_values: Categorical([‘QInt8’, ‘QUInt8’])
- op_types_to_quantize¶
List of operator types to quantize. If None, all quantizable operator types are quantized.
type: list
default_value: None
searchable_values: None
- nodes_to_quantize¶
List of node names to quantize. If None, all quantizable nodes are quantized.
type: list
default_value: None
searchable_values: None
- nodes_to_exclude¶
List of node names to exclude from quantization. If None, no nodes are excluded.
type: list
default_value: None
searchable_values: None
- per_channel¶
Quantize weights per channel. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization
type: bool
default_value: False
searchable_values: Categorical([True, False])
- reduce_range¶
Quantize weights with 7-bits. It may improve the accuracy for some models running on non-VNNI machine, especially for per-channel mode. Tips: When to use reduce_range and per-channel quantization: https://onnxruntime.ai/docs/performance/quantization.html#when-to-use-reduce-range-and-per-channel-quantization
type: bool
default_value: False
searchable_values: Categorical([True, False])
- quant_preprocess¶
Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing
type: bool
default_value: True
searchable_values: Categorical([True, False])
- data_dir¶
Path to the directory containing the dataset. For local data, it is required if quant_mode is ‘static’ and dataloader_func is provided.
type: str | pathlib.Path | olive.resource_path.ResourcePath | olive.resource_path.ResourcePathConfig | None
default_value: None
searchable_values: None
- batch_size¶
Batch size for calibration, only used if dataloader_func is provided.
type: int
default_value: 1
searchable_values: None
- dataloader_func¶
Function/function name to generate dataloader for calibration, required if quant_mode is ‘static’ and data_config is None.
type: Callable | str
default_value: None
searchable_values: None
- dataloader_func_kwargs¶
Keyword arguments for dataloader_func.
type: Dict[str, Any]
default_value: None
searchable_values: None
- data_config¶
Data config for calibration, required if quant_mode is ‘static’ and dataloader_func is None.
type: olive.data.config.DataConfig | Dict
default_value: None
searchable_values: None
- calibrate_method¶
Currently supported calibration methods are MinMax and Entropy. Use CalibrationMethod.MinMax or CalibrationMethod.Entropy as options. Percentile is not supported for onnxruntime==1.16.0; avoid setting or searching it.
type: str
default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘static’,): ‘MinMax’, (‘dynamic’,): <SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>}, default: OLIVE_INVALID_PARAM_VALUE)
searchable_values: Conditional(parents: (‘quant_mode’,), support: {(‘static’,): Categorical([‘MinMax’, ‘Entropy’, ‘Percentile’])}, default: Categorical([<SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>]))
- quant_format¶
The QOperator format quantizes the model with quantized operators directly. The QDQ format quantizes the model by inserting QuantizeLinear/DeQuantizeLinear nodes on the tensors.
type: str
default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘static’,): ‘QDQ’, (‘dynamic’,): <SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>}, default: OLIVE_INVALID_PARAM_VALUE)
searchable_values: Conditional(parents: (‘quant_mode’,), support: {(‘static’,): Categorical([‘QOperator’, ‘QDQ’])}, default: Categorical([<SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>]))
- activation_type¶
Quantization data type of activation. Please refer to https://onnxruntime.ai/docs/performance/quantization.html for more details on data type selection
type: str
default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘static’,): ‘QInt8’, (‘dynamic’,): <SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>}, default: OLIVE_INVALID_PARAM_VALUE)
searchable_values: Conditional(parents: (‘quant_mode’, ‘quant_format’, ‘weight_type’), support: {(‘static’, ‘QDQ’, ‘QInt8’): Categorical([‘QInt8’]), (‘static’, ‘QDQ’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘static’, ‘QOperator’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘static’, ‘QOperator’, ‘QInt8’): Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>])}, default: Categorical([<SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>]))
- prepare_qnn_config¶
Whether to generate a suitable quantization config for the input model. Should be set to True if model is targeted for QNN EP.
type: bool
default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘static’,): False, (‘dynamic’,): <SpecialParamValue.IGNORED: ‘OLIVE_IGNORED_PARAM_VALUE’>}, default: OLIVE_INVALID_PARAM_VALUE)
searchable_values: None
- extra.Sigmoid.nnapi¶
type: bool
default_value: False
searchable_values: None
- ActivationSymmetric¶
Symmetrize calibration data for activations.
type: bool
default_value: False
searchable_values: None
- WeightSymmetric¶
Symmetrize calibration data for weights.
type: bool
default_value: True
searchable_values: None
- EnableSubgraph¶
If enabled, subgraphs will be quantized. Currently only dynamic mode is supported.
type: bool
default_value: False
searchable_values: None
- ForceQuantizeNoInputCheck¶
By default, some latent operators such as MaxPool and Transpose do not quantize if their input is not already quantized. Set to True to force such operators to always quantize their input and thus generate quantized output. This behavior can still be disabled per node using nodes_to_exclude.
type: bool
default_value: False
searchable_values: None
- MatMulConstBOnly¶
If enabled, only MatMul ops with a constant B input will be quantized.
type: bool
default_value: ConditionalDefault(parents: (‘quant_mode’,), support: {(‘dynamic’,): True, (‘static’,): False}, default: OLIVE_INVALID_PARAM_VALUE)
searchable_values: None
- extra_options¶
Key value pair dictionary for extra_options in quantization. Please refer to https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/quantize.py for details about the supported options. If an option is one of [‘extra.Sigmoid.nnapi’, ‘ActivationSymmetric’, ‘WeightSymmetric’, ‘EnableSubgraph’, ‘ForceQuantizeNoInputCheck’, ‘MatMulConstBOnly’], it will be overwritten by the corresponding config parameter value.
type: dict
default_value: None
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named <model_path_name>.data.
type: str
default_value: None
searchable_values: None
- size_threshold¶
Effective only if save_as_external_data is True. Threshold for the size of tensor data. A tensor's data is converted to external data only when its size is >= size_threshold. To convert every tensor with raw data to external data, set size_threshold=0.
type: int
default_value: 1024
searchable_values: None
- convert_attribute¶
Effective only if save_as_external_data is True. If true, convert all tensors to external data. If false, convert only non-attribute tensors to external data.
type: bool
default_value: False
searchable_values: None
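A sketch of a pass entry; when quant_mode is left unspecified, Olive can search over 'dynamic' and 'static' as the searchable values above indicate, so this example pins it only for clarity:

```python
# Hypothetical "passes" entry: general quantization pass, pinned to dynamic mode.
quantization_pass = {
    "type": "OnnxQuantization",
    "config": {
        "quant_mode": "dynamic",   # "static" would additionally need calibration data
        "weight_type": "QUInt8",
        "MatMulConstBOnly": True,  # the dynamic-mode default, made explicit
    },
}
```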
DynamicToFixedShape¶
Convert dynamic shapes to fixed shapes in an ONNX model. An example configuration follows the option list.
Input: handler.onnx.ONNXModelHandler
Output: handler.onnx.ONNXModelHandler
- dim_param¶
Symbolic parameter name(s) to replace. dim_value must be provided if this is specified.
type: List[str]
default_value: None
searchable_values: None
- dim_value¶
Value to replace dim_param with in the model. Must be > 0.
type: List[int]
default_value: None
searchable_values: None
- input_name¶
Model input name(s) to replace the shape of. input_shape must be provided if this is specified.
type: List[str]
default_value: None
searchable_values: None
- input_shape¶
Shape to use for the input specified by input_name. Provide a comma-separated list for the shape. All values must be > 0, e.g. [1,3,256,256].
type: List[List[int]]
default_value: None
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named <model_path_name>.data.
type: str
default_value: None
searchable_values: None
- size_threshold¶
Effective only if save_as_external_data is True. Threshold for the size of tensor data. A tensor's data is converted to external data only when its size is >= size_threshold. To convert every tensor with raw data to external data, set size_threshold=0.
type: int
default_value: 1024
searchable_values: None
- convert_attribute¶
Effective only if save_as_external_data is True. If true, convert all tensors to external data. If false, convert only non-attribute tensors to external data.
type: bool
default_value: False
searchable_values: None
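A sketch of a pass entry; the symbolic dimension name batch_size is an assumption about the model:

```python
# Hypothetical "passes" entry: pin a symbolic batch dimension to 1.
fixed_shape_pass = {
    "type": "DynamicToFixedShape",
    "config": {
        "dim_param": ["batch_size"],  # assumed symbolic dim in the model
        "dim_value": [1],             # value to substitute for it
    },
}
```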
IncDynamicQuantization¶
Intel® Neural Compressor Dynamic Quantization Pass. An example configuration follows the option list.
Input: handler.onnx.ONNXModelHandler
Output: handler.onnx.ONNXModelHandler
- approach¶
Dynamic quantization mode.
type: str
default_value: dynamic
searchable_values: None
- device¶
Intel® Neural Compressor quantization device. Supported values are 'cpu' and 'gpu'.
type: str
default_value: cpu
searchable_values: None
- backend¶
Backend for model execution. Supported values are 'default', 'onnxrt_trt_ep', and 'onnxrt_cuda_ep'.
type: str
default_value: default
searchable_values: None
- domain¶
Model domain. Supported values are 'auto', 'cv', 'object_detection', 'nlp', and 'recommendation_system'. The Intel® Neural Compressor adaptor automatically uses specific quantization settings for different domains, and explicitly specified quantization settings override the automatic ones. If domain is set to 'auto', automatic domain detection is executed.
type: str
default_value: auto
searchable_values: None
- workspace¶
Workspace for Intel® Neural Compressor quantization where intermediate files and the tuning history file are stored. Default value is: "./nc_workspace/{}/".format(datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S"))
type: str
default_value: None
searchable_values: None
- recipes¶
Recipes for Intel® Neural Compressor quantization. Supported keys: 'smooth_quant': whether to do smooth quant; 'smooth_quant_args': parameters for smooth_quant; 'fast_bias_correction': whether to do fast bias correction; 'weight_correction': whether to do weight correction; 'gemm_to_matmul': whether to convert gemm to matmul and add, only valid for onnx models; 'graph_optimization_level': supports 'DISABLE_ALL', 'ENABLE_BASIC', 'ENABLE_EXTENDED', 'ENABLE_ALL', only valid for onnx models; 'first_conv_or_matmul_quantization': whether to quantize the first conv or matmul; 'last_conv_or_matmul_quantization': whether to quantize the last conv or matmul; 'pre_post_process_quantization': whether to quantize the ops in preprocessing and postprocessing; 'add_qdq_pair_to_weight': whether to add a QDQ pair for weights, only valid for onnxrt_trt_ep; 'optypes_to_exclude_output_quant': don't quantize the output of specified optypes; 'dedicated_qdq_pair': whether to dedicate QDQ pairs, only valid for onnxrt_trt_ep.
type: dict
default_value: {}
searchable_values: None
- reduce_range¶
Whether to use 7-bit quantization.
type: bool
default_value: False
searchable_values: Categorical([True, False])
- quant_level¶
Intel® Neural Compressor allows users to choose different tuning processes by specifying the quantization level (quant_level). Currently three quant_levels are supported: 0 is a conservative strategy, 1 is a basic or user-specified strategy, and auto (the default) is a combination of 0 and 1. See https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-process and https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-algorithms for more details.
type: str
default_value: auto
searchable_values: None
- excluded_precisions¶
Precisions to be excluded. Default value is an empty list. By default, Intel® Neural Compressor enables mixed precision with fp32 + bf16 (only when device is 'gpu' and backend is 'onnxrt_cuda_ep') + int8. To disable the bf16 data type, specify excluded_precisions = ['bf16'].
type: list
default_value: []
searchable_values: None
- tuning_criterion¶
Instance of TuningCriterion class. In this class you can set strategy, strategy_kwargs, timeout, max_trials and objective.
type: dict
default_value: {‘strategy’: ‘basic’, ‘strategy_kwargs’: None, ‘timeout’: 0, ‘max_trials’: 5, ‘objective’: ‘performance’}
searchable_values: None
- metric¶
Accuracy metric to generate an evaluation function for Intel® Neural Compressor accuracy aware tuning.
type: olive.evaluator.metric.Metric | None
default_value: None
searchable_values: None
- weight_only_config¶
INC weight only quantization config.
type: dict
default_value: {}
searchable_values: None
- op_type_dict¶
INC weight only quantization config.
type: dict
default_value: {}
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named <model_path_name>.data.
type: str
default_value: None
searchable_values: None
- size_threshold¶
Effective only if save_as_external_data is True. Threshold for the size of tensor data. A tensor's data is converted to external data only when its size is >= size_threshold. To convert every tensor with raw data to external data, set size_threshold=0.
type: int
default_value: 1024
searchable_values: None
- convert_attribute¶
Effective only if save_as_external_data is True. If true, convert all tensors to external data. If false, convert only non-attribute tensors to external data.
type: bool
default_value: False
searchable_values: None
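A sketch of a pass entry with a bounded tuning budget via tuning_criterion:

```python
# Hypothetical "passes" entry: INC dynamic quantization on CPU.
inc_dynamic_pass = {
    "type": "IncDynamicQuantization",
    "config": {
        "device": "cpu",
        "tuning_criterion": {"max_trials": 5, "timeout": 0},  # bound the tuning
    },
}
```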
IncStaticQuantization¶
Intel® Neural Compressor Static Quantization Pass. An example configuration follows the option list.
Input: handler.onnx.ONNXModelHandler
Output: handler.onnx.ONNXModelHandler
- script_dir¶
Directory containing user script dependencies.
type: str
default_value: None
searchable_values: None
- user_script¶
Path to user script. Values of other parameters that were assigned function or object names will be imported from this script.
type: str
default_value: None
searchable_values: None
- approach¶
Static quantization mode.
type: str
default_value: static
searchable_values: None
- diagnosis¶
Whether to enable diagnosis mode. If enabled, Intel® Neural Compressor will print the quantization summary.
type: bool
default_value: False
searchable_values: None
- device¶
Intel® Neural Compressor quantization device. Supported values are 'cpu' and 'gpu'.
type: str
default_value: cpu
searchable_values: None
- backend¶
Backend for model execution. Supported values are 'default', 'onnxrt_trt_ep', and 'onnxrt_cuda_ep'.
type: str
default_value: default
searchable_values: None
- domain¶
Model domain. Supported values are 'auto', 'cv', 'object_detection', 'nlp', and 'recommendation_system'. The Intel® Neural Compressor adaptor automatically uses specific quantization settings for different domains, and explicitly specified quantization settings override the automatic ones. If domain is set to 'auto', automatic domain detection is executed.
type: str
default_value: auto
searchable_values: None
- workspace¶
Workspace for Intel® Neural Compressor quantization where intermediate files and the tuning history file are stored. Default value is: "./nc_workspace/{}/".format(datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S"))
type: str
default_value: None
searchable_values: None
- recipes¶
Recipes for Intel® Neural Compressor quantization. Supported keys: 'smooth_quant': whether to do smooth quant; 'smooth_quant_args': parameters for smooth_quant; 'fast_bias_correction': whether to do fast bias correction; 'weight_correction': whether to do weight correction; 'gemm_to_matmul': whether to convert gemm to matmul and add, only valid for onnx models; 'graph_optimization_level': supports 'DISABLE_ALL', 'ENABLE_BASIC', 'ENABLE_EXTENDED', 'ENABLE_ALL', only valid for onnx models; 'first_conv_or_matmul_quantization': whether to quantize the first conv or matmul; 'last_conv_or_matmul_quantization': whether to quantize the last conv or matmul; 'pre_post_process_quantization': whether to quantize the ops in preprocessing and postprocessing; 'add_qdq_pair_to_weight': whether to add a QDQ pair for weights, only valid for onnxrt_trt_ep; 'optypes_to_exclude_output_quant': don't quantize the output of specified optypes; 'dedicated_qdq_pair': whether to dedicate QDQ pairs, only valid for onnxrt_trt_ep.
type: dict
default_value: {}
searchable_values: None
- reduce_range¶
Whether to use 7-bit quantization.
type: bool
default_value: False
searchable_values: Categorical([True, False])
- quant_level¶
Intel® Neural Compressor allows users to choose different tuning processes by specifying the quantization level (quant_level). Currently three quant_levels are supported: 0 is a conservative strategy, 1 is a basic or user-specified strategy, and auto (the default) is a combination of 0 and 1. See https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-process and https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-algorithms for more details.
type: str
default_value: auto
searchable_values: None
- excluded_precisions¶
Precisions to be excluded. Default value is an empty list. By default, Intel® Neural Compressor enables mixed precision with fp32 + bf16 (only when device is 'gpu' and backend is 'onnxrt_cuda_ep') + int8. To disable the bf16 data type, specify excluded_precisions = ['bf16'].
type: list
default_value: []
searchable_values: None
- tuning_criterion¶
Instance of TuningCriterion class. In this class you can set strategy, strategy_kwargs, timeout, max_trials and objective.
type: dict
default_value: {‘strategy’: ‘basic’, ‘strategy_kwargs’: None, ‘timeout’: 0, ‘max_trials’: 5, ‘objective’: ‘performance’}
searchable_values: None
- metric¶
Accuracy metric to generate an evaluation function for Intel® Neural Compressor accuracy aware tuning.
type: olive.evaluator.metric.Metric | None
default_value: None
searchable_values: None
- weight_only_config¶
INC weight only quantization config.
type: dict
default_value: {‘bits’: 4, ‘group_size’: 4, ‘scheme’: ‘asym’, ‘algorithm’: ‘RTN’}
searchable_values: None
- op_type_dict¶
INC weight only quantization config.
type: dict
default_value: {}
searchable_values: None
- data_dir¶
Path to the directory containing the dataset. For local data, it is required if approach is ‘static’ and dataloader_func is provided.
type: str | pathlib.Path | olive.resource_path.ResourcePath | olive.resource_path.ResourcePathConfig | None
default_value: None
searchable_values: None
- batch_size¶
Batch size for calibration, only used if dataloader_func is provided.
type: int
default_value: 1
searchable_values: None
- dataloader_func¶
Function/function name to generate dataloader for calibration, required if approach is ‘static’ and data_config is None.
type: Callable | str
default_value: None
searchable_values: None
- dataloader_func_kwargs¶
Keyword arguments for dataloader_func.
type: Dict[str, Any]
default_value: None
searchable_values: None
- data_config¶
Data config for calibration, required if approach is ‘static’ and dataloader_func is None.
type: olive.data.config.DataConfig | Dict
default_value: None
searchable_values: None
- quant_format¶
Quantization format. Supported values are 'QDQ' and 'QOperator'.
type: str
default_value: QOperator
searchable_values: Categorical([‘QOperator’, ‘QDQ’])
- calibration_sampling_size¶
Number of calibration samples.
type: list | int
default_value: [100]
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named <model_path_name>.data.
type: str
default_value: None
searchable_values: None
- size_threshold¶
Effective only if save_as_external_data is True. Threshold for the size of tensor data. A tensor's data is converted to external data only when its size is >= size_threshold. To convert every tensor with raw data to external data, set size_threshold=0.
type: int
default_value: 1024
searchable_values: None
- convert_attribute¶
Effective only if save_as_external_data is True. If true, convert all tensors to external data. If false, convert only non-attribute tensors to external data.
type: bool
default_value: False
searchable_values: None
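A sketch of a pass entry; user_script.py and create_calib_dataloader are hypothetical names, and the smooth_quant settings are illustrative:

```python
# Hypothetical "passes" entry: INC static quantization with smooth quant.
inc_static_pass = {
    "type": "IncStaticQuantization",
    "config": {
        "user_script": "user_script.py",               # hypothetical script
        "dataloader_func": "create_calib_dataloader",  # hypothetical function in it
        "quant_format": "QOperator",
        "recipes": {"smooth_quant": True, "smooth_quant_args": {"alpha": 0.5}},
        "calibration_sampling_size": [100],
    },
}
```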
IncQuantization¶
Quantize ONNX model with Intel® Neural Compressor.
Input: handler.onnx.ONNXModelHandler
Output: handler.onnx.ONNXModelHandler
- script_dir¶
Directory containing user script dependencies.
type: str
default_value: None
searchable_values: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default_value: None
searchable_values: None
- approach¶
Intel® Neural Compressor quantization mode: ‘dynamic’ for dynamic quantization, ‘static’ for static quantization, and ‘weight_only’ for 4-bit weight-only quantization.
type: str
default_value: static
searchable_values: Categorical([‘dynamic’, ‘static’, ‘weight_only’])
- device¶
Intel® Neural Compressor quantization device. Support ‘cpu’ and ‘gpu’.
type: str
default_value: cpu
searchable_values: None
- backend¶
Backend for model execution. Supported values: ‘default’, ‘onnxrt_trt_ep’ and ‘onnxrt_cuda_ep’.
type: str
default_value: default
searchable_values: None
- domain¶
Model domain. Supported values: ‘auto’, ‘cv’, ‘object_detection’, ‘nlp’ and ‘recommendation_system’. The Intel® Neural Compressor Adaptor automatically applies quantization settings specific to the domain, and explicitly specified quantization settings override the automatic ones. If domain is set to ‘auto’, automatic domain detection is executed.
type: str
default_value: auto
searchable_values: None
- workspace¶
Workspace for Intel® Neural Compressor quantization where intermediate files and the tuning history file are stored. Default value is: "./nc_workspace/{}/".format(datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S"))
type: str
default_value: None
searchable_values: None
- recipes¶
Recipes for Intel® Neural Compressor quantization. The supported keys are:
‘smooth_quant’: whether to do smooth quant.
‘smooth_quant_args’: parameters for smooth_quant.
‘fast_bias_correction’: whether to do fast bias correction.
‘weight_correction’: whether to do weight correction.
‘gemm_to_matmul’: whether to convert gemm to matmul and add; only valid for onnx models.
‘graph_optimization_level’: supports ‘DISABLE_ALL’, ‘ENABLE_BASIC’, ‘ENABLE_EXTENDED’ and ‘ENABLE_ALL’; only valid for onnx models.
‘first_conv_or_matmul_quantization’: whether to quantize the first conv or matmul.
‘last_conv_or_matmul_quantization’: whether to quantize the last conv or matmul.
‘pre_post_process_quantization’: whether to quantize the ops in preprocessing and postprocessing.
‘add_qdq_pair_to_weight’: whether to add a QDQ pair for weights; only valid for onnxrt_trt_ep.
‘optypes_to_exclude_output_quant’: don’t quantize the output of the specified optypes.
‘dedicated_qdq_pair’: whether to use dedicated QDQ pairs; only valid for onnxrt_trt_ep.
type: dict
default_value: {}
searchable_values: None
- reduce_range¶
Whether to use 7-bit quantization (reduced range).
type: bool
default_value: False
searchable_values: Categorical([True, False])
- quant_level¶
Intel® Neural Compressor allows users to choose different tuning processes by specifying the quantization level (quant_level). Three quant_levels are currently supported: 0 is the conservative strategy, 1 is the basic or user-specified strategy, and ‘auto’ (the default) is the combination of 0 and 1. Please refer to https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-process and https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#tuning-algorithms for more details.
type: str
default_value: auto
searchable_values: None
- excluded_precisions¶
Precisions to be excluded. Default value is an empty list. Intel® Neural Compressor enables mixed precision with fp32 + bf16 (only when device is ‘gpu’ and backend is ‘onnxrt_cuda_ep’) + int8 by default. To disable the bf16 data type, specify excluded_precisions = [‘bf16’].
type: list
default_value: []
searchable_values: None
- tuning_criterion¶
Instance of TuningCriterion class. In this class you can set strategy, strategy_kwargs, timeout, max_trials and objective.
type: dict
default_value: {‘strategy’: ‘basic’, ‘strategy_kwargs’: None, ‘timeout’: 0, ‘max_trials’: 5, ‘objective’: ‘performance’}
searchable_values: None
- metric¶
Accuracy metric to generate an evaluation function for Intel® Neural Compressor accuracy aware tuning.
type: olive.evaluator.metric.Metric | None
default_value: None
searchable_values: None
- weight_only_config¶
INC weight-only quantization config.
type: dict
default_value: {‘bits’: 4, ‘group_size’: 4, ‘scheme’: ‘asym’, ‘algorithm’: ‘RTN’}
searchable_values: None
- op_type_dict¶
Operator type dict for INC weight-only quantization, used to set quantization configs for specific operator types.
type: dict
default_value: {}
searchable_values: None
- data_dir¶
Path to the directory containing the dataset. For local data, it is required if approach is ‘static’ and dataloader_func is provided.
type: str | pathlib.Path | olive.resource_path.ResourcePath | olive.resource_path.ResourcePathConfig | None
default_value: None
searchable_values: None
- batch_size¶
Batch size for calibration, only used if dataloader_func is provided.
type: int
default_value: 1
searchable_values: None
- dataloader_func¶
Function/function name to generate dataloader for calibration, required if approach is ‘static’ and data_config is None.
type: Callable | str
default_value: None
searchable_values: None
- dataloader_func_kwargs¶
Keyword arguments for dataloader_func.
type: Dict[str, Any]
default_value: None
searchable_values: None
- data_config¶
Data config for calibration, required if approach is ‘static’ and dataloader_func is None.
type: olive.data.config.DataConfig | Dict
default_value: None
searchable_values: None
- quant_format¶
Quantization format. Supported values: ‘QDQ’ and ‘QOperator’.
type: str
default_value: QOperator
searchable_values: Conditional(parents: (‘approach’,), support: {(‘static’,): Categorical([‘QOperator’, ‘QDQ’])}, default: Categorical([‘default’]))
- calibration_sampling_size¶
Number of calibration samples.
type: list | int
default_value: [100]
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data
type: str
default_value: None
searchable_values: None
- size_threshold¶
Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.
type: int
default_value: 1024
searchable_values: None
- convert_attribute¶
Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data
type: bool
default_value: False
searchable_values: None
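For concreteness, a minimal sketch of how this pass might be configured in an Olive workflow, written as a Python dict; the user script, dataloader function name and data directory are illustrative assumptions, not defaults:

    # Sketch: a static IncQuantization pass entry (names are assumptions).
    inc_quantization = {
        "type": "IncQuantization",
        "config": {
            "approach": "static",             # 'dynamic' and 'weight_only' are also supported
            "quant_format": "QOperator",
            "user_script": "user_script.py",  # assumed script defining the dataloader
            "dataloader_func": "create_dataloader",
            "data_dir": "data/calibration",
            "calibration_sampling_size": [100],
        },
    }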
VitisAIQuantization¶
Quantize ONNX model with onnxruntime. The best parameters for vai_q_onnx quantization can be searched for at the same time.
Input: handler.onnx.ONNXModelHandler
Output: handler.onnx.ONNXModelHandler
- script_dir¶
Directory containing user script dependencies.
type: str
default_value: None
searchable_values: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default_value: None
searchable_values: None
- quant_mode¶
ONNX quantization mode. Only ‘static’ is supported for Vitis AI quantization.
type: str
default_value: static
searchable_values: Categorical([‘static’])
- data_dir¶
Path to the directory containing the dataset.
type: str | pathlib.Path | olive.resource_path.ResourcePath | olive.resource_path.ResourcePathConfig | None
default_value: None
searchable_values: None
- batch_size¶
Batch size for calibration.
type: int
default_value: 1
searchable_values: None
- dataloader_func¶
Function/function name to generate dataloader for calibration. Required.
type: Callable | str
required: True
- dataloader_func_kwargs¶
Keyword arguments for dataloader_func.
type: Dict[str, Any]
default_value: None
searchable_values: None
- weight_type¶
Data type for quantizing weights, used in vai_q_onnx quantization. ‘QInt8’ for signed 8-bit integer.
type: str
default_value: QInt8
searchable_values: Categorical([‘QInt8’])
- input_nodes¶
Start node that needs quantization. If None, all quantizable nodes are considered.
type: list
default_value: None
searchable_values: None
- output_nodes¶
End node that needs quantization. If None, all quantizable nodes are considered.
type: list
default_value: None
searchable_values: None
- op_types_to_quantize¶
List of operator types to quantize. If None, all quantizable.
type: list
default_value: None
searchable_values: None
- nodes_to_quantize¶
List of node names to quantize. If None, all quantizable.
type: list
default_value: None
searchable_values: None
- nodes_to_exclude¶
List of node names to exclude from quantization. If None, no nodes are excluded.
type: list
default_value: None
searchable_values: None
- per_channel¶
Quantize weights per channel.
type: bool
default_value: False
searchable_values: Categorical([True, False])
- optimize_model¶
Soon to be deprecated in ONNX! Optimize the model before quantization. NOT recommended; optimization will change the computation graph, making debugging of quantization loss difficult.
type: bool
default_value: False
searchable_values: Categorical([True, False])
- use_external_data_format¶
Option used for large (>2GB) models. Set to True by default.
type: bool
default_value: True
searchable_values: None
- quant_preprocess¶
Shape inference and model optimization, in preparation for quantization. https://onnxruntime.ai/docs/performance/quantization.html#pre-processing
type: bool
default_value: True
searchable_values: Categorical([True, False])
- calibrate_method¶
Calibration method. The currently supported methods are ‘NonOverflow’ and ‘MinMSE’.
type: str
default_value: MinMSE
searchable_values: Categorical([‘NonOverflow’, ‘MinMSE’])
- quant_format¶
Quantization format. ‘QDQ’ quantizes the model by inserting QuantizeLinear/DeQuantizeLinear nodes on the tensors.
type: str
default_value: QDQ
searchable_values: Categorical([‘QDQ’, ‘QOperator’])
- need_layer_fusing¶
Perform layer fusion for conv-relu type operations.
type: bool
default_value: False
searchable_values: Categorical([True, False])
- activation_type¶
Quantization data type of activation.
type: str
default_value: QUInt8
searchable_values: Conditional(parents: (‘quant_format’, ‘weight_type’), support: {(‘QDQ’, ‘QInt8’): Categorical([‘QInt8’]), (‘QDQ’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘QOperator’, ‘QUInt8’): Categorical([‘QUInt8’]), (‘QOperator’, ‘QInt8’): Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>])}, default: Categorical([<SpecialParamValue.INVALID: ‘OLIVE_INVALID_PARAM_VALUE’>]))
- enable_dpu¶
Use QDQ format optimized specifically for DPU.
type: bool
default_value: False
searchable_values: Categorical([True, False])
- ActivationSymmetric¶
Symmetrize calibration data for activations.
type: bool
default_value: False
searchable_values: None
- WeightSymmetric¶
Symmetrize calibration data for weights.
type: bool
default_value: True
searchable_values: None
- AddQDQPairToWeight¶
Keep the weight in floating point and insert both QuantizeLinear/DeQuantizeLinear nodes for it.
type: bool
default_value: False
searchable_values: None
- extra_options¶
Key value pair dictionary for extra_options in quantization. If an option is one of [‘ActivationSymmetric’, ‘WeightSymmetric’, ‘AddQDQPairToWeight’], it will be overwritten by the corresponding config parameter value.
type: dict
default_value: None
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data
type: str
default_value: None
searchable_values: None
- size_threshold¶
Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.
type: int
default_value: 1024
searchable_values: None
- convert_attribute¶
Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data
type: bool
default_value: False
searchable_values: None
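A minimal sketch of a Vitis AI quantization pass entry, assuming a user script that defines the required calibration dataloader:

    # Sketch: Vitis AI quantization (dataloader_func is required).
    vitis_ai_quantization = {
        "type": "VitisAIQuantization",
        "config": {
            "user_script": "user_script.py",         # assumed
            "dataloader_func": "create_dataloader",  # assumed name; required parameter
            "data_dir": "data/calibration",
            "calibrate_method": "MinMSE",
            "quant_format": "QDQ",
            "enable_dpu": True,  # use the QDQ format optimized for DPU
        },
    }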
AppendPrePostProcessingOps¶
Add Pre/Post nodes to the input model.
Input: handler.onnx.ONNXModelHandler
Output: handler.onnx.ONNXModelHandler
- pre¶
List of pre-processing commands to add.
type: List[Dict[str, Any]]
default_value: None
searchable_values: None
- post¶
List of post-processing commands to add.
type: List[Dict[str, Any]]
default_value: None
searchable_values: None
- tool_command¶
Composited tool commands to invoke.
type: str
default_value: None
searchable_values: None
- tool_command_args¶
Arguments to pass to the tool command or to PrePostProcessor. If it is used for PrePostProcessor, the schema looks like: {“name”: “image”, “data_type”: “uint8”, “shape”: [“num_bytes”]}
type: Dict[str, Any] | List[olive.passes.onnx.append_pre_post_processing_ops.PrePostProcessorInput]
default_value: None
searchable_values: None
- target_opset¶
The version of the default (ai.onnx) opset to target.
type: int
default_value: 16
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data
type: str
default_value: None
searchable_values: None
- size_threshold¶
Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.
type: int
default_value: 1024
searchable_values: None
- convert_attribute¶
Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data
type: bool
default_value: False
searchable_values: None
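A hedged sketch of a pass entry driven by a composited tool command; the command name and its arguments are illustrative assumptions rather than a documented recipe:

    # Sketch: append pre/post processing via a composited tool command.
    pre_post_processing = {
        "type": "AppendPrePostProcessingOps",
        "config": {
            "tool_command": "superresolution",              # assumed command name
            "tool_command_args": {"output_format": "png"},  # assumed arguments
            "target_opset": 16,
        },
    }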
InsertBeamSearch¶
Insert Beam Search Op. Only used for whisper models. Uses WhisperBeamSearch contrib op if ORT version >= 1.17.1, else uses BeamSearch contrib op.
Input: handler.base.OliveModelHandler
Output: handler.onnx.ONNXModelHandler
- no_repeat_ngram_size¶
If set to int > 0, all ngrams of that size can only occur once.
type: int
default_value: 0
searchable_values: None
- use_vocab_mask¶
Use vocab_mask as an extra graph input to the beam search op. Only supported in ORT >= 1.16.0
type: bool
default_value: False
searchable_values: None
- use_prefix_vocab_mask¶
Use prefix_vocab_mask as an extra graph input to the beam search op. Only supported in ORT >= 1.16.0
type: bool
default_value: False
searchable_values: None
- use_forced_decoder_ids¶
Use decoder_input_ids as an extra graph input to the beam search op. Only supported in ORT >= 1.16.0
type: bool
default_value: False
searchable_values: None
- use_logits_processor¶
Use logits_processor as an extra graph input to the beam search op. Only supported in ORT >= 1.16.0
type: bool
default_value: False
searchable_values: None
- use_temperature¶
Use temperature as an extra graph input to the beam search op. Only supported in ORT >= 1.17.1
type: bool
default_value: False
searchable_values: None
- fp16¶
Whether the model is in fp16 precision.
type: bool
default_value: False
searchable_values: None
- use_gpu¶
Use GPU for beam search op.
type: bool
default_value: False
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data
type: str
default_value: None
searchable_values: None
- size_threshold¶
Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.
type: int
default_value: 1024
searchable_values: None
- convert_attribute¶
Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data
type: bool
default_value: False
searchable_values: None
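A minimal sketch of this pass for a whisper-style model; the option values are illustrative and assume an ORT version new enough for the extra graph inputs used:

    # Sketch: insert a beam search op with forced decoder ids enabled.
    insert_beam_search = {
        "type": "InsertBeamSearch",
        "config": {
            "no_repeat_ngram_size": 3,       # forbid repeated 3-grams
            "use_forced_decoder_ids": True,  # requires ORT >= 1.16.0
            "fp16": False,
        },
    }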
ExtractAdapters¶
Extract adapter weights from the model and save them as an external weights file. If make_inputs is False, the model proto is invalid after this pass since the adapter weights point to external files that do not yet exist; the inference session must be created by first loading the adapter weights using SessionOptions.add_external_initializers. If make_inputs is True, the adapter weights are inputs to the model and must be provided during inference.
Input: handler.onnx.ONNXModelHandler
Output: handler.onnx.ONNXModelHandler
- make_inputs¶
Convert adapter weights to inputs. If false, the adapter weights will be set as initializers with external data.
type: bool
default_value: False
searchable_values: None
- pack_inputs¶
Pack adapter weights for the same module type into a single input tensor. Only used if make_inputs is True.
type: bool
default_value: True
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data
type: str
default_value: None
searchable_values: None
- size_threshold¶
Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.
type: int
default_value: 1024
searchable_values: None
- convert_attribute¶
Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data
type: bool
default_value: False
searchable_values: None
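A short sketch of the two modes described above:

    # Sketch: adapter weights become model inputs, provided at inference time.
    extract_adapters_as_inputs = {
        "type": "ExtractAdapters",
        "config": {"make_inputs": True, "pack_inputs": True},
    }

    # Sketch: adapter weights stay as initializers backed by an external file;
    # sessions must load them via SessionOptions.add_external_initializers.
    extract_adapters_as_initializers = {
        "type": "ExtractAdapters",
        "config": {"make_inputs": False},
    }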
LoRA¶
Run LoRA fine-tuning on a Hugging Face PyTorch model. This pass only supports PyTorchModelHandler with hf_config.
Input: handler.pytorch.PyTorchModelHandler
Output: handler.pytorch.PyTorchModelHandler
- target_modules¶
Target modules.
type: List[str]
default_value: None
searchable_values: None
- use_ort_trainer¶
Whether or not to use ORTTrainer.
type: bool
default_value: False
searchable_values: None
- ortmodule_onnx_opset_version¶
The opset version to use for ONNX export when using ORTTrainer. Only used if use_ort_trainer is True. Opset 16+ is required when using bfloat16 and the model has operators such as Where.
type: int
default_value: 16
searchable_values: None
- lora_r¶
Lora attention dimension.
type: int
default_value: 64
searchable_values: None
- lora_alpha¶
The alpha parameter for Lora scaling.
type: float
default_value: 16
searchable_values: None
- lora_dropout¶
The dropout probability for Lora layers.
type: float
default_value: 0.0
searchable_values: None
- bias¶
Bias type for LoRA.
type: str
default_value: none
searchable_values: None
- modules_to_save¶
List of modules apart from LoRA layers to be set as trainable and saved in the final checkpoint.
type: None
default_value: None
searchable_values: None
- torch_dtype¶
Data type to use for training. Should be one of bfloat16, float16 or float32. If float16, fp16 mixed-precision training will be used.
type: str
default_value: bfloat16
searchable_values: None
- allow_tf32¶
Whether or not to allow TF32 on Ampere GPUs. Can be used to speed up training. For more information, see ‘https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices’
type: bool
default_value: True
searchable_values: None
- train_data_config¶
Data config for fine-tuning training. If eval_data_config is not provided and eval_dataset_size is not None, the data will be split into train and eval. Otherwise, the data will be used for training only.
type: olive.data.config.DataConfig | Dict
required: True
- eval_data_config¶
Data config for fine-tuning evaluation. Optional if eval_dataset_size is provided or evaluation is not needed.
type: olive.data.config.DataConfig | Dict
default_value: None
searchable_values: None
- eval_dataset_size¶
Size of the validation dataset. Should be either a positive integer smaller than the number of training samples or a float in the (0, 1) range. If eval_data_config is provided, this parameter will be ignored.
type: float
default_value: None
searchable_values: None
- training_args¶
Training arguments. If None, will use default arguments. See HFTrainingArguments for more details.
type: olive.passes.pytorch.lora.HFTrainingArguments | Dict
default_value: None
searchable_values: None
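A minimal sketch of a LoRA pass entry; the data config shown is only a named placeholder for a full olive.data.config.DataConfig definition:

    # Sketch: LoRA fine-tuning (train_data_config is required).
    lora = {
        "type": "LoRA",
        "config": {
            "lora_r": 64,
            "lora_alpha": 16,
            "torch_dtype": "bfloat16",
            "train_data_config": {"name": "train_data"},  # placeholder data config
            "eval_dataset_size": 0.1,                     # hold out 10% for evaluation
            "training_args": {"max_steps": 1000},
        },
    }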
QLoRA¶
Run QLoRA fine-tuning on a Hugging Face PyTorch model. This pass only supports PyTorchModelHandler with hf_config.
Input: handler.pytorch.PyTorchModelHandler
Output: handler.pytorch.PyTorchModelHandler
- double_quant¶
Whether to use nested quantization where the quantization constants from the first quantization are quantized again.
type: bool
default_value: False
searchable_values: None
- quant_type¶
Quantization data type to use. Should be one of fp4 or nf4.
type: str
default_value: nf4
searchable_values: None
- compute_dtype¶
Computation data type for the quantized modules. If not provided, will use the same dtype as torch_dtype
type: str
default_value: None
searchable_values: None
- use_ort_trainer¶
Whether or not to use ORTTrainer.
type: bool
default_value: False
searchable_values: None
- ortmodule_onnx_opset_version¶
The opset version to use for ONNX export when using ORTTrainer. Only used if use_ort_trainer is True. Opset 16+ is required when using bfloat16 and the model has operators such as Where.
type: int
default_value: 16
searchable_values: None
- lora_r¶
Lora attention dimension.
type: int
default_value: 64
searchable_values: None
- lora_alpha¶
The alpha parameter for Lora scaling.
type: float
default_value: 16
searchable_values: None
- lora_dropout¶
The dropout probability for Lora layers.
type: float
default_value: 0.0
searchable_values: None
- bias¶
Bias type for LoRA.
type: str
default_value: none
searchable_values: None
- modules_to_save¶
List of modules apart from LoRA layers to be set as trainable and saved in the final checkpoint.
type: None
default_value: None
searchable_values: None
- torch_dtype¶
Data type to use for training. Should be one of bfloat16, float16 or float32. If float16, fp16 mixed-precision training will be used.
type: str
default_value: bfloat16
searchable_values: None
- allow_tf32¶
Whether or not to allow TF32 on Ampere GPUs. Can be used to speed up training. For more information, see ‘https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices’
type: bool
default_value: True
searchable_values: None
- train_data_config¶
Data config for fine-tuning training. If eval_data_config is not provided and eval_dataset_size is not None, the data will be split into train and eval. Otherwise, the data will be used for training only.
type: olive.data.config.DataConfig | Dict
required: True
- eval_data_config¶
Data config for fine-tuning evaluation. Optional if eval_dataset_size is provided or evaluation is not needed.
type: olive.data.config.DataConfig | Dict
default_value: None
searchable_values: None
- eval_dataset_size¶
Size of the validation dataset. Should be either a positive integer smaller than the number of training samples or a float in the (0, 1) range. If eval_data_config is provided, this parameter will be ignored.
type: float
default_value: None
searchable_values: None
- training_args¶
Training arguments. If None, will use default arguments. See HFTrainingArguments for more details.
type: olive.passes.pytorch.lora.HFTrainingArguments | Dict
default_value: None
searchable_values: None
LoftQ¶
Run LoftQ fine-tuning on a Hugging Face PyTorch model. This pass only supports PyTorchModelHandler with hf_config.
Input: handler.pytorch.PyTorchModelHandler
Output: handler.pytorch.PyTorchModelHandler
- loftq_iter¶
Number of LoftQ iterations.
type: int
default_value: 1
searchable_values: None
- compute_dtype¶
Computation data type for the quantized modules. If not provided, will use the same dtype as torch_dtype
type: str
default_value: None
searchable_values: None
- use_ort_trainer¶
Whether or not to use ORTTrainer.
type: bool
default_value: False
searchable_values: None
- ortmodule_onnx_opset_version¶
The opset version to use for ONNX export when using ORTTrainer. Only used if use_ort_trainer is True. Opset 16+ is required when using bfloat16 and the model has operators such as Where.
type: int
default_value: 16
searchable_values: None
- lora_r¶
Lora attention dimension.
type: int
default_value: 64
searchable_values: None
- lora_alpha¶
The alpha parameter for Lora scaling.
type: float
default_value: 16
searchable_values: None
- lora_dropout¶
The dropout probability for Lora layers.
type: float
default_value: 0.0
searchable_values: None
- bias¶
Bias type for LoRA.
type: str
default_value: none
searchable_values: None
- modules_to_save¶
List of modules apart from LoRA layers to be set as trainable and saved in the final checkpoint.
type: None
default_value: None
searchable_values: None
- torch_dtype¶
Data type to use for training. Should be one of bfloat16, float16 or float32. If float16, fp16 mixed-precision training will be used.
type: str
default_value: bfloat16
searchable_values: None
- allow_tf32¶
Whether or not to allow TF32 on Ampere GPUs. Can be used to speed up training. For more information, see ‘https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices’
type: bool
default_value: True
searchable_values: None
- train_data_config¶
Data config for fine-tuning training. If eval_data_config is not provided and eval_dataset_size is not None, the data will be split into train and eval. Otherwise, the data will be used for training only.
type: olive.data.config.DataConfig | Dict
required: True
- eval_data_config¶
Data config for fine-tuning evaluation. Optional if eval_dataset_size is provided or evaluation is not needed.
type: olive.data.config.DataConfig | Dict
default_value: None
searchable_values: None
- eval_dataset_size¶
Size of the validation dataset. Should be either a positive integer smaller than the number of training samples or a float in the (0, 1) range. If eval_data_config is provided, this parameter will be ignored.
type: float
default_value: None
searchable_values: None
- training_args¶
Training arguments. If None, will use default arguments. See HFTrainingArguments for more details.
type: olive.passes.pytorch.lora.HFTrainingArguments | Dict
default_value: None
searchable_values: None
LoRA/QLoRA/LoftQ HFTrainingArguments¶
- pydantic settings olive.passes.pytorch.lora.HFTrainingArguments¶
Training arguments for transformers.Trainer.
Has the same fields as transformers.TrainingArguments with recommended default values for QLoRA fine-tuning.
- field seed: int = 42¶
Random seed for initialization.
- field data_seed: int = 42¶
Random seed to be used with data samplers.
- field optim: str = 'paged_adamw_32bit'¶
The optimizer to use.
- field per_device_train_batch_size: int = 1¶
The batch size per GPU for training.
- field per_device_eval_batch_size: int = 1¶
The batch size per GPU for evaluation.
- field gradient_accumulation_steps: int = 16¶
Number of update steps to accumulate the gradients for before performing a backward/update pass.
- field max_steps: int = 10000¶
The total number of training steps to perform.
- field weight_decay: float = 0.0¶
The L2 weight decay rate of AdamW.
- field learning_rate: float = 0.0002¶
The initial learning rate for AdamW.
- field gradient_checkpointing: bool = True¶
Use gradient checkpointing. Recommended.
- field lr_scheduler_type: str = 'constant'¶
Learning rate schedule. Constant is a bit better than cosine and has advantages for analysis.
- field warmup_ratio: float = 0.03¶
Fraction of steps to do a warmup for.
- field logging_steps: int = 10¶
Number of update steps between two logs.
- field evaluation_strategy: str = 'no'¶
The evaluation strategy to use. Will be forced to ‘no’ if there is no eval dataset.
- field eval_steps: float = None¶
Number of update steps between two evaluations if evaluation_strategy=’steps’. Will default to the same value as logging_steps if not set.
- field group_by_length: bool = True¶
Whether or not to group samples of roughly the same length together when batching.
- field report_to: str | List[str] = 'none'¶
The list of integrations to report the results and logs to.
- field output_dir: str = None¶
The output dir for logs and checkpoints. If None, will use a temp dir.
- field overwrite_output_dir: bool = False¶
If True, overwrite the content of output_dir. Otherwise, will continue training if output_dir points to a checkpoint directory.
- field resume_from_checkpoint: str = None¶
The path to a folder with a valid checkpoint for the model. Supersedes any checkpoint found in output_dir.
- field extra_args: Dict[str, Any] = None¶
Extra arguments to pass to the trainer. Values can be provided directly to this field as a dict or as keyword arguments to the config. See transformers.TrainingArguments for more details on the available arguments.
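Because these fields mirror transformers.TrainingArguments, a LoRA/QLoRA/LoftQ pass can override them through its training_args config. A small sketch, where save_total_limit is an example of a transformers field forwarded via extra_args:

    # Sketch: overriding a few HFTrainingArguments defaults.
    training_args = {
        "learning_rate": 1e-4,
        "max_steps": 500,
        "per_device_train_batch_size": 2,
        "extra_args": {"save_total_limit": 2},  # forwarded to transformers.TrainingArguments
    }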
QuantizationAwareTraining¶
Run quantization aware training on PyTorch model.
Input: handler.pytorch.PyTorchModelHandler
Output: handler.pytorch.PyTorchModelHandler
- script_dir¶
Directory containing user script dependencies.
type: str
default_value: None
searchable_values: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default_value: None
searchable_values: None
- train_data_dir¶
Directory of training data.
type: str
default_value: None
searchable_values: None
- val_data_dir¶
Directory of validation data.
type: str
default_value: None
searchable_values: None
- train_dataloader_func¶
Dataloader function to load training data from given train_data_dir with given train_batch_size.
type: Callable | str
default_value: None
searchable_values: None
- training_loop_func¶
Customized training loop function.
type: Callable | str
default_value: None
searchable_values: None
- ptl_module¶
LightningModule for PyTorch Lightning trainer. It is a way of encapsulating all the logic related to the training, validation, and testing of a PyTorch model. Please refer to https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html for more details.
type: Callable | str
default_value: None
searchable_values: None
- ptl_data_module¶
LightningDataModule for PyTorch Lightning trainer. It is a way of encapsulating all the data-related logic for training, validation, and testing of a PyTorch model. Please refer to https://pytorch-lightning.readthedocs.io/en/stable/data/datamodule.html for more details.
type: Callable | str
default_value: None
searchable_values: None
- train_batch_size¶
Batch size for training.
type: int
default_value: None
searchable_values: None
- num_epochs¶
Maximum number of epochs for training.
type: int
default_value: None
searchable_values: None
- num_steps¶
Maximum number of steps for training.
type: int
default_value: -1
searchable_values: None
- do_validate¶
Whether to perform one evaluation epoch over the validation set after training.
type: bool
default_value: False
searchable_values: None
- modules_to_fuse¶
List of lists of module names to fuse.
type: List[List[str]]
default_value: None
searchable_values: None
- qconfig_func¶
Customized function to create a QConfig for QAT. Please refer to https://pytorch.org/docs/stable/generated/torch.ao.quantization.qconfig.QConfig.html for details.
type: Callable | str
default_value: None
searchable_values: None
- logger¶
Logger for training.
type: pytorch_lightning.loggers.logger.Logger | Iterable[pytorch_lightning.loggers.logger.Logger] | Callable | bool
default_value: False
searchable_values: None
- gpus¶
Number of GPUs to use.
type: int
default_value: None
searchable_values: None
- seed¶
Random seed for training.
type: int
default_value: None
searchable_values: None
- checkpoint_path¶
Path to save checkpoints.
type: str
default_value: None
searchable_values: None
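A minimal sketch of a QAT pass entry, assuming a user script that defines the training dataloader:

    # Sketch: quantization aware training on a PyTorch model.
    qat = {
        "type": "QuantizationAwareTraining",
        "config": {
            "user_script": "user_script.py",                     # assumed
            "train_dataloader_func": "create_train_dataloader",  # assumed name
            "train_data_dir": "data/train",
            "train_batch_size": 32,
            "num_epochs": 3,
        },
    }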
OpenVINOConversion¶
Converts a PyTorch, ONNX or TensorFlow model to an OpenVINO model.
Input: handler.pytorch.PyTorchModelHandler | handler.onnx.ONNXModelHandler
Output: handler.openvino.OpenVINOModelHandler
- script_dir¶
Directory containing user script dependencies.
type: str
default_value: None
searchable_values: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default_value: None
searchable_values: None
- input¶
Set or override shapes for model inputs. It configures dynamic and static dimensions in model inputs depending on your inference requirements.
type: Callable | str | List
default_value: None
searchable_values: None
- example_input_func¶
Function/function name to generate sample of model input in original framework. For PyTorch it can be torch.Tensor. For Tensorflow it can be tf.Tensor or numpy.ndarray.
type: Callable | str
default_value: None
searchable_values: None
- compress_to_fp16¶
Compress weights in output OpenVINO model to FP16. Default is True.
type: bool
default_value: True
searchable_values: None
- extra_configs¶
Extra configurations for OpenVINO model conversion. extra_configs can be set by passing a dictionary where the key is the parameter name and the value is the parameter value. Please check the Conversion Parameters documentation for more details: https://docs.openvino.ai/2023.3/openvino_docs_OV_Converter_UG_Conversion_Options.html
type: Dict
default_value: None
searchable_values: None
- output_model¶
Name of the output OpenVINO model.
type: str
default_value: ov_model
searchable_values: None
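A minimal sketch of this pass for a vision model; the input shape is an illustrative assumption:

    # Sketch: convert to OpenVINO with a static input shape.
    openvino_conversion = {
        "type": "OpenVINOConversion",
        "config": {
            "input": [[1, 3, 224, 224]],  # assumed static shape for a single input
            "compress_to_fp16": True,
            "output_model": "ov_model",
        },
    }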
OpenVINOQuantization¶
Input: handler.openvino.OpenVINOModelHandler
Output: handler.openvino.OpenVINOModelHandler
- script_dir¶
Directory containing user script dependencies.
type: str
default_value: None
searchable_values: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default_value: None
searchable_values: None
- dataloader_func¶
Function/function name to generate dataloader for calibration, required if data_config is None.
type: Callable | str
default_value: None
searchable_values: None
- dataloader_func_kwargs¶
Keyword arguments for dataloader_func.
type: Dict[str, Any]
default_value: None
searchable_values: None
- data_dir¶
Path to the directory containing the dataset. For local data, it is required if dataloader_func is provided.
type: str | pathlib.Path | olive.resource_path.ResourcePath | olive.resource_path.ResourcePathConfig | None
default_value: None
searchable_values: None
- batch_size¶
Batch size for calibration, only used if dataloader_func is provided.
type: int
default_value: 1
searchable_values: None
- data_config¶
Data config for calibration, required if dataloader_func is None.
type: olive.data.config.DataConfig | Dict
default_value: None
searchable_values: None
- model_type¶
Used to specify quantization scheme required for specific type of the model. ‘TRANSFORMER’ is the only supported special quantization scheme to preserve accuracy after quantization of Transformer models (BERT, DistilBERT, etc.). None is default.
type: olive.passes.openvino.quantization.ModelTypeEnum
default_value: None
searchable_values: None
- preset¶
Defines quantization scheme for the model. Supported values: ‘PERFORMANCE’, ‘MIXED’.
type: olive.passes.openvino.quantization.PresetEnum
default_value: PERFORMANCE
searchable_values: None
- ignored_scope¶
This parameter can be used to exclude some layers from the quantization process to preserve the model accuracy. Please refer to https://docs.openvino.ai/2023.3/basic_quantization_flow.html#tune-quantization-parameters.
type: str | List[str]
default_value: None
searchable_values: None
- ignored_scope_type¶
Defines the type of the ignored scope. Supported values: ‘names’, ‘types’, ‘patterns’.
type: olive.passes.openvino.quantization.IgnoreScopeTypeEnum
default_value: None
searchable_values: None
- target_device¶
Target device for the model. Supported values: ‘any’, ‘cpu’, ‘gpu’, ‘cpu_spr’, ‘vpu’. Default value is the same as the accelerator type of this workflow run.
type: olive.hardware.accelerator.Device
default_value: cpu
searchable_values: None
- extra_configs¶
Extra configurations for OpenVINO model quantization. Please refer to https://docs.openvino.ai/2023.3/basic_quantization_flow.html#tune-quantization-parameters.
type: List[Dict]
default_value: None
searchable_values: None
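A minimal sketch, assuming a user script that defines the calibration dataloader:

    # Sketch: OpenVINO post-training quantization.
    openvino_quantization = {
        "type": "OpenVINOQuantization",
        "config": {
            "user_script": "user_script.py",         # assumed
            "dataloader_func": "create_dataloader",  # assumed; required since data_config is None
            "data_dir": "data/calibration",
            "preset": "PERFORMANCE",
        },
    }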
SNPEConversion¶
Convert ONNX or TensorFlow model to SNPE DLC. Uses snpe-tensorflow-to-dlc or snpe-onnx-to-dlc tools from the SNPE SDK.
Input: handler.onnx.ONNXModelHandler | handler.tensorflow.TensorFlowModelHandler
Output: handler.snpe.SNPEModelHandler
- input_names¶
List of input names.
type: List[str]
required: True
- input_shapes¶
List of input shapes. Must be the same length as input_names.
type: List[List[int]]
required: True
- output_names¶
List of output names.
type: List[str]
required: True
- input_types¶
List of input types. If not None, it must be a list of the same length as input_names. List members can be None to use default value. Refer to olive.platform_sdk.qualcomm.constants.InputType for valid values.
type: List[str | None]
default_value: None
searchable_values: None
- input_layouts¶
List of input layouts. If not None, it must be a list of the same length as input_names. List members can be None to use inferred value. Refer to olive.platform_sdk.qualcomm.constants.InputLayout for valid values.
type: List[str | None]
default_value: None
searchable_values: None
- extra_args¶
Extra arguments to pass to the snpe conversion tool. Refer to snpe-onnx-to-dlc and snpe-tensorflow-to-dlc at https://developer.qualcomm.com/sites/default/files/docs/snpe/tools.html for additional arguments. The value is a string that will be passed as-is to the tool, e.g.: --enable_cpu_fallback --priority_hint low
type: str
default_value: None
searchable_values: None
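A minimal sketch with the three required parameters; the names and shapes are illustrative:

    # Sketch: ONNX/TensorFlow to SNPE DLC conversion.
    snpe_conversion = {
        "type": "SNPEConversion",
        "config": {
            "input_names": ["input"],            # assumed input name
            "input_shapes": [[1, 3, 224, 224]],  # must match input_names in length
            "output_names": ["output"],          # assumed output name
        },
    }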
SNPEQuantization¶
Quantize SNPE model. Uses snpe-dlc-quantize tool from the SNPE SDK.
Input: handler.snpe.SNPEModelHandler
Output: handler.snpe.SNPEModelHandler
- script_dir¶
Directory containing user script dependencies.
type: str
default_value: None
searchable_values: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default_value: None
searchable_values: None
- data_dir¶
Path to the data directory. Required if data_config is None.
type: str | pathlib.Path | olive.resource_path.ResourcePath | olive.resource_path.ResourcePathConfig | None
default_value: None
searchable_values: None
- dataloader_func¶
Function or function name to create the dataloader for quantization. The function should take the data directory as an argument and return a FileListDataLoader or torch.utils.data.DataLoader-like object. Required if data_config is None.
type: Callable | str
default_value: None
searchable_values: None
- dataloader_func_kwargs¶
Keyword arguments for dataloader_func.
type: Dict[str, Any]
default_value: None
searchable_values: None
- data_config¶
Data config for quantization, required if dataloader_func is None
type: olive.data.config.DataConfig | Dict
default_value: None
searchable_values: None
- use_enhanced_quantizer¶
Use the enhanced quantizer feature when quantizing the model. Uses an algorithm to determine optimal range instead of min and max range of data. It can be useful for quantizing models that have long tails in the distribution of the data being quantized.
type: bool
default_value: False
searchable_values: Categorical([True, False])
- enable_htp¶
Pack HTP information in the quantized DLC. This feature is not available on Windows.
type: bool
default_value: False
searchable_values: Categorical([True, False])
- htp_socs¶
List of SoCs to generate HTP Offline cache for.
type: List[str]
default_value: None
searchable_values: None
- extra_args¶
Extra arguments to pass to the snpe quantization tool. Refer to https://developer.qualcomm.com/sites/default/files/docs/snpe/tools.html#tools_snpe-dlc-quantize for additional arguments. The value is a string that will be passed as-is to the tool, e.g.: --bias_bitwidth 16 --overwrite_cache_records
type: str
default_value: None
searchable_values: None
SNPEtoONNXConversion¶
Convert a SNPE DLC to ONNX to use with SNPE Execution Provider. Creates a ONNX graph with the SNPE DLC as a node.
Input: handler.snpe.SNPEModelHandler
Output: handler.onnx.ONNXModelHandler
- target_device¶
Target device for the ONNX model. Refer to olive.platform_sdk.qualcomm.constants.SNPEDevice for valid values.
type: str
default_value: cpu
searchable_values: None
- target_opset¶
Target ONNX opset version.
type: int
default_value: 12
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data
type: str
default_value: None
searchable_values: None
- size_threshold¶
Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.
type: int
default_value: 1024
searchable_values: None
- convert_attribute¶
Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data
type: bool
default_value: False
searchable_values: None
QNNConversion¶
Convert ONNX, TensorFlow, or PyTorch model to QNN C++ model. Quantize the model if –input_list is provided as extra_args. Uses qnn-[framework]-converter tool from the QNN SDK.
Input: handler.tensorflow.TensorFlowModelHandler | handler.pytorch.PyTorchModelHandler | handler.onnx.ONNXModelHandler
Output: handler.qnn.QNNModelHandler
- input_dim¶
The names and dimensions of the network input layers, specified in the format [input_name comma-separated-dimensions], for example: [“data 1,224,224,3”]. Note that the quotes should always be included in order to handle special characters, spaces, etc. For multiple inputs, specify multiple --input_dim on the command line, like: [“data 1,224,224,3”, “data2 1,224,224,3”]. If --input_dim is not specified, the input dimensions will be inferred from the model; if it is specified, the given input dimensions will be used as-is.
type: List[str]
default_value: None
searchable_values: None
- out_node¶
The names of the output nodes. If not specified, the output nodes will be inferred from the model; if specified, they will be used as-is. Example: [“out_1”, “out_2”]
type: List[str]
default_value: None
searchable_values: None
- extra_args¶
Extra arguments to pass to qnn-[framework]-converter tool, e.g. --show_unconsumed_nodes --custom_io CUSTOM_IO. See the documentation for more details: https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/tools.html
type: str
default_value: None
searchable_values: None
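A minimal sketch; per the description above, passing --input_list through extra_args also triggers quantization. The file path is an assumption:

    # Sketch: QNN conversion with quantization enabled via an input list.
    qnn_conversion = {
        "type": "QNNConversion",
        "config": {
            "input_dim": ["data 1,224,224,3"],  # [input_name comma-separated-dimensions]
            "out_node": ["out_1"],
            "extra_args": "--input_list data/input_list.txt",  # assumed path
        },
    }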
QNNModelLibGenerator¶
Compile QNN C++ model source code into QNN model library for a specific target. Uses qnn-model-lib-generator tool from the QNN SDK.
Input: handler.qnn.QNNModelHandler
Output: handler.qnn.QNNModelHandler
- lib_targets¶
Specifies the targets to build the models for. Default: aarch64-android x86_64-linux-clang
type: str
default_value: None
searchable_values: None
- lib_name¶
Specifies the name to use for libraries. Default: uses name in <model.bin> if provided, else generic qnn_model.so
type: str
default_value: None
searchable_values: None
QNNContextBinaryGenerator¶
Create QNN context binary from a QNN model library using a particular backend. Uses qnn-context-binary-generator tool from the QNN SDK.
Input: handler.qnn.QNNModelHandler
Output: handler.qnn.QNNModelHandler
- backend¶
Path to a QNN backend .so library to create the context binary.
type: str
required: True
- binary_file¶
Name of the binary file to save the context binary to. Saved in the same path as the --output_dir option with .bin as the binary file extension. If not provided, no backend binary is created.
type: str
default_value: None
searchable_values: None
- extra_args¶
Extra arguments to pass to qnn-context-binary-generator.
type: str
default_value: None
searchable_values: None
SparseGPT¶
Run SparseGPT on a Hugging Face PyTorch model. See https://arxiv.org/abs/2301.00774 for more details on the algorithm. This pass only supports PyTorchModelHandler with hf_config. The transformers model type must be one of [bloom, gpt2, gpt_neox, llama, opt].
Input: handler.pytorch.PyTorchModelHandler
Output: handler.pytorch.PyTorchModelHandler
- sparsity¶
Target sparsity. This can be a float or a list of two integers. Float is the target sparsity per layer. List [n,m] applies semi-structured (n:m) sparsity patterns. Refer to https://developer.nvidia.com/blog/accelerating-inference-with-sparsity-using-ampere-and-tensorrt/ for more details on 2:4 sparsity pattern.
type: float | List[int]
default_value: None
searchable_values: None
- blocksize¶
Blocksize to use for adaptive mask selection.
type: int
default_value: 128
searchable_values: None
- percdamp¶
Percentage of the average Hessian diagonal to use for dampening. Must be in [0,1].
type: float
default_value: 0.01
searchable_values: None
- min_layer¶
Prune all layers with id >= min_layer.
type: int
default_value: None
searchable_values: None
- max_layer¶
Prune all layers with id < max_layer.
type: int
default_value: None
searchable_values: None
- layer_name_filter¶
Only prune layers whose name contains the given string(s).
type: str | List[str]
default_value: None
searchable_values: None
- device¶
Device to use for performing computations. Can be ‘auto’, ‘cpu’, ‘cuda’, ‘cuda:0’, etc. If ‘auto’, cuda will be used if available. Does not affect the final model.
type: str
default_value: auto
searchable_values: None
- data_config¶
Data config to use for pruning weights. All samples in the data are expected to be of the same length, most likely the max sequence length of the model.
type: olive.data.config.DataConfig | Dict
required: True
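A minimal sketch applying the 2:4 semi-structured pattern mentioned above; the data config is a named placeholder:

    # Sketch: SparseGPT pruning with 2:4 semi-structured sparsity.
    sparse_gpt = {
        "type": "SparseGPT",
        "config": {
            "sparsity": [2, 4],  # [n, m] pattern; a float would mean per-layer sparsity
            "device": "auto",
            "data_config": {"name": "calib_data"},  # placeholder for a full DataConfig
        },
    }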
SliceGPT¶
Run SliceGPT on a Hugging Face PyTorch model. See https://arxiv.org/pdf/2401.15024.pdf for more details on the algorithm. This pass only supports PyTorchModelHandler with hf_config.
Input: handler.pytorch.PyTorchModelHandler
Output: handler.pytorch.PyTorchModelHandler
- calibration_data_config¶
Data config for Dataset to calibrate and calculate perplexity on.
type: olive.data.config.DataConfig | Dict
required: True
- calibration_nsamples¶
Number of samples of the calibration data to load.
type: int
default_value: 128
searchable_values: None
- calibration_batch_size¶
Batch size for loading the calibration data.
type: int
default_value: 16
searchable_values: None
- calibration_max_seqlen¶
Maximum sequence length for the calibration data.
type: int
default_value: 2048
searchable_values: None
- varied_seqlen¶
Varied sequence lengths in the calibration data.
type: bool
default_value: False
searchable_values: None
- seed¶
Seed for sampling the calibration data.
type: int
default_value: 42
searchable_values: None
- sparsity¶
A measure of how much slicing is applied (in the range [0, 1)).
type: float
default_value: 0.0
searchable_values: None
- round_interval¶
Interval for rounding the weights (the best value may depend on your hardware).
type: int
default_value: 8
searchable_values: None
- final_orientation¶
Final orientation of the sliced weights. Choices are random or pca.
type: str
default_value: random
searchable_values: None
GptqQuantizer¶
GPTQ quantization using Hugging Face Optimum, exporting the model with onnxruntime optimized kernels.
Input: handler.pytorch.PyTorchModelHandler
Output: handler.pytorch.PyTorchModelHandler
- script_dir¶
Directory containing user script dependencies.
type: str
default_value: None
searchable_values: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default_value: None
searchable_values: None
- nsamples¶
Number of samples in the calibration dataset used for quantization. Default value is 128.
type: int
default_value: 128
searchable_values: None
- bits¶
Quantization bits. Default value is 4.
type: int
default_value: 4
searchable_values: None
- layers_block_name¶
Block name to quantize. Default value is model.layers. For models whose parameters cannot be auto-filled, you can refer to this link to fill in these parameters: https://github.com/AutoGPTQ/AutoGPTQ/blob/896d8204bc89a7cfbda42bf3314e13cf4ce20b02/auto_gptq/modeling/llama.py#L19-L26
type: str
default_value: model.layers
searchable_values: None
- outside_layer_modules¶
Names of other nn modules at the same level as the transformer layer block. Default value is None.
type: List[str]
default_value: None
searchable_values: None
- inside_layer_modules¶
Names of linear layers in transformer layer module. Default value is None.
type: List[List[str]]
default_value: None
searchable_values: None
- group_size¶
Block size for quantization. Default value is 128.
type: int
default_value: 128
searchable_values: None
- batch_size¶
Batch size for quantization. Default value is 1.
type: int
default_value: 1
searchable_values: None
- seed¶
Random seed for sampling calibration dataset. Default value is 0.
type: int
default_value: 0
searchable_values: None
- damp_percent¶
Damping factor for quantization. Default value is 0.01.
type: float
default_value: 0.01
searchable_values: None
- static_groups¶
Use static groups for quantization. Default value is False.
type: bool
default_value: False
searchable_values: None
- true_sequential¶
Use true sequential for quantization. Default value is False.
type: bool
default_value: False
searchable_values: None
- desc_act¶
Quantize columns in order of decreasing activation size (desc_act). Default value is False.
type: bool
default_value: False
searchable_values: None
- sym¶
Symmetric quantization. Default value is False.
type: bool
default_value: False
searchable_values: None
- data_config¶
Data config for quantization. Default value is None.
type: olive.data.config.DataConfig | Dict
default_value: None
searchable_values: None
- dataloader_func¶
Function/function name to generate the dataset for quantization. The returned dataset is a list of tokenized data (e.g. [{‘input_ids’: [1, 100, 15, …], ‘attention_mask’: [1, 1, 1, …]}, …]). Default is None.
type: Callable | str
default_value: None
searchable_values: None
- dataloader_func_kwargs¶
Keyword arguments for dataloader_func. Default value is None.
type: Dict[str, Any]
default_value: None
searchable_values: None
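A minimal sketch with common GPTQ settings; the data config is a named placeholder:

    # Sketch: 4-bit GPTQ quantization.
    gptq_quantizer = {
        "type": "GptqQuantizer",
        "config": {
            "bits": 4,
            "group_size": 128,
            "sym": False,
            "data_config": {"name": "calib_data"},  # placeholder for a full DataConfig
        },
    }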
TorchTRTConversion¶
Convert torch.nn.Linear modules in the transformer layers of a HuggingFace PyTorch model to TensorRT modules. The conversion would include fp16 precision and sparse weights, if applicable. The entire model is saved using torch.save and can be loaded using torch.load. Loading the model requires torch-tensorrt and Olive to be installed. This pass only supports PyTorchModelHandler with hf_config. The transformers model type must be one of [bloom, gpt2, gpt_neox, llama, opt].
Input: handler.pytorch.PyTorchModelHandler
Output: handler.pytorch.PyTorchModelHandler
- min_layer¶
Convert all layers with id >= min_layer.
type: int
default_value: None
searchable_values: None
- max_layer¶
Convert all layers with id < max_layer.
type: int
default_value: None
searchable_values: None
- layer_name_filter¶
Only convert layers whose name contains the given string(s).
type: str | List[str]
default_value: None
searchable_values: None
- float16¶
Convert entire model to fp16. If False, only the sparse modules are converted to fp16.
type: bool
default_value: False
searchable_values: None
- data_config¶
Data config to use for compiling module to TensorRT. The batch size of the compiled module is set to the batch size of the first batch of the dataloader.
type: olive.data.config.DataConfig | Dict
required: True
OptimumConversion¶
Convert a Hugging Face PyTorch model to ONNX model using the Optimum export function.
Input: handler.pytorch.PyTorchModelHandler
Output: handler.onnx.ONNXModelHandler | handler.composite.CompositeModelHandler
- script_dir¶
Directory containing user script dependencies.
type: str
default_value: None
searchable_values: None
- user_script¶
Path to user script. The values for other parameters which were assigned function or object names will be imported from this script.
type: str
default_value: None
searchable_values: None
- target_opset¶
The version of the default (ai.onnx) opset to target.
type: int
default_value: 14
searchable_values: None
- components¶
List of component models to export. E.g. [‘decoder_model’, ‘decoder_with_past_model’]. None means export all components.
type: List[str]
default_value: None
searchable_values: None
- fp16¶
Whether to use fp16 precision to load the torch model and then convert it to ONNX.
type: bool
default_value: False
searchable_values: None
- device¶
The device to use to do the export. Defaults to ‘cpu’.
type: str
default_value: cpu
searchable_values: None
- extra_args¶
Extra arguments to pass to the optimum.exporters.onnx.main_export function.
type: dict
default_value: None
searchable_values: None
OptimumMerging¶
Merges a decoder_model with its decoder_with_past_model via the Optimum library.
Input: handler.composite.CompositeModelHandler
Output: handler.onnx.ONNXModelHandler | handler.composite.CompositeModelHandler
- strict¶
When set, the decoder and decoder_with_past are expected to have strictly the same number of outputs. When False, the decoder is allowed to have more outputs than decoder_with_past, in which case constant outputs are added to match the number of outputs.
type: bool
default_value: True
searchable_values: None
- save_as_external_data¶
Serializes tensor data to separate files instead of directly in the ONNX file. Large models (>2GB) may be forced to save external data regardless of the value of this parameter.
type: bool
default_value: False
searchable_values: None
- all_tensors_to_one_file¶
Effective only if save_as_external_data is True. If true, save all tensors to one external file specified by ‘external_data_name’. If false, save each tensor to a file named with the tensor name.
type: bool
default_value: True
searchable_values: None
- external_data_name¶
Effective only if all_tensors_to_one_file is True and save_as_external_data is True. If not specified, the external data file will be named with <model_path_name>.data
type: str
default_value: None
searchable_values: None
- size_threshold¶
Effective only if save_as_external_data is True. Threshold for size of data. Only when tensor’s data is >= the size_threshold it will be converted to external data. To convert every tensor with raw data to external data set size_threshold=0.
type: int
default_value: 1024
searchable_values: None
- convert_attribute¶
Effective only if save_as_external_data is True. If true, convert all tensors to external data If false, convert only non-attribute tensors to external data
type: bool
default_value: False
searchable_values: None
ModelBuilder¶
Converts a Hugging Face generative PyTorch model to an ONNX model using the Generative AI builder. See https://github.com/microsoft/onnxruntime-genai
Input: handler.pytorch.PyTorchModelHandler | handler.onnx.ONNXModelHandler
Output: handler.onnx.ONNXModelHandler
- precision¶
Precision of model.
type: olive.passes.onnx.model_builder.ModelBuilder.Precision
required: True
- metadata_only¶
Whether to export the model or generate required metadata only.
type: bool
default_value: False
searchable_values: None
- search¶
Search options to use for generate loop.
type: Dict[str, Any]
default_value: None
searchable_values: None
- int4_block_size¶
Specify the block_size for int4 quantization. Acceptable values: 16/32/64/128/256.
type: int
default_value: None
searchable_values: None
- int4_accuracy_level¶
Specify the minimum accuracy level for activation of MatMul in int4 quantization.
type: olive.passes.onnx.model_builder.ModelBuilder.AccuracyLevel
default_value: None
searchable_values: None
- exclude_embeds¶
Remove embedding layer from your ONNX model.
type: bool
default_value: False
searchable_values: None
- exclude_lm_head¶
Remove language modeling head from your ONNX model.
type: bool
default_value: False
searchable_values: None
- enable_cuda_graph¶
The model can use CUDA graph capture with the CUDA execution provider. If enabled, all nodes must be placed on the CUDA EP, which is the prerequisite for the CUDA graph to be used correctly.
type: bool
default_value: False
searchable_values: None
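A minimal sketch; the precision value is assumed to be a valid member of the Precision enum, and the search options are illustrative:

    # Sketch: export a generative model with the GenAI model builder.
    model_builder = {
        "type": "ModelBuilder",
        "config": {
            "precision": "int4",             # required; assumed enum value
            "int4_block_size": 32,           # one of 16/32/64/128/256
            "search": {"max_length": 2048},  # illustrative generate-loop option
        },
    }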