OpenVINO#
OpenVINO is a cross-platform deep learning toolkit developed by Intel. The name stands for “Open Visual Inference and Neural Network Optimization.” OpenVINO focuses on optimizing neural network inference with a write-once, deploy-anywhere approach for Intel hardware platforms.
Read more at: Intel® Distribution of OpenVINO™ Toolkit
The nncf package (Neural Network Compression Framework) is used for model compression and quantization. It is required for workflows involving post-training quantization or other advanced optimization techniques in OpenVINO.
For Generative AI models, install Optimum Intel® by following the Optimum Intel® Installation Instructions.
Prerequisites#
Note: The OpenVINO version used in Olive is 2025.1.0.
Option 1: Install Olive with OpenVINO extras#
pip install olive-ai[openvino]
Option 2: Install OpenVINO Runtime and OpenVINO Development Tools from PyPI#
pip install openvino==2025.1.0
pip install nncf==2.16.0
pip install onnxruntime-openvino
Install Optimum Intel® for Generative AI Workloads#
pip install optimum[openvino]
More detailed instructions are available at Optimum Intel® Installation Instructions
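To confirm the environment is set up correctly, a quick sanity check can be run from Python (a minimal sketch; the devices reported depend on your hardware and drivers):
import openvino as ov
import nncf

# Versions should match the ones pinned above.
print("OpenVINO:", ov.get_version())
print("NNCF:", nncf.__version__)

# List the inference devices OpenVINO can see (e.g. CPU, GPU, NPU).
core = ov.Core()
print("Available devices:", core.available_devices)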
Model Conversion#
The OpenVINOConversion pass converts a model from its original framework to an OpenVINO IR model. PyTorchModelHandler, ONNXModelHandler, and TensorFlowModelHandler are supported for now.
Please refer to OpenVINOConversion for more details about the pass and its config parameters.
Example Conversion Configuration#
{
    "type": "OpenVINOConversion",
    "input_shapes": [[1, 3, 32, 32]]
}
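To illustrate where this pass fits, the sketch below runs a minimal Olive workflow from Python; the input model path and output directory are placeholders, and the exact workflow schema should be checked against the Olive documentation:
from olive.workflows import run as olive_run

config = {
    # Placeholder PyTorch model; ONNX and TensorFlow handlers work similarly.
    "input_model": {"type": "PyTorchModel", "model_path": "model.pt"},
    "passes": {
        "conversion": {
            "type": "OpenVINOConversion",
            "input_shapes": [[1, 3, 32, 32]],
        }
    },
    "output_dir": "outputs",
}

olive_run(config)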
Model IoUpdate#
The OpenVINOIoUpdate pass is required and applies only to OpenVINO IR models. It converts the model held by an OpenVINOModelHandler into a statically shaped model and updates its input and output tensors.
Please refer to OpenVINOIoUpdate for more details about the pass and its config parameters.
The "static"
parameter defaults to true
and does not need to be explicitly overridden.
Example IO Update Configuration#
{
    "type": "OpenVINOIoUpdate",
    "input_shapes": [[1, 3, 32, 32]],
    "static": false
}
Post Training Quantization (PTQ)#
The OpenVINOQuantization and OpenVINOQuantizationWithAccuracy passes run post-training quantization for OpenVINO models as well as ONNX models, and support the uniform integer quantization method.
This method moves weights and activations from floating-point to integer precision (for example, 8-bit) for inference. It reduces model size, memory footprint, and latency, and improves computational efficiency by using integer arithmetic. During quantization, the model undergoes a transformation in which additional operations that carry quantization information are inserted into the graph; the actual transition to integer arithmetic happens at model inference.
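Under the hood, these passes rely on NNCF's post-training quantization API. As a rough illustration of the mechanism (a sketch of plain nncf usage, not of the Olive pass itself; paths and data are placeholders):
import numpy as np
import nncf
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")  # placeholder IR path

# Placeholder calibration data: ~100 random samples shaped like the model input.
data_source = [{"input": np.random.rand(1, 3, 32, 32).astype(np.float32)} for _ in range(100)]

# transform_fn maps one data item to the model's input; adapt to your data layout.
def transform_fn(data_item):
    return data_item["input"]

calibration_dataset = nncf.Dataset(data_source, transform_fn)

# Inserts quantization operations (e.g. FakeQuantize) into the graph;
# integer arithmetic is actually used at inference time.
quantized_model = nncf.quantize(model, calibration_dataset)
ov.save_model(quantized_model, "model_quantized.xml")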
Please refer to OpenVINOQuantization for more details about the OpenVINOQuantization pass and its config parameters.
Please refer to OpenVINOQuantizationWithAccuracy for more details about the OpenVINOQuantizationWithAccuracy pass and its config parameters.
Example PTQ Configuration#
{
    "type": "OpenVINOQuantizationWithAccuracy",
    "data_config": "calib_data_config",
    "validation_func": "validate",
    "max_drop": 0.01,
    "drop_type": "ABSOLUTE"
}
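The "validation_func" entry names a user-supplied function from the workflow's user script. A minimal sketch of what such a function could look like, assuming the NNCF accuracy-control convention of receiving a compiled model and validation data and returning a single metric value:
import numpy as np

def validate(compiled_model, validation_loader):
    # Return accuracy in [0, 1]; it is compared against the original model's
    # accuracy to enforce the configured max_drop / drop_type.
    correct = total = 0
    output = compiled_model.outputs[0]
    for images, labels in validation_loader:
        predictions = compiled_model(images)[output]
        correct += int(np.sum(np.argmax(predictions, axis=1) == labels))
        total += len(labels)
    return correct / total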
Model Encapsulation#
The OpenVINOEncapsulation pass generates an ONNX model that encapsulates an OpenVINO IR model. Only OpenVINOModelHandler is supported for now.
Please refer to OpenVINOEncapsulation for more details about the pass and its config parameters.
Example Encapsulation Configuration#
{
    "type": "OpenVINOEncapsulation",
    "target_device": "npu",
    "ov_version": "2025.1"
}
Optimum CLI Command for Generative AI Workloads#
The OpenVINOOptimumConversion pass runs the optimum-cli export openvino command on an input Hugging Face model to convert it to an OpenVINO model, performing weight compression and quantization if necessary to produce the output OpenVINO model.
Please refer to OpenVINOOptimumConversion and also to optimum-cli export openvino for more details about the pass and its config parameters.
Example Optimum Conversion Configuration#
{
    "type": "OpenVINOOptimumConversion",
    "extra_args": { "device": "npu" },
    "ov_quant_config": {
        "weight_format": "int4",
        "dataset": "wikitext2",
        "awq": true
    }
}
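The output directory produced by this pass is a standard Optimum Intel® OpenVINO model and can be loaded directly. A short sketch, assuming a causal language model and a placeholder output path:
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_dir = "ov_model_output"  # placeholder: directory produced by the pass
model = OVModelForCausalLM.from_pretrained(model_dir)
tokenizer = AutoTokenizer.from_pretrained(model_dir)

inputs = tokenizer("Hello, OpenVINO!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))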