OpenVINO#
OpenVINO is a cross-platform deep learning toolkit developed by Intel. The name stands for “Open Visual Inference and Neural Network Optimization.” OpenVINO focuses on optimizing neural network inference with a write-once, deploy-anywhere approach for Intel hardware platforms.
Read more at: Intel® Distribution of OpenVINO™ Toolkit
The nncf package (Neural Network Compression Framework) is used for model compression and quantization. It is required for workflows involving post-training quantization or other advanced optimization techniques in OpenVINO.
For Generative AI models, install Optimum Intel® by following the Optimum Intel® Installation Instructions.
Prerequisites#
Note: The OpenVINO version used in Olive is 2025.1.0.
Option 1: Install Olive with OpenVINO extras#
pip install olive-ai[openvino]
Option 2: Install OpenVINO Runtime and OpenVINO Development Tools from PyPI#
pip install openvino==2025.1.0
pip install nncf==2.16.0
pip install onnxruntime-openvino
Install Optimum Intel® for Generative AI Workloads#
pip install optimum[openvino]
More detailed instructions are available at Optimum Intel® Installation Instructions
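To confirm the environment is set up correctly, a quick sanity check can be run from Python (a minimal sketch; the devices reported depend on your hardware and drivers):
import openvino as ov
import nncf

# Versions should match the ones pinned above.
print("OpenVINO:", ov.get_version())
print("NNCF:", nncf.__version__)

# List the inference devices OpenVINO can see (e.g. CPU, GPU, NPU).
core = ov.Core()
print("Available devices:", core.available_devices)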
Model Conversion#
The OpenVINOConversion pass converts a model from its original framework to an OpenVINO IR model. PyTorchModelHandler, ONNXModelHandler, and TensorFlowModelHandler are supported for now.
Please refer to OpenVINOConversion for more details about the pass and its config parameters.
Example Conversion Configuration#
{
    "type": "OpenVINOConversion",
    "input_shapes": [[1, 3, 32, 32]]
}
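To illustrate where this pass fits, the sketch below runs a minimal Olive workflow from Python; the input model path and output directory are placeholders, and the exact workflow schema should be checked against the Olive documentation:
from olive.workflows import run as olive_run

config = {
    # Placeholder PyTorch model; ONNX and TensorFlow handlers work similarly.
    "input_model": {"type": "PyTorchModel", "model_path": "model.pt"},
    "passes": {
        "conversion": {
            "type": "OpenVINOConversion",
            "input_shapes": [[1, 3, 32, 32]],
        }
    },
    "output_dir": "outputs",
}

olive_run(config)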
Model IoUpdate#
The OpenVINOIoUpdate pass is required and applies only to OpenVINO IR models. It converts the model held by an OpenVINOModelHandler into a statically shaped model and updates its input and output tensors.
Please refer to OpenVINOIoUpdate for more details about the pass and its config parameters.
The "static"
parameter defaults to true
and does not need to be explicitly overridden.
Example IO Update Configuration#
{
    "type": "OpenVINOIoUpdate",
    "input_shapes": [[1, 3, 32, 32]],
    "static": false
}
Post Training Quantization (PTQ)#
The OpenVINOQuantization and OpenVINOQuantizationWithAccuracy passes run post-training quantization for OpenVINO models as well as ONNX models, and support the uniform integer quantization method.
This method moves weights and activations from floating-point to integer precision (for example, 8-bit) for inference. It reduces model size, memory footprint, and latency, and improves computational efficiency by using integer arithmetic. During quantization, the model undergoes a transformation in which additional operations that carry quantization information are inserted into the graph; the actual transition to integer arithmetic happens at model inference.
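Under the hood, these passes rely on NNCF's post-training quantization API. As a rough illustration of the mechanism (a sketch of plain nncf usage, not of the Olive pass itself; paths and data are placeholders):
import numpy as np
import nncf
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")  # placeholder IR path

# Placeholder calibration data: ~100 random samples shaped like the model input.
data_source = [{"input": np.random.rand(1, 3, 32, 32).astype(np.float32)} for _ in range(100)]

# transform_fn maps one data item to the model's input; adapt to your data layout.
def transform_fn(data_item):
    return data_item["input"]

calibration_dataset = nncf.Dataset(data_source, transform_fn)

# Inserts quantization operations (e.g. FakeQuantize) into the graph;
# integer arithmetic is actually used at inference time.
quantized_model = nncf.quantize(model, calibration_dataset)
ov.save_model(quantized_model, "model_quantized.xml")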
Please refer to OpenVINOQuantization for more details about the OpenVINOQuantization pass and its config parameters.
Please refer to OpenVINOQuantizationWithAccuracy for more details about the OpenVINOQuantizationWithAccuracy pass and its config parameters.
Example PTQ Configuration#
{
    "type": "OpenVINOQuantizationWithAccuracy",
    "data_config": "calib_data_config",
    "validation_func": "validate",
    "max_drop": 0.01,
    "drop_type": "ABSOLUTE"
}
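The "validation_func" entry names a user-supplied function from the workflow's user script. A minimal sketch of what such a function could look like, assuming the NNCF accuracy-control convention of receiving a compiled model and validation data and returning a single metric value:
import numpy as np

def validate(compiled_model, validation_loader):
    # Return accuracy in [0, 1]; it is compared against the original model's
    # accuracy to enforce the configured max_drop / drop_type.
    correct = total = 0
    output = compiled_model.outputs[0]
    for images, labels in validation_loader:
        predictions = compiled_model(images)[output]
        correct += int(np.sum(np.argmax(predictions, axis=1) == labels))
        total += len(labels)
    return correct / total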
Model Encapsulation#
The OpenVINOEncapsulation pass generates an ONNX model that encapsulates an OpenVINO IR model. Only OpenVINOModelHandler is supported for now.
Please refer to OpenVINOEncapsulation for more details about the pass and its config parameters.
Example Encapsulation Configuration#
{
    "type": "OpenVINOEncapsulation",
    "target_device": "npu",
    "ov_version": "2025.1"
}
Optimum CLI Command for Generative AI Workloads#
The OpenVINOOptimumConversion pass runs the optimum-cli export openvino command on an input Hugging Face model to convert it to an OpenVINO model, performing weight compression and quantization if necessary to produce the output OpenVINO model.
Please refer to OpenVINOOptimumConversion and also to optimum-cli export openvino for more details about the pass and its config parameters.
Example Optimum Conversion Configuration#
{
    "type": "OpenVINOOptimumConversion",
    "extra_args": { "device": "npu" },
    "ov_quant_config": {
        "weight_format": "int4",
        "dataset": "wikitext2",
        "awq": true
    }
}
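The output directory produced by this pass is a standard Optimum Intel® OpenVINO model and can be loaded directly. A short sketch, assuming a causal language model and a placeholder output path:
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_dir = "ov_model_output"  # placeholder: directory produced by the pass
model = OVModelForCausalLM.from_pretrained(model_dir)
tokenizer = AutoTokenizer.from_pretrained(model_dir)

inputs = tokenizer("Hello, OpenVINO!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))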