QNN
Qualcomm AI Engine Direct is a Qualcomm Technologies Inc. software architecture for AI/ML use cases on Qualcomm chipsets and AI acceleration cores.
Olive provides tools to convert models from different frameworks such as ONNX, TensorFlow, and PyTorch to QNN model formats and quantize them to 8-bit fixed point for running on NPU cores. Olive uses the development tools available in the Qualcomm AI Engine Direct SDK (QNN SDK).
Prerequisites
Download and unzip QNN SDK
Download the QNN SDK and unzip the file.
Set the environment variable QNN_SDK_ROOT to the path of the unzipped SDK directory.
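For example, in a bash shell on Linux the variable can be set as follows (the path below is a placeholder for wherever the SDK was unzipped):
export QNN_SDK_ROOT=/path/to/qnn-sdk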
Configure Olive QNN
olive configure-qualcomm-sdk --py_version 3.8 --sdk qnn
Note: If olive cannot be found in your path, you can use python -m olive instead.
Model Conversion/Quantization
The QNNConversion pass converts ONNX, TensorFlow, or PyTorch models to a QNN C++ model. Optionally, it can also quantize the model if a calibration dataset is provided through the --input_list option in the extra_args parameter (an example input list is shown after the configurations below).
For inference on the target device, the C++ model must be compiled into a model library for the desired target using the QNNModelLibGenerator pass.
Please refer to QNNConversion for more details about the pass and its config parameters.
Example Configuration
Conversion
{
"type": "QNNConversion"
}
Conversion and Quantization
{
"type": "QNNConversion",
"extra_args": "--input_list <input_list.txt>"
}
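The input list is a plain text file used by the QNN converter tools for calibration. As a rough, hypothetical sketch (the file names below are placeholders; see the QNN SDK documentation for the exact format), each line points to the raw input data for one calibration sample:
input_data/sample_0.raw
input_data/sample_1.raw
input_data/sample_2.raw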
Model Library Generation
The QNNModelLibGenerator pass compiles the QNN C++ model into a model library for the desired target. The model library can be used for inference on the target device.
Please refer to QNNModelLibGenerator for more details about the pass and its config parameters.
Example Configuration
{
"type": "QNNModelLibGenerator",
"lib_targets": "x86_64-linux-clang"
}
Context Binary Generation
A QNN context provides an execution environment for graphs and operations. Context content can be cached into a binary form, which can later be used for faster context/graph loading.
The QNNContextBinaryGenerator pass generates the context binary from a compiled model library using a specific backend.
Please refer to QNNContextBinaryGenerator for more details about the pass and its config parameters.
Example Configuration
{
"type": "QNNContextBinaryGenerator",
"backend": "<QNN_BACKEND.so>"
}
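These passes are typically chained in a single Olive workflow so that conversion, model library generation, and context binary generation run in sequence. The snippet below is a minimal sketch of the passes section of such a workflow configuration; the pass names (conversion, model_lib, context_binary) are arbitrary labels chosen here, and the surrounding workflow fields (input model, systems, engine options) are omitted:
"passes": {
    "conversion": { "type": "QNNConversion" },
    "model_lib": { "type": "QNNModelLibGenerator", "lib_targets": "x86_64-linux-clang" },
    "context_binary": { "type": "QNNContextBinaryGenerator", "backend": "<QNN_BACKEND.so>" }
}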