| Task | Model | Examples | Description |
|------|-------|----------|-------------|
| NLP | deepseek | Link | QDQ: QDQ Model with 4-bit Weights & 16-bit Activations<br>QNN EP: PTQ + AOT Compilation for Qualcomm NPUs using QNN EP<br>Vitis AI EP: PTQ + AOT Compilation for AMD NPUs using Vitis AI EP<br>OpenVINO EP: PTQ + AOT Compilation for Intel NPUs using OpenVINO EP<br>Intel® NPU: PTQ + AWQ with 4-bit weight compression using Intel® Optimum OpenVINO for ONNX OpenVINO IR Encapsulated Model |
| | llama2 | Link | CPU: with ONNX Runtime optimizations for optimized FP32 ONNX model<br>CPU: with ONNX Runtime optimizations for optimized INT8 ONNX model<br>CPU: with ONNX Runtime optimizations for optimized INT4 ONNX model<br>GPU: with ONNX Runtime optimizations for optimized FP16 ONNX model<br>GPU: with ONNX Runtime optimizations for optimized INT4 ONNX model<br>GPU: with QLoRA for model fine-tuning and ONNX Runtime optimizations for optimized ONNX model<br>AzureML compute: with AzureML compute to fine-tune and optimize for your local GPUs |
| | llama3 | Link | QDQ: QDQ Model with 4-bit Weights & 16-bit Activations<br>QNN EP: PTQ + AOT Compilation for Qualcomm NPUs using QNN EP<br>Vitis AI EP: PTQ + AOT Compilation for AMD NPUs using Vitis AI EP<br>OpenVINO EP: PTQ + AOT Compilation for Intel NPUs using OpenVINO EP<br>Intel® NPU: PTQ + AWQ with 4-bit weight compression using Intel® Optimum OpenVINO for ONNX OpenVINO IR Encapsulated Model |
| | mistral | Link | CPU: with Optimum conversion, ONNX Runtime optimizations, and Intel® Neural Compressor static quantization for optimized INT8 ONNX model<br>GPU: with ONNX Runtime optimizations for optimized FP16 ONNX model |
| | open llama | Link | GPU: with Optimum conversion and merging and ONNX Runtime optimizations for optimized ONNX model<br>GPU: with SparseGPT and TorchTRT conversion for an optimized PyTorch model with sparsity<br>AzureML compute: with Optimum conversion and merging and ONNX Runtime optimizations in AzureML<br>CPU: with Optimum conversion and merging, ONNX Runtime optimizations, and Intel® Neural Compressor 4-bit weight-only quantization for optimized INT4 ONNX model |
| | phi2 | Link | CPU: with ONNX Runtime optimizations for optimized FP32/INT4 ONNX models<br>GPU: with ONNX Runtime optimizations for optimized FP16/INT4 ONNX models, and PyTorch QLoRA for model fine-tuning<br>GPU: with SliceGPT for an optimized PyTorch model with sparsity |
| | phi3.5 | Link | QDQ: QDQ Model with 4-bit Weights & 16-bit Activations<br>QNN EP: PTQ + AOT Compilation for Qualcomm NPUs using QNN EP<br>Vitis AI EP: PTQ + AOT Compilation for AMD NPUs using Vitis AI EP<br>OpenVINO EP: PTQ + AOT Compilation for Intel NPUs using OpenVINO EP<br>Intel® NPU: PTQ + AWQ with 4-bit weight compression using Intel® Optimum OpenVINO for ONNX OpenVINO IR Encapsulated Model |
| | phi4 | Link | Intel® NPU: PTQ + AWQ with 4-bit weight compression using Intel® Optimum OpenVINO for ONNX OpenVINO IR Encapsulated Model |
| | qwen2.5 | Link | QDQ: QDQ Model with 4-bit Weights & 16-bit Activations<br>QNN EP: PTQ + AOT Compilation for Qualcomm NPUs using QNN EP<br>Vitis AI EP: PTQ + AOT Compilation for AMD NPUs using Vitis AI EP<br>OpenVINO EP: PTQ + AOT Compilation for Intel NPUs using OpenVINO EP<br>Intel® NPU: PTQ + AWQ with 4-bit weight compression using Intel® Optimum OpenVINO for ONNX OpenVINO IR Encapsulated Model |
| | falcon | Link | GPU: with ONNX Runtime optimizations for optimized FP16 ONNX model |
| | red pajama | Link | CPU: with Optimum conversion and merging and ONNX Runtime optimizations for a single optimized ONNX model |
| | bert | Link | CPU: with ONNX Runtime optimizations and quantization for optimized INT8 ONNX model<br>CPU: with ONNX Runtime optimizations and Intel® Neural Compressor quantization for optimized INT8 ONNX model<br>CPU: with PyTorch QAT Customized Training Loop and ONNX Runtime optimizations for optimized INT8 ONNX model<br>GPU: with ONNX Runtime optimizations for CUDA EP<br>GPU: with ONNX Runtime optimizations for TRT EP<br>QNN EP: with ONNX Runtime optimizations for QNN EP<br>Vitis AI EP: with ONNX Runtime optimizations for Vitis AI EP<br>OpenVINO EP: with ONNX Runtime optimizations for OpenVINO EP<br>QDQ: with ONNX Runtime optimizations and INT8 quantization encoded in QDQ format<br>Intel® NPU: PTQ using Intel® NNCF for ONNX OpenVINO IR encapsulated model |
| | deberta | Link | GPU: optimize AzureML registry model with ONNX Runtime optimizations and quantization |
| | gptj | Link | CPU: with Intel® Neural Compressor static/dynamic quantization for INT8 ONNX model |
| | bge | Link | NPU: with ONNX Runtime optimizations for QNN EP |
| | audio spectrogram transformer | Link | CPU: with ONNX Runtime optimizations and quantization for optimized INT8 ONNX model |
| Vision | stable diffusion | Link | GPU: with ONNX Runtime optimizations for DirectML EP<br>GPU: with ONNX Runtime optimizations for CUDA EP<br>Intel CPU: with OpenVINO toolkit<br>QDQ: with ONNX Runtime static quantization for ONNX INT8 model with QDQ format |
| | stable diffusion XL | Link | GPU: with ONNX Runtime optimizations for DirectML EP<br>GPU: with ONNX Runtime optimizations for CUDA EP |
| | squeezenet | Link | GPU: with ONNX Runtime optimizations for DirectML EP |
| | mobilenet | Link | QNN EP: with ONNX Runtime static QDQ quantization for ONNX Runtime QNN EP |
| | clip | Link | QNN EP: with ONNX Runtime static QDQ quantization for ONNX Runtime QNN EP<br>Vitis AI EP: with ONNX Runtime static QDQ quantization for ONNX Runtime Vitis AI EP<br>QDQ: with ONNX Runtime static quantization for ONNX INT8 model with QDQ format<br>Intel® NPU: PTQ using Intel® NNCF for ONNX OpenVINO IR encapsulated model |
| | resnet | Link | CPU: with ONNX Runtime static/dynamic quantization for ONNX INT8 model<br>QDQ: with ONNX Runtime static quantization for ONNX INT8 model with QDQ format<br>CPU: with PyTorch QAT Default Training Loop and ONNX Runtime optimizations for ONNX INT8 model<br>CPU: with PyTorch QAT Lightning Module and ONNX Runtime optimizations for ONNX INT8 model<br>AMD DPU: with AMD Vitis AI quantization<br>Intel GPU: with ONNX Runtime optimizations with multiple EPs<br>QNN EP: with ONNX Runtime static QDQ quantization for ONNX Runtime QNN EP<br>Intel® NPU: PTQ using Intel® NNCF for ONNX OpenVINO IR encapsulated model |
| | VGG | Link | Qualcomm NPU: with SNPE toolkit |
| | super resolution | Link | CPU: with ONNX Runtime pre/post-processing integration for a single ONNX model |
| | Vision Transformer | Link | QNN EP: with ONNX Runtime static QDQ quantization for ONNX Runtime QNN EP<br>Vitis AI EP: with ONNX Runtime static QDQ quantization for ONNX Runtime Vitis AI EP<br>QDQ: with ONNX Runtime static quantization for ONNX INT8 model with QDQ format<br>Intel® NPU: PTQ using Intel® NNCF for ONNX OpenVINO IR encapsulated model |
| | Table Transformer Detection | Link | QNN EP: with ONNX Runtime static QDQ quantization for ONNX Runtime QNN EP |
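Many of the entries above revolve around QDQ (quantize-dequantize) quantization. The sketch below illustrates the core idea behind those workflows with plain Python: weights are mapped to low-bit integers via a scale, then mapped back, and the round-trip error is what quantization tooling works to minimize. The symmetric per-tensor int4 scheme here is an assumption chosen for brevity; real toolchains (ONNX Runtime quantization, Intel® NNCF, AWQ) calibrate scales per channel or per group and handle activations separately.

```python
def quantize_dequantize(values, num_bits=4):
    """Fake-quantize a list of floats: map to signed ints, then back.

    Symmetric per-tensor scheme (illustrative only): the scale is set
    so the largest-magnitude value maps to the top of the int range.
    """
    qmax = 2 ** (num_bits - 1) - 1  # e.g. 7 for signed int4
    scale = max(abs(v) for v in values) / qmax or 1.0  # avoid div-by-zero scale
    # Round to the integer grid and clamp to the representable range.
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    # Dequantize: the model's consumers see these approximated floats.
    return [qi * scale for qi in q]

weights = [0.31, -0.12, 0.98, -0.55]
recovered = quantize_dequantize(weights)
print(recovered)  # each value is within one quantization step of the original
```

In a QDQ-format ONNX model, this pair of operations appears explicitly as `QuantizeLinear`/`DequantizeLinear` nodes around weights and activations, which is what lets execution providers fuse them into true low-precision kernels.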