| Task | Model | Link | Description |
|------|-------|------|-------------|
| NLP | deepseek | Link | QDQ: QDQ model with 4-bit weights & 16-bit activations<br>Qualcomm NPU: PTQ + AOT compilation for Qualcomm NPUs using the QNN EP |
| | llama2 | Link | CPU: with ONNX Runtime optimizations for an optimized FP32 ONNX model<br>CPU: with ONNX Runtime optimizations for an optimized INT8 ONNX model<br>CPU: with ONNX Runtime optimizations for an optimized INT4 ONNX model<br>GPU: with ONNX Runtime optimizations for an optimized FP16 ONNX model<br>GPU: with ONNX Runtime optimizations for an optimized INT4 ONNX model<br>GPU: with QLoRA fine-tuning and ONNX Runtime optimizations for an optimized ONNX model<br>AzureML compute: with AzureML compute to fine-tune and optimize for your local GPUs |
| | llama3 | Link | QDQ: QDQ model with 4-bit weights & 16-bit activations<br>Qualcomm NPU: PTQ + AOT compilation for Qualcomm NPUs using the QNN EP |
| | mistral | Link | CPU: with Optimum conversion, ONNX Runtime optimizations, and Intel® Neural Compressor static quantization for an optimized INT8 ONNX model<br>GPU: with ONNX Runtime optimizations for an optimized FP16 ONNX model |
| | open llama | Link | GPU: with Optimum conversion and merging and ONNX Runtime optimizations for an optimized ONNX model<br>GPU: with SparseGPT and TorchTRT conversion for an optimized PyTorch model with sparsity<br>AzureML compute: with Optimum conversion and merging and ONNX Runtime optimizations in AzureML<br>CPU: with Optimum conversion and merging, ONNX Runtime optimizations, and Intel® Neural Compressor 4-bit weight-only quantization for an optimized INT4 ONNX model |
| | phi2 | Link | CPU: with ONNX Runtime optimizations for FP32/INT4<br>GPU: with ONNX Runtime optimizations for FP16/INT4, and PyTorch QLoRA fine-tuning<br>GPU: with SliceGPT for an optimized PyTorch model with sparsity |
| | phi3.5 | Link | QDQ: QDQ model with 4-bit weights & 16-bit activations<br>Qualcomm NPU: PTQ + AOT compilation for Qualcomm NPUs using the QNN EP |
| | qwen2.5 | Link | QDQ: QDQ model with 4-bit weights & 16-bit activations<br>Qualcomm NPU: PTQ + AOT compilation for Qualcomm NPUs using the QNN EP |
| | falcon | Link | GPU: with ONNX Runtime optimizations for an optimized FP16 ONNX model |
| | red pajama | Link | CPU: with Optimum conversion and merging and ONNX Runtime optimizations for a single optimized ONNX model |
| | bert | Link | CPU: with ONNX Runtime optimizations and quantization for an optimized INT8 ONNX model<br>CPU: with ONNX Runtime optimizations and Intel® Neural Compressor quantization for an optimized INT8 ONNX model<br>CPU: with a PyTorch QAT customized training loop and ONNX Runtime optimizations for an optimized INT8 ONNX model<br>GPU: with ONNX Runtime optimizations for the CUDA EP<br>GPU: with ONNX Runtime optimizations for the TRT EP<br>NPU: with ONNX Runtime optimizations for the QNN EP<br>QDQ: with ONNX Runtime optimizations and INT8 quantization encoded in QDQ format |
| | deberta | Link | GPU: optimize an AzureML registry model with ONNX Runtime optimizations and quantization |
| | gptj | Link | CPU: with Intel® Neural Compressor static/dynamic quantization for an INT8 ONNX model |
| | bge | Link | NPU: with ONNX Runtime optimizations for the QNN EP |
| Audio | whisper | Link | CPU: with ONNX Runtime optimizations for an all-in-one ONNX model in FP32<br>CPU: with ONNX Runtime optimizations for an all-in-one ONNX model in INT8<br>CPU: with ONNX Runtime optimizations and Intel® Neural Compressor dynamic quantization for an all-in-one ONNX model in INT8<br>GPU: with ONNX Runtime optimizations for an all-in-one ONNX model in FP32<br>GPU: with ONNX Runtime optimizations for an all-in-one ONNX model in FP16<br>GPU: with ONNX Runtime optimizations for an all-in-one ONNX model in INT8 |
| | audio spectrogram transformer | Link | CPU: with ONNX Runtime optimizations and quantization for an optimized INT8 ONNX model |
| Vision | stable diffusion / stable diffusion XL | Link | GPU: with ONNX Runtime optimizations for the DirectML EP<br>GPU: with ONNX Runtime optimizations for the CUDA EP<br>Intel CPU: with the OpenVINO toolkit |
| | squeezenet | Link | GPU: with ONNX Runtime optimizations for the DirectML EP |
| | mobilenet | Link | Qualcomm NPU: with ONNX Runtime static QDQ quantization for the ONNX Runtime QNN EP |
| | clip | Link | Qualcomm NPU: with ONNX Runtime static QDQ quantization for the ONNX Runtime QNN EP<br>QDQ: with ONNX Runtime static quantization for an INT8 ONNX model in QDQ format |
| | resnet | Link | CPU: with ONNX Runtime static/dynamic quantization for an INT8 ONNX model<br>QDQ: with ONNX Runtime static quantization for an INT8 ONNX model in QDQ format<br>CPU: with PyTorch QAT (default training loop) and ONNX Runtime optimizations for an INT8 ONNX model<br>CPU: with PyTorch QAT (Lightning module) and ONNX Runtime optimizations for an INT8 ONNX model<br>AMD DPU: with AMD Vitis-AI quantization<br>Intel GPU: with ONNX Runtime optimizations with multiple EPs<br>Qualcomm NPU: with ONNX Runtime static QDQ quantization for the ONNX Runtime QNN EP |
| | VGG | Link | Qualcomm NPU: with the SNPE toolkit |
| | inception | Link | Qualcomm NPU: with the SNPE toolkit |
| | super resolution | Link | CPU: with ONNX Runtime pre/post-processing integration for a single ONNX model |
| | Vision Transformer | Link | Qualcomm NPU: with ONNX Runtime static QDQ quantization for the ONNX Runtime QNN EP<br>QDQ: with ONNX Runtime static quantization for an INT8 ONNX model in QDQ format |
| | Table Transformer Detection | Link | Qualcomm NPU: with ONNX Runtime static QDQ quantization for the ONNX Runtime QNN EP |