Examples

Scenario	Model	Examples	Hardware Targeted Optimization
NLP	deepseek	Link	`QDQ`: QDQ Model with 4-bit Weights & 16-bit Activations `QNN EP`: PTQ + AOT Compilation for Qualcomm NPUs using QNN EP `Vitis AI EP`: PTQ + AOT Compilation for AMD NPUs using Vitis AI EP `OpenVINO EP`: PTQ + AOT Compilation for OpenVINO EP `Intel® NPU`: PTQ + AWQ with 4-bit weight compression using Intel® Optimum OpenVINO for ONNX OpenVINO IR Encapsulated Model
	llama2	Link	`CPU`: with ONNX Runtime optimizations for optimized FP32 ONNX model `CPU`: with ONNX Runtime optimizations for optimized INT8 ONNX model `CPU`: with ONNX Runtime optimizations for optimized INT4 ONNX model `GPU`: with ONNX Runtime optimizations for optimized FP16 ONNX model `GPU`: with ONNX Runtime optimizations for optimized INT4 ONNX model `GPU`: with QLoRA for model fine-tuning and ONNX Runtime optimizations for optimized ONNX model
	llama3	Link	`QDQ`: QDQ Model with 4-bit Weights & 16-bit Activations `QNN EP`: PTQ + AOT Compilation for Qualcomm NPUs using QNN EP `Vitis AI EP`: PTQ + AOT Compilation for AMD NPUs using Vitis AI EP `OpenVINO EP`: PTQ + AOT Compilation for OpenVINO EP `Intel® NPU`: PTQ + AWQ with 4-bit weight compression using Intel® Optimum OpenVINO for ONNX OpenVINO IR Encapsulated Model
	mistral	Link	`CPU`: with Optimum conversion and ONNX Runtime optimizations and Intel® Neural Compressor static quantization for optimized INT8 ONNX model `GPU`: with ONNX Runtime optimizations for optimized FP16 ONNX model
	open llama	Link	`GPU`: with Optimum conversion and merging and ONNX Runtime optimizations for optimized ONNX model `GPU`: with SparseGPT and TorchTRT conversion for an optimized PyTorch model with sparsity `CPU`: with Optimum conversion and merging and ONNX Runtime optimizations and Intel® Neural Compressor 4-bit weight-only quantization for optimized INT4 ONNX model
	phi2	Link	`CPU`: with ONNX Runtime optimizations for FP32/INT4 `GPU`: with ONNX Runtime optimizations for FP16/INT4, with PyTorch QLoRA for model fine-tuning `GPU`: with SliceGPT for an optimized PyTorch model with sparsity
	phi3	Link	`Intel® GPU`: PTQ with 4-bit weight compression using Intel® Optimum OpenVINO for ONNX OpenVINO IR Encapsulated Model
	phi3.5	Link	`QDQ`: QDQ Model with 4-bit Weights & 16-bit Activations `QNN EP`: PTQ + AOT Compilation for Qualcomm NPUs using QNN EP `Vitis AI EP`: PTQ + AOT Compilation for AMD NPUs using Vitis AI EP `OpenVINO EP`: PTQ + AOT Compilation for OpenVINO EP `Intel® NPU`: PTQ with 4-bit weight compression using Intel® Optimum OpenVINO for ONNX OpenVINO IR Encapsulated Model `Intel® GPU`: PTQ with 4-bit weight compression using Intel® Optimum OpenVINO for ONNX OpenVINO IR Encapsulated Model
	phi4	Link	`Intel® NPU`: PTQ with 4-bit weight compression using Intel® Optimum OpenVINO for ONNX OpenVINO IR Encapsulated Model `Intel® GPU`: PTQ with 4-bit weight compression using Intel® Optimum OpenVINO for ONNX OpenVINO IR Encapsulated Model
	qwen2.5	Link	`QDQ`: QDQ Model with 4-bit Weights & 16-bit Activations `QNN EP`: PTQ + AOT Compilation for Qualcomm NPUs using QNN EP `Vitis AI EP`: PTQ + AOT Compilation for AMD NPUs using Vitis AI EP `OpenVINO EP`: PTQ + AOT Compilation for OpenVINO EP `Intel® NPU`: PTQ with 4-bit weight compression using Intel® Optimum OpenVINO for ONNX OpenVINO IR Encapsulated Model `Intel® GPU`: PTQ with 4-bit weight compression using Intel® Optimum OpenVINO for ONNX OpenVINO IR Encapsulated Model
	falcon	Link	`GPU`: with ONNX Runtime optimizations for optimized FP16 ONNX model
	red pajama	Link	`CPU`: with Optimum conversion and merging and ONNX Runtime optimizations for a single optimized ONNX model
	bert	Link	`CPU`: with ONNX Runtime optimizations and quantization for optimized INT8 ONNX model `CPU`: with ONNX Runtime optimizations and Intel® Neural Compressor quantization for optimized INT8 ONNX model `GPU`: with ONNX Runtime optimizations for CUDA EP `GPU`: with ONNX Runtime optimizations for TRT EP `QNN EP`: with ONNX Runtime optimizations for QNN EP `Vitis AI EP`: with ONNX Runtime optimizations for Vitis AI EP `OpenVINO EP`: with ONNX Runtime optimizations for OpenVINO EP `QDQ`: with ONNX Runtime optimizations and INT8 quantization encoded in QDQ format `Intel® NPU`: PTQ using Intel® NNCF for ONNX OpenVINO IR encapsulated model
	deberta	Link	`GPU`: Optimize AzureML Registry Model with ONNX Runtime optimizations and quantization
	gptj	Link	`CPU`: with Intel® Neural Compressor static/dynamic quantization for INT8 ONNX model
	bge	Link	`NPU`: with ONNX Runtime optimizations for QNN EP
	audio spectrogram transformer	Link	`CPU`: with ONNX Runtime optimizations and quantization for optimized INT8 ONNX model
Vision	stable diffusion	Link	`GPU`: with ONNX Runtime optimizations for DirectML EP `GPU`: with ONNX Runtime optimizations for CUDA EP `Intel CPU`: with OpenVINO toolkit `QDQ`: with ONNX Runtime static quantization for ONNX INT8 model with QDQ format
	stable diffusion XL	Link	`GPU`: with ONNX Runtime optimizations for DirectML EP `GPU`: with ONNX Runtime optimizations for CUDA EP
	squeezenet	Link	`GPU`: with ONNX Runtime optimizations for DirectML EP
	mobilenet	Link	`QNN EP`: with ONNX Runtime static QDQ quantization for ONNX Runtime QNN EP
	resnet	Link	`CPU`: with ONNX Runtime static/dynamic quantization for ONNX INT8 model `QDQ`: with ONNX Runtime static quantization for ONNX INT8 model with QDQ format `AMD DPU`: with AMD Vitis AI quantization `QNN EP`: with ONNX Runtime static QDQ quantization for ONNX Runtime QNN EP `Intel® NPU`: PTQ using Intel® NNCF for ONNX OpenVINO IR encapsulated model `Intel® NPU`: PTQ using Intel® NNCF for ONNX model
	Table Transformer Detection	Link	`QNN EP`: with ONNX Runtime static QDQ quantization for ONNX Runtime QNN EP
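
Several rows above list a `QDQ` variant with 4-bit weights. As a rough illustration of what a quantize/dequantize (QDQ) round trip does to a weight tensor, here is a minimal pure-Python sketch. It assumes symmetric per-tensor quantization with a single scale and no zero point; the linked examples may use different granularity or schemes. The helpers `quantize` and `dequantize` are hypothetical names for illustration only, not part of any toolkit or ONNX Runtime API. In an actual ONNX model the same idea is expressed with the `QuantizeLinear` and `DequantizeLinear` operators.

```python
def quantize(values, num_bits=4):
    """Map floats to signed integers using one shared scale (symmetric, per-tensor)."""
    qmax = 2 ** (num_bits - 1) - 1        # largest representable integer, 7 for 4-bit
    qmin = -(2 ** (num_bits - 1))         # smallest representable integer, -8 for 4-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step, then clamp into the signed range.
    q = [min(qmax, max(qmin, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map the integers back to approximate float values."""
    return [x * scale for x in q]


# Illustrative weights only; not taken from any model in the table.
weights = [0.70, -0.30, 0.12, 0.0]
q, scale = quantize(weights)
restored = dequantize(q, scale)

# For values inside the clamped range, the round-trip error is at most half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, max_error <= scale / 2 + 1e-12)
```

The difference between `weights` and `restored` is the quantization error that post-training quantization (PTQ) methods try to keep small, for example by choosing scales from calibration data.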