Supported Models
winml-cli supports a wide range of model architectures and tasks. This page
lists what's validated and how to discover model support.
Discovery Commands
# Browse the curated catalog (57 validated models)
uv run winml catalog
# Filter by task
uv run winml catalog -t image-classification
# Check if a specific model is supported
uv run winml inspect -m microsoft/resnet-50
# List all known tasks
uv run winml inspect --list-tasks
Supported Tasks
winml-cli recognizes 35 task types across vision, NLP, audio, and multimodal domains. Of these, 16 have dedicated inference classes; the remainder are supported via the generic task fallback.
Vision
Task
Example Models
image-classification
ResNet, ConvNeXt, ViT, Swin
image-segmentation
Segformer, Mask2Former
semantic-segmentation
Segformer
object-detection
DETR, YOLOS, Table-Transformer
depth-estimation
Depth Anything, ZoeDepth
image-feature-extraction
DINOv2, ViT
zero-shot-image-classification
CLIP, SigLIP
NLP
Task
Example Models
text-classification
BERT, RoBERTa, XLM-RoBERTa
token-classification
BERT, RoBERTa (NER)
question-answering
BERT, RoBERTa
fill-mask
BERT, RoBERTa
feature-extraction
BGE, BERT, all-MiniLM
text-generation
Qwen3 (composite)
text2text-generation
T5, BART, Marian
Audio
Task
Example Models
automatic-speech-recognition
Whisper
audio-classification
Wav2Vec2
Multimodal
Task
Example Models
zero-shot-image-classification
CLIP (text + vision)
image-to-text
VisionEncoderDecoder
visual-question-answering
BLIP
Validated Model Catalog
The following models have been validated end-to-end with EP compatibility
testing. Use winml catalog to browse the full list interactively.
Image Classification
Model
Architecture
AdamCodd/vit-base-nsfw-detector
ViT
Falconsai/nsfw_image_detection
ViT
amunchet/rorshark-vit-base
ViT
apple/mobilevit-small
MobileViT
dima806/fairface_age_image_detection
ViT
google/vit-base-patch16-224
ViT
microsoft/resnet-18
ResNet
rizvandwiki/gender-classification
ViT
Model
Architecture
facebook/dino-vitb16
ViT
facebook/dino-vits16
ViT
facebook/dinov2-base
DINOv2
facebook/dinov2-large
DINOv2
facebook/dinov2-small
DINOv2
google/vit-base-patch16-224-in21k
ViT
microsoft/rad-dino
DINOv2
Model
Architecture
laion/CLIP-ViT-B-32-laion2B-s34B-b79K
CLIP
openai/clip-vit-base-patch16
CLIP
openai/clip-vit-base-patch32
CLIP
sentence-transformers/all-MiniLM-L6-v2
BERT
sentence-transformers/all-mpnet-base-v2
MPNet
sentence-transformers/multi-qa-mpnet-base-dot-v1
MPNet
Sentence Similarity
Model
Architecture
BAAI/bge-large-en-v1.5
BERT
BAAI/bge-small-en-v1.5
BERT
sentence-transformers/all-MiniLM-L6-v2
BERT
sentence-transformers/all-mpnet-base-v2
MPNet
sentence-transformers/multi-qa-mpnet-base-dot-v1
MPNet
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
BERT
sentence-transformers/paraphrase-multilingual-mpnet-base-v2
XLM-RoBERTa
Fill-Mask
Model
Architecture
FacebookAI/roberta-base
RoBERTa
FacebookAI/xlm-roberta-base
XLM-RoBERTa
distilbert/distilbert-base-uncased
DistilBERT
google-bert/bert-base-multilingual-cased
BERT
google-bert/bert-base-multilingual-uncased
BERT
google-bert/bert-base-uncased
BERT
sentence-transformers/all-mpnet-base-v2
MPNet
sentence-transformers/multi-qa-mpnet-base-dot-v1
MPNet
Text Classification
Model
Architecture
cardiffnlp/twitter-roberta-base-sentiment-latest
RoBERTa
cross-encoder/ms-marco-MiniLM-L4-v2
BERT
cross-encoder/ms-marco-MiniLM-L6-v2
BERT
distilbert/distilbert-base-uncased-finetuned-sst-2-english
DistilBERT
Token Classification
Model
Architecture
Isotonic/distilbert_finetuned_ai4privacy_v2
DistilBERT
Jean-Baptiste/camembert-ner-with-dates
CamemBERT
kredor/punctuate-all
XLM-RoBERTa
w11wo/indonesian-roberta-base-posp-tagger
RoBERTa
Question Answering
Model
Architecture
ahotrod/electra_large_discriminator_squad2_512
Electra
deepset/bert-large-uncased-whole-word-masking-squad2
BERT
deepset/roberta-base-squad2
RoBERTa
deepset/tinyroberta-squad2
RoBERTa
distilbert/distilbert-base-cased-distilled-squad
DistilBERT
distilbert/distilbert-base-uncased-distilled-squad
DistilBERT
monologg/koelectra-small-v2-distilled-korquad-384
Electra
Zero-Shot Classification
Model
Architecture
lxyuan/distilbert-base-multilingual-cased-sentiments-student
DistilBERT
Zero-Shot Image Classification
Model
Architecture
laion/CLIP-ViT-B-32-laion2B-s34B-b79K
CLIP
Object Detection
Model
Architecture
hustvl/yolos-small
YOLOS
valentinafeve/yolos-fashionpedia
YOLOS
Depth Estimation
Model
Architecture
Intel/dpt-hybrid-midas
DPT
Execution Provider Compatibility
Each validated model is tested against available EPs:
EP
Alias
Devices
Notes
NvTensorRTRTXExecutionProvider
nvtensorrtrtx, nv_tensorrt_rtx
GPU
NVIDIA TensorRT-RTX; NVIDIA GPU with TensorRT runtime
CUDAExecutionProvider
cuda
GPU
NVIDIA CUDA; any CUDA-capable GPU
MIGraphXExecutionProvider
migraphx
GPU
AMD ROCm MIGraphX
QNNExecutionProvider
qnn
NPU, GPU
Qualcomm Snapdragon; bundled in ORT
OpenVINOExecutionProvider
openvino
NPU, GPU, CPU
Intel hardware
DmlExecutionProvider
dml
GPU
DirectML; any DirectX 12 GPU
CPUExecutionProvider
cpu
CPU
Always available
VitisAIExecutionProvider
vitisai
NPU
AMD/Xilinx
Adding Unsupported Models
If your model architecture isn't in the catalog, winml-cli may still support it
through auto-detection:
# Try inspecting first
uv run winml inspect -m your-org/your-model
# If "Status: Supported", proceed normally
uv run winml build -m your-org/your-model -d auto -o output/
For truly custom architectures, use --trust-remote-code to allow execution of
model code from the Hugging Face Hub.
See also