
Supported Models

A set of models has been validated with Windows ML CLI (winml-cli) for compatibility across all Execution Providers (EPs). See the full Model Accuracy Report for per-model results.

winml-cli supports a wide range of model architectures and tasks. This page lists what's validated and how to discover model support.


Discovery Commands

# Browse the curated catalog (64 validated models)
uv run winml catalog

# Filter by task
uv run winml catalog -t image-classification

# Check if a specific model is supported
uv run winml inspect -m microsoft/resnet-50

# List all known tasks
uv run winml inspect --list-tasks
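
To browse several tasks in one pass, the task filter above can be wrapped in a small shell function. This is a minimal sketch: `catalog_for_tasks` and the `WINML_DRY_RUN` variable are hypothetical names introduced here for illustration, not part of winml-cli. Only the `catalog -t` invocation comes from the commands listed above.

```shell
# Hypothetical helper: run one catalog query per task name given as an argument.
# WINML_DRY_RUN is an assumed variable for this sketch, not a winml-cli option;
# when set to 1 the commands are printed instead of executed.
catalog_for_tasks() {
  local task
  for task in "$@"; do
    if [ "${WINML_DRY_RUN:-0}" = "1" ]; then
      echo "uv run winml catalog -t $task"
    else
      uv run winml catalog -t "$task"
    fi
  done
}

# Usage (task names taken from the Supported Tasks tables):
#   catalog_for_tasks image-classification fill-mask
```

Running it with `WINML_DRY_RUN=1` first is a cheap way to check the task names before querying the catalog.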

Supported Tasks

winml-cli recognizes 35 task types across vision, NLP, audio, and multimodal domains. Of these, 16 have dedicated inference classes; the remainder are supported via the generic task fallback.

Vision

Task Example Models
image-classification ResNet, ConvNeXt, ViT, Swin
image-segmentation Segformer, Mask2Former
semantic-segmentation Segformer
object-detection DETR, YOLOS, Table-Transformer
depth-estimation Depth Anything, ZoeDepth
image-feature-extraction DINOv2, ViT
zero-shot-image-classification CLIP, SigLIP

NLP

Task Example Models
text-classification BERT, RoBERTa, XLM-RoBERTa
token-classification BERT, RoBERTa (NER)
question-answering BERT, RoBERTa
fill-mask BERT, RoBERTa
feature-extraction BGE, BERT, all-MiniLM
text-generation Qwen3 (composite)
text2text-generation T5, BART, Marian

Audio

Task Example Models
automatic-speech-recognition Whisper
audio-classification Wav2Vec2

Multimodal

Task Example Models
zero-shot-image-classification CLIP (text + vision)
image-to-text VisionEncoderDecoder
visual-question-answering BLIP

Validated Model Catalog

The following models have been validated end-to-end with EP compatibility testing. Run uv run winml catalog to browse the full list interactively.

Image Classification

Model Architecture
apple/mobilevit-small MobileViT
dima806/fairface_age_image_detection ViT
facebook/convnext-tiny-224 ConvNeXt
google/vit-base-patch16-224 ViT
microsoft/resnet-18 ResNet
microsoft/resnet-50 ResNet
microsoft/swin-large-patch4-window7-224 Swin
rizvandwiki/gender-classification ViT

Image Feature Extraction

Model Architecture
facebook/dino-vitb16 ViT
facebook/dino-vits16 ViT
facebook/dinov2-small DINOv2
google/vit-base-patch16-224-in21k ViT

Feature Extraction (Text)

Model Architecture
BAAI/bge-base-en-v1.5 BERT
BAAI/bge-m3 XLM-RoBERTa
BAAI/bge-small-en-v1.5 BERT
google-bert/bert-base-multilingual-cased BERT
Intel/bert-base-uncased-mrpc BERT
laion/CLIP-ViT-B-32-laion2B-s34B-b79K CLIP
openai/clip-vit-base-patch16 CLIP
openai/clip-vit-base-patch32 CLIP
sentence-transformers/all-MiniLM-L6-v2 BERT
sentence-transformers/all-mpnet-base-v2 MPNet
sentence-transformers/multi-qa-mpnet-base-dot-v1 MPNet
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 BERT

Sentence Similarity

Model Architecture
BAAI/bge-base-en-v1.5 BERT
BAAI/bge-large-en-v1.5 BERT
BAAI/bge-m3 XLM-RoBERTa
BAAI/bge-small-en-v1.5 BERT
sentence-transformers/all-MiniLM-L6-v2 BERT
sentence-transformers/all-mpnet-base-v2 MPNet
sentence-transformers/multi-qa-mpnet-base-dot-v1 MPNet
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 BERT
sentence-transformers/paraphrase-multilingual-mpnet-base-v2 XLM-RoBERTa

Fill-Mask

Model Architecture
distilbert/distilbert-base-uncased DistilBERT
FacebookAI/roberta-base RoBERTa
FacebookAI/roberta-large RoBERTa
FacebookAI/xlm-roberta-base XLM-RoBERTa
google-bert/bert-base-multilingual-cased BERT
google-bert/bert-base-multilingual-uncased BERT
google-bert/bert-base-uncased BERT

Text Classification

Model Architecture
cardiffnlp/twitter-roberta-base-sentiment-latest RoBERTa
distilbert/distilbert-base-uncased-finetuned-sst-2-english DistilBERT
Intel/bert-base-uncased-mrpc BERT
ProsusAI/finbert BERT

Token Classification

Model Architecture
Babelscape/wikineural-multilingual-ner BERT
dbmdz/bert-large-cased-finetuned-conll03-english BERT
dslim/bert-base-NER BERT
Isotonic/distilbert_finetuned_ai4privacy_v2 DistilBERT
w11wo/indonesian-roberta-base-posp-tagger RoBERTa

Question Answering

Model Architecture
deepset/bert-large-uncased-whole-word-masking-squad2 BERT
deepset/roberta-base-squad2 RoBERTa
deepset/tinyroberta-squad2 RoBERTa
distilbert/distilbert-base-cased-distilled-squad DistilBERT
distilbert/distilbert-base-uncased-distilled-squad DistilBERT
google-bert/bert-large-uncased-whole-word-masking-finetuned-squad BERT

Zero-Shot Classification

Model Architecture
joeddav/xlm-roberta-large-xnli XLM-RoBERTa

Zero-Shot Image Classification

Model Architecture
openai/clip-vit-base-patch16 CLIP

Image Segmentation

Model Architecture
mattmdjaga/segformer_b2_clothes Segformer
nvidia/segformer-b1-finetuned-ade-512-512 Segformer
nvidia/segformer-b2-finetuned-ade-512-512 Segformer
nvidia/segformer-b5-finetuned-ade-640-640 Segformer

Image-to-Text

Model Architecture
microsoft/trocr-base-handwritten VisionEncoderDecoder
microsoft/trocr-base-printed VisionEncoderDecoder
microsoft/trocr-large-handwritten VisionEncoderDecoder

Execution Provider Compatibility

Each validated model is tested against available EPs:

EP                              Alias                           Devices        Notes
NvTensorRTRTXExecutionProvider  nvtensorrtrtx, nv_tensorrt_rtx  GPU            NVIDIA TensorRT-RTX; NVIDIA GPU with TensorRT runtime
CUDAExecutionProvider           cuda                            GPU            NVIDIA CUDA; any CUDA-capable GPU
MIGraphXExecutionProvider       migraphx                        GPU            AMD ROCm MIGraphX
QNNExecutionProvider            qnn                             NPU, GPU       Qualcomm Snapdragon; bundled in ORT
OpenVINOExecutionProvider       openvino                        NPU, GPU, CPU  Intel hardware
DmlExecutionProvider            dml                             GPU            DirectML; any DirectX 12 GPU
CPUExecutionProvider            cpu                             CPU            Always available
VitisAIExecutionProvider        vitisai                         NPU            AMD/Xilinx
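
Scripts that accept the short aliases may want to map them back to the full provider names. The sketch below encodes the table above as a shell `case` statement. `resolve_ep` is a hypothetical helper written for this page, not a winml-cli command, and it covers only the aliases listed in the table.

```shell
# Hypothetical helper: map an EP alias from the table above to its full name.
# Not part of winml-cli; unknown aliases produce an error and a non-zero status.
resolve_ep() {
  case "$1" in
    nvtensorrtrtx|nv_tensorrt_rtx) echo "NvTensorRTRTXExecutionProvider" ;;
    cuda)     echo "CUDAExecutionProvider" ;;
    migraphx) echo "MIGraphXExecutionProvider" ;;
    qnn)      echo "QNNExecutionProvider" ;;
    openvino) echo "OpenVINOExecutionProvider" ;;
    dml)      echo "DmlExecutionProvider" ;;
    cpu)      echo "CPUExecutionProvider" ;;
    vitisai)  echo "VitisAIExecutionProvider" ;;
    *)        echo "unknown EP alias: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   resolve_ep dml
```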

Adding Unsupported Models

If your model architecture isn't in the catalog, winml-cli may still support it through auto-detection:

# Try inspecting first
uv run winml inspect -m your-org/your-model

# If "Status: Supported", proceed normally
uv run winml build -m your-org/your-model -d auto -o output/
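
The inspect-then-build sequence above can be combined so that the build runs only when inspection reports support. This is a sketch under one stated assumption: that the inspect command writes the text "Status: Supported" verbatim to standard output. `build_if_supported` is a hypothetical wrapper, not a winml-cli feature.

```shell
# Hypothetical wrapper: build a model only if inspect reports it as supported.
# Assumes "Status: Supported" appears verbatim in the inspect output on stdout.
build_if_supported() {
  local model="$1" outdir="$2" report
  # Capture the inspect report; give up if the command itself fails.
  report="$(uv run winml inspect -m "$model")" || return 1
  if printf '%s\n' "$report" | grep -q "Status: Supported"; then
    uv run winml build -m "$model" -d auto -o "$outdir"
  else
    echo "$model is not reported as supported; skipping build" >&2
    return 2
  fi
}

# Usage:
#   build_if_supported your-org/your-model output/
```

Distinct return codes (1 for a failed inspect, 2 for an unsupported model) let a calling script tell the two cases apart.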

For truly custom architectures, use --trust-remote-code to allow execution of model code from the Hugging Face Hub.

