Commands¶

winml-cli exposes a CLI named winml with 12 subcommands covering the full journey from model discovery to a deployment-ready artifact. Every subcommand shares a consistent invocation style — winml <command> [flags] — and the same global flags are available on the root winml group.

The commands group by user intent. Discover (sys, inspect, catalog, analyze) helps you understand your hardware and model before writing any artifacts. Configure (config, optimize) produces a reusable build configuration and tunes the ONNX graph. Build (export, quantize, compile, build) runs the pipeline stages that produce deployment artifacts. Measure (perf, eval) benchmarks and validates the result.

The typical workflow follows that order: run winml sys to confirm hardware and EPs, then winml inspect or winml catalog to verify model support. Use winml config to generate a build configuration, then winml build to execute the full pipeline — or chain export → analyze → optimize → quantize → compile individually for finer control. Close with winml perf and winml eval to measure speed and accuracy.

Command map¶

Command	Group	Purpose
`sys`	Discover	Inspect your machine — devices, EPs, and runtime versions at a glance.
`inspect`	Discover	Inspect a model's tasks, classes, and hierarchy before committing to an export.
`catalog`	Discover	Browse the curated winml-cli catalog of validated models and benchmarks.
`config`	Configure	Generate a reusable build configuration for a Hugging Face model or ONNX file.
`export`	Build	Convert a PyTorch / Hugging Face model to ONNX, preserving module hierarchy.
`analyze`	Build	Verify an ONNX model is compatible with a target execution provider before deployment.
`optimize`	Build	Apply graph optimizations and fusions to an ONNX model to reduce node count and improve inference speed.
`quantize`	Build	Quantize an ONNX model with QDQ insertion and calibration-based scaling.
`compile`	Build	Compile an ONNX model to an EP-specific format for fast runtime loading.
`build`	Build	Run the entire winml-cli pipeline (export → optimize → quantize → compile) in one command.
`perf`	Measure	Benchmark an ONNX model's latency and throughput on a target device.
`eval`	Measure	Evaluate ONNX model accuracy on a standard dataset.

Choosing a command¶

I want to see what hardware and EPs I have → winml sys
I want to know if my model is supported → winml inspect
I want to browse validated models with known benchmarks → winml catalog
I want to verify EP operator compatibility before compiling → winml analyze
I want to convert a Hugging Face model to ONNX → winml export
I want to run the whole pipeline in one go → winml build
I want to benchmark latency and throughput → winml perf
I want to measure model accuracy → winml eval

Global flags¶

-v / --verbose, -q / --quiet, --version, and -h / --help live on the root winml group only. Subcommands access them through ctx.obj and do not redefine them. See src/winml/modelkit/cli.py for the canonical contract.

Shared flags¶

Several flags share semantics across the commands that accept them: -m / --model, -d / --device, --ep, -o / --output, -t / --task, and --precision. Defaults and accepted values can differ per command (e.g., -p is a short form for --precision only on config and quantize); check the Flags section of each command page rather than assuming they transfer.

Commands¶

Command map¶

Choosing a command¶

Global flags¶

Shared flags¶

See also¶