Commands

winml-cli exposes a CLI named winml with 12 subcommands covering the full journey from model discovery to a deployment-ready artifact. Every subcommand shares a consistent invocation style — winml <command> [flags] — and the same global flags are available on the root winml group.

The commands group by user intent. Discover (sys, inspect, catalog, analyze) helps you understand your hardware and model before writing any artifacts. Configure (config, optimize) produces a reusable build configuration and tunes the ONNX graph. Build (export, quantize, compile, build) runs the pipeline stages that produce deployment artifacts. Measure (perf, eval) benchmarks and validates the result.

The typical workflow follows that order: run winml sys to confirm hardware and EPs, then winml inspect or winml catalog to verify model support. Use winml config to generate a build configuration, then winml build to execute the full pipeline, or chain export → analyze → optimize → quantize → compile individually for finer control. Close with winml perf and winml eval to measure speed and accuracy.
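The ordering described above can be sketched in a few lines of Python. This is only an illustration: the subcommand names are taken from this page, while the helper names workflow and render are hypothetical, and per-command flags are left as a [flags] placeholder because they differ per command.

```python
# Sketch only: subcommand names come from this page; the helper names
# (workflow, render) are hypothetical and not part of winml-cli.
STAGES = ["export", "analyze", "optimize", "quantize", "compile"]
MEASURE = ["perf", "eval"]


def workflow(one_shot: bool = True, verify: str = "inspect") -> list[str]:
    """Return the subcommands in the order the workflow paragraph describes."""
    if verify not in ("inspect", "catalog"):
        raise ValueError("verify must be 'inspect' or 'catalog'")
    # Either the single 'build' command or the individual stages.
    build = ["build"] if one_shot else STAGES
    return ["sys", verify, "config", *build, *MEASURE]


def render(commands: list[str]) -> list[str]:
    """Format each step in the page's 'winml <command> [flags]' style."""
    return [f"winml {name} [flags]" for name in commands]


if __name__ == "__main__":
    for line in render(workflow(one_shot=False)):
        print(line)
```

Calling workflow() yields the short path through winml build, while workflow(one_shot=False) expands the build step into the individually chained stages.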

Command map

Command  | Group     | Purpose
sys      | Discover  | Inspect your machine: devices, EPs, and runtime versions at a glance.
inspect  | Discover  | Inspect a model's tasks, classes, and hierarchy before committing to an export.
catalog  | Discover  | Browse the curated winml-cli catalog of validated models and benchmarks.
config   | Configure | Generate a reusable build configuration for a Hugging Face model or ONNX file.
export   | Build     | Convert a PyTorch / Hugging Face model to ONNX, preserving module hierarchy.
analyze  | Build     | Verify an ONNX model is compatible with a target execution provider before deployment.
optimize | Build     | Apply graph optimizations and fusions to an ONNX model to reduce node count and improve inference speed.
quantize | Build     | Quantize an ONNX model with QDQ insertion and calibration-based scaling.
compile  | Build     | Compile an ONNX model to an EP-specific format for fast runtime loading.
build    | Build     | Run the entire winml-cli pipeline (export → optimize → quantize → compile) in one command.
perf     | Measure   | Benchmark an ONNX model's latency and throughput on a target device.
eval     | Measure   | Evaluate ONNX model accuracy on a standard dataset.

Choosing a command

  • I want to see what hardware and EPs I have → winml sys
  • I want to know if my model is supported → winml inspect
  • I want to browse validated models with known benchmarks → winml catalog
  • I want to verify EP operator compatibility before compiling → winml analyze
  • I want to convert a Hugging Face model to ONNX → winml export
  • I want to run the whole pipeline in one go → winml build
  • I want to benchmark latency and throughput → winml perf
  • I want to measure model accuracy → winml eval

Global flags

-v / --verbose, -q / --quiet, --version, and -h / --help live on the root winml group only. Subcommands access them through ctx.obj and do not redefine them. See src/winml/modelkit/cli.py for the canonical contract.
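The root-only pattern described above might look roughly like the following sketch. It assumes the CLI is built with Click, which the ctx.obj reference suggests but this page does not confirm. The sys_cmd body, the dictionary keys, and the 0.0.0 version string are placeholders, not the contents of src/winml/modelkit/cli.py.

```python
import click
from click.testing import CliRunner


# Assumption: a Click root group. Global flags are declared here and nowhere else.
@click.group(context_settings={"help_option_names": ["-h", "--help"]})
@click.option("-v", "--verbose", is_flag=True, help="Show more output.")
@click.option("-q", "--quiet", is_flag=True, help="Show less output.")
@click.version_option("0.0.0", "--version")  # placeholder version string
@click.pass_context
def winml(ctx, verbose, quiet):
    """Root group: stores the global flags on ctx.obj for subcommands."""
    ctx.ensure_object(dict)
    ctx.obj["verbose"] = verbose  # hypothetical key names
    ctx.obj["quiet"] = quiet


@winml.command("sys")
@click.pass_context
def sys_cmd(ctx):
    """Hypothetical stand-in subcommand: reads -v from ctx.obj, never redefines it."""
    if ctx.obj["verbose"]:
        click.echo("verbose: on")
    click.echo("sys placeholder")


# Exercise the group in-process; the flag goes before the subcommand name.
result = CliRunner().invoke(winml, ["-v", "sys"])
print(result.output)
```

Under this sketch, winml -v sys works, while winml sys -v is rejected as an unknown option because the subcommand never declares the flag itself.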

Shared flags

Several flags share semantics across the commands that accept them: -m / --model, -d / --device, --ep, -o / --output, -t / --task, and --precision. Defaults and accepted values can differ per command (e.g., -p is a short form for --precision only on config and quantize); check the Flags section of each command page rather than assuming they transfer.

See also