winml build

Run the entire winml-cli pipeline (export → optimize → quantize → compile) in one command.

When to use this

Use winml build when you want to go from a Hugging Face model ID (or an existing .onnx file) to a deployment-ready artifact in a single invocation, without manually chaining winml export, winml optimize, winml quantize, and winml compile. A build config file — generated by winml config — controls every stage of the pipeline.

Synopsis

$ winml build [options]

Flags

| Flag | Short | Type | Default | Description |
|---|---|---|---|---|
| --config | -c | path | None | WinMLBuildConfig JSON file, generated by winml config. If omitted, the config is auto-generated from -m. |
| --model | -m | string | None | Hugging Face model ID or path to an existing .onnx file. |
| --output-dir | -o | path | None | Directory for all build artifacts. Mutually exclusive with --use-cache. |
| --use-cache/--no-use-cache | | flag | false | Store artifacts in the winml-cli global cache (~/.cache/winml/). Mutually exclusive with --output-dir. |
| --rebuild/--no-rebuild | | flag | false | Overwrite existing artifacts and re-run the full pipeline. |
| --quant/--no-quant | | flag | true | Run the quantization stage (use --no-quant to skip), overriding the config. |
| --no-compile/--compile | | flag | None | Override compilation. --compile forces compilation (the config must have a compile section); --no-compile skips it. Default: inherit from config. |
| --optimize/--no-optimize | | flag | true | Run the optimization stage (use --no-optimize to skip). |
| --ep | | string | None | Target execution provider for the analyzer (e.g., qnn). Falls back to the compile config EP if not set. |
| --device | -d | string | auto | Target device for the analyzer (e.g., npu, gpu). auto means auto-detect. |
| --analyze/--no-analyze | | flag | true | Run the analyzer loop during build (use --no-analyze to skip). |
| --max-optim-iterations | | integer | None | Maximum autoconf re-optimization rounds (3 enforced internally when not set). --no-analyze implicitly sets this to 0. |
| --trust-remote-code/--no-trust-remote-code | | flag | false | Allow executing custom code from model repositories. Use only with trusted sources. |
| --allow-unsupported-nodes/--no-allow-unsupported-nodes | | flag | false | Allow unsupported nodes to remain in the graph instead of failing the build. |
| --help | -h | flag | | Show this message and exit. |
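
The interaction between --analyze and --max-optim-iterations can be sketched as a small shell function. This is an illustration of the rule stated in the table, not winml-cli code: effective_max_iterations is a hypothetical name, and it assumes that --no-analyze wins when both flags are passed, which the table does not state explicitly.

```shell
# Hypothetical helper (not part of winml-cli): resolves the iteration cap
# the way the flag table describes it.
# Usage: effective_max_iterations ANALYZE [REQUESTED]
#   ANALYZE   "yes" or "no" (for --analyze / --no-analyze)
#   REQUESTED value of --max-optim-iterations, omitted if the flag was not passed
effective_max_iterations() {
  analyze=$1
  requested=${2:-}
  if [ "$analyze" = no ]; then
    # --no-analyze implicitly sets the cap to 0 (assumed to override REQUESTED).
    echo 0
  elif [ -z "$requested" ]; then
    # Flag not set: the documented internal default of 3 applies.
    echo 3
  else
    echo "$requested"
  fi
}
```

For example, effective_max_iterations yes 1 corresponds to passing --max-optim-iterations 1 with the analyzer left on.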

How it works

winml build reads a WinMLBuildConfig JSON file that encodes device, precision, export, quantization, and compilation settings. The file comes from winml config or, when --config is omitted, is auto-generated from -m. When -m is a Hugging Face model ID, the full pipeline runs: export → optimize → quantize → compile. When -m points to an existing .onnx file, the export stage is skipped and the pipeline starts at optimization. After compilation, an optional analyzer loop re-evaluates graph quality and applies further passes, up to the limit set by --max-optim-iterations; --no-analyze disables the loop for a deterministic single-pass build. Individual stages can be suppressed with --no-quant, --no-compile, and --no-optimize without touching the config file.
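
The stage selection described above can be sketched as follows. The function planned_stages is hypothetical and only mirrors the prose; it assumes the config enables every stage (including a compile section) and it does not call winml.

```shell
# Hypothetical helper (not part of winml-cli): prints the stages that the
# description above says would run for a given model argument and flags.
# Usage: planned_stages MODEL [--no-optimize] [--no-quant] [--no-compile]
planned_stages() {
  model=$1
  shift
  run_export=yes; run_optimize=yes; run_quant=yes; run_compile=yes

  # An existing .onnx file skips the export stage.
  case "$model" in
    *.onnx) run_export=no ;;
  esac

  # Each --no-* flag suppresses one stage without touching the config.
  for flag in "$@"; do
    case "$flag" in
      --no-optimize) run_optimize=no ;;
      --no-quant)    run_quant=no ;;
      --no-compile)  run_compile=no ;;
    esac
  done

  stages=""
  if [ "$run_export" = yes ];   then stages="$stages export";   fi
  if [ "$run_optimize" = yes ]; then stages="$stages optimize"; fi
  if [ "$run_quant" = yes ];    then stages="$stages quantize"; fi
  if [ "$run_compile" = yes ];  then stages="$stages compile";  fi

  # Drop the leading space left by the concatenation above.
  echo "${stages# }"
}
```

Calling planned_stages resnet50.onnx prints the three post-export stages, while adding --no-quant --no-compile to a Hugging Face model ID leaves only export and optimize.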

Reproducible CI/CD builds

The config file is a portable, self-contained pipeline specification. Check it into source control and invoke winml build -c config.json in CI to produce identical artifacts without manual flag management. Set "auto": false in the config to disable the autoconf discovery loop for fully deterministic output.
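
A CI job could guard against an accidentally enabled autoconf loop before building. The sketch below is an assumption-heavy text check: check_deterministic is a hypothetical helper, and it matches "auto": false anywhere in the file because the exact position of that key in the config schema is not specified here.

```shell
# Hypothetical CI guard (not part of winml-cli): succeeds only if the config
# file contains "auto": false. This is a plain text match, not a schema-aware
# JSON check, so it also matches the key when it appears in a nested object.
check_deterministic() {
  grep -Eq '"auto"[[:space:]]*:[[:space:]]*false' "$1"
}

# Possible use in a CI step, reusing the flags from the examples on this page:
#   check_deterministic config.json && \
#     winml build -c config.json -m microsoft/resnet-50 -o output/
```

A JSON-aware tool would be more robust than grep; the text match is used here only to keep the sketch dependency-free.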

Examples

# Full pipeline: HF model → export → optimize → quantize → compile
winml build -c config.json -m microsoft/resnet-50 -o output/

winml build
  Config:     config.json
  Model:      microsoft/resnet-50
  Output:     output/

  export       done  (28.3s)
  optimize     done  (4.1s)
  quantize     done  (6.8s)
  compile      done  (14.2s)

  Build complete in 53.4s
  Final artifact: output/resnet50_ctx.onnx

# Start from a pre-exported ONNX file (skips export stage)
winml build -c config.json -m resnet50.onnx -o output/

# Export and optimize only: skip quantization and compilation for quick testing
winml build -c config.json -m bert-base-uncased -o output/ \
  --no-quant --no-compile

# Force a clean rebuild, overwriting any cached artifacts
winml build -c config.json -m facebook/convnext-tiny-224 -o output/ --rebuild

# Use the global cache and cap optimizer iterations for faster turnaround
winml build -c config.json -m microsoft/resnet-50 \
  --use-cache --max-optim-iterations 1

Common pitfalls

  • Either --output-dir or --use-cache is required; they are mutually exclusive. Omitting both raises an error immediately.
  • --use-cache is not supported in module mode. When the config is a JSON array (module mode), only --output-dir is accepted.
  • The config file must come from winml config. The schema is strict; unknown keys are rejected.
  • Existing artifacts are reused by default. Pass --rebuild to force a fresh run after changing the config.
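
The first two pitfalls can be caught before invoking the tool with a pre-flight check along these lines. check_build_args is a hypothetical helper, the error messages are illustrative rather than the ones winml prints, and module mode is detected by assuming the config's first non-whitespace character is [ when it is a JSON array.

```shell
# Hypothetical pre-flight check (not part of winml-cli) for the output rules above.
# Usage: check_build_args CONFIG OUTPUT_DIR USE_CACHE
#   OUTPUT_DIR  directory path, or "" if --output-dir is not passed
#   USE_CACHE   "yes" if --use-cache is passed, otherwise "no"
check_build_args() {
  config=$1
  output_dir=$2
  use_cache=$3

  if [ -n "$output_dir" ] && [ "$use_cache" = yes ]; then
    echo "error: --output-dir and --use-cache are mutually exclusive"
    return 1
  fi
  if [ -z "$output_dir" ] && [ "$use_cache" != yes ]; then
    echo "error: pass either --output-dir or --use-cache"
    return 1
  fi

  # Module mode: the config's top-level JSON value is an array.
  first_char=$(tr -d ' \t\r\n' < "$config" | cut -c1)
  if [ "$first_char" = "[" ] && [ "$use_cache" = yes ]; then
    echo "error: module mode accepts only --output-dir"
    return 1
  fi

  echo ok
}
```

The check prints ok and returns 0 when the combination is allowed, so it can gate the real command with &&.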

See also