winml build

Run the entire winml-cli pipeline (export → optimize → quantize → compile) in one command.

When to use this

Use winml build when you want to go from a Hugging Face model ID (or an existing .onnx file) to a deployment-ready artifact in a single invocation, without manually chaining winml export, winml optimize, winml quantize, and winml compile. A build config file, generated by winml config, controls every stage of the pipeline; if --config is omitted, a config is auto-generated from the model passed with -m.

Synopsis

$ winml build [options]

Flags

Flag	Short	Type	Default	Description
`--config`	`-c`	path	`None`	`WinMLBuildConfig` JSON file, generated by `winml config`. If omitted, config is auto-generated from `-m`.
`--model`	`-m`	string	`None`	Hugging Face model ID or path to an existing `.onnx` file.
`--output-dir`	`-o`	path	`None`	Directory for all build artifacts. Mutually exclusive with `--use-cache`.
`--use-cache/--no-use-cache`		flag	`false`	Store artifacts in the winml-cli global cache (`~/.cache/winml/`). Mutually exclusive with `--output-dir`.
`--rebuild/--no-rebuild`		flag	`false`	Overwrite existing artifacts and re-run the full pipeline.
`--quant/--no-quant`		flag	`true`	Run the quantization stage (use `--no-quant` to skip), overriding the config.
`--no-compile` / `--compile`		flag	`None`	Override compilation. `--compile` forces enable (config must have a compile section). `--no-compile` forces skip. Default: inherit from config.
`--optimize/--no-optimize`		flag	`true`	Run the optimization stage (use `--no-optimize` to skip).
`--ep`		string	`None`	Target execution provider for the analyzer (e.g., `qnn`). Falls back to the compile config EP if not set.
`--device`	`-d`	string	`auto`	Target device for the analyzer (e.g., `npu`, `gpu`). Default: `auto` (auto-detect).
`--analyze/--no-analyze`		flag	`true`	Run the analyzer loop during build (use `--no-analyze` to skip).
`--max-optim-iterations`		integer	`None`	Maximum number of autoconf re-optimization rounds. When unset, a limit of 3 is enforced internally. `--no-analyze` implicitly sets this to 0.
`--trust-remote-code/--no-trust-remote-code`		flag	`false`	Allow executing custom code from model repositories. Use only with trusted sources.
`--allow-unsupported-nodes/--no-allow-unsupported-nodes`		flag	`false`	Allow unsupported nodes to remain in the graph instead of failing the build.
`--help`	`-h`	flag		Show this message and exit.

How it works

winml build reads a WinMLBuildConfig JSON file (from winml config) that encodes device, precision, export, quantization, and compilation settings. When -m is a Hugging Face model ID, the full pipeline runs: export → optimize → quantize → compile. When -m points to an existing .onnx file, the export stage is skipped and the pipeline starts at optimization.

After compilation, an optional analyzer loop re-evaluates graph quality and applies further passes. --max-optim-iterations caps the number of rounds, and --no-analyze disables the loop for a deterministic single-pass build. Individual stages can be suppressed with --no-optimize, --no-quant, and --no-compile without touching the config file.
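The stage selection described above can be sketched as a small shell function. This illustrates the documented behavior only and is not how winml-cli is implemented: planned_stages is a hypothetical helper, the .onnx suffix check is an assumption about how a local model is recognized, and the sketch assumes the config enables compilation (the real default for compile is inherited from the config).

```shell
# Hypothetical helper: prints the stages that the description above says will run.
# Usage: planned_stages MODEL [--no-optimize] [--no-quant] [--no-compile]
planned_stages() {
  local model="$1"
  shift
  local optimize=1 quant=1 compile=1 flag stages=""

  for flag in "$@"; do
    case "$flag" in
      --no-optimize) optimize=0 ;;
      --no-quant)    quant=0 ;;
      --no-compile)  compile=0 ;;
    esac
  done

  # Assumption: an existing ONNX model is identified by its .onnx suffix.
  case "$model" in
    *.onnx) ;;                   # export is skipped
    *)      stages="export" ;;   # Hugging Face model ID: export runs first
  esac

  # Append each enabled stage, separated by single spaces.
  if [ "$optimize" -eq 1 ]; then stages="${stages:+$stages }optimize"; fi
  if [ "$quant" -eq 1 ];    then stages="${stages:+$stages }quantize"; fi
  if [ "$compile" -eq 1 ];  then stages="${stages:+$stages }compile"; fi

  echo "$stages"
}
```

For example, planned_stages resnet50.onnx --no-quant prints the stages that remain once export and quantization are excluded.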

Reproducible CI/CD builds

The config file is a portable, self-contained pipeline specification. Check it into source control and invoke winml build -c config.json in CI to produce identical artifacts without manual flag management. Set "auto": false in the config to disable the autoconf discovery loop for fully deterministic output.
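A CI step built on this might look like the sketch below. The function name ci_build, its argument order, and the decision to pass --rebuild and --no-analyze are illustrative assumptions; only the flags themselves are taken from the flag table. If the config already sets "auto": false, whether --no-analyze is still needed is not something this sketch can confirm.

```shell
# Hypothetical CI step: build from a checked-in config and fail the job on error.
# Usage: ci_build CONFIG MODEL OUTPUT_DIR
ci_build() {
  local config="$1" model="$2" out_dir="$3"

  # Fail early with a clear message if the config was not checked out.
  if [ ! -f "$config" ]; then
    echo "config not found: $config" >&2
    return 1
  fi

  # --rebuild avoids reusing artifacts left over from an earlier run;
  # --no-analyze skips the analyzer loop for a single-pass build.
  winml build -c "$config" -m "$model" -o "$out_dir" --rebuild --no-analyze
}
```

A pipeline script would then call something like ci_build config.json microsoft/resnet-50 output/ and rely on the function's exit status to pass or fail the job.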

Examples

# Full pipeline: HF model → export → optimize → quantize → compile
winml build -c config.json -m microsoft/resnet-50 -o output/

Sample output:

winml build
  Config:     config.json
  Model:      microsoft/resnet-50
  Output:     output/

  export       done  (28.3s)
  optimize     done  (4.1s)
  quantize     done  (6.8s)
  compile      done  (14.2s)

  Build complete in 53.4s
  Final artifact: output/resnet50_ctx.onnx

# Start from a pre-exported ONNX file (skips export stage)
winml build -c config.json -m resnet50.onnx -o output/

# Export and optimize only — skip quantization and compilation for quick testing
winml build -c config.json -m bert-base-uncased -o output/ \
  --no-quant --no-compile

# Force a clean rebuild, overwriting any cached artifacts
winml build -c config.json -m facebook/convnext-tiny-224 -o output/ --rebuild

# Use the global cache and cap optimizer iterations for faster turnaround
winml build -c config.json -m microsoft/resnet-50 \
  --use-cache --max-optim-iterations 1

Common pitfalls

Either --output-dir or --use-cache is required; they are mutually exclusive. Omitting both raises an error immediately.
--use-cache is not supported in module mode. When the config is a JSON array (module mode), only --output-dir is accepted.
The config file must come from winml config. The schema is strict; unknown keys are rejected.
Existing artifacts are reused by default. Pass --rebuild to force a fresh run after changing the config.
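
The first pitfall can be caught in a wrapper script before winml build is invoked. The check below is a sketch: check_output_flags is a hypothetical name, its error messages and exit code are not winml-cli's own, and it only recognizes the space-separated flag forms shown in the flag table.

```shell
# Hypothetical pre-flight check: exactly one of --output-dir/-o or --use-cache
# must appear in the argument list.
check_output_flags() {
  local has_out=0 has_cache=0 arg

  for arg in "$@"; do
    case "$arg" in
      -o|--output-dir) has_out=1 ;;
      --use-cache)     has_cache=1 ;;
    esac
  done

  if [ "$has_out" -eq 1 ] && [ "$has_cache" -eq 1 ]; then
    echo "error: --output-dir and --use-cache are mutually exclusive" >&2
    return 2
  fi
  if [ "$has_out" -eq 0 ] && [ "$has_cache" -eq 0 ]; then
    echo "error: pass either --output-dir or --use-cache" >&2
    return 2
  fi
  return 0
}
```

In a wrapper script this could be used as check_output_flags "$@" && winml build "$@", so that the build only starts when the output location is unambiguous.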