winml build
Run the entire winml-cli pipeline (export → optimize → quantize → compile) in one command.
When to use this
Use winml build when you want to go from a Hugging Face model ID (or an
existing .onnx file) to a deployment-ready artifact in a single invocation,
without manually chaining winml export, winml optimize, winml quantize,
and winml compile. A build config file — generated by winml config — controls every
stage of the pipeline.
Synopsis
Flags
| Flag | Short | Type | Default | Description |
|---|---|---|---|---|
| --config | -c | path | None | WinMLBuildConfig JSON file, generated by winml config. If omitted, config is auto-generated from -m. |
| --model | -m | string | None | Hugging Face model ID or path to an existing .onnx file. |
| --output-dir | -o | path | None | Directory for all build artifacts. Mutually exclusive with --use-cache. |
| --use-cache/--no-use-cache | | flag | false | Store artifacts in the winml-cli global cache (~/.cache/winml/). Mutually exclusive with --output-dir. |
| --rebuild/--no-rebuild | | flag | false | Overwrite existing artifacts and re-run the full pipeline. |
| --quant/--no-quant | | flag | true | Run the quantization stage (use --no-quant to skip), overriding the config. |
| --no-compile / --compile | | flag | None | Override compilation. --compile forces enable (config must have a compile section). --no-compile forces skip. Default: inherit from config. |
| --optimize/--no-optimize | | flag | true | Run the optimization stage (use --no-optimize to skip). |
| --ep | | string | None | Target execution provider for the analyzer (e.g., qnn). Falls back to the compile config EP if not set. |
| --device | -d | string | auto | Target device for the analyzer (e.g., npu, gpu). Default: auto (auto-detect). |
| --analyze/--no-analyze | | flag | true | Run the analyzer loop during build (use --no-analyze to skip). |
| --max-optim-iterations | | integer | None | Maximum autoconf re-optimization rounds (3 enforced internally when not set). --no-analyze implicitly sets this to 0. |
| --trust-remote-code/--no-trust-remote-code | | flag | false | Allow executing custom code from model repositories. Use only with trusted sources. |
| --allow-unsupported-nodes/--no-allow-unsupported-nodes | | flag | false | Allow unsupported nodes to remain in the graph instead of failing the build. |
| --help | -h | flag | | Show this message and exit. |
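The interaction between --analyze and --max-optim-iterations described in the table can be summarized in a few lines of shell. This is only an illustrative sketch of the documented defaults, not winml-cli's own code; the function name effective_iterations and its argument convention are invented for this example.

```bash
#!/usr/bin/env bash
# Illustrative model of the documented defaults; NOT winml-cli's implementation.
# effective_iterations REQUESTED MODE
#   REQUESTED: value passed to --max-optim-iterations, or "" if the flag is unset
#   MODE:      "analyze" (default) or "no-analyze"
effective_iterations() {
  local requested="${1:-}" mode="${2:-analyze}"
  # --no-analyze implicitly sets the iteration count to 0.
  if [ "$mode" = "no-analyze" ]; then
    echo 0
    return 0
  fi
  # When the flag is not set, 3 rounds are enforced internally.
  echo "${requested:-3}"
}
```

For example, `effective_iterations "" analyze` models a build with neither flag given, and `effective_iterations 5 no-analyze` models passing both flags together.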
How it works
winml build reads a WinMLBuildConfig JSON file (from winml config) that
encodes device, precision, export, quantization, and compilation settings.
When -m is a Hugging Face model ID, the full pipeline runs: export → optimize
→ quantize → compile. When -m points to an existing .onnx file, the export
stage is skipped and the pipeline starts at optimization. After compilation, an
optional analyzer loop (--max-optim-iterations) re-evaluates graph quality
and applies further passes; --no-analyze disables it for a deterministic
single-pass build. Individual stages can be suppressed with --no-quant,
--no-compile, and --no-optimize without touching the config file.
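The stage selection described above can be sketched as a small shell function. It is a simplified model under stated assumptions, not the tool's actual logic: it assumes the config enables every stage, it ignores the analyzer loop, and the name plan_stages is hypothetical.

```bash
#!/usr/bin/env bash
# Illustrative model of which stages run; NOT winml-cli's implementation.
# Assumes the config itself enables optimize, quantize, and compile.
# plan_stages MODEL [--no-optimize] [--no-quant] [--no-compile]
plan_stages() {
  local model="$1"
  shift
  local do_export=1 do_optimize=1 do_quant=1 do_compile=1

  # An existing .onnx file skips export; the pipeline starts at optimization.
  case "$model" in
    *.onnx) do_export=0 ;;
  esac

  # Command-line overrides suppress individual stages.
  local flag
  for flag in "$@"; do
    case "$flag" in
      --no-optimize) do_optimize=0 ;;
      --no-quant)    do_quant=0 ;;
      --no-compile)  do_compile=0 ;;
    esac
  done

  local stages=""
  if [ "$do_export" -eq 1 ];   then stages="$stages export";   fi
  if [ "$do_optimize" -eq 1 ]; then stages="$stages optimize"; fi
  if [ "$do_quant" -eq 1 ];    then stages="$stages quantize"; fi
  if [ "$do_compile" -eq 1 ];  then stages="$stages compile";  fi
  echo "${stages# }"   # strip the leading space
}
```

Calling `plan_stages resnet50.onnx` prints the stages that would remain for a pre-exported model, and `plan_stages bert-base-uncased --no-quant --no-compile` models an export-and-optimize-only run.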
Reproducible CI/CD builds
The config file is a portable, self-contained pipeline specification. Check it into source control and invoke winml build -c config.json in CI to produce identical artifacts without manual flag management. Set "auto": false in the config to disable the autoconf discovery loop for fully deterministic output.
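One way to enforce this in a CI job is a thin wrapper that refuses to build unless the checked-in config pins "auto": false. The following is a sketch under assumptions: the ci_build function and the WINML override variable are hypothetical helpers for this example (not winml-cli features), and the grep is a crude textual check that assumes the key appears literally as "auto": false somewhere in the file.

```bash
#!/usr/bin/env bash
# CI wrapper sketch; ci_build and WINML are hypothetical, not part of winml-cli.
# ci_build CONFIG OUTPUT_DIR
ci_build() {
  local config="$1" outdir="$2"

  if [ ! -f "$config" ]; then
    echo "config not found: $config" >&2
    return 2
  fi

  # Crude textual check (assumption): the key is written literally as "auto": false.
  if ! grep -Eq '"auto"[[:space:]]*:[[:space:]]*false' "$config"; then
    echo "refusing to build: \"auto\": false is not set in $config" >&2
    return 3
  fi

  # WINML lets a test substitute another command; it defaults to the real CLI.
  "${WINML:-winml}" build -c "$config" -o "$outdir"
}
```

In a pipeline step this would be called as `ci_build config.json output/`. A JSON-aware check would be more robust than grep if the schema nests the key.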
Examples
# Full pipeline: HF model → export → optimize → quantize → compile
winml build -c config.json -m microsoft/resnet-50 -o output/
winml build
Config: config.json
Model: microsoft/resnet-50
Output: output/
export done (28.3s)
optimize done (4.1s)
quantize done (6.8s)
compile done (14.2s)
Build complete in 53.4s
Final artifact: output/resnet50_ctx.onnx
# Start from a pre-exported ONNX file (skips export stage)
winml build -c config.json -m resnet50.onnx -o output/
# Export and optimize only — skip quantization and compilation for quick testing
winml build -c config.json -m bert-base-uncased -o output/ \
--no-quant --no-compile
# Force a clean rebuild, overwriting any cached artifacts
winml build -c config.json -m facebook/convnext-tiny-224 -o output/ --rebuild
# Use the global cache and cap optimizer iterations for faster turnaround
winml build -c config.json -m microsoft/resnet-50 \
--use-cache --max-optim-iterations 1
Common pitfalls
- Either --output-dir or --use-cache is required; they are mutually exclusive. Omitting both raises an error immediately.
- --use-cache is not supported in module mode. When the config is a JSON array (module mode), only --output-dir is accepted.
- The config file must come from winml config. The schema is strict; unknown keys are rejected.
- Existing artifacts are reused by default. Pass --rebuild to force a fresh run after changing the config.
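The first pitfall can be caught before the CLI is even invoked. The function below is a minimal pre-flight sketch for wrapper scripts; check_output_flags is a hypothetical helper, it only recognizes the flag spellings listed in the Flags table, and it does not cover the module-mode restriction.

```bash
#!/usr/bin/env bash
# Pre-flight sketch; check_output_flags is hypothetical, not part of winml-cli.
# Succeeds only if exactly one of --output-dir/-o or --use-cache is present.
check_output_flags() {
  local has_output=0 has_cache=0 arg
  for arg in "$@"; do
    case "$arg" in
      -o|--output-dir) has_output=1 ;;
      --use-cache)     has_cache=1 ;;
    esac
  done

  if [ "$has_output" -eq 1 ] && [ "$has_cache" -eq 1 ]; then
    echo "error: --output-dir and --use-cache are mutually exclusive" >&2
    return 1
  fi
  if [ "$has_output" -eq 0 ] && [ "$has_cache" -eq 0 ]; then
    echo "error: pass either --output-dir or --use-cache" >&2
    return 1
  fi
  return 0
}
```

A wrapper script could then run `check_output_flags "$@" && winml build "$@"`.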
See also
- winml export
- winml compile
- Config and build
- How it works
- Config Schema — full field-by-field config reference
- Output Layout — what each output file contains
- Supported Models — validated model architectures