Output Layout
When you run winml build, the tool writes all artifacts to the output
directory. This page documents what each file is and which ones you need
for deployment.
Directory Structure
After a full pipeline run (export → optimize → quantize → compile):
output/
├── model.onnx ← FINAL artifact (deploy this)
├── model.onnx.data ← External weights (if model ≥ 100 MiB)
├── winml_build_config.json ← Persisted build config
├── analyze_result.json ← Static analysis (EP compatibility)
├── build_manifest.json ← Build provenance (Python API only)
├── export_htp_metadata.json ← HTP export metadata (hierarchy info)
├── export.onnx ← Intermediate: raw ONNX export
├── export.onnx.data
├── optimized.onnx ← Intermediate: after graph optimization
├── optimized.onnx.data
├── quantized.onnx ← Intermediate: after QDQ insertion
├── quantized.onnx.data
├── compiled.onnx ← Intermediate: after EP compilation
└── compiled.onnx.data
File Categories
Final Artifacts (Keep for Deployment)
| File | Purpose |
|---|---|
| model.onnx | The deployment-ready model. Always present. |
| model.onnx.data | External weight data (only if model ≥ 100 MiB). Must stay alongside model.onnx. |
| winml_build_config.json | The complete pipeline config used for this build (includes auto-discovered optimization flags). This file is a reproducible pipeline specification: check it into version control or feed it directly to winml build -c in a CI/CD pipeline to guarantee identical model processing across machines and runs (set "auto": false for fully deterministic builds). |
| analyze_result.json | Static analysis output: EP compatibility, operator classification, detected patterns. |
| build_manifest.json | Build provenance with stage timings. Only generated via the Python API (build_hf_model/build_onnx_model). |
| export_htp_metadata.json | HTP export metadata: module hierarchy, tracing info, tagging coverage. |
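As a rough illustration of the CI/CD use described for winml_build_config.json, the sketch below pins a persisted config for a deterministic rebuild. It assumes that "auto" is a top-level key of the config file, which this page does not confirm, and pin_config is a hypothetical helper name. Only the winml build -c invocation is taken from the text above.

```python
import json
from pathlib import Path


def pin_config(config_path: str) -> list[str]:
    """Set "auto" to false in a persisted config and return the rebuild command."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # ASSUMPTION: "auto" lives at the top level of the config.
    # Check the config schema reference before relying on this.
    config["auto"] = False
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    # The command is returned rather than executed, so the caller decides how to run it.
    return ["winml", "build", "-c", str(path)]
```

A CI job could then call subprocess.run(pin_config("output/winml_build_config.json"), check=True) so that a failed build also fails the job.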
Intermediate Files (Can Delete After Build)
| File | Stage | Contents |
|---|---|---|
| export.onnx | Export | Raw PyTorch → ONNX conversion (float32) |
| optimized.onnx | Optimize | Graph with fused operators, shape inference applied |
| quantized.onnx | Quantize | QDQ nodes inserted, calibrated scales |
| compiled.onnx | Compile | EPContext binary embedded or sidecar |
Each intermediate has a corresponding .onnx.data file if the model is
100 MiB or larger.
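Since the intermediates can be deleted once the build has finished, a small cleanup helper may be convenient. The following is a minimal sketch using only the file names listed in the table above; remove_intermediates is a hypothetical helper and not part of winml-cli.

```python
from pathlib import Path

# Intermediate file stems, taken from the table above.
INTERMEDIATE_STEMS = ("export", "optimized", "quantized", "compiled")


def remove_intermediates(output_dir: str, dry_run: bool = True) -> list[str]:
    """Delete intermediate .onnx / .onnx.data files and return the affected names.

    With dry_run=True (the default) nothing is deleted; the names are only listed.
    """
    affected = []
    for stem in INTERMEDIATE_STEMS:
        for suffix in (".onnx", ".onnx.data"):
            path = Path(output_dir) / f"{stem}{suffix}"
            if path.is_file():
                if not dry_run:
                    path.unlink()
                affected.append(path.name)
    return affected
```

The helper never touches model.onnx, model.onnx.data, or the JSON files, so the deployment artifacts stay intact. Running it once with dry_run=True first shows what would be removed.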
What Gets Written at Each Stage
Export only (winml export)
Optimize only (winml optimize)
Full build (winml build)
Each stage writes its intermediate file, and model.onnx is a copy of the last
successful stage's output. If you skip quantization (--no-quant), the final
model is a copy of optimized.onnx. If you skip compilation too, it's still
a copy of optimized.onnx.
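To check which intermediate a given model.onnx was copied from, one option is to compare file hashes. The sketch below assumes the copy is byte-identical, which may not hold when external data is used, because the .onnx file then references its data file by name. source_stage is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large models are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def source_stage(output_dir: str):
    """Return the intermediate file whose bytes match model.onnx, or None."""
    out = Path(output_dir)
    final = out / "model.onnx"
    if not final.is_file():
        return None
    final_hash = sha256_of(final)
    # Check the latest pipeline stage first.
    for name in ("compiled.onnx", "quantized.onnx", "optimized.onnx", "export.onnx"):
        candidate = out / name
        if candidate.is_file() and sha256_of(candidate) == final_hash:
            return name
    return None
```

A None result does not prove that something went wrong; it only means that no intermediate in the directory is byte-identical to the final model.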
External Data
Models of 100 MiB or larger store weights in a separate .onnx.data file.
Both files must be kept together, because the .onnx file contains a reference
to the data file by name.
| Model Size | Files |
|---|---|
| < 100 MiB | model.onnx only (weights embedded) |
| ≥ 100 MiB | model.onnx + model.onnx.data |
Warning
If you move model.onnx, always move model.onnx.data alongside it.
The ONNX file references the data file by relative path.
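One way to avoid separating the two files is to always move them through a single helper. This is a minimal sketch that assumes the data file sits next to the model and is named by appending .data to the model file name, as in the listings above. move_model is a hypothetical name.

```python
import shutil
from pathlib import Path


def move_model(model_path: str, dest_dir: str) -> list[Path]:
    """Move an .onnx file and, if present, its sibling .onnx.data file."""
    src = Path(model_path)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    # model.onnx -> model.onnx.data: the data file name is the model name plus ".data".
    for path in (src, src.with_name(src.name + ".data")):
        if path.is_file():
            moved.append(Path(shutil.move(str(path), str(dest / path.name))))
    return moved
```

The helper keeps the original file names on purpose: since the model refers to its data file by name, renaming either file would likely break that reference.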
Analyzer Result
analyze_result.json contains the static analysis output from the build pipeline's
analyze stage. It reports EP compatibility and operator classification:
{
  "analysis_timestamp": "2026-06-04T19:45:17.496169",
  "metadata": {
    "model_path": "iter.onnx",
    "opset_version": 17,
    "producer_name": "pytorch",
    "producer_version": "2.12.0",
    "total_operators": 122,
    "operator_counts": {
      "Conv": 53,
      "Relu": 49,
      "MaxPool": 1,
      "Add": 16,
      "GlobalAveragePool": 1,
      "Flatten": 1,
      "Gemm": 1
    },
    "unique_operator_types": 7,
    "detected_pattern_count": {}
  },
  "results": [
    {
      "ihv_type": "Microsoft",
      "ep_type": "CPUExecutionProvider",
      "device_type": "cpu",
      "runtime_support": false,
      "has_errors": false,
      "has_warnings": false,
      "classification": {
        "supported": [],
        "partial": [],
        "unsupported": [],
        "unknown": [
          "OP/ai.onnx/Conv",
          "OP/ai.onnx/Relu",
          "OP/ai.onnx/MaxPool",
          "OP/ai.onnx/Add",
          "OP/ai.onnx/GlobalAveragePool",
          "OP/ai.onnx/Flatten",
          "OP/ai.onnx/Gemm"
        ]
      },
      "information": []
    }
  ]
}
Key fields:
| Field | Description |
|---|---|
| metadata.total_operators | Total ONNX operator nodes in the model graph |
| metadata.operator_counts | Frequency of each operator type |
| metadata.detected_pattern_count | Fused subgraph patterns (GeLU, LayerNorm, etc.) |
| results[].ihv_type | Hardware vendor ("Microsoft", "QC", "Intel", etc.) |
| results[].runtime_support | true if the EP can run all operators |
| results[].classification | Operators grouped by support level: supported, partial, unsupported, unknown |
| results[].has_errors | true if unsupported ops exist (model won't run on that EP) |
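The fields above are enough to build a per-EP summary, for example to gate a CI job on has_errors. The sketch below relies only on the keys shown in the sample; summarize_analysis is a hypothetical helper, and real files may contain fields not covered here.

```python
import json

SUPPORT_LEVELS = ("supported", "partial", "unsupported", "unknown")


def summarize_analysis(path: str) -> list[dict]:
    """Return one summary dict per entry in results[]."""
    with open(path, encoding="utf-8") as f:
        report = json.load(f)
    summary = []
    for result in report.get("results", []):
        classification = result.get("classification", {})
        summary.append({
            "ep_type": result.get("ep_type"),
            "runtime_support": result.get("runtime_support"),
            "has_errors": result.get("has_errors"),
            # Number of operator types at each support level.
            "counts": {level: len(classification.get(level, []))
                       for level in SUPPORT_LEVELS},
        })
    return summary
```

For the sample file shown earlier, the single CPUExecutionProvider entry would report seven operator types under unknown and none under the other levels.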
Build Manifest
build_manifest.json records provenance for builds run through the Python API:
{
  "schema_version": 1,
  "model_id": "microsoft/resnet-50",
  "task": "image-classification",
  "cache_key": "a1b2c3d4e5f6",
  "config_hash": "f7e8d9c0b1a2",
  "timestamp": "2026-01-15T10:30:00.000000+00:00",
  "elapsed_seconds": 45.1,
  "final_artifact": "model.onnx",
  "analyze_iterations": 2,
  "analyze_unsupported_node_count": 0,
  "analyze_details": { "lint": {}, "autoconf": {} },
  "stages": [
    {
      "name": "export",
      "status": "completed",
      "filename": "export.onnx",
      "elapsed_seconds": 12.5
    },
    {
      "name": "optimize",
      "status": "completed",
      "filename": "optimized.onnx",
      "elapsed_seconds": 8.2
    },
    {
      "name": "quantize",
      "status": "completed",
      "filename": "quantized.onnx",
      "elapsed_seconds": 15.3,
      "nodes_quantized": 150,
      "nodes_skipped": 12
    },
    {
      "name": "compile",
      "status": "completed",
      "filename": "compiled.onnx",
      "elapsed_seconds": 9.1
    }
  ]
}
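The stage timings in the manifest lend themselves to a quick report, for instance to spot the slowest stage. The following sketch uses only the keys shown in the sample above. "completed" is the only status value documented on this page, so the handling of any other status is an assumption, and stage_report is a hypothetical helper.

```python
import json


def stage_report(path: str) -> dict:
    """Summarize per-stage timings from a build_manifest.json file."""
    with open(path, encoding="utf-8") as f:
        manifest = json.load(f)
    stages = manifest.get("stages", [])
    timings = {s["name"]: s.get("elapsed_seconds", 0.0) for s in stages}
    # ASSUMPTION: any status other than "completed" means the stage produced no output.
    completed = [s for s in stages if s.get("status") == "completed"]
    return {
        "timings": timings,
        "slowest": max(timings, key=timings.get) if timings else None,
        "last_completed_file": completed[-1]["filename"] if completed else None,
    }
```

For the sample manifest, the slowest stage is quantize at 15.3 seconds and the last completed file is compiled.onnx.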
Rebuild Behavior
- If model.onnx already exists and rebuild=False (default), the build is skipped entirely.
- Pass --rebuild (CLI) or force_rebuild=True (Python API) to force a fresh build.
- On rebuild, all old .onnx and .onnx.data files are deleted before the pipeline runs.
See also
- winml build — build command reference
- Reference — Config Schema — config file format
- How winml-cli Works — pipeline stages explained