Output Layout

When you run winml build, the tool writes all artifacts to the output directory. This page documents what each file is and which ones you need for deployment.

Directory Structure

After a full pipeline run (export → optimize → quantize → compile):

output/
├── model.onnx                  ← FINAL artifact (deploy this)
├── model.onnx.data             ← External weights (if model ≥ 100 MiB)
├── winml_build_config.json     ← Persisted build config
├── analyze_result.json         ← Static analysis (EP compatibility)
├── build_manifest.json         ← Build provenance (Python API only)
├── export_htp_metadata.json    ← HTP export metadata (hierarchy info)
├── export.onnx                 ← Intermediate: raw ONNX export
├── export.onnx.data
├── optimized.onnx              ← Intermediate: after graph optimization
├── optimized.onnx.data
├── quantized.onnx              ← Intermediate: after QDQ insertion
├── quantized.onnx.data
├── compiled.onnx               ← Intermediate: after EP compilation
└── compiled.onnx.data

File Categories

Final Artifacts (Keep for Deployment)

File	Purpose
`model.onnx`	The deployment-ready model. Always present.
`model.onnx.data`	External weight data (only if model ≥ 100 MiB). Must stay alongside `model.onnx`.
`winml_build_config.json`	The complete pipeline config used for this build, including auto-discovered optimization flags. It doubles as a reproducible pipeline specification: check it into version control, or pass it to `winml build -c` in a CI/CD pipeline, to get identical model processing across machines and runs. Set `"auto": false` for fully deterministic builds.
`analyze_result.json`	Static analysis output: EP compatibility, operator classification, detected patterns.
`build_manifest.json`	Build provenance with stage timings. Only generated via the Python API (`build_hf_model`/`build_onnx_model`).
`export_htp_metadata.json`	HTP export metadata: module hierarchy, tracing info, tagging coverage.

Intermediate Files (Can Delete After Build)

File	Stage	Contents
`export.onnx`	Export	Raw PyTorch → ONNX conversion (float32)
`optimized.onnx`	Optimize	Graph with fused operators, shape inference applied
`quantized.onnx`	Quantize	QDQ nodes inserted, calibrated scales
`compiled.onnx`	Compile	EPContext binary embedded or sidecar

Each intermediate has a corresponding .onnx.data file if the model is 100 MiB or larger.
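As a rough illustration of cleaning up after a build, the sketch below removes only the four intermediate files listed above and their .onnx.data companions. The function name `remove_intermediates` and the `dry_run` flag are hypothetical helpers for this example, not part of the winml tooling.

```python
from pathlib import Path

# Intermediate stems taken from the table above; all other files are left alone.
INTERMEDIATE_STEMS = ("export", "optimized", "quantized", "compiled")


def remove_intermediates(output_dir: str, dry_run: bool = True) -> list[Path]:
    """Return the intermediate files found; delete them unless dry_run is set."""
    out = Path(output_dir)
    targets = []
    for stem in INTERMEDIATE_STEMS:
        for suffix in (".onnx", ".onnx.data"):
            candidate = out / f"{stem}{suffix}"
            if candidate.is_file():
                targets.append(candidate)
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

Calling `remove_intermediates("output")` first, with the default `dry_run=True`, lists what would be deleted without touching model.onnx, model.onnx.data, or any of the JSON files.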

What Gets Written at Each Stage

Export only (`winml export`)

output/
├── export.onnx
└── export.onnx.data          (if ≥ 100 MiB)

Optimize only (`winml optimize`)

output/
├── optimized.onnx
└── optimized.onnx.data

Full build (`winml build`)

Each stage writes its own intermediate file, and model.onnx is a copy of the last successful stage's output. If you skip quantization (--no-quant), the final model is a copy of optimized.onnx; skipping compilation as well does not change that.
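One way to check which stage produced the final model is to compare file hashes. The sketch below assumes the copy is byte-identical to the stage output, which this page does not state explicitly; `source_of_final_model` is a hypothetical helper. For models with external data the .onnx files may differ, since each one references its own data file by name, in which case the function returns `None`.

```python
import hashlib
from pathlib import Path
from typing import Optional

# Stage outputs in reverse pipeline order (latest stage first).
STAGE_FILES = ("compiled.onnx", "quantized.onnx", "optimized.onnx", "export.onnx")


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large models are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def source_of_final_model(output_dir: str) -> Optional[str]:
    """Return the intermediate whose bytes match model.onnx, or None if none match."""
    out = Path(output_dir)
    final_hash = sha256_of(out / "model.onnx")
    for name in STAGE_FILES:
        candidate = out / name
        if candidate.is_file() and sha256_of(candidate) == final_hash:
            return name
    return None
```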

External Data

Models of 100 MiB or more store their weights in a separate .onnx.data file. Both files must be kept together, because the .onnx file references the data file by name.

Model Size	Files
< 100 MiB	`model.onnx` only (weights embedded)
≥ 100 MiB	`model.onnx` + `model.onnx.data`

Warning

If you move model.onnx, always move model.onnx.data alongside it. The ONNX file references the data file by relative path.
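A minimal sketch of moving a model together with its data file, so the relative reference keeps resolving. `move_model` is a hypothetical helper written for this example; it relies only on the naming convention described above (the data file is the model's file name plus `.data`).

```python
import shutil
from pathlib import Path


def move_model(model_path: str, dest_dir: str) -> list[Path]:
    """Move an .onnx file and, if present, its sibling .onnx.data file."""
    src = Path(model_path)
    if not src.is_file():
        raise FileNotFoundError(src)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    # The data file sits next to the model and is named "<model file name>.data".
    for path in (src, src.with_name(src.name + ".data")):
        if path.is_file():
            target = dest / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

Both files keep their original names, so the name stored inside the .onnx file stays valid in the new directory.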

Analyzer Result

analyze_result.json contains the static analysis output from the build pipeline's analyze stage. It reports EP compatibility and operator classification:

{
  "analysis_timestamp": "2026-06-04T19:45:17.496169",
  "metadata": {
    "model_path": "iter.onnx",
    "opset_version": 17,
    "producer_name": "pytorch",
    "producer_version": "2.12.0",
    "total_operators": 122,
    "operator_counts": {
      "Conv": 53,
      "Relu": 49,
      "MaxPool": 1,
      "Add": 16,
      "GlobalAveragePool": 1,
      "Flatten": 1,
      "Gemm": 1
    },
    "unique_operator_types": 7,
    "detected_pattern_count": {}
  },
  "results": [
    {
      "ihv_type": "Microsoft",
      "ep_type": "CPUExecutionProvider",
      "device_type": "cpu",
      "runtime_support": false,
      "has_errors": false,
      "has_warnings": false,
      "classification": {
        "supported": [],
        "partial": [],
        "unsupported": [],
        "unknown": [
          "OP/ai.onnx/Conv",
          "OP/ai.onnx/Relu",
          "OP/ai.onnx/MaxPool",
          "OP/ai.onnx/Add",
          "OP/ai.onnx/GlobalAveragePool",
          "OP/ai.onnx/Flatten",
          "OP/ai.onnx/Gemm"
        ]
      },
      "information": []
    }
  ]
}

Key fields:

Field	Description
`metadata.total_operators`	Total ONNX operator nodes in the model graph
`metadata.operator_counts`	Frequency of each operator type
`metadata.detected_pattern_count`	Count of each detected fused subgraph pattern (GeLU, LayerNorm, etc.)
`results[].ihv_type`	Hardware vendor (`"Microsoft"`, `"QC"`, `"Intel"`, etc.)
`results[].runtime_support`	`true` if the EP can run all operators
`results[].classification`	Operators grouped by support level: `supported`, `partial`, `unsupported`, `unknown`
`results[].has_errors`	`true` if unsupported ops exist (model won't run on that EP)
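The fields above can be read with the standard `json` module. The following sketch condenses each entry of `results` into one row per EP; `summarize_analysis` is a hypothetical helper, and it assumes only the keys shown in the example file.

```python
import json
from pathlib import Path

SUPPORT_LEVELS = ("supported", "partial", "unsupported", "unknown")


def summarize_analysis(path: str) -> list[dict]:
    """Build one summary row per execution provider in analyze_result.json."""
    report = json.loads(Path(path).read_text(encoding="utf-8"))
    rows = []
    for result in report.get("results", []):
        classes = result.get("classification", {})
        rows.append({
            "ep_type": result.get("ep_type"),
            "runtime_support": result.get("runtime_support"),
            "has_errors": result.get("has_errors"),
            # Number of operator types at each support level.
            "counts": {level: len(classes.get(level, [])) for level in SUPPORT_LEVELS},
        })
    return rows
```

For the example file above, the single row would report `CPUExecutionProvider` with `runtime_support` set to `False` and all seven operator types counted under `unknown`.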

Build Manifest

build_manifest.json records provenance for each build run through the Python API (the CLI does not generate it):

{
  "schema_version": 1,
  "model_id": "microsoft/resnet-50",
  "task": "image-classification",
  "cache_key": "a1b2c3d4e5f6",
  "config_hash": "f7e8d9c0b1a2",
  "timestamp": "2026-01-15T10:30:00.000000+00:00",
  "elapsed_seconds": 45.1,
  "final_artifact": "model.onnx",
  "analyze_iterations": 2,
  "analyze_unsupported_node_count": 0,
  "analyze_details": { "lint": {}, "autoconf": {} },
  "stages": [
    {
      "name": "export",
      "status": "completed",
      "filename": "export.onnx",
      "elapsed_seconds": 12.5
    },
    {
      "name": "optimize",
      "status": "completed",
      "filename": "optimized.onnx",
      "elapsed_seconds": 8.2
    },
    {
      "name": "quantize",
      "status": "completed",
      "filename": "quantized.onnx",
      "elapsed_seconds": 15.3,
      "nodes_quantized": 150,
      "nodes_skipped": 12
    },
    {
      "name": "compile",
      "status": "completed",
      "filename": "compiled.onnx",
      "elapsed_seconds": 9.1
    }
  ]
}
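As an illustration of using the manifest, the sketch below extracts per-stage timings and picks the slowest stage. The helper names `stage_timings` and `slowest_stage` are hypothetical; the code assumes only the `stages[].name` and `stages[].elapsed_seconds` keys shown in the example.

```python
import json
from pathlib import Path


def stage_timings(path: str) -> dict[str, float]:
    """Map each stage name in build_manifest.json to its elapsed seconds."""
    manifest = json.loads(Path(path).read_text(encoding="utf-8"))
    return {
        stage["name"]: float(stage.get("elapsed_seconds", 0.0))
        for stage in manifest.get("stages", [])
    }


def slowest_stage(path: str) -> tuple[str, float]:
    """Return (name, seconds) for the stage that took the longest."""
    timings = stage_timings(path)
    name = max(timings, key=timings.get)
    return name, timings[name]
```

With the example manifest above, the slowest stage is `quantize` at 15.3 seconds, and the four stage timings add up to the top-level `elapsed_seconds` of 45.1.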

Rebuild Behavior

If model.onnx already exists and rebuild=False (default), the build is skipped entirely.
Pass --rebuild (CLI) or force_rebuild=True (Python API) to force a fresh build.
On rebuild, all old .onnx and .onnx.data files are deleted before the pipeline runs.
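The three rules above can be mimicked in a few lines. This is only a sketch of the described behavior, not the tool's actual implementation; `prepare_output_dir` and its `rebuild` parameter are hypothetical names used for this example.

```python
from pathlib import Path


def prepare_output_dir(output_dir: str, rebuild: bool = False) -> bool:
    """Return True if the pipeline should run, False if the build is skipped."""
    out = Path(output_dir)
    # An existing final artifact short-circuits the build unless a rebuild is forced.
    if (out / "model.onnx").is_file() and not rebuild:
        return False
    if rebuild:
        # Remove stale models and external-data files; JSON files are left in place.
        for path in list(out.glob("*.onnx")) + list(out.glob("*.onnx.data")):
            path.unlink()
    return True
```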