
Reference — Config Schema

This page documents the full schema for WinMLBuildConfig, the JSON configuration file that drives the winml-cli pipeline. Generate a config with winml config, then pass it to any command with -c config.json.

Every pipeline command accepts the config, not just winml build. For example, winml export -c config.json, winml quantize -c config.json, and winml compile -c config.json each read the relevant section of the same file, so a single config can act as the source of truth across all stages.

Top-Level Structure

```json
{
  "loader":  { ... },
  "export":  { ... },
  "optim":   { ... },
  "quant":   { ... },
  "compile": { ... },
  "eval":    { ... },
  "auto":    true
}
```

Setting quant or compile to null skips that pipeline stage entirely. When auto is true (the default), winml-cli auto-configures downstream stages based on the target device and precision. See The auto field below for how this drives graph optimization.
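The following fragment loads, exports, and optimizes a model but stops before quantization and compilation. It is a sketch that relies on the documented defaults for every field it omits. Because quant and compile are null, those two stages are skipped. eval is left out, so its null default also skips evaluation.

```json
{
  "loader":  { "task": "image-classification" },
  "export":  { "opset_version": 17, "batch_size": 1 },
  "optim":   { "layer_norm_fusion": true },
  "quant":   null,
  "compile": null
}
```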


loader — Model Loading

| Field | Type | Default | Description |
|---|---|---|---|
| task | str \| null | null | HuggingFace task (e.g., image-classification). Auto-detected if omitted. |
| model_class | str \| null | null | Override the model class (e.g., AutoModelForCTC). |
| model_type | str \| null | null | HuggingFace model type (e.g., bert, resnet). |
| module_path | str \| null | null | Dotted path to a submodule for targeted export. |
| user_script | str \| null | null | Path to a script that defines a custom model class. |
| trust_remote_code | bool | false | Trust remote code from HuggingFace. |
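This sketch shows a loader section that sets the task and model class explicitly instead of relying on auto-detection. The task and model_type values are assumptions chosen for a wav2vec2-style speech model with a CTC head. Substitute the HuggingFace identifiers that match your model.

```json
{
  "loader": {
    "task": "automatic-speech-recognition",
    "model_type": "wav2vec2",
    "model_class": "AutoModelForCTC",
    "trust_remote_code": false
  }
}
```

Leave trust_remote_code at false unless the model repository needs custom code and you trust its source.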

export — ONNX Export

| Field | Type | Default | Description |
|---|---|---|---|
| opset_version | int | 17 | ONNX opset version. |
| batch_size | int | 1 | Static batch size. Use 1 for QNN compatibility. |
| input_tensors | list[InputTensorSpec] \| null | null | Input tensor specifications. Auto-inferred if omitted. |
| output_tensors | list[OutputTensorSpec] \| null | null | Output tensor specifications. |
| dynamic_axes | dict \| null | null | Dynamic axes mapping. ⚠️ Breaks MatMulAddFusion on QNN. |
| export_params | bool | true | Include model parameters (weights) in the ONNX file. |
| do_constant_folding | bool | true | Fold constants during export. |
| verbose | bool | false | Verbose export logging. |
| dynamo | bool | false | Use the PyTorch 2.x Dynamo-based exporter. |
| enable_hierarchy_tags | bool | true | Add module hierarchy tags to ONNX nodes. |
| clean_onnx | bool | false | Strip hierarchy tags after export. |
| hierarchy_tag_format | "full" \| "module_only" | "full" | Tag detail level. |
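The hierarchy-tag fields work together. enable_hierarchy_tags adds the tags during export, and clean_onnx strips them again afterwards. The fragment below keeps the less detailed "module_only" tags in the exported model. It is an illustrative sketch, and setting clean_onnx to true instead would remove the tags after export.

```json
{
  "export": {
    "opset_version": 17,
    "batch_size": 1,
    "enable_hierarchy_tags": true,
    "hierarchy_tag_format": "module_only",
    "clean_onnx": false
  }
}
```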

InputTensorSpec:

| Field | Type | Description |
|---|---|---|
| name | str \| null | Tensor name (e.g., pixel_values). |
| dtype | str \| null | Data type (e.g., float32, int64). |
| shape | list[int] \| null | Tensor shape (e.g., [1, 3, 224, 224]). |
| value_range | [float, float] \| null | Minimum and maximum values used when generating dummy tensors. |
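If auto-inference does not pick up the right inputs, you can declare input_tensors explicitly. This sketch declares a single image input using the example values from the table above, with a leading dimension that matches batch_size. The value_range of [0.0, 1.0] is an assumption suited to normalized pixel data, and it only affects the dummy tensors that winml-cli generates.

```json
{
  "export": {
    "opset_version": 17,
    "batch_size": 1,
    "input_tensors": [
      {
        "name": "pixel_values",
        "dtype": "float32",
        "shape": [1, 3, 224, 224],
        "value_range": [0.0, 1.0]
      }
    ]
  }
}
```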

optim — Graph Optimization

A dictionary of boolean fusion flags. All default to false unless auto-configured.

| Field | Type | Description |
|---|---|---|
| gelu_fusion | bool | Fuse GeLU activation patterns. |
| layer_norm_fusion | bool | Fuse LayerNorm patterns. |
| matmul_add_fusion | bool | Fuse MatMul + Add (enables BiasGelu). |

Additional fusion flags can be added as key-value pairs.
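The export table notes that dynamic_axes breaks MatMulAddFusion on QNN. If you must export with dynamic axes, one option is to leave matmul_add_fusion off and keep the other fusions, as in this sketch. Whether the remaining fusions actually apply depends on the patterns in your graph. When auto is true, the autoconf loop may still propose additional flags, so pair this section with "auto": false if the flag set must stay exactly as written.

```json
{
  "optim": {
    "gelu_fusion": true,
    "layer_norm_fusion": true,
    "matmul_add_fusion": false
  },
  "auto": false
}
```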


quant — Quantization

Set to null to skip quantization.

| Field | Type | Default | Description |
|---|---|---|---|
| mode | "qdq" \| "static" \| "dynamic" | "qdq" | Quantization mode. |
| weight_type | "uint8" \| "int8" \| "uint16" \| "int16" | "uint8" | Weight data type. |
| activation_type | "uint8" \| "int8" \| "uint16" \| "int16" | "uint8" | Activation data type. |
| calibration_method | "minmax" \| "entropy" \| "percentile" | "minmax" | Scale computation method. |
| samples | int | 10 | Number of calibration samples. |
| per_channel | bool | false | Quantize weights per channel rather than per tensor. |
| symmetric | bool | false | Use symmetric quantization. |
| task | str \| null | null | Task for dataset-aware calibration. |
| model_name | str \| null | null | Model ID for calibration dataset resolution. |
| dataset_name | str \| null | null | Override the calibration dataset. |
| distribution | str | "uniform" | Random distribution used to generate dummy calibration data. |
| seed | int \| null | null | Random seed for reproducibility. |
| calibration_load_path | str \| null | null | Path to load pre-computed calibration scales from. |
| calibration_save_path | str \| null | null | Path to save computed calibration scales to. |
| op_types_to_quantize | list[str] \| null | null | Operator types to quantize (all if null). |
| nodes_to_exclude | list[str] \| null | null | Names of nodes to leave unquantized. |
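This sketch shows a higher-precision calibration setup built only from values listed in the table. It uses 8-bit weights with 16-bit activations, which we assume corresponds to the w8a16 precision label used by eval. It also uses entropy calibration over more samples, a fixed seed, and quantization restricted to Conv and MatMul nodes. The sample count, seed, and operator list are illustrative. Check that your target EP supports the chosen weight/activation combination.

```json
{
  "quant": {
    "mode": "qdq",
    "weight_type": "uint8",
    "activation_type": "uint16",
    "calibration_method": "entropy",
    "samples": 100,
    "seed": 0,
    "op_types_to_quantize": ["Conv", "MatMul"]
  }
}
```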

compile — EP Compilation

Set to null to skip compilation.

| Field | Type | Default | Description |
|---|---|---|---|
| ep_config.provider | str | "qnn" | Execution provider (EP) alias: qnn, cpu, dml, openvino, tensorrt, vitisai, migraphx. |
| ep_config.device | str | "auto" | Target device: npu, gpu, cpu, auto. |
| ep_config.enable_ep_context | bool | true | Generate an EPContext model. |
| ep_config.embed_context | bool | false | Embed the compiled context binary in the ONNX file (true) or write it to an external .bin file (false). |
| ep_config.compiler | str | "ort" | Compiler backend: ort or qairt. |
| ep_config.provider_options | dict | {} | EP-specific options. |
| ep_config.qnn_sdk_root | str \| null | null | QNN SDK root path, used by the qairt compiler backend. |
| validate | bool | true | Validate the compiled model. |
| verbose | bool | false | Verbose compilation logging. |
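The compile fields nest under ep_config, as in the full example later on this page. This sketch switches to the qairt compiler backend and embeds the context binary, so the output is a single ONNX file rather than an .onnx plus .bin pair. The qnn_sdk_root value is a placeholder for your local QNN SDK installation path, not a real default.

```json
{
  "compile": {
    "ep_config": {
      "provider": "qnn",
      "device": "npu",
      "compiler": "qairt",
      "qnn_sdk_root": "C:/path/to/qnn-sdk",
      "enable_ep_context": true,
      "embed_context": true
    },
    "validate": true
  }
}
```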

eval — Evaluation

Set to null (default) to skip evaluation.

| Field | Type | Default | Description |
|---|---|---|---|
| model_id | str \| null | null | HuggingFace model ID for config resolution. |
| model_path | str \| dict[str, str] \| null | null | Path to .onnx file, or a {role: path} dict for composite models. |
| task | str \| null | null | Task type. |
| device | str | "auto" | Inference device. |
| precision | str | "auto" | Precision (fp32, fp16, w8a16, etc.). |
| ep | str \| null | null | EP override. |
| dataset.path | str \| null | null | HuggingFace dataset path. |
| dataset.name | str \| null | null | Dataset config name. |
| dataset.split | str | "validation" | Dataset split. |
| dataset.samples | int | 100 | Evaluation sample count. |
| dataset.shuffle | bool | true | Shuffle before sampling. |
| dataset.seed | int | 42 | Random seed. |
| output_path | str \| null | null | Path for JSON results output. |
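By analogy with ep_config in compile, the dotted dataset.* names presumably map to a nested dataset object. The sketch below makes that assumption. The model ID, model path, and output path are illustrative placeholders. dataset.path and dataset.name are omitted, so their null defaults apply.

```json
{
  "eval": {
    "model_id": "microsoft/resnet-50",
    "model_path": "output/model.onnx",
    "task": "image-classification",
    "device": "auto",
    "precision": "auto",
    "dataset": {
      "split": "validation",
      "samples": 500,
      "shuffle": true,
      "seed": 42
    },
    "output_path": "output/eval_results.json"
  }
}
```

For a composite model, model_path would instead be a {role: path} object. The role names depend on the model.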

Example: Full Config

```json
{
  "loader": {
    "task": "image-classification",
    "model_type": "resnet"
  },
  "export": {
    "opset_version": 17,
    "batch_size": 1
  },
  "optim": {
    "gelu_fusion": true,
    "layer_norm_fusion": true,
    "matmul_add_fusion": true
  },
  "quant": {
    "mode": "qdq",
    "weight_type": "uint8",
    "activation_type": "uint8",
    "samples": 10,
    "calibration_method": "minmax"
  },
  "compile": {
    "ep_config": {
      "provider": "qnn",
      "device": "npu",
      "enable_ep_context": true,
      "embed_context": false
    },
    "validate": true
  },
  "auto": true
}
```

The auto field

The top-level "auto" field (default: true) controls whether the build pipeline runs the autoconf loop — an iterative analyze → discover → re-optimize cycle that automatically detects which additional graph optimizations the model needs for the target EP.

| Value | Behavior |
|---|---|
| true (default) | After initial optimization, the analyzer inspects the graph for unsupported or sub-optimal nodes and proposes additional optimization flags. The pipeline re-optimizes using the discovered flags and repeats (up to --max-optim-iterations, default 3). The final optimization result depends on what the analyzer discovers at runtime, so outputs may vary if the model or EP support changes between runs. |
| false | The pipeline applies only the explicit optim flags from the config: no autoconf discovery, no re-optimization loop. Builds are fully deterministic given the same config and input model. Use this for reproducible CI builds or when you have already tuned the optimization flags manually. |

When auto is true and the autoconf loop discovers additional flags, the final persisted config (written to the output directory) includes the merged result so you can inspect what was discovered.
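This sketch targets a reproducible CI build. It turns autoconf off and pins the optimization flags explicitly. The flag values are illustrative. In practice you might copy them from the persisted config of an earlier auto build, since that file records the merged result of discovery. The quant seed fixes the random dummy calibration data between runs.

```json
{
  "loader": { "task": "image-classification" },
  "optim": {
    "gelu_fusion": true,
    "layer_norm_fusion": true,
    "matmul_add_fusion": true
  },
  "quant": { "mode": "qdq", "seed": 0 },
  "auto": false
}
```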

See also