# ONNX

[ONNX](https://onnx.ai/) is an open graph format to represent machine learning models. [ONNX Runtime](https://onnxruntime.ai/docs/) is a cross-platform machine-learning model accelerator, with a flexible interface to integrate hardware-specific libraries.

## Model Conversion

The `OnnxConversion` pass converts PyTorch models to ONNX using [torch.onnx](https://pytorch.org/docs/stable/onnx.html). Please refer to [OnnxConversion](onnx_conversion) for more details about the pass and its config parameters.

If you want to convert an existing ONNX model to another target opset, you can use the [OnnxOpVersionConversion](onnx_op_version_conversion) pass, which takes a similar configuration:

### Example Configuration

```json
{
    "type": "OnnxConversion",
    "target_opset": 13
},
{
    "type": "OnnxOpVersionConversion",
    "target_opset": 14
}
```

For generative models, the alternative conversion pass [ModelBuilder](model_builder), which integrates the [ONNX Runtime Generative AI](https://github.com/microsoft/onnxruntime-genai) module, can be used. Please refer to [ModelBuilder](model_builder) for more details about the pass and its config parameters.

### Example Configuration

```json
{
    "type": "ModelBuilder",
    "precision": "int4"
}
```

## Float16 Conversion

Converting a model to use Float16 instead of Float32 can decrease the model size and improve performance on some GPUs. The `OnnxFloatToFloat16` pass uses the [float16 converter from onnxruntime](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/float16.py), which converts most nodes/operators to use Float16 instead of Float32.

Conversion to Float16 is often exposed at multiple stages of optimization, including model conversion and transformer optimization. This stand-alone pass is best suited for models that are not transformer architectures, where fusions may rely on specific data types in node patterns.

### Example Configuration

a. The most basic configuration, which is suitable for many models, leaves all configuration options set to their default values:

```json
{
    "type": "OnnxFloatToFloat16"
}
```

b. More fine-grained control of the conversion conditions is also possible:

```json
{
    "type": "OnnxFloatToFloat16",
    // Don't convert input/output nodes to Float16
    "keep_io_types": true
}
```

See [Float16 Conversion](https://onnxruntime.ai/docs/performance/model-optimizations/float16.html#float16-conversion) for a more detailed description of the available configuration parameters.
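For reference, the converter that this pass wraps can also be invoked directly. Below is a minimal sketch, assuming the installed `onnxruntime` package exposes the linked `float16.py` script as `onnxruntime.transformers.float16`; the keyword argument mirrors the `keep_io_types` option shown above and is illustrative rather than authoritative.

```python
# Minimal sketch: calling ONNX Runtime's float16 converter directly.
# Assumes the installed onnxruntime package exposes the linked float16.py
# script as onnxruntime.transformers.float16.
import onnx
from onnxruntime.transformers.float16 import convert_float_to_float16

model = onnx.load("model.onnx")

# Convert most Float32 tensors/ops to Float16, but keep the graph
# inputs and outputs in Float32 (mirrors "keep_io_types": true).
fp16_model = convert_float_to_float16(model, keep_io_types=True)

onnx.save(fp16_model, "model_fp16.onnx")
```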
## Inputs/Outputs Float16 to Float32 Conversion

Certain environments, such as ONNX Runtime WebGPU, prefer Float32 logits. The `OnnxIOFloat16ToFloat32` pass converts the model's inputs and outputs to use Float32 instead of Float16.

### Example Configuration

a. The most basic configuration, which is suitable for many models, leaves all configuration options set to their default values:

```json
{
    "type": "OnnxIOFloat16ToFloat32"
}
```

## Mixed Precision Conversion

If float16 conversion gives poor results, you can convert most of the ops to float16 but leave some in float32. The `OrtMixedPrecision` pass finds a minimal set of ops to skip while retaining a certain level of accuracy. The default value for `op_block_list` is `["SimplifiedLayerNormalization", "SkipSimplifiedLayerNormalization", "Relu", "Add"]`.

### Example Configuration

a. The most basic configuration, which is suitable for many models, leaves all configuration options set to their default values:

```json
{
    "type": "OrtMixedPrecision"
}
```

b. More fine-grained control of the conversion conditions is also possible:

```json
{
    "type": "OrtMixedPrecision",
    "op_block_list": [
        "Add",
        "LayerNormalization",
        "SkipLayerNormalization",
        "FastGelu",
        "EmbedLayerNormalization"
    ]
}
```

## Convert dynamic shape to fixed shape

In QNN, SNPE, and other mobile inference scenarios, the input shape of the model is often fixed. The `DynamicToFixedShape` pass converts the dynamic shapes of the model to fixed shapes. For example, models often have a dynamic batch size so that training is more efficient, but in mobile scenarios the batch size is generally 1. Making the batch size dimension fixed by setting it to 1 may allow NNAPI and CoreML to run the model. The pass can be used to update specific dimensions, or the entire input shape.

### Example Configuration

a. Making a symbolic dimension fixed:

```json
{
    "type": "DynamicToFixedShape",
    "input_dim": ["batch_size"],
    "dim_value": [1]
}
```

b. Making the entire input shape fixed:

```json
{
    "type": "DynamicToFixedShape",
    "input_name": ["input"],
    "input_shape": [[1, 3, 224, 224]]
}
```

Note: `input_dim` and `dim_value` must have the same length, as must `input_name` and `input_shape`. Also, the `input_dim`/`dim_value` and `input_name`/`input_shape` pairs are mutually exclusive: you cannot specify both of them at the same time.

More details about the pass and its config parameters can be found [here](https://onnxruntime.ai/docs/tutorials/mobile/helpers/make-dynamic-shape-fixed.html).
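The pass builds on ONNX Runtime's shape-fixing helpers, which can also be used directly. Below is a minimal sketch, assuming the helper functions live in `onnxruntime.tools.onnx_model_utils` as described on the page linked above; treat the exact names as illustrative.

```python
# Minimal sketch: pinning a symbolic dimension with ONNX Runtime's helpers.
# Assumes onnxruntime.tools.onnx_model_utils provides these functions, as
# described in the make-dynamic-shape-fixed page linked above.
import onnx
from onnxruntime.tools.onnx_model_utils import (
    fix_output_shapes,
    make_dim_param_fixed,
)

model = onnx.load("model.onnx")

# Pin the symbolic "batch_size" dimension to 1, mirroring configuration (a).
make_dim_param_fixed(model.graph, "batch_size", 1)

# Propagate the now-fixed dimension to the model outputs.
fix_output_shapes(model)

onnx.save(model, "model_fixed.onnx")
```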