Run Olive workflows#
The Olive run command allows you to execute any of the 40+ optimizations available in Olive in a sequence you define in a YAML/JSON file called a workflow.
Quickstart#
In this quickstart, you’ll execute the following Olive workflow:
Llama-3.2-1B-Instruct (from Hugging Face) → OnnxConversion → IncDynamicQuantization → OrtSessionParamsTuning → ZipFile
The input to the workflow is the Llama-3.2-1B-Instruct model from Hugging Face. The workflow consists of the following passes (steps):
1. Convert the model into the ONNX format using the OnnxConversion pass.
2. Quantize the model using the IncDynamicQuantization pass (Intel® Neural Compressor dynamic quantization).
3. Optimize the ONNX Runtime inference settings using the OrtSessionParamsTuning pass.
The output of the workflow is a Zip file containing the ONNX model and ORT configuration settings.
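Once the workflow has been defined and run (see the steps below), the extracted archive can be loaded directly with ONNX Runtime. The following is a minimal sketch in Python; the archive name OutputModel.zip and the model file name model.onnx are assumptions, since the actual file names depend on the packaging and output settings:

import zipfile

import onnxruntime as ort

# Extract the packaged workflow output. "OutputModel.zip" is an assumed
# name; check your output directory for the actual archive.
with zipfile.ZipFile("OutputModel.zip") as archive:
    archive.extractall("output_model")

# Load the optimized model with ONNX Runtime on CPU, matching the
# accelerator configured in the workflow. "model.onnx" is illustrative.
session = ort.InferenceSession(
    "output_model/model.onnx",
    providers=["CPUExecutionProvider"],
)
print([inp.name for inp in session.get_inputs()])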
Define the workflow in a YAML file#
First, define the “quickstart workflow” in a YAML file:
# quickstart-workflow.yaml
input_model:
  type: HfModel
  model_path: meta-llama/Llama-3.2-1B-Instruct
systems:
  local_system:
    type: LocalSystem
    accelerators:
      - device: cpu
        execution_providers:
          - CPUExecutionProvider
data_configs:
  - name: transformer_token_dummy_data
    type: TransformersTokenDummyDataContainer
passes:
  # Pass 1: convert the Hugging Face model to ONNX
  conversion:
    type: OnnxConversion
    target_opset: 16
    save_as_external_data: true
    all_tensors_to_one_file: true
    save_metadata_for_token_generation: true
  # Pass 2: dynamic quantization with Intel Neural Compressor
  quantize:
    type: IncDynamicQuantization
  # Pass 3: tune ONNX Runtime session parameters
  session_params_tuning:
    type: OrtSessionParamsTuning
    data_config: transformer_token_dummy_data
    io_bind: true
packaging_config:
  - type: Zipfile
    name: OutputModel
log_severity_level: 0
host: local_system
target: local_system
cache_dir: cache
output_dir: null
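Because the workflow is plain YAML, it can also be inspected or tweaked programmatically before running it. Here is a minimal sketch using PyYAML; the library choice and the modified opset value are illustrative, not part of the quickstart:

import yaml

# Load the workflow definition from the file above.
with open("quickstart-workflow.yaml") as f:
    workflow = yaml.safe_load(f)

# List the configured passes (conversion, quantize, session_params_tuning).
print(list(workflow["passes"]))

# Example tweak: raise the ONNX opset used by the conversion pass.
workflow["passes"]["conversion"]["target_opset"] = 17

with open("quickstart-workflow-modified.yaml", "w") as f:
    yaml.safe_dump(workflow, f, sort_keys=False)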
Run the workflow#
The workflow is executed using the run command:
olive run --config quickstart-workflow.yaml
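Olive also exposes a Python entry point for running workflows. The following is a minimal sketch; depending on your Olive version, you may need to supply the configuration as a JSON file or a Python dict rather than a YAML path:

from olive.workflows import run as olive_run

# Run the same workflow programmatically. The return value describes the
# produced output models; see the Olive documentation for details.
workflow_output = olive_run("quickstart-workflow.yaml")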