Command Line Tools#

Olive provides command line tools that can be invoked using the olive command.

Run#

Run the Olive workflow defined in the input JSON configuration file.

usage: olive run [-h] --run-config RUN_CONFIG [--list_required_packages]
                 [--tempdir TEMPDIR] [--package-config PACKAGE_CONFIG]
                 [--log_level LOG_LEVEL] [-m MODEL_NAME_OR_PATH] [-t TASK]
                 [--trust_remote_code] [-a ADAPTER_PATH]
                 [--model_script MODEL_SCRIPT] [--script_dir SCRIPT_DIR]
                 [-o OUTPUT_PATH]
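
For example, a minimal invocation, assuming the workflow is defined in a file named config.json (the file name is illustrative):

    olive run --run-config config.json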

Named Arguments#

--run-config, --config

Path to json config file

--list_required_packages

List packages required to run the workflow

Default: False

--tempdir

Root directory for tempfile directories and files

--package-config

For advanced users. Path to an optional package (JSON) config file with the location of individual pass module implementations and their corresponding dependencies. The configuration may also include user-owned/proprietary/private pass implementations.

--log_level

Logging level. Default is None. level 0: DEBUG, 1: INFO, 2: WARNING, 3: ERROR, 4: CRITICAL

Model options (not required)#

-m, --model_name_or_path

Path to the input model. See https://microsoft.github.io/Olive/reference/cli.html#providing-input-models for more information.

-t, --task

Task for which the huggingface model is used. Default task is text-generation-with-past.

--trust_remote_code

Trust remote code when loading a huggingface model.

Default: False

-a, --adapter_path

Path to the adapter weights saved after peft fine-tuning. Local folder or huggingface id.

--model_script

The script file containing the model definition. Required for the local PyTorch model.

--script_dir

The directory containing the local PyTorch model script file. See https://microsoft.github.io/Olive/reference/cli.html#model-script-file-information for more information.

-o, --output_path

Path to save the command output.

Optimize#

Optimize input model (supports HuggingFace, ONNX, PyTorch, and Azure ML models).

usage: olive optimize [-h] [-m MODEL_NAME_OR_PATH] [-t TASK]
                      [--trust_remote_code] [-a ADAPTER_PATH]
                      [--model_script MODEL_SCRIPT] [--script_dir SCRIPT_DIR]
                      [-o OUTPUT_PATH]
                      [--provider {CPUExecutionProvider,CUDAExecutionProvider,QNNExecutionProvider,VitisAIExecutionProvider,OpenVINOExecutionProvider,WebGpuExecutionProvider,NvTensorRTRTXExecutionProvider}]
                      [--device {cpu,gpu,npu}]
                      [--precision {int4,int8,int16,int32,uint4,uint8,uint16,uint32,fp16,fp32,bf16}]
                      [--act_precision {int8,uint8,int16,uint16}]
                      [--num_split NUM_SPLIT] [--memory MEMORY]
                      [--exporter {model_builder,dynamo_exporter,torchscript_exporter,optimum_exporter}]
                      [--dim_param DIM_PARAM] [--dim_value DIM_VALUE]
                      [--use_qdq_format] [--surgeries [SURGERIES ...]]
                      [--block_size BLOCK_SIZE] [--modality {text}]
                      [--enable_aot] [--qnn_env_path QNN_ENV_PATH]
                      [--extra_mb_options EXTRA_MB_OPTIONS]
                      [--log_level LOG_LEVEL] [--save_config_file] [--dry_run]
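
For example, a minimal invocation that optimizes a HuggingFace model for CPU at int4 precision (model name and output folder shown for illustration):

    olive optimize -m microsoft/Phi-3-mini-4k-instruct --device cpu --precision int4 -o optimized-model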

Named Arguments#

-m, --model_name_or_path

Path to the input model. See https://microsoft.github.io/Olive/reference/cli.html#providing-input-models for more information.

-t, --task

Task for which the huggingface model is used. Default task is text-generation-with-past.

--trust_remote_code

Trust remote code when loading a huggingface model.

Default: False

-a, --adapter_path

Path to the adapter weights saved after peft fine-tuning. Local folder or huggingface id.

--model_script

The script file containing the model definition. Required for the local PyTorch model.

--script_dir

The directory containing the local PyTorch model script file. See https://microsoft.github.io/Olive/reference/cli.html#model-script-file-information for more information.

-o, --output_path

Path to save the command output.

Default: optimized-model

--provider

Possible choices: CPUExecutionProvider, CUDAExecutionProvider, QNNExecutionProvider, VitisAIExecutionProvider, OpenVINOExecutionProvider, WebGpuExecutionProvider, NvTensorRTRTXExecutionProvider

Execution provider (EP) to use for optimization.

Default: 'CPUExecutionProvider'

--device

Possible choices: cpu, gpu, npu

Target device for optimization.

--precision

Possible choices: int4, int8, int16, int32, uint4, uint8, uint16, uint32, fp16, fp32, bf16

Target precision for optimization.

Default: 'fp32'

--act_precision

Possible choices: int8, uint8, int16, uint16

Activation precision for quantization (optional).

--num_split

Number of splits for model splitting (optional).

--memory

Available device memory in MB (optional).

--exporter

Possible choices: model_builder, dynamo_exporter, torchscript_exporter, optimum_exporter

Exporter to use for model conversion (optional).

--dim_param

Dynamic parameter names for dynamic to fixed shape conversion (optional).

--dim_value

Fixed dimension values for dynamic to fixed shape conversion (optional).

--use_qdq_format

Use QDQ format for quantization instead of QOperator format.

Default: False

--surgeries

List of graph surgeries to apply (optional).

--block_size

Block size for quantization. Use -1 for per-channel quantization (optional).

--modality

Possible choices: text

Model modality for optimization. Only ‘text’ is currently supported.

Default: 'text'

--enable_aot

Enable Ahead-of-Time (AOT) compilation.

Default: False

--qnn_env_path

Path to QNN environment directory (required when using AOT with QNN).

--extra_mb_options

Extra key-value pairs options to pass to the model builder. e.g., ‘int4_is_symmetric=true,int4_op_types_to_quantize=MatMul/Gemm’.

--log_level

Logging level. Default is 3. level 0: DEBUG, 1: INFO, 2: WARNING, 3: ERROR, 4: CRITICAL

Default: 3

--save_config_file

Generate and save the config file for the command.

Default: False

--dry_run

Enable dry run mode. This will not perform any actual optimization but will validate the configuration.

Default: False

Quantization#

Quantize a PyTorch or ONNX model using various quantization algorithms.

usage: olive quantize [-h] [-m MODEL_NAME_OR_PATH] [-t TASK]
                      [--trust_remote_code] [-a ADAPTER_PATH]
                      [--model_script MODEL_SCRIPT] [--script_dir SCRIPT_DIR]
                      [-o OUTPUT_PATH]
                      [--algorithm {awq,gptq,hqq,rtn,spinquant,quarot,lpbq,seqmse,adaround}]
                      [--precision {int4,int8,int16,int32,uint4,uint8,uint16,uint32,fp4,fp8,fp16,fp32,nf4,bf16}]
                      [--act_precision {int4,int8,int16,int32,uint4,uint8,uint16,uint32,fp4,fp8,fp16,fp32,nf4,bf16}]
                      [--implementation IMPLEMENTATION] [--use_qdq_encoding]
                      [-d DATA_NAME] [--subset SUBSET] [--split SPLIT]
                      [--data_files DATA_FILES]
                      [--text_field TEXT_FIELD | --text_template TEXT_TEMPLATE | --use_chat_template]
                      [--max_seq_len MAX_SEQ_LEN]
                      [--add_special_tokens ADD_SPECIAL_TOKENS]
                      [--max_samples MAX_SAMPLES] [--batch_size BATCH_SIZE]
                      [--input_cols INPUT_COLS [INPUT_COLS ...]]
                      [--account_name ACCOUNT_NAME]
                      [--container_name CONTAINER_NAME]
                      [--log_level LOG_LEVEL] [--save_config_file] [--dry_run]
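
For example, a minimal invocation using the default RTN algorithm (model name shown for illustration; data-dependent algorithms such as awq or gptq typically also need the dataset options below):

    olive quantize -m microsoft/Phi-3-mini-4k-instruct --algorithm rtn --precision int4 -o quantized-model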

Named Arguments#

-m, --model_name_or_path

Path to the input model. See https://microsoft.github.io/Olive/reference/cli.html#providing-input-models for more information.

-t, --task

Task for which the huggingface model is used. Default task is text-generation-with-past.

--trust_remote_code

Trust remote code when loading a huggingface model.

Default: False

-a, --adapter_path

Path to the adapter weights saved after peft fine-tuning. Local folder or huggingface id.

--model_script

The script file containing the model definition. Required for the local PyTorch model.

--script_dir

The directory containing the local PyTorch model script file. See https://microsoft.github.io/Olive/reference/cli.html#model-script-file-information for more information.

-o, --output_path

Path to save the command output.

Default: quantized-model

--algorithm

Possible choices: awq, gptq, hqq, rtn, spinquant, quarot, lpbq, seqmse, adaround

List of quantization algorithms to run.

Default: 'rtn'

--precision

Possible choices: int4, int8, int16, int32, uint4, uint8, uint16, uint32, fp4, fp8, fp16, fp32, nf4, bf16

The precision of the quantized model.

Default: 'int8'

--act_precision

Possible choices: int4, int8, int16, int32, uint4, uint8, uint16, uint32, fp4, fp8, fp16, fp32, nf4, bf16

The precision of the activation quantization for static quantization.

Default: 'int8'

--implementation

The specific implementation of quantization algorithms to use.

Default: 'olive'

--use_qdq_encoding

Use QDQ encoding in ONNX model for the quantized nodes.

Default: False

-d, --data_name

The dataset name.

--subset

The subset of the dataset to use.

--split

The dataset split to use.

--data_files

The dataset files. If multiple files, separate by comma.

--text_field

The text field to use for fine-tuning.

--text_template

Template to generate text field from. E.g. ‘### Question: {prompt}\n### Answer: {response}’

--use_chat_template

Use chat template for text field.

Default: False

--max_seq_len

Maximum sequence length for the data.

Default: 1024

--add_special_tokens

Whether to add special tokens during preprocessing.

Default: False

--max_samples

Maximum samples to select from the dataset.

Default: 256

--batch_size

Batch size.

Default: 1

--input_cols

List of input column names. Provide one or more names separated by spaces. Example: --input_cols sentence1 sentence2

--account_name

Azure storage account name for shared cache.

--container_name

Azure storage container name for shared cache.

--log_level

Logging level. Default is 3. level 0: DEBUG, 1: INFO, 2: WARNING, 3: ERROR, 4: CRITICAL

Default: 3

--save_config_file

Generate and save the config file for the command.

Default: False

--dry_run

Enable dry run mode. This will not perform any actual optimization but will validate the configuration.

Default: False

Capture Onnx Graph#

Capture the ONNX graph of a HuggingFace or PyTorch model using the PyTorch exporter or Model Builder.

usage: olive capture-onnx-graph [-h] [-m MODEL_NAME_OR_PATH] [-t TASK]
                                [--trust_remote_code] [-a ADAPTER_PATH]
                                [--model_variant {auto,sd,sdxl,sd3,flux,sana}]
                                [--model_script MODEL_SCRIPT]
                                [--script_dir SCRIPT_DIR] [-o OUTPUT_PATH]
                                [--conversion_device {cpu,gpu}]
                                [--use_dynamo_exporter]
                                [--fixed_param_dict FIXED_PARAM_DICT]
                                [--past_key_value_name PAST_KEY_VALUE_NAME]
                                [--torch_dtype TORCH_DTYPE]
                                [--target_opset TARGET_OPSET]
                                [--use_model_builder]
                                [--precision {fp16,fp32,int4,bf16}]
                                [--int4_block_size {16,32,64,128,256}]
                                [--int4_accuracy_level INT4_ACCURACY_LEVEL]
                                [--exclude_embeds EXCLUDE_EMBEDS]
                                [--exclude_lm_head EXCLUDE_LM_HEAD]
                                [--enable_cuda_graph ENABLE_CUDA_GRAPH]
                                [--extra_mb_options EXTRA_MB_OPTIONS]
                                [--use_ort_genai] [--log_level LOG_LEVEL]
                                [--save_config_file] [--dry_run]
                                [--account_name ACCOUNT_NAME]
                                [--container_name CONTAINER_NAME]
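
For example, to capture the ONNX graph of a HuggingFace model with Model Builder (model name and precision shown for illustration):

    olive capture-onnx-graph -m microsoft/Phi-3-mini-4k-instruct --use_model_builder --precision fp16 -o onnx-model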

Named Arguments#

-m, --model_name_or_path

Path to the input model. See https://microsoft.github.io/Olive/reference/cli.html#providing-input-models for more information.

-t, --task

Task for which the huggingface model is used. Default task is text-generation-with-past.

--trust_remote_code

Trust remote code when loading a huggingface model.

Default: False

-a, --adapter_path

Path to the adapter weights saved after peft fine-tuning. Local folder or huggingface id.

--model_variant

Possible choices: auto, sd, sdxl, sd3, flux, sana

Model variant: ‘sd’, ‘sdxl’, ‘sd3’, ‘flux’, ‘sana’, or ‘auto’ for auto-detection.

Default: auto

--model_script

The script file containing the model definition. Required for the local PyTorch model.

--script_dir

The directory containing the local PyTorch model script file. See https://microsoft.github.io/Olive/reference/cli.html#model-script-file-information for more information.

-o, --output_path

Path to save the command output.

Default: onnx-model

--conversion_device

Possible choices: cpu, gpu

The device used to run the model to capture the ONNX graph.

Default: 'cpu'

--use_ort_genai

Use OnnxRuntime generate() API to run the model

Default: False

--log_level

Logging level. Default is 3. level 0: DEBUG, 1: INFO, 2: WARNING, 3: ERROR, 4: CRITICAL

Default: 3

--save_config_file

Generate and save the config file for the command.

Default: False

--dry_run

Enable dry run mode. This will not perform any actual optimization but will validate the configuration.

Default: False

--account_name

Azure storage account name for shared cache.

--container_name

Azure storage container name for shared cache.

PyTorch Exporter options#

--use_dynamo_exporter

Whether to use dynamo_export API to export ONNX model.

Default: False

--fixed_param_dict

Fix dynamic input shapes by providing a dictionary of dimension names and values, e.g., ‘batch_size=1,max_length=128’

--past_key_value_name

The argument name that points to past key values. For models loaded from huggingface, it is ‘past_key_values’. It is used only when use_dynamo_exporter is True.

Default: 'past_key_values'

--torch_dtype

The dtype to cast the model to before capturing the ONNX graph, e.g., ‘float32’ or ‘float16’. If not specified, the model is used as is.

--target_opset

The target opset version for the ONNX model. Default is 17.

Default: 17

Model Builder options#

--use_model_builder

Whether to use Model Builder to capture ONNX model.

Default: False

--precision

Possible choices: fp16, fp32, int4, bf16

The precision of the ONNX model. This is used by Model Builder

Default: 'fp16'

--int4_block_size

Possible choices: 16, 32, 64, 128, 256

Specify the block_size for int4 quantization. Acceptable values: 16/32/64/128/256.

--int4_accuracy_level

Specify the minimum accuracy level for activation of MatMul in int4 quantization.

--exclude_embeds

Remove embedding layer from your ONNX model.

Default: False

--exclude_lm_head

Remove language modeling head from your ONNX model.

Default: False

--enable_cuda_graph

The model can use CUDA graph capture for the CUDA execution provider. If enabled, all nodes must be placed on the CUDA EP for the CUDA graph to be used correctly.

--extra_mb_options

Extra key-value pairs options to pass to the model builder. e.g., ‘int4_is_symmetric=true,int4_op_types_to_quantize=MatMul/Gemm’.

Run Pass#

Run a single pass on the input model (supports HuggingFace, ONNX, PyTorch, and Azure ML models).

usage: olive run-pass [-h] [--pass-name PASS_NAME] [--pass-config PASS_CONFIG]
                      [--list-passes] [-m MODEL_NAME_OR_PATH] [-t TASK]
                      [--trust_remote_code] [-a ADAPTER_PATH]
                      [--model_script MODEL_SCRIPT] [--script_dir SCRIPT_DIR]
                      [-o OUTPUT_PATH] [--device {gpu,cpu,npu}]
                      [--provider {CPUExecutionProvider,CUDAExecutionProvider,DmlExecutionProvider,JsExecutionProvider,MIGraphXExecutionProvider,NvTensorRTRTXExecutionProvider,OpenVINOExecutionProvider,QNNExecutionProvider,ROCMExecutionProvider,TensorrtExecutionProvider,VitisAIExecutionProvider,WebGpuExecutionProvider}]
                      [--memory MEMORY] [--log_level LOG_LEVEL]
                      [--save_config_file] [--dry_run]
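
For example, first list the available passes and then run one of them on a local ONNX model (the pass name and model path are placeholders):

    olive run-pass --list-passes
    olive run-pass --pass-name <pass_name> -m model.onnx -o run-pass-output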

Named Arguments#

--pass-name

Name of the pass to run on the input model.

--pass-config

JSON string with pass-specific configuration parameters.

--list-passes

List all available passes and exit.

Default: False

-m, --model_name_or_path

Path to the input model. See https://microsoft.github.io/Olive/reference/cli.html#providing-input-models for more information.

-t, --task

Task for which the huggingface model is used. Default task is text-generation-with-past.

--trust_remote_code

Trust remote code when loading a huggingface model.

Default: False

-a, --adapter_path

Path to the adapter weights saved after peft fine-tuning. Local folder or huggingface id.

--model_script

The script file containing the model definition. Required for the local PyTorch model.

--script_dir

The directory containing the local PyTorch model script file. See https://microsoft.github.io/Olive/reference/cli.html#model-script-file-information for more information.

-o, --output_path

Path to save the command output.

Default: run-pass-output

--device

Possible choices: gpu, cpu, npu

Target device to run the model. Default is cpu.

Default: 'cpu'

--provider

Possible choices: CPUExecutionProvider, CUDAExecutionProvider, DmlExecutionProvider, JsExecutionProvider, MIGraphXExecutionProvider, NvTensorRTRTXExecutionProvider, OpenVINOExecutionProvider, QNNExecutionProvider, ROCMExecutionProvider, TensorrtExecutionProvider, VitisAIExecutionProvider, WebGpuExecutionProvider

Execution provider to use for ONNX model. Default is CPUExecutionProvider.

Default: 'CPUExecutionProvider'

--memory

Memory limit for the accelerator in bytes. Default is None.

--log_level

Logging level. Default is 3. level 0: DEBUG, 1: INFO, 2: WARNING, 3: ERROR, 4: CRITICAL

Default: 3

--save_config_file

Generate and save the config file for the command.

Default: False

--dry_run

Enable dry run mode. This will not perform any actual optimization but will validate the configuration.

Default: False

Finetune#

Fine-tune a model on a dataset using HuggingFace PEFT. HuggingFace training arguments can be provided along with the defined options.

usage: olive finetune [-h] -m MODEL_NAME_OR_PATH [-t TASK]
                      [--trust_remote_code] [-o OUTPUT_PATH]
                      [--method {lora,qlora}] [--lora_r LORA_R]
                      [--lora_alpha LORA_ALPHA]
                      [--target_modules TARGET_MODULES]
                      [--torch_dtype {bfloat16,float16,float32}] -d DATA_NAME
                      [--train_subset TRAIN_SUBSET]
                      [--train_split TRAIN_SPLIT] [--eval_subset EVAL_SUBSET]
                      [--eval_split EVAL_SPLIT] [--data_files DATA_FILES]
                      [--text_field TEXT_FIELD | --text_template TEXT_TEMPLATE | --use_chat_template]
                      [--max_seq_len MAX_SEQ_LEN]
                      [--add_special_tokens ADD_SPECIAL_TOKENS]
                      [--max_samples MAX_SAMPLES] [--batch_size BATCH_SIZE]
                      [--input_cols INPUT_COLS [INPUT_COLS ...]]
                      [--account_name ACCOUNT_NAME]
                      [--container_name CONTAINER_NAME]
                      [--log_level LOG_LEVEL] [--save_config_file] [--dry_run]
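
For example, a minimal LoRA fine-tuning invocation (the dataset name and text column are placeholders):

    olive finetune -m microsoft/Phi-3-mini-4k-instruct --method lora -d <dataset_name> --text_field <text_column> -o finetuned-adapter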

Named Arguments#

-m, --model_name_or_path

Path to the input model. See https://microsoft.github.io/Olive/reference/cli.html#providing-input-models for more information.

-t, --task

Task for which the huggingface model is used. Default task is text-generation-with-past.

--trust_remote_code

Trust remote code when loading a huggingface model.

Default: False

-o, --output_path

Path to save the command output.

Default: finetuned-adapter

--torch_dtype

Possible choices: bfloat16, float16, float32

The torch dtype to use for training.

Default: 'bfloat16'

-d, --data_name

The dataset name.

--train_subset

The subset to use for training.

--train_split

The split to use for training.

Default: 'train'

--eval_subset

The subset to use for evaluation.

--eval_split

The dataset split to evaluate on.

Default: ''

--data_files

The dataset files. If multiple files, separate by comma.

--text_field

The text field to use for fine-tuning.

--text_template

Template to generate text field from. E.g. ‘### Question: {prompt}\n### Answer: {response}’

--use_chat_template

Use chat template for text field.

Default: False

--max_seq_len

Maximum sequence length for the data.

Default: 1024

--add_special_tokens

Whether to add special tokens during preprocessing.

Default: False

--max_samples

Maximum samples to select from the dataset.

Default: 256

--batch_size

Batch size.

Default: 1

--input_cols

List of input column names. Provide one or more names separated by spaces. Example: --input_cols sentence1 sentence2

--account_name

Azure storage account name for shared cache.

--container_name

Azure storage container name for shared cache.

--log_level

Logging level. Default is 3. level 0: DEBUG, 1: INFO, 2: WARNING, 3: ERROR, 4: CRITICAL

Default: 3

--save_config_file

Generate and save the config file for the command.

Default: False

--dry_run

Enable dry run mode. This will not perform any actual optimization but will validate the configuration.

Default: False

LoRA options#

--method

Possible choices: lora, qlora

The method to use for fine-tuning

Default: 'lora'

--lora_r

LoRA R value.

Default: 64

--lora_alpha

LoRA alpha value.

Default: 16

--target_modules

The target modules for LoRA. If multiple, separate by comma.

Diffusion LoRA#

Train LoRA adapters for diffusion models (Stable Diffusion 1.5, SDXL, Flux). Supports both local image folders and HuggingFace datasets.

usage: olive diffusion-lora [-h] -m MODEL_NAME_OR_PATH [-o OUTPUT_PATH]
                            [--model_variant {auto,sd,sdxl,sd3,flux,sana}]
                            [-r LORA_R] [--alpha ALPHA]
                            [--lora_dropout LORA_DROPOUT]
                            [--target_modules TARGET_MODULES] [--dreambooth]
                            [--instance_prompt INSTANCE_PROMPT]
                            [--with_prior_preservation]
                            [--class_prompt CLASS_PROMPT]
                            [--class_data_dir CLASS_DATA_DIR]
                            [--num_class_images NUM_CLASS_IMAGES]
                            [--prior_loss_weight PRIOR_LOSS_WEIGHT]
                            [-d DATA_DIR] [--data_name DATA_NAME]
                            [--data_split DATA_SPLIT]
                            [--image_column IMAGE_COLUMN]
                            [--caption_column CAPTION_COLUMN]
                            [--base_resolution BASE_RESOLUTION]
                            [--max_train_steps MAX_TRAIN_STEPS]
                            [--learning_rate LEARNING_RATE]
                            [--train_batch_size TRAIN_BATCH_SIZE]
                            [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
                            [--mixed_precision {no,fp16,bf16}]
                            [--lr_scheduler {constant,linear,cosine,cosine_with_restarts,polynomial,constant_with_warmup}]
                            [--lr_warmup_steps LR_WARMUP_STEPS] [--seed SEED]
                            [--guidance_scale GUIDANCE_SCALE] [--merge_lora]
                            [--account_name ACCOUNT_NAME]
                            [--container_name CONTAINER_NAME]
                            [--log_level LOG_LEVEL] [--save_config_file]
                            [--dry_run]
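
For example, a DreamBooth-style run on a local image folder (the image folder path is a placeholder):

    olive diffusion-lora -m runwayml/stable-diffusion-v1-5 --dreambooth --instance_prompt "a photo of sks dog" -d <image_folder> -o diffusion-lora-adapter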

Named Arguments#

-m, --model_name_or_path

HuggingFace model name or local path (e.g., ‘runwayml/stable-diffusion-v1-5’).

-o, --output_path

Output path for the LoRA adapter. Default: diffusion-lora-adapter.

Default: 'diffusion-lora-adapter'

--model_variant

Possible choices: auto, sd, sdxl, sd3, flux, sana

Type of diffusion model. Default: auto-detect.

Default: 'auto'

--merge_lora

Merge LoRA into base model instead of saving adapter only.

Default: False

--account_name

Azure storage account name for shared cache.

--container_name

Azure storage container name for shared cache.

--log_level

Logging level. Default is 3. level 0: DEBUG, 1: INFO, 2: WARNING, 3: ERROR, 4: CRITICAL

Default: 3

--save_config_file

Generate and save the config file for the command.

Default: False

--dry_run

Enable dry run mode. This will not perform any actual optimization but will validate the configuration.

Default: False

LoRA options#

-r, --lora_r

LoRA rank. SD: 4-16, Flux: 16-64. Default: 16.

Default: 16

--alpha

LoRA alpha for scaling. Default: same as r.

--lora_dropout

LoRA dropout probability. Default: 0.0.

Default: 0.0

--target_modules

Target modules for LoRA (comma-separated). Default: auto-detect based on model type.

DreamBooth options#

--dreambooth

Enable DreamBooth training for learning specific subjects.

Default: False

--instance_prompt

Fixed prompt for all images in DreamBooth mode. Required when --dreambooth is set. Example: ‘a photo of sks dog’.

--with_prior_preservation

Enable prior preservation to prevent language drift. Requires --class_prompt.

Default: False

--class_prompt

Prompt for class images in prior preservation. Required when --with_prior_preservation is set. Example: ‘a photo of a dog’.

--class_data_dir

Directory containing class images. If not provided or it contains fewer than --num_class_images images, class images will be auto-generated.

--num_class_images

Number of class images for prior preservation. Default: 200.

Default: 200

--prior_loss_weight

Weight of prior preservation loss. Default: 1.0.

Default: 1.0

Data options#

-d, --data_dir

Path to local image folder with training images.

--data_name

HuggingFace dataset name (e.g., ‘linoyts/Tuxemon’).

--data_split

Dataset split to use. Default: train.

Default: 'train'

--image_column

Column name for images in HuggingFace dataset. Default: image.

Default: 'image'

--caption_column

Column name for captions in HuggingFace dataset.

--base_resolution

Base resolution for training. Auto-detected if model_variant is specified (SD1.5: 512, SDXL/Flux: 1024).

Training options#

--max_train_steps

Maximum training steps. Default: 1000.

Default: 1000

--learning_rate

Learning rate. Default: 1e-4.

Default: 0.0001

--train_batch_size

Training batch size. Default: 1.

Default: 1

--gradient_accumulation_steps

Gradient accumulation steps. Default: 4.

Default: 4

--mixed_precision

Possible choices: no, fp16, bf16

Mixed precision training. Default: bf16.

Default: 'bf16'

--lr_scheduler

Possible choices: constant, linear, cosine, cosine_with_restarts, polynomial, constant_with_warmup

Learning rate scheduler. Default: constant.

Default: 'constant'

--lr_warmup_steps

Learning rate warmup steps. Default: 0.

Default: 0

--seed

Random seed for reproducibility.

Flux options#

--guidance_scale

Guidance scale for Flux training. Default: 3.5.

Default: 3.5

Auto-Optimization#

Automatically optimize the input model for the given target and precision.

usage: olive auto-opt [-h] [-m MODEL_NAME_OR_PATH] [-t TASK]
                      [--trust_remote_code] [-a ADAPTER_PATH]
                      [--model_script MODEL_SCRIPT] [--script_dir SCRIPT_DIR]
                      [-o OUTPUT_PATH] [--device {gpu,cpu,npu}]
                      [--provider {CPUExecutionProvider,CUDAExecutionProvider,DmlExecutionProvider,JsExecutionProvider,MIGraphXExecutionProvider,NvTensorRTRTXExecutionProvider,OpenVINOExecutionProvider,QNNExecutionProvider,ROCMExecutionProvider,TensorrtExecutionProvider,VitisAIExecutionProvider,WebGpuExecutionProvider}]
                      [--memory MEMORY] [-d DATA_NAME] [--split SPLIT]
                      [--subset SUBSET] [--input_cols [INPUT_COLS ...]]
                      [--batch_size BATCH_SIZE]
                      [--precision {int4,int8,int16,int32,uint4,uint8,uint16,uint32,fp4,fp8,fp16,fp32,nf4,bf16}]
                      [--use_dynamo_exporter] [--use_model_builder]
                      [--use_qdq_encoding]
                      [--dynamic-to-fixed-shape-dim-param [DYNAMIC_TO_FIXED_SHAPE_DIM_PARAM ...]]
                      [--dynamic-to-fixed-shape-dim-value [DYNAMIC_TO_FIXED_SHAPE_DIM_VALUE ...]]
                      [--num-splits NUM_SPLITS | --cost-model COST_MODEL]
                      [--mixed-precision-overrides-config [MIXED_PRECISION_OVERRIDES_CONFIG ...]]
                      [--use_ort_genai] [--account_name ACCOUNT_NAME]
                      [--container_name CONTAINER_NAME]
                      [--log_level LOG_LEVEL] [--save_config_file] [--dry_run]
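
For example, to automatically optimize a HuggingFace model for CPU (model name shown for illustration):

    olive auto-opt -m microsoft/Phi-3-mini-4k-instruct --device cpu --provider CPUExecutionProvider --precision int4 --use_ort_genai -o auto-opt-output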

Named Arguments#

-m, --model_name_or_path

Path to the input model. See https://microsoft.github.io/Olive/reference/cli.html#providing-input-models for more information.

-t, --task

Task for which the huggingface model is used. Default task is text-generation-with-past.

--trust_remote_code

Trust remote code when loading a huggingface model.

Default: False

-a, --adapter_path

Path to the adapter weights saved after peft fine-tuning. Local folder or huggingface id.

--model_script

The script file containing the model definition. Required for the local PyTorch model.

--script_dir

The directory containing the local PyTorch model script file. See https://microsoft.github.io/Olive/reference/cli.html#model-script-file-information for more information.

-o, --output_path

Path to save the command output.

Default: auto-opt-output

--device

Possible choices: gpu, cpu, npu

Target device to run the model. Default is cpu.

Default: 'cpu'

--provider

Possible choices: CPUExecutionProvider, CUDAExecutionProvider, DmlExecutionProvider, JsExecutionProvider, MIGraphXExecutionProvider, NvTensorRTRTXExecutionProvider, OpenVINOExecutionProvider, QNNExecutionProvider, ROCMExecutionProvider, TensorrtExecutionProvider, VitisAIExecutionProvider, WebGpuExecutionProvider

Execution provider to use for ONNX model. Default is CPUExecutionProvider.

Default: 'CPUExecutionProvider'

--memory

Memory limit for the accelerator in bytes. Default is None.

-d, --data_name

The dataset name.

--split

The dataset split to use for evaluation.

--subset

The dataset subset to use for evaluation.

--input_cols

The input columns to use for evaluation.

--batch_size

Batch size for evaluation.

Default: 1

--precision

Possible choices: int4, int8, int16, int32, uint4, uint8, uint16, uint32, fp4, fp8, fp16, fp32, nf4, bf16

The output precision of the optimized model. If not specified, the default precision is fp32 for cpu and fp16 for gpu

Default: fp32

--use_dynamo_exporter

Whether to use dynamo_export API to export ONNX model.

Default: False

--use_model_builder

Whether to use the Model Builder pass for optimization. Enable only when the model is supported by Model Builder.

Default: False

--use_qdq_encoding

Whether to use QDQ encoding for quantized operators instead of ONNXRuntime contrib operators like MatMulNBits

Default: False

--dynamic-to-fixed-shape-dim-param

Symbolic parameter names to use for dynamic to fixed shape pass. Required only when using QNNExecutionProvider.

--dynamic-to-fixed-shape-dim-value

Symbolic parameter values to use for dynamic to fixed shape pass. Required only when using QNNExecutionProvider.

--num-splits

Number of splits to use for model splitting. Input model must be an HfModel.

--cost-model

Path to the cost model CSV file to use for model splitting. Mutually exclusive with num-splits. Must be a CSV with headers module,num_params,num_bytes,num_flops where each row corresponds to the name of a module (with no children), the number of parameters, the number of bytes, and the number of FLOPs (batch_size=1, seqlen=1) the module uses in the desired precision.

--mixed-precision-overrides-config

Dictionary of name to precision. Must contain an even number of entries, with even-indexed entries being the keys and odd-indexed entries being the values. Required only when the output precision is “fp16” and the MixedPrecisionOverrides pass is enabled.

--use_ort_genai

Use OnnxRuntime generate() API to run the model

Default: False

--account_name

Azure storage account name for shared cache.

--container_name

Azure storage container name for shared cache.

--log_level

Logging level. Default is 3. level 0: DEBUG, 1: INFO, 2: WARNING, 3: ERROR, 4: CRITICAL

Default: 3

--save_config_file

Generate and save the config file for the command.

Default: False

--dry_run

Enable dry run mode. This will not perform any actual optimization but will validate the configuration.

Default: False

Generate Adapters#

Generate ONNX model with adapters as inputs.

usage: olive generate-adapter [-h] -m MODEL_NAME_OR_PATH [-o OUTPUT_PATH]
                              [--adapter_type {lora,dora,loha}]
                              [--adapter_format {pt,numpy,safetensors,onnx_adapter}]
                              [--log_level LOG_LEVEL] [--save_config_file]
                              [--dry_run] [--account_name ACCOUNT_NAME]
                              [--container_name CONTAINER_NAME]
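
For example, assuming an ONNX model folder with LoRA weights produced by an earlier Olive command (the folder path is a placeholder):

    olive generate-adapter -m <onnx_model_folder> --adapter_format onnx_adapter -o optimized-model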

Named Arguments#

-m, --model_name_or_path

Path to the input model. See https://microsoft.github.io/Olive/reference/cli.html#providing-input-models for more information.

-o, --output_path

Path to save the command output.

Default: optimized-model

--adapter_type

Possible choices: lora, dora, loha

Type of adapters to extract. Default is lora.

Default: lora

--adapter_format

Possible choices: pt, numpy, safetensors, onnx_adapter

Format to save the weights in. Default is onnx_adapter.

Default: 'onnx_adapter'

--log_level

Logging level. Default is 3. level 0: DEBUG, 1: INFO, 2: WARNING, 3: ERROR, 4: CRITICAL

Default: 3

--save_config_file

Generate and save the config file for the command.

Default: False

--dry_run

Enable dry run mode. This will not perform any actual optimization but will validate the configuration.

Default: False

--account_name

Azure storage account name for shared cache.

--container_name

Azure storage container name for shared cache.

Convert Adapters#

Convert LoRA adapter weights to a file that will be consumed by ONNX models generated by Olive ExtractedAdapters pass.

usage: olive convert-adapters [-h] -a ADAPTER_PATH
                              [--adapter_format {pt,numpy,safetensors,onnx_adapter}]
                              -o OUTPUT_PATH [--dtype {float32,float16}]
                              [--quantize_int4]
                              [--int4_block_size {16,32,64,128,256}]
                              [--int4_quantization_mode {symmetric,asymmetric}]
                              [--log_level LOG_LEVEL]
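
For example, to export PEFT adapter weights in the onnx_adapter format (paths shown for illustration):

    olive convert-adapters -a finetuned-adapter --adapter_format onnx_adapter --dtype float16 -o adapter-weights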

Named Arguments#

-a, --adapter_path

Path to the adapter weights saved after peft fine-tuning. Can be a local folder or huggingface id.

--adapter_format

Possible choices: pt, numpy, safetensors, onnx_adapter

Format to save the weights in. Default is onnx_adapter.

Default: 'onnx_adapter'

-o, --output_path

Path to save the exported weights. Will be saved in the adapter_format format.

--dtype

Possible choices: float32, float16

Data type to save float adapter weights as. If quantize_int4 is True, this is the data type of the quantization scales. Default is float32.

Default: 'float32'

--quantize_int4

Quantize the adapter weights to int4 using blockwise quantization.

Default: False

--int4_block_size

Possible choices: 16, 32, 64, 128, 256

Block size for int4 quantization of adapter weights. Default is 32.

Default: 32

--int4_quantization_mode

Possible choices: symmetric, asymmetric

Quantization mode for int4 quantization of adapter weights. Default is symmetric.

Default: 'symmetric'

--log_level

Logging level. Default is 3. level 0: DEBUG, 1: INFO, 2: WARNING, 3: ERROR, 4: CRITICAL

Default: 3

Tune OnnxRuntime Session Params#

Automatically tune the OnnxRuntime session parameters for a given ONNX model. Currently, for an ONNX model converted from a HuggingFace model and used for generative tasks, the user can simply provide --model onnx_model_path --hf_model_name hf_model_name --device device_type to get the tuned session parameters.

usage: olive tune-session-params [-h] -m MODEL_NAME_OR_PATH [-o OUTPUT_PATH]
                                 [--cpu_cores CPU_CORES] [--io_bind]
                                 [--enable_cuda_graph]
                                 [--execution_mode_list [EXECUTION_MODE_LIST ...]]
                                 [--opt_level_list [OPT_LEVEL_LIST ...]]
                                 [--trt_fp16_enable]
                                 [--intra_thread_num_list [INTRA_THREAD_NUM_LIST ...]]
                                 [--inter_thread_num_list [INTER_THREAD_NUM_LIST ...]]
                                 [--extra_session_config EXTRA_SESSION_CONFIG]
                                 [--disable_force_evaluate_other_eps]
                                 [--enable_profiling]
                                 [--predict_with_kv_cache]
                                 [--device {gpu,cpu,npu}]
                                 [--providers_list [{CPUExecutionProvider,CUDAExecutionProvider,DmlExecutionProvider,JsExecutionProvider,MIGraphXExecutionProvider,NvTensorRTRTXExecutionProvider,OpenVINOExecutionProvider,QNNExecutionProvider,ROCMExecutionProvider,TensorrtExecutionProvider,VitisAIExecutionProvider,WebGpuExecutionProvider} ...]]
                                 [--memory MEMORY] [--log_level LOG_LEVEL]
                                 [--save_config_file] [--dry_run]
                                 [--account_name ACCOUNT_NAME]
                                 [--container_name CONTAINER_NAME]
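
For example, to tune session parameters for a local ONNX model on CPU (the model path is a placeholder):

    olive tune-session-params -m <onnx_model_path> --device cpu --providers_list CPUExecutionProvider --io_bind -o tuned-inference-settings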

Named Arguments#

-m, --model_name_or_path

Path to the input model. See https://microsoft.github.io/Olive/reference/cli.html#providing-input-models for more information.

-o, --output_path

Path to save the command output.

Default: tuned-inference-settings

--cpu_cores

CPU cores used for thread tuning.

--io_bind

Whether to enable IOBinding search for ONNX Runtime inference.

Default: False

--enable_cuda_graph

Whether to enable CUDA Graph for the CUDA execution provider.

Default: False

--execution_mode_list

List of execution modes (parallelism between operators) to test.

--opt_level_list

Optimization level list for ONNX Model.

--trt_fp16_enable

Enable TensorRT FP16 mode.

Default: False

--intra_thread_num_list

List of intra-op thread counts to test.

--inter_thread_num_list

List of inter-op thread counts to test.

--extra_session_config

Extra customized session options to use during the tuning process. It should be a JSON string. E.g. --extra_session_config '{"key1": "value1", "key2": "value2"}'

--disable_force_evaluate_other_eps

Whether to force evaluation of all execution providers that are different from the associated execution provider.

Default: False

--enable_profiling

Whether to enable profiling for ONNX Runtime inference.

Default: False

--predict_with_kv_cache

Whether to use key-value cache for ORT session parameter tuning

Default: False

--device

Possible choices: gpu, cpu, npu

Target device to run the model. Default is cpu.

Default: 'cpu'

--providers_list

Possible choices: CPUExecutionProvider, CUDAExecutionProvider, DmlExecutionProvider, JsExecutionProvider, MIGraphXExecutionProvider, NvTensorRTRTXExecutionProvider, OpenVINOExecutionProvider, QNNExecutionProvider, ROCMExecutionProvider, TensorrtExecutionProvider, VitisAIExecutionProvider, WebGpuExecutionProvider

List of execution providers to use for ONNX model. They are case sensitive. If not provided, all available providers will be used.

--memory

Memory limit for the accelerator in bytes. Default is None.

--log_level

Logging level. Default is 3. level 0: DEBUG, 1: INFO, 2: WARNING, 3: ERROR, 4: CRITICAL

Default: 3

--save_config_file

Generate and save the config file for the command.

Default: False

--dry_run

Enable dry run mode. This will not perform any actual optimization but will validate the configuration.

Default: False

--account_name

Azure storage account name for shared cache.

--container_name

Azure storage container name for shared cache.

Generate Cost Model for Model Splitting#

Generate a cost model for a given model and save it as a csv file. This cost model is consumed by the CaptureSplitInfo pass. Only supports HfModel.

usage: olive generate-cost-model [-h] -m MODEL_NAME_OR_PATH [-t TASK]
                                 [--trust_remote_code] [-o OUTPUT_PATH]
                                 [-p {fp32,fp16,fp8,int32,uint32,int16,uint16,int8,uint8,int4,uint4,nf4,fp4}]
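
For example, to generate a cost model for a HuggingFace model at fp16 weight precision (model name shown for illustration):

    olive generate-cost-model -m microsoft/Phi-3-mini-4k-instruct -p fp16 -o cost-model.csv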

Named Arguments#

-m, --model_name_or_path

Path to the input model. See https://microsoft.github.io/Olive/reference/cli.html#providing-input-models for more information.

-t, --task

Task for which the huggingface model is used. Default task is text-generation-with-past.

--trust_remote_code

Trust remote code when loading a huggingface model.

Default: False

-o, --output_path

Path to save the command output.

Default: 'cost-model.csv'

-p, --weight_precision

Possible choices: fp32, fp16, fp8, int32, uint32, int16, uint16, int8, uint8, int4, uint4, nf4, fp4

Weight precision

Default: 'fp16'

Configure Qualcomm SDK#

Configure Qualcomm SDK.

usage: olive configure-qualcomm-sdk [-h] --py_version {3.6,3.8}
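
For example:

    olive configure-qualcomm-sdk --py_version 3.8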

Named Arguments#

--py_version

Possible choices: 3.6, 3.8

Python version: use 3.6 for TensorFlow 1.15 and 3.8 otherwise.

Shared Cache#

Delete Olive model cache stored in the cloud.

usage: olive shared-cache [-h] [--delete] [--all] [-y] --account ACCOUNT
                          --container CONTAINER [--model_hash MODEL_HASH]
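
For example, to delete a single cached model from the shared cache (account, container, and hash are placeholders):

    olive shared-cache --delete --account <account_name> --container <container_name> --model_hash <model_hash>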

Named Arguments#

--delete

Delete a model cache from the shared cache.

Default: False

--all

Delete all model cache from the cloud cache.

Default: False

-y, --yes

Confirm the deletion without prompting for confirmation.

Default: False

--account

The account name for the shared cache.

--container

The container name for the shared cache.

--model_hash

The model hash to remove from the shared cache.

Benchmark#

Run benchmarking using lm-eval.

usage: olive benchmark [-h] [-m MODEL_NAME_OR_PATH] [-t TASK]
                       [--trust_remote_code] [-a ADAPTER_PATH]
                       [--model_script MODEL_SCRIPT] [--script_dir SCRIPT_DIR]
                       [-o OUTPUT_PATH] --tasks [TASKS ...]
                       [--device {cpu,gpu}] [--batch_size BATCH_SIZE]
                       [--max_length MAX_LENGTH] [--limit LIMIT]
                       [--log_level LOG_LEVEL] [--save_config_file]
                       [--dry_run] [--account_name ACCOUNT_NAME]
                       [--container_name CONTAINER_NAME]
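
For example, to benchmark a HuggingFace model on a single lm-eval task (model name and task shown for illustration):

    olive benchmark -m microsoft/Phi-3-mini-4k-instruct --tasks hellaswag --device cpu --batch_size 1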

Named Arguments#

-m, --model_name_or_path

Path to the input model. See https://microsoft.github.io/Olive/reference/cli.html#providing-input-models for more information.

-t, --task

Task for which the huggingface model is used. Default task is text-generation-with-past.

--trust_remote_code

Trust remote code when loading a huggingface model.

Default: False

-a, --adapter_path

Path to the adapter weights saved after peft fine-tuning. Local folder or huggingface id.

--model_script

The script file containing the model definition. Required for the local PyTorch model.

--script_dir

The directory containing the local PyTorch model script file. See https://microsoft.github.io/Olive/reference/cli.html#model-script-file-information for more information.

-o, --output_path

Path to save the command output.

Default: onnx-model

--log_level

Logging level. Default is 3. level 0: DEBUG, 1: INFO, 2: WARNING, 3: ERROR, 4: CRITICAL

Default: 3

--save_config_file

Generate and save the config file for the command.

Default: False

--dry_run

Enable dry run mode. This will not perform any actual optimization but will validate the configuration.

Default: False

--account_name

Azure storage account name for shared cache.

--container_name

Azure storage container name for shared cache.

lm-eval evaluator options#

--tasks

List of tasks to evaluate on.

--device

Possible choices: cpu, gpu

Target device for evaluation.

Default: 'cpu'

--batch_size

Batch size.

Default: 1

--max_length

Maximum length of input + output.

Default: 1024

--limit

Number (or percentage of dataset) of samples to use for evaluation.

Default: 1

Providing Input Models#

There is more than one way to supply an input model to the Olive commands.

  1. A HuggingFace model can be used directly as an input model. For example, -m microsoft/Phi-3-mini-4k-instruct.

  2. A model produced by an Olive command can be used directly as an input model. You can specify the model path using the -m <output_model> option, where <output_model> is the output folder defined by -o <output_model> in the previous Olive command (see the example after this list).

  3. Olive commands also accept a local PyTorch model as an input model. You can specify the model file path using the -m model.pt option, and the associated model script using the --model_script script.py option. For example, olive capture-onnx-graph -m model.pt --model_script script.py.

  4. A model from AzureML registry can be directly used as an input model. For example -m azureml://registries/<registry_name>/models/<model_name>/versions/<version>.

  5. A locally available ONNX model can also be used as input for the Olive commands that accept an ONNX model as input.
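
For example, the following sketch chains two commands, feeding the output folder of the first command to the second (model name and output paths shown for illustration):

    olive optimize -m microsoft/Phi-3-mini-4k-instruct -o optimized-model
    olive tune-session-params -m optimized-model -o tuned-inference-settings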

Model Script File Information#

Olive commands support a custom PyTorch model as input. Olive requires users to define specific functions to load and process the custom PyTorch model. These functions should be defined in the model script you provide.

  • Model Loader Function (`_model_loader`): Loads the PyTorch model. If the model file path is provided using the -m option, it takes higher priority than the model loader function.

    def _model_loader():
        ...
        return model
    
  • IO Config Function (`_io_config`): Returns the IO configuration for the model. Either _io_config or _dummy_inputs is required for the capture-onnx-graph CLI command.

    def _io_config(model: PyTorchModelHandler):
        ...
        return io_config
    
  • Dummy Inputs Function (`_dummy_inputs`): Provides dummy input tensors for the model. Either _io_config or _dummy_inputs is required for the capture-onnx-graph CLI command.

    def _dummy_inputs(model: PyTorchModelHandler):
        ...
        return dummy_inputs
    
  • Model Format Function (`_model_file_format`): Specifies the format of the model. The default value is PyTorch.EntireModel. For more available options, refer to this.

    def _model_file_format():
        ...
        return model_file_format
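
Putting these together, a model script might look like the following minimal sketch. The model architecture, file name, and the exact io_config keys are illustrative assumptions, not a definitive schema:

    # my_model_script.py -- illustrative sketch of an Olive model script
    import torch


    class TinyModel(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.linear = torch.nn.Linear(4, 2)

        def forward(self, x):
            return self.linear(x)


    def _model_loader():
        # Load and return the PyTorch model instance.
        return TinyModel()


    def _io_config(model):
        # Describe the model inputs/outputs used when exporting the ONNX graph
        # (the keys shown are assumptions for this sketch).
        return {
            "input_names": ["x"],
            "input_shapes": [[1, 4]],
            "output_names": ["y"],
        }


    def _dummy_inputs(model):
        # Alternative to _io_config: return example input tensors.
        return torch.randn(1, 4)

The script can then be passed to a command as in the earlier example, e.g. olive capture-onnx-graph --model_script my_model_script.py, optionally with -m pointing at a saved model file, which takes priority over _model_loader.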