Command Line Tools¶
Olive provides command line tools that can be invoked using the olive command. The command line tools are used to perform various tasks such as running an Olive workflow, managing AzureML compute, and more. If olive is not in your PATH, you can run the command line tools by replacing olive with python -m olive.
Input Model¶
Olive CLI Produced Model¶
The Olive command-line tools support using a model produced by the Olive CLI as an input model. You can specify the model file path using the -m <output_model> option, where <output_model> is the output folder defined by -o <output_model> in the previous CLI command.
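For example, if a previous command saved its output with -o finetuned-model, that folder can be passed to the next command as the input model (the folder name here is only a placeholder):
olive capture-onnx-graph -m finetuned-model -o onnx-model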
Local PyTorch Model¶
Olive command line tools accept a local PyTorch model as an input model. You can specify the model file path using the -m model.pt option and the associated model script using the --model_script script.py option.
Olive reserves several function names to provide specific inputs for the PyTorch model. These functions should be defined in your model script:
Available Functions¶
Below are the functions that Olive expects in the model script and their purposes:
Model Loader Function (`_model_loader`): Loads the PyTorch model. If the model file path is provided using the -m option, it takes higher priority than the model loader function.
def _model_loader():
    ...
    return model
IO Config Function (`_io_config`): Returns the IO configuration for the model. Either _io_config or _dummy_inputs is required for the capture-onnx-graph CLI command.
def _io_config(model: PyTorchModelHandler):
    ...
    return io_config
Dummy Inputs Function (`_dummy_inputs`): Provides dummy input tensors for the model. Either _io_config or _dummy_inputs is required for the capture-onnx-graph CLI command.
def _dummy_inputs(model: PyTorchModelHandler):
    ...
    return dummy_inputs
Model Format Function (`_model_file_format`): Specifies the format of the model. The default value is PyTorch.EntireModel. For more available options, refer to this.
def _model_file_format():
    ...
    return model_file_format
Example Usage¶
To use the Olive CLI with a local PyTorch model:
Provide the model path and the script:
python -m olive capture-onnx-graph -m model.pt --model_script script.py
Ensure that the script contains the above functions to handle loading, input/output configuration, dummy inputs, and model format specification as needed.
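As an illustration, a minimal script.py might look like the sketch below; the toy linear model, the input shapes, and the IO config keys are placeholder assumptions, not a recipe for a real workload:

# script.py: minimal illustrative model script (toy model, placeholder shapes)
import torch

def _model_loader():
    # Return the PyTorch model that Olive should capture.
    return torch.nn.Linear(4, 2)

def _io_config(model):
    # model is the Olive model handler wrapping the loaded PyTorch model.
    # Describe the model inputs and outputs for ONNX export.
    return {
        "input_names": ["input"],
        "input_shapes": [[1, 4]],
        "output_names": ["output"],
    }

def _dummy_inputs(model):
    # Dummy tensor matching the shape declared above; used to trace the model.
    return torch.randn(1, 4)

If both a model path (-m model.pt) and _model_loader are present, the path provided with -m takes precedence, as noted above.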
Argparse Documentation¶
Below is the argparse documentation for the Olive command-line interface:
usage: olive
Sub-commands¶
capture-onnx-graph¶
Capture the ONNX graph from a Hugging Face model using the PyTorch Exporter or Model Builder.
olive capture-onnx-graph [-h] [--log_level LOG_LEVEL] [-m MODEL_NAME_OR_PATH]
[--trust_remote_code] [-t TASK]
[--model_script MODEL_SCRIPT]
[--script_dir SCRIPT_DIR] [--device {cpu,gpu}]
[-o OUTPUT_PATH] [--tempdir TEMPDIR]
[--use_dynamo_exporter] [--use_ort_genai]
[--past_key_value_name PAST_KEY_VALUE_NAME]
[--torch_dtype TORCH_DTYPE]
[--target_opset TARGET_OPSET] [--use_model_builder]
[--precision {fp16,fp32,int4}]
[--int4_block_size {16,32,64,128,256}]
[--int4_accuracy_level INT4_ACCURACY_LEVEL]
[--exclude_embeds EXCLUDE_EMBEDS]
[--exclude_lm_head EXCLUDE_LM_HEAD]
[--enable_cuda_graph ENABLE_CUDA_GRAPH]
[--resource_group RESOURCE_GROUP]
[--workspace_name WORKSPACE_NAME]
[--keyvault_name KEYVAULT_NAME]
[--aml_compute AML_COMPUTE]
Named Arguments¶
- --device
Possible choices: cpu, gpu
The device to use to convert the model to ONNX. If 'gpu' is selected, the execution_providers will be set to CUDAExecutionProvider. If 'cpu' is selected, the execution_providers will be set to CPUExecutionProvider. For the PyTorch Exporter, the model is cast to this device before the ONNX graph is captured.
Default: “cpu”
- -o, --output_path
Output path
Default: “onnx-model”
- --tempdir
Root directory for tempfile directories and files
logging options¶
- --log_level
Logging level. Default is 3. level 0: DEBUG, 1: INFO, 2: WARNING, 3: ERROR, 4: CRITICAL
Default: 3
Model options¶
- -m, --model_name_or_path
The model checkpoint for weights initialization. If using an AzureML Registry model, provide the model path as ‘registry_name:model_name:version’.
- --trust_remote_code
Trust remote code when loading a model.
Default: False
- -t, --task
Task for which the model is used.
- --model_script
The script file containing the model definition. Required for PyTorch model.
- --script_dir
The directory containing the model script file.
PyTorch Exporter options¶
- --use_dynamo_exporter
Whether to use dynamo_export API to export ONNX model.
Default: False
- --use_ort_genai
Use the ONNX Runtime generate() API to run the model.
Default: False
- --past_key_value_name
The argument name that points to the past key values. For models loaded from Hugging Face, it is 'past_key_values'. It is used only when use_dynamo_exporter is True.
Default: “past_key_values”
- --torch_dtype
The dtype to cast the model to before capturing the ONNX graph, e.g., 'float32' or 'float16'. If not specified, the model is used as is.
- --target_opset
The target opset version for the ONNX model. Default is 17.
Default: 17
Model Builder options¶
- --use_model_builder
Whether to use Model Builder to capture ONNX model.
Default: False
- --precision
Possible choices: fp16, fp32, int4
The precision of the ONNX model. This is used by Model Builder
Default: “fp16”
- --int4_block_size
Possible choices: 16, 32, 64, 128, 256
Specify the block_size for int4 quantization. Acceptable values: 16/32/64/128/256.
- --int4_accuracy_level
Specify the minimum accuracy level for activation of MatMul in int4 quantization.
- --exclude_embeds
Remove embedding layer from your ONNX model.
Default: False
- --exclude_lm_head
Remove language modeling head from your ONNX model.
Default: False
- --enable_cuda_graph
Whether the model can use CUDA graph capture with the CUDA execution provider. If enabled, all nodes must be placed on the CUDA EP for the CUDA graph to work correctly.
remote options¶
- --resource_group
Resource group for the AzureML workspace.
- --workspace_name
Workspace name for the AzureML workspace.
- --keyvault_name
The AzureML Key Vault name with the Hugging Face token to use for the remote run. Refer to https://microsoft.github.io/Olive/features/huggingface_model_optimization.html#huggingface-login for more details.
- --aml_compute
The compute name to run the workflow on.
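For example, a possible invocation for a Hugging Face model with Model Builder (the model id and output folder below are placeholders):
olive capture-onnx-graph -m <model_name_or_path> --use_model_builder --precision int4 -o onnx-model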
run¶
Run an olive workflow
olive run [-h] [--package-config PACKAGE_CONFIG] --run-config RUN_CONFIG
[--setup] [--packages] [--tempdir TEMPDIR]
Named Arguments¶
- --package-config
For advanced users. Path to optional package (json) config file with location of individual pass module implementation and corresponding dependencies. Configuration might also include user owned/proprietary/private pass implementations.
- --run-config, --config
Path to json config file
- --setup
Whether to run environment setup
Default: False
- --packages
List required packages
Default: False
- --tempdir
Root directory for tempfile directories and files
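For example, assuming a workflow config file at config.json (a placeholder path), the workflow can be run with:
olive run --run-config config.json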
finetune¶
Fine-tune a model on a dataset using PEFT and optimize the model for ONNX Runtime with adapters as inputs. Hugging Face training arguments can be provided along with the defined options.
olive finetune [-h] [--log_level LOG_LEVEL] [--precision {float16,float32}]
[-m MODEL_NAME_OR_PATH] [--trust_remote_code] [-t TASK]
[--model_script MODEL_SCRIPT] [--script_dir SCRIPT_DIR]
[--torch_dtype {bfloat16,float16,float32}] [--use_ort_genai] -d
DATA_NAME [--data_files DATA_FILES] [--train_split TRAIN_SPLIT]
[--eval_split EVAL_SPLIT]
(--text_field TEXT_FIELD | --text_template TEXT_TEMPLATE)
[--max_seq_len MAX_SEQ_LEN] [--method {lora,qlora}]
[--lora_r LORA_R] [--lora_alpha LORA_ALPHA]
[--target_modules TARGET_MODULES] [-o OUTPUT_PATH]
[--tempdir TEMPDIR] [--clean] [--resource_group RESOURCE_GROUP]
[--workspace_name WORKSPACE_NAME]
[--keyvault_name KEYVAULT_NAME] [--aml_compute AML_COMPUTE]
Named Arguments¶
- --precision
Possible choices: float16, float32
The precision of the optimized model and adapters.
Default: “float16”
- --torch_dtype
Possible choices: bfloat16, float16, float32
The torch dtype to use for training.
Default: “bfloat16”
- --use_ort_genai
Use the ONNX Runtime generate() API to run the model.
Default: False
- -o, --output_path
Output path
Default: “optimized-model”
- --tempdir
Root directory for tempfile directories and files
- --clean
Run in a clean cache directory
Default: False
logging options¶
- --log_level
Logging level. Default is 3. level 0: DEBUG, 1: INFO, 2: WARNING, 3: ERROR, 4: CRITICAL
Default: 3
Model options¶
- -m, --model_name_or_path
The model checkpoint for weights initialization. If using an AzureML Registry model, provide the model path as ‘registry_name:model_name:version’.
- --trust_remote_code
Trust remote code when loading a model.
Default: False
- -t, --task
Task for which the model is used.
- --model_script
The script file containing the model definition. Required for PyTorch model.
- --script_dir
The directory containing the model script file.
dataset options¶
- -d, --data_name
The dataset name.
- --data_files
The dataset files. If multiple files, separate by comma.
- --train_split
The split to use for training.
Default: “train”
- --eval_split
The dataset split to evaluate on.
Default: “”
- --text_field
The text field to use for fine-tuning.
- --text_template
Template to generate the text field from. E.g. '### Question: {prompt} \n### Answer: {response}'
- --max_seq_len
Maximum sequence length for the data.
Default: 1024
lora options¶
- --method
Possible choices: lora, qlora
The method to use for fine-tuning
Default: “lora”
- --lora_r
LoRA R value.
Default: 64
- --lora_alpha
LoRA alpha value.
Default: 16
- --target_modules
The target modules for LoRA. If multiple, separate by comma.
remote options¶
- --resource_group
Resource group for the AzureML workspace.
- --workspace_name
Workspace name for the AzureML workspace.
- --keyvault_name
The AzureML Key Vault name with the Hugging Face token to use for the remote run. Refer to https://microsoft.github.io/Olive/features/huggingface_model_optimization.html#huggingface-login for more details.
- --aml_compute
The compute name to run the workflow on.
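For example, a possible invocation (model, dataset, and field names in angle brackets are placeholders):
olive finetune -m <model_name_or_path> -d <data_name> --text_field <text_column> --method qlora --lora_r 64 --lora_alpha 16 -o finetuned-model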
export-adapters¶
Export LoRA adapter weights to a file that will be consumed by ONNX models generated by the Olive ExtractedAdapters pass.
olive export-adapters [-h] --adapter_path ADAPTER_PATH
[--save_format {pt,numpy,safetensors}] --output_path
OUTPUT_PATH [--dtype {float32,float16}] [--pack_weights]
[--quantize_int4] [--int4_block_size {16,32,64,128,256}]
[--int4_quantization_mode {symmetric,asymmetric}]
Named Arguments¶
- --adapter_path
Path to the adapter weights saved after PEFT fine-tuning. Can be a local folder or a Hugging Face ID.
- --save_format
Possible choices: pt, numpy, safetensors
Format to save the weights in. Default is numpy.
Default: “numpy”
- --output_path
Path to save the exported weights. Will be saved in the save_format format.
- --dtype
Possible choices: float32, float16
Data type to save float weights as. If quantize_int4 is True, this is the data type of the quantization scales. Default is float32.
Default: “float32”
- --pack_weights
Whether to pack the weights. If True, the weights for each module type will be packed into a single array.
Default: False
- --quantize_int4
Quantize the weights to int4 using blockwise quantization.
Default: False
int4 quantization options¶
- --int4_block_size
Possible choices: 16, 32, 64, 128, 256
Block size for int4 quantization. Default is 32.
Default: 32
- --int4_quantization_mode
Possible choices: symmetric, asymmetric
Quantization mode for int4 quantization. Default is symmetric.
Default: “symmetric”
configure-qualcomm-sdk¶
Configure Qualcomm SDK for Olive
olive configure-qualcomm-sdk [-h] --py_version {3.6,3.8} --sdk {snpe,qnn}
Named Arguments¶
- --py_version
Possible choices: 3.6, 3.8
Python version: Use 3.6 for tensorflow 1.15 and 3.8 otherwise
- --sdk
Possible choices: snpe, qnn
Qualcomm SDK: snpe or qnn
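For example, to configure the QNN SDK with Python 3.8:
olive configure-qualcomm-sdk --py_version 3.8 --sdk qnn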
manage-aml-compute¶
Create new compute in your AzureML workspace
olive manage-aml-compute [-h] (--create | --delete)
[--subscription_id SUBSCRIPTION_ID]
[--resource_group RESOURCE_GROUP]
[--workspace_name WORKSPACE_NAME]
[--aml_config_path AML_CONFIG_PATH] --compute_name
COMPUTE_NAME [--vm_size VM_SIZE]
[--location LOCATION] [--min_nodes MIN_NODES]
[--max_nodes MAX_NODES]
[--idle_time_before_scale_down IDLE_TIME_BEFORE_SCALE_DOWN]
Named Arguments¶
- --create, -c
Create new compute
Default: False
- --delete, -d
Delete existing compute
Default: False
- --subscription_id
Azure subscription ID
- --resource_group
Name of the Azure resource group
- --workspace_name
Name of the AzureML workspace
- --aml_config_path
Path to AzureML config file. If provided, subscription_id, resource_group and workspace_name are ignored
- --compute_name
Name of the new compute
- --vm_size
VM size of the new compute. This is required if you are creating a compute instance
- --location
Location of the new compute. This is required if you are creating a compute instance
- --min_nodes
Minimum number of nodes
Default: 0
- --max_nodes
Maximum number of nodes
Default: 2
- --idle_time_before_scale_down
Idle seconds before scaledown
Default: 120
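For example, a possible invocation to create a compute cluster (the compute name, VM size, location, and config path are placeholders):
olive manage-aml-compute --create --compute_name <compute_name> --vm_size <vm_size> --location <location> --min_nodes 0 --max_nodes 2 --aml_config_path <config.json>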
tune-session-params¶
Automatically tune the session parameters for a given ONNX model. Currently, for an ONNX model converted from a Hugging Face model and used for generative tasks, the user can simply provide -m <onnx_model_path> --hf_model_name <hf_model_name> --device <device_type> to get the tuned session parameters.
olive tune-session-params [-h] [--log_level LOG_LEVEL] [-m MODEL_NAME_OR_PATH]
[--trust_remote_code] [-t TASK]
[--model_script MODEL_SCRIPT]
[--script_dir SCRIPT_DIR]
[--data_config_path DATA_CONFIG_PATH]
[--predict_with_kv_cache]
[--hf_model_name HF_MODEL_NAME]
[--batch_size BATCH_SIZE] [--seq_len SEQ_LEN]
[--past_seq_len PAST_SEQ_LEN]
[--max_seq_len MAX_SEQ_LEN] [--shared_kv]
[--generative]
[--ort_past_key_name ORT_PAST_KEY_NAME]
[--ort_past_value_name ORT_PAST_VALUE_NAME]
[--max_samples MAX_SAMPLES]
[--fields_no_batch [FIELDS_NO_BATCH [FIELDS_NO_BATCH ...]]]
[--device {gpu,cpu}] [--cpu_cores CPU_CORES]
[--io_bind] [--enable_cuda_graph]
[--providers_list [PROVIDERS_LIST [PROVIDERS_LIST ...]]]
[--execution_mode_list [EXECUTION_MODE_LIST [EXECUTION_MODE_LIST ...]]]
[--opt_level_list [OPT_LEVEL_LIST [OPT_LEVEL_LIST ...]]]
[--trt_fp16_enable]
[--intra_thread_num_list [INTRA_THREAD_NUM_LIST [INTRA_THREAD_NUM_LIST ...]]]
[--inter_thread_num_list [INTER_THREAD_NUM_LIST [INTER_THREAD_NUM_LIST ...]]]
[--extra_session_config EXTRA_SESSION_CONFIG]
[--disable_force_evaluate_other_eps]
[--enable_profiling] [--output_path OUTPUT_PATH]
[--tempdir TEMPDIR]
[--resource_group RESOURCE_GROUP]
[--workspace_name WORKSPACE_NAME]
[--keyvault_name KEYVAULT_NAME]
[--aml_compute AML_COMPUTE]
Named Arguments¶
- --output_path
Path to save the tuned inference settings.
Default: “perf_tuning_output”
- --tempdir
Root directory for tempfile directories and files
logging options¶
- --log_level
Logging level. Default is 3. level 0: DEBUG, 1: INFO, 2: WARNING, 3: ERROR, 4: CRITICAL
Default: 3
Model options¶
- -m, --model_name_or_path
The model checkpoint for weights initialization. If using an AzureML Registry model, provide the model path as ‘registry_name:model_name:version’.
- --trust_remote_code
Trust remote code when loading a model.
Default: False
- -t, --task
Task for which the model is used.
- --model_script
The script file containing the model definition. Required for PyTorch model.
- --script_dir
The directory containing the model script file.
dataset options, which are mutually exclusive with huggingface dataset options¶
- --data_config_path
Path to the data config file. It allows customizing the data config (JSON/YAML) for the model.
huggingface dataset options. If dataset options are not provided, the user should provide the following options to modify the default data config. Please refer to olive.data.container.TransformersTokenDummyDataContainer for more details.¶
- --predict_with_kv_cache
Whether to use key-value cache for perf_tuning
Default: False
- --hf_model_name
Huggingface model name used to load model configs from huggingface.
- --batch_size
Batch size of the input data.
- --seq_len
Sequence length to use for the input data.
- --past_seq_len
Past sequence length to use for the input data.
- --max_seq_len
Max sequence length to use for the input data.
- --shared_kv
Whether to enable shared KV cache in the input data.
Default: False
- --generative
Whether to enable generative mode in the input data.
Default: False
- --ort_past_key_name
Past key name for the input data.
- --ort_past_value_name
Past value name for the input data.
- --max_samples
Max samples to use for the input data.
- --fields_no_batch
List of fields that should not be batched.
pass options¶
- --device
Possible choices: gpu, cpu
Device to use for the model.
Default: “cpu”
- --cpu_cores
CPU cores used for thread tuning.
- --io_bind
Whether to enable IOBinding search for ONNX Runtime inference.
Default: False
- --enable_cuda_graph
Whether to enable CUDA Graph for the CUDA execution provider.
Default: False
- --providers_list
List of execution providers to use for ONNX model. They are case sensitive. If not provided, all available providers will be used.
- --execution_mode_list
List of execution modes to test for parallelism between operators.
- --opt_level_list
Optimization level list for ONNX Model.
- --trt_fp16_enable
Enable TensorRT FP16 mode.
Default: False
- --intra_thread_num_list
List of intra-op thread counts to test.
- --inter_thread_num_list
List of inter-op thread counts to test.
- --extra_session_config
Extra customized session options to use during the tuning process. It should be a JSON string, e.g. --extra_session_config '{"key1": "value1", "key2": "value2"}'
- --disable_force_evaluate_other_eps
Whether to force evaluation of all execution providers that are different from the associated execution provider.
Default: False
- --enable_profiling
Whether to enable profiling for ONNX Runtime inference.
Default: False
remote options¶
- --resource_group
Resource group for the AzureML workspace.
- --workspace_name
Workspace name for the AzureML workspace.
- --keyvault_name
The AzureML Key Vault name with the Hugging Face token to use for the remote run. Refer to https://microsoft.github.io/Olive/features/huggingface_model_optimization.html#huggingface-login for more details.
- --aml_compute
The compute name to run the workflow on.
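For example, a possible invocation for an ONNX model exported from a Hugging Face checkpoint (paths and names in angle brackets are placeholders):
olive tune-session-params -m <onnx_model_path> --hf_model_name <hf_model_name> --device gpu --io_bind --output_path tuned-settings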
cloud-cache¶
Cloud cache model operations
olive cloud-cache [-h] [--delete] --account ACCOUNT --container CONTAINER
--model_hash MODEL_HASH
Named Arguments¶
- --delete
Delete a model cache from the cloud cache.
Default: False
- --account
The account name for the cloud cache.
- --container
The container name for the cloud cache.
- --model_hash
The model hash to remove from the cloud cache.
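For example, a possible invocation to delete a cached model (the account, container, and hash are placeholders):
olive cloud-cache --delete --account <account_name> --container <container_name> --model_hash <model_hash>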