How To Use optimize Command#

The olive optimize command optimizes a PyTorch/Hugging Face model so that it runs with quality and efficiency on the ONNX Runtime.

Quickstart#

The Olive optimization command (optimize) can pull models from Hugging Face, Local disk, or the Azure AI Model Catalog. Following optimize command that will download the model, quantize models weights to use int4, convert the model to ONNX and optimize the ONNX graph.

olive optimize \
    --model_name_or_path microsoft/Phi-3.5-mini-instruct \
    --device cpu \
    --provider CPUExecutionProvider \
    --precision int4 \

Optimize NPU models#

You can use olive optimize command to optimize a model for NPUs.

olive optimize \
    --model_name_or_path microsoft/Phi-3.5-mini-instruct \
    --precision int4 \
    --act_precision int8 \
    --provider QNNExecutionProvider \

This command will quantize weights into int4 precision before converting the model into ONNX format. The model will be further processed to use int8 precision for activation and use static shapes.

Customizing model optimization process#

olive optimize primarily requests desired model precision and intended ExecutionProvider that will be used to run the optimized model. Based on these information, olive optimize command will generate model optimiation recipe as per user request and execute the recipe to produce to output model. Advanced users can use --dry_run option to save the config.json file on the disk. See comprehensive list of options you can use to customize the model optimization process further by modifying the config.json file produced by the olive optimize --dry_run ... command.

Additional details#

See olive optimize reference for the complete list of supported options.