BERT optimization with PTQ on CPU¶
This is a sample use case of Olive to optimize a Bert model using onnx conversion, onnx transformers optimization, onnx quantization tuner and performance tuning.
Performs optimization pipeline:
PyTorch Model -> Onnx Model -> Transformers Optimized Onnx Model -> Quantized Onnx Model -> Tune performance
Outputs the best metrics, model, and corresponding Olive config.
Prerequisites¶
Please go to example repository Quickstart Bert Example
Pip requirements¶
Install the necessary python packages:
python -m pip install -r requirements.txt
Run sample using config. The optimization techniques to run are specified in bert_config.json¶
First, install required packages according to passes.
python -m olive.workflows.run --config bert_config.json --setup
Then, optimize the model
python -m olive.workflows.run --config bert_config.json
or run simply with python code:
from olive.workflows import run as olive_run
olive_run("bert_config.json")
Optimize model automatically without selecting any optimization technique.¶
First, install required packages according to passes.
python -m olive.workflows.run --config auto_bert_config.json --setup
Then, optimize the model
python -m olive.workflows.run --config auto_bert_config.json
or run simply with python code:
from olive.workflows import run as olive_run
olive_run("auto_bert_config.json")
After running the above command, the model candidates and corresponding config will be saved in the output directory. You can then select the best model and config from the candidates and run the model with the selected config.