BERT optimization with PTQ on CPU¶
This is a sample use case of Olive for optimizing a BERT model using ONNX conversion, ONNX Runtime transformer optimization, ONNX quantization, and performance tuning.
It performs the following optimization pipeline:
PyTorch Model -> ONNX Model -> Transformer-Optimized ONNX Model -> Quantized ONNX Model -> Performance Tuning
Outputs the best metrics, model, and corresponding Olive config.
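Each stage of this pipeline corresponds to a pass in the Olive workflow config. The sketch below is a minimal illustration of that mapping, with pass names and fields that are assumptions rather than the exact contents of bert_config.json; the config in the example repository is the authoritative reference and its structure may differ by Olive version.
# Illustrative sketch only: pass names and fields are assumptions, not the
# exact contents of bert_config.json. Each entry mirrors one pipeline stage.
pipeline_passes = {
    "conversion": {"type": "OnnxConversion"},  # PyTorch -> ONNX
    "transformers_optimization": {"type": "OrtTransformersOptimization"},  # graph fusions
    "quantization": {"type": "OnnxQuantization"},  # post-training quantization
    "perf_tuning": {"type": "OrtPerfTuning"},  # performance tuning
}
# In a full workflow config these entries would sit under the "passes" key,
# alongside the input model, evaluator, and engine sections.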
Prerequisites¶
Please refer to the example repository: Quickstart Bert Example.
Pip requirements¶
Install the necessary Python packages:
python -m pip install -r requirements.txt
Run the sample using a config file. The optimization techniques to run are specified in bert_config.json¶
python -m olive.workflows.run --config bert_config.json
or simply run it with Python code:
from olive.workflows import run as olive_run
olive_run("bert_config.json")
Optimize the model automatically without selecting any optimization techniques¶
python -m olive.workflows.run --config auto_bert_config.json
or simply run it with Python code:
from olive.workflows import run as olive_run
olive_run("auto_bert_config.json")