BERT optimization with PTQ on CPU

This is a sample use case of Olive to optimize a BERT model using ONNX conversion, ONNX Runtime transformers optimization, ONNX quantization, and ONNX Runtime performance tuning.

The workflow performs the following optimization pipeline:

  • PyTorch Model -> ONNX Model -> Transformers-Optimized ONNX Model -> Quantized ONNX Model -> Performance Tuning

The workflow outputs the best metrics, the corresponding model, and the Olive config that produced it.

Prerequisites

Please go to the Quickstart Bert Example in the Olive examples repository.

Pip requirements

Install the necessary Python packages:

python -m pip install -r requirements.txt

Run the sample using a config file

The optimization techniques to run are specified in bert_config.json:

python -m olive.workflows.run --config bert_config.json

or simply run it with Python code:

from olive.workflows import run as olive_run
olive_run("bert_config.json")

Optimize the model automatically, without selecting any optimization techniques, using auto_bert_config.json:

python -m olive.workflows.run --config auto_bert_config.json

or simply run it with Python code:

from olive.workflows import run as olive_run
olive_run("auto_bert_config.json")