Quick Tour

Below is a quick guide to installing the packages needed to use Olive for model optimization. We will start with a PyTorch model, then convert it to ONNX and quantize it. If you are new to Olive and model optimization, we recommend checking the Design and Tutorials sections for more in-depth explanations.

Install Olive and dependencies

Before you begin, install Olive and the necessary packages.

pip install olive-ai

You will also need to install your preferred build of onnxruntime. Let’s choose the default CPU package for this tour.

pip install onnxruntime
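
As an optional sanity check (not required by this tour), you can confirm the package imports and list the execution providers available in your build:

import onnxruntime

# Confirm the install and show which execution providers this build offers.
print(onnxruntime.__version__)
print(onnxruntime.get_available_providers())  # e.g. ['CPUExecutionProvider']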

Refer to the Installation section for more details.

Model Optimization Workflow

Olive model optimization workflows are defined using config JSON files. You can use the Olive CLI to run the pipeline:

First, install the packages required by the passes in your configuration:

python -m olive.workflows.run --config user_provided_info.json --setup

Then, optimize the model:

python -m olive.workflows.run --config user_provided_info.json

or in Python code:

from olive.workflows import run as olive_run
olive_run("user_provided_info.json")

Note

olive.workflows.run in Python code also accepts a Python dictionary equivalent of the config JSON object.
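
For example, the call above could be written with an inline dictionary instead of a file path (the config contents here are just a fragment for illustration):

from olive.workflows import run as olive_run

# The same information as the JSON file, expressed as a Python dict.
config = {
    "input_model": {
        "type": "PyTorchModel",
        "config": {"model_path": "resnet.pt"},
    },
    # ... systems, evaluators, passes, and engine, as described below
}
olive_run(config)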

Now, let’s take a look at the information you can provide to Olive to optimize your model.

Input Model

You provide the input model's location and type. PyTorchModel, ONNXModel, OpenVINOModel, and SNPEModel are the supported model types.

"input_model":{
    "type": "PyTorchModel",
    "config": {
        "model_path": "resnet.pt",
        "model_storage_kind": "file",
        "io_config": {
            "input_names": ["input"],
            "input_shapes": [[1, 3, 32, 32]],
            "output_names": ["output"],
            "dynamic_axes": {
                "input": {"0": "batch_size"},
                "output": {"0": "batch_size"}
            }
        }
    }
}
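
If you don't already have a resnet.pt file on disk, one common way to produce one is with torch.save. Here is a minimal sketch, using torchvision's ResNet-18 as an assumed stand-in model:

import torch
import torchvision

# Save the full model object so it can be reloaded later with torch.load.
model = torchvision.models.resnet18()
model.eval()
torch.save(model, "resnet.pt")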

Host and Target Systems

An optimization technique, which we call a Pass, can be run on a variety of host systems, and the resulting model can be evaluated on the desired target systems. More details on the available systems can be found in the OliveSystems API reference.

In this guide, you will use your local system as both the host for passes and the target for evaluation.

"systems": {
    "local_system": {"type": "LocalSystem"}
}

Evaluator

In order to choose the set of Pass configuration parameters that leads to the “best” model, Olive requires an evaluator that returns metric values for each output model.

"evaluators": {
    "common_evaluator":{
        "metrics":[
            {
                "name": "latency",
                "type": "latency",
                "sub_type": "avg",
                "user_config":{
                    "user_script": "user_script.py",
                    "data_dir": "data",
                    "dataloader_func": "create_dataloader",
                    "batch_size": 16
                }
            }
        ]
    }
}

The latency metric requires you to provide, as the value of dataloader_func, a function that returns a dataloader object when called with data_dir and batch_size. You can provide the function object directly, but here let's give it the name of a function, "create_dataloader", that can be imported from user_script.py.

Refer to How to write user_script for more details and examples of user scripts; a minimal sketch follows below.
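
Here is a minimal sketch of what user_script.py could look like. The random dataset and the [3, 32, 32] shapes are illustrative assumptions matching the io_config above:

# user_script.py -- a minimal sketch with dummy data.
import torch
from torch.utils.data import DataLoader, Dataset

class RandomDataset(Dataset):
    """Random tensors shaped like the model input in io_config."""

    def __init__(self, size=256):
        self.size = size

    def __len__(self):
        return self.size

    def __getitem__(self, idx):
        # Return an (input, label) pair; the label is a dummy value.
        return torch.rand(3, 32, 32), 0

def create_dataloader(data_dir, batch_size):
    # data_dir is unused here because the data is generated on the fly.
    return DataLoader(RandomDataset(), batch_size=batch_size)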

You can provide more than one metric in the evaluator's metrics list, as sketched below.
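
For example, an accuracy metric could sit alongside the latency metric. This is a sketch only; the exact sub_type and user_config fields for accuracy are assumptions that depend on your task:

"metrics":[
    {
        "name": "accuracy",
        "type": "accuracy",
        "sub_type": "accuracy_score",
        "user_config":{
            "user_script": "user_script.py",
            "data_dir": "data",
            "dataloader_func": "create_dataloader",
            "batch_size": 16
        }
    },
    {
        "name": "latency",
        "type": "latency",
        "sub_type": "avg",
        "user_config":{
            "user_script": "user_script.py",
            "data_dir": "data",
            "dataloader_func": "create_dataloader",
            "batch_size": 16
        }
    }
]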

Engine

The engine handles the auto-tuning process. You can select the search strategy here.

"engine": {
    "host": {"type": "LocalSystem"},
    "target": {"type": "LocalSystem"},
    "cache_dir": ".cache",
    "search_strategy": {
        "execution_order": "joint",
        "search_algorithm": "exhaustive",
    }
}

Passes

You list the Passes that you want to apply to the input model. In this example, let's first convert the PyTorch model to ONNX and then quantize it.

"onnx_conversion": {
    "type": "OnnxConversion",
    "config": {
        "target_opset": 13
    }
},
"onnx_quantization": {
    "type": "OnnxDynamicQuantization",
    "config": {
        "user_script": "user_script.py",
        "data_dir": "data",
        "dataloader_func": "resnet_calibration_reader",
        "weight_type" : "QUInt8"
    }
}
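
The dataloader_func here names a function in user_script.py that returns a calibration data reader. Below is a minimal sketch assuming ONNX Runtime's CalibrationDataReader interface and randomly generated data; the input name and shapes are assumptions matching the io_config above:

# In user_script.py -- a sketch of a calibration data reader.
import numpy as np
from onnxruntime.quantization import CalibrationDataReader

class ResnetCalibrationDataReader(CalibrationDataReader):
    def __init__(self, data_dir, batch_size=16, num_batches=8):
        # Pre-build a fixed number of random batches keyed by input name.
        self.batches = iter(
            [{"input": np.random.rand(batch_size, 3, 32, 32).astype(np.float32)}
             for _ in range(num_batches)]
        )

    def get_next(self):
        # Return the next {input_name: ndarray} dict, or None when exhausted.
        return next(self.batches, None)

def resnet_calibration_reader(data_dir, batch_size=16):
    return ResnetCalibrationDataReader(data_dir, batch_size)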

Example JSON

Here is the complete JSON configuration file, as discussed above, which you can use to optimize your input model with the following command:

python -m olive.workflows.run --config config.json

{
    "verbose": true,
    "input_model":{
        "type": "PyTorchModel",
        "config": {
            "model_path": "resnet.pt",
            "model_storage_kind": "file",
            "io_config": {
                "input_names": ["input"],
                "input_shapes": [[1, 3, 32, 32]],
                "output_names": ["output"],
                "dynamic_axes": {
                    "input": {"0": "batch_size"},
                    "output": {"0": "batch_size"}
                }
            }
        }
    },
    "systems": {
        "local_system": {"type": "LocalSystem"}
    },
    "evaluators": {
        "common_evaluator":{
            "metrics":[
                {
                    "name": "latency",
                    "type": "latency",
                    "sub_type": "avg",
                    "user_config":{
                        "user_script": "user_script.py",
                        "data_dir": "data",
                        "dataloader_func": "create_dataloader",
                        "batch_size": 16
                    }
                }
            ]
        }
    },
    "passes": {
        "onnx_conversion": {
            "type": "OnnxConversion",
            "config": {
                "target_opset": 13
            }
        },
        "onnx_quantization": {
            "type": "OnnxDynamicQuantization",
            "config": {
                "user_script": "user_script.py",
                "data_dir": "data",
                "dataloader_func": "resnet_calibration_reader",
                "weight_type" : "QUInt8"
            }
        }
    },
    "engine": {
        "search_strategy": {
            "execution_order": "joint",
            "search_algorithm": "exhaustive"
        },
        "evaluator": "common_evaluator",
        "host": {"type": "LocalSystem"},
        "target": {"type": "LocalSystem"}
    }
}

Olive Footprint

When the optimization process is complete, Olive will generate JSON reports under the output_dir you specified in engine.run. The reports contain:

  • footprints.json: A dictionary of all the footprints generated during the optimization process. The structure of each footprint value is:

from datetime import datetime
from typing import Dict

from olive.common.config_utils import ConfigBase

# FootprintNodeMetric is defined first so FootprintNode can reference it.
class FootprintNodeMetric(ConfigBase):
    """
    value: {"metric_name": metric_value, ...}
    cmp_direction: auto-suggested comparison direction, e.g. {"metric_name": 1, ...};
        1 means higher is better, -1 means lower is better
    is_goals_met: whether the goals set by the user are met
    """
    value: Dict = None
    cmp_direction: Dict = None
    is_goals_met: bool = False

class FootprintNode(ConfigBase):
    # None for no parent, which means the current model is the input model
    parent_model_id: str = None
    model_id: str
    model_config: Dict = None
    from_pass: str = None
    pass_run_config: Dict = None
    is_pareto_frontier: bool = False
    metrics: FootprintNodeMetric = FootprintNodeMetric()
    date_time: float = datetime.now().timestamp()
  • pareto_frontier_footprints.json: A dictionary of the footprints that are on the Pareto frontier, based on the metric goals you set in the evaluators.metrics config.

Here is an example of that:

{
    "24_OrtTransformersOptimization-23-28b039f9e50b7a04f9ab69bcfe75b9a2": {
        "parent_model_id": "23_OnnxConversion-9d98a0131bcdfd593432adfa2190016b-fa609d8c8586ea9b21b129a124e3fdb0",
        "model_id": "24_OrtTransformersOptimization-23-28b039f9e50b7a04f9ab69bcfe75b9a2",
        "model_config": {
            "type": "ONNXModel",
            "config": {
                "model_path": "path",
                "name": null,
                "model_storage_kind": "file",
                "version": null,
                "inference_settings": null
            }
        },
        "from_pass": "OrtTransformersOptimization",
        "pass_run_config": {
            "model_type": "bert",
            "num_heads": 0,
            "hidden_size": 0,
            "optimization_options": null,
            "opt_level": null,
            "use_gpu": false,
            "only_onnxruntime": false,
            "float16": false,
            "input_int32": false,
            "use_external_data_format": false
        },
        "is_pareto_frontier": true,
        "metrics": {
            "value": {
                "accuracy": 0.8602941036224365,
                "latency": 87.4454
            },
            "cmp_direction": {
                "accuracy": 1,
                "latency": -1
            },
            "is_goals_met": true
        },
        "date_time": 1681211541.682187
    }
}

You can also call the following method to plot the Pareto frontier footprints. Make sure plotly is installed first (pip install plotly).

from olive.engine.footprint import Footprint
footprint = Footprint.from_file("footprints.json")
footprint.plot_pareto_frontier()