Quick Tour

Below is a quick guide to installing the packages needed to use Olive for model optimization. We will start with a PyTorch model, then convert it to ONNX and quantize it. If you are new to Olive and model optimization, we recommend checking the Design and Tutorials sections for more in-depth explanations.

Install Olive and dependencies

Before you begin, install Olive and the necessary packages.

pip install olive-ai

You will also need to install your preferred build of onnxruntime. Let’s choose the default CPU package for this tour.

pip install onnxruntime

Refer to the Installation section for more details.

Model Optimization Workflow

Olive model optimization workflows are defined using config JSON files. You can use the Olive CLI to run the pipeline:

python -m olive.workflows.run --config user_provided_info.json

or in Python code:

from olive.workflows import run as olive_run
olive_run("user_provided_info.json")

Note

olive.workflows.run in Python code also accepts the Python dictionary equivalent of the config JSON object.
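
For example, here is a minimal sketch that passes the dictionary form of a config; the keys mirror the JSON sections discussed below:

from olive.workflows import run as olive_run

# Dictionary equivalent of a config JSON file.
config = {
    "input_model": {
        "type": "PyTorchModel",
        "config": {"model_path": "resnet.pt", "is_file": True},
    },
    # ... plus the "systems", "evaluators", "passes" and "engine"
    # entries described in the sections below
}
olive_run(config)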

Now, let’s take a look at the information you can provide to Olive to optimize your model.

Input Model

You provide the input model's location and type. PyTorchModel, ONNXModel, OpenVINOModel, and SNPEModel are supported model types.

"input_model":{
    "type": "PyTorchModel",
    "config": {
        "model_path": "resnet.pt",
        "is_file": true
    }
}
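
For reference, here is one way the resnet.pt file above could be produced. This is a sketch that assumes the model is saved as a whole object with torch.save; check the PyTorchModel reference for the formats your Olive version supports:

import torch
import torchvision

# Save a full ResNet model object to the path used as "model_path" above.
model = torchvision.models.resnet18()
torch.save(model, "resnet.pt")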

Host and Target Systems

An optimization technique, which we call a Pass, can be run on a variety of host systems, and the resulting model can be evaluated on the desired target systems. More details on the available systems can be found in the OliveSystems API reference.

In this guide, you will use your local system as both the host for passes and the target for evaluation.

"systems": {
    "local_system": {"type": "LocalSystem"}
}

Evaluator

In order to choose the set of Pass configuration parameters that leads to the “best” model, Olive requires an evaluator that returns metric values for each output model.

"evaluators": {
    "common_evaluator":{
        "metrics":[
            {
                "name": "latency",
                "type": "latency",
                "sub_type": "avg",
                "user_config":{
                    "user_script": "user_script.py",
                    "data_dir": "data",
                    "dataloader_func": "create_dataloader",
                    "batch_size": 16
                }
            }
        ],
        "target": "local_system"
    }
}

The latency metric requires you to provide, as the value for dataloader_func, a function that returns a dataloader object when called with data_dir and batch_size. You can provide the function object directly, but here let's give it a function name, "create_dataloader", that can be imported from user_script.py.
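
A minimal sketch of create_dataloader in user_script.py might look like the following. The CIFAR-10 dataset is an assumption based on the 1x3x32x32 input shape used later in this tour; substitute your own dataset and transforms:

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def create_dataloader(data_dir, batch_size):
    # Build the evaluation dataset from data_dir and wrap it in a DataLoader.
    dataset = datasets.CIFAR10(
        root=data_dir,
        train=False,
        download=True,
        transform=transforms.ToTensor(),
    )
    return DataLoader(dataset, batch_size=batch_size)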

Refer to How to write user_script for more details and examples of user scripts.

You can provide more than one metric to the evaluator metrics list.
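
For example, here is a hedged sketch of a metrics list with both an accuracy and a latency metric. The accuracy fields shown here (sub_type "accuracy_score" and post_processing_func) are assumptions drawn from typical Olive examples; check the metrics reference for the exact field names in your version:

"metrics":[
    {
        "name": "accuracy",
        "type": "accuracy",
        "sub_type": "accuracy_score",
        "user_config":{
            "user_script": "user_script.py",
            "data_dir": "data",
            "dataloader_func": "create_dataloader",
            "post_processing_func": "post_process",
            "batch_size": 16
        }
    },
    {
        "name": "latency",
        "type": "latency",
        "sub_type": "avg",
        "user_config":{
            "user_script": "user_script.py",
            "data_dir": "data",
            "dataloader_func": "create_dataloader",
            "batch_size": 16
        }
    }
]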

Engine

The engine handles the auto-tuning process. You can select the search strategy here.

"engine": {
    "cache_dir": ".cache"
    "search_strategy": {
        "execution_order": "joint",
        "search_algorithm": "exhaustive",
    }
}

Passes

You list the Passes that you want to apply to the input model. In this example, let us first convert the PyTorch model to ONNX and then quantize it.

"onnx_conversion": {
    "type": "OnnxConversion",
    "config": {
        "input_names": ["input"],
        "input_shapes": [[1, 3, 32, 32]],
        "output_names": ["output"],
        "dynamic_axes": {
            "input": {"0": "batch_size"},
            "output": {"0": "batch_size"}
        },
        "target_opset": 13
    },
    "host": {"type": "LocalSystem"}
},
"onnx_quantization": {
    "type": "OnnxDynamicQuantization",
    "config": {
        "user_script": "user_script.py",
        "data_dir": "data",
        "dataloader_func": "resnet_calibration_reader",
        "weight_type" : "QUInt8"
    },
    "default_to_search": true
}
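
Here is a minimal sketch of what resnet_calibration_reader could look like in user_script.py, assuming onnxruntime's CalibrationDataReader interface and random NumPy batches as stand-ins for real calibration data:

import numpy as np
from onnxruntime.quantization import CalibrationDataReader

class ResnetCalibrationDataReader(CalibrationDataReader):
    def __init__(self, data_dir, batch_size=16):
        # Replace these random batches with real samples loaded from data_dir.
        self.batches = iter(
            [{"input": np.random.rand(batch_size, 3, 32, 32).astype(np.float32)}
             for _ in range(8)]
        )

    def get_next(self):
        # Return the next {input_name: array} feed dict, or None when exhausted.
        return next(self.batches, None)

def resnet_calibration_reader(data_dir, batch_size=16):
    return ResnetCalibrationDataReader(data_dir, batch_size)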

Example JSON

Here is the complete JSON configuration file as discussed above, which you can use to optimize your input model with the following command:

python -m olive.workflows.run --config config.json

{
    "verbose": true,
    "input_model":{
        "type": "PyTorchModel",
        "config": {
            "model_path": "resnet.pt",
            "is_file": true
        }
    },
    "systems": {
        "local_system": {"type": "LocalSystem"}
    },
    "evaluators": {
        "common_evaluator":{
            "metrics":[
                {
                    "name": "latency",
                    "type": "latency",
                    "sub_type": "avg",
                    "user_config":{
                        "user_script": "user_script.py",
                        "data_dir": "data",
                        "dataloader_func": "create_dataloader",
                        "batch_size": 16
                    }
                }
            ],
            "target": "local_system"
        }
    },
    "passes": {
        "onnx_conversion": {
            "type": "OnnxConversion",
            "config": {
                "input_names": ["input"],
                "input_shapes": [[1, 3, 32, 32]],
                "output_names": ["output"],
                "dynamic_axes": {
                    "input": {"0": "batch_size"},
                    "output": {"0": "batch_size"}
                },
                "target_opset": 13
            },
            "host": {"type": "LocalSystem"}
        },
        "onnx_quantization": {
            "type": "OnnxDynamicQuantization",
            "config": {
                "user_script": "user_script.py",
                "data_dir": "data",
                "dataloader_func": "resnet_calibration_reader",
                "weight_type" : "QUInt8"
            },
            "default_to_search": true
        }
    },
    "engine": {
        "search_strategy": {
            "execution_order": "joint",
            "search_algorithm": "exhaustive"
        },
        "evaluator": "common_evaluator",
        "host": {"type": "LocalSystem"},
    }
}