mlos_bench
mlos_bench is a framework to help automate benchmarking, OS/application parameter autotuning, and the management of the resulting data.
Overview
mlos_bench can be installed from pypi via pip install mlos-bench and executed using the mlos_bench command with a collection of json configs.
It is intended to be used with mlos_core via MlosCoreOptimizer to help navigate complex parameter spaces more efficiently, though other optimizers are also available to customize the search process simply by swapping out the Optimizer class in the associated json configs. For instance, GridSearchOptimizer can be used to perform a grid search over the parameter space instead, or tuning can be skipped entirely and mlos_bench used purely as a repeatable benchmarking tool.
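For illustration, switching optimizers is typically just a matter of changing the Optimizer class named in a small json optimizer config. The sketch below is hypothetical: the class names are those mentioned above, but the nested config fields (e.g., optimization_targets, max_suggestions) are assumptions and should be checked against the mlos_bench.optimizers documentation and json schemas.

// Hypothetical optimizer config sketch -- nested fields are assumptions;
// consult the mlos_bench.config json schemas for the authoritative format.
{
    // Swap this class for "mlos_bench.optimizers.GridSearchOptimizer"
    // to grid search the tunable space instead of using mlos_core.
    "class": "mlos_bench.optimizers.MlosCoreOptimizer",
    "config": {
        "optimization_targets": { "score": "min" },  // assumed field name
        "max_suggestions": 100                       // assumed budget field
    }
}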
Goals
The overall goal of the MLOS project is to enable reproducible and trackable benchmarking and efficient autotuning for systems software.
Automating the benchmarking process is a key part of that goal and is what mlos_bench seeks to enable.
Interaction
Users are expected to provide JSON mlos_bench.config files that instruct the framework how to automate their benchmark, autotuning, or other Experiment.
This may involve several steps, such as:
deploying a VM
installing some software
loading a dataset
running a benchmark
collecting and storing the results
repeating for statistical and noise measures (we call each iteration a Trial)
(optionally) repeating with a different configuration (e.g., for autotuning purposes)
analyzing the results
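These steps are typically wired together by a single CLI config that points at the other config pieces. The following sketch is hypothetical: the field names shown (config_path, environment, optimizer, storage, trial_config_repeat_count) are assumptions meant to illustrate the composition and should be verified against the mlos_bench.config documentation.

// Hypothetical CLI config sketch -- field names are assumptions;
// see the mlos_bench.config module docs for the real schema.
{
    "config_path": ["./config"],                        // where relative config paths are resolved
    "environment": "environments/my-benchmark.jsonc",   // hypothetical env config: deploy, setup, run
    "optimizer": "optimizers/mlos_core_default.jsonc",  // optional: drives the autotuning loop
    "storage": "storage/sqlite.jsonc",                  // where Trial results get recorded
    "trial_config_repeat_count": 3,                     // repeat each config for noise measures
    "log_level": "INFO"
}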
Since many of these phases are common across different benchmarks, the framework is intended to be modular and composable to allow reuse and faster development of new benchmarking environments or autotuning experiments.
Where possible, the framework will provide common configs for reuse (e.g., deploying a VM on Azure, or running benchbase against a database system) to allow users to focus on the specifics of their experiments.
Where none are currently available, one can create them externally to MLOS; however, users are also encouraged to submit PRs or Issues to add new classes or config and script snippets for others to use as well.
For more details on the configuration files, please see the documentation in the mlos_bench.config module.
Classes Overview
The other core classes in this package are:
environments
which provide abstractions for representing an execution environment. These are generally the target of the optimization process and are used to evaluate the performance of a given configuration, though they can also be used to simply run a single benchmark. They can be used, for instance, to provision a VM, run benchmarks, or execute any other arbitrary code on a remote machine, and many other things. Environments are often associated with tunables, which provide a language for specifying the set of configuration parameters that can be optimized or searched over with the optimizers.
services
which provide the necessary abstractions to interact with the environments in different settings. For instance, the AzureVMService can be used to run commands on Azure VMs for a remote VMEnv. Alternatively, one could swap out that service for SshHostService in order to target a different VM without having to change the Environment configuration at all, since they both implement the same SupportsRemoteExec Services type interfaces. This is particularly useful when running the same benchmark on different ecosystems and makes the configs more modular and composable.
storage
which provides abstractions for storing and retrieving data from the experiments. For instance, nearly any SQL backend that sqlalchemy supports can be used.
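To make the tunables notion concrete, here is a rough sketch of what a tunable parameters config can look like. The group and parameter names are made up for illustration, and the exact schema (covariant groups with a cost and a set of params, each with a type, default, and range or values) should be verified against the mlos_bench.tunables documentation and json schemas.

// Hypothetical tunable parameters sketch -- names are illustrative only;
// see the mlos_bench.tunables docs / json schemas for the real format.
{
    "example-kernel-params": {            // an assumed covariant group of tunables
        "cost": 1,
        "params": {
            "sched_migration_cost_ns": {  // hypothetical integer tunable
                "type": "int",
                "default": 500000,
                "range": [100000, 1000000]
            },
            "sched_autogroup_enabled": {  // hypothetical categorical tunable
                "type": "categorical",
                "default": "1",
                "values": ["0", "1"]
            }
        }
    }
}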
The data management and automation of experiment data is a key component of MLOS, as it provides a unified way to manage experiment data across different Environments, enabling more reusable visualization and analysis by mapping benchmark metrics into common semantic types (e.g., via OpenTelemetry).
Without this, most experiments are effectively siloed and require custom and, more critically, non-reusable scripts to set up and later parse results, and are hence harder to scale to many users.
With these features as part of the MLOS ecosystem, benchmarking can become a service that any developer, admin, researcher, etc. can use and adapt.
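As a rough illustration, the storage backend is also selected via a small json config. The sketch below assumes a local sqlite backend; the class path and field names (drivername, database, lazy_schema_create) are assumptions modeled on sqlalchemy connection parameters and should be checked against the mlos_bench.storage documentation and json schemas.

// Hypothetical storage config sketch -- class path and fields are assumptions;
// consult the mlos_bench.storage docs / json schemas for the real format.
{
    "class": "mlos_bench.storage.sql.storage.SqlStorage",
    "config": {
        "drivername": "sqlite",             // any sqlalchemy-supported backend
        "database": "mlos_bench.sqlite",    // assumed local database file name
        "lazy_schema_create": true
    }
}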
See below for more information on the classes in this package.
Notes
Note that while the documentation in this package is generated from the source code docstrings, and hence is sometimes more focused on implementation details, most user interactions with the package will be through the json configs. Even so, it may be useful to look at the source code to understand how those configs are interpreted.
Examples
Here is an example that shows how to run a simple benchmark using the command line.
The entry point for these configs can be found in the source tree (see the --config path below).
>>> from subprocess import run
>>> # Note: we show the command wrapped in python here for testing purposes.
>>> # Alternatively replace test-cli-local-env-bench.jsonc with
>>> # test-cli-local-env-opt.jsonc for one that does an optimization loop.
>>> cmd = "mlos_bench \
... --config mlos_bench/mlos_bench/tests/config/cli/test-cli-local-env-bench.jsonc \
... --globals experiment_test_local.jsonc \
... --tunable_values tunable-values/tunable-values-local.jsonc"
>>> print(f"Here's the shell command you'd actually run:\n# {cmd}")
Here's the shell command you'd actually run:
# mlos_bench --config mlos_bench/mlos_bench/tests/config/cli/test-cli-local-env-bench.jsonc --globals experiment_test_local.jsonc --tunable_values tunable-values/tunable-values-local.jsonc
>>> # Now we run the command and check the output.
>>> result = run(cmd, shell=True, capture_output=True, text=True, check=True)
>>> assert result.returncode == 0
>>> lines = result.stderr.splitlines()
>>> first_line = lines[0]
>>> last_line = lines[-1]
>>> expected = "INFO Launch: mlos_bench"
>>> assert first_line.endswith(expected)
>>> expected = "INFO Final score: {'score': 123.4, 'total_time': 123.4, 'throughput': 1234567.0}"
>>> assert last_line.endswith(expected)
Notes
See mlos_bench/README.md for additional documentation and examples in the source tree.
See mlos_bench/DEVNOTES.md for additional developer notes in the source tree.
There is also a working example of using mlos_bench in a separate config repo (the more expected case for most users) in the sqlite-autotuning repo.