mlos_bench.storage

Interfaces to the storage backends for mlos_bench.

Storage backends (for instance, sql) store and retrieve the results of experiments and implement a persistent queue for schedulers.

The Storage class is the main interface and provides the ability to:

Create a new Experiment, or reload an existing one, with one or more associated Trial instances, which the schedulers use during mlos_bench run time to execute Trials.

In MLOS terms, an Experiment is a group of Trials that share the same scripts and target system.

A Trial is a single run of the target system with a specific Configuration (e.g., a set of tunable parameter values). (Note: other systems may call this a sample.)

Retrieve the TrialData results with the trials property on an ExperimentData instance via the Storage instance’s experiments property.

These can be especially useful with mlos_viz for interactive exploration in a Jupyter Notebook interface, for instance.
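
To make the Experiment and Trial terminology above concrete, here is a minimal sketch of the relationship between the two. The TrialSketch and ExperimentSketch classes are hypothetical stand-ins written for illustration only; they are not the mlos_bench classes and they do not persist anything.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TrialSketch:
    """Hypothetical stand-in for a Trial: one run with one configuration."""

    trial_id: int
    config: Dict[str, int]  # tunable parameter values used for this run
    status: str = "PENDING"
    results: Optional[Dict[str, float]] = None


@dataclass
class ExperimentSketch:
    """Hypothetical stand-in for an Experiment: a group of related Trials."""

    experiment_id: str
    description: str
    trials: Dict[int, TrialSketch] = field(default_factory=dict)

    def new_trial(self, config: Dict[str, int]) -> TrialSketch:
        # Trial ids are assigned sequentially within the experiment.
        trial_id = len(self.trials) + 1
        trial = TrialSketch(trial_id=trial_id, config=dict(config))
        self.trials[trial_id] = trial
        return trial


experiment = ExperimentSketch("my_experiment_id", "some description")
trial = experiment.new_trial({"param1": 50})
# Pretend the trial ran and record its outcome.
trial.status = "SUCCEEDED"
trial.results = {"objective_metric": 42}
```

The real Storage backend plays the same bookkeeping role, but writes the experiment and trial records to a database so that schedulers can resume and query them later.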

The from_config() storage_factory function can be used to get a Storage instance from a STORAGE type JSON config.
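
The example in the next section mentions, in a commented-out fragment, pulling values such as "$storage_host" from the global config. The sketch below illustrates that general idea with the standard library only. It is an assumption-laden illustration, not the mlos_bench implementation: the expand_placeholders helper, the "postgresql" driver name, and the "mlos_bench" database name are all hypothetical.

```python
import json
from string import Template

# Hypothetical global parameters, mirroring the names in the example config.
global_params = {"storage_host": "some-remote-host", "storage_user": "mlos_bench"}

# A storage "config" section with "$name" placeholders.
# Plain JSON without comments, so the standard json module can parse it.
storage_config_text = """
{
    "drivername": "postgresql",
    "host": "$storage_host",
    "username": "$storage_user",
    "database": "mlos_bench"
}
"""


def expand_placeholders(config: dict, params: dict) -> dict:
    """Replace "$name" placeholders in top-level string values.

    Unknown placeholders are left untouched rather than raising an error.
    """
    return {
        key: Template(value).safe_substitute(params) if isinstance(value, str) else value
        for key, value in config.items()
    }


expanded = expand_placeholders(json.loads(storage_config_text), global_params)
```

Keeping hosts and credentials in a separate global config lets the same storage config be reused across environments without editing it.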

Example

Here’s a very basic example of the Storage APIs.

>>> # Create a new storage object from a JSON config.
>>> # Normally, we'd load these from a file, but for this example we'll use a string.
>>> global_config = '''
... {
...     // Additional global configuration parameters can be added here.
...     /* For instance:
...     "storage_host": "some-remote-host",
...     "storage_user": "mlos_bench",
...     "storage_pass": "SuperSecretPassword",
...     */
... }
... '''
>>> storage_config = '''
... {
...     "class": "mlos_bench.storage.sql.storage.SqlStorage",
...     "config": {
...         // Don't create the schema until we actually need it.
...         // (helps speed up initial launch and tests)
...         "lazy_schema_create": true,
...         // Parameters below must match kwargs of `sqlalchemy.URL.create()`:
...         // Normally, we'd specify a real database, but for testing examples
...         // we'll use an in-memory one.
...         "drivername": "sqlite",
...         "database": ":memory:"
...         // Otherwise we might use something like the following
...         // to pull the values from the globals:
...         /*
...         "host": "$storage_host",
...         "username": "$storage_user",
...         "password": "$storage_pass",
...         */
...     }
... }
... '''
>>> from mlos_bench.storage import from_config
>>> storage = from_config(storage_config, global_configs=[global_config])
>>> storage
sqlite::memory:
>>> #
>>> # Internally, mlos_bench will use this config and storage backend to track
>>> # Experiments and Trials it creates.
>>> # Most users won't need to do that, but it works something like the following:
>>> # Create a new experiment with a single trial.
>>> # (Normally, we'd use a real environment config, but for this example we'll use a string.)
>>> #
>>> # Create a dummy tunable group.
>>> from mlos_bench.services.config_persistence import ConfigPersistenceService
>>> config_persistence_service = ConfigPersistenceService()
>>> tunables_config = '''
... {
...   "param_group": {
...     "cost": 1,
...     "params": {
...       "param1": {
...         "type": "int",
...         "range": [0, 100],
...         "default": 50
...       }
...     }
...   }
... }
... '''
>>> tunables = config_persistence_service.load_tunables([tunables_config])
>>> from mlos_bench.environments.status import Status
>>> from datetime import datetime
>>> with storage.experiment(
...   experiment_id="my_experiment_id",
...   trial_id=1,
...   root_env_config="root_env_config_info",
...   description="some description",
...   tunables=tunables,
...   opt_targets={"objective_metric": "min"},
... ) as experiment:
...     # Create a dummy trial.
...     trial = experiment.new_trial(tunables=tunables)
...     # Pretend something ran with that trial and we have the results now.
...     # NOTE: Normally this would run through a TrialRunner via a Scheduler.
...     _ = trial.update(Status.SUCCEEDED, datetime.now(), {"objective_metric": 42})
>>> #
>>> # Now, once there's data to look at, in a Jupyter notebook or similar,
>>> # we can also use the storage object to view the results.
>>> #
>>> storage.experiments
{'my_experiment_id': Experiment :: my_experiment_id: 'some description'}
>>> # Access ExperimentData by experiment id.
>>> experiment_data = storage.experiments["my_experiment_id"]
>>> experiment_data.trials
{1: Trial :: my_experiment_id:1 cid:1 rid:None SUCCEEDED}
>>> # Access TrialData for an Experiment by trial id.
>>> trial_data = experiment_data.trials[1]
>>> assert trial_data.status == Status.SUCCEEDED
>>> # Retrieve the tunable configuration from the TrialData as a dictionary.
>>> trial_config_data = trial_data.tunable_config
>>> trial_config_data.config_dict
{'param1': 50}
>>> # Retrieve the results from the TrialData as a dictionary.
>>> trial_data.results_dict
{'objective_metric': 42}
>>> # Retrieve the results of all Trials in the Experiment as a DataFrame.
>>> experiment_data.results_df.columns.tolist()
['trial_id', 'ts_start', 'ts_end', 'tunable_config_id', 'tunable_config_trial_group_id', 'status', 'trial_runner_id', 'config.param1', 'result.objective_metric']
>>> # Drop the timestamp columns to make it a repeatable test.
>>> experiment_data.results_df.drop(columns=["ts_start", "ts_end"])
   trial_id  tunable_config_id  tunable_config_trial_group_id     status trial_runner_id  config.param1  result.objective_metric
0         1                  1                              1  SUCCEEDED            None             50                       42

[1 rows x 7 columns]
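
Once several trials have results, a common next step is to find the best one according to the optimization target. The following is a minimal sketch under stated assumptions: results_by_trial is a hand-built dictionary standing in for the per-trial results_dict values shown above (only trial 1 comes from the example; trials 2 and 3 are made up), and best_trial is a hypothetical helper, not part of mlos_bench.

```python
from typing import Dict

# Stand-in for {trial_id: results_dict}; trials 2 and 3 are invented data.
results_by_trial: Dict[int, Dict[str, float]] = {
    1: {"objective_metric": 42},
    2: {"objective_metric": 17},
    3: {"objective_metric": 63},
}

# Same shape as the opt_targets argument used when creating the experiment.
opt_targets = {"objective_metric": "min"}


def best_trial(results: Dict[int, Dict[str, float]], metric: str, direction: str) -> int:
    """Return the id of the trial with the best value for a single metric."""
    choose = min if direction == "min" else max
    return choose(results, key=lambda trial_id: results[trial_id][metric])


metric, direction = next(iter(opt_targets.items()))
best_id = best_trial(results_by_trial, metric, direction)
```

With a real storage object, a similar dictionary could presumably be built by iterating over experiment_data.trials and reading each entry's results_dict, as the example above does for a single trial.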

Submodules