mlos_bench.storage
==================

.. py:module:: mlos_bench.storage

.. autoapi-nested-parse::

   Interfaces to the storage backends for mlos_bench.

   Storage backends (for instance :py:mod:`~mlos_bench.storage.sql`) are used to store
   and retrieve the results of experiments and implement a persistent queue for
   :py:mod:`~mlos_bench.schedulers`.

   The :py:class:`~mlos_bench.storage.base_storage.Storage` class is the main interface
   and provides the ability to

   - Create a new (or reload an existing) :py:class:`~.Storage.Experiment` with
     one or more associated :py:class:`~.Storage.Trial` instances, which the
     :py:mod:`~mlos_bench.schedulers` use at ``mlos_bench`` run time to execute
     *Trials*.

     In MLOS terms, an *Experiment* is a group of *Trials* that share the same scripts
     and target system.

     A *Trial* is a single run of the target system with a specific
     *Configuration* (i.e., a set of tunable parameter values).
     (Note: other systems may call this a *sample*.)

   - Retrieve the :py:class:`~mlos_bench.storage.base_trial_data.TrialData`
     results of an Experiment via the
     :py:attr:`~mlos_bench.storage.base_experiment_data.ExperimentData.trials`
     property of an
     :py:class:`~mlos_bench.storage.base_experiment_data.ExperimentData`
     instance, which is itself obtained from the :py:class:`~.Storage`
     instance's :py:attr:`~mlos_bench.storage.base_storage.Storage.experiments`
     property, as sketched just after this list.

     These are especially useful with :py:mod:`mlos_viz` for interactive
     exploration in a Jupyter Notebook, for instance.
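
   In other words, the :py:class:`~.Storage.Experiment` /
   :py:class:`~.Storage.Trial` write path is used by the framework at run time,
   while the ``ExperimentData`` / ``TrialData`` read path is used for analysis
   afterwards. A minimal read-path sketch (the names and data assume the
   Example below has already run):

   .. code-block:: python

      # Navigate from a Storage object down to a single Trial's results.
      # (Assumes the Example below has already populated the storage.)
      experiment_data = storage.experiments["my_experiment_id"]
      trial_data = experiment_data.trials[1]
      print(trial_data.results_dict)  # e.g., {'objective_metric': 42}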

   The :py:func:`.from_config` function in the :py:mod:`.storage_factory` module
   can be used to get a :py:class:`.Storage` instance from a
   :py:attr:`~mlos_bench.config.schemas.config_schemas.ConfigSchema.STORAGE`
   type JSON config.
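
   For instance, a minimal sketch of loading that config from a file rather
   than an inline string (the file name ``storage.jsonc`` here is hypothetical;
   the inline-string form is shown in full in the Example below):

   .. code-block:: python

      from mlos_bench.storage import from_config

      # Load a STORAGE-schema JSON(C) config from a file.
      # ("storage.jsonc" is a hypothetical example file name.)
      storage = from_config("storage.jsonc")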

   .. rubric:: Example

   Here's a very basic example of the Storage APIs.

   >>> # Create a new storage object from a JSON config.
   >>> # Normally, we'd load these from a file, but for this example we'll use a string.
   >>> global_config = '''
   ... {
   ...     // Additional global configuration parameters can be added here.
   ...     /* For instance:
   ...     "storage_host": "some-remote-host",
   ...     "storage_user": "mlos_bench",
   ...     "storage_pass": "SuperSecretPassword",
   ...     */
   ... }
   ... '''
   >>> storage_config = '''
   ... {
   ...     "class": "mlos_bench.storage.sql.storage.SqlStorage",
   ...     "config": {
   ...         // Don't create the schema until we actually need it.
   ...         // (helps speed up initial launch and tests)
   ...         "lazy_schema_create": true,
   ...         // Parameters below must match kwargs of `sqlalchemy.URL.create()`:
   ...         // Normally, we'd specify a real database, but for testing examples
   ...         // we'll use an in-memory one.
   ...         "drivername": "sqlite",
   ...         "database": ":memory:"
   ...         // Otherwise we might use something like the following
   ...         // to pull the values from the globals:
   ...         /*
   ...         "host": "$storage_host",
   ...         "username": "$storage_user",
   ...         "password": "$storage_pass",
   ...         */
   ...     }
   ... }
   ... '''
   >>> from mlos_bench.storage import from_config
   >>> storage = from_config(storage_config, global_configs=[global_config])
   >>> storage
   sqlite::memory:
   >>> #
   >>> # Internally, mlos_bench will use this config and storage backend to track
   >>> # Experiments and Trials it creates.
   >>> # Most users won't need to do that, but it works something like the following:
   >>> # Create a new experiment with a single trial.
   >>> # (Normally, we'd use a real environment config, but for this example we'll use a string.)
   >>> #
   >>> # Create a dummy tunable group.
   >>> from mlos_bench.services.config_persistence import ConfigPersistenceService
   >>> config_persistence_service = ConfigPersistenceService()
   >>> tunables_config = '''
   ... {
   ...   "param_group": {
   ...     "cost": 1,
   ...     "params": {
   ...       "param1": {
   ...         "type": "int",
   ...         "range": [0, 100],
   ...         "default": 50
   ...       }
   ...     }
   ...   }
   ... }
   ... '''
   >>> tunables = config_persistence_service.load_tunables([tunables_config])
   >>> from mlos_bench.environments.status import Status
   >>> from datetime import datetime
   >>> with storage.experiment(
   ...   experiment_id="my_experiment_id",
   ...   trial_id=1,
   ...   root_env_config="root_env_config_info",
   ...   description="some description",
   ...   tunables=tunables,
   ...   opt_targets={"objective_metric": "min"},
   ... ) as experiment:
   ...     # Create a dummy trial.
   ...     trial = experiment.new_trial(tunables=tunables)
   ...     # Pretend something ran with that trial and we have the results now.
   ...     # NOTE: Normally this would run through a TrialRunner via a Scheduler.
   ...     _ = trial.update(Status.SUCCEEDED, datetime.now(), {"objective_metric": 42})
   >>> #
   >>> # Now, once there's data to look at, in a Jupyter notebook or similar,
   >>> # we can also use the storage object to view the results.
   >>> #
   >>> storage.experiments
   {'my_experiment_id': Experiment :: my_experiment_id: 'some description'}
   >>> # Access ExperimentData by experiment id.
   >>> experiment_data = storage.experiments["my_experiment_id"]
   >>> experiment_data.trials
   {1: Trial :: my_experiment_id:1 cid:1 rid:None SUCCEEDED}
   >>> # Access TrialData for an Experiment by trial id.
   >>> trial_data = experiment_data.trials[1]
   >>> assert trial_data.status == Status.SUCCEEDED
   >>> # Retrieve the tunable configuration from the TrialData as a dictionary.
   >>> trial_config_data = trial_data.tunable_config
   >>> trial_config_data.config_dict
   {'param1': 50}
   >>> # Retrieve the results from the TrialData as a dictionary.
   >>> trial_data.results_dict
   {'objective_metric': 42}
   >>> # Retrieve the results of all Trials in the Experiment as a DataFrame.
   >>> experiment_data.results_df.columns.tolist()
   ['trial_id', 'ts_start', 'ts_end', 'tunable_config_id', 'tunable_config_trial_group_id', 'status', 'trial_runner_id', 'config.param1', 'result.objective_metric']
   >>> # Drop the timestamp columns to make it a repeatable test.
   >>> experiment_data.results_df.drop(columns=["ts_start", "ts_end"])
      trial_id  tunable_config_id  tunable_config_trial_group_id     status trial_runner_id  config.param1  result.objective_metric
   0         1                  1                              1  SUCCEEDED            None             50                       42

   [1 rows x 7 columns]
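
   Once an :py:class:`~mlos_bench.storage.base_experiment_data.ExperimentData`
   instance is in hand, :py:mod:`mlos_viz` can render summary plots of the
   Trial results. A minimal sketch (assumes ``mlos_viz`` is installed
   alongside ``mlos_bench``; see its documentation for the supported options):

   .. code-block:: python

      from mlos_viz import plot

      # Plot the results of all Trials in the Experiment.
      experiment_data = storage.experiments["my_experiment_id"]
      plot(experiment_data)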

   .. seealso::

      :py:obj:`mlos_bench.storage.base_storage`
          Base interface for backends.

      :py:obj:`mlos_bench.storage.base_experiment_data`
          Base interface for ExperimentData.

      :py:obj:`mlos_bench.storage.base_trial_data`
          Base interface for TrialData.

   .. rubric:: Notes

   - See `sqlite-autotuning notebooks
     <https://github.com/Microsoft-CISL/sqlite-autotuning/blob/main/mlos_demo_sqlite_teachers.ipynb>`_
     for additional examples.



Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/mlos_bench/storage/base_experiment_data/index
   /autoapi/mlos_bench/storage/base_storage/index
   /autoapi/mlos_bench/storage/base_trial_data/index
   /autoapi/mlos_bench/storage/base_tunable_config_data/index
   /autoapi/mlos_bench/storage/base_tunable_config_trial_group_data/index
   /autoapi/mlos_bench/storage/sql/index
   /autoapi/mlos_bench/storage/storage_factory/index
   /autoapi/mlos_bench/storage/util/index