mlos_viz.base

Base functions for visualizing, explaining, and gaining insights from results.

Functions

augment_results_df_with_config_trial_group_stats(...)

Add a number of useful statistical measure columns to the results dataframe.

ignore_plotter_warnings(→ None)

Suppress some annoying warnings from third-party data visualization packages by adding them to the warnings filter.

limit_top_n_configs(→ Tuple[pandas.DataFrame, ...)

Utility function to process the results and determine the best performing configs

plot_optimizer_trends(→ None)

Plots the optimizer trends for the Experiment.

plot_top_n_configs(→ None)

Plots the top-N configs along with the default config for the given ExperimentData.

Module Contents

mlos_viz.base.augment_results_df_with_config_trial_group_stats(exp_data: mlos_bench.storage.base_experiment_data.ExperimentData | None = None, *, results_df: pandas.DataFrame | None = None, requested_result_cols: Iterable[str] | None = None) → pandas.DataFrame

Add a number of useful statistical measure columns to the results dataframe.

In particular, for each requested numeric result column, we add the following columns:

  • “.p50”: the median of each config trial group results

  • “.p75”: the p75 of each config trial group results

  • “.p90”: the p90 of each config trial group results

  • “.p95”: the p95 of each config trial group results

  • “.p99”: the p99 of each config trial group results

  • “.mean”: the mean of each config trial group results

  • “.stddev”: the standard deviation of each config trial group results

  • “.var”: the variance of each config trial group results

  • “.var_zscore”: the z-score of this group's variance (i.e., how far the group's variance lies from the mean variance of all config trial groups, measured in standard deviations of those group variances). This can be useful for filtering out outliers: restricting to abs < 2 removes configs whose variance is two or more standard deviations away from the mean variance across all config trial groups.

Additionally, we add a “tunable_config_trial_group_size” column that indicates the number of trials using a particular config.

Parameters:
  • exp_data (Optional[ExperimentData]) – The ExperimentData (e.g., obtained from the storage layer) whose results to augment.

  • results_df (Optional[pandas.DataFrame]) – The results dataframe to augment, by default None to use the ExperimentData.results_df property.

  • requested_result_cols (Optional[Iterable[str]]) – Which results columns to augment, by default None to use all results columns that look numeric.

Returns:

The augmented results dataframe.

Return type:

pandas.DataFrame
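
The per-group statistics described above can be approximated with plain pandas. The following is a minimal sketch, not the mlos_viz implementation: the function name add_group_stats, the grouping column "tunable_config_id", and the result column "score" are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical column names, chosen only for this illustration.
GROUP_COL = "tunable_config_id"
RESULT_COL = "score"


def add_group_stats(results_df: pd.DataFrame, result_col: str = RESULT_COL) -> pd.DataFrame:
    """Return a copy of results_df with per-config-group statistics columns."""
    df = results_df.copy()
    by_group = df.groupby(GROUP_COL)[result_col]

    def broadcast(per_group: pd.Series) -> pd.Series:
        # Map one value per group back onto every row of that group.
        return df[GROUP_COL].map(per_group)

    # Number of trials that used each config.
    df["tunable_config_trial_group_size"] = broadcast(by_group.size())

    # Percentiles of the result within each group.
    for pct in (50, 75, 90, 95, 99):
        df[f"{result_col}.p{pct}"] = broadcast(by_group.quantile(pct / 100))

    df[f"{result_col}.mean"] = broadcast(by_group.mean())
    df[f"{result_col}.stddev"] = broadcast(by_group.std())
    group_vars = by_group.var()
    df[f"{result_col}.var"] = broadcast(group_vars)

    # z-score of each group's variance relative to all group variances.
    var_zscores = (group_vars - group_vars.mean()) / group_vars.std()
    df[f"{result_col}.var_zscore"] = broadcast(var_zscores)
    return df
```

With columns like these, a filter such as `df[df["score.var_zscore"].abs() < 2]` would drop the configs whose variance is unusually far from that of the other groups.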

mlos_viz.base.ignore_plotter_warnings() → None

Suppress some annoying warnings from third-party data visualization packages by adding them to the warnings filter.

Return type:

None
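
As a rough illustration of what "adding them to the warnings filter" means, the sketch below uses the standard library warnings module. The helper name, the warning category, and the message pattern are assumptions for illustration; they are not the filters that mlos_viz actually installs.

```python
import warnings


def ignore_noisy_warnings() -> None:
    """Hypothetical helper: register an ignore rule in the warnings filter."""
    # Illustrative rule only: silence FutureWarnings whose message mentions
    # a deprecation. The real function targets specific plotting packages.
    warnings.filterwarnings(
        "ignore",
        category=FutureWarning,
        message=r".*is deprecated.*",
    )
```

Because the rule is matched by category and message, unrelated warnings still reach the user.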

mlos_viz.base.limit_top_n_configs(exp_data: mlos_bench.storage.base_experiment_data.ExperimentData | None = None, *, results_df: pandas.DataFrame | None = None, objectives: Dict[str, Literal['min', 'max']] | None = None, top_n_configs: int = 10, method: Literal['mean', 'p50', 'p75', 'p90', 'p95', 'p99'] = 'mean') → Tuple[pandas.DataFrame, List[int], Dict[str, bool]]

Utility function to process the results and determine the best performing configs including potential repeats to help assess variability.

Parameters:
  • exp_data (Optional[ExperimentData]) – The ExperimentData (e.g., obtained from the storage layer) to operate on.

  • results_df (Optional[pandas.DataFrame]) – The results dataframe to augment, by default None to use ExperimentData.results_df property.

  • objectives (Optional[Dict[str, Literal["min", "max"]]]) – Which result column(s) to use for sorting the configs, and in which direction (“min” or “max”). By default None to automatically select the ExperimentData.objectives.

  • top_n_configs (int) – How many configs to return, including the default, by default 10.

  • method (Literal["mean", "p50", "p75", "p90", "p95", "p99"]) – Which statistical method to use when sorting the config groups before determining the cutoff, by default “mean”.

Returns:

(top_n_config_results_df, top_n_config_ids, orderby_cols) – The filtered results dataframe, the config ids, and the columns used to order the configs.

Return type:

Tuple[pandas.DataFrame, List[int], Dict[str, bool]]
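
The selection described above (rank config groups by a statistic, then keep the best N) can be sketched with pandas as follows. This is an assumption-laden illustration, not the library's code: the function name top_n_by_mean and the grouping column "tunable_config_id" are hypothetical, only the "mean" method is shown, the special handling of the default config is omitted, and the boolean in the returned dict is assumed to mean "sort ascending".

```python
from typing import Dict, List, Literal, Tuple

import pandas as pd

GROUP_COL = "tunable_config_id"  # hypothetical grouping column


def top_n_by_mean(
    results_df: pd.DataFrame,
    objectives: Dict[str, Literal["min", "max"]],
    top_n: int = 10,
) -> Tuple[pd.DataFrame, List[int], Dict[str, bool]]:
    """Keep the rows of the top_n config groups, ranked by mean objective."""
    # Assumption: True means "sort ascending", i.e. the objective is minimized.
    orderby_cols = {col: direction == "min" for col, direction in objectives.items()}

    # One row per config group, holding the mean of each objective column.
    group_means = results_df.groupby(GROUP_COL)[list(orderby_cols)].mean()
    ranked = group_means.sort_values(
        by=list(orderby_cols),
        ascending=list(orderby_cols.values()),
    )

    # Keep every trial (including repeats) of the best-ranked groups.
    top_ids = ranked.index[:top_n].tolist()
    filtered = results_df[results_df[GROUP_COL].isin(top_ids)]
    return filtered, top_ids, orderby_cols
```

Keeping all repeated trials of each selected config, rather than a single aggregated row, is what makes the result usable for assessing variability.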

mlos_viz.base.plot_optimizer_trends(...) → None

Plots the optimizer trends for the Experiment.

Parameters:
Return type:

None

mlos_viz.base.plot_top_n_configs(exp_data: mlos_bench.storage.base_experiment_data.ExperimentData | None = None, *, results_df: pandas.DataFrame | None = None, objectives: Dict[str, Literal['min', 'max']] | None = None, with_scatter_plot: bool = False, **kwargs: Any) → None

Plots the top-N configs along with the default config for the given ExperimentData.

Intended to be used from a Jupyter notebook.

Parameters:
Return type:

None