mlos_viz.base
Base functions for visualizing, explaining, and gaining insights from results.
Functions
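- augment_results_df_with_config_trial_group_stats: Add a number of useful statistical measure columns to the results dataframe.
- ignore_plotter_warnings: Suppress some annoying warnings from third-party data visualization packages by adding them to the warnings filter.
- limit_top_n_configs: Utility function to process the results and determine the best performing configs including potential repeats to help assess variability.
- plot_optimizer_trends: Plots the optimizer trends for the Experiment.
- plot_top_n_configs: Plots the top-N configs along with the default config for the given ExperimentData.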
Module Contents
- mlos_viz.base.augment_results_df_with_config_trial_group_stats(exp_data: mlos_bench.storage.base_experiment_data.ExperimentData | None = None, *, results_df: pandas.DataFrame | None = None, requested_result_cols: Iterable[str] | None = None) pandas.DataFrame [source]
Add a number of useful statistical measure columns to the results dataframe.
In particular, for each requested (numeric) result column, we add the following columns:
“.p50”: the median of each config trial group results
“.p75”: the p75 of each config trial group results
“.p90”: the p90 of each config trial group results
“.p95”: the p95 of each config trial group results
“.p99”: the p99 of each config trial group results
“.mean”: the mean of each config trial group results
“.stddev”: the standard deviation of each config trial group results
“.var”: the variance of each config trial group results
“.var_zscore”: the z-score of this group's variance (i.e., the variance relative to the stddev of all group variances). This can be useful for filtering out outliers: for example, restricting to abs < 2 removes configs whose variance is more than two standard deviations from the mean variance across all config trial groups.
Additionally, we add a “tunable_config_trial_group_size” column that indicates the number of trials using a particular config.
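The column scheme above can be illustrated with plain pandas. This is a minimal sketch, not the library's implementation: the helper name `add_group_stats`, the grouping column name `tunable_config_trial_group_id`, and the use of pandas' default sample statistics (`ddof=1`) are all assumptions made for the example.

```python
import pandas as pd


def add_group_stats(
    results_df: pd.DataFrame,
    result_col: str,
    group_col: str = "tunable_config_trial_group_id",  # assumed column name
) -> pd.DataFrame:
    """Hypothetical helper: append per-group statistics for one result column."""
    out = results_df.copy()
    grouped = results_df.groupby(group_col)[result_col]

    # Number of trials that share each config trial group.
    group_sizes = results_df[group_col].value_counts()
    out["tunable_config_trial_group_size"] = out[group_col].map(group_sizes)

    # Percentiles of each group, broadcast back onto every trial row.
    for name, q in (("p50", 0.50), ("p75", 0.75), ("p90", 0.90),
                    ("p95", 0.95), ("p99", 0.99)):
        out[f"{result_col}.{name}"] = out[group_col].map(grouped.quantile(q))

    # Mean, standard deviation, and variance of each group.
    # pandas uses the sample estimate (ddof=1), so single-trial groups get NaN.
    group_var = grouped.var()
    out[f"{result_col}.mean"] = out[group_col].map(grouped.mean())
    out[f"{result_col}.stddev"] = out[group_col].map(grouped.std())
    out[f"{result_col}.var"] = out[group_col].map(group_var)

    # z-score of each group's variance relative to all group variances.
    var_zscore = (group_var - group_var.mean()) / group_var.std()
    out[f"{result_col}.var_zscore"] = out[group_col].map(var_zscore)
    return out
```

With such columns in place, a filter like `df[df["score.var_zscore"].abs() < 2]` would drop the config groups whose variance is unusually far from the others.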
- Parameters:
exp_data (Optional[ExperimentData]) – The ExperimentData (e.g., obtained from the storage layer) to operate on.
results_df (Optional[pandas.DataFrame]) – The results dataframe to augment, by default None to use the ExperimentData.results_df property.
requested_result_cols (Optional[Iterable[str]]) – Which results columns to augment, by default None to use all results columns that look numeric.
- Returns:
The augmented results dataframe.
- Return type:
pandas.DataFrame
- mlos_viz.base.ignore_plotter_warnings() None [source]
Suppress some annoying warnings from third-party data visualization packages by adding them to the warnings filter.
- Return type:
None
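Adding entries to the warnings filter is done with the standard library's `warnings.filterwarnings`. The sketch below shows the general pattern only; the function name, the chosen categories, and the module and message patterns are hypothetical and are not the list this library actually uses.

```python
import warnings


def ignore_example_plotter_warnings() -> None:
    """Hypothetical stand-in: silence selected warnings from plotting libraries."""
    # Ignore FutureWarnings raised from modules whose name starts with
    # "seaborn" (an assumed example of a third-party plotting package).
    warnings.filterwarnings("ignore", category=FutureWarning, module=r"seaborn")
    # Ignore DeprecationWarnings whose message matches a pattern
    # (again an assumed example pattern).
    warnings.filterwarnings(
        "ignore",
        category=DeprecationWarning,
        message=r".*is deprecated.*",
    )
```

Filters added this way are process-wide, so a helper like this is typically called once near the top of a notebook, before any plotting calls.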
- mlos_viz.base.limit_top_n_configs(exp_data: mlos_bench.storage.base_experiment_data.ExperimentData | None = None, *, results_df: pandas.DataFrame | None = None, objectives: Dict[str, Literal['min', 'max']] | None = None, top_n_configs: int = 10, method: Literal['mean', 'p50', 'p75', 'p90', 'p95', 'p99'] = 'mean') Tuple[pandas.DataFrame, List[int], Dict[str, bool]] [source]
Utility function to process the results and determine the best performing configs including potential repeats to help assess variability.
- Parameters:
exp_data (Optional[ExperimentData]) – The ExperimentData (e.g., obtained from the storage layer) to operate on.
results_df (Optional[pandas.DataFrame]) – The results dataframe to augment, by default None to use the ExperimentData.results_df property.
objectives (Optional[Dict[str, Literal["min", "max"]]]) – Which result column(s) to use for sorting the configs, and in which direction (“min” or “max”). By default None to automatically select the ExperimentData.objectives.
top_n_configs (int) – How many configs to return, including the default, by default 10.
method (Literal["mean", "p50", "p75", "p90", "p95", "p99"]) – Which statistical method to use when sorting the config groups before determining the cutoff, by default “mean”.
- Returns:
(top_n_config_results_df, top_n_config_ids, orderby_cols)
Tuple[pandas.DataFrame, List[int], Dict[str, bool]] – The filtered results dataframe, the config ids, and the columns used to order the configs.
- Return type:
Tuple[pandas.DataFrame, List[int], Dict[str, bool]]
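The general idea of ranking configs by an aggregate and keeping the best N together with the default can be sketched in pandas. This is an illustration under stated assumptions, not the library's code: the helper name `top_n_by_mean`, the `tunable_config_id` column name, the explicit `default_config_id` argument, and the reading of the returned `Dict[str, bool]` as "column name to ascending flag" are all hypothetical.

```python
from typing import Dict, List, Literal, Optional, Tuple

import pandas as pd


def top_n_by_mean(
    results_df: pd.DataFrame,
    objectives: Dict[str, Literal["min", "max"]],
    top_n_configs: int = 10,
    default_config_id: Optional[int] = None,  # assumed to be known by the caller
    config_col: str = "tunable_config_id",    # assumed column name
) -> Tuple[pd.DataFrame, List[int], Dict[str, bool]]:
    """Hypothetical helper: keep trials of the top-N configs ranked by mean."""
    cols = list(objectives)
    # "min" objectives sort ascending, "max" objectives sort descending.
    ascending = [direction == "min" for direction in objectives.values()]

    # Aggregate repeated trials of each config, then rank the configs.
    means = results_df.groupby(config_col)[cols].mean()
    ranked = means.sort_values(by=cols, ascending=ascending)
    top_ids = [int(config_id) for config_id in ranked.index[:top_n_configs]]

    # Keep the default config for comparison, counted within the N.
    if default_config_id is not None and default_config_id not in top_ids:
        top_ids = top_ids[: top_n_configs - 1] + [default_config_id]

    # Return every trial (including repeats) of the selected configs.
    filtered = results_df[results_df[config_col].isin(top_ids)]
    return filtered, top_ids, dict(zip(cols, ascending))
```

Returning all trials of the selected configs, rather than one aggregated row per config, is what lets a later plot show the spread across repeats.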
- mlos_viz.base.plot_optimizer_trends(exp_data: mlos_bench.storage.base_experiment_data.ExperimentData | None = None, *, results_df: pandas.DataFrame | None = None, objectives: Dict[str, Literal['min', 'max']] | None = None) None [source]
Plots the optimizer trends for the Experiment.
- Parameters:
exp_data (ExperimentData) – The ExperimentData (e.g., obtained from the storage layer) to plot.
results_df (Optional[pandas.DataFrame]) – Optional results_df to plot. If not provided, defaults to the ExperimentData.results_df property.
objectives (Optional[Dict[str, Literal["min", "max"]]]) – Optional objectives to plot. If not provided, defaults to the ExperimentData.objectives property.
- Return type:
None
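When reading a trend of objective values over successive trials, it often helps to compare each trial against the best value seen so far. The sketch below computes such a running best with the standard library. Whether plot_optimizer_trends draws this line is not stated here, so treat it as an assumption; `incumbent_trend` is a hypothetical name.

```python
from itertools import accumulate
from typing import List, Literal, Sequence


def incumbent_trend(
    scores: Sequence[float],
    direction: Literal["min", "max"] = "min",
) -> List[float]:
    """Hypothetical helper: best-so-far objective value after each trial."""
    # For a "min" objective the running best is the cumulative minimum,
    # for a "max" objective it is the cumulative maximum.
    best = min if direction == "min" else max
    return list(accumulate(scores, best))
```

The `direction` argument mirrors the "min" or "max" values used in the objectives mapping, so one call per objective column would cover a multi-objective experiment.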
- mlos_viz.base.plot_top_n_configs(exp_data: mlos_bench.storage.base_experiment_data.ExperimentData | None = None, *, results_df: pandas.DataFrame | None = None, objectives: Dict[str, Literal['min', 'max']] | None = None, with_scatter_plot: bool = False, **kwargs: Any) None [source]
Plots the top-N configs along with the default config for the given ExperimentData.
Intended to be used from a Jupyter notebook.
- Parameters:
exp_data (ExperimentData) – The experiment data to plot.
results_df (Optional[pandas.DataFrame]) – Optional results_df to plot. If not provided, defaults to the ExperimentData.results_df property.
objectives (Optional[Dict[str, Literal["min", "max"]]]) – Optional objectives to plot. If not provided, defaults to the ExperimentData.objectives property.
with_scatter_plot (bool) – Whether to also add a scatter plot to the output figure.
kwargs (dict) – Remaining keyword arguments are passed along to the limit_top_n_configs() function.
- Return type:
None