Plotting API Reference
Core plotting functions for SciStanPy visualization and analysis.
This module implements the primary plotting functionality for SciStanPy, providing specialized visualization tools for Bayesian analysis, model diagnostics, and statistical relationships. As with all submodules in the plotting subpackage, the functions here are intended for internal use to support higher-level plotting operations and are not typically called directly by end users.
The module leverages HoloViews and hvplot for flexible, interactive visualizations that can be easily customized and extended. All plotting functions support both standard NumPy arrays and interactive widgets for dynamic exploration of model results.
- Key Features:
ECDF and KDE plots for distribution visualization
Quantile plots with confidence intervals
Model calibration diagnostics
Hexagonal binning for large datasets
Interactive plotting with widget support
Customizable styling and overlays
Functions are organized by visualization type and complexity, from simple distribution plots to sophisticated multi-panel diagnostic displays.
DataFrame Construction
The functions below construct the DataFrames consumed by the plotting functions in this module.
- scistanpy.plotting.plotting.aggregate_data(
- data: ndarray[tuple[int, ...], dtype[_ScalarType_co]],
- independent_dim: int | None = None,
Aggregate multi-dimensional data for plotting purposes.
This function reshapes multi-dimensional arrays according to specified aggregation rules, preparing data for visualization functions that expect specific array structures.
- Parameters:
data (npt.NDArray) – Input data array to aggregate
independent_dim (Optional[int]) – Dimension to preserve during aggregation. If None, flattens entire array. (Default: None)
- Returns:
Aggregated data array
- Return type:
npt.NDArray
- Aggregation Rules:
If independent_dim is None: Returns flattened 1D array
If independent_dim is specified: Returns 2D array with shape (-1, n_independent) where -1 represents the product of all other dimensions
- Example:
>>> data = np.random.randn(10, 5, 3)
>>> # Flatten completely
>>> flat = aggregate_data(data)  # Shape: (150,)
>>> # Preserve last dimension
>>> agg = aggregate_data(data, independent_dim=2)  # Shape: (50, 3)
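The aggregation rules can be reproduced with plain NumPy. The following is a minimal sketch, not the library's implementation: `aggregate_sketch` is a hypothetical stand-in, and the row ordering it produces (preserved axis moved to the end, remaining axes collapsed in C order) is an assumption.

```python
import numpy as np


def aggregate_sketch(data, independent_dim=None):
    """Hypothetical stand-in illustrating the documented aggregation rules."""
    if independent_dim is None:
        # Rule 1: no independent dimension, so flatten to 1D.
        return data.reshape(-1)
    # Rule 2: move the preserved axis to the end, then collapse all
    # other axes into a single leading axis.
    moved = np.moveaxis(data, independent_dim, -1)
    return moved.reshape(-1, data.shape[independent_dim])


data = np.random.randn(10, 5, 3)
flat = aggregate_sketch(data)                    # shape (150,)
agg = aggregate_sketch(data, independent_dim=1)  # shape (30, 5)
```

Moving the preserved axis last before reshaping matters when `independent_dim` is not already the final axis; a bare `reshape(-1, n)` would otherwise mix values from different positions along that axis.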
- scistanpy.plotting.plotting.build_plotting_df(
- samples: npt.NDArray,
- paramname: str = 'param',
- independent_dim: 'custom_types.Integer' | None = None,
- independent_labels: npt.NDArray | None = None,
Construct DataFrame optimized for plotting functions.
This function transforms raw sample arrays into structured DataFrames with appropriate columns and formatting for visualization functions. It handles various data structures and automatically generates necessary metadata for plotting.
- Parameters:
samples (npt.NDArray) – Raw sample data to structure for plotting
paramname (str) – Column name to assign for the parameter values (Default: “param”)
independent_dim (Optional[custom_types.Integer]) – Dimension representing independent variable (Default: None)
independent_labels (Optional[npt.NDArray]) – Labels for independent variable values (Default: None)
- Returns:
Structured DataFrame ready for plotting functions
- Return type:
pd.DataFrame
- The function handles:
Data aggregation according to independent dimension
Automatic label generation when not provided
ECDF calculation for cumulative plots
Trace separation with NaN boundaries for line plots
Proper sorting for visualization functions
- Example:
# Samples from a model with 100 traces, 50 time points, and 10 parameters
samples = np.random.randn(100, 50, 10)
# Build DataFrame for plotting parameter 'measurement' with time as
# independent variable
df = build_plotting_df(samples, 'measurement', independent_dim=1)
# df now contains columns for 'measurement' and 'Independent Label'
# separated by rows of NaN for trace boundaries, ready for plotting.
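Separating traces with NaN rows is a common way to draw many lines from one flat pair of columns, because most plotting backends break a line wherever they meet a NaN. The sketch below shows the idea with NumPy only. The array names and layout are illustrative assumptions and may differ from what `build_plotting_df` produces internally.

```python
import numpy as np

n_traces, n_points = 3, 4
# Toy data: each row is one trace evaluated at the same independent labels.
traces = np.arange(n_traces * n_points, dtype=float).reshape(n_traces, n_points)
labels = np.arange(n_points, dtype=float)

# Append one NaN to the end of every trace, then flatten row by row so that
# consecutive traces are separated by a single NaN entry.
nan_col = np.full((n_traces, 1), np.nan)
y = np.concatenate([traces, nan_col], axis=1).ravel()
x = np.concatenate([np.tile(labels, (n_traces, 1)), nan_col], axis=1).ravel()
```

Here `x` and `y` each hold `n_traces * (n_points + 1)` entries, and every `(n_points + 1)`-th entry is NaN.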
Distribution Visualization
The functions below visualize distributions, particularly samples of parameters in Bayesian models.
- scistanpy.plotting.plotting.plot_ecdf_kde(
- plotting_df: DataFrame,
- /,
- paramname: str,
- scistanpy.plotting.plotting.plot_ecdf_kde(
- plotting_df: Interactive,
- /,
- paramname: Select,
Create empirical CDF and kernel density estimate plots.
This function generates complementary ECDF and KDE visualizations for univariate data, providing both cumulative and density perspectives on the data distribution.
- Parameters:
plotting_df (Union[pd.DataFrame, hvplot.interactive.Interactive]) – DataFrame containing the data to plot
paramname (Union[str, pnw.Select]) – Name of the parameter/column to visualize
- Returns:
List containing KDE and ECDF plots, or interactive plot
- Return type:
Union[list[HVType], hvplot.interactive.Interactive]
- The function creates:
KDE plot: Smooth density estimate with automatic bandwidth
ECDF plot: Step function showing cumulative probabilities
- Example:
>>> df = pd.DataFrame({'param': np.random.normal(0, 1, 1000)})
>>> plots = plot_ecdf_kde(df, 'param')
>>> # plots[0] is KDE, plots[1] is ECDF
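For readers unfamiliar with ECDFs, the cumulative probabilities behind the step plot can be computed in a few lines. This is a sketch under one common convention (the i-th smallest of n values gets probability i/n). This reference does not state which convention SciStanPy uses, and `ecdf_sketch` is a hypothetical name.

```python
import numpy as np


def ecdf_sketch(values):
    """Return sorted values and their empirical cumulative probabilities."""
    x = np.sort(np.asarray(values, dtype=float))
    # The i-th smallest value (1-based) has cumulative probability i / n.
    p = np.arange(1, x.size + 1) / x.size
    return x, p


x, p = ecdf_sketch([3.0, 1.0, 2.0, 4.0])
# x is [1, 2, 3, 4] and p is [0.25, 0.5, 0.75, 1.0]
```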
- scistanpy.plotting.plotting.plot_ecdf_violin(
- plotting_df: DataFrame,
- /,
- paramname: str,
- scistanpy.plotting.plotting.plot_ecdf_violin(
- plotting_df: Interactive,
- /,
- paramname: Select,
Create ECDF and violin plots for multi-group data comparison.
This function visualizes distributions across multiple groups or categories, combining empirical CDFs with violin plots.
- Parameters:
plotting_df (Union[pd.DataFrame, hvplot.interactive.Interactive]) – DataFrame with grouped data including ‘Independent Label’ and ‘Cumulative Probability’ columns.
paramname (Union[str, pnw.Select]) – Name of the parameter/column to visualize
- Returns:
Combined ECDF and violin plot overlay
- Return type:
Union[list[HVType], hvplot.interactive.Interactive]
- The visualization includes:
Multi-line ECDF plot: One curve per group with color coding
Violin plot: Density distributions by group with colorbar
Groups are automatically colored using the Inferno colormap.
- Example:
>>> # DataFrame with 'param' values and 'Independent Label' grouping
>>> plots = plot_ecdf_violin(grouped_df, 'param')
- scistanpy.plotting.plotting.plot_relationship(
- plotting_df: DataFrame,
- /,
- paramname: str,
- datashade: bool,
- scistanpy.plotting.plotting.plot_relationship(
- plotting_df: Interactive,
- /,
- paramname: Select,
- datashade: bool,
Visualize relationships between parameters and independent variables.
This function creates line plots showing how parameters vary with respect to independent variables, with optional datashading for large datasets to improve performance and readability.
- Parameters:
plotting_df (Union[pd.DataFrame, hvplot.interactive.Interactive]) – DataFrame with ‘Independent Label’ and parameter columns. Different groups are separated by NaN rows.
paramname (Union[str, pnw.Select]) – Name of the dependent parameter to plot
datashade (bool) – Whether to use datashading for large datasets (Default: True)
- Returns:
Line plot showing parameter relationships
- Return type:
Union[HVType, hvplot.interactive.Interactive]
- Datashading options:
True: Uses count aggregation with Inferno colormap (large data)
False: Uses dynamic line plotting with lime color (small data)
- Example:
>>> # Plot parameter evolution over time/conditions
>>> plot = plot_relationship(time_series_df, 'param', datashade=True)
- scistanpy.plotting.plotting.choose_plotting_function(
- independent_dim: 'custom_types.Integer' | None,
- independent_labels: npt.NDArray | None,
- datashade: bool = True,
A utility function that selects an appropriate plotting function (plot_ecdf_kde(), plot_ecdf_violin(), or plot_relationship()) based on data characteristics.
- Parameters:
independent_dim (Optional[custom_types.Integer]) – Dimension index for independent variable, if any
independent_labels (Optional[npt.NDArray]) – Labels for independent variable values
datashade (bool) – Whether to enable datashading for large datasets (Default: True)
- Returns:
Appropriate plotting function for the data structure
- Return type:
Callable
- Selection Logic:
No independent_dim: Returns plot_ecdf_kde (univariate analysis)
independent_dim but no labels: Returns plot_ecdf_violin (multi-group)
Both independent_dim and labels: Returns plot_relationship (dependency)
- Example:
>>> plotter = choose_plotting_function(None, None)  # ECDF/KDE
>>> plotter = choose_plotting_function(1, None)  # ECDF/Violin
>>> plotter = choose_plotting_function(1, time_labels)  # Relationship
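The selection logic amounts to a two-level check. The sketch below returns function names as strings so it runs without SciStanPy installed. `choose_sketch` is hypothetical, and it leaves out the `datashade` argument, whose handling is not described in this reference.

```python
def choose_sketch(independent_dim, independent_labels):
    """Hypothetical mirror of the documented selection rules."""
    if independent_dim is None:
        # No independent variable: plain univariate view.
        return "plot_ecdf_kde"
    if independent_labels is None:
        # Independent dimension without labels: compare groups.
        return "plot_ecdf_violin"
    # Independent dimension with labels: plot the dependency.
    return "plot_relationship"


choices = [
    choose_sketch(None, None),
    choose_sketch(1, None),
    choose_sketch(1, [0.0, 1.0, 2.0]),
]
```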
- scistanpy.plotting.plotting.plot_distribution(
- samples: npt.NDArray | torch.Tensor,
- overlay: npt.NDArray | None = None,
- paramname: str = 'param',
- independent_dim: 'custom_types.Integer' | None = None,
- independent_labels: npt.NDArray | None = None,
The main entrypoint for creating distribution plots.
This function automatically selects appropriate visualization types based on data structure and allows for optional ground truth or reference overlays.
- Parameters:
samples (Union[npt.NDArray, torch.Tensor]) – Sample data from model simulations or posterior draws
overlay (Optional[npt.NDArray]) – Optional reference data to overlay on the plot (Default: None)
paramname (str) – Name to assign for the parameter being plotted (Default: “param”)
independent_dim (Optional[custom_types.Integer]) – Dimension index for independent variable (Default: None)
independent_labels (Optional[npt.NDArray]) – Labels for independent variable values (Default: None)
- Returns:
Plot or list of plots showing data distribution
- Return type:
Union[HVType, list[HVType]]
- Raises:
ValueError – If overlay dimensions don’t match sample dimensions
- Example:
>>> # Simple distribution plot
>>> plot = plot_distribution(posterior_samples, paramname='mu')
>>> # With ground truth overlay
>>> plot = plot_distribution(samples, overlay=true_values, paramname='sigma')
>>> # Distribution plot with independent variable (e.g., time series)
>>> plot = plot_distribution(
...     samples,
...     paramname='beta',
...     independent_dim=1,
...     independent_labels=time_points
... )
Model Fit Analysis
The functions below analyze model fit and calibration. SciStanPy’s results objects use them under the hood.
- scistanpy.plotting.plotting.calculate_relative_quantiles(
- reference: ndarray[tuple[int, ...], dtype[_ScalarType_co]],
- observed: ndarray[tuple[int, ...], dtype[_ScalarType_co]],
Calculate quantiles of observed values relative to reference distribution.
For each observed value, this function computes the quantile it would occupy within the corresponding reference distribution. This is essential for calibration analysis and model validation.
- Parameters:
reference (npt.NDArray) – Reference observations with shape (n_samples, feat1, …, featN). First dimension is samples, remaining are feature dimensions.
observed (npt.NDArray) – Observed values with shape (n_obs, feat1, …, featN). Feature dimensions must match reference.
- Returns:
Quantiles of observed values relative to reference. Has the same shape as observed, with values between 0 and 1 indicating quantile positions of each observed value within the reference distribution.
- Return type:
npt.NDArray
- Raises:
ValueError – If arrays have incompatible dimensions
The calculation determines, for each observed value, what fraction of reference values in the corresponding position are less than or equal to the observed value. This produces values between 0 and 1.
Mathematical Definition: For a single observed value \(x\) and reference distribution \(R \in \mathbb{R}^N\):
\[\textrm{quantile} = P(R_i \leq x) = \frac{1}{N} \sum_{i=1}^{N} \begin{cases} 1 & \text{if } R_i \leq x \\ 0 & \text{if } R_i > x \end{cases}\]
- Example:
>>> ref = np.random.normal(0, 1, (1000, 10))  # 1000 samples, 10 features
>>> obs = np.random.normal(0.5, 1, (5, 10))  # 5 observations, 10 features
>>> quantiles = calculate_relative_quantiles(ref, obs)
>>> # quantiles.shape == (5, 10), values between 0 and 1
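The mathematical definition translates directly into a broadcasted comparison. The following sketch follows the formula as written. `relative_quantiles_sketch` is a hypothetical name, and the library's own implementation may be organized differently (for example, to limit memory use).

```python
import numpy as np


def relative_quantiles_sketch(reference, observed):
    """Fraction of reference samples <= each observed value, per feature."""
    reference = np.asarray(reference)
    observed = np.asarray(observed)
    if reference.shape[1:] != observed.shape[1:]:
        raise ValueError("Feature dimensions of reference and observed must match")
    # Broadcast (1, n_samples, ...) against (n_obs, 1, ...) to compare every
    # observed value with every reference sample at the same feature position.
    below = reference[None, ...] <= observed[:, None, ...]
    # Averaging over the sample axis gives (1/N) * sum of the indicator.
    return below.mean(axis=1)


ref = np.array([[0.0, 10.0], [1.0, 20.0], [2.0, 30.0], [3.0, 40.0]])
obs = np.array([[1.5, 5.0], [3.0, 40.0]])
q = relative_quantiles_sketch(ref, obs)
# q is [[0.5, 0.0], [1.0, 1.0]]
```

The intermediate boolean array has shape (n_obs, n_samples, feat1, …, featN), so this direct approach trades memory for simplicity.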
- scistanpy.plotting.plotting.plot_calibration(
- reference: ndarray[tuple[int, ...], dtype[_ScalarType_co]],
- observed: ndarray[tuple[int, ...], dtype[_ScalarType_co]],
- **kwargs,
Generate calibration plots for model validation.
This function creates empirical cumulative distribution plots of relative quantiles to assess model calibration. Well-calibrated models should produce observed values that are uniformly distributed across quantiles of the reference distribution. See calculate_relative_quantiles() for quantile calculation details.
- Parameters:
reference (npt.NDArray) – Reference observations for calibration assessment
observed (npt.NDArray) – Observed values to assess against reference
kwargs – Additional styling options passed to hvplot.Curve
- Returns:
Tuple of (calibration plot overlay, deviance statistics)
- Return type:
tuple[hv.Overlay, npt.NDArray[np.floating]]
- The calibration plot shows:
ECDF curves for each observation. Note that each curve covers the full set of parameters for that observation, not individual parameters.
Ideal calibration line (diagonal from (0,0) to (1,1))
Area of the deviation from ideal, which is the absolute difference in area between the observed ECDF and the ideal uniform ECDF using the trapezoidal rule for numerical integration. The lower the deviance, the better the calibration.
- Interpretation:
Points near diagonal: Well-calibrated
Narrow (overrepresentation of mid quantiles) but symmetric ECDF curve: Overdispersed model (model is not confident enough).
Wide (overrepresentation of extreme quantiles) but symmetric ECDF curve: Underdispersed model (model is too confident).
Asymmetric ECDF curve: Systematic bias in model predictions.
Note
If you have highly constrained variables, this plot may be misleading at the extremes. For example, if a variable is constrained to be \(\ge0\) and the reference distribution has all values at zero, then any observed value will be in the 100th percentile, even if that observation is also zero. This will present as a strong overrepresentation of extreme quantiles, but is in fact a perfectly calibrated outcome.
- Example:
>>> ref_data = posterior_predictive_samples  # Shape: (1000, 100)
>>> obs_data = actual_observations  # Shape: (10, 100)
>>> plot, deviances = plot_calibration(ref_data, obs_data)
>>> print(f"Mean deviance: {deviances.mean():.3f}")
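To make the deviance statistic concrete, the sketch below integrates the absolute gap between an ECDF of relative quantiles and the ideal diagonal using the trapezoidal rule. This is one reading of the description above, not the library's code: `calibration_deviance_sketch` is a hypothetical name, and padding the curve with the points (0, 0) and (1, 1) is an assumption.

```python
import numpy as np


def calibration_deviance_sketch(quantiles):
    """Area between the ECDF of `quantiles` and the diagonal y = x."""
    x = np.sort(np.asarray(quantiles, dtype=float))
    ecdf = np.arange(1, x.size + 1) / x.size
    # Assumed padding so the curve spans the full unit interval.
    x = np.concatenate([[0.0], x, [1.0]])
    ecdf = np.concatenate([[0.0], ecdf, [1.0]])
    gap = np.abs(ecdf - x)
    # Trapezoidal rule written out explicitly.
    return float(np.sum((gap[1:] + gap[:-1]) / 2.0 * np.diff(x)))


calibrated = calibration_deviance_sketch([0.25, 0.5, 0.75, 1.0])
concentrated = calibration_deviance_sketch([0.5, 0.5, 0.5, 0.5])
```

Evenly spread quantiles follow the diagonal and give zero deviance, while quantiles piled up at 0.5 give a strictly positive value.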
- scistanpy.plotting.plotting.quantile_plot(
- x: ndarray[tuple[int, ...], dtype[_ScalarType_co]],
- reference: ndarray[tuple[int, ...], dtype[_ScalarType_co]],
- quantiles: npt.ArrayLike,
- *,
- observed: npt.ArrayLike | None,
- labels: dict[str, npt.ArrayLike] | None,
- include_median: bool,
- overwrite_input: bool,
- return_quantiles: Literal[False],
- observed_type: Literal['line', 'scatter'],
- area_kwargs: dict[str, Any] | None,
- median_kwargs: dict[str, Any] | None,
- observed_kwargs: dict[str, Any] | None,
- allow_nan: bool,
- scistanpy.plotting.plotting.quantile_plot(
- x: ndarray[tuple[int, ...], dtype[_ScalarType_co]],
- reference: ndarray[tuple[int, ...], dtype[_ScalarType_co]],
- quantiles: npt.ArrayLike,
- *,
- observed: npt.ArrayLike | None,
- labels: dict[str, npt.ArrayLike] | None,
- include_median: bool,
- overwrite_input: bool,
- return_quantiles: Literal[True],
- observed_type: Literal['line', 'scatter'],
- area_kwargs: dict[str, Any] | None,
- median_kwargs: dict[str, Any] | None,
- observed_kwargs: dict[str, Any] | None,
- allow_nan: bool,
Create quantile plots with confidence intervals and optional overlays.
This function generates area plots showing quantile ranges of reference data along with optional median lines and observed data overlays. It’s particularly useful for visualizing uncertainty bands around model predictions.
- Parameters:
x (npt.NDArray) – X-axis values (independent variable)
reference (npt.NDArray) – Reference data with shape (n_samples, n_points)
quantiles (npt.ArrayLike) – Quantile values to calculate and plot (0 < q < 1)
observed (Optional[npt.ArrayLike]) – Optional observed data to overlay. Must be 1D or 2D with last dimension matching that of the reference data (Default: None).
labels (Optional[dict[str, npt.ArrayLike]]) – Optional labels for hover tooltips (Default: None).
include_median (bool) – Whether to include median line (Default: True)
overwrite_input (bool) – Whether to overwrite reference array during calculations. This can help save memory by avoiding the creation of intermediate copies. (Default: False)
return_quantiles (bool) – Whether to return calculated quantiles along with plot. (Default: False)
observed_type (Literal["line", "scatter"]) – Type of overlay plot (‘line’ or ‘scatter’) (Default: ‘line’)
area_kwargs (Optional[dict[str, Any]]) – Styling options for quantile areas. See hv.opts.Area.
median_kwargs (Optional[dict[str, Any]]) – Styling options for median line. See hv.opts.Line.
observed_kwargs (Optional[dict[str, Any]]) – Styling options for observed overlay. See hv.opts.Curve or hv.opts.Scatter depending on choice of observed_type.
allow_nan (bool) – If True, uses np.nanquantile for quantile calculation. Otherwise, uses np.quantile (Default: False).
- Returns:
Quantile plot overlay, optionally with calculated quantiles
- Return type:
Union[hv.Overlay, tuple[hv.Overlay, npt.NDArray[np.floating]]]
- Raises:
ValueError – If quantiles are not between 0 and 1, or if array dimensions are invalid
- Features:
Automatic quantile symmetrization (adds complement quantiles)
Nested confidence intervals with graduated transparency
Customizable styling for all plot components
Optional hover labels for interactive exploration
- Example:
>>> x = np.linspace(0, 10, 100)
>>> ref = np.random.normal(np.sin(x), 0.1, (1000, 100))
>>> obs = np.sin(x) + 0.05 * np.random.randn(100)
>>> plot = quantile_plot(x, ref, [0.025, 0.25], observed=obs)
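Quantile symmetrization means that each requested quantile q is paired with its complement 1 - q, so that every pair bounds one nested interval. The sketch below shows how such levels could be derived and evaluated with `np.quantile`. `symmetrize_quantiles_sketch` is a hypothetical helper, and the rounding step and the way the median is added are assumptions rather than documented behavior.

```python
import numpy as np


def symmetrize_quantiles_sketch(quantiles, include_median=True):
    """Add complement quantiles (and optionally the median), sorted and unique."""
    q = np.asarray(quantiles, dtype=float)
    if np.any((q <= 0) | (q >= 1)):
        raise ValueError("Quantiles must lie strictly between 0 and 1")
    parts = [q, 1.0 - q]
    if include_median:
        parts.append(np.array([0.5]))
    # Round before deduplicating so that 1 - q matches an explicitly
    # supplied complement despite floating-point error.
    return np.unique(np.round(np.concatenate(parts), 12))


levels = symmetrize_quantiles_sketch([0.025, 0.25])
ref = np.random.default_rng(0).normal(size=(1000, 100))
# One row per quantile level, one column per x position.
bands = np.quantile(ref, levels, axis=0)
```

With the levels sorted, the outermost pair of rows in `bands` bounds the widest interval and each inner pair bounds a narrower one.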
Utility Functions
- scistanpy.plotting.plotting.hexgrid_with_mean(
- x: npt.NDArray[np.floating],
- y: npt.NDArray[np.floating],
- *,
- mean_windowsize: 'custom_types.Integer' | None = None,
- hex_kwargs: dict[str, Any] | None = None,
- mean_kwargs: dict[str, Any] | None = None,
Create hexagonal binning plot with rolling mean overlay.
This function generates a hexagonal heatmap showing data density combined with a rolling mean trend line, useful for visualizing large datasets with underlying trends.
- Parameters:
x (npt.NDArray[np.floating]) – X-axis data values
y (npt.NDArray[np.floating]) – Y-axis data values
mean_windowsize (Optional[custom_types.Integer]) – Window size for rolling mean calculation. Defaults to x.size // 100 if not specified.
hex_kwargs (Optional[dict[str, Any]]) – Styling options for hexagonal tiles. See hv.opts.HexTiles.
mean_kwargs (Optional[dict[str, Any]]) – Styling options for rolling mean line. See hv.opts.Line.
- Returns:
Overlay combining hexagonal heatmap and rolling mean
- Return type:
hv.Overlay
- Raises:
ValueError – If x and y arrays have different shapes or are not 1D
- The hexagonal binning:
Aggregates points into hexagonal cells
Colors cells by point density using viridis colormap
Includes colorbar for density interpretation
- The rolling mean:
Computed over sorted x values to show trend
Window size automatically scaled to data size
Styled for clear visibility over density plot
- Example:
>>> # Large dataset with trend
>>> x = np.random.randn(10000)
>>> y = 2*x + 0.5*np.random.randn(10000)
>>> plot = hexgrid_with_mean(x, y, mean_windowsize=200)
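The rolling mean over sorted x values can be approximated with a simple moving average. The sketch below is illustrative only: `rolling_mean_sketch` is a hypothetical name, averaging x as well as y within each window is an assumption, and the `max(1, ...)` guard for small inputs is an addition that this reference does not describe.

```python
import numpy as np


def rolling_mean_sketch(x, y, window=None):
    """Moving average of y (and x) after sorting points by x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1D arrays of the same shape")
    if window is None:
        # Documented default is x.size // 100; the lower bound of 1 is assumed.
        window = max(1, x.size // 100)
    order = np.argsort(x)
    kernel = np.ones(window) / window
    # mode="valid" keeps only windows that lie fully inside the data.
    x_mean = np.convolve(x[order], kernel, mode="valid")
    y_mean = np.convolve(y[order], kernel, mode="valid")
    return x_mean, y_mean


xm, ym = rolling_mean_sketch([3.0, 1.0, 2.0, 4.0], [30.0, 10.0, 20.0, 40.0], window=2)
# xm is [1.5, 2.5, 3.5] and ym is [15.0, 25.0, 35.0]
```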
- scistanpy.plotting.plotting.allow_interactive(
- plotting_func: Callable[[P], T],
Decorator to enable both static and interactive plotting capabilities with plotting functions.
This decorator modifies plotting functions to handle both static DataFrames and interactive hvplot objects, automatically configuring the appropriate display options for each case.
- Parameters:
plotting_func (Callable[P, T]) – The plotting function to make interactive
- Returns:
Enhanced function with interactive capabilities
- Return type:
Callable[P, T]
- The decorator handles:
Static DataFrames: Returns plot directly
Interactive objects: Configures framewise options
Plot lists: Combines multiple plots into column layout
- Example:
>>> @allow_interactive
... def my_plot(df, param):
...     return df.hvplot.line(y=param)
>>> # Works with both static and interactive data
>>> plot = my_plot(dataframe, 'column_name')