Hamiltonian Monte Carlo Results API Reference

Hamiltonian Monte Carlo (HMC) sampling results analysis and diagnostics.

This module provides tools for analyzing and diagnosing HMC sampling results from Stan models. It offers specialized classes and functions for processing MCMC output, conducting diagnostic tests, and creating interactive visualizations for model validation and troubleshooting.

The module centers around the SampleResults class, which extends MLEInferenceRes to provide HMC-specific functionality including convergence diagnostics, sample quality assessment, and specialized visualization tools for identifying problematic parameters and sampling behavior.

Key Features:
  • MCMC diagnostic test suites

  • Interactive visualization tools for failed diagnostics

  • Efficient CSV to NetCDF conversion for large datasets

  • Dask-enabled processing for memory-intensive operations

  • Specialized trace plot analysis for problematic variables

  • Automated detection and reporting of sampling issues

Diagnostic Capabilities:
  • R-hat convergence assessment

  • Effective sample size (ESS) evaluation

  • Energy Bayesian fraction of missing information (E-BFMI) analysis

  • Divergence detection and analysis

  • Tree depth saturation monitoring

  • Variable-specific failure pattern identification

The module is designed to handle both small-scale interactive analysis and large-scale batch processing of MCMC results, with particular attention to memory efficiency and computational performance for complex models.

The main class that users will interact with is SampleResults. Other classes in this module provide supporting functionality for diagnostics, visualization, and data conversion.

Sample Results Analysis

class scistanpy.model.results.hmc.SampleResults(
model: 'Model' | None = None,
fit: str | list[str] | os.PathLike | CmdStanMCMC | None = None,
data: dict[str, npt.NDArray] | None = None,
precision: Literal['double', 'single', 'half'] = 'single',
inference_obj: az.InferenceData | str | None = None,
mib_per_chunk: custom_types.Integer | None = None,
use_dask: bool = False,
)

Bases: MLEInferenceRes

Comprehensive analysis interface for HMC sampling results. Do not instantiate this class directly. Use the from_disk() class method to load a results object from disk, or work with the object returned by sampling, as in the example below.

This class extends MLEInferenceRes to provide specialized functionality for analyzing Hamiltonian Monte Carlo sampling results from Stan. It offers comprehensive diagnostic capabilities, interactive visualization tools, and efficient data management for large MCMC datasets.

Parameters:
  • model (Optional[Model]) – SciStanPy model used for sampling. Defaults to None.

  • fit (Optional[Union[str, list[str], os.PathLike, CmdStanMCMC]]) – CmdStanMCMC object or path to CSV files. Defaults to None.

  • data (Optional[dict[str, npt.NDArray]]) – Observed data dictionary. Defaults to None.

  • precision (Literal["double", "single", "half"]) – Numerical precision for arrays. Defaults to “single”.

  • inference_obj (Optional[Union[az.InferenceData, str]]) – Pre-existing InferenceData or NetCDF path. Defaults to None.

  • mib_per_chunk (Optional[custom_types.Integer]) – Memory limit per chunk in MiB. Defaults to None.

  • use_dask (bool) – Whether to use Dask for computation. Defaults to False.

Variables:
  • fit – CmdStanMCMC object containing sampling metadata

  • use_dask – Flag controlling Dask usage for computation

The class provides comprehensive functionality for:
  • MCMC convergence diagnostics and reporting

  • Sample quality assessment and visualization

  • Interactive analysis of problematic variables

  • Efficient handling of large datasets with Dask integration

  • Automated detection and reporting of sampling issues

Key Diagnostic Features:
  • R-hat convergence assessment

  • Effective sample size evaluation

  • Energy-based diagnostics (E-BFMI)

  • Divergence detection and analysis

  • Tree depth saturation monitoring

The class automatically handles NetCDF conversion for efficient storage and supports both in-memory and out-of-core computation depending on dataset size and available memory.

Example:
import scistanpy as ssp

# Get MCMC results (`model` is an existing SciStanPy model and
# `observed_data` is a dict of observed arrays)
mcmc_results = model.mcmc(data=observed_data, chains=4, iter_sampling=2000)

# Run full diagnostics
diagnostics = mcmc_results.diagnose()

# Posterior predictive check (interactive in notebook)
mcmc_results.run_ppc()

# Evaluate problematic samples (interactive in notebook)
mcmc_results.plot_sample_failure_quantile_traces()

# Evaluate problematic variables (interactive in notebook)
mcmc_results.plot_variable_failure_quantile_traces()

calculate_diagnostics() → Dataset

Shortcut to running calculate_summaries() with kind="diagnostics" and no other arguments.

Returns:

Dataset containing diagnostic metrics

Return type:

xr.Dataset

The method is designed as a simple interface for users who only need diagnostic information without summary statistics.

calculate_summaries(
var_names: list[str] | None = None,
filter_vars: Literal[None, 'like', 'regex'] = None,
kind: Literal['all', 'stats', 'diagnostics'] = 'all',
round_to: custom_types.Integer = 2,
circ_var_names: list[str] | None = None,
stat_focus: str = 'mean',
stat_funcs: dict[str, callable] | callable | None = None,
extend: bool = True,
hdi_prob: custom_types.Float = 0.94,
skipna: bool = False,
diagnostic_varnames: Sequence[str] = ('mcse_mean', 'mcse_sd', 'ess_bulk', 'ess_tail', 'r_hat'),
) → xr.Dataset

Compute comprehensive summary statistics and diagnostics for MCMC results.

This method extends the parent class functionality to provide HMC-specific diagnostic capabilities, including automatic separation of statistics and diagnostics into appropriate InferenceData groups. See az.summary for more detail on arguments.

Parameters:
  • var_names (Optional[list[str]]) – Variable names to include. Defaults to None (all variables).

  • filter_vars (Optional[Literal[None, "like", "regex"]]) – Variable filtering method. Defaults to None.

  • kind (Literal["all", "stats", "diagnostics"]) – Type of computations to perform. Defaults to “all”.

  • round_to (custom_types.Integer) – Decimal places for rounding. Defaults to 2.

  • circ_var_names (Optional[list[str]]) – Names of circular variables. Defaults to None.

  • stat_focus (str) – Primary statistic for focus. Defaults to “mean”.

  • stat_funcs (Optional[Union[dict[str, callable], callable]]) – Custom statistic functions. Defaults to None.

  • extend (bool) – Whether to include extended statistics. Defaults to True. Only meaningful if stat_funcs is not None.

  • hdi_prob (custom_types.Float) – Probability for highest density interval. Defaults to 0.94.

  • skipna (bool) – Whether to skip NaN values. Defaults to False.

  • diagnostic_varnames (Sequence[str]) – Names of diagnostic metrics. Defaults to (“mcse_mean”, “mcse_sd”, “ess_bulk”, “ess_tail”, “r_hat”).

Returns:

Combined dataset with all computed metrics

Return type:

xr.Dataset

Enhanced Features:
  • Automatic Dask acceleration for large datasets

  • Separation of statistics and diagnostics into appropriate groups

  • Memory-efficient computation strategies

The method automatically updates the InferenceData object with new groups:
  • variable_summary_stats: Basic summary statistics

  • variable_diagnostic_stats: MCMC diagnostic metrics

diagnose(
max_tree_depth: custom_types.Integer | None = None,
ebfmi_thresh: custom_types.Float = 0.2,
r_hat_thresh: custom_types.Float = 1.01,
ess_thresh: custom_types.Float = 100,
silent: bool = False,
) → tuple['custom_types.StrippedTestRes', dict[str, 'custom_types.StrippedTestRes']]

Run the complete MCMC diagnostic pipeline, which calls the following methods in order:
  1. calculate_diagnostics()

  2. evaluate_sample_stats()

  3. evaluate_variable_diagnostic_stats()

  4. identify_failed_diagnostics()

Typically, users will want to use this method rather than calling the individual methods themselves.

Parameters:
  • max_tree_depth (Optional[custom_types.Integer]) – Maximum tree depth threshold. Uses model default if None. Defaults to None.

  • ebfmi_thresh (custom_types.Float) – E-BFMI threshold for energy diagnostics. Defaults to 0.2.

  • r_hat_thresh (custom_types.Float) – R-hat threshold for convergence assessment. Defaults to 1.01.

  • ess_thresh (custom_types.Float) – ESS threshold per chain. Defaults to 100.

  • silent (bool) – Whether to suppress diagnostic output. Defaults to False.

Returns:

Tuple of (sample_failures, variable_failures) as returned by identify_failed_diagnostics

Return type:

tuple[custom_types.StrippedTestRes, dict[str, custom_types.StrippedTestRes]]

The method provides comprehensive assessment of MCMC sampling quality, identifying both immediate issues (e.g., divergences, energy problems) and convergence concerns (e.g., R-hat, effective sample size).

All intermediate results are stored in the inference_obj attribute for later access and further analysis.

evaluate_sample_stats(
max_tree_depth: custom_types.Integer | None = None,
ebfmi_thresh: custom_types.Float = 0.2,
) → xr.Dataset

Evaluate sample-level diagnostic statistics for MCMC quality assessment.

Parameters:
  • max_tree_depth (Optional[custom_types.Integer]) – Maximum tree depth threshold. Uses model default if None. Defaults to None.

  • ebfmi_thresh (custom_types.Float) – E-BFMI threshold for energy diagnostics. Defaults to 0.2.

Returns:

Dataset with boolean arrays indicating test failures

Return type:

xr.Dataset

This method evaluates sample-level diagnostic statistics to identify problematic samples in the MCMC chains. Tests are considered failures when samples exhibit the following characteristics:

  • Tree Depth: Sample reached maximum tree depth (saturation)

  • E-BFMI: Energy Bayesian fraction of missing information below threshold

  • Divergence: Sample diverged during Hamiltonian dynamics

The resulting boolean arrays have True values indicating failed samples and False values indicating successful samples. This information is stored in the ‘sample_diagnostic_tests’ group of the InferenceData object.
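
As a rough illustration of the E-BFMI test, the sketch below computes the statistic for each chain from its energy trace using the usual estimator (mean squared change in energy between successive draws, divided by the variance of the energy) and compares it with ebfmi_thresh. The function name e_bfmi and the toy energy values are assumptions made for illustration; this is not SciStanPy's implementation.

```python
from statistics import fmean, pvariance


def e_bfmi(energy):
    """Estimate E-BFMI for one chain from its per-draw energy values."""
    # Mean squared difference between successive energies ...
    diffs_sq = [(b - a) ** 2 for a, b in zip(energy, energy[1:])]
    # ... divided by the (population) variance of the energy.
    return fmean(diffs_sq) / pvariance(energy)


# Toy energy traces (hypothetical values, one list per chain).
chains = {
    "well_mixed": [10.0, 12.0, 9.0, 13.0, 10.0, 12.0, 9.0, 13.0],
    "sticky": [10.0 + 0.1 * i for i in range(20)],  # slow drift in energy
}

ebfmi_thresh = 0.2  # default threshold from the signature above
low_energy = {name: e_bfmi(e) < ebfmi_thresh for name, e in chains.items()}
print(low_energy)
```

A chain whose energy changes only slightly from one draw to the next yields a small ratio and is flagged, which is the behavior the threshold is meant to catch.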

Example:
>>> sample_tests = results.evaluate_sample_stats(ebfmi_thresh=0.15)
>>> n_diverged = sample_tests.diverged.sum().item()
>>> print(f"Number of divergent samples: {n_diverged}")

evaluate_variable_diagnostic_stats(
r_hat_thresh: custom_types.Float = 1.01,
ess_thresh=100,
) → xr.Dataset

Evaluate variable-level diagnostic statistics for convergence assessment.

Parameters:
  • r_hat_thresh (custom_types.Float) – R-hat threshold for convergence. Defaults to 1.01.

  • ess_thresh (custom_types.Integer) – ESS threshold per chain. Defaults to 100.

Returns:

Dataset with boolean arrays indicating variable-level test failures

Return type:

xr.Dataset

Raises:
  • ValueError – If variable_diagnostic_stats group doesn’t exist

  • ValueError – If required metrics are missing

This method evaluates variable-level diagnostic statistics to identify parameters that exhibit poor sampling behavior. Tests are considered failures when variables meet the following criteria:

Failure Conditions:
  • R-hat: Split R-hat statistic >= threshold (poor convergence)

  • ESS Bulk: Bulk effective sample size / n_chains <= threshold per chain

  • ESS Tail: Tail effective sample size / n_chains <= threshold per chain

Results are stored in the ‘variable_diagnostic_tests’ group with boolean arrays indicating which variables failed which tests.
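
The failure conditions above can be sketched for a single variable as follows. The helper name variable_failures and the diagnostic values are hypothetical; only the comparisons (R-hat at or above its threshold, ESS per chain at or below its threshold) are taken from the list above.

```python
def variable_failures(r_hat, ess_bulk, ess_tail, n_chains,
                      r_hat_thresh=1.01, ess_thresh=100):
    """Return boolean failure flags for one variable (True = failed)."""
    return {
        "r_hat": r_hat >= r_hat_thresh,
        # ESS thresholds are applied per chain, so divide by n_chains first.
        "ess_bulk": ess_bulk / n_chains <= ess_thresh,
        "ess_tail": ess_tail / n_chains <= ess_thresh,
    }


# Hypothetical diagnostics for two parameters from a 4-chain run.
diagnostics = {
    "mu": {"r_hat": 1.002, "ess_bulk": 3200.0, "ess_tail": 2900.0},
    "tau": {"r_hat": 1.03, "ess_bulk": 350.0, "ess_tail": 900.0},
}
flags = {
    name: variable_failures(n_chains=4, **vals)
    for name, vals in diagnostics.items()
}
print(flags)
```

Note that tau fails the bulk ESS test even though its total bulk ESS is 350, because 350 / 4 chains is below the per-chain threshold of 100.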

Example:
>>> var_tests = results.evaluate_variable_diagnostic_stats(r_hat_thresh=1.02)
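>>> # Per-variable failure counts, then the total across all variables
>>> failed_convergence = var_tests.sel(metric='r_hat').sum()
>>> n_failed = failed_convergence.to_array().sum().item()
>>> print(f"Variables with poor convergence: {n_failed}")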

classmethod from_disk(
path: str,
csv_files: list[str] | str | None = None,
skip_fit: bool = False,
use_dask: bool = False,
) → SampleResults

Load SampleResults from saved NetCDF file with optional CSV metadata.

Parameters:
  • path (str) – Path to NetCDF file containing inference data

  • csv_files (Optional[Union[list[str], str]]) – Paths to CSV files output by Stan. Can also be a glob pattern in place of a list. Defaults to None (auto-detect based on path value).

  • skip_fit (bool) – Whether to skip loading CSV metadata. Defaults to False.

  • use_dask (bool) – Whether to enable Dask for computation. Defaults to False.

Returns:

Loaded SampleResults object ready for analysis

Return type:

SampleResults

Raises:

FileNotFoundError – If the specified NetCDF file doesn’t exist

This class method enables loading of previously saved MCMC results from NetCDF format, with optional access to original CSV metadata for complete functionality.

Loading Modes:
  • Full loading: NetCDF + CSV metadata (complete functionality)

  • NetCDF only: Fast loading without CSV metadata (limited functionality)

  • Auto-detection: Automatically finds CSV files based on NetCDF path

When use_dask=True, the loaded data supports out-of-core computation for memory-efficient analysis of large datasets. Management of Dask happens internally, so users do not need to be familiar with Dask to take advantage of it.
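
The exact rule used for auto-detection is not spelled out here. One plausible convention, assuming the CSV files share the NetCDF file's basename, is sketched below; guess_csv_files is a hypothetical helper, not part of SciStanPy.

```python
import glob
import os
import tempfile


def guess_csv_files(netcdf_path):
    """Hypothetical helper: find CSV files that share the NetCDF basename."""
    stem, _ = os.path.splitext(netcdf_path)
    # Escape the stem so special characters in the path are matched literally.
    return sorted(glob.glob(glob.escape(stem) + "*.csv"))


with tempfile.TemporaryDirectory() as tmp:
    # Create empty placeholder files to search through.
    for name in ("model_results.nc", "model_results_1.csv",
                 "model_results_2.csv", "other_run_1.csv"):
        with open(os.path.join(tmp, name), "w"):
            pass
    found = guess_csv_files(os.path.join(tmp, "model_results.nc"))
    found_names = [os.path.basename(p) for p in found]

print(found_names)
```

If the CSV files follow a different naming scheme, passing csv_files explicitly avoids relying on any such guess.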

Example:
>>> # Load with auto-detected CSV files (csvs must have same basename)
>>> results = SampleResults.from_disk('model_results.nc')
>>>
>>> # Load with explicit CSV files
>>> results = SampleResults.from_disk(
...     'results.nc', csv_files=['chain_1.csv', 'chain_2.csv']
... )
>>>
>>> # Fast loading without CSV metadata
>>> results = SampleResults.from_disk('results.nc', skip_fit=True)

identify_failed_diagnostics(
silent: bool = False,
) → tuple['custom_types.StrippedTestRes', dict[str, 'custom_types.StrippedTestRes']]

Identify and report diagnostic test failures with comprehensive summary.

Parameters:

silent (bool) – Whether to suppress printed output. Defaults to False.

Returns:

Tuple of (sample_failures, variable_failures) dictionaries

Return type:

tuple[custom_types.StrippedTestRes, dict[str, custom_types.StrippedTestRes]]

This method analyzes the results of diagnostic tests and provides both programmatic access to failure information and human-readable summaries. It requires that diagnostic evaluation methods have been run previously.

Return Structure:
  • sample_failures: Dictionary mapping test names to arrays of failed sample indices

  • variable_failures: Dictionary mapping metric names to dictionaries of failed variables

The method processes test results to extract:
  • Indices of samples that failed each diagnostic test

  • Names of variables that failed each diagnostic metric

  • Summary statistics showing failure rates and percentages

When not silent, provides detailed reporting including:
  • Failure counts and percentages for each test type

  • Variable-specific failure information organized by metric

  • Clear categorization of sample vs. variable-level issues
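
To make the sample-level part of this report concrete, the sketch below turns boolean test results into failed-sample indices and failure percentages. The helper summarize_failures, the test name "tree_depth", and the masks are assumptions for illustration; the real method reads its inputs from the InferenceData groups written by the evaluation methods.

```python
def summarize_failures(test_masks):
    """Map each test name to (failed sample indices, failure percentage)."""
    summary = {}
    for test, mask in test_masks.items():
        failed = [i for i, flag in enumerate(mask) if flag]
        summary[test] = (failed, 100.0 * len(failed) / len(mask))
    return summary


# Hypothetical flattened sample-level test results (True = failed).
masks = {
    "diverged": [False, False, True, False, False,
                 False, False, True, False, False],
    "tree_depth": [False] * 10,
}
summary = summarize_failures(masks)
for test, (failed, pct) in summary.items():
    print(f"{test}: {len(failed)} of {len(masks[test])} samples failed ({pct:.1f}%)")
```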

plot_sample_failure_quantile_traces(
display: Literal[True],
width: custom_types.Integer,
height: custom_types.Integer,
) → HoloMap
plot_sample_failure_quantile_traces(
display: Literal[False],
width: custom_types.Integer,
height: custom_types.Integer,
) → dict[str, Overlay]

Visualize quantile traces for samples that failed diagnostic tests.

Parameters:
  • display (bool) – Whether to return formatted layout for display. Defaults to True.

  • width (custom_types.Integer) – Width of plots in pixels. Defaults to 600.

  • height (custom_types.Integer) – Height of plots in pixels. Defaults to 600.

Returns:

Quantile trace plots in requested format

Return type:

Union[hv.HoloMap, dict[str, hv.Overlay]]

Raises:

ValueError – If no samples failed diagnostic tests

This method creates specialized trace plots showing how samples that failed diagnostic tests compare to those that passed. The visualization helps identify systematic patterns in sampling failures.

Plot Structure:
  • X-axis: Cumulative fraction of parameters (0 to 1, sorted by typical quantile of failed samples)

  • Y-axis: Quantiles of failed samples relative to passing samples

  • Individual traces: Semi-transparent lines for each failed sample

  • Typical trace: Bold line showing median behavior across failures

  • Reference line: Diagonal indicating perfect calibration

The plots reveal:
  • Whether failures are systematic across parameters

  • Patterns in how failed samples deviate from typical behavior

  • The severity and consistency of sampling problems
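
The quantity on the y-axis can be sketched as an empirical quantile: for each parameter, the fraction of passing samples at or below the failed sample's value. The example below handles a single failed draw and sorts by that draw's own quantiles, a simplification of the ordering by typical quantile described above. The function quantile_trace and all values are hypothetical.

```python
from bisect import bisect_right


def quantile_trace(failed_sample, passing_samples):
    """Return (cumulative parameter fraction, quantile) pairs for one failed draw.

    failed_sample: dict of parameter name -> value in the failed draw.
    passing_samples: dict of parameter name -> values from passing draws.
    """
    quantiles = []
    for name, value in failed_sample.items():
        reference = sorted(passing_samples[name])
        # Fraction of passing draws that are <= the failed draw's value.
        quantiles.append(bisect_right(reference, value) / len(reference))
    quantiles.sort()
    n = len(quantiles)
    return [((i + 1) / n, q) for i, q in enumerate(quantiles)]


passing = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10.0, 20.0, 30.0, 40.0],
    "c": [0.1, 0.2, 0.3, 0.4],
}
failed = {"a": 3.5, "b": 5.0, "c": 0.4}

trace = quantile_trace(failed, passing)
for x, q in trace:
    print(f"x={x:.2f}  quantile={q:.2f}")
```

Quantiles clustered near 0 or 1 for many parameters indicate that the failed draw sits in the tails of the passing samples.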

Example:
>>> # Display interactive traces
>>> results.plot_sample_failure_quantile_traces()

plot_variable_failure_quantile_traces(
*,
display: Literal[True],
width: custom_types.Integer,
height: custom_types.Integer,
plot_quantiles: bool,
) → VariableAnalyzer
plot_variable_failure_quantile_traces(
*,
display: Literal[False],
width: custom_types.Integer,
height: custom_types.Integer,
plot_quantiles: bool,
) → HoloViews

Create interactive analyzer for variables that failed diagnostic tests.

Parameters:
  • display (bool) – Whether to return display-ready analyzer. Defaults to True.

  • width (custom_types.Integer) – Width of plots in pixels. Defaults to 800.

  • height (custom_types.Integer) – Height of plots in pixels. Defaults to 400.

  • plot_quantiles (bool) – Whether to plot quantiles vs raw values. Defaults to False.

Returns:

Interactive analyzer or Panel layout

Return type:

Union[VariableAnalyzer, pn.pane.HoloViews]

This method creates an interactive analysis tool for examining individual variables that failed diagnostic tests. The analyzer provides widgets for selecting specific variables, diagnostic metrics, and array indices.

Interactive Features:
  • Variable Selection: Choose from variables that failed any test

  • Metric Selection: Focus on specific diagnostic failures

  • Index Selection: Examine individual array elements for multi-dimensional parameters

The resulting trace plots show:
  • Sample trajectories across MCMC chains with distinct colors

  • Quantile analysis relative to parameters that passed tests

  • Hover information with detailed sample metadata

  • Chain-specific behavior identification

This tool is particularly valuable for:
  • Understanding the nature of convergence problems

  • Identifying problematic parameter regions

  • Diagnosing systematic vs. sporadic sampling issues

  • Planning model reparameterization strategies

Example:
>>> # Interactive analysis in notebook
>>> analyzer = results.plot_variable_failure_quantile_traces()
>>> analyzer  # Display widget interface

Variable Failure Analyzer

Users will not typically instantiate this class directly. It is the return type of plot_variable_failure_quantile_traces() and provides the interactive analysis interface.

class scistanpy.model.results.hmc.VariableAnalyzer(
sample_results: SampleResults,
plot_width: custom_types.Integer = 800,
plot_height: custom_types.Integer = 400,
plot_quantiles: bool = False,
)

Bases: object

Interactive analysis tool for variables that fail MCMC diagnostic tests.

This class provides an interactive interface for analyzing individual variables that have failed diagnostic tests during MCMC sampling. It creates a dashboard with widgets for selecting variables, metrics, and specific array indices, along with trace plots showing the problematic sampling behavior.

Parameters:
  • sample_results (SampleResults) – SampleResults object containing MCMC diagnostics

  • plot_width (custom_types.Integer) – Width of plots in pixels. Defaults to 800.

  • plot_height (custom_types.Integer) – Height of plots in pixels. Defaults to 400.

  • plot_quantiles (bool) – Whether to plot quantiles vs raw values. Defaults to False.

Variables:
  • sample_results – Reference to source sampling results

  • plot_quantiles – Flag controlling plot content type

  • n_chains – Number of MCMC chains in the results

  • x – Array of step indices for x-axis

  • failed_vars – Dictionary mapping variable names to failure information

  • varchoice – Widget for selecting variables to analyze

  • metricchoice – Widget for selecting diagnostic metrics

  • indexchoice – Widget for selecting array indices

  • plot_width – Recorded width of plots

  • plot_height – Recorded height of plots

  • fig – HoloViews pane containing the current plot

  • layout – Panel layout containing all interface elements

The analyzer automatically identifies variables that have failed diagnostic tests and organizes them by failure type. It provides trace plots that can show either raw parameter values or their quantiles relative to passing samples, helping identify the nature of sampling problems.

Key Features:
  • Automatic identification of failed variables and metrics

  • Interactive widget-based navigation

  • Trace plots with chain-specific coloring

  • Quantile-based analysis for identifying sampling bias

  • Real-time plot updates based on widget selections

Note

This class should not be instantiated directly. Use the plot_variable_failure_quantile_traces() method of SampleResults instead.

display()

Display the complete interactive analysis interface.

Returns:

Panel layout containing all widgets and plots

Return type:

pn.Layout

This method returns the complete interactive interface for display in Jupyter notebooks or Panel applications.

CSV to NetCDF Conversion

Stan writes its results as CSV files, which are slow to parse and memory-hungry for large datasets. The following utilities convert these CSV files to the more efficient NetCDF format. Once in NetCDF format, samples are easy to manipulate with packages such as xarray, Dask, and ArviZ.

scistanpy.model.results.hmc.cmdstan_csv_to_netcdf(
path: str | list[str] | os.PathLike | CmdStanMCMC,
model: Model,
data: dict[str, Any] | None = None,
output_filename: str | None = None,
precision: Literal['double', 'single', 'half'] = 'single',
mib_per_chunk: custom_types.Integer | None = None,
) → str

Convert CmdStan CSV output to NetCDF format.

This function provides a high-level interface for converting CmdStan sampling results from CSV format to NetCDF, enabling efficient storage and processing of large MCMC datasets.

Parameters:
  • path (Union[str, list[str], os.PathLike, CmdStanMCMC]) – Path to CSV files or CmdStanMCMC object

  • model (Model) – SciStanPy model used for sampling

  • data (Optional[dict[str, Any]]) – Observed data dictionary. Uses model default if None. Defaults to None.

  • output_filename (Optional[str]) – Output NetCDF filename. Auto-generated if None. Defaults to None.

  • precision (Literal["double", "single", "half"]) – Numerical precision for stored arrays. Defaults to “single”.

  • mib_per_chunk (Optional[custom_types.Integer]) – Memory limit per chunk in MiB. Defaults to None, meaning use Dask default.

Returns:

Path to created NetCDF file

Return type:

str

The conversion process:
  1. Analyzes model structure to determine optimal storage layout

  2. Creates NetCDF file with appropriate groups and dimensions

  3. Converts CSV data with proper chunking for memory efficiency

  4. Organizes results into ArviZ-compatible structure
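
How a per-chunk memory limit might translate into a chunk size in step 3 can be sketched with simple arithmetic. The rule below (fit as many whole draws as possible into one chunk) and the 64 MiB figure are assumptions for illustration only; the module's actual chunking strategy is not described here.

```python
# Bytes per stored value for each precision option.
BYTES_PER_VALUE = {"double": 8, "single": 4, "half": 2}


def draws_per_chunk(values_per_draw, precision="single", mib_per_chunk=64):
    """Hypothetical rule: largest number of whole draws that fit in one chunk."""
    chunk_bytes = mib_per_chunk * 1024 * 1024
    bytes_per_draw = values_per_draw * BYTES_PER_VALUE[precision]
    # Always keep at least one draw per chunk, even if it exceeds the limit.
    return max(1, chunk_bytes // bytes_per_draw)


# A parameter with one million elements per draw.
chunk_sizes = {
    precision: draws_per_chunk(1_000_000, precision)
    for precision in BYTES_PER_VALUE
}
print(chunk_sizes)
```

Lower precision lets more draws share a chunk, which is one reason precision and mib_per_chunk interact.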

Benefits of NetCDF format:
  • Significantly faster loading compared to CSV

  • Memory-efficient access with chunking support

  • Metadata preservation and self-describing format

  • Integration with scientific Python ecosystem

Example:
>>> netcdf_path = cmdstan_csv_to_netcdf(
...     'model_output*.csv', model, precision='single'
... )
>>> results = SampleResults.from_disk(netcdf_path)

class scistanpy.model.results.hmc.CmdStanMCMCToNetCDFConverter(
fit: CmdStanMCMC | str | list[str] | os.PathLike,
model: Model,
data: dict[str, Any] | None = None,
)

Bases: object

Object responsible for converting CmdStan CSV output to NetCDF format. This class is used internally by the cmdstan_csv_to_netcdf() function and should not be instantiated directly in most use cases.

The converter organizes the data into appropriate NetCDF groups and handles dimension naming and chunking strategies, providing efficient storage and access for large MCMC datasets.

Parameters:
  • fit (Union[CmdStanMCMC, str, list[str], os.PathLike]) – CmdStanMCMC object or path to CSV files

  • model (Model) – SciStanPy model object for metadata extraction

  • data (Optional[dict[str, Any]]) – Optional observed data dictionary. Defaults to None.

Variables:
  • fit – CmdStanMCMC object containing sampling results

  • model – Reference to the original SciStanPy model

  • data – Observed data used for model fitting

  • config – Configuration dictionary from Stan sampling

  • num_draws – Total number of draws including warmup if saved

  • varname_to_column_order – Mapping from variables to csv column indices

The converter handles:
  • Automatic detection of variable types and dimensions

  • Proper NetCDF group organization

  • Chunking strategies for large datasets

  • Data type optimization based on precision requirements
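
The storage cost and rounding error implied by each precision option can be checked with the standard library's struct module, which supports IEEE 754 half, single, and double formats. This only illustrates the trade-off and assumes the three options map to those formats; it says nothing about how the converter casts its arrays.

```python
import struct

# struct format codes for IEEE 754 half, single, and double precision.
FORMATS = {"half": "<e", "single": "<f", "double": "<d"}


def store_and_reload(value, precision):
    """Round-trip a Python float through the given storage precision."""
    fmt = FORMATS[precision]
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


value = 0.1
sizes = {p: struct.calcsize(fmt) for p, fmt in FORMATS.items()}
errors = {p: abs(store_and_reload(value, p) - value) for p in FORMATS}

for precision in FORMATS:
    print(f"{precision:>6}: {sizes[precision]} bytes, "
          f"rounding error {errors[precision]:.2e}")
```

Halving the bytes per value roughly halves file size, at the cost of a larger rounding error in every stored draw.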

write_netcdf(
filename: str | None = None,
precision: Literal['double', 'single', 'half'] = 'single',
mib_per_chunk: custom_types.Integer | None = None,
) → str

Write the converted data to NetCDF format.

Parameters:
  • filename (Optional[str]) – Output filename. Auto-generated if None. Defaults to None.

  • precision (Literal["double", "single", "half"]) – Numerical precision for arrays. Defaults to “single”.

  • mib_per_chunk (Optional[custom_types.Integer]) – Memory limit per chunk in MiB. Defaults to None, meaning use Dask default.

Returns:

Path to the created NetCDF file

Return type:

str

This method orchestrates the complete conversion process:
  1. Creates NetCDF file with appropriate structure

  2. Sets up dimensions based on model and data characteristics

  3. Creates variables with optimal chunking strategies

  4. Populates data from CSV files with progress tracking

The resulting NetCDF file contains properly organized groups for posterior samples, posterior predictive samples, sample statistics, and observed data.

Utility Functions

The following utility functions are used internally by the other classes and functions in this module and will not typically be called directly by users.

scistanpy.model.results.hmc.dask_enabled_summary_stats(
inference_obj: InferenceData,
) → Dataset

Compute summary statistics using Dask for memory efficiency. This is used inside the SampleResults.calculate_summaries() method when Dask is enabled.

Parameters:

inference_obj (az.InferenceData) – ArviZ InferenceData object containing posterior samples

Returns:

Dataset containing computed summary statistics

Return type:

xr.Dataset

This function computes basic summary statistics (mean, standard deviation, and highest density intervals) using Dask for memory-efficient computation on large datasets that might not fit in memory.

The function leverages Dask’s lazy evaluation to:
  • Queue multiple computations for efficient execution

  • Minimize memory usage through chunked processing

  • Provide progress tracking for long-running computations

Computed Statistics:
  • Mean across chains and draws

  • Standard deviation across chains and draws

  • 94% highest density intervals
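
For reference, a highest density interval can be computed from pooled draws as the narrowest window that contains the requested share of sorted samples. The sketch below is a plain in-memory version of that idea, not the Dask-based code used by this function; the name hdi and the simulated draws are assumptions.

```python
import math
import random
from statistics import fmean, stdev


def hdi(samples, hdi_prob=0.94):
    """Return the narrowest interval spanning about hdi_prob of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    span = math.floor(hdi_prob * n)  # index offset between interval endpoints
    # Width of every candidate interval that spans `span` sorted samples.
    widths = [ordered[i + span] - ordered[i] for i in range(n - span)]
    start = widths.index(min(widths))
    return ordered[start], ordered[start + span]


# Simulated draws standing in for 4 chains x 500 draws of one parameter.
rng = random.Random(0)
draws = [rng.gauss(0.0, 1.0) for _ in range(4 * 500)]

mean, sd = fmean(draws), stdev(draws)
lower, upper = hdi(draws)
print(f"mean={mean:.3f}, sd={sd:.3f}, 94% HDI=({lower:.3f}, {upper:.3f})")
```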

Example:
>>> stats = dask_enabled_summary_stats(inference_data)
>>> print(stats.sel(metric='mean'))

scistanpy.model.results.hmc.dask_enabled_diagnostics(
inference_obj: InferenceData,
) → Dataset

Compute MCMC diagnostics using Dask for memory efficiency. This is used inside the SampleResults.calculate_summaries() method when Dask is enabled.

Parameters:

inference_obj (az.InferenceData) – ArviZ InferenceData object containing posterior samples

Returns:

Dataset containing computed diagnostic metrics

Return type:

xr.Dataset

This function computes comprehensive MCMC diagnostic metrics using Dask for memory-efficient computation on large datasets. All diagnostics are computed simultaneously to maximize efficiency.

Computed Diagnostics:
  • Monte Carlo standard errors (mean and sd methods)

  • Effective sample sizes (bulk and tail)

  • R-hat convergence diagnostic

The Dask implementation enables:
  • Parallel computation across available cores

  • Memory-efficient processing of large datasets

  • Automatic load balancing and optimization
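
As background on the R-hat diagnostic, the sketch below implements the classic split R-hat: each chain is split in half, and the pooled variance estimate is compared with the average within-chain variance. It is a simplified stand-in; ArviZ's default R-hat is a rank-normalized variant, and nothing here reflects how this function distributes work with Dask. The function name and simulated chains are assumptions.

```python
import math
import random
from statistics import fmean, variance


def split_r_hat(chains):
    """Classic split R-hat for a list of equal-length chains."""
    half = len(chains[0]) // 2
    # Split every chain into a first and a second half.
    splits = [c[:half] for c in chains] + [c[half:2 * half] for c in chains]
    within = fmean([variance(s) for s in splits])           # W
    between = half * variance([fmean(s) for s in splits])   # B
    var_hat = (half - 1) / half * within + between / half
    return math.sqrt(var_hat / within)


rng = random.Random(1)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(200)] for _ in range(4)]

# Shift one chain to mimic a chain stuck in a different region.
stuck = [list(c) for c in mixed]
stuck[3] = [x + 3.0 for x in stuck[3]]

r_mixed = split_r_hat(mixed)
r_stuck = split_r_hat(stuck)
print(f"mixed chains: {r_mixed:.3f}, one stuck chain: {r_stuck:.3f}")
```

Values close to 1 indicate that the chains agree; the shifted chain inflates the between-chain variance and pushes R-hat well above typical thresholds such as 1.01.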

Example:
>>> diagnostics = dask_enabled_diagnostics(inference_data)
>>> print(diagnostics.sel(metric='r_hat'))

scistanpy.model.results.hmc.fit_from_csv_noload(path: str | list[str] | os.PathLike) → CmdStanMCMC

Create CmdStanMCMC object from CSV files without loading data into memory. This function is adapted from cmdstanpy.from_csv.

Parameters:

path (Union[str, list[str], os.PathLike]) – Path specification for CSV files (single file, list, glob pattern, or directory)

Returns:

CmdStanMCMC object with metadata but no loaded sample data

Return type:

CmdStanMCMC

Raises:
  • ValueError – If path specification is invalid or no CSV files found

  • ValueError – If CSV files are not valid Stan output

This function provides a memory-efficient way to create CmdStanMCMC objects by parsing only the metadata from CSV files without loading the actual sample data. This is particularly useful for large datasets where memory usage is a concern.

Path Specifications:
  • Single file: Direct path to one CSV file

  • File list: List of paths to multiple CSV files

  • Glob pattern: Wildcard pattern for automatic file discovery

  • Directory: Directory containing CSV files (loads all .csv files)
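
A minimal sketch of how these four path specifications could be expanded into a list of files is shown below. The helper resolve_csv_paths is hypothetical and only mirrors the cases listed above; the real function additionally parses and validates the Stan metadata.

```python
import glob
import os
import tempfile


def resolve_csv_paths(path):
    """Hypothetical helper: expand a path specification into sorted CSV paths."""
    if isinstance(path, (list, tuple)):
        files = [os.fspath(p) for p in path]                # explicit file list
    else:
        path = os.fspath(path)
        if os.path.isdir(path):
            files = glob.glob(os.path.join(path, "*.csv"))  # directory
        elif os.path.isfile(path):
            files = [path]                                  # single file
        else:
            files = glob.glob(path)                         # wildcard pattern
    if not files:
        raise ValueError(f"No CSV files found for {path!r}")
    return sorted(files)


with tempfile.TemporaryDirectory() as tmp:
    # Create empty placeholder files to resolve against.
    for name in ("chain1.csv", "chain2.csv", "notes.txt"):
        with open(os.path.join(tmp, name), "w"):
            pass
    base = os.path.basename
    from_file = [base(p) for p in resolve_csv_paths(os.path.join(tmp, "chain1.csv"))]
    from_list = [base(p) for p in resolve_csv_paths([os.path.join(tmp, "chain2.csv")])]
    from_glob = [base(p) for p in resolve_csv_paths(os.path.join(tmp, "chain*.csv"))]
    from_dir = [base(p) for p in resolve_csv_paths(tmp)]

print(from_file, from_list, from_glob, from_dir)
```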

The function performs validation to ensure:
  • All specified files exist and are readable

  • Files contain valid Stan CSV output

  • Sampling method is compatible (only ‘sample’ method supported)

  • Configuration is consistent across files

This approach enables efficient processing workflows where sample data is converted to more efficient formats (like NetCDF) without requiring full memory loading of the original CSV files.

Example:
>>> # Load from glob pattern
>>> fit = fit_from_csv_noload('model_output_*.csv')
>>>
>>> # Load from explicit list
>>> fit = fit_from_csv_noload(['chain1.csv', 'chain2.csv'])
>>>
>>> # Use for conversion without memory loading
>>> netcdf_path = cmdstan_csv_to_netcdf(fit, model)