Hamiltonian Monte Carlo Results API Reference¶
Hamiltonian Monte Carlo (HMC) sampling results analysis and diagnostics.
This module provides tools for analyzing and diagnosing HMC sampling results from Stan models. It offers specialized classes and functions for processing MCMC output, conducting diagnostic tests, and creating interactive visualizations for model validation and troubleshooting.
The module centers around the SampleResults class, which extends MLEInferenceRes to provide HMC-specific functionality, including convergence diagnostics, sample quality assessment, and specialized visualization tools for identifying problematic parameters and sampling behavior.
- Key Features:
MCMC diagnostic test suites
Interactive visualization tools for failed diagnostics
Efficient CSV to NetCDF conversion for large datasets
Dask-enabled processing for memory-intensive operations
Specialized trace plot analysis for problematic variables
Automated detection and reporting of sampling issues
- Diagnostic Capabilities:
R-hat convergence assessment
Effective sample size (ESS) evaluation
Energy fraction of missing information (E-BFMI) analysis
Divergence detection and analysis
Tree depth saturation monitoring
Variable-specific failure pattern identification
The module is designed to handle both small-scale interactive analysis and large-scale batch processing of MCMC results, with particular attention to memory efficiency and computational performance for complex models.
The main class that users will interact with is SampleResults. Other classes in this module provide supporting functionality for diagnostics, visualization, and data conversion.
Sample Results Analysis¶
- class scistanpy.model.results.hmc.SampleResults(
- model: 'Model' | None = None,
- fit: str | list[str] | os.PathLike | CmdStanMCMC | None = None,
- data: dict[str, npt.NDArray] | None = None,
- precision: Literal['double', 'single', 'half'] = 'single',
- inference_obj: az.InferenceData | str | None = None,
- mib_per_chunk: custom_types.Integer | None = None,
- use_dask: bool = False,
- )
Bases: MLEInferenceRes
Comprehensive analysis interface for HMC sampling results. This class should never be instantiated directly. Instead, use the from_disk method to load the appropriate results object from disk.
This class extends MLEInferenceRes to provide specialized functionality for analyzing Hamiltonian Monte Carlo sampling results from Stan. It offers comprehensive diagnostic capabilities, interactive visualization tools, and efficient data management for large MCMC datasets.
- Parameters:
model (Optional[Model]) – SciStanPy model used for sampling. Defaults to None.
fit (Optional[Union[str, list[str], os.PathLike, CmdStanMCMC]]) – CmdStanMCMC object or path to CSV files. Defaults to None.
data (Optional[dict[str, npt.NDArray]]) – Observed data dictionary. Defaults to None.
precision (Literal["double", "single", "half"]) – Numerical precision for arrays. Defaults to “single”.
inference_obj (Optional[Union[az.InferenceData, str]]) – Pre-existing InferenceData or NetCDF path. Defaults to None.
mib_per_chunk (Optional[custom_types.Integer]) – Memory limit per chunk in MiB. Defaults to None.
use_dask (bool) – Whether to use Dask for computation. Defaults to False.
- Variables:
fit – CmdStanMCMC object containing sampling metadata
use_dask – Flag controlling Dask usage for computation
- The class provides comprehensive functionality for:
MCMC convergence diagnostics and reporting
Sample quality assessment and visualization
Interactive analysis of problematic variables
Efficient handling of large datasets with Dask integration
Automated detection and reporting of sampling issues
- Key Diagnostic Features:
R-hat convergence assessment
Effective sample size evaluation
Energy-based diagnostics (E-BFMI)
Divergence detection and analysis
Tree depth saturation monitoring
The class automatically handles NetCDF conversion for efficient storage and supports both in-memory and out-of-core computation depending on dataset size and available memory.
- Example:
import scistanpy as ssp

# Get MCMC results
mcmc_results = model.mcmc(data=observed_data, chains=4, iter_sampling=2000)

# Run full diagnostics
diagnostics = mcmc_results.diagnose()

# Posterior predictive check (interactive in notebook)
mcmc_results.run_ppc()

# Evaluate problematic samples (interactive in notebook)
mcmc_results.plot_sample_failure_quantile_traces()

# Evaluate problematic variables (interactive in notebook)
mcmc_results.plot_variable_failure_quantile_traces()
- calculate_diagnostics() → Dataset [source]¶
Shortcut to running calculate_summaries() with kind="diagnostics" and no other arguments.
- Returns:
Dataset containing diagnostic metrics
- Return type:
xr.Dataset
The method is designed as a simple interface for users who only need diagnostic information without summary statistics.
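- Example (a minimal sketch; the metric names follow the diagnostic_varnames default documented under calculate_summaries()):
>>> diag = results.calculate_diagnostics()
>>> print(diag.data_vars)  # e.g., mcse_mean, mcse_sd, ess_bulk, ess_tail, r_hat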
- calculate_summaries(
- var_names: list[str] | None = None,
- filter_vars: Literal[None, 'like', 'regex'] = None,
- kind: Literal['all', 'stats', 'diagnostics'] = 'all',
- round_to: custom_types.Integer = 2,
- circ_var_names: list[str] | None = None,
- stat_focus: str = 'mean',
- stat_funcs: dict[str, callable] | callable | None = None,
- extend: bool = True,
- hdi_prob: custom_types.Float = 0.94,
- skipna: bool = False,
- diagnostic_varnames: Sequence[str] = ('mcse_mean', 'mcse_sd', 'ess_bulk', 'ess_tail', 'r_hat'),
- )
Compute comprehensive summary statistics and diagnostics for MCMC results.
This method extends the parent class functionality to provide HMC-specific diagnostic capabilities, including automatic separation of statistics and diagnostics into appropriate InferenceData groups. See az.summary for more detail on the arguments.
- Parameters:
var_names (Optional[list[str]]) – Variable names to include. Defaults to None (all variables).
filter_vars (Optional[Literal[None, "like", "regex"]]) – Variable filtering method. Defaults to None.
kind (Literal["all", "stats", "diagnostics"]) – Type of computations to perform. Defaults to “all”.
round_to (custom_types.Integer) – Decimal places for rounding. Defaults to 2.
circ_var_names (Optional[list[str]]) – Names of circular variables. Defaults to None.
stat_focus (str) – Primary statistic for focus. Defaults to “mean”.
stat_funcs (Optional[Union[dict[str, callable], callable]]) – Custom statistic functions. Defaults to None.
extend (bool) – Whether to include extended statistics. Defaults to True. Only meaningful if stat_funcs is not None.
hdi_prob (custom_types.Float) – Probability for highest density interval. Defaults to 0.94.
skipna (bool) – Whether to skip NaN values. Defaults to False.
diagnostic_varnames (Sequence[str]) – Names of diagnostic metrics. Defaults to (“mcse_mean”, “mcse_sd”, “ess_bulk”, “ess_tail”, “r_hat”).
- Returns:
Combined dataset with all computed metrics
- Return type:
xr.Dataset
- Enhanced Features:
Automatic Dask acceleration for large datasets
Separation of statistics and diagnostics into appropriate groups
Memory-efficient computation strategies
- The method automatically updates the InferenceData object with new groups:
variable_summary_stats: Basic summary statistics
variable_diagnostic_stats: MCMC diagnostic metrics
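- Example (illustrative; the group names are those listed above):
>>> summaries = results.calculate_summaries(kind='all', hdi_prob=0.9)
>>> # Computed groups are attached to the InferenceData object
>>> stats = results.inference_obj.variable_summary_stats
>>> diagnostics = results.inference_obj.variable_diagnostic_stats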
- diagnose(
- max_tree_depth: custom_types.Integer | None = None,
- ebfmi_thresh: custom_types.Float = 0.2,
- r_hat_thresh: custom_types.Float = 1.01,
- ess_thresh: custom_types.Float = 100,
- silent: bool = False,
- )
Runs the complete MCMC diagnostic pipeline. This involves running, in order:
calculate_diagnostics()
evaluate_sample_stats()
evaluate_variable_diagnostic_stats()
identify_failed_diagnostics()
Typically, users will want to use this method rather than calling the individual methods themselves.
- Parameters:
max_tree_depth (Optional[custom_types.Integer]) – Maximum tree depth threshold. Uses model default if None. Defaults to None.
ebfmi_thresh (custom_types.Float) – E-BFMI threshold for energy diagnostics. Defaults to 0.2.
r_hat_thresh (custom_types.Float) – R-hat threshold for convergence assessment. Defaults to 1.01.
ess_thresh (custom_types.Float) – ESS threshold per chain. Defaults to 100.
silent (bool) – Whether to suppress diagnostic output. Defaults to False.
- Returns:
Tuple of (sample_failures, variable_failures) as returned by identify_failed_diagnostics
- Return type:
tuple[custom_types.StrippedTestRes, dict[str, custom_types.StrippedTestRes]]
The method provides comprehensive assessment of MCMC sampling quality, identifying both immediate issues (e.g., divergences, energy problems) and convergence concerns (e.g., R-hat, effective sample size).
All intermediate results are stored in the inference_obj attribute for later access and further analysis.
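- Example (a typical workflow sketch):
>>> sample_failures, variable_failures = results.diagnose(ebfmi_thresh=0.2)
>>> # Intermediate results remain available on the InferenceData object
>>> print(results.inference_obj.groups())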
- evaluate_sample_stats(
- max_tree_depth: custom_types.Integer | None = None,
- ebfmi_thresh: custom_types.Float = 0.2,
- )
Evaluate sample-level diagnostic statistics for MCMC quality assessment.
- Parameters:
max_tree_depth (Optional[custom_types.Integer]) – Maximum tree depth threshold. Uses model default if None. Defaults to None.
ebfmi_thresh (custom_types.Float) – E-BFMI threshold for energy diagnostics. Defaults to 0.2.
- Returns:
Dataset with boolean arrays indicating test failures
- Return type:
xr.Dataset
This method evaluates sample-level diagnostic statistics to identify problematic samples in the MCMC chains. Tests are considered failures when samples exhibit the following characteristics:
Tree Depth: Sample reached maximum tree depth (saturation)
E-BFMI: Energy-based fraction of missing information below threshold
Divergence: Sample diverged during Hamiltonian dynamics
The resulting boolean arrays have True values indicating failed samples and False values indicating successful samples. This information is stored in the 'sample_diagnostic_tests' group of the InferenceData object.
- Example:
>>> sample_tests = results.evaluate_sample_stats(ebfmi_thresh=0.15)
>>> n_diverged = sample_tests.diverged.sum().item()
>>> print(f"Number of divergent samples: {n_diverged}")
- evaluate_variable_diagnostic_stats(
- r_hat_thresh: custom_types.Float = 1.01,
- ess_thresh: custom_types.Integer = 100,
- )
Evaluate variable-level diagnostic statistics for convergence assessment.
- Parameters:
r_hat_thresh (custom_types.Float) – R-hat threshold for convergence. Defaults to 1.01.
ess_thresh (custom_types.Integer) – ESS threshold per chain. Defaults to 100.
- Returns:
Dataset with boolean arrays indicating variable-level test failures
- Return type:
xr.Dataset
- Raises:
ValueError – If variable_diagnostic_stats group doesn’t exist
ValueError – If required metrics are missing
This method evaluates variable-level diagnostic statistics to identify parameters that exhibit poor sampling behavior. Tests are considered failures when variables meet the following criteria:
- Failure Conditions:
R-hat: Split R-hat statistic >= threshold (poor convergence)
ESS Bulk: Bulk effective sample size / n_chains <= threshold per chain
ESS Tail: Tail effective sample size / n_chains <= threshold per chain
Results are stored in the ‘variable_diagnostic_tests’ group with boolean arrays indicating which variables failed which tests.
- Example:
>>> var_tests = results.evaluate_variable_diagnostic_stats(r_hat_thresh=1.02)
>>> failed_convergence = var_tests.sel(metric='r_hat').sum().item()
>>> print(f"Variables with poor convergence: {failed_convergence}")
- classmethod from_disk(
- path: str,
- csv_files: list[str] | str | None = None,
- skip_fit: bool = False,
- use_dask: bool = False,
- )
Load SampleResults from saved NetCDF file with optional CSV metadata.
- Parameters:
path (str) – Path to NetCDF file containing inference data
csv_files (Optional[Union[list[str], str]]) – Paths to CSV files output by Stan. Can also be a glob pattern in place of a list. Defaults to None (auto-detect based on the path value).
skip_fit (bool) – Whether to skip loading CSV metadata. Defaults to False.
use_dask (bool) – Whether to enable Dask for computation. Defaults to False.
- Returns:
Loaded SampleResults object ready for analysis
- Return type:
SampleResults
- Raises:
FileNotFoundError – If the specified NetCDF file doesn’t exist
This class method enables loading of previously saved MCMC results from NetCDF format, with optional access to original CSV metadata for complete functionality.
- Loading Modes:
Full loading: NetCDF + CSV metadata (complete functionality)
NetCDF only: Fast loading without CSV metadata (limited functionality)
Auto-detection: Automatically finds CSV files based on NetCDF path
When use_dask=True, the loaded data supports out-of-core computation for memory-efficient analysis of large datasets. Management of Dask happens internally, so users do not need to be familiar with Dask to take advantage of it.
- Example:
>>> # Load with auto-detected CSV files (CSVs must have the same basename)
>>> results = SampleResults.from_disk('model_results.nc')
>>>
>>> # Load with explicit CSV files
>>> results = SampleResults.from_disk(
...     'results.nc', csv_files=['chain_1.csv', 'chain_2.csv']
... )
>>>
>>> # Fast loading without CSV metadata
>>> results = SampleResults.from_disk('results.nc', skip_fit=True)
- identify_failed_diagnostics(
- silent: bool = False,
- )
Identify and report diagnostic test failures with comprehensive summary.
- Parameters:
silent (bool) – Whether to suppress printed output. Defaults to False.
- Returns:
Tuple of (sample_failures, variable_failures) dictionaries
- Return type:
tuple[custom_types.StrippedTestRes, dict[str, custom_types.StrippedTestRes]]
This method analyzes the results of diagnostic tests and provides both programmatic access to failure information and human-readable summaries. It requires that diagnostic evaluation methods have been run previously.
- Return Structure:
sample_failures: Dictionary mapping test names to arrays of failed sample indices
variable_failures: Dictionary mapping metric names to dictionaries of failed variables
- The method processes test results to extract:
Indices of samples that failed each diagnostic test
Names of variables that failed each diagnostic metric
Summary statistics showing failure rates and percentages
- When not silent, provides detailed reporting including:
Failure counts and percentages for each test type
Variable-specific failure information organized by metric
Clear categorization of sample vs. variable-level issues
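- Example (a minimal sketch; 'diverged' matches the sample test shown above, while 'r_hat' is a hypothetical metric key):
>>> sample_failures, variable_failures = results.identify_failed_diagnostics(silent=True)
>>> print(sample_failures['diverged'])  # indices of divergent samples
>>> print(variable_failures['r_hat'])   # variables failing the R-hat test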
- plot_sample_failure_quantile_traces(
- display: Literal[True],
- width: custom_types.Integer,
- height: custom_types.Integer,
- )
- plot_sample_failure_quantile_traces(
- display: Literal[False],
- width: custom_types.Integer,
- height: custom_types.Integer,
- )
Visualize quantile traces for samples that failed diagnostic tests.
- Parameters:
display (bool) – Whether to return formatted layout for display. Defaults to True.
width (custom_types.Integer) – Width of plots in pixels. Defaults to 600.
height (custom_types.Integer) – Height of plots in pixels. Defaults to 600.
- Returns:
Quantile trace plots in requested format
- Return type:
Union[hv.HoloMap, dict[str, hv.Overlay]]
- Raises:
ValueError – If no samples failed diagnostic tests
This method creates specialized trace plots showing how samples that failed diagnostic tests compare to those that passed. The visualization helps identify systematic patterns in sampling failures.
- Plot Structure:
X-axis: Cumulative fraction of parameters (0 to 1, sorted by typical quantile of failed samples)
Y-axis: Quantiles of failed samples relative to passing samples
Individual traces: Semi-transparent lines for each failed sample
Typical trace: Bold line showing median behavior across failures
Reference line: Diagonal indicating perfect calibration
- The plots reveal:
Whether failures are systematic across parameters
Patterns in how failed samples deviate from typical behavior
The severity and consistency of sampling problems
- Example:
>>> # Display interactive traces
>>> results.plot_sample_failure_quantile_traces()
- plot_variable_failure_quantile_traces(
- *,
- display: Literal[True],
- width: custom_types.Integer,
- height: custom_types.Integer,
- plot_quantiles: bool,
- )
- plot_variable_failure_quantile_traces(
- *,
- display: Literal[False],
- width: custom_types.Integer,
- height: custom_types.Integer,
- plot_quantiles: bool,
- )
Create interactive analyzer for variables that failed diagnostic tests.
- Parameters:
display (bool) – Whether to return display-ready analyzer. Defaults to True.
width (custom_types.Integer) – Width of plots in pixels. Defaults to 800.
height (custom_types.Integer) – Height of plots in pixels. Defaults to 400.
plot_quantiles (bool) – Whether to plot quantiles vs raw values. Defaults to False.
- Returns:
Interactive analyzer or Panel layout
- Return type:
Union[VariableAnalyzer, pn.pane.HoloViews]
This method creates an interactive analysis tool for examining individual variables that failed diagnostic tests. The analyzer provides widgets for selecting specific variables, diagnostic metrics, and array indices.
- Interactive Features:
Variable Selection: Choose from variables that failed any test
Metric Selection: Focus on specific diagnostic failures
Index Selection: Examine individual array elements for multi-dimensional parameters
- The resulting trace plots show:
Sample trajectories across MCMC chains with distinct colors
Quantile analysis relative to parameters that passed tests
Hover information with detailed sample metadata
Chain-specific behavior identification
- This tool is particularly valuable for:
Understanding the nature of convergence problems
Identifying problematic parameter regions
Diagnosing systematic vs. sporadic sampling issues
Planning model reparameterization strategies
- Example:
>>> # Interactive analysis in notebook
>>> analyzer = results.plot_variable_failure_quantile_traces()
>>> analyzer  # Display widget interface
Variable Failure Analyzer¶
Users will not typically instantiate this class directly. It is the return type of plot_variable_failure_quantile_traces() and provides the interactive analysis interface.
- class scistanpy.model.results.hmc.VariableAnalyzer(
- sample_results: SampleResults,
- plot_width: custom_types.Integer = 800,
- plot_height: custom_types.Integer = 400,
- plot_quantiles: bool = False,
- )
Bases: object
Interactive analysis tool for variables that fail MCMC diagnostic tests.
This class provides an interactive interface for analyzing individual variables that have failed diagnostic tests during MCMC sampling. It creates a dashboard with widgets for selecting variables, metrics, and specific array indices, along with trace plots showing the problematic sampling behavior.
- Parameters:
sample_results (SampleResults) – SampleResults object containing MCMC diagnostics
plot_width (custom_types.Integer) – Width of plots in pixels. Defaults to 800.
plot_height (custom_types.Integer) – Height of plots in pixels. Defaults to 400.
plot_quantiles (bool) – Whether to plot quantiles vs raw values. Defaults to False.
- Variables:
sample_results – Reference to source sampling results
plot_quantiles – Flag controlling plot content type
n_chains – Number of MCMC chains in the results
x – Array of step indices for x-axis
failed_vars – Dictionary mapping variable names to failure information
varchoice – Widget for selecting variables to analyze
metricchoice – Widget for selecting diagnostic metrics
indexchoice – Widget for selecting array indices
plot_width – Recorded width of plots
plot_height – Recorded height of plots
fig – HoloViews pane containing the current plot
layout – Panel layout containing all interface elements
The analyzer automatically identifies variables that have failed diagnostic tests and organizes them by failure type. It provides trace plots that can show either raw parameter values or their quantiles relative to passing samples, helping identify the nature of sampling problems.
- Key Features:
Automatic identification of failed variables and metrics
Interactive widget-based navigation
Trace plots with chain-specific coloring
Quantile-based analysis for identifying sampling bias
Real-time plot updates based on widget selections
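- Example (illustrative sketch; the widget attributes are those listed above, and 'beta' is a hypothetical variable name):
>>> analyzer = results.plot_variable_failure_quantile_traces(display=False)
>>> analyzer.varchoice.value = 'beta'  # select a failed variable
>>> analyzer.layout                    # render the dashboard in a notebook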
Note
This class should not be instantiated directly. Use the plot_variable_failure_quantile_traces() method of SampleResults instead.
CSV to NetCDF Conversion¶
Stan results are output in CSV format, which is quite inefficient for large datasets. The following utilities are responsible for converting these CSV files into the more efficient NetCDF file format. Once in NetCDF format, it is easy to manipulate samples using packages such as xarray, dask, and arviz.
- scistanpy.model.results.hmc.cmdstan_csv_to_netcdf(
- path: str | list[str] | os.PathLike | CmdStanMCMC,
- model: Model,
- data: dict[str, Any] | None = None,
- output_filename: str | None = None,
- precision: Literal['double', 'single', 'half'] = 'single',
- mib_per_chunk: custom_types.Integer | None = None,
- )
Convert CmdStan CSV output to NetCDF format.
This function provides a high-level interface for converting CmdStan sampling results from CSV format to NetCDF, enabling efficient storage and processing of large MCMC datasets.
- Parameters:
path (Union[str, list[str], os.PathLike, CmdStanMCMC]) – Path to CSV files or CmdStanMCMC object
model (Model) – SciStanPy model used for sampling
data (Optional[dict[str, Any]]) – Observed data dictionary. Uses model default if None. Defaults to None.
output_filename (Optional[str]) – Output NetCDF filename. Auto-generated if None. Defaults to None.
precision (Literal["double", "single", "half"]) – Numerical precision for stored arrays. Defaults to “single”.
mib_per_chunk (Optional[custom_types.Integer]) – Memory limit per chunk in MiB. Defaults to None, meaning use Dask default.
- Returns:
Path to created NetCDF file
- Return type:
str
- The conversion process:
Analyzes model structure to determine optimal storage layout
Creates NetCDF file with appropriate groups and dimensions
Converts CSV data with proper chunking for memory efficiency
Organizes results into ArviZ-compatible structure
- Benefits of NetCDF format:
Significantly faster loading compared to CSV
Memory-efficient access with chunking support
Metadata preservation and self-describing format
Integration with scientific Python ecosystem
- Example:
>>> netcdf_path = cmdstan_csv_to_netcdf(
...     'model_output*.csv', model, precision='single'
... )
>>> results = SampleResults.from_disk(netcdf_path)
- class scistanpy.model.results.hmc.CmdStanMCMCToNetCDFConverter(
- fit: CmdStanMCMC | str | list[str] | os.PathLike,
- model: Model,
- data: dict[str, Any] | None = None,
- )
Bases: object
Object responsible for converting CmdStan CSV output to NetCDF format. This class is used internally by the cmdstan_csv_to_netcdf() function and should not be instantiated directly in most use cases.
This class handles the conversion of CmdStan CSV output files to NetCDF format, providing efficient storage and access for large MCMC datasets. It properly organizes data into appropriate groups and handles dimension naming and chunking strategies.
- Parameters:
fit (Union[CmdStanMCMC, str, list[str], os.PathLike]) – CmdStanMCMC object or path to CSV files
model (Model) – SciStanPy model object for metadata extraction
data (Optional[dict[str, Any]]) – Optional observed data dictionary. Defaults to None.
- Variables:
fit – CmdStanMCMC object containing sampling results
model – Reference to the original SciStanPy model
data – Observed data used for model fitting
config – Configuration dictionary from Stan sampling
num_draws – Total number of draws including warmup if saved
varname_to_column_order – Mapping from variable names to CSV column indices
- The converter handles:
Automatic detection of variable types and dimensions
Proper NetCDF group organization
Chunking strategies for large datasets
Data type optimization based on precision requirements
- write_netcdf(
- filename: str | None = None,
- precision: Literal['double', 'single', 'half'] = 'single',
- mib_per_chunk: custom_types.Integer | None = None,
- )
Write the converted data to NetCDF format.
- Parameters:
filename (Optional[str]) – Output filename. Auto-generated if None. Defaults to None.
precision (Literal["double", "single", "half"]) – Numerical precision for arrays. Defaults to “single”.
mib_per_chunk (Optional[custom_types.Integer]) – Memory limit per chunk in MiB. Defaults to None, meaning use Dask default.
- Returns:
Path to the created NetCDF file
- Return type:
str
- This method orchestrates the complete conversion process:
Creates NetCDF file with appropriate structure
Sets up dimensions based on model and data characteristics
Creates variables with optimal chunking strategies
Populates data from CSV files with progress tracking
The resulting NetCDF file contains properly organized groups for posterior samples, posterior predictive samples, sample statistics, and observed data.
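- Example (internal usage sketch; most users should call cmdstan_csv_to_netcdf() instead, and the glob pattern and data dictionary are illustrative):
>>> converter = CmdStanMCMCToNetCDFConverter('model_output*.csv', model, data=observed_data)
>>> netcdf_path = converter.write_netcdf(precision='single', mib_per_chunk=512)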
Utility Functions¶
The following utility functions are used internally by the other classes and functions in this module and will not typically be called directly by users.
- scistanpy.model.results.hmc.dask_enabled_summary_stats(
- inference_obj: InferenceData,
- )
Compute summary statistics using Dask for memory efficiency. This is used inside the SampleResults.calculate_summaries() method when Dask is enabled.
- Parameters:
inference_obj (az.InferenceData) – ArviZ InferenceData object containing posterior samples
- Returns:
Dataset containing computed summary statistics
- Return type:
xr.Dataset
This function computes basic summary statistics (mean, standard deviation, and highest density intervals) using Dask for memory-efficient computation on large datasets that might not fit in memory.
- The function leverages Dask’s lazy evaluation to:
Queue multiple computations for efficient execution
Minimize memory usage through chunked processing
Provide progress tracking for long-running computations
- Computed Statistics:
Mean across chains and draws
Standard deviation across chains and draws
94% highest density intervals
- Example:
>>> stats = dask_enabled_summary_stats(inference_data)
>>> print(stats.sel(metric='mean'))
- scistanpy.model.results.hmc.dask_enabled_diagnostics(
- inference_obj: InferenceData,
- )
Compute MCMC diagnostics using Dask for memory efficiency. This is used inside the SampleResults.calculate_summaries() method when Dask is enabled.
- Parameters:
inference_obj (az.InferenceData) – ArviZ InferenceData object containing posterior samples
- Returns:
Dataset containing computed diagnostic metrics
- Return type:
xr.Dataset
This function computes comprehensive MCMC diagnostic metrics using Dask for memory-efficient computation on large datasets. All diagnostics are computed simultaneously to maximize efficiency.
- Computed Diagnostics:
Monte Carlo standard errors (mean and sd methods)
Effective sample sizes (bulk and tail)
R-hat convergence diagnostic
- The Dask implementation enables:
Parallel computation across available cores
Memory-efficient processing of large datasets
Automatic load balancing and optimization
- Example:
>>> diagnostics = dask_enabled_diagnostics(inference_data)
>>> print(diagnostics.sel(metric='r_hat'))
- scistanpy.model.results.hmc.fit_from_csv_noload(path: str | list[str] | PathLike) → CmdStanMCMC [source]¶
Create CmdStanMCMC object from CSV files without loading data into memory. This function is adapted from cmdstanpy.from_csv.
- Parameters:
path (Union[str, list[str], os.PathLike]) – Path specification for CSV files (single file, list, or glob pattern)
- Returns:
CmdStanMCMC object with metadata but no loaded sample data
- Return type:
CmdStanMCMC
- Raises:
ValueError – If path specification is invalid or no CSV files found
ValueError – If CSV files are not valid Stan output
This function provides a memory-efficient way to create CmdStanMCMC objects by parsing only the metadata from CSV files without loading the actual sample data. This is particularly useful for large datasets where memory usage is a concern.
- Path Specifications:
Single file: Direct path to one CSV file
File list: List of paths to multiple CSV files
Glob pattern: Wildcard pattern for automatic file discovery
Directory: Directory containing CSV files (loads all .csv files)
- The function performs validation to ensure:
All specified files exist and are readable
Files contain valid Stan CSV output
Sampling method is compatible (only ‘sample’ method supported)
Configuration is consistent across files
This approach enables efficient processing workflows where sample data is converted to more efficient formats (like NetCDF) without requiring full memory loading of the original CSV files.
- Example:
>>> # Load from glob pattern
>>> fit = fit_from_csv_noload('model_output_*.csv')
>>>
>>> # Load from explicit list
>>> fit = fit_from_csv_noload(['chain1.csv', 'chain2.csv'])
>>>
>>> # Use for conversion without memory loading
>>> netcdf_path = cmdstan_csv_to_netcdf(fit, model)