Utils API Reference¶
Utility functions and classes for the SciStanPy package.
This module provides various utility functions and classes that support the core functionality of SciStanPy, including:
Lazy importing mechanisms for performance optimization
Mathematical utility functions for numerical stability
Array chunking utilities for efficient memory management
Context managers for external library integration
Optimized statistical computation functions
Users will not typically need to interact with this module directly; it is designed for internal use by SciStanPy.
Lazy Import System¶
To speed up initial import times, SciStanPy provides a lazy import system that defers loading of optional dependencies until they are actually needed.
- scistanpy.utils.lazy_import(name: str)[source]¶
Import a module only when it is first needed.
This function implements lazy module importing to improve package import performance by deferring module loading until actual use.
- Parameters:
name (str) – The fully qualified module name to import
- Returns:
The imported module
- Return type:
module
- Raises:
ImportError – If the specified module cannot be found
- Example:
>>> # Module is not loaded until first use
>>> numpy_module = lazy_import('numpy')
>>> # Now numpy is actually imported
>>> array = numpy_module.array([1, 2, 3])
Note
If the module is already imported, returns the cached version from sys.modules for efficiency.
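The caching and deferred-loading behavior described above can be sketched with the standard library's importlib.util.LazyLoader. This is a hypothetical illustration of the pattern, not SciStanPy's actual implementation:

```python
import importlib.util
import sys


def lazy_import(name: str):
    """Return a module, deferring its execution until first attribute access."""
    # Reuse the cached module if it has already been imported.
    if name in sys.modules:
        return sys.modules[name]

    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError(f"Module {name!r} cannot be found")

    # LazyLoader wraps the real loader and postpones executing the module
    # until the first attribute access.
    spec.loader = importlib.util.LazyLoader(spec.loader)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module
```

With this pattern, `lazy_import('numpy')` returns immediately; the real import cost is paid on the first attribute access such as `numpy_module.array`.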
- class scistanpy.utils.LazyObjectProxy(module_name: str, obj_name: str)[source]¶
Bases:
object
A proxy that delays importing a module and accessing an object until first use.
This class provides a lazy loading mechanism for specific objects within modules, allowing fine-grained control over when imports occur. The proxy forwards all method calls and attribute access to the actual object once it’s loaded.
- Parameters:
module_name (str) – The fully qualified name of the module containing the object
obj_name (str) – The name of the object to import from the module
- Variables:
_module_name – Stored module name for lazy loading
_obj_name – Stored object name for lazy loading
_cached_obj – Cached reference to the imported object (None until first use)
- Example:
>>> # Create a proxy for numpy.array
>>> array_proxy = LazyObjectProxy('numpy', 'array')
>>> # numpy is not imported yet
>>> my_array = array_proxy([1, 2, 3])  # Now numpy is imported
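The proxy mechanism can be illustrated with a minimal sketch. This is hypothetical and shows only call and attribute forwarding; the real class forwards further special methods:

```python
import importlib


class LazyObjectProxy:
    """Delay importing module_name and fetching obj_name until first use."""

    def __init__(self, module_name: str, obj_name: str):
        self._module_name = module_name
        self._obj_name = obj_name
        self._cached_obj = None  # None until first use

    def _load(self):
        # Import the module and cache the target object on first access.
        if self._cached_obj is None:
            module = importlib.import_module(self._module_name)
            self._cached_obj = getattr(module, self._obj_name)
        return self._cached_obj

    def __call__(self, *args, **kwargs):
        # Forward calls to the (now loaded) target object.
        return self._load()(*args, **kwargs)

    def __getattr__(self, name):
        # Forward attribute access to the target object.
        return getattr(self._load(), name)
```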
- scistanpy.utils.lazy_import_from(module_name: str, obj_name: str)[source]¶
Create a lazy import proxy for a specific object from a module.
This function provides a convenient way to create lazy import proxies, equivalent to
from module_name import obj_name
but with deferred loading.
- Parameters:
module_name (str) – The fully qualified module name to import from
obj_name (str) – The name of the object to import from the module
- Returns:
A proxy that will import and return the object when first accessed
- Return type:
LazyObjectProxy
- Example:
>>> # Equivalent to 'from numpy import array' but lazy
>>> array = lazy_import_from('numpy', 'array')
>>> my_array = array([1, 2, 3])  # numpy imported here
Backend Selection¶
Many SciStanPy operations can be performed using either NumPy or PyTorch as the underlying numerical backend. The utility function below automates the selection of the appropriate backend based on the input data type.
- scistanpy.utils.choose_module(dist: torch.Tensor | 'custom_types.SampleType') ModuleType [source]¶
Choose the appropriate computational module based on input type.
This function provides automatic backend selection between NumPy and PyTorch based on the type of the input data.
- Parameters:
dist (Union[torch.Tensor, np.ndarray, custom_types.SampleType]) – Input data whose type determines the module choice
- Returns:
The appropriate module (torch for tensors, numpy for arrays)
- Return type:
Union[torch, np]
- Raises:
TypeError – If the input type is not supported
- Example:
>>> import torch
>>> tensor = torch.tensor([1.0, 2.0])
>>> module = choose_module(tensor)  # Returns torch module
>>> result = module.exp(tensor)
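The dispatch logic might look like the following sketch. It is hypothetical: the set of accepted scalar types is an assumption about what custom_types.SampleType covers, not SciStanPy's actual code:

```python
import numpy as np


def choose_module(dist):
    """Return torch for tensors, numpy for arrays and plain scalars."""
    # Import torch lazily so NumPy-only workflows avoid the import cost.
    try:
        import torch

        if isinstance(dist, torch.Tensor):
            return torch
    except ImportError:
        pass
    if isinstance(dist, (np.ndarray, np.floating, np.integer, float, int)):
        return np
    raise TypeError(f"Unsupported input type: {type(dist).__name__}")
```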
Numerical Stability¶
In probabilistic computations, numerical stability is often a concern. The following utility function provides a numerically stable implementation of the sigmoid function.
- scistanpy.utils.stable_sigmoid(exponent: ndarray[tuple[int, ...], dtype[floating]]) ndarray[tuple[int, ...], dtype[floating]][source]¶
- scistanpy.utils.stable_sigmoid(exponent: torch.Tensor) torch.Tensor
Compute sigmoid function in a numerically stable way.
This function implements a numerically stable version of the sigmoid function that avoids overflow issues by using different computational approaches for positive and negative inputs.
- Parameters:
exponent (Union[torch.Tensor, npt.NDArray[np.floating]]) – Input values for sigmoid computation
- Returns:
Sigmoid values with the same type and shape as input
- Return type:
Union[torch.Tensor, npt.NDArray[np.floating]]
The function uses the identity:
\[\begin{split}\sigma(x) = \begin{cases} \frac{1}{1 + e^{-x}} & \text{if } x \geq 0 \\ \frac{e^{x}}{1 + e^{x}} & \text{if } x < 0 \end{cases}\end{split}\]
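For the NumPy path, the piecewise identity translates directly into code; a minimal sketch:

```python
import numpy as np


def stable_sigmoid(exponent: np.ndarray) -> np.ndarray:
    """Numerically stable sigmoid using the piecewise formulation."""
    out = np.empty_like(exponent, dtype=float)
    pos = exponent >= 0
    # For x >= 0, exp(-x) <= 1, so 1 / (1 + exp(-x)) cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-exponent[pos]))
    # For x < 0, exp(x) < 1, so exp(x) / (1 + exp(x)) cannot overflow.
    exp_x = np.exp(exponent[~pos])
    out[~pos] = exp_x / (1.0 + exp_x)
    return out
```

A naive `1 / (1 + np.exp(-x))` overflows for large negative `x`; the split ensures `exp` is never evaluated on a large positive argument.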
Dask Integration¶
For particularly large models, sampling via Stan can yield more data than can fit in memory. To handle such cases, SciStanPy integrates Dask to enable out-of-core computation and parallel processing, most notably with the SampleResults
class. The following utility functions assist with Dask integration.
- scistanpy.utils.get_chunk_shape(
- array_shape: tuple[custom_types.Integer, ...],
- array_precision: Literal['double', 'single', 'half'],
- mib_per_chunk: custom_types.Integer | None = None,
- frozen_dims: Collection[custom_types.Integer] = (),
- ) tuple[custom_types.Integer, ...][source]¶
Calculate optimal chunk shape for Dask arrays based on memory constraints.
This function determines the optimal chunking strategy for large arrays processed with Dask, balancing memory usage with computational efficiency. It respects frozen dimensions that should not be chunked.
- Parameters:
array_shape (tuple[custom_types.Integer, ...]) – Shape of the array to be chunked
array_precision (Literal["double", "single", "half"]) – Numerical precision assumed in calculating memory usage.
mib_per_chunk (Union[custom_types.Integer, None]) – Target chunk size in MiB. If None, uses Dask default
frozen_dims (Collection[custom_types.Integer]) – Dimensions that should not be chunked
- Returns:
Optimal chunk shape for the array
- Return type:
tuple[custom_types.Integer, …]
- Raises:
ValueError – If mib_per_chunk is negative
IndexError – If frozen_dims contains invalid dimension indices
- The algorithm:
Calculates memory usage per array element based on precision
Sets frozen dimensions to their full size
Iteratively determines chunk sizes for remaining dimensions
Ensures total chunk memory stays within the specified limit (or as close to it as possible if frozen dimensions result in a smallest possible size above the limit)
- Example:
>>> # Chunk a (1000, 2000, 100) array, keeping last dim intact
>>> shape = get_chunk_shape(
...     (1000, 2000, 100), "double",
...     mib_per_chunk=64, frozen_dims=(2,)
... )
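The algorithm above can be sketched as follows. This is hypothetical: where the real function defers to Dask's configured default chunk size when mib_per_chunk is None, the sketch substitutes a fixed 128 MiB assumption:

```python
import math

# Bytes per element for each supported precision.
_BYTES_PER_ELEMENT = {"double": 8, "single": 4, "half": 2}


def get_chunk_shape(array_shape, array_precision, mib_per_chunk=None, frozen_dims=()):
    """Shrink non-frozen dimensions until a chunk fits the memory budget."""
    if mib_per_chunk is None:
        mib_per_chunk = 128  # stand-in for Dask's configured default
    if mib_per_chunk < 0:
        raise ValueError("mib_per_chunk must be non-negative")
    if any(not -len(array_shape) <= d < len(array_shape) for d in frozen_dims):
        raise IndexError("frozen_dims contains invalid dimension indices")

    # Memory budget expressed as a number of array elements.
    budget = mib_per_chunk * 2**20 // _BYTES_PER_ELEMENT[array_precision]
    frozen = {d % len(array_shape) for d in frozen_dims}

    chunk = list(array_shape)
    for dim in range(len(chunk)):
        if dim in frozen:
            continue  # frozen dimensions keep their full size
        # Largest size for this dimension that keeps the chunk within budget,
        # given the sizes already fixed for the other dimensions.
        others = math.prod(c for i, c in enumerate(chunk) if i != dim)
        chunk[dim] = max(1, min(chunk[dim], budget // max(1, others)))
    return tuple(chunk)
```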
- class scistanpy.utils.az_dask(dask_type: str = 'parallelized', output_dtypes: list[object] | None = None)[source]¶
Bases:
object
Context manager for enabling Dask integration with ArviZ.
This context manager provides a convenient way to enable Dask-based parallel computation within ArviZ operations, automatically handling the setup and teardown of Dask configuration.
- Parameters:
dask_type (str) – Type of Dask computation to enable
output_dtypes (Union[list[object], None]) – Expected output data types for Dask operations
- Variables:
dask_type – Stored Dask computation type
output_dtypes – Stored output data types configuration
- Example:
>>> with az_dask() as dask_ctx:
...     # ArviZ operations here will use Dask parallelization
...     result = az.summary(trace_data)
Note
The context manager automatically disables Dask when exiting, ensuring clean state management.
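A minimal sketch of such a context manager, assuming ArviZ's arviz.utils.Dask switch (enable_dask/disable_dask); treat the exact ArviZ call as an assumption rather than SciStanPy's verbatim code:

```python
class az_dask:
    """Enable ArviZ's Dask support on entry and disable it on exit."""

    def __init__(self, dask_type: str = "parallelized", output_dtypes=None):
        self.dask_type = dask_type
        self.output_dtypes = output_dtypes

    def __enter__(self):
        # ArviZ exposes a module-level switch for Dask-backed computation
        # (assumed API: arviz.utils.Dask).
        from arviz.utils import Dask

        Dask.enable_dask(
            dask_kwargs={"dask": self.dask_type, "output_dtypes": self.output_dtypes}
        )
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        from arviz.utils import Dask

        # Always restore the non-Dask state, even if the body raised.
        Dask.disable_dask()
        return False
```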