Utils API Reference

Utility functions and classes for the SciStanPy package.

This module provides various utility functions and classes that support the core functionality of SciStanPy, including:

  • Lazy importing mechanisms for performance optimization

  • Mathematical utility functions for numerical stability

  • Array chunking utilities for efficient memory management

  • Context managers for external library integration

  • Optimized statistical computation functions

Users will not typically need to interact with this module directly; it is designed for internal use by SciStanPy.

Lazy Import System

To speed up initial import times, SciStanPy provides a lazy import system that defers loading of optional dependencies until they are actually needed.

scistanpy.utils.lazy_import(name: str)[source]

Import a module only when it is first needed.

This function implements lazy module importing to improve package import performance by deferring module loading until actual use.

Parameters:

name (str) – The fully qualified module name to import

Returns:

The imported module

Return type:

module

Raises:

ImportError – If the specified module cannot be found

Example:
>>> # Module is not loaded until first use
>>> numpy_module = lazy_import('numpy')
>>> # Now numpy is actually imported
>>> array = numpy_module.array([1, 2, 3])

Note

If the module is already imported, returns the cached version from sys.modules for efficiency.
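One standard way to implement this kind of deferred loading is the standard library's ``importlib.util.LazyLoader`` recipe. The sketch below is illustrative only and is not SciStanPy's actual implementation; the helper name ``lazy_import_sketch`` is hypothetical:

```python
import importlib.util
import sys


def lazy_import_sketch(name: str):
    """Hypothetical sketch: return a module whose execution is deferred."""
    if name in sys.modules:
        # Already imported: return the cached module, as the note above describes.
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError(f"Cannot find module {name!r}")
    # LazyLoader postpones executing the module until an attribute is accessed.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module


json_mod = lazy_import_sketch("json")
```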

class scistanpy.utils.LazyObjectProxy(module_name: str, obj_name: str)[source]

Bases: object

A proxy that delays importing a module and accessing an object until first use.

This class provides a lazy loading mechanism for specific objects within modules, allowing fine-grained control over when imports occur. The proxy forwards all method calls and attribute access to the actual object once it’s loaded.

Parameters:
  • module_name (str) – The fully qualified name of the module containing the object

  • obj_name (str) – The name of the object to import from the module

Variables:
  • _module_name – Stored module name for lazy loading

  • _obj_name – Stored object name for lazy loading

  • _cached_obj – Cached reference to the imported object (None until first use)

Example:
>>> # Create a proxy for numpy.array
>>> array_proxy = LazyObjectProxy('numpy', 'array')
>>> # numpy is not imported yet
>>> my_array = array_proxy([1, 2, 3])  # Now numpy is imported
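The forwarding behavior described above can be sketched with ``__call__`` and ``__getattr__``. This is a simplified, hypothetical reimplementation (``ObjectProxySketch``), not SciStanPy's code:

```python
import importlib


class ObjectProxySketch:
    """Hypothetical sketch of a proxy that defers import until first use."""

    def __init__(self, module_name: str, obj_name: str):
        self._module_name = module_name
        self._obj_name = obj_name
        self._cached_obj = None  # stays None until first use

    def _load(self):
        # Import the module and resolve the object on first access, then cache it.
        if self._cached_obj is None:
            module = importlib.import_module(self._module_name)
            self._cached_obj = getattr(module, self._obj_name)
        return self._cached_obj

    def __call__(self, *args, **kwargs):
        # Forward calls to the real object, importing it if necessary.
        return self._load()(*args, **kwargs)

    def __getattr__(self, name):
        # Forward attribute access to the real object.
        return getattr(self._load(), name)


sqrt_proxy = ObjectProxySketch("math", "sqrt")
result = sqrt_proxy(16.0)  # math is imported here, on first call
```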

scistanpy.utils.lazy_import_from(module_name: str, obj_name: str)[source]

Create a lazy import proxy for a specific object from a module.

This function provides a convenient way to create lazy import proxies, equivalent to from module_name import obj_name but with deferred loading.

Parameters:
  • module_name (str) – The fully qualified module name to import from

  • obj_name (str) – The name of the object to import from the module

Returns:

A proxy that will import and return the object when first accessed

Return type:

LazyObjectProxy

Example:
>>> # Equivalent to 'from numpy import array' but lazy
>>> array = lazy_import_from('numpy', 'array')
>>> my_array = array([1, 2, 3])  # numpy imported here

Backend Selection

Many SciStanPy operations can be performed using either NumPy or PyTorch as the underlying numerical backend. The utility function below automates the selection of the appropriate backend based on the input data type.

scistanpy.utils.choose_module(dist: torch.Tensor | 'custom_types.SampleType') ModuleType[source]

Choose the appropriate computational module based on input type.

This function provides automatic backend selection between NumPy and PyTorch based on the type of the input data.

Parameters:

dist (Union[torch.Tensor, np.ndarray, custom_types.SampleType]) – Input data whose type determines the module choice

Returns:

The appropriate module (torch for tensors, numpy for arrays)

Return type:

Union[torch, np]

Raises:

TypeError – If the input type is not supported

Example:
>>> import torch
>>> tensor = torch.tensor([1.0, 2.0])
>>> module = choose_module(tensor)  # Returns torch module
>>> result = module.exp(tensor)
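The dispatch logic can be illustrated with a reduced sketch that only handles NumPy arrays (the real ``choose_module`` also recognizes PyTorch tensors and SciStanPy sample types); ``choose_module_sketch`` is a hypothetical name:

```python
import numpy as np


def choose_module_sketch(dist):
    """Illustrative type dispatch: return the backend module matching the input."""
    if isinstance(dist, np.ndarray):
        return np
    # Unsupported input types are rejected, mirroring the documented TypeError.
    raise TypeError(f"Unsupported input type: {type(dist).__name__}")


values = np.array([0.0, 1.0])
backend = choose_module_sketch(values)  # returns the numpy module
result = backend.exp(values)
```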

Numerical Stability

In probabilistic computations, numerical stability is a frequent concern. The following utility function provides a numerically stable implementation of the sigmoid function.

scistanpy.utils.stable_sigmoid(
exponent: ndarray[tuple[int, ...], dtype[floating]],
) ndarray[tuple[int, ...], dtype[floating]][source]
scistanpy.utils.stable_sigmoid(exponent: torch.Tensor) torch.Tensor

Compute sigmoid function in a numerically stable way.

This function implements a numerically stable version of the sigmoid function that avoids overflow issues by using different computational approaches for positive and negative inputs.

Parameters:

exponent (Union[torch.Tensor, npt.NDArray[np.floating]]) – Input values for sigmoid computation

Returns:

Sigmoid values with the same type and shape as input

Return type:

Union[torch.Tensor, npt.NDArray[np.floating]]

The function uses the identity:

\[\begin{split}\sigma(x) = \begin{cases} \frac{1}{1 + e^{-x}} & \text{if } x \geq 0 \\ \frac{e^{x}}{1 + e^{x}} & \text{if } x < 0 \end{cases}\end{split}\]
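The piecewise identity above can be sketched directly in NumPy. This is an illustrative reimplementation, not the packaged function; note that both branches only ever exponentiate non-positive values, which is what prevents overflow:

```python
import numpy as np


def stable_sigmoid_sketch(x: np.ndarray) -> np.ndarray:
    """Illustrative stable sigmoid following the piecewise identity above."""
    out = np.empty_like(x, dtype=float)
    pos = x >= 0
    # x >= 0: exp(-x) <= 1, so 1 / (1 + exp(-x)) cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # x < 0: exp(x) < 1, so exp(x) / (1 + exp(x)) cannot overflow.
    out[~pos] = np.exp(x[~pos]) / (1.0 + np.exp(x[~pos]))
    return out


x = np.array([-1000.0, 0.0, 1000.0])
y = stable_sigmoid_sketch(x)  # no overflow warnings at the extremes
```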

Dask Integration

For particularly large models, sampling via Stan can yield more data than can fit in memory. To handle such cases, SciStanPy integrates Dask to enable out-of-core computation and parallel processing, most notably with the SampleResults class. The following utility functions assist with Dask integration.

scistanpy.utils.get_chunk_shape(
array_shape: tuple[custom_types.Integer, ...],
array_precision: Literal['double', 'single', 'half'],
mib_per_chunk: custom_types.Integer | None = None,
frozen_dims: Collection[custom_types.Integer] = (),
) tuple[custom_types.Integer, ...][source]

Calculate optimal chunk shape for Dask arrays based on memory constraints.

This function determines the optimal chunking strategy for large arrays processed with Dask, balancing memory usage with computational efficiency. It respects frozen dimensions that should not be chunked.

Parameters:
  • array_shape (tuple[custom_types.Integer, ...]) – Shape of the array to be chunked

  • array_precision (Literal["double", "single", "half"]) – Numerical precision assumed in calculating memory usage.

  • mib_per_chunk (Union[custom_types.Integer, None]) – Target chunk size in MiB. If None, uses Dask default

  • frozen_dims (Collection[custom_types.Integer]) – Dimensions that should not be chunked

Returns:

Optimal chunk shape for the array

Return type:

tuple[custom_types.Integer, …]

Raises:
  • ValueError – If mib_per_chunk is negative

  • IndexError – If frozen_dims contains invalid dimension indices

The algorithm:
  1. Calculates memory usage per array element based on precision

  2. Sets frozen dimensions to their full size

  3. Iteratively determines chunk sizes for remaining dimensions

  4. Ensures total chunk memory stays within the specified limit (or as close to it as possible if frozen dimensions result in a smallest possible size above the limit)

Example:
>>> # Chunk a (1000, 2000, 100) array, keeping last dim intact
>>> shape = get_chunk_shape(
...     (1000, 2000, 100), "double",
...     mib_per_chunk=64, frozen_dims=(2,)
... )
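The memory arithmetic behind the algorithm can be illustrated with a simplified sketch: double precision costs 8 bytes per element, single 4, and half 2, and the chunk shape is shrunk until it fits the byte budget. This is a hypothetical reimplementation (``chunk_shape_sketch``) that does not reproduce the exact shapes the real function returns and omits the ``mib_per_chunk=None`` default:

```python
import math

BYTES_PER_ELEMENT = {"double": 8, "single": 4, "half": 2}


def chunk_shape_sketch(array_shape, precision, mib_per_chunk, frozen_dims=()):
    """Illustrative chunking: fit free dimensions into the remaining byte budget."""
    itemsize = BYTES_PER_ELEMENT[precision]
    budget = mib_per_chunk * 2**20 // itemsize  # elements allowed per chunk
    chunks = list(array_shape)
    # Frozen dimensions keep their full size and consume budget first.
    frozen_elems = math.prod(array_shape[d] for d in frozen_dims)
    remaining = max(budget // frozen_elems, 1)
    free_dims = [d for d in range(len(array_shape)) if d not in frozen_dims]
    # Shrink trailing free dimensions first, spending the remaining element budget.
    for d in reversed(free_dims):
        chunks[d] = max(min(array_shape[d], remaining), 1)
        remaining = max(remaining // chunks[d], 1)
    return tuple(chunks)


shape = chunk_shape_sketch((1000, 2000, 100), "double", 64, frozen_dims=(2,))
```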

class scistanpy.utils.az_dask(dask_type: str = 'parallelized', output_dtypes: list[object] | None = None)[source]

Bases: object

Context manager for enabling Dask integration with ArviZ.

This context manager provides a convenient way to enable Dask-based parallel computation within ArviZ operations, automatically handling the setup and teardown of Dask configuration.

Parameters:
  • dask_type (str) – Type of Dask computation to enable

  • output_dtypes (Union[list[object], None]) – Expected output data types for Dask operations

Variables:
  • dask_type – Stored Dask computation type

  • output_dtypes – Stored output data types configuration

Example:
>>> with az_dask() as dask_ctx:
...     # ArviZ operations here will use Dask parallelization
...     result = az.summary(trace_data)

Note

The context manager automatically disables Dask when exiting, ensuring clean state management.
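The enable-on-enter, disable-on-exit pattern described above can be illustrated with a generic context manager. The class ``DaskToggle`` and its config dictionary are hypothetical stand-ins for the actual ArviZ configuration calls az_dask makes:

```python
from contextlib import AbstractContextManager


class DaskToggle(AbstractContextManager):
    """Hypothetical stand-in illustrating az_dask's setup/teardown pattern."""

    def __init__(self, config: dict):
        self.config = config  # mutable configuration shared with the caller

    def __enter__(self):
        self.config["dask_enabled"] = True  # setup: enable Dask-style behavior
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.config["dask_enabled"] = False  # teardown: restore clean state
        return False  # do not suppress exceptions


config = {"dask_enabled": False}
with DaskToggle(config):
    inside = config["dask_enabled"]  # True inside the block
after = config["dask_enabled"]  # False after exit, even on error
```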