Transformed Data API Reference

Transformed data components for Stan model generation.

This module provides specialized transformation classes for components that belong in Stan’s transformed data block. These transformations represent deterministic computations that can be performed once at the beginning of Stan program execution, before any sampling begins.

Transformed data components differ from regular transformations in that they:
  • Execute only once per Stan program run

  • Cannot depend on parameters (only on data)

  • Reduce per-iteration computational overhead in Stan models

The module integrates with SciStanPy’s transformation system while providing the specialized behavior required for Stan’s transformed data block. Because these computations run once rather than at every iteration, their cost is removed from the sampling loop, which matters most for models with expensive deterministic computations.

These optimizations are particularly valuable for:
  • Complex likelihood functions with constant terms

  • Expensive matrix operations on fixed data

  • Normalization constants for custom distributions

  • Any deterministic computation independent of parameters

Transformed data components are never directly accessed by users. Instead, they are used internally by certain Parameter subclasses to optimize model performance. For example, the MultinomialLogTheta class automatically adds a LogMultinomialCoefficient transformed data component to pre-compute the multinomial coefficient when the counts are known data.

Base Class

class scistanpy.model.components.transformations.transformed_data.TransformedData(
*,
shape: tuple['custom_types.Integer', ...] | 'custom_types.Integer' = (),
**model_params: custom_types.CombinableParameterType,
)

Bases: Transformation

Abstract base class for Stan transformed data block components.

This class provides the foundation for model components that generate Stan code for the transformed data block. Components in this block represent deterministic computations that are performed once at the beginning of Stan execution, before any parameter sampling begins.

Transformed data components must satisfy strict requirements:
  • Can only depend on data (not parameters)

  • Must be deterministic (no random components)

  • Execute exactly once per Stan program run

  • Cannot be sampled from or optimized

The class inherits from Transformation but disables sampling and PyTorch optimization capabilities since transformed data components represent fixed computations rather than random variables or learnable parameters.

Subclasses must implement:
  • model_varname property for Stan variable naming

  • write_stan_operation method for generating Stan code
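
The subclass contract described above can be sketched with Python's abc module. This is a simplified illustration under stated assumptions, not the SciStanPy implementation: TransformedDataSketch and LogSumOfData are hypothetical stand-ins, and the real base class carries additional behavior inherited from Transformation.

```python
from abc import ABC, abstractmethod


class TransformedDataSketch(ABC):
    """Hypothetical stand-in for the base class; not the SciStanPy API."""

    @property
    @abstractmethod
    def model_varname(self) -> str:
        """Name of the variable written to the transformed data block."""

    @abstractmethod
    def write_stan_operation(self, **to_format: str) -> str:
        """Return the Stan expression for the right-hand side."""

    @property
    def torch_parametrization(self):
        # Transformed data is fixed, so there is nothing to optimize.
        raise NotImplementedError("Transformed data has no parameters.")


class LogSumOfData(TransformedDataSketch):
    """Hypothetical component: the log of the sum of a data vector."""

    def __init__(self, data_name: str):
        self.data_name = data_name

    @property
    def model_varname(self) -> str:
        return f"{self.data_name}.log_sum"

    def write_stan_operation(self, x: str) -> str:
        return f"log(sum({x}))"


component = LogSumOfData("y")
stan_expr = component.write_stan_operation(x="y")
```

Leaving either abstract member unimplemented makes the subclass impossible to instantiate, which is the usual way such a requirement is enforced in Python.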

The class provides performance benefits by:
  • Pre-computing expensive deterministic functions

  • Reducing per-iteration computational overhead

  • Enabling Stan compiler optimizations

  • Avoiding redundant calculations during sampling

get_transformed_data_assignment(
index_opts: tuple[str, ...] | None,
assignment_kwargs: dict | None = None,
right_side_kwargs: dict | None = None,
) -> str

Generate complete transformation assignment code.

Parameters:
  • index_opts (Optional[tuple[str, ...]]) – Index variable names for multi-dimensional operations

  • assignment_kwargs (Optional[dict]) – Keyword arguments for assignment formatting. Defaults to None.

  • right_side_kwargs (Optional[dict]) – Keyword arguments for right-side formatting. Defaults to None.

Returns:

Complete Stan assignment statement

Return type:

str

This method combines the left-hand side variable name with the right-hand side operation to create a complete Stan assignment statement suitable for use in transformed parameters or transformed data blocks.
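
As a rough illustration of how a left-hand name and a right-hand operation combine, consider the sketch below. The helper build_assignment is hypothetical, and its output format (bracketed indices, no trailing semicolon) is an assumption; this reference does not show the exact string that SciStanPy produces.

```python
def build_assignment(varname: str, operation: str, index_opts=None) -> str:
    """Join a left-hand name and right-hand expression into one statement.

    Hypothetical helper: the formatting used by SciStanPy may differ.
    """
    # Append bracketed indices to the left side when index names are given.
    indices = f"[{', '.join(index_opts)}]" if index_opts else ""
    return f"{varname}{indices} = {operation}"


scalar_stmt = build_assignment("log_coef", "lgamma(n + 1)")
indexed_stmt = build_assignment("log_coef", "lgamma(n[i] + 1)", ("i",))
```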

abstract property model_varname: str

Get or generate the SciStanPy variable name for this component. Only valid within a SciStanPy Model.

Returns:

Variable name for this component

Return type:

str

If a variable name has been explicitly assigned, returns that name. Otherwise, automatically generates a name based on child component relationships using dot notation for hierarchical names.

Examples:
class MyModel(Model):
    def __init__(self):
        super().__init__()

        # Parameter without an explicit name gets an auto-generated name
        # based on its child relationship (it is the "mu" input of param2)
        param1 = Parameter(...)  # model_varname is "param2.mu"

        # Explicitly named parameter
        self.param2 = Parameter(mu=param1)  # model_varname is "param2"

property torch_parametrization

Raise error for PyTorch parameterization attempts.

Raises:

NotImplementedError – Always, as transformed data has no parameters

Transformed data components are not learnable parameters and therefore cannot have PyTorch parameterizations. They represent fixed computations based on data rather than optimizable variables.

Concrete Implementations

class scistanpy.model.components.transformations.transformed_data.LogMultinomialCoefficient(counts: parameters.MultinomialLogTheta, **kwargs)

Bases: TransformedData

Pre-computed logarithmic multinomial coefficient for performance optimization.

This class implements a performance optimization for multinomial distributions parameterized by log_theta. When the multinomial is used to model observable data (known counts), the multinomial coefficient can be pre-calculated once rather than computed at each MCMC iteration.

Parameters:
  • counts (parameters.MultinomialLogTheta) – Multinomial parameter with log_theta parameterization

  • kwargs – Additional keyword arguments passed to parent class

Variables:

SHAPE_CHECK – Disabled for this component (False)

Mathematical Background:
The multinomial probability mass function includes a coefficient term, where \(n = \sum_{i=1}^{m} k_i\) is the total count:
\[C(n; k_1, k_2, \ldots, k_m) = \frac{n!}{k_1! \times k_2! \times \cdots \times k_m!}\]

For fixed observed counts, this coefficient is constant across all MCMC iterations and can be pre-computed for efficiency.
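
On the log scale the coefficient is log Γ(n + 1) minus the sum of log Γ(k_i + 1), with n the total count. The sketch below evaluates that expression in Python with math.lgamma and cross-checks it against the factorial definition. It illustrates the mathematics only; the function name is hypothetical and this is not the Stan code that SciStanPy generates.

```python
import math


def log_multinomial_coefficient(counts):
    """Return log(n! / (k_1! * ... * k_m!)) using log-gamma for stability."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in counts)


counts = [2, 3, 5]
log_coef = log_multinomial_coefficient(counts)

# Cross-check against the factorial definition for this small example.
direct = math.factorial(10) / (
    math.factorial(2) * math.factorial(3) * math.factorial(5)
)
```

Working with log-gamma avoids forming the factorials themselves, which overflow floating-point arithmetic for large counts.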

Performance Impact:
  • Eliminates factorial computations from each MCMC iteration

  • Reduces computational overhead for multinomial likelihoods

  • Particularly beneficial for large sample sizes or many categories

  • Can provide substantial speedup for multinomial-heavy models
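
To see where the savings come from, the multinomial log-probability can be split into a constant term that depends only on the counts and a term that depends on log_theta. The Python sketch below mirrors that split under the assumption of fixed counts; the function names and the two parameter draws are hypothetical, and the loop stands in for sampler iterations.

```python
import math


def log_coefficient(counts):
    """Constant term: depends only on the observed counts."""
    return math.lgamma(sum(counts) + 1) - sum(math.lgamma(k + 1) for k in counts)


def multinomial_log_pmf(counts, log_theta, log_coef):
    """Per-iteration term plus the precomputed constant."""
    return log_coef + sum(k * lt for k, lt in zip(counts, log_theta))


counts = [2, 3, 5]
log_coef = log_coefficient(counts)  # computed once for fixed counts

# Two hypothetical parameter draws; only the weighted sum is recomputed.
draws = [[0.2, 0.3, 0.5], [0.1, 0.4, 0.5]]
log_probs = [
    multinomial_log_pmf(counts, [math.log(t) for t in theta], log_coef)
    for theta in draws
]
```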

Usage Requirements:
  • The counts parameter must be observable (represent data)

  • Only applicable to MultinomialLogTheta distributions

  • Automatically managed by MultinomialLogTheta components

The coefficient is automatically included in the transformed data block when appropriate and removed if the parameter becomes non-observable.

SHAPE_CHECK: bool = False

property model_varname: str

Get the model variable name for the multinomial coefficient.

Returns:

Descriptive variable name based on the counts parameter

Return type:

str

The variable name follows the pattern: “{counts_variable_name}.log_multinomial_coefficient”

This provides clear identification of the coefficient’s purpose and its relationship to the associated multinomial parameter.

write_stan_operation(counts: str) -> str

Generate Stan code for computing the log multinomial coefficient.

Parameters:

counts (str) – Stan variable name for the count data

Returns:

Stan function call for log multinomial coefficient

Return type:

str

Raises:

ValueError – If counts parameter is not observable

This method generates the Stan function call to compute the logarithmic multinomial coefficient. The computation is only valid for observable (fixed) count data, as the coefficient must be deterministic.
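
The guard-then-generate behavior described above might look roughly like the following. Everything here is a hypothetical stand-in: CountsStub replaces the real counts parameter, and the Stan function name in the returned string is a placeholder, since this reference does not show the generated call.

```python
from dataclasses import dataclass


@dataclass
class CountsStub:
    """Hypothetical stand-in for a counts parameter."""

    name: str
    observable: bool


def write_log_coefficient_call(counts: CountsStub) -> str:
    """Return a Stan-style call string, refusing non-observable counts."""
    if not counts.observable:
        raise ValueError(
            f"'{counts.name}' must be observable to pre-compute its coefficient."
        )
    # Placeholder function name; the real generated call is not shown here.
    return f"hypothetical_log_multinomial_coeff({counts.name})"


call = write_log_coefficient_call(CountsStub(name="y", observable=True))
```

Raising early keeps a parameter-dependent quantity from being written into a block that Stan evaluates only once.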