Transformed Data API Reference
Transformed data components for Stan model generation.
This module provides specialized transformation classes for components that
belong in Stan’s transformed data block. These transformations represent
deterministic computations that can be performed once at the beginning of
Stan program execution, before any sampling begins.
Transformed data components differ from regular transformations in that they:
- Execute only once per Stan program run
- Cannot depend on parameters (only on data)
- Reduce per-iteration computational overhead in Stan models
The module integrates with SciStanPy’s transformation system while providing
the specialized behavior required for Stan’s transformed data block, enabling
significant performance improvements for models with expensive deterministic
computations.
These optimizations are particularly valuable for:
- Complex likelihood functions with constant terms
- Expensive matrix operations on fixed data
- Normalization constants for custom distributions
- Any deterministic computation independent of parameters
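As a rough illustration of the idea, independent of both SciStanPy and Stan, the plain-Python sketch below separates a data-only constant from the parameter-dependent part of a Gaussian log-likelihood. The names `observations`, `sigma`, `LOG_NORM_CONST`, and `log_likelihood` are hypothetical and exist only for this example. Treating `sigma` as known data is an assumption made so that the constant does not depend on any parameter.

```python
import math

# Fixed data, known before any sampling starts (values invented for the sketch).
observations = [1.2, 0.7, 1.9, 1.4]
sigma = 0.5  # assumed to be known data, not a parameter

# Analogue of a transformed data variable: depends only on data,
# so it is computed once, outside the per-iteration function.
LOG_NORM_CONST = -len(observations) * math.log(sigma * math.sqrt(2.0 * math.pi))


def log_likelihood(mu: float) -> float:
    """Per-iteration work: only the term that depends on the parameter mu."""
    quad = sum((y - mu) ** 2 for y in observations)
    return LOG_NORM_CONST - quad / (2.0 * sigma**2)
```

Each call to `log_likelihood` then evaluates only the quadratic term. The constant plays the role that a transformed data variable would play in a Stan program.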
Transformed data components are never directly accessed by users. Instead, they
are used internally by certain Parameter subclasses to optimize model performance.
For example, the MultinomialLogTheta class automatically adds a
LogMultinomialCoefficient transformed data component to pre-compute
the multinomial coefficient when the counts are known data.
Base Class
class scistanpy.model.components.transformations.transformed_data.TransformedData(*, shape: tuple['custom_types.Integer', ...] | 'custom_types.Integer' = (), **model_params: custom_types.CombinableParameterType)
Bases: Transformation
Abstract base class for Stan transformed data block components.
This class provides the foundation for model components that generate Stan code for the transformed data block. Components in this block represent deterministic computations that are performed once at the beginning of Stan execution, before any parameter sampling begins.
Transformed data components must satisfy strict requirements:
- Can only depend on data (not parameters)
- Must be deterministic (no random components)
- Execute exactly once per Stan program run
- Cannot be sampled from or optimized
The class inherits from Transformation but disables sampling and PyTorch optimization capabilities since transformed data components represent fixed computations rather than random variables or learnable parameters.
Subclasses must implement:
- model_varname property for Stan variable naming
- write_stan_operation method for generating Stan code
The class provides performance benefits by:
- Pre-computing expensive deterministic functions
- Reducing per-iteration computational overhead
- Enabling Stan compiler optimizations
- Avoiding redundant calculations during sampling
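The subclass contract described above can be sketched with standard-library abstract base classes. This is a self-contained illustration, not the SciStanPy implementation: `TransformedDataSketch` and `LogSumSketch` are hypothetical stand-ins, and the real base class takes additional constructor arguments and inherits from Transformation.

```python
from abc import ABC, abstractmethod


class TransformedDataSketch(ABC):
    """Hypothetical stand-in for the base class; not the SciStanPy class."""

    @property
    @abstractmethod
    def model_varname(self) -> str:
        """Name of the Stan variable this component defines."""

    @abstractmethod
    def write_stan_operation(self, **operands: str) -> str:
        """Return the right-hand-side Stan expression as a string."""


class LogSumSketch(TransformedDataSketch):
    """Toy subclass (invented for this sketch): log of the sum of a data vector."""

    def __init__(self, data_name: str) -> None:
        self.data_name = data_name

    @property
    def model_varname(self) -> str:
        return f"{self.data_name}_log_sum"

    def write_stan_operation(self, x: str) -> str:
        # x is the Stan variable name of the data operand.
        return f"log(sum({x}))"
```

Because both members are abstract, instantiating `TransformedDataSketch` directly raises `TypeError`, which mirrors the requirement that subclasses supply both pieces.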
get_transformed_data_assignment(index_opts: tuple[str, ...] | None, assignment_kwargs: dict | None = None, right_side_kwargs: dict | None = None) → str
Generate complete transformation assignment code.
Parameters:
- index_opts (Optional[tuple[str, ...]]) – Index variable names for multi-dimensional operations
- assignment_kwargs (Optional[dict]) – Keyword arguments for assignment formatting. Defaults to None.
- right_side_kwargs (Optional[dict]) – Keyword arguments for right-side formatting. Defaults to None.
Returns: Complete Stan assignment statement
Return type: str
This method combines the left-hand side variable name with the right-hand side operation to create a complete Stan assignment statement suitable for use in transformed parameters or transformed data blocks.
abstract property model_varname: str
Get or generate the SciStanPy variable name for this component. Only valid within a SciStanPy Model.
Returns: Variable name for this component
Return type: str
If a variable name has been explicitly assigned, returns that name. Otherwise, automatically generates a name based on child component relationships using dot notation for hierarchical names.
Examples:
    class MyModel(Model):
        def __init__(self):
            super().__init__()
            # Parameter without an explicit name gets an auto-generated name
            # based on its child relationship.
            param1 = Parameter(...)  # model_varname is "param2.mu"
            # Explicitly named parameter
            self.param2 = Parameter(mu=param1)  # model_varname is "param2"
property torch_parametrization
Raise error for PyTorch parameterization attempts.
Raises:
- NotImplementedError – Always, as transformed data has no parameters
Transformed data components are not learnable parameters and therefore cannot have PyTorch parameterizations. They represent fixed computations based on data rather than optimizable variables.
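A minimal sketch of this always-raising property pattern follows. `FixedComponentSketch` and `has_torch_parametrization` are hypothetical names used only here; the actual error message in SciStanPy may differ.

```python
class FixedComponentSketch:
    """Hypothetical stand-in for a transformed data component."""

    @property
    def torch_parametrization(self):
        # Always raise: a fixed, data-only computation has nothing to optimize.
        raise NotImplementedError(
            "Transformed data components have no PyTorch parametrization."
        )


def has_torch_parametrization(component) -> bool:
    """Return True if accessing the property succeeds, False if it raises."""
    try:
        component.torch_parametrization
    except NotImplementedError:
        return False
    return True
```

Calling code that walks over mixed components can use a guard like `has_torch_parametrization` to skip the fixed ones instead of failing.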
Concrete Implementations
class scistanpy.model.components.transformations.transformed_data.LogMultinomialCoefficient(counts: parameters.MultinomialLogTheta, **kwargs)
Bases: TransformedData
Pre-computed logarithmic multinomial coefficient for performance optimization.
This class implements a performance optimization for multinomial distributions parameterized by log_theta. When the multinomial is used to model observable data (known counts), the multinomial coefficient can be pre-calculated once rather than computed at each MCMC iteration.
Parameters:
- counts (parameters.MultinomialLogTheta) – Multinomial parameter with log_theta parameterization
- kwargs – Additional keyword arguments passed to parent class
Variables:
- SHAPE_CHECK – Disabled for this component (False)
Mathematical Background:
The multinomial probability mass function includes a coefficient term:
\[C(n; k_1, k_2, \ldots, k_m) = \frac{n!}{k_1! \times k_2! \times \cdots \times k_m!}\]
For fixed observed counts, this coefficient is constant across all MCMC iterations and can be pre-computed for efficiency.
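In log space the coefficient is a difference of log-gamma terms: with n equal to the sum of the counts k_i, and using n! = Γ(n + 1), log C = log Γ(n + 1) minus the sum over i of log Γ(k_i + 1). The sketch below evaluates this in plain Python. The function name `log_multinomial_coefficient` is chosen for illustration and is not necessarily how SciStanPy or Stan computes the value.

```python
import math


def log_multinomial_coefficient(counts: list[int]) -> float:
    """Log of n! / (k_1! * ... * k_m!), where n is the sum of the counts."""
    n = sum(counts)
    # lgamma(k + 1) equals log(k!) and avoids overflow for large counts.
    return math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in counts)
```

For counts of 2, 3, and 1 the coefficient is 6! / (2! × 3! × 1!) = 60, so the function returns approximately log(60). Because the result depends only on the observed counts, it needs to be evaluated just once.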
Performance Impact:
- Eliminates factorial computations from each MCMC iteration
- Reduces computational overhead for multinomial likelihoods
- Particularly beneficial for large sample sizes or many categories
- Can provide substantial speedup for multinomial-heavy models
Usage Requirements:
- The counts parameter must be observable (represent data)
- Only applicable to MultinomialLogTheta distributions
- Automatically managed by MultinomialLogTheta components
The coefficient is automatically included in the transformed data block when appropriate and removed if the parameter becomes non-observable.
SHAPE_CHECK: bool = False
property model_varname: str
Get the model variable name for the multinomial coefficient.
Returns: Descriptive variable name based on the counts parameter
Return type: str
The variable name follows the pattern: “{counts_variable_name}.log_multinomial_coefficient”
This provides clear identification of the coefficient’s purpose and its relationship to the associated multinomial parameter.
write_stan_operation(counts: str) → str
Generate Stan code for computing the log multinomial coefficient.
Parameters:
- counts (str) – Stan variable name for the count data
Returns: Stan function call for log multinomial coefficient
Return type: str
Raises:
- ValueError – If counts parameter is not observable
This method generates the Stan function call to compute the logarithmic multinomial coefficient. The computation is only valid for observable (fixed) count data, as the coefficient must be deterministic.
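The observability check can be sketched as follows. Everything here is a hypothetical stand-in: `CountsSketch`, `LogCoefficientSketch`, and the Stan function name `placeholder_log_multinomial_coeff` are invented for illustration, and the source does not state which Stan function is actually emitted or how observability is stored.

```python
from dataclasses import dataclass


@dataclass
class CountsSketch:
    """Hypothetical stand-in for a counts parameter."""

    model_varname: str
    observable: bool


class LogCoefficientSketch:
    """Hypothetical stand-in; not the SciStanPy LogMultinomialCoefficient."""

    def __init__(self, counts: CountsSketch) -> None:
        self.counts = counts

    def write_stan_operation(self, counts: str) -> str:
        # Refuse to emit code when the counts are not fixed data.
        if not self.counts.observable:
            raise ValueError("counts must be observable to pre-compute the coefficient")
        # Placeholder Stan function name, invented for this sketch.
        return f"placeholder_log_multinomial_coeff({counts})"
```

The check runs at code-generation time, so a non-observable counts parameter fails before any Stan program is written.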