fit_partition.Rd
Fit a partition on some data, optionally finding the best lambda using CV and then re-fitting on the full data.
fit_partition( y, X, d = NULL, X_aux = NULL, d_aux = NULL, max_splits = Inf, max_cells = Inf, min_size = 3, cv_folds = 2, verbosity = 0, breaks_per_dim = NULL, potential_lambdas = NULL, X_range = NULL, bucket_min_n = NA, bucket_min_d_var = FALSE, obj_fn, est_plan, partition_i = NA, pr_cl = NULL, bump_samples = 0, bump_ratio = 1, ... )
y | Nx1 matrix of outcome (label/target) data. With multiple core estimates see Details below. |
---|---|
X | NxK matrix of features (covariates). With multiple core estimates see Details below. |
d | (Optional) NxP matrix (with colnames) of treatment data. If all treatments are equally important, they should be normalized to have the same variance. With multiple core estimates see Details below. |
X_aux | Auxiliary X sample on which to compute statistics (OOS data) |
d_aux | Auxiliary d sample on which to compute statistics (OOS data) |
max_splits | Maximum number of splits even if splits continue to improve OOS fit |
max_cells | Maximum number of cells even if more splits continue to improve OOS fit |
min_size | Minimum cell size when building the full grid; cv_tr will use (F-1)/F*min_size, and cv_te does not use any. |
cv_folds | Number of CV Folds or a vector of foldids. If m_mode==DS.MULTI_SAMPLE, then a list with foldids per Dataset. |
verbosity | 0 prints no messages. 1 prints a progress bar for high-level loops. 2 prints detailed output for high-level loops. Nested operations decrease verbosity by 1. |
breaks_per_dim | NULL (for all possible breaks); a K-length vector with the number of breaks per dimension (chosen by quantiles); or a K-length list of vectors giving potential split points for non-categorical variables (can put c(0) for categorical). Similar to 'discrete splitting' in CausalTree, though there separate split points are used for treated and controls. |
potential_lambdas | Potential lambdas to search through in CV |
X_range | list of min/max for each dimension (e.g., from |
bucket_min_n | Minimum number of observations needed between different split checks |
bucket_min_d_var | Ensure positive variance of d for the observations between different split checks |
obj_fn | Default is |
est_plan | |
partition_i | Default NA. Use this to avoid CV |
pr_cl | Default NULL. Parallel cluster. Used for:
|
bump_samples | Number of bump bootstraps (default 0), or a list of that length where each item is a bootstrap sample. If m_mode==DS.MULTI_SAMPLE then each item is a sublist with such bootstrap samples over each dataset. |
bump_ratio | For bootstraps the ratio of sample size to sample (between 0 and 1, default 1) |
... | Additional params. |
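Several of the arguments above take structured values that can be prepared in base R before calling the function. The following is a minimal sketch only: it does not call the package, the variable names are illustrative, and it assumes that fold ids are integers in 1..F, that F in the `min_size` row denotes the number of folds, and that interior deciles are a reasonable choice of candidate split points.

```r
# Base-R sketch; nothing here calls the package itself.
set.seed(1)
N <- 100
K <- 2
X <- matrix(rnorm(N * K), nrow = N, ncol = K)

# cv_folds given as a vector of fold ids (assumed: one integer in 1..F per row).
n_folds <- 5
foldids <- sample(rep(seq_len(n_folds), length.out = N))

# breaks_per_dim given as a K-length list of candidate split points.
# Interior deciles of each column are used here purely for illustration.
breaks_per_dim <- lapply(seq_len(K), function(k) {
  unname(quantile(X[, k], probs = (1:9) / 10))
})

# Minimum cell size stated for cv_tr in the min_size row: (F-1)/F * min_size,
# reading F as the number of folds (an assumption).
min_size <- 10
min_size_cv_tr <- (n_folds - 1) / n_folds * min_size
```

With these values, `foldids` could be supplied as `cv_folds` and `breaks_per_dim` as the argument of the same name.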
An object.
Grid Partition (type=grid_partition)
Full sequence of in-sample objective function values
Full sequence of partition complexities (num_cells - 1)
Index of partition chosen
Full sequence of Grid Partitions
Full sequence of splits (type=partition_split)
lambda chosen
List of the held-out observations for each fold (e.g., we might have generated them)
Returns the partition and information about the fitting process
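To illustrate how the returned sequences and the chosen lambda could relate, here is a hedged sketch. It assumes the chosen partition minimizes the in-sample objective plus lambda times complexity, which is a common cost-complexity rule but is not stated on this page. All numbers and variable names are hypothetical, not output from the package.

```r
# Hypothetical values, not produced by the package.
obj_seq <- c(-1.0, -2.5, -3.2, -3.5, -3.6)  # in-sample objective per partition
complexity_seq <- 0:4                       # num_cells - 1 per partition
lambda <- 0.4                               # penalty per extra cell

# Assumed selection rule: minimize objective + lambda * complexity.
penalized <- obj_seq + lambda * complexity_seq
partition_i <- which.min(penalized)
```

Under this assumed rule, a larger lambda favors coarser partitions, and lambda = 0 would select the partition with the lowest in-sample objective.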
With multiple core estimates (M) there are 3 options (the last two have the same sample across treatment effects).
DS.MULTI_SAMPLE: Multiple pairs of (Y_m,W_m). y,X,d are then lists of length M. Each element then has the typical size. The N_m may differ across m. The number of columns of X will be the same across m.
DS.MULTI_D: Multiple treatments and a single outcome. d is then a NxM matrix.
DS.MULTI_Y: A single treatment and multiple outcomes. y is then a NxM matrix.
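The input shapes for the three options can be sketched in base R as follows. This only builds simulated inputs of the stated dimensions; it does not call the package, the variable and column names are illustrative, and how the mode itself is selected is not shown on this page.

```r
set.seed(2)
N <- 50
K <- 3
M <- 2
X <- matrix(rnorm(N * K), N, K)

# DS.MULTI_D: one outcome, M treatments, so d is N x M (with colnames).
y_multi_d <- matrix(rnorm(N), N, 1)
d_multi_d <- matrix(rbinom(N * M, 1, 0.5), N, M,
                    dimnames = list(NULL, c("d1", "d2")))

# DS.MULTI_Y: one treatment, M outcomes, so y is N x M.
y_multi_y <- matrix(rnorm(N * M), N, M)
d_multi_y <- matrix(rbinom(N, 1, 0.5), N, 1, dimnames = list(NULL, "d1"))

# DS.MULTI_SAMPLE: y, X, d are lists of length M.
# Row counts N_m may differ; the number of X columns (K) is shared.
N_m <- c(40, 60)
X_list <- lapply(N_m, function(n) matrix(rnorm(n * K), n, K))
y_list <- lapply(N_m, function(n) matrix(rnorm(n), n, 1))
d_list <- lapply(N_m, function(n) {
  matrix(rbinom(n, 1, 0.5), n, 1, dimnames = list(NULL, "d1"))
})
```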