Prepares a run before models are trained: creates model workflows, hyperparameter combinations, and train/test splits.

prep_models(
  run_info,
  back_test_scenarios = NULL,
  back_test_spacing = NULL,
  models_to_run = NULL,
  models_not_to_run = NULL,
  run_ensemble_models = TRUE,
  pca = NULL,
  num_hyperparameters = 10,
  seed = 123
)

Arguments

run_info

Run info created by the set_run_info() function.

back_test_scenarios

Number of back test folds to run when determining the best model. Default of NULL automatically chooses the number of back tests based on the size of the historical data, aiming to always use at least 80% of the data when training a model.

back_test_spacing

Number of periods to move back for each back test scenario. Default of NULL moves back 1 period at a time for year, quarter, and month data, 4 periods for week data, and 7 periods for day data.
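
The default spacing described above can be sketched as a simple lookup. The helper name default_back_test_spacing() is hypothetical and not part of the package; it only restates the documented defaults and is not the package's internal code.

```r
# Hypothetical helper (an assumption for illustration, not a package function):
# maps a date type to the default back test spacing documented above.
default_back_test_spacing <- function(date_type) {
  switch(date_type,
    year = 1,
    quarter = 1,
    month = 1,
    week = 4,
    day = 7,
    stop("Unknown date_type: ", date_type) # reached only when nothing matches
  )
}

default_back_test_spacing("month")
default_back_test_spacing("week")
default_back_test_spacing("day")
```

Passing an explicit back_test_spacing value replaces this default.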

models_to_run

List of models to run. Default of NULL runs all models.

models_not_to_run

List of models not to run; overrides values in models_to_run. Default of NULL doesn't turn off any model.
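
A minimal sketch of how the two arguments might combine, assuming the exclusion list is applied after the inclusion list. The helper resolve_models() and the short all_models vector are hypothetical and only illustrate the override rule; the package's full model list is longer.

```r
# Hypothetical helper (not a package function): resolve the final model list.
# models_not_to_run always wins over models_to_run.
resolve_models <- function(all_models,
                           models_to_run = NULL,
                           models_not_to_run = NULL) {
  # NULL for models_to_run means "run everything"
  selected <- if (is.null(models_to_run)) all_models else models_to_run
  # NULL for models_not_to_run removes nothing
  setdiff(selected, models_not_to_run)
}

# Illustrative subset of model names taken from the example on this page
all_models <- c("arima", "ets", "glmnet")

resolve_models(all_models)
resolve_models(all_models, models_to_run = c("arima", "ets"))
resolve_models(all_models,
  models_to_run = c("arima", "ets"),
  models_not_to_run = "ets"
)
```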

run_ensemble_models

If TRUE, prepare for ensemble models.

pca

If TRUE, run principal component analysis (PCA) on any lagged features to speed up model run time. Default of NULL runs PCA on day and week date types across all local multivariate models, and also for global models across all date types.
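
The default behavior for multivariate models can be read as the following decision rule. This is a sketch of the description above under stated assumptions, not the package's implementation; the helper use_pca() and its global_model argument are hypothetical.

```r
# Hypothetical helper (not a package function): decide whether PCA would be
# applied to a multivariate model, following the documented default.
use_pca <- function(pca = NULL, date_type, global_model = FALSE) {
  # An explicit TRUE or FALSE from the user takes precedence
  if (!is.null(pca)) {
    return(isTRUE(pca))
  }
  # Default: always for global models, otherwise only for day and week data
  global_model || date_type %in% c("day", "week")
}

use_pca(date_type = "month")                      # local model, monthly data
use_pca(date_type = "week")                       # local model, weekly data
use_pca(date_type = "month", global_model = TRUE) # global model
use_pca(pca = FALSE, date_type = "day")           # explicit override
```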

num_hyperparameters

Number of hyperparameter combinations to test on validation data for model tuning.

seed

Numeric seed for the random number generator.
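
To illustrate why a fixed seed matters alongside num_hyperparameters, the base R sketch below draws the same random set of hyperparameter combinations twice. The candidate grid, its values, and the helper draw_combos() are made up for illustration; the package may generate its combinations differently.

```r
num_hyperparameters <- 10
seed <- 123

# Illustrative candidate grid (values are assumptions for this sketch only)
grid <- expand.grid(
  penalty = c(0.001, 0.01, 0.1, 1),
  mixture = c(0, 0.25, 0.5, 0.75, 1)
)

# Hypothetical helper: sample n rows of the grid after fixing the seed
draw_combos <- function(grid, n, seed) {
  set.seed(seed)
  grid[sample(nrow(grid), n), ]
}

first_draw <- draw_combos(grid, num_hyperparameters, seed)
second_draw <- draw_combos(grid, num_hyperparameters, seed)

# The same seed gives the same combinations, so a run can be reproduced
identical(first_draw, second_draw)
```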

Value

Writes outputs related to model prep to disk.

Examples

# \donttest{
library(finnts) # assumed package providing set_run_info(), prep_data(), and prep_models()

data_tbl <- timetk::m4_monthly %>%
  dplyr::rename(Date = date) %>%
  dplyr::mutate(id = as.character(id)) %>%
  dplyr::filter(
    Date >= "2012-01-01",
    Date <= "2015-06-01"
  )

run_info <- set_run_info()
#> Finn Submission Info
#>  Experiment Name: finn_fcst
#>  Run Name: finn_fcst-20241029T144922Z
#> 

prep_data(run_info,
  input_data = data_tbl,
  combo_variables = c("id"),
  target_variable = "value",
  date_type = "month",
  forecast_horizon = 3
)
#>  Prepping Data
#>  Prepping Data [3.6s]
#> 

prep_models(run_info,
  models_to_run = c("arima", "ets", "glmnet")
)
#>  Creating Model Workflows
#>  Creating Model Workflows [244ms]
#> 
#>  Creating Model Hyperparameters
#>  Creating Model Hyperparameters [276ms]
#> 
#>  Creating Train Test Splits
#>  Creating Train Test Splits [412ms]
#> 
# }