Forecast Components
The easiest way to use the finnts package is through the function forecast_time_series(), but instead of calling that function you can also call the subcomponents of the Finn forecast process. This lets you break your time series forecast process into separate steps in a production pipeline, or simply gives you more control over how you use Finn.
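For reference, the single-call approach looks roughly like the sketch below, using the run_info and hist_data objects created later in this vignette. The argument names mirror the prep_data() call further down; check ?forecast_time_series for the exact signature in your installed version.

# Rough equivalent of running all Finn subcomponents in one call.
# Argument names mirror the prep_data() call in this vignette; verify them
# against ?forecast_time_series before relying on this exact signature.
finn_output <- forecast_time_series(
  run_info = run_info,
  input_data = hist_data,
  combo_variables = c("id"),
  target_variable = "value",
  date_type = "month",
  forecast_horizon = 6
)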
Below is an example workflow using the subcomponents of Finn.
Let’s get some example data and then set our Finn run info.
library(finnts)
hist_data <- timetk::m4_monthly %>%
  dplyr::filter(
    date >= "2013-01-01",
    id == "M2"
  ) %>%
  dplyr::rename(Date = date) %>%
  dplyr::mutate(id = as.character(id))
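Before handing the data to Finn, it can help to glance at what we are working with; dplyr::glimpse() is one quick way to do that (output omitted here).

# quick structural check of the historical data: one monthly series ("M2")
# with Date, id, and value columns
dplyr::glimpse(hist_data)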
run_info <- set_run_info(
  experiment_name = "finnts_fcst",
  run_name = "finn_sub_component_run"
)
Clean and prepare our data before training models. We can even pull out our prepped data to see the features and transformations applied before models are trained.
prep_data(
  run_info = run_info,
  input_data = hist_data,
  combo_variables = c("id"),
  target_variable = "value",
  date_type = "month",
  forecast_horizon = 6
)
R1_prepped_data_tbl <- get_prepped_data(
  run_info = run_info,
  recipe = "R1"
)
print(R1_prepped_data_tbl)
#> # A tibble: 36 × 66
#> Date Combo id Target Date_index.num Date_diff Date_year Date_half
#> <date> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2013-01-01 M2 M2 260 1356998400 0 2013 1
#> 2 2013-02-01 M2 M2 260 1359676800 2678400 2013 1
#> 3 2013-03-01 M2 M2 -100 1362096000 2419200 2013 1
#> 4 2013-04-01 M2 M2 -10 1364774400 2678400 2013 1
#> 5 2013-05-01 M2 M2 50 1367366400 2592000 2013 1
#> 6 2013-06-01 M2 M2 110 1370044800 2678400 2013 1
#> 7 2013-07-01 M2 M2 130 1372636800 2592000 2013 2
#> 8 2013-08-01 M2 M2 -250 1375315200 2678400 2013 2
#> 9 2013-09-01 M2 M2 -260 1377993600 2678400 2013 2
#> 10 2013-10-01 M2 M2 160 1380585600 2592000 2013 2
#> # ℹ 26 more rows
#> # ℹ 58 more variables: Date_quarter <dbl>, Date_month <dbl>,
#> # Date_month.lbl <chr>, Target_lag6 <dbl>, Target_lag9 <dbl>,
#> # Target_lag12 <dbl>, Target_lag6_roll3_Avg <dbl>,
#> # Target_lag9_roll3_Avg <dbl>, Target_lag12_roll3_Avg <dbl>,
#> # Target_lag6_roll6_Avg <dbl>, Target_lag9_roll6_Avg <dbl>,
#> # Target_lag12_roll6_Avg <dbl>, Target_lag6_roll9_Avg <dbl>, …
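One way to explore the engineered features is to filter the column names directly; for example, the lag-based features created for the R1 recipe can be listed like this:

# list the lag and rolling-average features the R1 recipe added to the data
grep("^Target_lag", colnames(R1_prepped_data_tbl), value = TRUE)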
R2_prepped_data_tbl <- get_prepped_data(
  run_info = run_info,
  recipe = "R2"
)
print(R2_prepped_data_tbl)
#> # A tibble: 216 × 133
#> Date Combo id Target Date_index.num Date_diff Date_year Date_half
#> <date> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2013-01-01 M2 M2 260 1356998400 0 2013 1
#> 2 2013-02-01 M2 M2 260 1359676800 2678400 2013 1
#> 3 2013-03-01 M2 M2 -100 1362096000 2419200 2013 1
#> 4 2013-04-01 M2 M2 -10 1364774400 2678400 2013 1
#> 5 2013-05-01 M2 M2 50 1367366400 2592000 2013 1
#> 6 2013-06-01 M2 M2 110 1370044800 2678400 2013 1
#> 7 2013-07-01 M2 M2 130 1372636800 2592000 2013 2
#> 8 2013-08-01 M2 M2 -250 1375315200 2678400 2013 2
#> 9 2013-09-01 M2 M2 -260 1377993600 2678400 2013 2
#> 10 2013-10-01 M2 M2 160 1380585600 2592000 2013 2
#> # ℹ 206 more rows
#> # ℹ 125 more variables: Date_quarter <dbl>, Date_month <dbl>,
#> # Date_month.lbl <chr>, Horizon <dbl>, Origin <dbl>, Target_lag1 <dbl>,
#> # Target_lag2 <dbl>, Target_lag3 <dbl>, Target_lag4 <dbl>, Target_lag5 <dbl>,
#> # Target_lag6 <dbl>, Target_lag9 <dbl>, Target_lag12 <dbl>,
#> # Target_lag1_roll3_Avg <dbl>, Target_lag2_roll3_Avg <dbl>,
#> # Target_lag3_roll3_Avg <dbl>, Target_lag4_roll3_Avg <dbl>, …
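Notice that the R2 data contains more rows and additional columns (such as Horizon, Origin, and shorter lags like Target_lag1) compared to R1. A quick way to see exactly which columns differ:

# columns present in the R2 recipe data but not in R1 (e.g. Horizon, Origin,
# and the shorter Target_lag1 through Target_lag5 features)
setdiff(colnames(R2_prepped_data_tbl), colnames(R1_prepped_data_tbl))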
Now that our data is prepared for modeling, let's train some models. First we need to create the model workflows, determine our back-testing process, and choose how many hyperparameter combinations to try during the validation process.
Then we can kick off training each model on our data.
prep_models(
  run_info = run_info,
  models_to_run = c("arima", "ets", "glmnet"),
  num_hyperparameters = 2
)
train_models(
  run_info = run_info,
  run_global_models = FALSE
)
After each individual model is trained, we can feed those predictions into ensemble models.
ensemble_models(run_info = run_info)
The last step is to create the final simple model averages and select the best models.
final_models(run_info = run_info)
Finally, we can retrieve the forecast results from this Finn run.
finn_output_tbl <- get_forecast_data(run_info = run_info)
print(finn_output_tbl)
#> # A tibble: 390 × 17
#> Combo id Model_ID Model_Name Model_Type Recipe_ID Run_Type Train_Test_ID
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 M2 M2 arima--lo… NA local simple_a… Future_… 1
#> 2 M2 M2 arima--lo… NA local simple_a… Future_… 1
#> 3 M2 M2 arima--lo… NA local simple_a… Future_… 1
#> 4 M2 M2 arima--lo… NA local simple_a… Future_… 1
#> 5 M2 M2 arima--lo… NA local simple_a… Future_… 1
#> 6 M2 M2 arima--lo… NA local simple_a… Future_… 1
#> 7 M2 M2 arima--lo… NA local simple_a… Back_Te… 2
#> 8 M2 M2 arima--lo… NA local simple_a… Back_Te… 3
#> 9 M2 M2 arima--lo… NA local simple_a… Back_Te… 3
#> 10 M2 M2 arima--lo… NA local simple_a… Back_Te… 4
#> # ℹ 380 more rows
#> # ℹ 9 more variables: Best_Model <chr>, Horizon <dbl>, Date <date>,
#> # Target <dbl>, Forecast <dbl>, lo_95 <dbl>, lo_80 <dbl>, hi_80 <dbl>,
#> # hi_95 <dbl>
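The output contains both back-test and future-forecast rows for every trained model. As a sketch, assuming the truncated values above expand to Run_Type == "Future_Forecast" and the chosen model is flagged with Best_Model == "Yes", pulling out just the final future forecast might look like this:

# keep only the rows for the chosen best model's future forecast
# ("Future_Forecast" and "Yes" are assumptions based on the truncated
# output above; inspect the columns in your own run to confirm the values)
best_future_tbl <- finn_output_tbl %>%
  dplyr::filter(
    Best_Model == "Yes",
    Run_Type == "Future_Forecast"
  )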