The finnts package, commonly referred to as “Finn”, is a standardized time series forecasting framework developed by Microsoft Finance. It is the result of years of effort to build a centralized forecasting practice that everyone in finance could use. Although it was built for finance-style forecasts, it can easily be extended to any type of time series forecast.
Finn takes years of hard work and thousands of lines of code and simplifies the forecasting process down to one line of code. A single function, “forecast_time_series”, takes in historical data and applies dozens of models to produce a state-of-the-art forecast. While reducing the forecasting process to a single function call might seem limiting, Finn allows for a lot of flexibility under the hood. To get the most out of Finn, check out the other vignettes within the package.
library(finnts)
browseVignettes("finnts")
Getting started with Finn is as simple as 1..2..3
Data used in Finn needs to follow a few requirements, called out below.
A good way to produce your first Finn forecast is to use one of the example data sets from the timetk package. Let’s take a monthly example and trim it down to speed up the run time of your first Finn forecast.
library(finnts)
hist_data <- timetk::m4_monthly %>%
  dplyr::filter(date >= "2013-01-01") %>%
  dplyr::rename(Date = date) %>%
  dplyr::mutate(id = as.character(id))
print(hist_data)
#> # A tibble: 120 × 3
#>    id    Date       value
#>    <chr> <date>     <dbl>
#>  1 M1    2013-01-01  9120
#>  2 M1    2013-02-01  8280
#>  3 M1    2013-03-01  7860
#>  4 M1    2013-04-01  7150
#>  5 M1    2013-05-01  8110
#>  6 M1    2013-06-01 10860
#>  7 M1    2013-07-01 10730
#>  8 M1    2013-08-01  9610
#>  9 M1    2013-09-01  8270
#> 10 M1    2013-10-01  9200
#> # ℹ 110 more rows
print(unique(hist_data$id))
#> [1] "M1" "M2" "M750" "M1000"
The above data set contains 4 individual time series, identified using the “id” column.
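As a rough sanity check before forecasting, the sketch below verifies the layout seen in the example above: a date-typed “Date” column, a character “id” column, and a numeric “value” column. The helper `check_hist_data()` and the toy data frame are hypothetical and not part of finnts; the checks are inferred from the example data, not taken from the package's documented requirements.

```r
# Hypothetical helper (not part of finnts): checks the column layout
# used by the example data above.
check_hist_data <- function(df, combo = "id", target = "value") {
  stopifnot(
    is.data.frame(df),
    all(c("Date", combo, target) %in% names(df)),
    inherits(df[["Date"]], "Date"),
    is.character(df[[combo]]),
    is.numeric(df[[target]])
  )
  # Each series should have at most one row per date
  dup <- duplicated(df[, c(combo, "Date")])
  if (any(dup)) stop("duplicate Date values within a series")
  invisible(TRUE)
}

# Toy data with made-up values, mimicking the layout of hist_data
toy_data <- data.frame(
  id = rep(c("A", "B"), each = 3),
  Date = rep(seq(as.Date("2013-01-01"), by = "month", length.out = 3), times = 2),
  value = c(10, 12, 11, 20, 19, 21),
  stringsAsFactors = FALSE
)

check_result <- check_hist_data(toy_data)
```

The same helper could be called on `hist_data` from the example, since a tibble is also a data frame.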
Before we call the Finn forecast function, let’s first set up some run information using set_run_info(). This helps log all components of our Finn forecast successfully.
run_info <- set_run_info(
  experiment_name = "finn_forecast",
  run_name = "test_run"
)
Calling the “forecast_time_series” function is the easiest part. In this example we will be running just two models.
# no need to assign it to a variable, since all of the outputs are written to disk :)
forecast_time_series(
  run_info = run_info,
  input_data = hist_data,
  combo_variables = c("id"),
  target_variable = "value",
  date_type = "month",
  forecast_horizon = 3,
  back_test_scenarios = 6,
  models_to_run = c("arima", "ets"),
  return_data = FALSE
)
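To make the `date_type = "month"` and `forecast_horizon = 3` settings concrete, the base R sketch below lists the three monthly periods such a forecast would cover. The last historical date used here is an assumption for illustration only; check `max(hist_data$Date)` in your own data.

```r
# Assumed last historical month (illustrative, not read from the data)
last_hist_date <- as.Date("2015-06-01")
forecast_horizon <- 3

# Generate the month starts that follow the last historical date:
# build horizon + 1 monthly dates, then drop the first (the last actual)
future_dates <- seq(last_hist_date, by = "month",
                    length.out = forecast_horizon + 1)[-1]
print(future_dates)
```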
finn_output_tbl <- get_forecast_data(run_info = run_info)
print(finn_output_tbl)
best_model_tbl <- finn_output_tbl %>%
  dplyr::filter(Best_Model == "Yes") %>%
  dplyr::select(Combo, Model_ID, Model_Name, Model_Type, Recipe_ID) %>%
  dplyr::distinct()
print(best_model_tbl)
Note: the best model for the “M1” combination is a simple average of “arima” and “ets” models.
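The arithmetic behind a simple average can be sketched in base R as follows. The forecast values are made up for illustration, and the snippet shows only the averaging idea, not how Finn computes or stores its ensemble internally.

```r
# Hypothetical point forecasts for a 3-month horizon (made-up numbers)
arima_fcst <- c(100, 110, 120)
ets_fcst <- c(90, 100, 130)

# A simple average ensemble: the period-by-period mean of the model forecasts
avg_fcst <- rowMeans(cbind(arima_fcst, ets_fcst))
print(avg_fcst)
```

Each averaged value sits halfway between the two model forecasts for that period, which can reduce the impact of any single model's error.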
trained_model_tbl <- get_trained_models(run_info = run_info)
print(trained_model_tbl)
R1_prepped_data_tbl <- get_prepped_data(
  run_info = run_info,
  recipe = "R1"
)
print(R1_prepped_data_tbl)
R2_prepped_data_tbl <- get_prepped_data(
  run_info = run_info,
  recipe = "R2"
)
print(R2_prepped_data_tbl)
run_info_tbl <- get_run_info(experiment_name = "finn_forecast")
print(run_info_tbl)