The “Best-Model” you see in both the back testing and future forecast outputs is chosen based on which model had the best accuracy during back testing. After all individual, ensemble, and average model forecasts are created for both back testing and the future forecast, a weighted MAPE calculation is applied to each unique combination of data combo and model.

A standard MAPE calculation is produced first for each back test observation. Then, instead of taking a simple average to get the final MAPE, a weighted average is taken, where each observation is weighted by its share of the total target variable value. Please see below for an example of the process.
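
One way to write the calculation in symbols is sketched here. The notation is introduced only for illustration and is not taken from finnts: $F_t$ is the forecast and $A_t$ is the target value for back test period $t$.

$$
\text{APE}_t = \frac{|F_t - A_t|}{A_t}, \qquad
w_t = \frac{A_t}{\sum_{s} A_s}, \qquad
\text{Weighted MAPE} = \sum_{t} w_t \, \text{APE}_t = \frac{\sum_{t} |F_t - A_t|}{\sum_{t} A_t}
$$

The last equality assumes every target value is positive, so that $A_t$ cancels. Under that assumption the weighted MAPE is the total absolute error divided by the total target, which is why large target values dominate the result.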

#> Simple Back Test Results
#> # A tibble: 10 × 8
#>    Combo     Date       Model  FCST Target   MAPE Target_Total Percent_Total
#>    <chr>     <date>     <chr> <dbl>  <dbl>  <dbl>        <dbl>         <dbl>
#>  1 Country_1 2020-01-01 arima     9     10 0.1             150        0.0667
#>  2 Country_1 2020-02-01 arima    23     20 0.15            150        0.133 
#>  3 Country_1 2020-03-01 arima    35     30 0.167           150        0.2   
#>  4 Country_1 2020-04-01 arima    41     40 0.025           150        0.267 
#>  5 Country_1 2020-05-01 arima    48     50 0.04            150        0.333 
#>  6 Country_1 2020-01-01 ets       7     10 0.3             150        0.0667
#>  7 Country_1 2020-02-01 ets      22     20 0.1             150        0.133 
#>  8 Country_1 2020-03-01 ets      29     30 0.0333          150        0.2   
#>  9 Country_1 2020-04-01 ets      42     40 0.05            150        0.267 
#> 10 Country_1 2020-05-01 ets      53     50 0.06            150        0.333
#> 
#> Overall Model Accuracy by Combo
#> # A tibble: 2 × 4
#>   Combo     Model   MAPE Weighted_MAPE
#>   <chr>     <chr>  <dbl>         <dbl>
#> 1 Country_1 arima 0.0963        0.08  
#> 2 Country_1 ets   0.109         0.0733
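
The numbers in the two tables above can be reproduced with a few lines of R. This is a minimal sketch in base R, assuming only the toy data shown above. It is not the code finnts runs internally, and the object names (`back_test`, `accuracy`, `best`) are chosen here for illustration.

```r
# Toy back test results copied from the table above (one combo, two models)
back_test <- data.frame(
  Combo  = "Country_1",
  Date   = rep(seq(as.Date("2020-01-01"), by = "month", length.out = 5), times = 2),
  Model  = rep(c("arima", "ets"), each = 5),
  FCST   = c(9, 23, 35, 41, 48, 7, 22, 29, 42, 53),
  Target = c(10, 20, 30, 40, 50, 10, 20, 30, 40, 50)
)

# Absolute percentage error of each back test observation
back_test$MAPE <- abs(back_test$FCST - back_test$Target) / back_test$Target

# Total target per combo and model, and each observation's share of that total
back_test$Target_Total <- ave(back_test$Target, back_test$Combo, back_test$Model, FUN = sum)
back_test$Percent_Total <- back_test$Target / back_test$Target_Total

# Simple average MAPE and weighted MAPE for every combo and model pair
groups <- split(back_test, list(back_test$Combo, back_test$Model), drop = TRUE)
accuracy <- do.call(rbind, lapply(groups, function(g) {
  data.frame(
    Combo         = g$Combo[1],
    Model         = g$Model[1],
    MAPE          = mean(g$MAPE),
    Weighted_MAPE = sum(g$MAPE * g$Percent_Total)
  )
}))
rownames(accuracy) <- NULL

# Best model per combo: the row with the lowest weighted MAPE
best <- do.call(rbind, lapply(split(accuracy, accuracy$Combo), function(a) {
  a[which.min(a$Weighted_MAPE), ]
}))

print(accuracy)
print(best)
```

Selecting on `MAPE` instead of `Weighted_MAPE` in the last step would pick a different model for this toy data, which is the contrast the example is meant to show.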

In the simple back test above, arima looks like the better model from a pure MAPE perspective (0.0963 vs. 0.109), but ets ends up being the winner when using weighted MAPE (0.0733 vs. 0.08). Weighted MAPE lets finnts find the model that performs best on the biggest components of a forecast. This comes with the added benefit of putting more weight on more recent observations, since those are more likely to have larger target values than ones further in the past. Finn also puts more weight on recent observations through the way it overlaps its back testing scenarios: the most recent observations are tested for accuracy at several forecast horizons (H=1, H=2, etc.). See the back testing vignette for more information.

Users of Finn can also take the Finn outputs, create their own accuracy metrics, and choose their own best models, since all model results are written to disk.