Finn is updated on a regular basis, so a best practice is to pin a specific version of Finn for your production forecasts. You can do this with the renv package when running Finn on your local machine, or with Docker containers when running Finn in the cloud.
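As a rough sketch of version pinning with renv on a local machine, the snippet below records a fixed Finn release in a project lockfile. The version number is only a placeholder for illustration, not a recommendation; substitute the release you have validated for your forecast.

```r
# Run once from inside the forecasting project directory.
# install.packages("renv")       # uncomment if renv is not installed yet

# Create a project-local library without auto-discovering dependencies.
renv::init(bare = TRUE)

# Install one specific release of Finn (the CRAN package name is finnts).
# "0.4.0" is a placeholder version used only for this example.
renv::install("finnts@0.4.0")

# Record every package in the project library, with versions, in renv.lock.
renv::snapshot(type = "all")

# Later, on another machine or during a container build, reinstall exactly
# what renv.lock records:
# renv::restore()
```

Committing renv.lock alongside the forecast code means the production run keeps using the pinned release until you deliberately update and re-snapshot.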
Finn was built to run at scale in Azure, using Spark as the parallel back end. See the parallel processing vignette to learn how to get Finn running on Azure services like Databricks. The best way to run Finn in production is with Azure Machine Learning, specifically Azure ML Pipelines.
Below are a few tips for leveraging Azure ML Pipelines:
- Break forecast_time_series() into its sub-component functions so that each step of the Finn forecast process can run as its own pipeline step.
- Set add_unique_id to FALSE within set_run_info(). That lets you call the exact same run info in each separate pipeline step. Make sure you use your own unique run_name within set_run_info(): one that differs from previous Finn runs but is identical across the set_run_info() calls in every pipeline step.
- For prep_data() and prep_models(), use the default Spark cluster settings, where each Spark task is sent to a specific core on an executor.
- For train_models(), ensemble_models(), and final_models(), consider adjusting the Spark cluster settings, such as setting “spark.executor.cores” to 1, and set inner_parallel to TRUE within each function. That way only a single task (one time series) is sent to each Spark executor node, and all cores within that node can be used during modeling for that time series. Use num_cores within each function to control how many cores on the executor to use; the default is all available cores minus one. This can significantly speed up run time, but if you have many time series and want to run all of the models within Finn, ensure that your Spark cluster can scale to many VMs.
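The tips above can be sketched as one R function per pipeline step. This is a minimal illustration, not a reference implementation: the experiment name, run name, storage path, column names, date type, and forecast horizon are all placeholders, and the helper names (get_run_info, step_prep, and so on) are hypothetical. It also assumes an active Spark connection has already been set up as described in the parallel processing vignette, and that “spark.executor.cores” is configured on the cluster or session used by the modeling steps.

```r
# Shared settings. Every value here is a placeholder for illustration.
experiment_name <- "finn_production"
run_name <- "monthly_forecast_run_01"  # unique per run, identical in every step
run_path <- "/mnt/finn_runs"           # hypothetical storage shared by all steps

# Every pipeline step rebuilds the same run info. add_unique_id = FALSE keeps
# the run name stable so later steps find the outputs of earlier ones.
get_run_info <- function() {
  finnts::set_run_info(
    experiment_name = experiment_name,
    run_name = run_name,
    path = run_path,
    add_unique_id = FALSE
  )
}

# Step 1: data and model prep on the default Spark settings
# (each Spark task runs on one executor core).
step_prep <- function(input_data) {
  run_info <- get_run_info()
  finnts::prep_data(
    run_info = run_info,
    input_data = input_data,
    combo_variables = c("id"),   # placeholder column name
    target_variable = "value",   # placeholder column name
    date_type = "month",         # placeholder frequency
    forecast_horizon = 3,        # placeholder horizon
    parallel_processing = "spark"
  )
  finnts::prep_models(run_info = run_info)
}

# Steps 2 to 4: one time series per executor, with the executor's cores used
# inside each task. num_cores is left at its default (all cores minus one).
step_train <- function() {
  finnts::train_models(
    run_info = get_run_info(),
    parallel_processing = "spark",
    inner_parallel = TRUE
  )
}

step_ensemble <- function() {
  finnts::ensemble_models(
    run_info = get_run_info(),
    parallel_processing = "spark",
    inner_parallel = TRUE
  )
}

step_final <- function() {
  finnts::final_models(
    run_info = get_run_info(),
    parallel_processing = "spark",
    inner_parallel = TRUE
  )
}
```

In a pipeline, each step's script would call just one of these functions. Because the run info is rebuilt from the same experiment name, run name, and path every time, the steps can run on different compute while still reading and writing the same Finn run.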