Reporting for the benchmark
This document details a proposed framework to report values (parameters, metrics) so they can be compared or aggregated during the benchmark analysis.
This reporting is facilitated by a library under /src/common/metrics.py. This page first introduces the reporting specifications for each benchmark script, then documents the common library functions that implement this reporting.
Specifications of reporting
As mentioned in the project definition, we'd like to address three benchmarking scenarios:
1. Training framework comparison (lightgbm versus other ML frameworks)
2. Lightgbm performance and scalability (lightgbm on different compute types)
3. Lightgbm "master" vs lightgbm "custom" (measuring progress of lightgbm versions)
To support those, we propose to report three kinds of content:
- properties: used to segment the analysis, they will be properties of the script (framework, version) or properties of the environment (VM types, dependencies, compilation settings, etc).
- parameters: in particular for training, any relevant parameter passed to the script (ex: learning rate).
- metrics: measures taken during the script, in particular various execution times or custom validation metrics (ex: RMSE).
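As a rough illustration of how the three kinds of content differ, the sketch below groups them in a small container. The `BenchmarkReport` class and its fields are hypothetical names introduced here for illustration only; they are not part of /src/common/metrics.py.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class BenchmarkReport:
    """Hypothetical container (not part of common.metrics)."""

    # properties: used to segment the analysis (framework, version, VM type...)
    properties: Dict[str, str] = field(default_factory=dict)
    # parameters: values passed to the script (ex: learning rate)
    parameters: Dict[str, Any] = field(default_factory=dict)
    # metrics: measures taken during the script (ex: execution times, RMSE)
    metrics: Dict[str, float] = field(default_factory=dict)


report = BenchmarkReport()
report.properties.update(task="train", framework="lightgbm_python")
report.parameters["learning_rate"] = 0.1
# made-up value, in seconds, for illustration only
report.metrics["time_training"] = 12.5
```

Keeping the three groups separate makes it possible to filter runs by property first, then compare metrics for runs that share the same parameters.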
For all scripts, we'd like to have a minimal set of typical properties, parameters and metrics that each script will report. See /src/scripts/lightgbm_python/train.py for an example implementation of all of those.
The following tables detail each reporting entry, with its type and description.
Common properties
The purpose of properties is to let us segment the benchmarking analysis, for instance to compare different frameworks against one another, or to compare two lightgbm versions. Some of those properties can be reported by the scripts themselves (ex: python api version), while others will have to be reported by the orchestrator (ex: VM type on which the script is run).
Entry | Type | Description |
---|---|---|
task | property | the task of the script, picked in ['generate', 'train', 'score'] |
framework | property | an identifier for the ML algorithm being benchmarked (ex: lightgbm_python, treelite). |
framework_version | property | the version of the framework (ex: "3.2.1"). |
environment | property | Optional: log relevant dependencies and their version numbers as a dictionary. |
To facilitate recording all of those, we could add as much system information as we can get from python modules like platform.
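As a minimal sketch of what the standard `platform` module can provide, the snippet below collects a few system values into a flat dictionary. The function name `collect_platform_properties` and the dictionary keys are assumptions chosen for this example; the helper in the common library may collect different values.

```python
import platform


def collect_platform_properties():
    """Gather basic system information as a flat dict of strings.

    Hypothetical helper: the key names are illustrative only.
    """
    return {
        "machine": platform.machine(),
        "processor": platform.processor(),
        "system": platform.system(),
        "system_version": platform.version(),
        "python_version": platform.python_version(),
    }


platform_properties = collect_platform_properties()
for key, value in platform_properties.items():
    print(f"{key}: {value}")
```

Some of these values can be empty strings on systems where the information is not available, so an analysis should not rely on all of them being populated.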
To learn how to report properties, see common library below.
Common metrics
The common metrics capture various times that we'll compare across frameworks. If possible, we'd like the training and inferencing times to be distinct from data loading. If that's not possible, do not report any data loading time, and we'll figure out how to compare those during analysis.
Entry | Type | Description |
---|---|---|
time_data_loading | metric | time for loading the data before executing the task |
time_data_generation | metric | time for generating data (for task generate) |
time_training | metric | time for training on previously loaded data (for task training) |
time_inferencing | metric | time for inferencing on previously loaded data (for task inferencing) |
To learn how to implement reporting those metrics, see common library below.
Parameters
There are no common parameters yet. You can report anything as parameters. See how below.
Using common report library
To use the common report library, first import:
from common.metrics import MetricsLogger
Then, a typical logging session works as follows.
1. Open a session
# initializes reporting of metrics with a session name
metrics_logger = MetricsLogger("lightgbm_python.score")
2. Add common properties
Make sure to provide the properties expected per specifications above.
# add the common properties to the session
metrics_logger.set_properties(
    task = 'score',
    framework = 'lightgbm_python',
    framework_version = lightgbm.__version__
)
You can capture all relevant platform/system info using the helper function:
# will capture platform info and record as properties
metrics_logger.set_platform_properties()
Optionally, you can provide custom properties using json (for instance from CLI arguments), and report those using:
# logger will parse the json
metrics_logger.set_properties_from_json(json_string)
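To illustrate how a JSON string of custom properties might arrive from CLI arguments, here is a small sketch using `argparse` and `json`. The `--custom_properties` flag, the `parse_custom_properties` function and the example keys are hypothetical; the benchmark scripts may name and handle this argument differently.

```python
import argparse
import json


def parse_custom_properties(argv):
    """Parse a hypothetical --custom_properties flag into a dict."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--custom_properties",
        type=str,
        default=None,
        help="JSON object of custom properties (hypothetical flag)",
    )
    args = parser.parse_args(argv)

    # no flag provided: nothing to report
    if args.custom_properties is None:
        return {}

    properties = json.loads(args.custom_properties)
    # only a JSON object maps cleanly onto key/value properties
    if not isinstance(properties, dict):
        raise ValueError("custom properties must be a JSON object")
    return properties


# made-up values for illustration
custom_properties = parse_custom_properties(
    ["--custom_properties", '{"vm_type": "example_vm", "team": "benchmark"}']
)
print(custom_properties)
```

Validating the JSON before handing it to the logger surfaces malformed input early, rather than in the middle of a benchmark run.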
3. Add any parameters
Any keyword arg of log_parameters() is submitted as a parameter in mlflow.
metrics_logger.log_parameters(**lgbm_params)
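The `**lgbm_params` syntax unpacks a dictionary into keyword arguments, so each key becomes one reported parameter. The sketch below shows that mechanism with a stand-in function that stores parameters in a local dictionary instead of sending them to mlflow; the stand-in and the example parameter values are assumptions made for illustration, not the library's implementation.

```python
# local store standing in for the tracking backend
recorded_parameters = {}


def log_parameters(**kwargs):
    """Stand-in for MetricsLogger.log_parameters(): records each keyword argument."""
    for key, value in kwargs.items():
        recorded_parameters[key] = value


# example training parameters (illustrative values)
lgbm_params = {
    "objective": "regression",
    "learning_rate": 0.1,
    "num_leaves": 31,
}

# each dictionary key is passed as one keyword argument
log_parameters(**lgbm_params)
print(recorded_parameters)
```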
4. Compute wall-time using with statement
To compute wall time, the MetricsLogger class provides a helper method you can use within a with statement:
with metrics_logger.log_time_block("time_training"):
    # anything within this code block will count in wall time
    booster = lightgbm.train(
        lgbm_params,
        train_data,
        valid_sets = val_data
    )
# anything outside of that will not count
This will record a metric "time_training" measuring the time spent executing this code block (only).
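For readers curious how such a timing helper can work, here is one possible sketch built on `contextlib.contextmanager` and `time.perf_counter`. It is a stand-in written for this page, not the code of `MetricsLogger.log_time_block`, and it stores the result in a local dictionary (`recorded_metrics`, a hypothetical name) rather than reporting it to mlflow.

```python
import time
from contextlib import contextmanager

# local store standing in for the tracking backend
recorded_metrics = {}


@contextmanager
def log_time_block(metric_name):
    """Record the wall time spent inside the with block under metric_name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # runs even if the block raises, so the time is always recorded
        recorded_metrics[metric_name] = time.perf_counter() - start


with log_time_block("time_training"):
    # placeholder workload standing in for a training call
    total = sum(i * i for i in range(100_000))

print(f"time_training: {recorded_metrics['time_training']:.6f} seconds")
```

Using `try`/`finally` means a failed block still reports how long it ran, which can help when diagnosing runs that crash partway through.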