
sample/sample.py

Usage

This tutorial script explains how all the benchmark scripts are structured and standardized using the RunnableScript helper class.

We've numbered the steps you need to follow to adapt this sample script to your own needs.

Follow each STEP below and its associated TODO.
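
As a point of reference, a script built this way is launched through its main() entry point with the arguments declared in get_arg_parser(). The call below is only a sketch: the import path and file locations are placeholders, and it assumes main() simply forwards cli_args to the argument parser (see the main() wrapper at the bottom of this page).

from scripts.sample.sample import main  # placeholder import path

main(cli_args=[
    "--data", "/path/to/data_dir",      # directory containing a single data file
    "--model", "/path/to/model_dir",    # directory containing a single model file
    "--output", "/path/to/output_dir",  # output directory, created if missing
])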

SampleScript

Bases: RunnableScript

STEP 1: Package your script as a class.

This class inherits from RunnableScript, which factors out the code shared by the usual routines of every script in the lightgbm-benchmark repo.

It has a standard main() function (see below) that you should not need to modify except for edge cases.

See src/common/components.py for details on that class.

TODO: name your class specifically for this script

Source code in src/scripts/sample/sample.py
class SampleScript(RunnableScript):
    """
    STEP 1 : Package your script as a class.

    This class inherits from RunnableScript, that factors
    duplicate code to achieve usual routines
    of every script in the lightgbm-benchmark repo.

    It has a standard main() function (see below)
    that you should not need to modify except for edge cases.

    See `src/common/components.py` for details on that class.

    TODO: name your class specifically for this script
    """
    def __init__(self):
        """
        STEP 3 : Define your benchmark "task" in the constructor.

        This Specific constructor for this SampleScript class. It has no arguments,
        as it will be called from the helper `main()` method.

        In your custom script class, you need to call the super constructor with the parameters below.

        TODO: pick your task (score, train, generate, compile, ...)
        TODO: name your framework and version
        """
        super().__init__(
            task="sample_task", # str
            framework="sample_framework", # str
            framework_version="0.0.1" # str
        )

    @classmethod
    def get_arg_parser(cls, parser=None):
        """
        STEP 4 : Define your arguments

        This method will be called by the main() function
        to add your script custom arguments to argparse,
        on top of standard arguments of the benchmark.

        TODO: align this section with your requirements.

        Args:
            parser (argparse.ArgumentParser): an existing argument parser instance

        Returns:
            ArgumentParser: the argument parser instance
        """
        # IMPORTANT: call this to add generic benchmark arguments
        parser = RunnableScript.get_arg_parser(parser)

        # add arguments that are specific to your script
        # here's a couple examples
        group_i = parser.add_argument_group(f"I/O Arguments [{__name__}:{cls.__name__}]")
        group_i.add_argument(
            "--data",
            required=True,
            type=input_file_path,  # use this helper type for a directory containing a single file
            help="Some input location (directory containing a unique file)",
        )
        group_i.add_argument(
            "--model",
            required=True,
            type=input_file_path,  # use this helper type for a directory containing a single file
            help="Some input location (directory containing a unique file)",
        )
        group_i.add_argument(
            "--output",
            required=True,
            default=None,
            type=str,
            help="Some output location (directory)",
        )

        # make sure to return parser
        return parser


    def run(self, args, logger, metrics_logger, unknown_args):
        """
        STEP 5 : Define your run function.

        This is the core function of your script.
        You are required to override this method with your own implementation.

        Args:
            args (argparse.namespace): command line arguments provided to script
            logger (logging.logger): a logger initialized for this script
            metrics_logger (common.metrics.MetricLogger): to report metrics for this script, already initialized for MLFlow
            unknown_args (list[str]): list of arguments not recognized during argparse
        """
        # make sure the output argument exists
        os.makedirs(args.output, exist_ok=True)

        # and create your own file inside the output
        args.output = os.path.join(args.output, "predictions.txt")

        # CUSTOM CODE STARTS HERE
        # below this line is user code
        logger.info(f"Loading model from {args.model}")
        booster = lightgbm.Booster(model_file=args.model)

        # to log executing time of a code block, use log_time_block()
        logger.info(f"Loading data for inferencing")
        with metrics_logger.log_time_block(metric_name="time_data_loading"):
            inference_data = lightgbm.Dataset(args.data, free_raw_data=False).construct()
            inference_raw_data = inference_data.get_data()

        # optional: add data shape as property
        metrics_logger.set_properties(
            inference_data_length=inference_data.num_data(),
            inference_data_width=inference_data.num_feature(),
        )

        # to log executing time of a code block, use log_time_block()
        logger.info(f"Running .predict()")
        with metrics_logger.log_time_block(metric_name="time_inferencing"):
            booster.predict(data=inference_raw_data)

__init__()

STEP 3: Define your benchmark "task" in the constructor.

This is the specific constructor for the SampleScript class. It takes no arguments, as it will be called from the helper main() method.

In your custom script class, you need to call the super constructor with the parameters below.

TODO: pick your task (score, train, generate, compile, ...)
TODO: name your framework and version

Source code in src/scripts/sample/sample.py
def __init__(self):
    """
    STEP 3 : Define your benchmark "task" in the constructor.

    This Specific constructor for this SampleScript class. It has no arguments,
    as it will be called from the helper `main()` method.

    In your custom script class, you need to call the super constructor with the parameters below.

    TODO: pick your task (score, train, generate, compile, ...)
    TODO: name your framework and version
    """
    super().__init__(
        task="sample_task", # str
        framework="sample_framework", # str
        framework_version="0.0.1" # str
    )
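
For illustration only (this is not code from the repo), a constructor for a hypothetical LightGBM scoring script could report the installed framework version instead of a hard-coded string:

import lightgbm

class LightGBMScoringScript(RunnableScript):
    def __init__(self):
        # hypothetical task/framework names; choose the ones used in your own benchmark
        super().__init__(
            task="score",
            framework="lightgbm_python",
            framework_version=str(lightgbm.__version__)
        )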

get_arg_parser(parser=None) classmethod

STEP 4: Define your arguments

This method will be called by the main() function to add your script's custom arguments to argparse, on top of the standard benchmark arguments.

TODO: align this section with your requirements.

Parameters:

    parser (argparse.ArgumentParser, default None): an existing argument parser instance

Returns:

    ArgumentParser: the argument parser instance
Source code in src/scripts/sample/sample.py
@classmethod
def get_arg_parser(cls, parser=None):
    """
    STEP 4 : Define your arguments

    This method will be called by the main() function
    to add your script custom arguments to argparse,
    on top of standard arguments of the benchmark.

    TODO: align this section with your requirements.

    Args:
        parser (argparse.ArgumentParser): an existing argument parser instance

    Returns:
        ArgumentParser: the argument parser instance
    """
    # IMPORTANT: call this to add generic benchmark arguments
    parser = RunnableScript.get_arg_parser(parser)

    # add arguments that are specific to your script
    # here's a couple examples
    group_i = parser.add_argument_group(f"I/O Arguments [{__name__}:{cls.__name__}]")
    group_i.add_argument(
        "--data",
        required=True,
        type=input_file_path,  # use this helper type for a directory containing a single file
        help="Some input location (directory containing a unique file)",
    )
    group_i.add_argument(
        "--model",
        required=True,
        type=input_file_path,  # use this helper type for a directory containing a single file
        help="Some input location (directory containing a unique file)",
    )
    group_i.add_argument(
        "--output",
        required=True,
        default=None,
        type=str,
        help="Some output location (directory)",
    )

    # make sure to return parser
    return parser
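
If your script needs parameters beyond file locations, the same pattern extends to any argparse argument type. The group and the --num_threads argument below are hypothetical examples, not part of the sample:

group_params = parser.add_argument_group(f"Parameters [{__name__}:{cls.__name__}]")
group_params.add_argument(
    "--num_threads",
    required=False,
    default=1,
    type=int,
    help="number of threads to use (hypothetical example argument)",
)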

run(args, logger, metrics_logger, unknown_args)

STEP 5: Define your run function.

This is the core function of your script. You are required to override this method with your own implementation.

Parameters:

    args (argparse.Namespace, required): command line arguments provided to the script
    logger (logging.Logger, required): a logger initialized for this script
    metrics_logger (common.metrics.MetricLogger, required): used to report metrics for this script, already initialized for MLflow
    unknown_args (list[str], required): list of arguments not recognized during argparse
Source code in src/scripts/sample/sample.py
def run(self, args, logger, metrics_logger, unknown_args):
    """
    STEP 5 : Define your run function.

    This is the core function of your script.
    You are required to override this method with your own implementation.

    Args:
        args (argparse.namespace): command line arguments provided to script
        logger (logging.logger): a logger initialized for this script
        metrics_logger (common.metrics.MetricLogger): to report metrics for this script, already initialized for MLFlow
        unknown_args (list[str]): list of arguments not recognized during argparse
    """
    # make sure the output argument exists
    os.makedirs(args.output, exist_ok=True)

    # and create your own file inside the output
    args.output = os.path.join(args.output, "predictions.txt")

    # CUSTOM CODE STARTS HERE
    # below this line is user code
    logger.info(f"Loading model from {args.model}")
    booster = lightgbm.Booster(model_file=args.model)

    # to log executing time of a code block, use log_time_block()
    logger.info(f"Loading data for inferencing")
    with metrics_logger.log_time_block(metric_name="time_data_loading"):
        inference_data = lightgbm.Dataset(args.data, free_raw_data=False).construct()
        inference_raw_data = inference_data.get_data()

    # optional: add data shape as property
    metrics_logger.set_properties(
        inference_data_length=inference_data.num_data(),
        inference_data_width=inference_data.num_feature(),
    )

    # to log executing time of a code block, use log_time_block()
    logger.info(f"Running .predict()")
    with metrics_logger.log_time_block(metric_name="time_inferencing"):
        booster.predict(data=inference_raw_data)
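
Note that the sample computes a predictions.txt path in args.output but never writes to it. If you do want to persist predictions, a minimal sketch (assuming you keep the return value of booster.predict(), which is a numpy array) could look like this; the time_saving metric name is hypothetical:

import numpy

with metrics_logger.log_time_block(metric_name="time_saving"):  # hypothetical metric name
    predictions = booster.predict(data=inference_raw_data)
    numpy.savetxt(args.output, predictions)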

get_arg_parser(parser=None)

STEP 2: main function block

The section below (get_arg_parser(), main(), and the main block) should remain unchanged, except for the name of your class.

Those are standard functions we enforce in order to run unit tests on the module (argument parsing, mainly).

They also ensure compatibility with shrike unit tests.

TODO: just replace SampleScript with the name of your class

Source code in src/scripts/sample/sample.py
def get_arg_parser(parser=None):
    """
    STEP 2: main function block

    The section below (get_arg_parser(), main() and main block) should go unchanged,
    except for the name of your class.

    Those are standard functions we enforce in order to get some unit tests
    on the module (arguments parsing mainly).

    To ensure compatibility with shrike unit tests

    TODO: just replace SampleScript to the name of your class
    """
    return SampleScript.get_arg_parser(parser)

main(cli_args=None)

To ensure compatibility with shrike unit tests

Source code in src/scripts/sample/sample.py
def main(cli_args=None):
    """ To ensure compatibility with shrike unit tests """
    SampleScript.main(cli_args)
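
The "main block" referenced in STEP 2 is not rendered on this page; in a script following this pattern it is typically the standard Python entry-point guard calling the wrapper above:

if __name__ == "__main__":
    main()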