The training pipeline uses monkey-patching to wrap the agent’s _update method, intercepting training updates to extract and log metrics to MLflow. This approach provides comprehensive experiment tracking without modifying the underlying SKRL agent implementation or training code.
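In essence, the wrapper keeps a reference to the original `_update`, calls it, then logs whatever appears in `agent.tracking_data`. The following minimal sketch illustrates the pattern with a stand-in agent class; `DemoAgent` and `wrap_update` are illustrative names, not the project's actual implementation:

```python
import functools

class DemoAgent:
    """Stand-in for an SKRL agent; `tracking_data` mirrors SKRL's metric store."""
    def __init__(self):
        self.tracking_data = {}

    def _update(self, timestep):
        # A real agent would run an optimization step here.
        self.tracking_data["Loss / Policy loss"] = [0.5 / (timestep + 1)]

def wrap_update(agent, log_fn):
    """Replace agent._update with a closure that logs after each call."""
    original_update = agent._update  # keep the bound method
    @functools.wraps(original_update)
    def wrapped(timestep):
        result = original_update(timestep)       # run the real update
        for name, values in agent.tracking_data.items():
            log_fn(name, values[-1], step=timestep)  # log latest value
        return result
    return wrapped

logged = []
agent = DemoAgent()
agent._update = wrap_update(agent, lambda k, v, step: logged.append((k, v, step)))
agent._update(0)  # logged now contains ("Loss / Policy loss", 0.5, 0)
```

The same closure pattern underlies the real integration, with `log_fn` replaced by calls into the `mlflow` module.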
The MLflow integration automatically extracts metrics from SKRL agents across several categories:
- `episode_reward` - Reward for the current episode
- `episode_reward_mean` - Mean reward across recent episodes
- `episode_length` - Length of the current episode
- `episode_length_mean` - Mean episode length across recent episodes
- `cumulative_rewards` - Cumulative rewards over time
- `mean_rewards` - Mean reward values
- `success_rate` - Success rate for task-specific metrics
- `policy_loss` - Policy network loss
- `value_loss` - Value network loss (critic loss for some algorithms)
- `critic_loss` - Critic network loss (SAC, TD3, DDPG)
- `entropy` - Policy entropy for exploration
- `learning_rate` - Current learning rate
- `grad_norm` - Gradient norm for monitoring optimization
- `kl_divergence` - KL divergence between old and new policies (PPO)
- `timesteps` - Total environment timesteps
- `iterations` - Training iteration count
- `fps` - Training frames per second
- `epoch_time` - Time per training epoch
- `rollout_time` - Time spent collecting experience
- `learning_time` - Time spent in optimization

For metrics with multiple values (tensors or arrays), the integration extracts statistical aggregates:
- `metric_name/mean` - Mean value
- `metric_name/std` - Standard deviation
- `metric_name/min` - Minimum value
- `metric_name/max` - Maximum value

All entries in `agent.tracking_data` are automatically extracted, supporting algorithm-specific metrics from PPO, SAC, TD3, DDPG, A2C, and other SKRL implementations.
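As an illustration, reducing an array-valued metric to these four aggregates might look like the sketch below. This is not the module's actual extraction code; `aggregate_metric` is a hypothetical helper:

```python
import numpy as np

def aggregate_metric(name, values):
    """Reduce a tensor/array metric to mean/std/min/max entries."""
    arr = np.asarray(values, dtype=float)
    if arr.size == 1:
        return {name: float(arr.item())}  # scalar metric: log as-is
    return {
        f"{name}/mean": float(arr.mean()),
        f"{name}/std": float(arr.std()),
        f"{name}/min": float(arr.min()),
        f"{name}/max": float(arr.max()),
    }

print(aggregate_metric("episode_reward", [1.0, 2.0, 3.0]))
```

Per-environment episode rewards from hundreds of vectorized environments, for example, collapse into four scalars that MLflow can chart.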
The integration uses the `create_mlflow_logging_wrapper` function from the `skrl_mlflow_agent` module to create a closure that wraps the agent's `_update` method. The wrapper is applied after the SKRL `Runner` is instantiated but before training begins.
The function takes the following parameters:

- `agent` - The SKRL agent instance to extract metrics from (required)
- `mlflow_module` - The `mlflow` module for logging metrics (required)
- `metric_filter` - Optional set of metric names to log (default: `None`)
The MLflow logging interval is controlled via the --mlflow_log_interval CLI argument:
- `step` - Log metrics after every training step (most frequent)
- `balanced` - Log metrics every 10 steps (default, recommended)
- `rollout` - Log metrics once per rollout cycle
- An integer value (for example, `100`) - Log metrics every N steps

To customize which metrics are logged, modify the `create_mlflow_logging_wrapper` call in `skrl_training.py`:
```python
from training.rl.scripts.skrl_mlflow_agent import create_mlflow_logging_wrapper

# Track only core training metrics
basic_metrics = {
    "episode_reward_mean",
    "episode_length_mean",
    "policy_loss",
    "value_loss",
}

wrapper_func = create_mlflow_logging_wrapper(
    agent=runner.agent,
    mlflow_module=mlflow,
    metric_filter=basic_metrics,
)
runner.agent._update = wrapper_func
```
For example, to focus on optimization diagnostics instead:

```python
# Track metrics relevant to optimization behavior
optimization_metrics = {
    "learning_rate",
    "grad_norm",
    "kl_divergence",
    "policy_loss",
    "value_loss",
}

wrapper_func = create_mlflow_logging_wrapper(
    agent=runner.agent,
    mlflow_module=mlflow,
    metric_filter=optimization_metrics,
)
runner.agent._update = wrapper_func
```
The monkey-patching approach is applied after creating the SKRL Runner:
```python
import mlflow
from skrl.utils.runner.torch import Runner

from training.rl.scripts.skrl_mlflow_agent import create_mlflow_logging_wrapper

mlflow.set_tracking_uri("azureml://...")
mlflow.set_experiment("isaaclab-training")

with mlflow.start_run():
    runner = Runner(env, agent_cfg)

    # Wrap _update after the Runner is created but before training starts
    wrapper_func = create_mlflow_logging_wrapper(
        agent=runner.agent,
        mlflow_module=mlflow,
        metric_filter=None,  # None logs all available metrics
    )
    runner.agent._update = wrapper_func

    runner.run()
```
Use CLI arguments to control logging frequency:
```bash
# Log after every training step
python training/rl/scripts/skrl_training.py --mlflow_log_interval step

# Log every 10 steps (default)
python training/rl/scripts/skrl_training.py --mlflow_log_interval balanced

# Log once per rollout
python training/rl/scripts/skrl_training.py --mlflow_log_interval rollout

# Log every 100 steps
python training/rl/scripts/skrl_training.py --mlflow_log_interval 100
```
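Because the flag accepts both preset names and raw integers, the script has to normalize the value during argument parsing. The sketch below shows one way such a mixed flag could be validated with `argparse`; the actual parsing in `skrl_training.py` may differ:

```python
import argparse

# Hypothetical preset mapping; "rollout" would be resolved from the
# agent's rollout length at runtime, so it is passed through as a string.
PRESETS = {"step": 1, "balanced": 10}

def parse_log_interval(value):
    """Map a preset name or integer string to a logging interval."""
    if value in PRESETS:
        return PRESETS[value]
    if value == "rollout":
        return "rollout"
    try:
        steps = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"invalid interval: {value!r}")
    if steps < 1:
        raise argparse.ArgumentTypeError("interval must be >= 1")
    return steps

parser = argparse.ArgumentParser()
parser.add_argument("--mlflow_log_interval", type=parse_log_interval,
                    default="balanced")
args = parser.parse_args(["--mlflow_log_interval", "100"])
# args.mlflow_log_interval is now the integer 100
```

Using an `argparse` `type` callable keeps validation errors at the CLI boundary rather than deep inside the training loop.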
Modify the wrapper creation in skrl_training.py:
```python
# Log only the metrics needed for production monitoring
production_metrics = {
    "episode_reward_mean",
    "episode_length_mean",
    "success_rate",
}

wrapper_func = create_mlflow_logging_wrapper(
    agent=runner.agent,
    mlflow_module=mlflow,
    metric_filter=production_metrics,
)
runner.agent._update = wrapper_func
```
The MLflow integration is automatically applied in skrl_training.py when training with Isaac Lab tasks:
```bash
python training/rl/scripts/skrl_training.py \
    --task Isaac-Cartpole-v0 \
    --num_envs 512 \
    --headless
```
The training script handles MLflow setup and monkey-patching automatically. To customize the logging interval, use the --mlflow_log_interval argument. To customize metric filtering, modify the create_mlflow_logging_wrapper call in skrl_training.py.
**Symptom:** Training runs complete but no metrics appear in MLflow.

**Solutions:**

- Verify that `mlflow.set_tracking_uri()` is called with the correct Azure ML workspace URI and that authentication is valid.
- Verify that `create_mlflow_logging_wrapper` is called after `Runner` instantiation and that `runner.agent._update` is replaced before `runner.run()`.
- Note that `tracking_data` may not populate until after the first rollout.
- If using `metric_filter`, verify the filter set contains matching metric names.

**Symptom:** Some expected metrics are not logged while others are.

**Solutions:**

- Inspect `agent.tracking_data` for available entries.
- If using `metric_filter`, ensure metric names match exactly (case-sensitive).

**Symptom:** `AttributeError: Agent must have 'tracking_data' attribute`

**Solutions:**

- The agent does not expose `tracking_data`. Verify SKRL version compatibility.
- Verify that `runner.agent` and `runner.agent._update` exist before replacement.

**Symptom:** Training slows down due to excessive MLflow API calls.

**Solutions:**

- Use `--mlflow_log_interval 100` or higher.
- Use `metric_filter` to log only essential metrics.

**Symptom:** Log messages like "Failed to extract or log metrics at step X".

**Possible Causes:**

- Transient changes to the `tracking_data` structure during training
- Incompatible metric types

**Solutions:**

- Occasional warnings are harmless.
- For persistent warnings, check the exception details and modify `_extract_from_value()` in `skrl_mlflow_agent.py` for the specific metric types.

**Symptom:** Integration runs but extracts zero metrics.

**Possible Causes:**

- `tracking_data` is empty
- A `metric_filter` with no matching metric names
- Nested metrics beyond `max_depth=2` are not extracted

**Solutions:**

- Increase `max_depth` in `_extract_from_tracking_data()` if needed.
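When diagnosing missing or zero metrics, listing the keys of `agent.tracking_data` confirms the exact strings a `metric_filter` set must contain. A small sketch using a stand-in dictionary (the real names depend on the agent and algorithm):

```python
# Stand-in for agent.tracking_data; a real SKRL agent maps metric names
# to lists of recently tracked values.
tracking_data = {
    "Reward / Total reward (mean)": [12.3, 14.1],
    "Loss / Policy loss": [0.42],
}

# Print every available metric name with its latest value -- these are
# the exact, case-sensitive strings to use in a metric_filter.
for name, values in sorted(tracking_data.items()):
    print(f"{name}: latest={values[-1]} ({len(values)} tracked value(s))")
```

Running such a loop once after the first rollout quickly reveals naming mismatches between the filter set and the agent's actual entries.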