
Utility References

ID

agentlightning.utils.id.generate_id(length)

Generate a random ID of the given length.

Parameters:

  • length (int) –

    The length of the ID to generate.

Returns:

  • str

    A random ID of the given length.
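As an illustration only, a fixed-length random ID could be produced as in the sketch below. The helper name `random_hex_id` and the hexadecimal alphabet are assumptions made for this example; this page does not state which characters `generate_id` draws from.

```python
import secrets

# Assumed alphabet for this sketch; the library's actual alphabet is not documented here.
HEX_ALPHABET = "0123456789abcdef"


def random_hex_id(length: int) -> str:
    """Return a random string of exactly `length` hexadecimal characters."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return "".join(secrets.choice(HEX_ALPHABET) for _ in range(length))


example_id = random_hex_id(12)
print(len(example_id))
```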

Metrics

agentlightning.utils.metrics.MetricsBackend

Abstract base class for metrics backends.

has_prometheus()

Check whether the backend has Prometheus support.

inc_counter(name, amount=1.0, labels=None) async

Increments a registered counter.

Parameters:

  • name (str) –

    Metric name (must be registered as a counter).

  • amount (float, default: 1.0 ) –

    Increment amount.

  • labels (Optional[LabelDict], default: None ) –

    Label values.

Raises:

  • ValueError

    If the metric is not registered, has the wrong type, or label keys do not match the registered label names.

observe_histogram(name, value, labels=None) async

Records an observation for a registered histogram.

Parameters:

  • name (str) –

    Metric name (must be registered as a histogram).

  • value (float) –

    Observed value.

  • labels (Optional[LabelDict], default: None ) –

    Label values.

Raises:

  • ValueError

    If the metric is not registered, has the wrong type, or label keys do not match the registered label names.

register_counter(name, label_names=None, group_level=None)

Registers a counter metric.

Parameters:

  • name (str) –

    Metric name.

  • label_names (Optional[Sequence[str]], default: None ) –

    List of label names. Order determines the truncation priority for group-level logging.

  • group_level (Optional[int], default: None ) –

    Optional per-metric grouping depth for backends that support label grouping (Console). Global backend settings take precedence when provided.

Raises:

  • ValueError

    If the metric is already registered with a different type or label set.

register_histogram(name, label_names=None, buckets=None, group_level=None)

Registers a histogram metric.

Parameters:

  • name (str) –

    Metric name.

  • label_names (Optional[Sequence[str]], default: None ) –

    List of label names. Order determines the truncation priority for group-level logging.

  • buckets (Optional[Sequence[float]], default: None ) –

    Bucket boundaries (exclusive upper bounds). If None, the backend may choose defaults.

  • group_level (Optional[int], default: None ) –

    Optional per-metric grouping depth for backends that support label grouping (Console). Global backend settings take precedence when provided.

Raises:

  • ValueError

    If the metric is already registered with a different type or label set.
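The registration and validation rules above can be pictured with a small standalone sketch. `TinyRegistry` is a hypothetical class written for this example, not part of the library; it only mirrors the documented `ValueError` conditions (unregistered metric, wrong type, mismatched label keys, conflicting re-registration).

```python
from typing import Dict, Optional, Sequence, Tuple


class TinyRegistry:
    """Hypothetical in-memory registry mirroring the documented error rules."""

    def __init__(self) -> None:
        # Maps metric name -> (metric type, registered label names).
        self._metrics: Dict[str, Tuple[str, Tuple[str, ...]]] = {}

    def register(self, name: str, kind: str, label_names: Optional[Sequence[str]] = None) -> None:
        spec = (kind, tuple(label_names or ()))
        existing = self._metrics.get(name)
        # Re-registering with the same type and labels is accepted; a conflict is not.
        if existing is not None and existing != spec:
            raise ValueError(f"{name!r} is already registered as {existing}")
        self._metrics[name] = spec

    def validate(self, name: str, kind: str, labels: Optional[Dict[str, str]] = None) -> None:
        if name not in self._metrics:
            raise ValueError(f"{name!r} is not registered")
        registered_kind, label_names = self._metrics[name]
        if registered_kind != kind:
            raise ValueError(f"{name!r} is a {registered_kind}, not a {kind}")
        if set(labels or {}) != set(label_names):
            raise ValueError(f"labels for {name!r} must use keys {label_names}")


registry = TinyRegistry()
registry.register("requests_total", "counter", ["method", "status"])
registry.validate("requests_total", "counter", {"method": "GET", "status": "200"})
```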

agentlightning.utils.metrics.ConsoleMetricsBackend

Bases: MetricsBackend

Console backend with sliding-window aggregations and label grouping.

This backend:

  • Requires explicit metric registration.
  • Stores timestamped events per (metric_name, labels) key.
  • Computes rate and percentiles (P50, P95, P99) over a sliding time window.
  • Uses a single global logging decision: when logging is triggered, it logs all metric groups, not just the one being updated.

Rate is always per second.
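The sliding-window statistics described above can be sketched as follows. This is a standalone illustration, not the backend's implementation: the helper names, the nearest-rank percentile definition, and counting events (rather than summing amounts) for the rate are all assumptions.

```python
import math
from typing import List, Optional, Tuple

Event = Tuple[float, float]  # (timestamp in seconds, observed value)


def window_events(events: List[Event], now: float, window_seconds: Optional[float]) -> List[Event]:
    """Keep events inside the sliding window; None keeps every stored event."""
    if window_seconds is None:
        return list(events)
    cutoff = now - window_seconds
    return [event for event in events if event[0] >= cutoff]


def rate_per_second(events: List[Event], window_seconds: float) -> float:
    """Number of events divided by the window length, i.e. events per second."""
    return len(events) / window_seconds


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile (one of several common definitions)."""
    if not values:
        raise ValueError("no values to summarize")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


events = [(0.0, 1.0), (50.0, 2.0), (90.0, 3.0), (100.0, 4.0)]
recent = window_events(events, now=100.0, window_seconds=60.0)
values = [value for _, value in recent]
print(rate_per_second(recent, 60.0), percentile(values, 50), percentile(values, 95))
```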

Label grouping: When logging, label dictionaries are truncated to the first group_level label pairs (following the registered label order) and metrics with identical truncated labels are aggregated together. For example:

labels = {"method": "GET", "path": "/", "status": "200"}
group_level = 2  # aggregated labels {"method": "GET", "path": "/"}

If group_level is None or < 1, all label combinations for a metric are merged into a single log entry (equivalent to grouping by zero labels). Individual counters or histograms can set their own group_level during registration; those values apply only when the backend-level group_level is unset, allowing selective overrides.
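The grouping rule can be sketched in a few lines. The function names below are hypothetical and the code is a standalone illustration of the behavior described above: labels are truncated in registered order, a backend-level `group_level` takes precedence over a per-metric one, and `None` or a value below 1 merges everything into one group.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

LabelKey = Tuple[Tuple[str, str], ...]


def effective_group_level(backend_level: Optional[int], metric_level: Optional[int]) -> Optional[int]:
    """The backend-level setting wins; the per-metric value is only a fallback."""
    return backend_level if backend_level is not None else metric_level


def truncate_labels(labels: Dict[str, str], label_names: Sequence[str], group_level: Optional[int]) -> LabelKey:
    """Keep the first `group_level` labels, following the registered label order."""
    if group_level is None or group_level < 1:
        return ()  # every label combination collapses into a single group
    return tuple((name, labels[name]) for name in label_names[:group_level])


def aggregate_counts(
    samples: List[Tuple[Dict[str, str], float]],
    label_names: Sequence[str],
    group_level: Optional[int],
) -> Dict[LabelKey, float]:
    """Sum counter amounts for samples that share the same truncated labels."""
    totals: Dict[LabelKey, float] = defaultdict(float)
    for labels, amount in samples:
        totals[truncate_labels(labels, label_names, group_level)] += amount
    return dict(totals)


label_names = ["method", "path", "status"]
samples = [
    ({"method": "GET", "path": "/", "status": "200"}, 1.0),
    ({"method": "GET", "path": "/", "status": "500"}, 1.0),
    ({"method": "POST", "path": "/", "status": "200"}, 1.0),
]
grouped = aggregate_counts(samples, label_names, effective_group_level(None, 2))
merged = aggregate_counts(samples, label_names, None)
```

With a depth of 2, the two `GET /` samples fall into one group regardless of `status`; with no depth set, all three samples are merged.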

Thread-safety: Runtime updates and snapshotting use two aiologic locks: one for mutating shared state and another that serializes the global logging decision/snapshot capture so other tasks can continue writing. Metric registration happens during initialization, so it is intentionally left lock-free; this assumption is documented here to avoid blocking writes unnecessarily.

__init__(window_seconds=60.0, log_interval_seconds=10.0, group_level=None)

Initializes ConsoleMetricsBackend.

Parameters:

  • window_seconds (Optional[float], default: 60.0 ) –

    Sliding window size (in seconds) used when computing rate and percentiles. If None, all in-memory events are used.

  • log_interval_seconds (float, default: 10.0 ) –

    Minimum time (in seconds) between log bursts. When the interval elapses, the next metric event triggers a snapshot and logging of all metrics.

  • group_level (Optional[int], default: None ) –

    Label grouping depth. When logging, only the first group_level labels (following registered order) are retained and metric events sharing those labels are aggregated. If None or < 1, all label combinations collapse into a single group per metric.

inc_counter(name, amount=1.0, labels=None) async

Increments a registered counter metric.

See base class for behavior and error conditions.

observe_histogram(name, value, labels=None) async

Records an observation for a registered histogram metric.

See base class for behavior and error conditions.

register_counter(name, label_names=None, group_level=None)

Registers a counter metric.

See base class for argument documentation.

register_histogram(name, label_names=None, buckets=None, group_level=None)

Registers a histogram metric.

See base class for argument documentation.

agentlightning.utils.metrics.PrometheusMetricsBackend

Bases: MetricsBackend

Metrics backend that forwards events to prometheus_client.

All metrics must be registered before use. This backend does not compute any aggregations; it only updates Prometheus metrics.

Thread-safety: Registration is protected by a lock. Metric updates assume metrics are registered during initialization and then remain stable.

Due to the nature of Prometheus, this backend is only suitable for recording high-volume metrics. Low-volume metrics might be lost if the event has only appeared once.

__init__()

Initializes PrometheusMetricsBackend.

Raises:

  • ImportError

    If prometheus_client is not installed.

has_prometheus()

Check whether the backend has Prometheus support.

inc_counter(name, amount=1.0, labels=None) async

Increments a registered Prometheus counter.

observe_histogram(name, value, labels=None) async

Records an observation for a registered Prometheus histogram.

register_counter(name, label_names=None, group_level=None)

Registers a Prometheus counter metric.

register_histogram(name, label_names=None, buckets=None, group_level=None)

Registers a Prometheus histogram metric.

agentlightning.utils.metrics.MultiMetricsBackend

Bases: MetricsBackend

Metrics backend that forwards calls to multiple underlying backends.

__init__(backends)

Initializes MultiMetricsBackend.

Parameters:

  • backends (Sequence[MetricsBackend]) –

    Sequence of underlying backends.

Raises:

  • ValueError

    If no backends are provided.
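The forwarding pattern can be illustrated with stand-in classes. `RecordingBackend` and `FanOutBackend` are hypothetical names invented for this sketch; they are not the library's classes, and the sequential dispatch order shown here is an assumption.

```python
import asyncio
from typing import Dict, List, Optional, Sequence, Tuple


class RecordingBackend:
    """Hypothetical stand-in backend that only records counter increments."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, float, Optional[Dict[str, str]]]] = []

    async def inc_counter(self, name: str, amount: float = 1.0, labels: Optional[Dict[str, str]] = None) -> None:
        self.calls.append((name, amount, labels))


class FanOutBackend:
    """Hypothetical wrapper that forwards each call to every underlying backend."""

    def __init__(self, backends: Sequence[RecordingBackend]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)

    async def inc_counter(self, name: str, amount: float = 1.0, labels: Optional[Dict[str, str]] = None) -> None:
        # Forward one backend at a time; a real implementation may dispatch differently.
        for backend in self._backends:
            await backend.inc_counter(name, amount, labels)


first, second = RecordingBackend(), RecordingBackend()
fan_out = FanOutBackend([first, second])
asyncio.run(fan_out.inc_counter("requests_total", 2.0, {"method": "GET"}))
```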

has_prometheus()

Check whether the backend has Prometheus support.

inc_counter(name, amount=1.0, labels=None) async

Increments a counter metric in all underlying backends.

observe_histogram(name, value, labels=None) async

Records a histogram observation in all underlying backends.

register_counter(name, label_names=None, group_level=None)

Registers a counter metric in all underlying backends.

register_histogram(name, label_names=None, buckets=None, group_level=None)

Registers a histogram metric in all underlying backends.

agentlightning.utils.metrics.setup_multiprocess_prometheus()

Set up the Prometheus multiprocessing directory if it is not already configured.

agentlightning.utils.metrics.get_prometheus_registry()

Get the appropriate Prometheus registry based on the multiprocessing configuration.

agentlightning.utils.metrics.shutdown_metrics(server=None, worker=None, *args, **kwargs)

Shut down Prometheus metrics.

Server Launcher

agentlightning.utils.server_launcher.PythonServerLauncher

Unified launcher for FastAPI, using uvicorn or gunicorn per mode/worker count.

See PythonServerLauncherArgs for configuration options.

Parameters:

  • app (FastAPI) –

    The FastAPI app to launch.

  • args (PythonServerLauncherArgs) –

    The configuration for the server.

  • serve_context (Optional[AsyncContextManager[Any]], default: None ) –

    An optional context manager to apply around the server startup.

access_endpoint property

Return a loopback-friendly URL so health checks succeed even when binding to 0.0.0.0.

endpoint property

Return the externally advertised host:port pair regardless of accessibility.

health_url property

Build the absolute health-check endpoint from args, if one is configured.

__getstate__()

Control pickling to prevent server state from being sent to subprocesses.

__init__(app, args, serve_context=None)

Initialize the launcher with the FastAPI app, configuration, and optional serve context.

is_running()

Return True if the server has been started and not yet stopped.

reload() async

Restart the server by stopping it if necessary and invoking start again.

run_forever() async

Start the server and block the caller until it exits, respecting the configured mode.

start() async

Starts the server according to launch_mode and n_workers.

stop() async

Stop the server using the inverse of whatever launch mode was used to start it.

agentlightning.utils.server_launcher.PythonServerLauncherArgs dataclass

access_host = None class-attribute instance-attribute

The hostname or IP address to advertise to the client. If not provided, the server will use the default outbound IPv4 address for this machine.

access_log = False class-attribute instance-attribute

Whether to turn on access logs.

healthcheck_url = None class-attribute instance-attribute

The health check URL to use. If not provided, the server will not be checked for healthiness after starting.

host = None class-attribute instance-attribute

The hostname or IP address to bind the server to.

kill_unhealthy_server = True class-attribute instance-attribute

Whether to kill the server if it is not healthy after startup. This setting is ignored when launch_mode is not asyncio.

launch_mode = 'asyncio' class-attribute instance-attribute

The launch mode. asyncio is the default mode and runs the server in the current thread. thread runs the server in a separate thread. mp runs the server in a separate process.

log_level = logging.INFO class-attribute instance-attribute

The log level to use.

n_workers = 1 class-attribute instance-attribute

The number of workers to run in the server. Only applicable for mp mode. When n_workers > 1, the server will be run using Gunicorn.

port = None class-attribute instance-attribute

The TCP port to listen on. If not provided, the server will use a random available port.

process_join_timeout = 10.0 class-attribute instance-attribute

The timeout to wait for the process to join.

startup_timeout = 60.0 class-attribute instance-attribute

The timeout to wait for the server to start up.

thread_join_timeout = 10.0 class-attribute instance-attribute

The timeout to wait for the thread to join.

timeout_keep_alive = 30 class-attribute instance-attribute

The timeout to keep the connection alive.
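When `port` is left unset, the launcher is documented to pick a random available port. One common way to obtain such a port is to bind to port 0 and let the operating system choose, as sketched below. The helper `find_free_port` is hypothetical and is not claimed to be how the launcher does it.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        # getsockname() returns (address, port) for IPv4 sockets.
        return sock.getsockname()[1]


port = find_free_port()
print(port)
```

The port is released as soon as the socket closes, so another process could claim it before the server binds; callers that need a guarantee should keep the socket open and hand it to the server instead.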

agentlightning.utils.server_launcher.LaunchMode = Literal['asyncio', 'thread', 'mp'] module-attribute

The launch mode for the server.

OpenTelemetry

agentlightning.utils.otel.full_qualified_name(obj)

agentlightning.utils.otel.get_tracer_provider(inspect=True)

Get the OpenTelemetry tracer provider configured for Agent Lightning.

Parameters:

  • inspect (bool, default: True ) –

    Whether to inspect the tracer provider and log its configuration. When enabled, also set the logger level to DEBUG to see the logs.

agentlightning.utils.otel.get_tracer(use_active_span_processor=True)

Resolve the OpenTelemetry tracer configured for Agent Lightning.

Parameters:

  • use_active_span_processor (bool, default: True ) –

    Whether to use the active span processor.

Returns:

  • Tracer

    OpenTelemetry tracer tagged with the agentlightning instrumentation name.

Raises:

  • RuntimeError

    If OpenTelemetry was not initialized before calling this helper.

agentlightning.utils.otel.make_tag_attributes(tags)

Convert a list of tags into flattened attributes for span tagging.

No syntax is enforced for tags; they are plain strings. For example:

["gen_ai.model:gpt-4", "reward.extrinsic"]

agentlightning.utils.otel.extract_tags_from_attributes(attributes)

Extract tag attributes from flattened span attributes.

Parameters:

  • attributes (Dict[str, Any]) –

    A dictionary of flattened span attributes.

Convert a dictionary of links into flattened attributes for span linking.

Links example:

{
    "gen_ai.response.id": "response-123",
    "span_id": "abcd-efgh-ijkl",
}

agentlightning.utils.otel.query_linked_spans(spans, links)

Query spans that are linked by the given link attributes.

Parameters:

  • spans (Sequence[T_SpanLike]) –

    A sequence of spans to search.

  • links (List[LinkPydanticModel]) –

    A list of link attributes to match.

Returns:

  • List[T_SpanLike]

    A list of spans that match the given link attributes.

Extract link attributes from flattened span attributes.

Parameters:

  • attributes (Dict[str, Any]) –

    A dictionary of flattened span attributes.

agentlightning.utils.otel.filter_attributes(attributes, prefix)

Filter attributes that start with the given prefix.

An attribute is included only if its key is exactly prefix or starts with prefix followed by a dot (prefix.).

Parameters:

  • attributes (Dict[str, Any]) –

    A dictionary of span attributes.

  • prefix (str) –

    The prefix to filter by.

Returns:

  • Dict[str, Any]

    A dictionary of attributes that start with the given prefix.
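The matching rule can be sketched as below. `filter_by_prefix` is a hypothetical standalone helper written to illustrate the documented rule; it is not the library function itself.

```python
from typing import Any, Dict


def filter_by_prefix(attributes: Dict[str, Any], prefix: str) -> Dict[str, Any]:
    """Keep keys equal to `prefix` or starting with `prefix` plus a dot."""
    dotted = prefix + "."
    return {
        key: value
        for key, value in attributes.items()
        if key == prefix or key.startswith(dotted)
    }


attributes = {
    "reward": 1,
    "reward.value": 0.5,
    "rewards.total": 3,  # shares the leading characters but is a different name
    "gen_ai.model": "gpt-4",
}
selected = filter_by_prefix(attributes, "reward")
```

Requiring the dot boundary is what keeps `rewards.total` out of the result even though it begins with the same characters.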

agentlightning.utils.otel.filter_and_unflatten_attributes(attributes, prefix)

Filter attributes that start with the given prefix and unflatten them. The prefix will be removed during unflattening.

Parameters:

  • attributes (Dict[str, Any]) –

    A dictionary of span attributes.

  • prefix (str) –

    The prefix to filter by.

Returns:

  • Union[Dict[str, Any], List[Any]]

    A nested dictionary or list of attributes that start with the given prefix.

agentlightning.utils.otel.flatten_attributes(nested_data, *, expand_leaf_lists=False)

Flatten a nested dictionary or list into a flat dictionary with dotted keys.

This function recursively traverses dictionaries and lists, producing a flat key-value mapping where nested paths are represented via dot-separated keys. Lists are indexed numerically.

Example:

>>> flatten_attributes({"a": {"b": 1, "c": [2, 3]}}, expand_leaf_lists=True)
{"a.b": 1, "a.c.0": 2, "a.c.1": 3}

Parameters:

  • nested_data (Union[Dict[str, Any], List[Any]]) –

    A nested structure composed of dictionaries, lists, or primitive values.

  • expand_leaf_lists (bool, default: False ) –

    Whether to expand lists composed only of primitive values. When False (the default), lists of str/int/float/bool are treated as leaf values and stored without enumerating their indices.

Returns:

  • Dict[str, Any]

    A flat dictionary mapping dotted-string paths to primitive values.
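To make the role of expand_leaf_lists concrete, here is a standalone sketch of the flattening behavior described above. The function `flatten` is a hypothetical re-implementation written for illustration; details such as how empty lists are handled are assumptions and may differ from the library.

```python
from typing import Any, Dict, List, Union

PRIMITIVES = (str, int, float, bool)


def flatten(nested: Union[Dict[str, Any], List[Any]], *, expand_leaf_lists: bool = False) -> Dict[str, Any]:
    """Flatten nested dicts and lists into a dict with dot-separated keys."""
    flat: Dict[str, Any] = {}

    def walk(node: Any, path: str) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}" if path else str(key))
        elif isinstance(node, list):
            primitive_only = all(isinstance(item, PRIMITIVES) for item in node)
            if primitive_only and not expand_leaf_lists:
                flat[path] = node  # keep a primitive-only list as a single leaf value
            else:
                for index, item in enumerate(node):
                    walk(item, f"{path}.{index}" if path else str(index))
        else:
            flat[path] = node

    walk(nested, "")
    return flat


data = {"a": {"b": 1, "c": [2, 3]}}
expanded = flatten(data, expand_leaf_lists=True)
compact = flatten(data)
```

In this sketch, `expanded` enumerates the list indices, while `compact` keeps `[2, 3]` under the single key `a.c`.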

agentlightning.utils.otel.unflatten_attributes(flat_data)

Reconstruct a nested dictionary/list structure from a flat dictionary.

Keys are dot-separated paths. Segments that are digit strings will only become list indices if all keys in that dict form a consecutive 0..n-1 range. Otherwise they remain dict keys.

Example:

>>> unflatten_attributes({"a.b": 1, "a.c.0": 2, "a.c.1": 3})
{"a": {"b": 1, "c": [2, 3]}}

Parameters:

  • flat_data (Dict[str, Any]) –

    A dictionary whose keys are dot-separated paths and whose values are primitive data elements.

Returns:

  • Union[Dict[str, Any], List[Any]]

    A nested dictionary (and lists where appropriate) corresponding to the flattened structure.
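The list-versus-dict rule can be illustrated with a standalone sketch. `unflatten` below is a hypothetical re-implementation, not the library code; it assumes no key is both a leaf and a prefix of another key.

```python
from typing import Any, Dict, List, Union


def unflatten(flat: Dict[str, Any]) -> Union[Dict[str, Any], List[Any]]:
    """Rebuild nested dicts and lists from dot-separated keys."""
    root: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        parts = dotted_key.split(".")
        node = root
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value

    def listify(node: Any) -> Any:
        if not isinstance(node, dict):
            return node
        converted = {key: listify(child) for key, child in node.items()}
        keys = list(converted)
        # Only canonical digit keys ("0", "1", ...) qualify as list indices.
        if keys and all(k.isdecimal() and str(int(k)) == k for k in keys):
            indices = sorted(int(k) for k in keys)
            if indices == list(range(len(indices))):
                return [converted[str(i)] for i in indices]
        return converted

    return listify(root)


nested = unflatten({"a.b": 1, "a.c.0": 2, "a.c.1": 3})
sparse = unflatten({"a.0": "x", "a.2": "y"})
```

Here `sparse` keeps `"0"` and `"2"` as dictionary keys because the indices do not form a consecutive range starting at 0.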

agentlightning.utils.otel.sanitize_attribute_value(object, force=True)

Sanitize an attribute value to be a valid OpenTelemetry attribute value.

agentlightning.utils.otel.sanitize_attributes(attributes, force=True)

Sanitize a dictionary of attributes so that it contains only valid OpenTelemetry attributes.

Parameters:

  • attributes (Dict[str, Any]) –

    A dictionary of attributes to sanitize.

  • force (bool, default: True ) –

    Whether to force sanitization even when the value is not JSON serializable.

agentlightning.utils.otel.sanitize_list_attribute_sanity(maybe_list)

Try to sanitize a list of attributes to be a valid OpenTelemetry attribute value.

Raises an error if the list contains multiple types of primitive values.
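OpenTelemetry list attributes must be homogeneous: every element has to be the same primitive type. A minimal standalone check might look like the sketch below. The helper name `ensure_homogeneous` and the choice of `ValueError` are assumptions for illustration; this page does not state which exception the library raises.

```python
from typing import Any, List

PRIMITIVE_TYPES = (str, bool, int, float)


def ensure_homogeneous(values: List[Any]) -> List[Any]:
    """Return the list unchanged if all items share one primitive type."""
    kinds = {type(value) for value in values}
    if any(kind not in PRIMITIVE_TYPES for kind in kinds):
        raise ValueError("list items must be str, bool, int, or float")
    if len(kinds) > 1:
        # type() keeps bool and int distinct even though bool subclasses int.
        raise ValueError("list mixes multiple primitive types")
    return values


checked = ensure_homogeneous(["gen_ai.model:gpt-4", "reward.extrinsic"])
```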

agentlightning.utils.otel.check_attributes_sanity(attributes)

Check whether a dictionary of attributes contains only valid OpenTelemetry attributes.

agentlightning.utils.otel.format_exception_attributes(exception)

Format an exception into a dictionary of attributes.

OTLP

agentlightning.utils.otlp.handle_otlp_export(request, request_message_cls, response_message_cls, message_callback, signal_name) async

Generic handler for /v1/traces, /v1/metrics, /v1/logs.

Convert the OTLP Protobuf request to a JSON-like object.

agentlightning.utils.otlp.spans_from_proto(request, sequence_id_bulk_issuer) async

Parse an OTLP proto payload into List[Span].

A store is needed here for generating a sequence ID for each span.

System Snapshot

agentlightning.utils.system_snapshot.system_snapshot(include_gpu=False)

Capture a snapshot of the system's hardware and software information.

Parameters:

  • include_gpu (bool, default: False ) –

    Whether to include GPU information.

Returns:

  • Dict[str, Any]

    A dictionary containing the system's hardware and software information.