APO

Shortcut

You can use the shortcut agl.APO(...) to create an APO instance.

import agentlightning as agl

agl.APO(...)

Installation

pip install agentlightning[apo]

Scope of Current Implementation

APO is currently scoped to optimize a single prompt template. Optimizing multiple prompt templates is not supported yet.

There is, however, no restriction on the number of variable placeholders in the prompt template (anywhere from zero to many). Invalid prompts may be produced during the optimization process, so it is up to the agent developer to ensure that the template remains valid for the agent's task.
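
For example, both of the following are acceptable as the single optimized template; whether the second renders into a sensible prompt for the agent's task is the developer's responsibility:

import agentlightning as agl

# Zero placeholders
plain = agl.PromptTemplate(template="You are a helpful assistant.", engine="f-string")

# Multiple placeholders
templated = agl.PromptTemplate(
    template="You are a {role}. Answer the question about {topic} concisely.",
    engine="f-string",
)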

Initial Prompt

APO expects the initial prompt to be provided in the initial_resources dictionary. This can be done in one of two ways:

  1. Pass to the Trainer constructor:
trainer = agl.Trainer(
    algorithm=agl.APO(...),
    initial_resources={"main_prompt": agl.PromptTemplate(template="You are a helpful assistant.", engine="f-string")},
)
  2. Pass to the APO.set_initial_resources() method:
algo = agl.APO(...)
algo.set_initial_resources(
    {"this_is_also_valid_key": agl.PromptTemplate(template="You are a helpful assistant.", engine="f-string")}
)

The resource key can be arbitrary; it is used to identify the prompt template in class-based agent implementations when multiple resources are present. If the key changes, the agent developer needs to update it wherever the rollout method looks the resource up, as in the sketch below.
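
A minimal sketch of how an agent might read the prompt by its resource key inside a rollout; the LitAgent base class, the training_rollout signature, and the .template attribute are assumptions here for illustration, not part of the APO reference below:

import agentlightning as agl

class MyAgent(agl.LitAgent):
    def training_rollout(self, task, rollout_id, resources):
        # Look up the prompt under the same key used in initial_resources.
        # If that key changes, this lookup must change with it.
        prompt = resources["main_prompt"]
        system_message = prompt.template  # assumed attribute, for illustration
        # ... run the task with system_message and compute a reward ...
        return 0.0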

Tutorials Using APO

References

agentlightning.algorithm.apo

APO

Bases: Algorithm, Generic[T_task]

Automatic Prompt Optimization (APO) algorithm using textual gradients and beam search.

APO is an iterative prompt optimization algorithm that uses LLM-generated textual gradients to improve prompts through a beam search process. It evaluates prompts on rollouts, computes critiques based on the results, and applies edits to generate improved prompts.

The algorithm operates in rounds, where each round:

  1. Samples parent prompts from the current beam
  2. Generates new prompts by computing textual gradients and applying edits
  3. Evaluates all candidates on a validation set
  4. Selects the top-k prompts for the next round

Based on the ideas from:

  • ProTeGi: https://aclanthology.org/2023.emnlp-main.494.pdf
  • TextGrad: https://github.com/zou-group/textgrad
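
To make the round structure above concrete, here is a toy, self-contained sketch of a beam search with the same shape. The helper names expand and score are stand-ins for "textual gradient + edit" and "validation reward"; this is illustrative only and does not mirror the library's internals:

import random
from typing import Callable

def beam_search_sketch(
    seed_prompt: str,
    expand: Callable[[str], str],   # stand-in for textual gradient + edit
    score: Callable[[str], float],  # stand-in for validation-set reward
    beam_width: int = 4,
    branch_factor: int = 4,
    beam_rounds: int = 3,
) -> str:
    """Toy illustration of the round structure, not the library's implementation."""
    beam = [seed_prompt]
    best_score, best_prompt = score(seed_prompt), seed_prompt
    for _ in range(beam_rounds):
        # Expand: every parent in the beam spawns branch_factor edited candidates.
        candidates = list(beam) + [expand(p) for p in beam for _ in range(branch_factor)]
        # Evaluate all candidates and keep the top beam_width for the next round.
        scored = sorted(((score(c), c) for c in candidates), reverse=True)
        beam = [c for _, c in scored[:beam_width]]
        if scored[0][0] > best_score:
            best_score, best_prompt = scored[0]
    return best_prompt

# Toy usage: "expand" appends a random instruction, "score" is just prompt length.
print(beam_search_sketch(
    "You are a helpful assistant.",
    expand=lambda p: p + random.choice([" Be concise.", " Think step by step."]),
    score=lambda p: float(len(p)),
))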

__init__(async_openai_client, *, gradient_model='gpt-5-mini', apply_edit_model='gpt-4.1-mini', diversity_temperature=1.0, gradient_batch_size=4, val_batch_size=16, beam_width=4, branch_factor=4, beam_rounds=3, rollout_batch_timeout=3600.0, run_initial_validation=True, _poml_trace=False)

Initialize the APO algorithm with configuration parameters.

Parameters:

  • async_openai_client (AsyncOpenAI) –

    AsyncOpenAI client for making LLM API calls.

  • gradient_model (str, default: 'gpt-5-mini' ) –

    Model name for computing textual gradients (critiques).

  • apply_edit_model (str, default: 'gpt-4.1-mini' ) –

    Model name for applying edits based on critiques.

  • diversity_temperature (float, default: 1.0 ) –

    Temperature parameter for LLM calls to control diversity.

  • gradient_batch_size (int, default: 4 ) –

    Number of rollout results to sample for gradient computation.

  • val_batch_size (int, default: 16 ) –

    Number of validation examples to use for evaluation.

  • beam_width (int, default: 4 ) –

    Number of top-scoring prompts to keep in the beam at each round.

  • branch_factor (int, default: 4 ) –

    Number of new prompt candidates to generate from each parent prompt by applying textual gradient edits. This controls the expansion of the search tree.

  • beam_rounds (int, default: 3 ) –

    Number of beam search rounds to perform.

  • rollout_batch_timeout (float, default: 3600.0 ) –

    Maximum time in seconds to wait for rollout batch completion.

  • run_initial_validation (bool, default: True ) –

    If True, runs validation on the seed prompt before starting optimization to establish a baseline score. Defaults to True.
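
For example, an APO instance can be constructed with the documented defaults spelled out explicitly; only the AsyncOpenAI client is a required argument (the client below reads OPENAI_API_KEY from the environment):

from openai import AsyncOpenAI

import agentlightning as agl

algo = agl.APO(
    AsyncOpenAI(),
    gradient_model="gpt-5-mini",
    apply_edit_model="gpt-4.1-mini",
    gradient_batch_size=4,
    val_batch_size=16,
    beam_width=4,
    branch_factor=4,
    beam_rounds=3,
)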

compute_textual_gradient(current_prompt, rollout_results, *, prefix=None) async

Compute a textual gradient (critique) for the current prompt based on rollout results.

This method samples rollout results, sends them to an LLM along with the current prompt, and generates a critique describing how the prompt could be improved.

Parameters:

  • current_prompt (VersionedPromptTemplate) –

    The prompt template to critique.

  • rollout_results (List[RolloutResultForAPO]) –

    List of rollout results containing spans, messages, and rewards.

Returns:

  • Optional[str]

    A textual critique generated by the LLM, or None if generation fails.

evaluate_prompt_on_batch(prompt, resource_name, dataset, mode, *, prefix=None) async

Evaluate a prompt on a batch of tasks by running rollouts and computing average reward.

This method:

  1. Adds the prompt as a named resource to the store
  2. Enqueues rollouts for each task in the dataset
  3. Waits for rollouts to complete (with timeout)
  4. Computes and returns the average reward

Parameters:

  • prompt (VersionedPromptTemplate) –

    The prompt template to evaluate.

  • resource_name (str) –

    The name to register the prompt under in the store.

  • dataset (Sequence[T_task]) –

    Sequence of tasks to evaluate the prompt on.

  • mode (RolloutMode) –

    Rollout mode ("train" or "val") for logging/tracking.

Returns:

  • Tuple[List[RolloutResultForAPO], float]

    A tuple of (rollout_results, average_reward), where rollout_results contains detailed information for each rollout and average_reward is the mean final reward.

get_adapter()

Get the adapter for converting spans to messages.

Returns:

  • TraceToMessages

    The adapter used to convert spans to messages.

Raises:

  • ValueError

    If the adapter is not a TraceToMessages.

get_best_prompt()

Retrieve the best prompt discovered during optimization.

Returns:

  • PromptTemplate

    The prompt template with the highest validation score found so far.

Raises:

  • ValueError

    If no best prompt has been found yet (run() not called).
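
For instance, continuing from the algo instance constructed above, the optimized prompt can be read back once optimization has finished (the .template attribute access is assumed for illustration):

best = algo.get_best_prompt()  # raises ValueError if run() has not completed
print(best.template)           # .template attribute assumed for illustration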

get_rollout_results(rollout, *, prefix=None) async

Convert completed rollouts to APO-compatible result format.

Fetches spans for each rollout, adapts them to messages, and packages them with rewards and status information for gradient computation.

Parameters:

  • rollout (List[Rollout]) –

    List of completed rollout metadata.

Returns:

  • List[RolloutResultForAPO]

    List of rollout results formatted for APO processing.

get_seed_prompt_template()

Extract the initial prompt template from the algorithm's resources.

Returns:

  • Tuple[str, PromptTemplate]

    A tuple of (resource_name, prompt_template) representing the seed prompt.

Raises:

  • ValueError

    If initial_resources is not set or no PromptTemplate is found.
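
For example, the seed prompt can be inspected before optimization starts, assuming initial resources have already been set on the algorithm:

name, seed = algo.get_seed_prompt_template()
print(name, seed.template)  # .template attribute assumed for illustration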

run(train_dataset=None, val_dataset=None) async

Execute the APO algorithm to optimize prompts through beam search with textual gradients.

The algorithm performs iterative prompt optimization over multiple rounds:

  • Each round samples parent prompts, generates new candidates via textual gradients, evaluates all candidates on validation data, and keeps the top performers
  • Tracks the historically best prompt across all rounds
  • Uses different training data samples for each gradient computation to ensure diversity

Parameters:

  • train_dataset (Optional[Dataset[T_task]], default: None ) –

    Dataset of tasks for computing textual gradients. Required.

  • val_dataset (Optional[Dataset[T_task]], default: None ) –

    Dataset of tasks for evaluating and selecting prompts. Required.

Raises:

  • ValueError

    If train_dataset or val_dataset is None, or if resources are not set.
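
run() is normally driven by the Trainer, but as a rough sketch it can also be awaited directly once the algorithm has been wired to a store and runners; train_tasks and val_tasks below are placeholders for your datasets:

import asyncio

async def optimize():
    await algo.run(train_dataset=train_tasks, val_dataset=val_tasks)
    return algo.get_best_prompt()

best_prompt = asyncio.run(optimize())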

textual_gradient_and_apply_edit(current_prompt, rollout, *, prefix=None) async

Generate an improved prompt by computing a textual gradient and applying an edit.

This is the main optimization step that:

  1. Computes a critique (textual gradient) based on rollout performance
  2. Uses another LLM to apply the critique and generate an improved prompt

Parameters:

  • current_prompt (VersionedPromptTemplate) –

    The current prompt template to improve.

  • rollout (List[RolloutResultForAPO]) –

    List of rollout results to base the critique on.

Returns:

  • Optional[str]

    The improved prompt text, or the original prompt if gradient computation fails.
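
As a rough sketch of how this composes with get_rollout_results() inside one optimization step, run from an async context; completed_rollouts and current_prompt are placeholders that the run() loop would normally manage:

rollout_results = await algo.get_rollout_results(completed_rollouts)
improved = await algo.textual_gradient_and_apply_edit(current_prompt, rollout_results)
if improved is not None:
    print(improved)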