APO¶
Shortcut
You can use the shortcut agl.APO(...) to create an APO instance.
Scope of Current Implementation¶
APO is currently scoped to optimizing a single prompt template; optimizing multiple prompt templates is not yet supported.
There is, however, no restriction on the number of variable placeholders in the prompt template (anywhere from zero to many). Invalid prompts may be created during the optimization process, so it is up to the agent developer to ensure that the prompt template remains valid for the agent's task.
Initial Prompt¶
APO expects the initial prompt to be provided in the initial_resources dictionary. This can be done in one of two ways:

- Pass it to the Trainer constructor:

```python
trainer = agl.Trainer(
    algorithm=agl.APO(...),
    initial_resources={
        "main_prompt": agl.PromptTemplate(template="You are a helpful assistant.", engine="f-string")
    },
)
```

- Pass it to the [APO][agentlightning.algorithm.apo.APO].set_initial_resources() method:

```python
algo = agl.APO(...)
algo.set_initial_resources(
    {"this_is_also_valid_key": agl.PromptTemplate(template="You are a helpful assistant.", engine="f-string")}
)
```
The resource key can be arbitrary; it is used to identify the prompt template in class-based implementations when you have multiple resources. When the key changes, the agent developer needs to update the key in the rollout method accordingly.
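To illustrate how the key ties the algorithm's resources to the agent, the sketch below mimics the lookup with a plain dictionary. The `resources` dict and `rollout` function here are hypothetical stand-ins for illustration only, not the actual agentlightning API:

```python
# Hypothetical stand-in for the named resources passed to the algorithm.
resources = {"main_prompt": "You are a helpful assistant."}

def rollout(task: str, resources: dict) -> str:
    # The agent looks up the prompt template by the same key used in
    # initial_resources; if the key changes there, it must change here too.
    prompt = resources["main_prompt"]
    return f"{prompt} Task: {task}"

print(rollout("summarize", resources))
```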
Tutorials Using APO¶
- Train the First Agent with APO - A step-by-step guide to training your first agent using APO.
References¶
agentlightning.algorithm.apo¶

APO¶

Bases: Algorithm, Generic[T_task]
Automatic Prompt Optimization (APO) algorithm using textual gradients and beam search.
APO is an iterative prompt optimization algorithm that uses LLM-generated textual gradients to improve prompts through a beam search process. It evaluates prompts on rollouts, computes critiques based on the results, and applies edits to generate improved prompts.
The algorithm operates in rounds, where each round:

1. Samples parent prompts from the current beam
2. Generates new prompts by computing textual gradients and applying edits
3. Evaluates all candidates on a validation set
4. Selects the top-k prompts for the next round
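The round structure can be sketched as a generic beam search. The `propose` and `score` functions below are stubs standing in for "compute textual gradient + apply edit" and "evaluate on a validation batch"; they are not the library's implementation:

```python
import random

def propose(prompt: str) -> str:
    # Stub for "compute textual gradient and apply edit": append a marker
    # so each child differs from its parent.
    return prompt + "*"

def score(prompt: str) -> float:
    # Stub for evaluating a prompt on a validation batch.
    return len(prompt) + random.random()

def beam_search(seed: str, beam_width: int = 4, branch_factor: int = 4, rounds: int = 3) -> str:
    beam = [seed]
    best = seed
    for _ in range(rounds):
        # Steps 1-2: expand each parent into branch_factor candidates.
        candidates = beam + [propose(p) for p in beam for _ in range(branch_factor)]
        # Steps 3-4: evaluate all candidates, keep the top-k for the next round.
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
        # Track the historically best prompt across rounds.
        best = max([best] + beam, key=score)
    return best
```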
Based on the ideas from:

- ProTeGi: https://aclanthology.org/2023.emnlp-main.494.pdf
- TextGrad: https://github.com/zou-group/textgrad
__init__(async_openai_client, *, gradient_model='gpt-5-mini', apply_edit_model='gpt-4.1-mini', diversity_temperature=1.0, gradient_batch_size=4, val_batch_size=16, beam_width=4, branch_factor=4, beam_rounds=3, rollout_batch_timeout=3600.0, run_initial_validation=True, _poml_trace=False)
¶
Initialize the APO algorithm with configuration parameters.
Parameters:

- async_openai_client (AsyncOpenAI) – AsyncOpenAI client for making LLM API calls.
- gradient_model (str, default: 'gpt-5-mini') – Model name for computing textual gradients (critiques).
- apply_edit_model (str, default: 'gpt-4.1-mini') – Model name for applying edits based on critiques.
- diversity_temperature (float, default: 1.0) – Temperature parameter for LLM calls to control diversity.
- gradient_batch_size (int, default: 4) – Number of rollout results to sample for gradient computation.
- val_batch_size (int, default: 16) – Number of validation examples to use for evaluation.
- beam_width (int, default: 4) – Number of top-scoring prompts to keep in the beam at each round.
- branch_factor (int, default: 4) – Number of new prompt candidates to generate from each parent prompt by applying textual gradient edits. This controls the expansion of the search tree.
- beam_rounds (int, default: 3) – Number of beam search rounds to perform.
- rollout_batch_timeout (float, default: 3600.0) – Maximum time in seconds to wait for rollout batch completion.
- run_initial_validation (bool, default: True) – If True, runs validation on the seed prompt before starting optimization to establish a baseline score.
compute_textual_gradient(current_prompt, rollout_results, *, prefix=None)
async
¶
Compute a textual gradient (critique) for the current prompt based on rollout results.
This method samples rollout results, sends them to an LLM along with the current prompt, and generates a critique describing how the prompt could be improved.
Parameters:

- current_prompt (VersionedPromptTemplate) – The prompt template to critique.
- rollout_results (List[RolloutResultForAPO]) – List of rollout results containing spans, messages, and rewards.

Returns:

- Optional[str] – A textual critique generated by the LLM, or None if generation fails.
evaluate_prompt_on_batch(prompt, resource_name, dataset, mode, *, prefix=None)
async
¶
Evaluate a prompt on a batch of tasks by running rollouts and computing average reward.
This method:

1. Adds the prompt as a named resource to the store
2. Enqueues rollouts for each task in the dataset
3. Waits for rollouts to complete (with timeout)
4. Computes and returns the average reward
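A minimal sketch of step 4, averaging the final rewards over a batch of rollout results. The `RolloutResult` dataclass is a hypothetical stand-in for RolloutResultForAPO, and skipping rollouts without a reward is a sketch-level choice, not necessarily what the library does:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RolloutResult:
    # Hypothetical stand-in for RolloutResultForAPO.
    final_reward: Optional[float]

def average_reward(results: List[RolloutResult]) -> float:
    # Ignore rollouts that finished without a reward (sketch-level choice).
    rewards = [r.final_reward for r in results if r.final_reward is not None]
    return sum(rewards) / len(rewards) if rewards else 0.0
```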
Parameters:

- prompt (VersionedPromptTemplate) – The prompt template to evaluate.
- resource_name (str) – The name to register the prompt under in the store.
- dataset (Sequence[T_task]) – Sequence of tasks to evaluate the prompt on.
- mode (RolloutMode) – Rollout mode ("train" or "val") for logging/tracking.

Returns:

- Tuple[List[RolloutResultForAPO], float] – A tuple of (rollout_results, average_reward), where rollout_results contains detailed information for each rollout and average_reward is the mean final reward.
get_adapter()
¶
Get the adapter for converting spans to messages.
Returns:

- TraceToMessages – The TraceToMessages instance for this algorithm.

Raises:

- ValueError – If the adapter is not a TraceToMessages.
get_best_prompt()
¶
Retrieve the best prompt discovered during optimization.
Returns:

- PromptTemplate – The prompt template with the highest validation score found so far.

Raises:

- ValueError – If no best prompt has been found yet (run() not called).
get_rollout_results(rollout, *, prefix=None)
async
¶
Convert completed rollouts to APO-compatible result format.
Fetches spans for each rollout, adapts them to messages, and packages them with rewards and status information for gradient computation.
Parameters:

- rollout (List[Rollout]) – List of completed rollout metadata.

Returns:

- List[RolloutResultForAPO] – List of rollout results formatted for APO processing.
get_seed_prompt_template()
¶
Extract the initial prompt template from the algorithm's resources.
Returns:

- Tuple[str, PromptTemplate] – A tuple of (resource_name, prompt_template) representing the seed prompt.

Raises:

- ValueError – If initial_resources is not set or no PromptTemplate is found.
run(train_dataset=None, val_dataset=None)
async
¶
Execute the APO algorithm to optimize prompts through beam search with textual gradients.
The algorithm performs iterative prompt optimization over multiple rounds:

- Each round: samples parent prompts, generates new candidates via textual gradients, evaluates all candidates on validation data, and keeps the top performers
- Tracks the historically best prompt across all rounds
- Uses different training data samples for each gradient computation to ensure diversity
Parameters:

- train_dataset (Optional[Dataset[T_task]], default: None) – Dataset of tasks for computing textual gradients. Required.
- val_dataset (Optional[Dataset[T_task]], default: None) – Dataset of tasks for evaluating and selecting prompts. Required.

Raises:

- ValueError – If train_dataset or val_dataset is None, or if resources are not set.
textual_gradient_and_apply_edit(current_prompt, rollout, *, prefix=None)
async
¶
Generate an improved prompt by computing a textual gradient and applying an edit.
This is the main optimization step that:

1. Computes a critique (textual gradient) based on rollout performance
2. Uses another LLM to apply the critique and generate an improved prompt
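The two-step flow can be sketched as follows. Both `critique_llm` and `edit_llm` are stubs standing in for the two LLM calls (gradient model and apply-edit model); the function names and fallback behavior are illustrative assumptions, not the library's code:

```python
def critique_llm(prompt: str, transcripts: list) -> str:
    # Stub for the gradient model: in reality an LLM reads the prompt
    # plus sampled rollout transcripts and returns a critique.
    return "Be more specific about the output format."

def edit_llm(prompt: str, critique: str) -> str:
    # Stub for the apply-edit model: an LLM rewrites the prompt
    # according to the critique.
    return prompt + " Respond in JSON."

def textual_gradient_and_apply_edit(prompt: str, transcripts: list) -> str:
    critique = critique_llm(prompt, transcripts)
    if not critique:
        # Fall back to the original prompt if gradient computation fails.
        return prompt
    return edit_llm(prompt, critique)
```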
Parameters:

- current_prompt (VersionedPromptTemplate) – The current prompt template to improve.
- rollout (List[RolloutResultForAPO]) – List of rollout results to base the critique on.

Returns:

- Optional[str] – The improved prompt text, or the original prompt if gradient computation fails.