LLM Agents

An agent is a special kind of tool that uses an inline prompt and tools to solve a task.

Usage

We want to build a script that investigates the most recent run failures in a GitHub repository that uses GitHub Actions. To do so, we will probably need the following agents:

query the GitHub API, agent_github
compute git diffs to determine which changes broke the build, agent_git
read or search files, agent_fs

script({
    tools: ["agent_fs", "agent_git", "agent_github", ...],
    ...
})

Each of these agents is capable of calling an LLM with a specific set of tools to accomplish a task.
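To make the idea concrete, here is a minimal sketch of an agent modeled as a tool whose handler runs its own LLM conversation with a private set of tools. It is not how GenAIScript implements agents: `makeAgent`, `stubLlm`, the `git_log` tool, and the message shapes are all hypothetical names chosen for illustration, and the stub stands in for a real model call.

```javascript
// Hypothetical sketch: none of these names are part of the GenAIScript API.
// An agent is modeled as a tool that wraps an inner LLM conversation.
function makeAgent({ name, description, tools, llm }) {
  const toolsByName = new Map(tools.map((t) => [t.name, t]));
  return {
    name,
    description,
    // The outer LLM only sees this handler: a query in, an answer out.
    async handler(query) {
      const messages = [{ role: "user", content: query }];
      // Bounded loop so a misbehaving model cannot spin forever.
      for (let turn = 0; turn < 8; turn++) {
        const reply = await llm(messages, [...toolsByName.keys()]);
        if (!reply.toolCall) return reply.content;
        const tool = toolsByName.get(reply.toolCall.name);
        const result = tool
          ? await tool.handler(reply.toolCall.args)
          : `unknown tool: ${reply.toolCall.name}`;
        messages.push({ role: "tool", name: reply.toolCall.name, content: String(result) });
      }
      return "agent stopped: too many tool calls";
    },
  };
}

// Stub standing in for a real model: it requests one tool call, then answers.
async function stubLlm(messages) {
  const last = messages[messages.length - 1];
  if (last.role === "user") return { toolCall: { name: "git_log", args: { count: 1 } } };
  return { content: `Last commit: ${last.content}` };
}

const agentGit = makeAgent({
  name: "agent_git",
  description: "Answers questions about a git repository",
  tools: [{ name: "git_log", handler: async ({ count }) => `abc123 fix build (${count} shown)` }],
  llm: stubLlm,
});
```

The point of the design is isolation: the outer conversation never sees the inner tool calls, only the agent's final answer, which keeps each conversation small and focused.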

The full script source code is available below:

script({
  tools: ["agent_fs", "agent_git", "agent_github", "agent_interpreter", "agent_docs"],
  model: "reasoning",
  parameters: {
    jobUrl: { type: "string" }, // URL of the job
    workflow: { type: "string" }, // Workflow name
    failure_run_id: { type: "number" }, // ID of the failed run
    branch: { type: "string" }, // Branch name
  },
});

const { workflow = "build.yml", failure_run_id, branch = await git.branch(), jobUrl } = env.vars;

if (jobUrl) {
  $`1. Extract the run id and job id from ${jobUrl}`;
  $`2. Find the last successful run before the failed run for the same workflow and branch`;
} else if (failure_run_id) {
  $`1. Find the failed run ${failure_run_id} of ${workflow} for branch ${branch}
    2. Find the last successful run before the failed run for the same workflow and branch`;
} else {
  $`0. Find the workflow ${workflow} in the repository
1. Find the latest failed run of ${workflow} for branch ${branch}
2. Find the last successful run before the failed run`;
}
$`3. Compare the run job logs between the failed run and the last successful run
4. git diff the failed run commit (head_sha) and the last successful run commit
    - show a diff of the source code that created the problem if possible
5. Analyze all the above information and identify the root cause of the failure
    - generate a patch to fix the problem if possible
6. Generate a detailed report of the failure and the root cause
    - include a list of all HTML urls to the relevant runs, commits, pull requests or issues
    - include diff of code changes
    - include the patch if generated
    - include a summary of the root cause
`;

defOutputProcessor(async ({ messages }) => {
  await runPrompt((_) => {
    _.$`- Generate a pseudo code summary of the plan implemented in MESSAGES. MESSAGES is a LLM conversation with tools.
        - Judge the quality of the plan and suggest 2 improvements.
        - Generate a python program that optimizes the plan in code. Assume "llm" is a LLM call.`;
    _.def(
      "MESSAGES",
      messages.map((msg) => `- ${msg.role}: ${msg.content || JSON.stringify(msg)}`).join("\n"),
    );
  });
  return undefined;
});
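In the script above, step 1 of the jobUrl branch leaves it to the LLM to extract the run id and job id. For comparison, the same extraction can be sketched deterministically. This assumes the job URL has the shape `https://github.com/<owner>/<repo>/actions/runs/<run_id>/job/<job_id>`; `parseJobUrl` is a hypothetical helper, not part of GenAIScript or the GitHub API, and the URL in the usage line is made up.

```javascript
// Hypothetical helper: parses a GitHub Actions job URL, assuming the shape
//   https://github.com/<owner>/<repo>/actions/runs/<run_id>/job/<job_id>
// Returns undefined when the input does not match that shape.
function parseJobUrl(jobUrl) {
  let pathname;
  try {
    ({ pathname } = new URL(jobUrl));
  } catch {
    return undefined; // not a valid absolute URL
  }
  const match = /^\/([^/]+)\/([^/]+)\/actions\/runs\/(\d+)\/job\/(\d+)\/?$/.exec(pathname);
  if (!match) return undefined;
  const [, owner, repo, runId, jobId] = match;
  return { owner, repo, runId: Number(runId), jobId: Number(jobId) };
}

// Made-up URL, for illustration only.
const parsed = parseJobUrl("https://github.com/octo-org/octo-repo/actions/runs/123456/job/7891011");
console.log(parsed);
```

Parsing in code removes one step where the model could misread the URL, at the cost of handling fewer URL variations than an LLM might tolerate.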

Multiple instances of the same agent

Some agents, like agent_git, can be instantiated several times with different parameters, for example to work on different repositories.

script({
  system: [
    "system.agent_git",
    {
      id: "system.agent_git",
      parameters: { repo: "microsoft/jacdac", variant: "jacdac" },
    },
  ],
});

$`Generate a table with the last commits of the jacdac and current git repositories.`;

In such cases, make sure to provide a variant argument, which is used to generate a unique agent name.
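The sketch below illustrates why the variant matters: two instances of the same agent need distinct names, otherwise the second registration collides with the first. The naming scheme (`agent_git_jacdac`) and the `agentToolName` and `registerAgents` helpers are assumptions made for illustration; the names actually generated by the runtime may differ.

```javascript
// Hypothetical naming scheme, for illustration only.
function agentToolName(id, variant) {
  const base = id.replace(/^system\./, "");
  return variant ? `${base}_${variant}` : base;
}

// Accepts the same mix of strings and { id, parameters } objects as the
// `system` array above, and rejects entries whose names collide.
function registerAgents(entries) {
  const registry = new Map();
  for (const entry of entries) {
    const { id, parameters = {} } = typeof entry === "string" ? { id: entry } : entry;
    const name = agentToolName(id, parameters.variant);
    if (registry.has(name)) {
      throw new Error(`duplicate agent name: ${name}; add a variant`);
    }
    registry.set(name, parameters);
  }
  return registry;
}

const registry = registerAgents([
  "system.agent_git",
  { id: "system.agent_git", parameters: { repo: "microsoft/jacdac", variant: "jacdac" } },
]);
console.log([...registry.keys()]);
```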

To split or not to split

You could try to load all the tools in the same LLM call and run the task as a single LLM conversation. Results may vary.

script({
    tools: ["fs", "git", "github", ...],
    ...
})