LLM Agents

An agent is a special kind of tool that uses an inline prompt and tools to solve a task.

Usage

We want to build a script that can investigate the most recent run failures in a GitHub repository using GitHub Actions. To do so, we will probably need the following agents:

  • query the GitHub API, agent_github
  • compute git diffs to determine which changes broke the build, agent_git
  • read or search files, agent_fs

github-investigator.genai.mts
script({
    tools: ["agent_fs", "agent_git", "agent_github", ...],
    ...
})

Each of these agents is capable of calling an LLM with a specific set of tools to accomplish a task.
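
To make the idea concrete, here is a minimal sketch of an agent as "a tool that wraps an inline prompt and its own tools". It is plain TypeScript, not the GenAIScript implementation: the `Tool` and `LLM` types, `defineAgent`, `fakeLLM`, and the `git_diff` tool are all hypothetical names introduced for illustration, and the LLM is replaced by a stub so the sketch runs offline.

```typescript
// Hypothetical types for illustration only; this is not the GenAIScript API.
type Tool = {
    name: string
    description: string
    run: (input: string) => Promise<string>
}

// Stand-in for an LLM call: receives a prompt and the tools it is allowed to use.
type LLM = (prompt: string, tools: Tool[]) => Promise<string>

// Wraps an inline prompt and a private set of tools into a single tool.
// The caller only sees one tool; the inner tools stay hidden inside the agent.
function defineAgent(
    name: string,
    instructions: string,
    tools: Tool[],
    llm: LLM
): Tool {
    return {
        name: `agent_${name}`,
        description: instructions,
        run: (task) => llm(`${instructions}\nTask: ${task}`, tools),
    }
}

// Fake LLM (assumption): calls the first available tool and reports its result,
// prefixed with the list of tool names it was given.
const fakeLLM: LLM = async (prompt, tools) => {
    const tool = tools[0]
    const result = tool ? await tool.run(prompt) : "no tool available"
    return `[${tools.map((t) => t.name).join(",")}] ${result}`
}

// Hypothetical inner tool that returns a canned diff.
const gitDiff: Tool = {
    name: "git_diff",
    description: "Computes a diff between two commits",
    run: async () => "diff --git a/file b/file",
}

const agentGit = defineAgent(
    "git",
    "Answer questions about the git history.",
    [gitDiff],
    fakeLLM
)
```

Calling `agentGit.run("what changed?")` starts a separate (here, faked) LLM conversation that only knows about `git_diff`; the top-level script would see a single tool named `agent_git`.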

The full script source code is available below:

github-investigator.genai.mts
script({
    tools: [
        "agent_fs",
        "agent_git",
        "agent_github",
        "agent_interpreter",
        "agent_docs",
    ],
    parameters: {
        jobUrl: { type: "string" }, // URL of the job
        workflow: { type: "string" }, // Workflow name
        failure_run_id: { type: "number" }, // ID of the failed run
        branch: { type: "string" }, // Branch name
    },
})
const {
    workflow = "build.yml",
    failure_run_id,
    branch = await git.branch(),
    jobUrl,
} = env.vars
// Pick the first steps of the plan depending on which parameters were provided
if (jobUrl) {
    $`1. Extract the run id and job id from the ${jobUrl}`
    $`2. Find the last successful run before the failed run for the same workflow and branch`
} else if (failure_run_id) {
    $`1. Find the failed run ${failure_run_id} of ${workflow} for branch ${branch}
    2. Find the last successful run before the failed run for the same workflow and branch`
} else {
    $`0. Find the workflow ${workflow} in the repository
    1. Find the latest failed run of ${workflow} for branch ${branch}
    2. Find the last successful run before the failed run`
}
// Steps shared by all branches: compare, diff, analyze, report
$`3. Compare the run job logs between the failed run and the last successful run
4. git diff the failed run commit (head_sha) and the last successful run commit
   - show a diff of the source code that created the problem if possible
5. Analyze all the above information and identify the root cause of the failure
   - generate a patch to fix the problem if possible
6. Generate a detailed report of the failure and the root cause
   - include a list of all HTML urls to the relevant runs, commits, pull requests or issues
   - include diff of code changes
   - include the patch if generated
   - include a summary of the root cause
`
// After the run, ask the LLM to review the conversation and critique the plan
defOutputProcessor(async ({ messages }) => {
    await runPrompt((_) => {
        _.$`- Generate a pseudo code summary of the plan implemented in MESSAGES. MESSAGES is a LLM conversation with tools.
- Judge the quality of the plan and suggest 2 improvements.
- Generate a python program that optimizes the plan in code. Assume "llm" is a LLM call.`
        _.def(
            "MESSAGES",
            messages
                .map(
                    // plain template literal: each message becomes one string line
                    (msg) =>
                        `- ${msg.role}: ${msg.content || msg.value || JSON.stringify(msg)}`
                )
                .join("\n")
        )
    })
    return undefined
})
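
Step 1 of the jobUrl branch asks the LLM to extract the run id and job id from the URL. If you would rather do that part deterministically before prompting, the extraction can be sketched in plain TypeScript as below. This is only an illustration under stated assumptions: `parseJobUrl` and `JobRef` are hypothetical names, the code assumes job URLs of the form `https://github.com/<owner>/<repo>/actions/runs/<run_id>/job/<job_id>`, and the example URL is made up.

```typescript
// Hypothetical result type: the pieces of a GitHub Actions job URL.
interface JobRef {
    owner: string
    repo: string
    runId: number
    jobId: number
}

// Parses a job URL, assuming the shape
// https://github.com/<owner>/<repo>/actions/runs/<run_id>/job/<job_id>
// Returns undefined when the URL does not match that shape.
function parseJobUrl(jobUrl: string): JobRef | undefined {
    const m =
        /^https:\/\/github\.com\/([^/]+)\/([^/]+)\/actions\/runs\/(\d+)\/job\/(\d+)/.exec(
            jobUrl
        )
    if (!m) return undefined
    // Defaults keep the destructured values typed as string under strict settings
    const [, owner = "", repo = "", runId = "", jobId = ""] = m
    return { owner, repo, runId: Number(runId), jobId: Number(jobId) }
}

// Example with a made-up repository and ids
const ref = parseJobUrl(
    "https://github.com/octo-org/octo-repo/actions/runs/123456/job/789"
)
console.log(ref)
```

The parsed ids could then be interpolated into the prompt in place of the raw URL, which leaves less for the model to get wrong; whether that is worth the extra code depends on how reliably the model handles the URL on its own.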

To split or not to split

You could try to load all the tools in the same LLM call and run the task as a single LLM conversation. Results may vary.

github-investigator.genai.mts
script({
    tools: ["fs", "git", "github", ...],
    ...
})