GitHub Action Investigator

The following sample shows a script that analyzes a GitHub Actions workflow job log and attempts to determine the root cause of the failure.

Strategy

The script is a hybrid of traditional software and LLM/agent-based software. It starts by collecting relevant information to fill the LLM context, then lets the agent reason and ask for more information as needed through tools.

The first part of the script is traditional code that collects the information and prepares the context for the LLM. It uses the following simple strategy:

find information about the failed workflow run, including the commit and job logs
find the last successful workflow run, if any, and gather its commit and job logs
build an LLM context with all the information, including commit diffs and job log diffs

The information gathered in this section is not hallucinated, by design, and is added to the final output using the env.output object.
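The run-selection step of this strategy can be sketched as a small pure function. This is a minimal illustration, not the script's actual code: the `RunSummary` type and the `findLastSuccessfulRun` name are assumptions made for this example, and only the fields the selection needs are modeled.

```typescript
// Hypothetical, simplified shape of a workflow run (only the fields used here).
interface RunSummary {
  id: number;
  conclusion: string;
  head_sha: string;
  run_started_at: string; // ISO 8601 timestamp
}

// Returns the most recent successful run that started no later than the
// failed run, or undefined when no such run exists.
function findLastSuccessfulRun(
  runs: RunSummary[],
  failedRun: RunSummary,
): RunSummary | undefined {
  const failedStart = new Date(failedRun.run_started_at).getTime();
  return runs
    .filter((r) => r.conclusion === "success")
    .filter((r) => new Date(r.run_started_at).getTime() <= failedStart)
    .sort(
      (a, b) =>
        new Date(b.run_started_at).getTime() -
        new Date(a.run_started_at).getTime(),
    )[0];
}
```

Sorting explicitly by start time means the result does not depend on the order in which the runs were listed.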

The second part is an agent that uses the LLM to reason about the information and ask for more information as needed.

Add the script

Open your GitHub repository and start a new pull request.
Add the following script to your repository as genaisrc/gai.genai.mts.

script({
  title: "GitHub Action Investigator",
  description: "Analyze GitHub Action runs to find the root cause of a failure",
  parameters: {
    /** the user can get the url from the github web
     *  like 14890513008 or https://github.com/microsoft/genaiscript/actions/runs/14890513008
     */
    runId: {
      type: "number",
      description: "Run identifier",
    },
    jobId: {
      type: "number",
      description: "Job identifier",
    },
    runUrl: {
      type: "string",
      description: "Run identifier or URL",
    },
  },
  system: ["system", "system.assistant", "system.annotations", "system.files"],
  flexTokens: 10000,
  cache: "gai",
  model: "small",
  tools: ["agent_fs", "agent_github", "agent_git"],
});
const { dbg, output, vars } = env;

output.heading(2, "Investigator report");
output.heading(3, "Context collection");
const { owner, repo } = await github.info();

let runId: number = vars.runId;
let jobId: number = vars.jobId;
if (isNaN(runId)) {
  const runUrl = vars.runUrl;
  output.itemLink(`run url`, runUrl);

  // Extract owner, repository, run id, and optional job id from the URL
  const { runRepo, runOwner, runIdUrl, jobIdUrl } =
    /^https:\/\/github\.com\/(?<runOwner>[\w.-]+)\/(?<runRepo>[\w.-]+)\/actions\/runs\/(?<runIdUrl>\d+)?(\/job\/(?<jobIdUrl>\d+))?/i.exec(
      runUrl,
    )?.groups || {};
  if (!runRepo)
    throw new Error(
      "Url not recognized. Please provide a valid URL https://github.com/<owner>/<repo>/actions/runs/<runId>/...",
    );
  runId = parseInt(runIdUrl);
  dbg(`runId: ${runId}`);

  jobId = parseInt(jobIdUrl);
  dbg(`jobId: ${jobId}`);

  if (runOwner !== owner)
    cancel(`Run owner ${runOwner} does not match the current repository owner ${owner}`);
  if (runRepo !== repo)
    cancel(`Run repository ${runRepo} does not match the current repository ${repo}`);
}

if (isNaN(runId)) throw new Error("You must provide a runId or runUrl");
output.itemValue(`run id`, runId);
// fetch run
const run = await github.workflowRun(runId);
dbg(`run: %O`, run);
const branch = run.head_branch;
dbg(`branch: ${branch}`);

const workflow = await github.workflow(run.workflow_id);
dbg(`workflow: ${workflow.name}`);

// List workflow runs for the specified workflow and branch
const runs = await github.listWorkflowRuns(workflow.id, {
  status: "completed",
  branch,
  count: 100,
});
runs.reverse(); // from newest to oldest

dbg(
  `runs: %O`,
  runs.map(({ id, conclusion, workflow_id, html_url, run_started_at }) => ({
    id,
    conclusion,
    workflow_id,
    html_url,
    run_started_at,
  })),
);

const reversedRuns = runs.filter((r) => new Date(r.run_started_at) <= new Date(run.run_started_at));
if (!reversedRuns.length) cancel("No runs found");
dbg(
  `reversed runs: %O`,
  reversedRuns.map(({ id, conclusion, workflow_id, html_url, run_started_at }) => ({
    id,
    conclusion,
    workflow_id,
    html_url,
    run_started_at,
  })),
);

const firstFailedJobs = await github.listWorkflowJobs(run.id);
const firstFailedJob =
  firstFailedJobs.find(({ conclusion }) => conclusion === "failure") ?? firstFailedJobs[0];
const firstFailureLog = firstFailedJob.content;
if (!firstFailureLog) cancel("No logs found");
output.itemLink(`failed job`, firstFailedJob.html_url);

// resolve the latest successful workflow run
const lastSuccessRun = reversedRuns.find(({ conclusion }) => conclusion === "success");
if (lastSuccessRun)
  output.itemLink(`last successful run #${lastSuccessRun.run_number}`, lastSuccessRun.html_url);
else output.item(`last successful run not found`);

let gitDiffRef: string;
let logRef: string;
let logDiffRef: string;
if (lastSuccessRun) {
  if (lastSuccessRun.head_sha === run.head_sha) {
    console.debug("Last successful run has the same commit as the failed run, skipping git diff");
  } else {
    output.itemLink(
      `diff (${lastSuccessRun.head_sha.slice(0, 7)}...${run.head_sha.slice(0, 7)})`,
      `https://github.com/${owner}/${repo}/compare/${lastSuccessRun.head_sha}...${run.head_sha}`,
    );

    // Execute git diff between the last success and failed run commits
    await git.fetch("origin", lastSuccessRun.head_sha);
    await git.fetch("origin", run.head_sha);
    const gitDiff = await git.diff({
      base: lastSuccessRun.head_sha,
      head: run.head_sha,
      excludedPaths: "**/genaiscript.d.ts",
    });

    if (gitDiff) {
      gitDiffRef = def("GIT_DIFF", gitDiff, {
        language: "diff",
        lineNumbers: true,
        flex: 1,
      });
    }
  }
}

if (!lastSuccessRun) {
  // Define log content if no last successful run is available
  logRef = def("LOG", firstFailureLog, {
    maxTokens: 20000,
    lineNumbers: false,
  });
} else {
  const lastSuccessJobs = await github.listWorkflowJobs(lastSuccessRun.id);
  const lastSuccessJob = lastSuccessJobs.find(({ name }) => firstFailedJob.name === name);
  if (!lastSuccessJob)
    console.debug(`could not find job ${firstFailedJob.name} in last success run`);
  else {
    output.itemLink(`last successful job`, lastSuccessJob.html_url);
    const jobDiff = await github.diffWorkflowJobLogs(firstFailedJob.id, lastSuccessJob.id);
    // Generate a diff of logs between the last success and failed runs
    logDiffRef = def("LOG_DIFF", jobDiff, {
      language: "diff",
      lineNumbers: false,
    });
  }
}

// Instruction for generating a report based on the analysis
$`You are an expert software engineer and you are able to analyze the logs and find the root cause of the failure.

${lastSuccessRun ? `You are analyzing 2 GitHub Action Workflow Runs: a SUCCESS_RUN and a FAILED_RUN.` : ""}

${gitDiffRef ? `- ${gitDiffRef} contains a git diff of the commits of SUCCESS_RUN and FAILED_RUN` : ""}
${logDiffRef ? `- ${logDiffRef} contains a workflow job diff of SUCCESS_RUN and FAILED_RUN` : ""}
${logRef ? `- ${logRef} contains the log of the FAILED_RUN` : ""}

${lastSuccessRun ? `- The SUCCESS_RUN is the last successful workflow run (head_sha: ${lastSuccessRun.head_sha})` : ""}
- The FAILED_RUN is the workflow run that failed (head_sha: ${run.head_sha})

## Task
Analyze the diff in LOG_DIFF and provide a summary of the root cause of the failure.

Show the code that is responsible for the failure.
If you cannot find the root cause, stop.

Investigate potential fixes for the failure.
If you find a solution, generate a diff with suggested fixes. Use a diff format.
If you cannot locate the error, do not generate a diff.

## Instructions

Use 'agent_fs', 'agent_git' and 'agent_github' if you need more information.
Do not invent git or github information.
You have access to the entire source code through the agent_fs tool.
`;

output.heading(2, `AI Analysis`);
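The URL handling in the script can be isolated into a small helper that is easy to test outside of a script run. The sketch below is illustrative only: `parseRunUrl` and `RunUrlParts` are hypothetical names, and the pattern uses a character class that accepts hyphens and dots in owner and repository names.

```typescript
// Hypothetical result type for this sketch.
interface RunUrlParts {
  owner: string;
  repo: string;
  runId: number;
  jobId?: number;
}

// Parses URLs of the form
// https://github.com/<owner>/<repo>/actions/runs/<runId>[/job/<jobId>]
// and returns undefined when the URL does not match.
function parseRunUrl(url: string): RunUrlParts | undefined {
  const match =
    /^https:\/\/github\.com\/([\w.-]+)\/([\w.-]+)\/actions\/runs\/(\d+)(?:\/job\/(\d+))?/i.exec(
      url,
    );
  if (!match) return undefined;
  const [, owner, repo, runId, jobId] = match;
  if (!owner || !repo || !runId) return undefined;
  return {
    owner,
    repo,
    runId: parseInt(runId, 10),
    // The job segment is optional, so jobId may be absent.
    jobId: jobId ? parseInt(jobId, 10) : undefined,
  };
}
```

For example, `parseRunUrl("https://github.com/microsoft/genaiscript/actions/runs/14890513008")` yields the owner `microsoft`, the repository `genaiscript`, and the run id `14890513008`, with no job id.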

Automate with GitHub Actions

Using GitHub Actions and GitHub Models, you can automate the execution of the script and creation of the comments.

You can turn the agentic part of the script on or off by commenting out the tools line that lists the agent_* tools. A script without agents has predictable token consumption (a single LLM call); an agentic script runs in a loop and consumes more tokens, but it can ask for more information when needed.

name: genai investigator
on:
    workflow_run:
        workflows: ["build", "playwright", "ollama"]
        types:
            - completed
concurrency:
    group: ${{ github.workflow }}-${{ github.ref }}-${{ github.event.workflow_run.event }}-${{ github.event.workflow_run.conclusion }}
    cancel-in-progress: true
permissions:
    contents: read
    actions: read
    pull-requests: write
    models: read
jobs:
    investigate:
        # Only run this job if the workflow run concluded with a failure
        # and was triggered by a pull request event
        if: ${{ github.event.workflow_run.conclusion == 'failure' && github.event.workflow_run.event == 'pull_request' }}
        runs-on: ubuntu-latest
        steps:
            - uses: actions/checkout@v4
              with:
                  submodules: "recursive"
                  fetch-depth: 10
            - uses: pnpm/action-setup@v4
            - uses: actions/setup-node@v4
              with:
                  node-version: "22"
                  cache: pnpm
            - run: pnpm install --frozen-lockfile
            - name: compile
              run: pnpm compile
            - name: genaiscript gai
              run: node packages/cli/dist/src/index.js run gai -p github --pull-request-comment --vars "runId=${{ github.event.workflow_run.id }}" --out-trace $GITHUB_STEP_SUMMARY
              env:
                  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

Are we done yet?

No! This script is far from perfect. In particular, it probably needs better heuristics to build a context that is specific to your repository. Treat it as a starting point and tweak the heuristics until it works for your repository.
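As one example of such a heuristic, a long job log could be reduced to the lines that look like errors, plus a little surrounding context, before it is added to the prompt. The sketch below is an assumption for illustration and not part of the script: the `extractErrorLines` name, the default pattern, and the context size are all hypothetical and would need tuning for your logs.

```typescript
// Hypothetical heuristic: keep only lines matching an error-like pattern,
// together with `context` lines before and after each match.
// The pattern should not use the `g` flag, since `test` would then be stateful.
function extractErrorLines(
  log: string,
  context = 2,
  pattern: RegExp = /error|failed|exception/i,
): string {
  const lines = log.split(/\r?\n/);
  const keep: boolean[] = lines.map(() => false);
  lines.forEach((line, i) => {
    if (!pattern.test(line)) return;
    const start = Math.max(0, i - context);
    const end = Math.min(lines.length - 1, i + context);
    for (let j = start; j <= end; j++) keep[j] = true;
  });
  // Preserve the original order of the kept lines.
  return lines.filter((_, i) => keep[i]).join("\n");
}
```

The filtered text could then be passed to `def("LOG", ...)` in place of the full log. Whether this helps depends on how your build tools report failures.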