Commenter

This sample automates the process of adding comments to source code using an LLM and validates that the changes haven’t introduced any code modifications.

To achieve this, we could use a combination of tools to validate the transformation: source formatters, compilers, linters, or LLM-as-judge.

The algorithm could be summarized as follows:

for each file of files
    // generate
    add comments using GenAI

    // validate validate validate!
    format generated code (optional) -- keep things consistent
    build generated -- let's make sure it's still valid code
    check that only comments were changed -- LLM as a judge

// and more validate
final human code review
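As a rough illustration, the loop above can be written in plain TypeScript with each step passed in as a function. This is a sketch under assumptions, not the script's actual code. The Steps interface and its members (read, write, generate, format, build, judgeModified) are hypothetical stand-ins for the GenAIScript calls shown later.

```typescript
// Hypothetical step functions; in a GenAIScript script they would wrap
// runPrompt, host.exec and workspace calls.
interface Steps {
  read(file: string): Promise<string>
  write(file: string, content: string): Promise<void>
  generate(file: string, content: string): Promise<string | undefined>
  format?(file: string): Promise<boolean> // true if the formatter succeeded
  build?(file: string): Promise<boolean> // true if the build succeeded
  judgeModified(file: string): Promise<boolean> // true if non-comment code changed
}

type Outcome = "unchanged" | "kept" | "reverted"

// generate -> format -> build -> judge for a single file; any failed
// check restores the original content.
async function commentFile(file: string, steps: Steps): Promise<Outcome> {
  const original = await steps.read(file)
  const updated = await steps.generate(file, original)
  if (!updated || updated === original) return "unchanged"

  await steps.write(file, updated)
  let ok = true
  if (steps.format) ok = await steps.format(file)
  if (ok && steps.build) ok = await steps.build(file)
  if (ok) ok = !(await steps.judgeModified(file))

  if (!ok) {
    await steps.write(file, original)
    return "reverted"
  }
  return "kept"
}

// Runs every file through the pipeline; the final human review of the
// kept changes happens outside this code.
async function commentFiles(files: string[], steps: Steps): Promise<Map<string, Outcome>> {
  const results = new Map<string, Outcome>()
  for (const file of files) results.set(file, await commentFile(file, steps))
  return results
}
```

Because the original content is kept in memory, reverting is a single write. The full source does the same thing with file.content.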

Let’s get started with analyzing the script.

Getting Files to Process

The user can select which files to comment on or, if none are selected, we’ll use Git to find all modified files.

let files = env.files
if (files.length === 0)
    // no files selected, use git to find modified files
    files = await ..."git status --porcelain"... // details in sources
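The snippet elides the actual call, and the full source below uses git.listFiles instead. If you run git status --porcelain yourself, you still have to parse its output. Here is a sketch that assumes the porcelain v1 format and unquoted paths. Git quotes paths that contain unusual characters, and this sketch does not handle that case. parsePorcelain is a hypothetical helper.

```typescript
// Parses `git status --porcelain` (v1) output into paths worth commenting.
// Each line is "XY PATH": two status characters, a space, then the path.
function parsePorcelain(output: string): string[] {
  const files: string[] = []
  for (const line of output.split(/\r?\n/)) {
    if (line.length < 4) continue // too short to hold "XY path"
    const status = line.slice(0, 2)
    let path = line.slice(3)
    // Renames and copies are reported as "ORIG -> NEW"; keep the new path.
    const arrow = path.indexOf(" -> ")
    if (arrow >= 0) path = path.slice(arrow + 4)
    // Deleted files have nothing left to comment.
    if (status.includes("D")) continue
    files.push(path)
  }
  return files
}
```

Untracked files (status "??") are included, which matches the intent of commenting on work in progress.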

Processing Each File

We process each file separately to avoid overwhelming the token context and to keep the AI focused. We can use inline prompts to make inner queries.

for (const file of files) {
    ... add comments
    ... format generated code (optional) -- keep things consistent
    ... build generated -- let's make sure it's still valid code
    ... check that only comments were changed -- LLM as judge
    ... save changes
}
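The full source does not wait for each file in turn. It pushes the files through host.promiseQueue(5), so at most five files are processed at once. The sketch below shows the same idea in plain TypeScript. mapWithConcurrency is a hypothetical helper, not part of GenAIScript.

```typescript
// Applies `fn` to every item with at most `limit` calls in flight and
// returns the results in input order (like Promise.all, but throttled).
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length)
  let next = 0 // index of the next item no worker has claimed yet

  // Each worker claims indices one at a time until none are left. Claiming
  // is safe because JavaScript never interleaves code between awaits.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++
      results[i] = await fn(items[i])
    }
  }

  const count = Math.min(Math.max(1, limit), items.length)
  await Promise.all(Array.from({ length: count }, () => worker()))
  return results
}
```

Compared with a strictly sequential loop, this overlaps the time spent waiting on LLM calls while still capping the number of simultaneous requests.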

The Prompt for Adding Comments

Within the addComments function, we prompt GenAI to add comments. The prompt can be run twice to increase the likelihood of generating useful comments, as the LLM might be less effective on the first pass.

const res = await runPrompt(
    (ctx) => {
        ctx.$`You can add comments to this code...` // prompt details in sources
    },
    { system: ["system", "system.files"] }
)

We provide a detailed set of instructions to the AI on how to analyze and comment on the code.
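The full source keeps only the first fenced code block of the reply when there is one and otherwise falls back to the raw text (fences?.[0]?.content ?? text). GenAIScript parses the fences for you. The sketch below shows equivalent logic for a plain string. It assumes triple-backtick fences and code that contains no triple-backtick line of its own. extractCode is a hypothetical helper.

```typescript
// Three backticks, built at runtime so this source never contains a literal fence.
const FENCE = "`".repeat(3)

// Opening fence (with an optional info string such as "ts"), a lazily
// captured body, then the closing fence.
const FENCED_BLOCK = new RegExp(`${FENCE}[^\\n]*\\n([\\s\\S]*?)\\n?${FENCE}`)

// Returns the first fenced block's body, or the whole reply if there is none.
function extractCode(reply: string): string {
  const match = FENCED_BLOCK.exec(reply)
  return match ? match[1] : reply
}
```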

Format, build, lint

At this point, we have source code modified by an LLM. We should try to use all available tools to validate the changes. It is best to start with formatters and compilers, as they are deterministic and typically fast.
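The full source first runs the optional format command and then the build command, appending the file name to each. It reverts on any non-zero exit code and only then asks the LLM judge. Below is a sketch of that ordering. The process runner is abstracted as a hypothetical Exec function (host.exec plays this role in the script), so the logic can be exercised without a shell. runDeterministicChecks and CheckResult are illustrative names.

```typescript
// Result of running the deterministic checks on one file.
interface CheckResult {
  ok: boolean
  failedStep?: string // name of the first step that failed, if any
}

// Stand-in for a process runner: resolves to the exit code of `command`.
type Exec = (command: string) => Promise<number>

// Runs the optional format and build commands in order and stops at the
// first non-zero exit code, so the LLM judge only sees files that already
// pass the cheap, deterministic checks.
async function runDeterministicChecks(
  filename: string,
  commands: { format?: string; build?: string },
  exec: Exec
): Promise<CheckResult> {
  const steps: [string, string | undefined][] = [
    ["format", commands.format],
    ["build", commands.build],
  ]
  for (const [name, command] of steps) {
    if (!command) continue // step not configured
    const exitCode = await exec(`${command} ${filename}`)
    if (exitCode !== 0) return { ok: false, failedStep: name }
  }
  return { ok: true }
}
```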

Judge results with LLM

We issue one more prompt to judge the modified code (git diff) and make sure that only comments were changed.

async function checkModifications(filename: string): Promise<boolean> {
    const diff = await host.exec(`git diff ${filename}`)
    if (!diff.stdout) return false
    const res = await runPrompt(
        (ctx) => {
            ctx.def("DIFF", diff.stdout)
            ctx.$`You are an expert developer at all programming languages.

        Your task is to analyze the changes in DIFF and make sure that only comments are modified.
        Report all changes that are not comments and print "<MODIFIED>".
        `
        },
        {
            cache: "cmt-check",
        }
    )
    return !!res.text?.includes("<MODIFIED>")
}
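The checkModifications snippet above returns true only when the reply contains <MODIFIED>, so a reply that forgets the marker counts as "not modified". The full source below fails closed instead. It asks for explicit <MOD> or <NO_MOD> markers and treats a missing <NO_MOD> as a modification. The sketch isolates that decision. parseVerdict and Verdict are illustrative names.

```typescript
type Verdict = "comments-only" | "modified"

// Anything other than an explicit <NO_MOD> (an empty reply, a truncated
// reply, or one containing both markers) counts as "modified", so an
// unclear answer from the judge triggers a revert.
function parseVerdict(reply: string | undefined): Verdict {
  const text = reply ?? ""
  if (text.includes("<MOD>") || !text.includes("<NO_MOD>")) return "modified"
  return "comments-only"
}
```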

How to Run the Script

To run this script, you’ll first need to install the GenAIScript CLI by following its installation guide.

genaiscript run cmt

Format and build

One important aspect is to normalize and validate the AI-generated code. The user can provide a format command to run a formatter and a build command to check if the code is still valid.

script({...,
    parameters: {
        format: {
            type: "string",
            description: "Format source code command",
        },
        build: {
            type: "string",
            description: "Build command",
        },
    },
})

const { format, build } = env.vars

genaiscript run cmt --vars "build=npm run build" "format=npm run format"
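The script builds each command by appending the file name, as in `${format} ${file.filename}`. If a file name contains spaces or shell metacharacters, the shell would split or interpret it. One way to guard against this, assuming a POSIX shell, is to single-quote the argument. quoteArg and withFile are hypothetical helpers, and Windows cmd.exe uses different quoting rules.

```typescript
// Wraps an argument in single quotes for a POSIX shell. Embedded single
// quotes are closed, escaped and reopened: it's -> 'it'\''s'
function quoteArg(arg: string): string {
  if (/^[\w@%+=:,.\/-]+$/.test(arg)) return arg // safe to pass as-is
  return `'${arg.replace(/'/g, `'\\''`)}'`
}

// Appends a quoted file name to a user-supplied command such as "npm run format".
function withFile(command: string, filename: string): string {
  return `${command} ${quoteArg(filename)}`
}
```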

Full source (GitHub)

script({
    title: "Source Code Comment Generator",
    description: `Add comments to source code to make it more understandable for AI systems or human developers.`,
    parameters: {
        format: {
            type: "string",
            description: "Format source code command",
        },
        build: {
            type: "string",
            description: "Build command",
        },
    },
})

const { format, build } = env.vars

// Get files from environment or modified files from Git if none provided
let files = env.files
if (!files.length)
    files = await git.listFiles("staged", { askStageOnEmpty: true })
if (!files.length) files = await git.listFiles("modified-base")

// custom filter to only process code files
files = files.filter(
    ({ filename }) =>
        /\.(py|m?ts|m?js|cs|java|c|cpp|h|hpp)$/.test(filename) && // known languages only
        !/\.test/.test(filename) // ignore test files
)

// Shuffle files
files = files.sort(() => Math.random() - 0.5)

console.log(YAML.stringify(files.map((f) => f.filename)))

// Process each file separately to avoid context explosion
const jobs = host.promiseQueue(5)
await jobs.mapAll(files, processFile)

async function processFile(file: WorkspaceFile) {
    console.log(`processing ${file.filename}`)
    if (!file.content) {
        console.log(`empty file, skipping`)
        return
    }
    try {
        const newContent = await addComments(file)
        // Save modified content if different
        if (newContent && file.content !== newContent) {
            console.log(`updating ${file.filename}`)
            await workspace.writeText(file.filename, newContent)
            let revert = false
            // try formatting
            if (format) {
                const formatRes = await host.exec(`${format} ${file.filename}`)
                if (formatRes.exitCode !== 0) {
                    revert = true
                }
            }
            // try building
            if (!revert && build) {
                const buildRes = await host.exec(`${build} ${file.filename}`)
                if (buildRes.exitCode !== 0) {
                    revert = true
                }
            }
            // last LLM as judge check
            if (!revert) revert = await checkModifications(file.filename)

            // revert
            if (revert) {
                console.error(`reverting ${file.filename}...`)
                await workspace.writeText(file.filename, file.content)
            }
        }
    } catch (e) {
        console.error(`error: ${e}`)
    }
}

// Function to add comments to code
async function addComments(file: WorkspaceFile): Promise<string | undefined> {
    const { filename, content } = file
    if (parsers.tokens(file) > 20000) return undefined // too big

    const res = await runPrompt(
        (ctx) => {
            // Define the file for the AI without line numbers, and enable
            // prompt injection detection when a detector is available
            ctx.def(
                "FILE",
                { filename, content },
                { lineNumbers: false, detectPromptInjection: "available" }
            )

            // AI prompt to add comments for better understanding
            ctx.$`You are an expert developer at all programming languages.

You are tasked with adding comments to code in FILE to make it more understandable for AI systems or human developers.
You should analyze it, and add/update appropriate comments as needed.

To add or update comments to this code, follow these steps:

1. Analyze the code to understand its structure and functionality.
- If you are not familiar with the programming language, emit an empty file.
- If there is no code, emit an empty file.
2. Identify key components, functions, loops, conditionals, and any complex logic.
3. Add comments that explain:
- The purpose of functions or code blocks using the best comment format for that programming language.
- How complex algorithms or logic work
- Any assumptions or limitations in the code
- The meaning of important variables or data structures
- Any potential edge cases or error handling
- All function arguments and return value
- A Top level file comment that describes the code in the file

When adding or updating comments, follow these guidelines:

- Use clear and concise language
- Avoid stating the obvious (e.g., don't just restate what the code does)
- Focus on the "why" and "how" rather than just the "what"
- Use single-line comments for brief explanations
- Use multi-line comments for longer explanations or function/class descriptions
- Always place comments above the code they refer to.
- If comments already exist, review and update them as needed.
- Minimize changes to existing comments.
- For TypeScript functions, classes and fields, use JSDoc comments. do NOT add type annotations in comments.
- For Python functions and classes, use docstrings.
- do NOT modify comments with TODOs.
- do NOT modify comments with URLs or links as they are reference to external resources.
- do NOT add comments to imports

Your output should be the original code with your added comments. Make sure to preserve ALL the original code's formatting and structure. DO NOT BE LAZY.

Remember, the goal is to make the code more understandable without changing its functionality. DO NOT MODIFY THE CODE ITSELF.
Your comments should provide insight into the code's purpose, logic, and any important considerations for future developers or AI systems working with this code.
`
        },
        {
            system: [
                "system.assistant",
                "system.safety_jailbreak",
                "system.safety_harmful_content",
                "system.safety_validate_harmful_content",
            ],
            label: `comment ${filename}`,
        }
    )
    const { text, fences } = res
    const newContent = fences?.[0]?.content ?? text
    return newContent
}

async function checkModifications(filename: string): Promise<boolean> {
    const diff = await git.diff({ paths: filename })
    if (!diff) return false
    const res = await runPrompt(
        (ctx) => {
            ctx.def("DIFF", diff, { language: "diff" })
            ctx.$`You are an expert developer at all programming languages.

        Your task is to analyze the changes in DIFF and make sure that only comments are modified.
        Report all changes that are not comments or spacing and print <MOD>;
        otherwise, print <NO_MOD>.
        `
        },
        {
            system: ["system.assistant", "system.safety_jailbreak"],
            cache: "cmt-check",
            label: `check comments in ${filename}`,
        }
    )

    const modified =
        res.text?.includes("<MOD>") || !res.text?.includes("<NO_MOD>")
    return modified
}
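The full source shuffles files with sort(() => Math.random() - 0.5). That is concise, but a random comparator is inconsistent, so the resulting order depends on the engine's sort implementation and is not uniformly distributed. If an unbiased order matters, a Fisher-Yates shuffle is the standard alternative. The shuffle function below is a hypothetical helper. Its injectable random parameter exists only to make it testable.

```typescript
// Returns a shuffled copy using the Fisher-Yates algorithm: with a uniform
// random source, every permutation is equally likely.
function shuffle<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const out = items.slice()
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1)) // 0 <= j <= i
    const tmp = out[i]
    out[i] = out[j]
    out[j] = tmp
  }
  return out
}
```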

Content Safety

The following measures are taken to ensure the safety of the generated content:

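This script includes system prompts to prevent prompt injection and harmful content generation:
- system.safety_jailbreak
- system.safety_harmful_content
- system.safety_validate_harmful_content
The file content passed to the model also has prompt injection detection enabled (detectPromptInjection: "available").
The commented code is written back to the source files in the working tree, which allows for a manual review (for example with git diff) before committing the changes.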

Additional measures to further enhance safety would be to run a model with a safety filter or to validate the generated comments with a content safety service.

Refer to the Transparency Note for more information on content safety.