Commenter


This sample automates the process of adding comments to source code using an LLM, then verifies that the changes have not altered the code itself.

To do so, we can combine several tools to validate the transformation: code formatters, compilers, linters, or an LLM as a judge.

The algorithm can be summarized as follows:

for each file of files
    // generate
    add comments using GenAI

    // validate validate validate!
    format generated code (optional) -- keep things consistent
    build generated -- let's make sure it's still valid code
    check that only comments were changed -- LLM as a judge

// and more validate
final human code review

Let's start by walking through the script.

Getting the files to process

The user can select which files to comment. If no file is selected, we use Git to find all modified files.

let files = env.files
if (files.length === 0)
    // no files selected, use git to find modified files
    files = await ..."git status --porcelain"... // details in sources
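
The Git fallback above elides how the command output becomes a list of files. As a rough sketch only, assuming the porcelain v1 layout (two status characters, a space, then the path) and unquoted paths, the parsing could look like this. parsePorcelain is a hypothetical helper, not part of GenAIScript; the full script relies on git.listFiles instead.

```typescript
// Hypothetical helper: extracts file paths from `git status --porcelain` (v1) output.
// Assumption: paths are unquoted (git quotes paths that contain special characters).
function parsePorcelain(output: string): string[] {
  const files: string[] = [];
  for (const line of output.split(/\r?\n/)) {
    if (line.length < 4) continue; // need "XY " plus at least one path character
    const status = line.slice(0, 2);
    if (status.includes("D")) continue; // deleted files cannot be commented
    let path = line.slice(3);
    // Renames are reported as "old -> new"; keep the new path.
    const arrow = path.indexOf(" -> ");
    if (arrow >= 0) path = path.slice(arrow + 4);
    files.push(path);
  }
  return files;
}

const sampleStatus = [
  " M src/app.ts",
  "?? src/new.ts",
  "R  old.ts -> renamed.ts",
  " D gone.ts",
].join("\n");

// Prints the three paths that still exist: the modified, untracked, and renamed files.
console.log(parsePorcelain(sampleStatus));
```

Whether untracked files (the ?? entries) should be commented is a policy choice; drop them here if only tracked changes matter.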

Processing each file

We process each file separately to avoid overflowing the token context and to keep the AI focused. We can use inline prompts to make inner queries.

for (const file of files) {
    ... add comments
    ... format generated code (optional) -- keep things consistent
    ... build generated -- let's make sure it's still valid code
    ... check that only comments were changed -- LLM as judge
    ... save changes
}
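
The loop above can be sketched in isolation with the steps injected as functions. This is a simplified, in-memory illustration under stated assumptions: the Steps interface and commentSource are hypothetical names, and the real script writes the file to disk, runs the tools on it, and restores the original content on failure.

```typescript
// Hypothetical step signatures; in the real script these wrap the LLM call,
// the format/build commands, and the LLM judge.
interface Steps {
  addComments(source: string): Promise<string | undefined>;
  // Each validator resolves to true when the candidate passes.
  validators: Array<(candidate: string) => Promise<boolean>>;
}

// Returns the commented source if every validator accepts it, otherwise the original.
async function commentSource(original: string, steps: Steps): Promise<string> {
  const candidate = await steps.addComments(original);
  if (!candidate || candidate === original) return original; // nothing to validate
  for (const validate of steps.validators) {
    // Stop at the first failure so later (slower) checks are skipped.
    if (!(await validate(candidate))) return original;
  }
  return candidate;
}

// Usage with stub steps (no LLM involved): the stub adds a header comment,
// and the validator checks that the original text is still present.
commentSource("const x = 1;", {
  addComments: async (src) => `// x is a constant\n${src}`,
  validators: [async (candidate) => candidate.endsWith("const x = 1;")],
}).then((result) => console.log(result));
```

Ordering the validators from cheapest to most expensive means the LLM judge only runs on candidates that already passed the deterministic checks.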

The prompt for adding comments

In the addComments function, we ask GenAI to add comments. We do this twice to increase the likelihood of generating useful comments, since the LLM may have been less effective on the first pass.

const res = await runPrompt(
    (ctx) => {
        ctx.$`You can add comments to this code...` // prompt details in sources
    },
    { system: ["system", "system.files"] }
)

We give the AI a detailed set of instructions so that it can analyze and comment the code.

Format, build, lint

At this point, we have source code modified by an LLM. We should use every available tool to validate the changes. It is best to start with formatters and compilers, because they are deterministic and usually fast.

Judging the results with an LLM

We run one more prompt to judge the modified code (git diff) and make sure that only comments were changed, not the code itself.

async function checkModifications(filename: string): Promise<boolean> {
    const diff = await host.exec(`git diff ${filename}`)
    if (!diff.stdout) return false
    const res = await runPrompt(
        (ctx) => {
            ctx.def("DIFF", diff.stdout)
            ctx.$`You are an expert developer at all programming languages.

        Your task is to analyze the changes in DIFF and make sure that only comments are modified.
        Report all changes that are not comments and print "<MODIFIED>".
        `
        },
        {
            cache: "cmt-check",
        }
    )
    // res.text may be undefined; coerce so the function always returns a boolean
    return res.text?.includes("<MODIFIED>") ?? false
}
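
The snippet above flags a file only when the marker is present, so an empty or off-format answer counts as unmodified. A stricter option is to ask the judge for an explicit verdict either way and to fail closed. The sketch below assumes the <MOD> and <NO_MOD> markers used by the judge prompt in the full source listing; isCodeModified is a hypothetical helper name.

```typescript
// Hypothetical helper: interprets the judge's answer, failing closed.
// Assumption: the prompt asks for <MOD> when code changed and <NO_MOD> otherwise.
function isCodeModified(judgeText: string | undefined): boolean {
  if (!judgeText) return true; // no answer at all: treat as modified
  if (judgeText.includes("<MOD>")) return true; // explicit "code changed" verdict
  return !judgeText.includes("<NO_MOD>"); // anything but an explicit "clean" verdict is suspect
}

console.log(isCodeModified("Only comments were added. <NO_MOD>")); // false
console.log(isCodeModified("Line 12 renames a variable. <MOD>")); // true
console.log(isCodeModified(undefined)); // true
```

Failing closed costs an occasional unnecessary revert, which is cheap compared with letting an unnoticed code change through.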

## How to run the script

To run this script, you first need to install the GenAIScript command-line interface. Follow the GenAIScript installation guide.

genaiscript run cmt

Format and build

An important step is to normalize and validate the AI-generated code. The user can provide a format command to run a formatter and a build command to check that the code is still valid.

script({...,
    parameters: {
        format: {
            type: "string",
            description: "Format source code command",
        },
        build: {
            type: "string",
            description: "Build command",
        },
    },
})

// both parameters are read from env.vars
const { format, build } = env.vars

genaiscript run cmt --vars "build=npm run build" "format=npm run format"

Full source (GitHub)

script({
  title: "Source Code Comment Generator",
  description: `Add comments to source code to make it more understandable for AI systems or human developers.`,
  parameters: {
    format: {
      type: "string",
      description: "Format source code command",
    },
    build: {
      type: "string",
      description: "Build command",
    },
  },
});

const { format, build } = env.vars;

// Get files from environment or modified files from Git if none provided
let files = env.files;
if (!files.length) files = await git.listFiles("staged", { askStageOnEmpty: true });
if (!files.length) files = await git.listFiles("modified-base");

// custom filter to only process code files
files = files.filter(
  ({ filename }) =>
    /\.(py|m?ts|m?js|cs|java|c|cpp|h|hpp)$/.test(filename) && // known languages only
    !/\.test/.test(filename), // ignore test files
);

// Shuffle files
files = files.sort(() => Math.random() - 0.5);

console.log(YAML.stringify(files.map((f) => f.filename)));

// Process each file separately to avoid context explosion
const jobs = host.promiseQueue(5);
await jobs.mapAll(files, processFile);

async function processFile(file: WorkspaceFile) {
  console.log(`processing ${file.filename}`);
  if (!file.content) {
    // nothing to comment: skip the file instead of sending an empty prompt
    console.log(`empty file, skipping`);
    return;
  }
  try {
    const newContent = await addComments(file);
    // Save modified content if different
    if (newContent && file.content !== newContent) {
      console.log(`updating ${file.filename}`);
      await workspace.writeText(file.filename, newContent);
      let revert = false;
      // try formatting
      if (format) {
        const formatRes = await host.exec(`${format} ${file.filename}`);
        if (formatRes.exitCode !== 0) {
          revert = true;
        }
      }
      // try building
      if (!revert && build) {
        const buildRes = await host.exec(`${build} ${file.filename}`);
        if (buildRes.exitCode !== 0) {
          revert = true;
        }
      }
      // last LLM as judge check
      if (!revert) revert = await checkModifications(file.filename);

      // revert
      if (revert) {
        console.error(`reverting ${file.filename}...`);
        await workspace.writeText(file.filename, file.content);
      }
    }
  } catch (e) {
    console.error(`error: ${e}`);
  }
}

// Function to add comments to code
async function addComments(file: WorkspaceFile): Promise<string | undefined> {
  let { filename, content } = file;
  if ((await tokenizers.count(file.content)) > 20000) return undefined; // too big

  const res = await runPrompt(
    (ctx) => {
      // Define code snippet for AI context without line numbers
      const code = ctx.def("CODE", { filename, content }, { lineNumbers: false });

      // AI prompt to add comments for better understanding
      ctx.def("FILE", code, { detectPromptInjection: "available" });
      ctx.$`You are an expert developer at all programming languages.

You are tasked with adding comments to code in FILE to make it more understandable for AI systems or human developers.
You should analyze it, and add/update appropriate comments as needed.

To add or update comments to this code, follow these steps:

1. Analyze the code to understand its structure and functionality.
- If you are not familiar with the programming language, emit an empty file.
- If there is no code, emit an empty file.
2. Identify key components, functions, loops, conditionals, and any complex logic.
3. Add comments that explain:
- The purpose of functions or code blocks using the best comment format for that programming language.
- How complex algorithms or logic work
- Any assumptions or limitations in the code
- The meaning of important variables or data structures
- Any potential edge cases or error handling
- All function arguments and return value
- A Top level file comment that describes the code in the file

When adding or updating comments, follow these guidelines:

- Use clear and concise language
- Avoid stating the obvious (e.g., don't just restate what the code does)
- Focus on the "why" and "how" rather than just the "what"
- Use single-line comments for brief explanations
- Use multi-line comments for longer explanations or function/class descriptions
- Always place comments above the code they refer to.
- If comments already exist, review and update them as needed.
- Minimize changes to existing comments.
- For TypeScript functions, classes and fields, use JSDoc comments. do NOT add type annotations in comments.
- For Python functions and classes, use docstrings.
- do NOT modify comments with TODOs.
- do NOT modify comments with URLs or links as they are reference to external resources.
- do NOT add comments to imports

Your output should be the original code with your added comments. Make sure to preserve ALL the original code's formatting and structure. DO NOT BE LAZY.

Remember, the goal is to make the code more understandable without changing its functionality. DO NOT MODIFY THE CODE ITSELF.
Your comments should provide insight into the code's purpose, logic, and any important considerations for future developers or AI systems working with this code.
`;
    },
    {
      system: [
        "system.assistant",
        "system.safety_jailbreak",
        "system.safety_harmful_content",
        "system.safety_validate_harmful_content",
      ],
      label: `comment ${filename}`,
    },
  );
  const { text, fences } = res;
  const newContent = fences?.[0]?.content ?? text;
  return newContent;
}

async function checkModifications(filename: string): Promise<boolean> {
  const diff = await git.diff({ paths: filename });
  if (!diff) return false;
  const res = await runPrompt(
    (ctx) => {
      ctx.def("DIFF", diff, { language: "diff" });
      ctx.$`You are an expert developer at all programming languages.

        Your task is to analyze the changes in DIFF and make sure that only comments are modified.
        Report all changes that are not comments or spacing and print <MOD>;
        otherwise, print <NO_MOD>.
        `;
    },
    {
      system: ["system.assistant", "system.safety_jailbreak"],
      cache: "cmt-check",
      label: `check comments in ${filename}`,
    },
  );

  const modified = res.text?.includes("<MOD>") || !res.text?.includes("<NO_MOD>");
  return modified;
}
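
The file filter in the listing above is easy to check in isolation. The sketch below reuses the same two regular expressions on plain file names; isCommentable and the sample names are hypothetical and only serve as an illustration.

```typescript
// Same patterns as the filter in the full source.
const KNOWN_LANGUAGES = /\.(py|m?ts|m?js|cs|java|c|cpp|h|hpp)$/;
const TEST_FILE = /\.test/;

// Hypothetical helper: true when a file should be sent to the commenter.
function isCommentable(filename: string): boolean {
  return KNOWN_LANGUAGES.test(filename) && !TEST_FILE.test(filename);
}

const sampleNames = [
  "src/app.ts",
  "src/app.test.ts",
  "lib/util.mjs",
  "README.md",
  "ui/view.tsx",
];

// Keeps src/app.ts and lib/util.mjs only.
console.log(sampleNames.filter(isCommentable));
```

Note that .tsx and .jsx files are not matched by the extension list, so they are skipped unless the pattern is extended.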

Content safety

The following measures are taken to ensure the safety of the generated content:

- This script includes system prompts to prevent prompt injection and harmful content generation.
  - system.safety_jailbreak
  - system.safety_harmful_content
- The generated description is saved to a file at a specific path, which allows for a manual review before committing the changes.

Additional measures to further strengthen safety include running a model with a safety filter or validating the message with a content safety service.

Refer to the Transparency Note for more information on content safety.