Summarize Many Documents

Suppose I have a directory with multiple .pdf (or other) files and I want to run a GenAIScript over all of them. In this example, I’m generating a catchy tweet for each document and I want to save the tweet in another file.

Development

  1. Use the > GenAIScript: Create new script... command in the command palette to create a new script.

  2. This is an easy script. Assuming the script will take the file as an argument, you can refer to that argument in env.files and tell the LLM what to do with it:

    gen-tweet.genai.mjs
    script({ title: "gen-tweet" })
    def("FILE", env.files)
    $`Given the paper in FILE, write a 140 character summary of the paper
    that makes the paper sound exciting and encourages readers to look at it.`
  3. Right-click the document in the VS Code Explorer (it can be a .pdf, .docx, or .md file, because def knows how to read and parse all of these file types). Select Run GenAIScript, then select the gen-tweet script you just wrote.

  4. Assuming we give the GenAIScript a paper describing GenAIScript, the Output will be displayed in a new document tab.

    Discover GenAIScript: a revolutionary scripting language integrating AI to automate complex tasks, making coding accessible to all! #AI #CodingFuture

    Because we didn’t tell the LLM to write the output to a file, it goes to standard output by default.

Automation

  1. We can run the script from the command line:

    npx genaiscript run gen-tweet example1.pdf
  2. The output will be displayed in the terminal.

  3. Now that we have the script working for a single file, we can use the command line to apply it to a list of files. Let’s assume you start with a file ex1.pdf and want the output in a new file ex1.tweet.md. How you do this depends on the shell you prefer. (See batch processing…).

    for file in *.pdf; do
      newfile="${file%.pdf}.tweet.md" # foo.pdf -> foo.tweet.md
      if [ ! -f "$newfile" ]; then # skip if already exists
        # quote both names so paths containing spaces survive word splitting
        npx genaiscript run gen-tweet "$file" > "$newfile"
      fi
    done
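
One caveat with redirecting straight into the target file: the shell creates that file before the command runs, so a failed run leaves an empty or partial .tweet.md behind, and the existence check then skips that document on the next pass. A minimal sketch of one way around this follows. The helper names tweet_name and summarize_one and the .partial suffix are hypothetical, chosen for illustration; they are not part of GenAIScript. The only GenAIScript command assumed is the npx genaiscript run gen-tweet invocation shown above.

```shell
# Hypothetical helpers, not part of GenAIScript: the function names and the
# ".partial" suffix are illustrative assumptions.

# Map an input document to its tweet file: "foo.pdf" -> "foo.tweet.md".
tweet_name() {
  printf '%s\n' "${1%.*}.tweet.md"   # strips only the last extension
}

# Usage: summarize_one FILE COMMAND [ARGS...]
# Runs COMMAND ARGS... FILE and stores its standard output next to FILE.
summarize_one() {
  local file="$1"
  shift
  local out tmp
  out="$(tweet_name "$file")"
  tmp="$out.partial"
  if [ -f "$out" ]; then
    return 0                         # already summarized, skip
  fi
  if "$@" "$file" > "$tmp"; then
    mv "$tmp" "$out"                 # publish only after a successful run
  else
    rm -f "$tmp"                     # discard partial output
    return 1
  fi
}

# Example driver (left commented out so sourcing this file runs nothing):
# for f in *.pdf *.docx; do
#   [ -e "$f" ] || continue          # unmatched globs stay literal; skip them
#   summarize_one "$f" npx genaiscript run gen-tweet || echo "failed: $f" >&2
# done
```

Because the command is passed as arguments, the same helper works with any script name. The example driver leaves out *.md on purpose: the generated .tweet.md files would match that glob and be fed back in as input.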