Summarize Many Documents
Suppose I have a directory with multiple `.pdf` (or other) files and I want to run a GenAIScript over all of them.
In this example, I’m generating a catchy tweet for each document and I want to save the tweet in another file.
Development
Use the `> GenAIScript: Create new script...` command in the command palette to create a new script.

This is an easy script. Assuming the script will take the file as an argument, you can refer to that argument in `env.files` and tell the LLM what to do with it (saved here as `gen-tweet.genai.mjs`):

```js
script({ title: "gen-tweet" })

def("FILE", env.files)

$`Given the paper in FILE, write a 140 character summary of the paper
that makes the paper sound exciting and encourages readers to look at it.`
```

Right click on the document in VS Code Explorer (it can be a `.pdf`, a `.docx`, or a `.md` file because `def` knows how to read and parse all these file types). Select Run GenAIScript. Select the script `gen-tweet` you just wrote.

Assuming we give the GenAIScript a paper describing GenAIScript, the output will be displayed in a new document tab.
```text
Discover GenAIScript: a revolutionary scripting language integrating AI to automate complex tasks, making coding accessible to all! #AI #CodingFuture
```

Because we didn’t tell the LLM to write the output to a file, it goes to standard output by default.
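The prompt asks for 140 characters, but a model does not count characters reliably, and the sample above looks slightly longer than that. If the limit matters, a small check on the script's output can catch overlong tweets. The following is only a sketch: `check_tweet` and `TWEET_LIMIT` are hypothetical names, not part of GenAIScript, and `len` counts Unicode code points, which may differ from how a given platform counts.

```python
# Hypothetical helper, not part of GenAIScript: checks the length of a generated tweet.
TWEET_LIMIT = 140  # the limit stated in the prompt above


def check_tweet(text, limit=TWEET_LIMIT):
    """Return (fits, length) for the tweet, ignoring surrounding whitespace."""
    length = len(text.strip())  # code points, not platform-specific weighting
    return length <= limit, length
```

For example, `check_tweet(open("ex1.tweet.md").read())` returns a pair such as `(False, 149)` when the saved tweet is too long, which is a reasonable signal to rerun the script or tighten the prompt.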
Automation
We can run the script from the command line:
```sh
npx genaiscript run gen-tweet example1.pdf
```

The output will be displayed in the terminal.
Now that we have the script working for a single file, we can use the command line to apply it to a list of files. Let’s assume you start with a file `ex1.pdf` and you want the output in a new file `ex1.tweet.md`. How you do this depends on the shell script you prefer.

Bash:

```sh
for file in *.pdf; do
  newfile="${file%.pdf}.tweet.md" # foo.pdf -> foo.tweet.md
  if [ ! -f "$newfile" ]; then # skip if already exists
    # quote both names so files with spaces in them work
    npx genaiscript run gen-tweet "$file" > "$newfile"
  fi
done
```

PowerShell:

```powershell
Get-ChildItem -Filter *.pdf | ForEach-Object {
  $newName = $_.BaseName + ".tweet.md"
  if (-not (Test-Path $newName)) {
    npx genaiscript run gen-tweet $_.FullName | Set-Content "$newName"
  }
}
```

Python:

```python
import subprocess, sys, os

for input_file in sys.argv[1:]:
    output_file = os.path.splitext(input_file)[0] + '.tweet.md'
    if not os.path.exists(output_file):
        # run first so a failed run does not leave an empty output file behind
        result = subprocess.check_output(
            ["npx", "genaiscript", "run", "gen-tweet", input_file],
            universal_newlines=True)
        with open(output_file, 'w') as outfile:
            outfile.write(result)
```

JavaScript (zx):

```js
#!/usr/bin/env zx
import "zx/globals"

const files = await glob("*.pdf")
for (const file of files) {
  const out = file.replace(/\.pdf$/i, ".tweet.md") // foo.pdf -> foo.tweet.md
  if (!(await fs.exists(out)))
    // don't regenerate if it already exists
    await $`genaiscript run gen-tweet ${file} > ${out}`
}
```

This script requires zx.
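The loops above only pick up `.pdf` files. Since `def` also reads `.docx` and `.md`, a batch over mixed file types needs two extra guards: a generated `foo.tweet.md` must not be treated as a new `.md` input, and two sources with the same base name must not overwrite each other's output. Here is a minimal sketch of that planning step, assuming the same `gen-tweet` script and CLI command as above. The names `INPUT_SUFFIXES`, `tweet_path`, `plan`, and `run_job` are my own, not part of GenAIScript.

```python
import subprocess
from pathlib import Path

# Assumption: the file types the text says `def` can parse; extend as needed.
INPUT_SUFFIXES = {".pdf", ".docx", ".md"}
OUTPUT_SUFFIX = ".tweet.md"


def tweet_path(source):
    """Map foo.pdf -> foo.tweet.md in the same directory."""
    return source.with_suffix(OUTPUT_SUFFIX)


def plan(paths, exists=Path.exists):
    """Return (source, output) pairs that still need a tweet."""
    jobs, seen = [], set()
    for source in map(Path, paths):
        if source.name.endswith(OUTPUT_SUFFIX):
            continue  # a previously generated tweet, not an input
        if source.suffix.lower() not in INPUT_SUFFIXES:
            continue  # unsupported file type
        output = tweet_path(source)
        if output in seen or exists(output):
            continue  # already planned or already generated
        seen.add(output)
        jobs.append((source, output))
    return jobs


def run_job(source, output):
    """Run the CLI and write the file only when the run succeeds."""
    result = subprocess.run(
        ["npx", "genaiscript", "run", "gen-tweet", str(source)],
        capture_output=True, text=True, check=True,  # raises on a non-zero exit
    )
    output.write_text(result.stdout, encoding="utf-8")
```

A driver could then be as short as `for source, output in plan(Path(".").iterdir()): run_job(source, output)`. Writing the file only after the command succeeds matters because a shell redirection creates the output file even when the run fails, and the skip-if-exists check would then never retry that document.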