Search And Transform

This script is an evolution of the “search and replace” feature from text editors, where the “replace” step has been replaced by an LLM transformation.

It can be useful to batch apply text transformations that are not easily done with regular expressions.

For example, when GenAIScript added the ability to pass a single command string to the exec command, we needed to convert all scripts using

host.exec("cmd", ["arg0", "arg1", "arg2"])

to

host.exec(`cmd arg0 arg1 arg2`)

While it’s possible to match this function call with a regular expression

host\.exec\s*\([^,]+,\s*\[[^\]]+\]\s*\)

it’s not easy to formulate the replacement string… unless you can describe it in natural language:

Convert the call to a single string command shell in TypeScript

Here are some examples of the transformations where the LLM correctly handled variables.

  • concatenate the arguments of a function call into a single string
const { stdout } = await host.exec("git", ["diff"])
const { stdout } = await host.exec(`git diff`)
  • concatenate the arguments and use the ${} syntax to interpolate variables
const { stdout: commits } = await host.exec("git", [
    "log",
    "--author",
    author,
    "--until",
    until,
    "--format=oneline",
])
const { stdout: commits } = await host.exec(`git log --author ${author} --until ${until} --format=oneline`)
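To check what the regular expression above actually captures, it can be run against a few sample calls. This is a minimal sketch in plain TypeScript; the `sample` text is made up for illustration and includes one already-converted call that should be left alone.

```typescript
// Pattern from the text: host.exec(<first argument>, [<argument array>])
const execCall = /host\.exec\s*\([^,]+,\s*\[[^\]]+\]\s*\)/g

// Made-up sample: two calls in the old style, one already converted.
const sample = [
    'const { stdout } = await host.exec("git", ["diff"])',
    'const { stdout: commits } = await host.exec("git", [',
    '    "log",',
    '    "--author",',
    "    author,",
    "])",
    "const status = await host.exec(`git status`)",
].join("\n")

// matchAll requires the g flag and yields one entry per match.
const found = Array.from(sample.matchAll(execCall), (m) => m[0])
```

The multi-line call is matched because `[^\]]+` also consumes newlines, and the converted call is skipped because it has no `, [` part.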

The search step uses workspace.grep, which efficiently searches files for a pattern (it is the same search engine that powers the Visual Studio Code search).

const { pattern, glob } = env.vars
const patternRx = new RegExp(pattern, "g")
const { files } = await workspace.grep(patternRx, glob)
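Outside of GenAIScript, the effect of the search step can be approximated on in-memory data. The sketch below is only an illustration: `FakeFile` and `fakeFiles` are hypothetical stand-ins, not GenAIScript types, and a real `workspace.grep` call searches files on disk instead.

```typescript
// Hypothetical stand-in for a file on disk (not a GenAIScript type).
interface FakeFile {
    filename: string
    content: string
}

// The pattern arrives as a plain string, as it would from env.vars.
// String.raw keeps the backslashes intact in this literal.
const pattern = String.raw`host\.exec\s*\([^,]+,\s*\[[^\]]+\]\s*\)`
const patternRx = new RegExp(pattern, "g")

const fakeFiles: FakeFile[] = [
    { filename: "a.mts", content: 'await host.exec("git", ["diff"])' },
    { filename: "b.mts", content: "await host.exec(`git diff`)" },
]

// Keep the files with at least one match. matchAll iterates over a copy of
// the regex, so the shared global regex keeps lastIndex at 0 between files.
const hits = fakeFiles.filter(
    (f) => Array.from(f.content.matchAll(patternRx)).length > 0
)
```

Only `a.mts` ends up in `hits`, since `b.mts` already uses the single-string form.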

Compute Transforms

The second step is to apply the regular expression to the file content and pre-compute the LLM transformation of each match using an inline prompt.

const { transform } = env.vars
...
const patches = {} // map of match -> transformed
for (const file of files) {
    const { content } = await workspace.readText(file.filename)
    for (const match of content.matchAll(patternRx)) {
        // one inline prompt per match; the cache name lets repeated runs reuse answers
        const res = await runPrompt(
            (ctx) => {
                ctx.$`
## Task
Your task is to transform the MATCHED text with the following TRANSFORM.
Return the transformed text.
- do NOT add enclosing quotes.
## Context
`
                ctx.def("MATCHED", match[0])
                ctx.def("TRANSFORM", transform)
            },
            { label: match[0], system: [], cache: "search-and-transform" }
        )
        ...
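The pre-computation loop can be sketched without an LLM by swapping the prompt for any async function. In the sketch below, `Transformer`, `computePatches`, and `upper` are hypothetical names, and a `Map` is used instead of the plain object from the listing so that matches cannot collide with inherited property names; that swap is a choice of this sketch, not something the script does.

```typescript
// Hypothetical stand-in for the LLM call: any async string -> string function.
type Transformer = (matched: string) => Promise<string | undefined>

// Compute one patch per distinct match; repeated matches reuse the stored result.
async function computePatches(
    content: string,
    patternRx: RegExp,
    transformer: Transformer,
    patches: Map<string, string> = new Map()
): Promise<Map<string, string>> {
    for (const match of Array.from(content.matchAll(patternRx))) {
        const matched = match[0]
        if (patches.has(matched)) continue // already transformed
        const transformed = await transformer(matched)
        if (transformed) patches.set(matched, transformed)
    }
    return patches
}

// Usage with a fake transformer that counts how often it is called.
let calls = 0
const upper: Transformer = async (s) => {
    calls++
    return s.toUpperCase()
}
const done = computePatches("foo bar foo", /foo|bar/g, upper)
```

Because "foo" appears twice but is only transformed once, the fake transformer is called twice in total. With a real model behind the function, this is what keeps the number of requests proportional to the number of distinct matches.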

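Since the LLM sometimes decides to wrap the answer in quotes or a code fence, we need to remove them: we take the content of the first fence when there is one and fall back to the raw text otherwise.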

        ...
        // `?.` after [0] guards against an empty fences array
        const transformed = res.fences?.[0]?.content ?? res.text
        patches[match[0]] = transformed
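The listing above reads the first fence when the answer contains one. If an answer still comes back wrapped in literal quotes, a small helper could strip them. A minimal sketch, where `stripEnclosingQuotes` is a hypothetical name and not part of GenAIScript:

```typescript
// Hypothetical helper: remove one pair of matching enclosing quotes, if present.
// It also trims surrounding whitespace, which may or may not be desirable.
function stripEnclosingQuotes(text: string): string {
    const trimmed = text.trim()
    const first = trimmed[0]
    const last = trimmed[trimmed.length - 1]
    const quotes = ['"', "'", "`"]
    if (trimmed.length >= 2 && first === last && quotes.indexOf(first) >= 0) {
        return trimmed.slice(1, -1)
    }
    return trimmed
}

// A quoted answer is unwrapped; an unquoted answer is returned as-is.
const unwrapped = stripEnclosingQuotes('"host.exec(`git diff`)"')
const untouched = stripEnclosingQuotes("host.exec(`git diff`)")
```

This is a heuristic: a transformation whose correct output legitimately starts and ends with the same quote character would be damaged by it, so it is only suitable when the expected result is not itself a quoted string.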

Transform

Finally, with the transforms pre-computed, we apply a final regex replace to patch the old file content with the transformed strings.

    const newContent = content.replace(
        patternRx,
        (match) => patches[match] ?? match
    )
    await workspace.writeText(file.filename, newContent)
}
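The patching step can be tried in isolation. A minimal sketch with made-up data: one match has a pre-computed patch and is rewritten, the other has none and is left untouched by the `?? match` fallback.

```typescript
// Made-up patches map: only the git call has a pre-computed transformation.
const patches: Record<string, string> = {
    'host.exec("git", ["diff"])': "host.exec(`git diff`)",
}
const patternRx = /host\.exec\s*\([^,]+,\s*\[[^\]]+\]\s*\)/g

const content = [
    'const a = await host.exec("git", ["diff"])',
    'const b = await host.exec("npm", ["test"])',
].join("\n")

// The callback receives each matched substring and returns its replacement;
// matches without a patch are returned unchanged.
const newContent = content.replace(
    patternRx,
    (match) => patches[match] ?? match
)
```

Passing a function rather than a replacement string also means the transformed text is inserted literally, so sequences such as `$&` in an LLM answer are not interpreted as replacement patterns.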

Parameters

The script takes three parameters: a file glob, a pattern to search for, and an LLM transformation to apply. We declare these parameters in the script metadata and extract them from the env.vars object.

script({ ...,
    parameters: {
        glob: {
            type: "string",
            description: "The glob pattern to filter files",
            default: "*",
        },
        pattern: {
            type: "string",
            description: "The text pattern (regular expression) to search for",
        },
        transform: {
            type: "string",
            description: "The LLM transformation to apply to the match",
        },
    },
})
const { pattern, glob, transform } = env.vars

Full source

st.genai.mts
script({
    title: "Search and transform",
    description:
        "Search for a pattern in files and apply an LLM transformation to the match",
    parameters: {
        glob: {
            type: "string",
            description: "The glob pattern to filter files",
            default: "*",
        },
        pattern: {
            type: "string",
            description: "The text pattern (regular expression) to search for",
        },
        transform: {
            type: "string",
            description: "The LLM transformation to apply to the match",
        },
    },
})
const { pattern, glob, transform } = env.vars
if (!pattern) cancel("pattern is missing")
const patternRx = new RegExp(pattern, "g")
if (!transform) cancel("transform is missing")
const { files } = await workspace.grep(patternRx, glob)
// cached computed transformations
const patches = {}
for (const file of files) {
    console.log(file.filename)
    const { content } = await workspace.readText(file.filename)
    // skip binary files
    if (!content) continue
    // compute transforms
    for (const match of content.matchAll(patternRx)) {
        console.log(` ${match[0]}`)
        // reuse the transformation if this exact text was already seen
        if (patches[match[0]]) continue
        const res = await runPrompt(
            (_) => {
                _.$`
## Task
Your task is to transform the MATCHED text with the following TRANSFORM.
Return the transformed text.
- do NOT add enclosing quotes.
## Context
`
                _.def("MATCHED", match[0])
                _.def("TRANSFORM", transform)
            },
            { label: match[0], system: [], cache: "search-and-transform" }
        )
        // `?.` after [0] guards against an empty fences array
        const transformed = res.fences?.[0]?.content ?? res.text
        if (transformed) patches[match[0]] = transformed
        console.log(` ${match[0]} -> ${transformed ?? "?"}`)
    }
    // apply transforms
    const newContent = content.replace(
        patternRx,
        (match) => patches[match] ?? match
    )
    // save results if file content is modified
    if (content !== newContent)
        await workspace.writeText(file.filename, newContent)
}

To run this script, you can use the --vars option to pass the pattern and the transform.

genaiscript st --vars 'pattern=host\.exec\s*\([^,]+,\s*\[[^\]]+\]\s*\)' 'transform=Convert the call to a single string command shell in TypeScript'