Context (env+def)
Information about the context of script execution is available in the `env` global object.
Environment (`env`)
The `env` global object contains properties that provide information about the script execution context. `env` is populated automatically by the GenAIScript runtime.
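A minimal sketch that reads the two properties documented in the sections below (the prompt wording is illustrative):

```js
// minimal sketch: read the execution context
const files = env.files                   // files passed to the script
const locale = env.vars.locale || "en-US" // user-defined variables
def("FILE", files)
$`Summarize FILE for a ${locale} audience.`
```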
env.files
The `env.files` array contains all files within the execution context. The context is defined implicitly by the user based on:
- the script `files` option

```js
script({
    files: "**/*.pdf",
})
```

or multiple paths

```js
script({
    files: ["src/*.pdf", "other/*.pdf"],
})
```

- the UI location to start the tool
- CLI files arguments.
The files are stored in `env.files`, which can be injected in the prompt:
- using `def`

```js
def("FILE", env.files)
```

- filtered,

```js
def("DOCS", env.files, { endsWith: ".md" })
def("CODE", env.files, { endsWith: ".py" })
```

- directly in a `$` call

```js
$`Summarize ${env.files}.`
```
In this case, the prompt is automatically expanded with a `def` call and the value of `env.files`.

```js
// expanded
const files = def("FILES", env.files, { ignoreEmpty: true })
$`Summarize ${files}.`
```
env.vars
The `vars` property contains the variables that have been defined in the script execution context.

```js
// grab locale from variable or default to en-US
const locale = env.vars.locale || "en-US"
```
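Variables can then be interpolated in the prompt; a sketch (the variable names are illustrative):

```js
// sketch: read optional variables with defaults and use them in the prompt
const locale = env.vars.locale || "en-US"
const audience = env.vars.audience || "general"
$`Write the summary in ${locale} for a ${audience} audience.`
```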
Read more about variables.
Definition (`def`)
The def("FILE", file)
function is a shorthand for generating a fenced variable output.
def("FILE", file)
It renders approximately to

FILE:
```file="filename"
file content
```

or, if the model supports XML tags (see fence formats):

<FILE file="filename">
file content
</FILE>
The `def` function can also be used with an array of files, such as `env.files`.

```js
def("FILE", env.files)
```
Language
You can specify the language of the text contained in `def`. This can help GenAIScript optimize the rendering of the text.

```js
// hint that the output is a diff
def("DIFF", gitdiff, { language: "diff" })
```
Referencing
The `def` function returns a variable name that can be used in the prompt. The name might be formatted differently to accommodate the model’s preference.

```js
const f = def("FILE", file)
$`Summarize ${f}.`
```
File filters
Since a script may be executed on a full folder, it is often useful to filter the files based on:
- their extension

```js
def("FILE", env.files, { endsWith: ".md" })
```

- or using a glob:

```js
def("FILE", env.files, { glob: "**/*.{md,mdx}" })
```
Empty files
By default, if `def` is used with an empty array of files, it will cancel the prompt. You can override this behavior by setting `ignoreEmpty` to `true`.

```js
def("FILE", env.files, { endsWith: ".md", ignoreEmpty: true })
```
maxTokens
It is possible to limit the number of tokens rendered by the `def` function. This can be useful when the content is too large and the model has a token limit. The `maxTokens` option can be set to a number to limit the number of tokens rendered for each individual file.

```js
def("FILE", env.files, { maxTokens: 100 })
```
Data filters
The `def` function treats data files such as CSV and XLSX specially. It will automatically convert the data into a markdown table format to improve tokenization.

- `sliceHead`, keep the top N rows

```js
def("FILE", env.files, { sliceHead: 100 })
```

- `sliceTail`, keep the last N rows

```js
def("FILE", env.files, { sliceTail: 100 })
```

- `sliceSample`, keep a random sample of N rows

```js
def("FILE", env.files, { sliceSample: 100 })
```
Prompt Caching
You can use `cacheControl: "ephemeral"` to specify that the prompt can be cached for a short amount of time, and enable prompt caching optimization, which is supported (differently) by various LLM providers.

```js
$`...`.cacheControl("ephemeral")
```

```js
def("FILE", env.files, { cacheControl: "ephemeral" })
```
Read more about prompt caching.
Safety: Prompt Injection detection
You can schedule a check for prompt injection/jailbreak with your configured content safety provider.

```js
def("FILE", env.files, { detectPromptInjection: true })
```
Predicted output
Some models, like OpenAI gpt-4o and gpt-4o-mini, support specifying a predicted output (with some limitations). This helps reduce latency for model responses where much of the response is known ahead of time. This can be helpful when asking the LLM to edit specific files.
Set the `prediction: true` flag to enable it on a `def` call. Note that only a single file can be predicted.

```js
def("FILE", env.files[0], { prediction: true })
```
Data definition (`defData`)
The `defData` function offers additional formatting options for converting a data object into a textual representation. It supports rendering objects as YAML, JSON, or CSV (formatted as a Markdown table).

```js
// render to markdown-ified CSV by default
defData("DATA", data)
```

```js
// render as yaml
defData("DATA", csv, { format: "yaml" })
```
The `defData` function also supports options to slice the input rows and columns:
- `headers`, list of column names to include
- `sliceHead`, number of rows or fields to include from the beginning
- `sliceTail`, number of rows or fields to include from the end
- `sliceSample`, number of rows or fields to pick at random
- `distinct`, list of column names to deduplicate the data based on
- `query`, a jq query to filter the data
defData("DATA", data, { sliceHead: 5, sliceTail: 5, sliceSample: 100,})
You can leverage the data filtering functionality using `parsers.tidyData` as well.
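A sketch, assuming `parsers.tidyData(rows, options)` accepts the same filter options and returns the filtered rows:

```js
// sketch: filter the rows programmatically before handing them to defData
// (assumes parsers.tidyData mirrors the filter options shown above)
const tidy = parsers.tidyData(data, { distinct: ["name"], sliceHead: 50 })
defData("DATA", tidy)
```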
Diff Definition (`defDiff`)
It is very common to compare two pieces of data and ask the LLM to analyze the differences. Using diffs is a great way to naturally compress the information since we only focus on differences!
The `defDiff` function takes care of formatting the diff in a way that helps the LLM reason. It behaves similarly to `def` and assigns a name to the diff.
```js
// diff files
defDiff("DIFF", env.files[0], env.files[1])
```

```js
// diff strings
defDiff("DIFF", "cat", "dog")
```

```js
// diff objects
defDiff("DIFF", { name: "cat" }, { name: "dog" })
```
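A typical pattern is to pair the diff with an instruction; a sketch (the prompt wording is illustrative):

```js
// sketch: ask the model to review the differences between two files
defDiff("DIFF", env.files[0], env.files[1])
$`Analyze DIFF and summarize the changes.`
```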
You can leverage the diff functionality using `parsers.diff`.