Data Schemas
You can ask the LLM to generate data that conforms to a specific schema. This technique works reasonably well, and GenAIScript also validates the generated data automatically, “just in case”.
You will notice that the schema format supported by GenAIScript is much simpler than the full-blown JSON Schema specification. We recommend keeping schemas simple to avoid confusing the LLM, then porting the data to your application-specific format later on.
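As a minimal sketch of that last porting step, the TypeScript below maps records that follow a simple city schema onto a richer application type. The LlmCity and AppCity interfaces and the toAppCity function are hypothetical names chosen for illustration; they are not part of GenAIScript.

```typescript
// Hypothetical shape requested from the LLM: kept deliberately simple.
interface LlmCity {
  name: string;
  population: number;
  url: string;
}

// Hypothetical application-specific format with derived fields.
interface AppCity {
  id: string; // slug derived from the name, e.g. "new-york"
  displayName: string;
  population: number; // rounded to a whole number
  wikipediaTitle: string; // last path segment of the Wikipedia URL
}

// Port one LLM-generated record to the application format.
function toAppCity(city: LlmCity): AppCity {
  const segments = city.url.split("/");
  return {
    id: city.name.trim().toLowerCase().replace(/\s+/g, "-"),
    displayName: city.name,
    population: Math.round(city.population),
    wikipediaTitle: decodeURIComponent(segments[segments.length - 1] ?? ""),
  };
}
```

Keeping this conversion in your own code lets the schema shown to the LLM stay small while the application type evolves independently.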
defSchema
Use defSchema to define a JSON/YAML schema for the prompt output.
const schema = defSchema("CITY_SCHEMA", {
    type: "array",
    description: "A list of cities with population and elevation information.",
    items: {
        type: "object",
        description: "A city with population and elevation information.",
        properties: {
            name: { type: "string", description: "The name of the city." },
            population: {
                type: "number",
                description: "The population of the city.",
            },
            url: {
                type: "string",
                description: "The URL of the city's Wikipedia page.",
            },
        },
        required: ["name", "population", "url"],
    },
})

$`Generate data using JSON compliant with ${schema}.`
👤 user
CITY_SCHEMA:
```typescript-schema
// A list of cities with population and elevation information.
type CITY_SCHEMA = Array<{
    // The name of the city.
    name: string,
    // The population of the city.
    population: number,
    // The URL of the city's Wikipedia page.
    url: string,
}>
```
Generate data using JSON compliant with CITY_SCHEMA.
🤖 assistant
File ./data.json:
```json schema=CITY_SCHEMA
[
    {
        "name": "New York",
        "population": 8398748,
        "url": "https://en.wikipedia.org/wiki/New_York_City"
    },
    {
        "name": "Los Angeles",
        "population": 3990456,
        "url": "https://en.wikipedia.org/wiki/Los_Angeles"
    },
    {
        "name": "Chicago",
        "population": 2705994,
        "url": "https://en.wikipedia.org/wiki/Chicago"
    }
]
```
Native Zod support
You can pass a Zod type to defSchema, and it will be converted automatically to a JSON schema. GenAIScript also exports the z object from Zod for convenience.
// import from genaiscript
import { z } from "genaiscript/runtime"
// or directly from zod
// import { z } from "zod"

// create schema using zod
const CitySchema = z.array(
    z.object({
        name: z.string(),
        population: z.number(),
        url: z.string(),
    })
)

// JSON schema to constrain the output of the prompt.
const schema = defSchema("CITY_SCHEMA", CitySchema)
Prompt encoding
Following the “All You Need Is Types” approach from TypeChat, the schema is converted to TypeScript types before being injected into the LLM prompt.
// A list of cities with population and elevation information.
type CITY_SCHEMA = Array<{
    // The name of the city.
    name: string
    // The population of the city.
    population: number
    // The URL of the city's Wikipedia page.
    url: string
}>
You can change this encoding with the { format: "json" } option, which injects the schema as JSON schema instead of TypeScript types.
const schema = defSchema("CITY_SCHEMA", {...}, { format: "json" })
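To illustrate the TypeChat-style encoding, here is a minimal sketch that renders a simple schema as TypeScript type text, emitting each description as a comment. It assumes a hypothetical SimpleSchema subset (string, number, boolean, array, object) and is not GenAIScript's actual converter. Properties missing from required are rendered as optional.

```typescript
// Hypothetical subset of the simple schema format; not GenAIScript's own type.
type SimpleSchema =
  | { type: "string" | "number" | "boolean"; description?: string }
  | { type: "array"; description?: string; items: SimpleSchema }
  | {
      type: "object";
      description?: string;
      properties: Record<string, SimpleSchema>;
      required?: string[];
    };

// Render a schema as TypeScript type text, turning descriptions into comments.
function toTypeText(schema: SimpleSchema, indent = ""): string {
  switch (schema.type) {
    case "string":
    case "number":
    case "boolean":
      return schema.type;
    case "array":
      return `Array<${toTypeText(schema.items, indent)}>`;
    case "object": {
      const inner = indent + "  ";
      const properties = schema.properties;
      const required = schema.required ?? [];
      const lines = Object.keys(properties).map((key) => {
        const prop = properties[key];
        // Properties not listed in `required` become optional (key?: type).
        const optional = required.indexOf(key) >= 0 ? "" : "?";
        const comment = prop.description ? `${inner}// ${prop.description}\n` : "";
        return `${comment}${inner}${key}${optional}: ${toTypeText(prop, inner)}`;
      });
      return `{\n${lines.join("\n")}\n${indent}}`;
    }
    default:
      throw new Error("unsupported schema type");
  }
}

// Wrap the rendered type in a named declaration, with the top-level description on top.
function toTypeDeclaration(name: string, schema: SimpleSchema): string {
  const header = schema.description ? `// ${schema.description}\n` : "";
  return `${header}type ${name} = ${toTypeText(schema)}`;
}
```

Passing the city schema from the defSchema example to toTypeDeclaration with the name CITY_SCHEMA should produce text with the same shape as the type shown above.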
Use the schema
Then tell the LLM to use this schema to generate data.
const schema = defSchema(...)
$`Use ${schema} for the JSON schema.`
Validation
When the LLM generates a JSON/YAML payload tagged with the schema identifier, such as a fenced block annotated with schema=CITY_SCHEMA, GenAIScript automatically validates the payload against that schema.
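The sketch below shows one way such tagged payloads could be routed to their schema: it finds fenced blocks whose info string contains schema=<id>, parses the body as JSON, and runs a validator registered under that id. The validateTaggedBlocks function and the Validator signature are hypothetical, and YAML payloads are not handled.

```typescript
// Hypothetical validator signature: returns error messages (empty array = valid).
type Validator = (data: unknown) => string[];

interface BlockResult {
  schema: string; // identifier found after "schema=" on the opening fence
  data: unknown; // parsed JSON, or undefined if parsing failed
  errors: string[];
}

// Find fenced blocks tagged with schema=<id>, parse each body as JSON,
// and validate it with the validator registered under that id.
function validateTaggedBlocks(
  response: string,
  validators: Record<string, Validator>
): BlockResult[] {
  // Opening fence whose info line contains schema=<id>, then the body, then the closing fence.
  const pattern = /`{3}[^\n]*\bschema=([\w-]+)[^\n]*\n([\s\S]*?)\n`{3}/g;
  const results: BlockResult[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(response)) !== null) {
    const schema = match[1];
    let data: unknown;
    try {
      data = JSON.parse(match[2]);
    } catch (e) {
      results.push({ schema, data: undefined, errors: [`invalid JSON: ${String(e)}`] });
      continue;
    }
    const validate: Validator | undefined = validators[schema];
    const errors = validate ? validate(data) : [`no validator for schema ${schema}`];
    results.push({ schema, data, errors });
  }
  return results;
}
```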
Repair
If validation fails, GenAIScript automatically tries to repair the data by sending additional messages back to the LLM that include the parsing output.
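A minimal sketch of such a repair loop, assuming a hypothetical ChatModel function that takes the message history and returns the assistant's reply. It is not GenAIScript's implementation: it appends the invalid reply and the validation errors to the conversation and asks again, up to maxAttempts times.

```typescript
// Hypothetical chat message shape and model call; swap in your own client.
interface Message {
  role: "user" | "assistant";
  content: string;
}
type ChatModel = (messages: Message[]) => Promise<string>;

// Validate the model's reply; on failure, send the errors back and retry.
async function generateWithRepair(
  model: ChatModel,
  prompt: string,
  validate: (reply: string) => string[],
  maxAttempts = 3
): Promise<string> {
  const messages: Message[] = [{ role: "user", content: prompt }];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const reply = await model(messages);
    const errors = validate(reply);
    if (errors.length === 0) return reply;
    // Feed the parser/validator output back so the model can fix its answer.
    messages.push({ role: "assistant", content: reply });
    messages.push({
      role: "user",
      content: `The data does not match the schema:\n${errors.join("\n")}\nPlease fix it.`,
    });
  }
  throw new Error(`still invalid after ${maxAttempts} attempts`);
}
```

Capping the number of attempts keeps a model that never produces valid data from looping forever.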
Runtime Validation
Use parsers.validateJSON to validate JSON when running the script.
const validation = parsers.validateJSON(schema, json)
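For illustration only, here is a self-contained sketch of what a runtime check against a simple schema involves. validateSimple and its SimpleSchema type are hypothetical and cover only a small subset (string, number, boolean, array, object, required); inside a GenAIScript script, parsers.validateJSON remains the way to do this. Each error carries a JSON-path-like location such as $[0].url.

```typescript
// Hypothetical subset of the simple schema format (not the full JSON Schema spec).
type SimpleSchema =
  | { type: "string" | "number" | "boolean" }
  | { type: "array"; items: SimpleSchema }
  | { type: "object"; properties: Record<string, SimpleSchema>; required?: string[] };

// Recursively check `value` against `schema`, collecting errors with their location.
function validateSimple(schema: SimpleSchema, value: unknown, path = "$"): string[] {
  switch (schema.type) {
    case "string":
    case "number":
    case "boolean":
      return typeof value === schema.type ? [] : [`${path}: expected ${schema.type}`];
    case "array": {
      if (!Array.isArray(value)) return [`${path}: expected array`];
      const items = schema.items;
      const errors: string[] = [];
      value.forEach((item, i) => {
        errors.push(...validateSimple(items, item, `${path}[${i}]`));
      });
      return errors;
    }
    case "object": {
      if (typeof value !== "object" || value === null || Array.isArray(value))
        return [`${path}: expected object`];
      const record = value as Record<string, unknown>;
      const errors: string[] = [];
      // Report required properties that are absent.
      for (const key of schema.required ?? [])
        if (!(key in record)) errors.push(`${path}.${key}: missing required property`);
      // Validate every declared property that is present.
      for (const key of Object.keys(schema.properties))
        if (key in record)
          errors.push(...validateSimple(schema.properties[key], record[key], `${path}.${key}`));
      return errors;
    }
    default:
      return [`${path}: unsupported schema`];
  }
}
```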