Prompt Caching
Prompt caching is a feature that can reduce processing time and costs for repetitive prompts. It is supported by various LLM providers, but the implementation may vary.
ephemeral

You can mark a def section or a $ function with cacheControl set to "ephemeral" to enable prompt caching optimization. This essentially tells the LLM provider that it is acceptable to cache the prompt for a short amount of time.
def("FILE", env.files, { cacheControl: "ephemeral" })
$`Some very cool prompt`.cacheControl("ephemeral")
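For example, a script can mark the large, stable parts of the prompt as ephemeral while leaving the frequently changing request uncached. A minimal sketch (the file set and prompt wording are placeholders):

```js
// cache the large, stable file content for a short time
def("FILE", env.files, { cacheControl: "ephemeral" })

// cache the fixed instructions as well
$`You are an expert code reviewer. Review FILE for correctness and style.`.cacheControl(
    "ephemeral"
)

// the varying, per-run request is left uncached
$`Focus on the files modified in the latest commit.`
```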
LLM provider support
In most cases, the ephemeral hint is ignored by LLM providers. However, the following providers do support it:
OpenAI, Azure OpenAI
OpenAI automatically enables caching of the prompt prefix; all ephemeral annotations are removed.
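Since the caching applies to the prompt prefix, it can help to place the large, stable parts of the prompt first and the varying parts last. A sketch, assuming the file contents are the stable part and that the script order is reflected in the final prompt:

```js
// stable prefix: large file definitions come first so the automatically
// cached prefix can be reused across runs
def("FILE", env.files)

// varying suffix: the per-run instruction goes last so it does not
// invalidate the cached prefix
$`Summarize the changes in FILE since the previous release.`
```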
Anthropic
The ephemeral annotation is converted into a cache_control: { ... } field in the message object.
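Conceptually, a cached content block in the Anthropic request looks something like the following (a sketch of the Anthropic message shape; the exact object produced by the conversion may differ):

```js
// sketch of an Anthropic-style message with a cacheable content block;
// the actual output of the conversion may differ in detail
const messages = [
    {
        role: "user",
        content: [
            {
                type: "text",
                text: "FILE:\n...large file content...",
                // short-lived cache hint derived from the ephemeral annotation
                cache_control: { type: "ephemeral" },
            },
            { type: "text", text: "Some very cool prompt" },
        ],
    },
]
```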
Note that prompt caching is still marked as beta and is not supported in all models (especially the older ones).