2. Components#

In this tutorial, we’ll go into more depth on the building blocks of SPPs – components.

As mentioned before, symbolic prompt programs are essentially graphs of components that are evaluated lazily with .run* methods.

# %load -r 3:25 _init.py
import pathlib
import sammo
from sammo.runners import OpenAIChat
from sammo.base import Template, EvaluationScore
from sammo.components import Output, GenerateText, ForEach, Union
from sammo.extractors import ExtractRegex
from sammo.data import DataTable
import json
import requests
import os

if 'OPENAI_API_KEY' not in os.environ:
    raise ValueError("Please set the environment variable 'OPENAI_API_KEY'.")

_ = sammo.setup_logger("WARNING")  # we're only interested in warnings for now

runner = OpenAIChat(
    model_id="gpt-3.5-turbo",
    api_config={"api_key": os.environ['OPENAI_API_KEY']},
    cache=os.getenv("CACHE_FILE", "cache.tsv"),
    timeout=30,
)

2.1. What is a component, actually?#

A component is simply a lazily evaluated function that gets information from its parents, calls its children, performs some operation on their values, and returns a Result instance.

To better understand the details, let’s write our own component that sorts the output of its child node.

from sammo.base import Result, Component, Runner, VerbatimText
from frozendict import frozendict

class Sort(Component):
    async def _call(self, runner: Runner, context: dict, dynamic_context: frozendict | None) -> Result:
        # Evaluate the child node first.
        intermediate_result = await self._child(runner, context, dynamic_context)
        # Sort the child's value; pass the child's result as parent and a
        # reference to self as op so the call trace can be reconstructed later.
        return Result([sorted(intermediate_result.value)], parent=intermediate_result, op=self)

Pretty straightforward! We pass on the intermediate result as a parent as well as a reference to self.

Let’s try it out!

sorted_text = Output(Sort(VerbatimText("zdadf"))).run(runner)
sorted_text
+---------+-----------------------------+
| input   | output                      |
+=========+=============================+
| None    | [['a', 'd', 'd', 'f', 'z']] |
+---------+-----------------------------+
Constants: None

Because we passed on a reference to self, we can easily follow the call trace.

sorted_text.outputs[0].plot_call_trace()
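
If you prefer to inspect the trace programmatically, the parent links can also be followed by hand. Below is a minimal sketch; it assumes that Result stores the value, parent, and op constructor arguments as attributes of the same name, which is why it uses getattr defensively.

# Sketch: walk the parent chain that Sort built via Result(parent=..., op=...).
# The attribute names are assumptions based on the constructor call above.
node = sorted_text.outputs[0]
while node is not None:
    print(getattr(node, "op", None), getattr(node, "value", None))
    node = getattr(node, "parent", None)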

2.1.1. Shortcut: LambdaExtractor#

The most flexible way to implement a custom operation is to write your own Component, as we did above. In many cases, however, it is enough to use a LambdaExtractor to apply a user-defined function (UDF).

from sammo.extractors import LambdaExtractor

sorted_text_udf = Output(LambdaExtractor(VerbatimText("zdadf"), "lambda x: sorted(x)")).run(runner)
sorted_text_udf
+---------+-----------------------------+
| input   | output                      |
+=========+=============================+
| None    | [['a', 'd', 'd', 'f', 'z']] |
+---------+-----------------------------+
Constants: None
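
Note that the UDF is passed as a string rather than a function object. Other single-expression lambdas follow the same pattern; for instance, this variant (not run here) deduplicates before sorting:

# Sketch: same pattern as above, with a slightly different UDF.
sorted_unique = Output(
    LambdaExtractor(VerbatimText("zdadf"), "lambda x: sorted(set(x))")
).run(runner)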

2.2. Calling an LLM#

Perhaps the most important component is GenerateText(), which calls an LLM to generate a response.

Using it is fairly straightforward.

first = GenerateText(
    "Hello! My name is Peter and I like horses.",
    system_prompt="Talk like Shakespeare.",
)
Output(first).run(runner)
+---------+--------------------------------------------------------------+
| input   | output                                                       |
+=========+==============================================================+
| None    | Hark! Good morrow, fair Peter! Thy name doth ring sweetly in |
|         | mine ears. Dost thou fancy the noble steeds that roam the    |
|         | fields? Verily, horses are a wondrous creature, full of      |
|         | grace and strength. Pray, tell me more of thy love for these |
|         | majestic beasts.                                             |
+---------+--------------------------------------------------------------+
Constants: None
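
The prompt does not have to be a plain string; GenerateText also accepts other components as its child, such as a Template (we will use this below with ForEach). Here is a sketch, assuming that {{input}} is filled from the data passed to .run():

# Sketch: a parameterized prompt. Using {{input}} and passing data to .run()
# are assumptions here; running over inputs is covered in the next section.
summarize = Output(GenerateText(Template("Summarize in one sentence: {{input}}")))
# summarize.run(runner, ["<some text>", "<some other text>"])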

To pass on history, you can use the history argument:

second = GenerateText("Write a four line poem about my favorite animal.", history=first)
poem = Output(second).run(runner)
poem
+---------+------------------------------------------------------------+
| input   | output                                                     |
+=========+============================================================+
| None    | In fields of green, the horse doth roam, With flowing mane |
|         | and spirit bold. A creature fair, a sight to behold, In my |
|         | heart, its beauty finds a home.                            |
+---------+------------------------------------------------------------+
Constants: None

When we plot the call trace, we can see that adding history is reflected in the dependencies:

poem.outputs[0].plot_call_trace()
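
History chains across any number of turns in the same way; each new call simply points at the previous one:

# Sketch (not executed here): a third turn that sees both earlier exchanges.
third = GenerateText("Now shorten the poem to two lines.", history=second)
# Output(third).run(runner)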

2.3. Loops#

Explicit loops are only needed when you want to iterate over something other than the set of all inputs; iterating over all inputs happens automatically. More on this in the next section on parallelization.

2.3.1. Static loops#

A common use case is repeating an operation a known number of times, e.g., sampling LLM responses N times.

N = 5
fruits = [
    GenerateText("Generate the name of a random fruit.", randomness=0.9, seed=i)
    for i in range(N)
]
static_loop = Output(Union(*fruits))
static_loop.run(runner)
+---------+-----------------------------------------------------------+
| input   | output                                                    |
+=========+===========================================================+
| None    | ['Honeydewberry', 'Starfruit', 'Lemonberry', 'Starfruit', |
|         | 'Mangosteen.']                                            |
+---------+-----------------------------------------------------------+
Constants: None

Starfruit wins.

Note

We had to set seed to a different value in each GenerateText instance to disable local caching. Otherwise, we would get the same answer 5 times.
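
For contrast, here is a sketch of the same loop without distinct seeds. All five requests would be textually identical, so the local cache would answer each of them with the same completion:

# Sketch: identical calls share one cache entry, yielding five equal outputs.
same_fruit_five_times = Output(
    Union(*[GenerateText("Generate the name of a random fruit.", randomness=0.9) for _ in range(N)])
)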

static_loop.plot_program()

2.3.2. Dynamic loops#

If you want to loop over all results of a previous layer, you can do this with a ForEach component.

fruits = ExtractRegex(
    GenerateText(
        "Generate a list of 5 fruits. Wrap each fruit with <item> and </item>."
    ),
    r"<item>(.*?)<.?item>"
)

fruit_blurbs = Output(ForEach(
    "fruit",
    fruits,
    GenerateText(Template("Why is {{fruit}} a good fruit in less than 25 words?")),
))
fruit_desc = fruit_blurbs.run(runner)
fruit_desc.outputs[0].plot_call_trace()

As an aside, this is a case where the static program graph looks different from the call trace: we don’t know how many fruits will actually be generated until the LLM is called.

fruit_blurbs.plot_program()