3. Data & Templating
SAMMO's DataTable is a thin wrapper around a list of dictionaries. It also keeps your input data separate from the desired or actual outputs.
3.1. Loading data
First, let's download the implicatures dataset from BIG-bench as an example:
import requests
import json
URL = "https://github.com/google/BIG-bench/raw/main/bigbench/benchmark_tasks/implicatures/task.json"
task = json.loads(requests.get(URL).content)
# convert label to single string
for x in task["examples"]:
    x["output"] = max(x["target_scores"], key=x["target_scores"].get)
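The label conversion above relies on max() with a key function: it returns the dictionary key whose score is largest. A minimal sketch on a hand-written record (the record is made up for illustration and only mimics the shape of an entry in task["examples"]):

```python
# Hypothetical record shaped like one entry of task["examples"].
example = {
    "input": "Speaker 1: 'Is it raining?' Speaker 2: 'Take an umbrella.'",
    "target_scores": {"yes": 1.0, "no": 0.0},
}

# max() iterates over the dict keys; key=...get ranks each key by its score.
label = max(example["target_scores"], key=example["target_scores"].get)
print(label)
```

If two labels had the same score, max() would return the first one encountered, so this conversion assumes exactly one label carries the top score.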
With DataTables, there are two kinds of information: inputs and outputs. Inputs are treated as immutable while outputs are mutable. This protects against accidentally changing the starting data. To build the DataTable, we need to specify which fields should be used as inputs and which as outputs. Here we pass input_fields explicitly; the output column in the result is filled from the output key we created above.
from sammo.data import DataTable
mydata = DataTable.from_records(
    task["examples"],
    input_fields="input",
    constants={"instructions": task["task_prefix"]},
)
mydata[:3]
+-------------------------------------------------------------+----------+
| input                                                       | output   |
+=============================================================+==========+
| Speaker 1: 'But aren't you afraid?' Speaker 2: 'Ma'am,      | no       |
| sharks never attack anybody.'                               |          |
+-------------------------------------------------------------+----------+
| Speaker 1: 'Do you want to quit?' Speaker 2: 'I've never    | no       |
| been the type of person who throws in the towel when things |          |
| get tough.'                                                 |          |
+-------------------------------------------------------------+----------+
| Speaker 1: 'Should I convince these clients?' Speaker 2:    | yes      |
| 'These are really important clients with deep pockets.'     |          |
+-------------------------------------------------------------+----------+
Constants: {'instructions': "Does Speaker 2's answer mean yes or no? "}
Much easier to read! We also added task instructions as a constant.
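The split between immutable inputs and mutable outputs can be pictured with plain Python objects. The sketch below is only an analogy built on the standard library, not how SAMMO implements DataTable; the record and the variable names are made up for illustration.

```python
from types import MappingProxyType

# Hypothetical record; not taken from the dataset.
record = {"input": "Speaker 1: 'Ready?' Speaker 2: 'Always.'", "output": "yes"}

# Read-only view over the input part, plain mutable dict for the output part.
inputs = MappingProxyType({"input": record["input"]})
outputs = {"output": record["output"]}

outputs["output"] = "no"  # allowed: outputs are meant to change

try:
    inputs["input"] = "overwritten"  # rejected: the view is read-only
    input_was_changed = True
except TypeError:
    input_was_changed = False

print(input_was_changed, outputs["output"])
```

The point of the design is the same in both cases: predictions or corrected labels can be written freely, while the starting data cannot be altered by accident.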
There are other ways to construct DataTables, e.g., from a pandas DataFrame.
import pandas as pd
df = pd.DataFrame(task["examples"])
mydata = DataTable.from_pandas(df, input_fields="input", constants={"instructions": task["task_prefix"]})
mydata[:3]
+-------------------------------------------------------------+----------+
| input                                                       | output   |
+=============================================================+==========+
| Speaker 1: 'But aren't you afraid?' Speaker 2: 'Ma'am,      | no       |
| sharks never attack anybody.'                               |          |
+-------------------------------------------------------------+----------+
| Speaker 1: 'Do you want to quit?' Speaker 2: 'I've never    | no       |
| been the type of person who throws in the towel when things |          |
| get tough.'                                                 |          |
+-------------------------------------------------------------+----------+
| Speaker 1: 'Should I convince these clients?' Speaker 2:    | yes      |
| 'These are really important clients with deep pockets.'     |          |
+-------------------------------------------------------------+----------+
Constants: {'instructions': "Does Speaker 2's answer mean yes or no? "}
3.2. Indexing
Outputs can be assigned new values using the usual slicing syntax:
cloned = mydata.copy()
cloned.outputs[:] = "yes"
cloned.outputs.unique()
['yes']
If the inputs are dictionaries, we can use the .field() function to access an individual key.
struc_dt = DataTable([{"one": 1, "two": 2}])
print(struc_dt)
print(struc_dt.inputs.field("one"))
+----------------------+----------+
| input                | output   |
+======================+==========+
| {'one': 1, 'two': 2} | None     |
+----------------------+----------+
Constants: None
+---------+----------+
| input   | output   |
+=========+==========+
| 1       | None     |
+---------+----------+
Constants: None
We can also filter on inputs or outputs individually, for example to keep only the positive instances.
mydata.outputs.filtered_on(lambda x: x == "yes")
+--------------------------------------------------------------+----------+
| input                                                        | output   |
+==============================================================+==========+
| Speaker 1: 'Should I convince these clients?' Speaker 2:     | yes      |
| 'These are really important clients with deep pockets.'      |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'You have it, then?' Speaker 2: 'I had to slit a  | yes      |
| few throats to get it.'                                      |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'Do they fight?' Speaker 2: 'They fight like cats | yes      |
| and dogs.'                                                   |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'Do you want to come out for a juice?' Speaker 2: | yes      |
| 'I am so thirsty that my throat is as dry as a bone.'        |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'And do you want my permission?' Speaker 2: 'I'd  | yes      |
| like yours too.'                                             |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'Did I do it well?' Speaker 2: 'You were as brave | yes      |
| as a lion.'                                                  |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'You want answers?!' Speaker 2: 'I want the       | yes      |
| truth.'                                                      |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'Do you think hysteria could do that?' Speaker 2: | yes      |
| 'It's being doing it for centuries.'                         |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'This is a costume?' Speaker 2: 'Aaaiyyyy...      | yes      |
| worked on it all night long!'                                |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'Should I bring my umbrella?' Speaker 2: 'Better  | yes      |
| safe than sorry.'                                            |          |
+--------------------------------------------------------------+----------+
Constants: {'instructions': "Does Speaker 2's answer mean yes or no? "}
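In plain Python terms, filtering on outputs amounts to keeping the rows whose output value satisfies a predicate. The sketch below works on an ordinary list of dictionaries; filter_on_output is a hypothetical helper and the records are invented, so it illustrates the idea rather than SAMMO's implementation.

```python
# Hypothetical records; the texts are invented for this sketch.
records = [
    {"input": "Speaker 1: 'Hungry?' Speaker 2: 'I could eat a horse.'", "output": "yes"},
    {"input": "Speaker 1: 'Tired?' Speaker 2: 'I just woke up.'", "output": "no"},
    {"input": "Speaker 1: 'Coming along?' Speaker 2: 'Wouldn't miss it.'", "output": "yes"},
]


def filter_on_output(rows, predicate):
    """Keep the rows whose "output" value satisfies the predicate."""
    return [row for row in rows if predicate(row["output"])]


positives = filter_on_output(records, lambda x: x == "yes")
print(len(positives))
```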
3.3. Templating
Let's annotate 10 examples from the implicatures dataset. Below, we initialize our runner as before.
# %load -r 3:25 _init.py
import pathlib
import sammo
from sammo.runners import OpenAIChat
from sammo.base import Template, EvaluationScore
from sammo.components import Output, GenerateText, ForEach, Union
from sammo.extractors import ExtractRegex
from sammo.data import DataTable
import json
import requests
import os
if 'OPENAI_API_KEY' not in os.environ:
    raise ValueError("Please set the environment variable 'OPENAI_API_KEY'.")
_ = sammo.setup_logger("WARNING")  # we're only interested in warnings for now
runner = OpenAIChat(
    model_id="gpt-3.5-turbo",
    api_config={"api_key": os.environ['OPENAI_API_KEY']},
    cache=os.getenv("CACHE_FILE", "cache.tsv"),
    timeout=30,
)
To refer to fields in the DataTable, templates can use the names constants and input (or inputs, if minibatching is activated), which SAMMO recognizes automatically.
Other than that, the syntax follows standard Handlebars.js.
labeling_prompt = GenerateText(
    Template(
        "Instructions:{{constants.instructions}}\nOutput labels: yes, no\nInput: {{input}}\nOutput:"
    )
)
sample = mydata.sample(10, seed=42)
result = Output(labeling_prompt).run(runner, sample)
result
minibatches[#################################################################################]10/10[00:00<??:??, 0.00it/s]
+--------------------------------------------------------------+----------+
| input                                                        | output   |
+==============================================================+==========+
| Speaker 1: 'You do this often?' Speaker 2: 'It's my first    | no       |
| time.'                                                       |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'Are you trying to make me mad?' Speaker 2: 'I'm  | no       |
| just saying, I'd understand if you were upset. '             |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'You want answers?!' Speaker 2: 'I want the       | no       |
| truth.'                                                      |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'Are you able to carry the box?' Speaker 2: 'It   | yes      |
| is as light as a feather.'                                   |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'Is it hot outside?' Speaker 2: 'You could fry an | yes      |
| egg on the sidewalk.'                                        |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'Should we repay you?' Speaker 2: 'There is no    | no       |
| charge for awesomeness, or attractiveness.'                  |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'I wonder, Bob, if you can handle my car?'        | no       |
| Speaker 2: 'It's an ordinary six cylinder.'                  |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'Did you order the code red?' Speaker 2: 'You're  | yes      |
| goddamn right.'                                              |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'You've seen rain before... right?' Speaker 2:    | no       |
| 'We don't get out much.'                                     |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'Does anyone know how to pick a lock?' Speaker 2: | yes      |
| 'Sure. Picking locks is my thing.'                           |          |
+--------------------------------------------------------------+----------+
Constants: {'instructions': "Does Speaker 2's answer mean yes or no? "}
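To make the template syntax used above concrete, here is a simplified stand-in that fills {{...}} placeholders from a nested dictionary. The render function is hypothetical and is not SAMMO's Template class: it only handles plain dotted lookups and ignores everything else Handlebars offers, such as helpers and block expressions.

```python
import re


def render(template, context):
    """Replace each {{dotted.path}} placeholder with its value from context."""

    def lookup(match):
        value = context
        for key in match.group(1).split("."):
            value = value[key]  # walk nested dictionaries one key at a time
        return str(value)

    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)


template = "Instructions:{{constants.instructions}}\nOutput labels: yes, no\nInput: {{input}}\nOutput:"
context = {
    "constants": {"instructions": "Does Speaker 2's answer mean yes or no? "},
    "input": "Speaker 1: 'Is it hot outside?' Speaker 2: 'You could fry an egg on the sidewalk.'",
}
prompt = render(template, context)
print(prompt)
```

Each row of the table supplies its own input value, while constants stays the same for every row, which is why the task instructions were stored as a constant earlier.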
Outputs have three different access methods. First, if we only want the list of final values, we can use the .values property. These are also the values shown by default.
y_pred = result.outputs.values
y_pred[:5]
['no', 'no', 'no', 'yes', 'yes']
If we want lower-level access to the result objects, we can use .raw_values.
result.outputs.raw_values
[LLMResult(value='no', parent=TextResult),
 LLMResult(value='no', parent=TextResult),
 LLMResult(value='no', parent=TextResult),
 LLMResult(value='yes', parent=TextResult),
 LLMResult(value='yes', parent=TextResult),
 LLMResult(value='no', parent=TextResult),
 LLMResult(value='no', parent=TextResult),
 LLMResult(value='yes', parent=TextResult),
 LLMResult(value='no', parent=TextResult),
 LLMResult(value='yes', parent=TextResult)]
This returns the list of underlying result objects, which also keep track of the entire chain of calls.
Finally, we can use .normalized_values() to apply common post-processing steps, e.g., replacing missing values or making them a list.
This can be useful when computing metrics, e.g., accuracy below:
y_true = sample.outputs.normalized_values(on_empty="")
n_correct = sum([y_p == y_t for y_p, y_t in zip(y_pred, y_true)])
accuracy = n_correct / len(y_true)
accuracy
0.8
Not bad, but it still seems to be a hard task. Let's see what we can do with basic prompt engineering in the next section.
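The accuracy computation above can be wrapped in a small reusable helper. This is a sketch in plain Python: the on_empty argument only loosely mirrors the idea of replacing missing values and is not SAMMO's normalized_values, and the labels in the usage lines are made up rather than taken from the run above.

```python
def accuracy(y_pred, y_true, on_empty=""):
    """Fraction of positions where prediction and label agree.

    None entries are replaced by on_empty first, so a missing prediction
    counts as wrong unless the label is empty as well.
    """
    if len(y_pred) != len(y_true):
        raise ValueError("y_pred and y_true must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty inputs")
    pred = [on_empty if p is None else p for p in y_pred]
    true = [on_empty if t is None else t for t in y_true]
    return sum(p == t for p, t in zip(pred, true)) / len(true)


# Hypothetical labels, not the results from the run above.
score = accuracy(["yes", "no", None, "yes"], ["yes", "no", "no", "no"])
print(score)
```

Checking the lengths up front avoids a silent problem in the zip-based version: zip stops at the shorter list, so a length mismatch would otherwise go unnoticed.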