5. Minibatching#

When annotating data, it is quite wasteful to use one LLM request per example, especially when the instructions are shared.

Instead, we can use in-context minibatching to annotate multiple data examples in a single call, improving call efficiency. Let’s repeat the setup from the previous section.

# %load -r 3:25 _init.py
import pathlib
import sammo
from sammo.runners import OpenAIChat
from sammo.base import Template, EvaluationScore
from sammo.components import Output, GenerateText, ForEach, Union
from sammo.extractors import ExtractRegex
from sammo.data import DataTable
import json
import requests
import os

if not "OPENAI_API_KEY" in os.environ:
    raise ValueError("Please set the environment variable OPENAI_API_KEY'.")

_ = sammo.setup_logger("WARNING")  # we're only interested in warnings for now

runner = OpenAIChat(
    model_id="gpt-3.5-turbo-16k",
    api_config={"api_key": os.getenv("OPENAI_API_KEY")},
    cache=os.getenv("CACHE_FILE", "cache.tsv"),
    timeout=30,
)
# %load -s load_data,accuracy _init.py
def load_data(
    url="https://github.com/google/BIG-bench/raw/main/bigbench/benchmark_tasks/implicatures/task.json",
):
    task = json.loads(requests.get(url).content)
    # convert label to single string
    for x in task["examples"]:
        x["output"] = max(x["target_scores"], key=x["target_scores"].get)

    return DataTable.from_records(
        task["examples"],
        input_fields="input",
        constants={"instructions": task["task_prefix"]},
    )

def accuracy(y_true: DataTable, y_pred: DataTable) -> EvaluationScore:
    y_true = y_true.outputs.values
    y_pred = y_pred.outputs.values
    n_correct = sum([y_p == y_t for y_p, y_t in zip(y_pred, y_true)])

    return EvaluationScore(n_correct / len(y_true))

5.1. Manual Minibatching#

Let’s start by doing manual minibatching. SAMMO will split the inputs into minibatches of a specified size for us. The only thing we have to do is loop over the template variable {{inputs}} using Handlebars syntax.

labeling_prompt = GenerateText(
    Template(
        "Instructions:{{constants.instructions}}\nOutput labels: yes, no\n"
        "{{#each inputs}}Input: {{this}}{{/each}}\nOutput:"
    )
)

The only other change we need to make is to specify the minibatch size in the Output component and to make sure the output gets split into lines.

labeling_outputter = Output(labeling_prompt, minibatch_size=10)
mydata = load_data()
sample = mydata.sample(10, seed=42)

try:
    result = labeling_outputter.run(runner, sample)
except Exception as e:
    print(f"\nException: {e}")
minibatches[###################################################################################]1/1[00:01<00:00, 0.90it/s]

Exception: Minibatch results do not have right length (need: 10, got: 1)

Oh no, there is something wrong with the minibatch results. The number of answers we get from a single LLM call needs to line up with the number of input rows, and that is where this call fails.

Going back to the prompt, we realize that we forgot to extract all valid answers from the GenerateText call! Let’s fix that.

labeling_outputter = Output(
    ExtractRegex(labeling_prompt, "(?i)yes|no"), minibatch_size=10
)
result = labeling_outputter.run(runner, sample)
result
minibatches[###################################################################################]1/1[00:00<??:??, 0.00it/s]
+--------------------------------------------------------------+----------+
| input                                                        | output   |
+==============================================================+==========+
| Speaker 1: 'You do this often?' Speaker 2: 'It's my first    | yes      |
| time.'                                                       |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'Are you trying to make me mad?' Speaker 2: 'I'm  | no       |
| just saying, I'd understand if you were upset. '             |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'You want answers?!' Speaker 2: 'I want the       | no       |
| truth.'                                                      |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'Are you able to carry the box?' Speaker 2: 'It   | yes      |
| is as light as a feather.'                                   |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'Is it hot outside?' Speaker 2: 'You could fry an | no       |
| egg on the sidewalk.'                                        |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'Should we repay you?' Speaker 2: 'There is no    | no       |
| charge for awesomeness, or attractiveness.'                  |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'I wonder, Bob, if you can handle my car?'        | no       |
| Speaker 2: 'It's an ordinary six cylinder.'                  |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'Did you order the code red?' Speaker 2: 'You're  | yes      |
| goddamn right.'                                              |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'You've seen rain before... right?' Speaker 2:    | no       |
| 'We don't get out much.'                                     |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'Does anyone know how to pick a lock?' Speaker 2: | yes      |
| 'Sure. Picking locks is my thing.'                           |          |
+--------------------------------------------------------------+----------+
Constants: {'instructions': "Does Speaker 2's answer mean yes or no? "}
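
Since sample still carries the gold labels from load_data, we can also score these minibatched annotations with the accuracy helper from the setup. A quick sketch (the printed score depends on the model's answers):

# compare the predicted outputs in `result` against the gold outputs in `sample`
print(accuracy(sample, result))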

Nice! With a single LLM call, we have now annotated 10 examples! It was, however, a bit annoying to have to format the minibatches manually. Luckily, SAMMO provides a MetaPrompt class for common data annotation tasks that simplifies the set-up considerably.

5.2. Automatic minibatching with the MetaPrompt component#

The MetaPrompt component takes a nested list of instructions, an argument specifying how the instructions are rendered, and a DataFormatter instance that is shared between the in-context examples and the input examples.

from sammo.instructions import MetaPrompt, Section, Paragraph, InputData, FewshotExamples
from sammo.dataformatters import (
    QuestionAnswerFormatter,
    JSONDataFormatter
)

mprompt = MetaPrompt(
    [
        Section("Instructions", mydata.constants["instructions"]),
        Section("Examples", FewshotExamples(mydata.sample(3, seed=43))),
        Paragraph("\nOutput labels: yes, no"),
        Paragraph(InputData()),
    ],
    render_as="markdown",
    data_formatter=QuestionAnswerFormatter(["yes", "no"]),
)
# automatically wraps it with the right parser component
mprompt_parsed = mprompt.with_extractor("empty_result")

We have now structured our labeling task into sections and paragraphs. The DataFormatter class does all the data formatting for us, and calling with_extractor() wraps the response with the right extractor class to match our data formatter. We have also added a section with few-shot (in-context) examples to show the model what the output format looks like.

We can just look at the current metaprompt to see what was generated:

mprompt_parsed.plot_program()

We can see that the output from GenerateText gets parsed by ExtractRegex. Let’s run it on our data.

result = Output(mprompt_parsed, minibatch_size=5, on_error="empty_result").run(
    runner, sample
)
result[:5]
minibatches[###################################################################################]2/2[00:00<??:??, 0.00it/s]
+--------------------------------------------------------------+----------+
| input                                                        | output   |
+==============================================================+==========+
| Speaker 1: 'You do this often?' Speaker 2: 'It's my first    | no       |
| time.'                                                       |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'Are you trying to make me mad?' Speaker 2: 'I'm  | no       |
| just saying, I'd understand if you were upset. '             |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'You want answers?!' Speaker 2: 'I want the       | yes      |
| truth.'                                                      |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'Are you able to carry the box?' Speaker 2: 'It   | yes      |
| is as light as a feather.'                                   |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'Is it hot outside?' Speaker 2: 'You could fry an | yes      |
| egg on the sidewalk.'                                        |          |
+--------------------------------------------------------------+----------+
Constants: {'instructions': "Does Speaker 2's answer mean yes or no? "}

Instead of plotting the call trace, we can also programmatically access various intermediate values. Let’s look at what an actual prompt looked like:

print(result.outputs.llm_requests[0][0])
# Instructions
Does Speaker 2's answer mean yes or no? 

# Examples
Q[0]: Speaker 1: 'Should I bring my umbrella?' Speaker 2: 'Better safe than sorry.'
A[0]: yes

Q[1]: Speaker 1: 'Do you have a girl worth fighting for?' Speaker 2: 'Wish that I had.'
A[1]: no

Q[2]: Speaker 1: 'Do you think I should attend the interview?' Speaker 2: 'Do you want to be a failure for the rest of your life?'
A[2]: yes



Output labels: yes, no


Q[0]: Speaker 1: 'You do this often?' Speaker 2: 'It's my first time.'

Q[1]: Speaker 1: 'Are you trying to make me mad?' Speaker 2: 'I'm just saying, I'd understand if you were upset. '

Q[2]: Speaker 1: 'You want answers?!' Speaker 2: 'I want the truth.'

Q[3]: Speaker 1: 'Are you able to carry the box?' Speaker 2: 'It is as light as a feather.'

Q[4]: Speaker 1: 'Is it hot outside?' Speaker 2: 'You could fry an egg on the sidewalk.'

5.2.1. Changing the data format#

How about using JSON instead of this line-by-line format?

modified_mprompt = mprompt.clone().rebind({r"data_formatter": JSONDataFormatter()})
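
Here, clone() gives us a copy of the metaprompt and rebind() swaps in the new value for the parameter matching the given key. As a quick sketch reusing only components from above, we could swap the formatter back just as easily:

# sketch: swap the data formatter back to the line-by-line question/answer format
back_to_qa = modified_mprompt.clone().rebind(
    {r"data_formatter": QuestionAnswerFormatter(["yes", "no"])}
)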

Voilà! Let’s run it on the data.

result = Output(
    modified_mprompt.with_extractor("empty_result"), minibatch_size=5, on_error="empty_result"
).run(runner, sample)
result[:5]
minibatches[###################################################################################]2/2[00:03<00:00, 0.62it/s]
+--------------------------------------------------------------+----------+
| input                                                        | output   |
+==============================================================+==========+
| Speaker 1: 'You do this often?' Speaker 2: 'It's my first    | no       |
| time.'                                                       |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'Are you trying to make me mad?' Speaker 2: 'I'm  | no       |
| just saying, I'd understand if you were upset. '             |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'You want answers?!' Speaker 2: 'I want the       | no       |
| truth.'                                                      |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'Are you able to carry the box?' Speaker 2: 'It   | yes      |
| is as light as a feather.'                                   |          |
+--------------------------------------------------------------+----------+
| Speaker 1: 'Is it hot outside?' Speaker 2: 'You could fry an | no       |
| egg on the sidewalk.'                                        |          |
+--------------------------------------------------------------+----------+
Constants: {'instructions': "Does Speaker 2's answer mean yes or no? "}

print(result.outputs.llm_requests[0][0])
# Instructions
Does Speaker 2's answer mean yes or no? 

# Examples
[{"id": 0, "input": "Speaker 1: 'Should I bring my umbrella?' Speaker 2: 'Better safe than sorry.'", "output": "yes"}, {"id": 1, "input": "Speaker 1: 'Do you have a girl worth fighting for?' Speaker 2: 'Wish that I had.'", "output": "no"}, {"id": 2, "input": "Speaker 1: 'Do you think I should attend the interview?' Speaker 2: 'Do you want to be a failure for the rest of your life?'", "output": "yes"}]


Output labels: yes, no


[{"id": 0, "input": "Speaker 1: 'You do this often?' Speaker 2: 'It's my first time.'"}, {"id": 1, "input": "Speaker 1: 'Are you trying to make me mad?' Speaker 2: 'I'm just saying, I'd understand if you were upset. '"}, {"id": 2, "input": "Speaker 1: 'You want answers?!' Speaker 2: 'I want the truth.'"}, {"id": 3, "input": "Speaker 1: 'Are you able to carry the box?' Speaker 2: 'It is as light as a feather.'"}, {"id": 4, "input": "Speaker 1: 'Is it hot outside?' Speaker 2: 'You could fry an egg on the sidewalk.'"}]

This is the convenience of using the MetaPrompt class!