7. Instruction Optimization
We use the term instruction optimization to refer to the problem of finding the task instructions that maximize some target metric (e.g., accuracy).
Note
We work with an extremely small number of data instances here to show the general flow. In practice, we recommend using 100+ examples each for training and testing.
We start by initializing things as before.
# %load -r 3:25 _init.py
import pathlib
import sammo
from sammo.runners import OpenAIChat
from sammo.base import Template, EvaluationScore
from sammo.components import Output, GenerateText, ForEach, Union
from sammo.extractors import ExtractRegex
from sammo.data import DataTable
import json
import requests
import os

if "OPENAI_API_KEY" not in os.environ:
    raise ValueError("Please set the environment variable 'OPENAI_API_KEY'.")

_ = sammo.setup_logger("WARNING")  # we're only interested in warnings for now

runner = OpenAIChat(
    model_id="gpt-3.5-turbo-16k",
    api_config={"api_key": os.getenv("OPENAI_API_KEY")},
    cache=os.getenv("CACHE_FILE", "cache.tsv"),
    timeout=30,
)
# %load -s load_data,accuracy _init.py
def load_data(
    url="https://github.com/google/BIG-bench/raw/main/bigbench/benchmark_tasks/implicatures/task.json",
):
    task = json.loads(requests.get(url).content)
    # convert label to single string
    for x in task["examples"]:
        x["output"] = max(x["target_scores"], key=x["target_scores"].get)
    return DataTable.from_records(
        task["examples"],
        input_fields="input",
        constants={"instructions": task["task_prefix"]},
    )


def accuracy(y_true: DataTable, y_pred: DataTable) -> EvaluationScore:
    y_true = y_true.outputs.normalized_values()
    y_pred = y_pred.outputs.normalized_values()
    n_correct = sum([y_p == y_t for y_p, y_t in zip(y_pred, y_true)])
    return EvaluationScore(n_correct / len(y_true))
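The max(..., key=...) line in load_data collapses BIG-bench's per-label target_scores into the single highest-scoring label. In isolation, with a made-up example record:

```python
# A made-up record in the BIG-bench implicatures format: each example
# carries a score per candidate label.
example = {
    "input": "Speaker 1: 'Did you like it?' Speaker 2: 'It was great.'",
    "target_scores": {"yes": 1.0, "no": 0.0},
}

# Pick the label with the highest score, as load_data does above.
label = max(example["target_scores"], key=example["target_scores"].get)
print(label)  # -> yes
```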
7.1. Step 1: Defining the set of initial candidates
Our plan is to use beam search with mutation operators to refine a set of initial candidates. As with grid search before, we can use the same syntax to define a parametric set of initial candidates.
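Before wiring this up in SAMMO, it may help to see the shape of the loop in isolation. Below is a toy sketch, not SAMMO's implementation: candidates are plain strings, and the two mutators and the vowel-counting objective are made-up stand-ins for paraphrasing and accuracy.

```python
def beam_search(initial, mutators, objective, depth=3, beam_width=2):
    # Keep the best beam_width candidates from the initial set.
    beams = sorted(initial, key=objective, reverse=True)[:beam_width]
    for _ in range(depth):
        candidates = list(beams)  # like add_previous=True: parents stay eligible
        for beam in beams:
            for mutate in mutators:
                candidates.append(mutate(beam))
        # Re-rank and keep the top beam_width candidates.
        beams = sorted(candidates, key=objective, reverse=True)[:beam_width]
    return beams[0]


best = beam_search(
    initial=["label the input", "find output"],
    mutators=[lambda s: s + "a", lambda s: s + "!"],  # toy "mutations"
    objective=lambda s: sum(c in "aeiou" for c in s),  # toy "accuracy"
)
print(best)  # -> label the inputaaa
```

The real search below works the same way, except that candidates are metaprompts, mutators are LLM-backed operators, and the objective is the accuracy metric defined earlier.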
7.1.1. Using Callables to bind static values
A very common problem is having a set of static values, e.g., configuration or input datasets, that are needed when constructing a metaprompt.
To bind these static values, we recommend using callables. These are objects that behave like functions but can be initialized with the static values for the task. In essence, they behave like partially bound functions but offer a cleaner interface.
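As a standalone illustration of the pattern (the names here are made up, not part of SAMMO's API): the static value is stored once at construction time, and every call can then use it.

```python
class PromptFactory:
    """Callable that binds a static list of labels at construction time."""

    def __init__(self, labels):
        self.labels = labels  # static value bound once

    def __call__(self):
        # Used on every call, without having to pass labels around.
        return f"Output labels: {', '.join(self.labels)}"


factory = PromptFactory(["no", "yes"])
print(factory())  # -> Output labels: no, yes
```

Compared to functools.partial, a class-based callable leaves room for helper methods and richer state, which is why we use one below.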
Below, we show how we can bind the training dataset to the search space object so we can use its values during the construction of the initial candidate space.
from sammo.instructions import MetaPrompt, Section, Paragraph, InputData
from sammo.dataformatters import PlainFormatter
from sammo.search_op import one_of


class InititialCandidates:
    def __init__(self, dtrain):
        self.dtrain = dtrain

    def __call__(self):
        example_formatter = PlainFormatter(
            all_labels=self.dtrain.outputs.unique(), orient="item"
        )

        labels = self.dtrain.outputs.unique()
        instructions = MetaPrompt(
            [
                Paragraph("Instructions: "),
                Paragraph(
                    one_of(
                        [
                            self.dtrain.constants["instructions"],
                            "",
                            "Find the best output label given the input.",
                            self.dtrain.constants["instructions"] * 2,
                        ]
                    ),
                    reference_id="instructions",
                ),
                Paragraph("\n"),
                Paragraph(
                    f"Output labels: {', '.join(labels)}\n" if len(labels) <= 10 else ""
                ),
                Paragraph(InputData()),
                Paragraph("Output: "),
            ],
            render_as="raw",
            data_formatter=example_formatter,
        )

        return Output(
            instructions.with_extractor("raise"),
            minibatch_size=1,
            on_error="empty_result",
        )
7.2. Step 2: Define a set of mutation operators
In each step of the beam search, SAMMO samples a set of mutation operators and applies them to the current set of active candidates (beams).
from sammo.mutators import BagOfMutators, InduceInstructions, Paraphrase

mydata = load_data()
d_train = mydata.sample(10, seed=42)

mutation_operators = BagOfMutators(
    InititialCandidates(d_train),
    InduceInstructions("#instructions", d_train),
    Paraphrase("#instructions"),
    sample_for_init_candidates=False,
)
Above, we have defined the set of mutators to be applied. We initialize with our previously defined InititialCandidates set and allow two different mutation operations: inducing new instructions from labeled samples, and paraphrasing existing ones. To specify which part of the metaprompt a mutator should operate on, we pass a path descriptor; here, "#instructions" refers to the paragraph declared with reference_id="instructions".
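Schematically, a path descriptor like "#instructions" selects the component whose reference_id matches the part after the "#". A toy lookup (an illustration only, not SAMMO's actual resolver) might look like this:

```python
def find_by_path(nodes, path):
    """Return the nodes whose reference_id matches the '#...' descriptor."""
    target = path.lstrip("#")
    return [n for n in nodes if n.get("reference_id") == target]


# Dict stand-ins for the Paragraph components of the metaprompt above.
metaprompt = [
    {"content": "Instructions: "},
    {"content": "Find the best output label.", "reference_id": "instructions"},
    {"content": "Output: "},
]
print(find_by_path(metaprompt, "#instructions"))
# -> [{'content': 'Find the best output label.', 'reference_id': 'instructions'}]
```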
7.3. Step 3: Run the beam search
Let's set up our beam search and run it.
from sammo.search import BeamSearch

prompt_optimizer = BeamSearch(
    runner,
    mutation_operators,
    accuracy,
    maximize=True,
    depth=3,
    mutations_per_beam=2,
    n_initial_candidates=4,
    beam_width=4,
    add_previous=True,
)
prompt_optimizer.fit(d_train)
prompt_optimizer.show_report()
search depth[############]3/3[00:00<00:00] >> eval[#################################]8/8 >> tasks[######]80/80[00:00<00:00, 62.50it/s]
Fitting log:
iteration action objective costs parse_errors prev_actions
----------- ------------------ ----------- ----------------------------- -------------- --------------------------------------------------
-1 init 0.8 {'input': 466, 'output': 10} 0.0 ['init']
-1 init 0.8 {'input': 546, 'output': 10} 0.0 ['init']
-1 init 0.5 {'input': 576, 'output': 10} 0.0 ['init']
-1 init 0.5 {'input': 686, 'output': 10} 0.0 ['init']
0 Paraphrase 0.6 {'input': 636, 'output': 10} 0.0 ['Paraphrase', 'init']
0 Paraphrase 0.0 {'input': 516, 'output': 282} 0.0 ['Paraphrase', 'init']
0 InduceInstructions 0.6 {'input': 796, 'output': 10} 0.0 ['InduceInstructions', 'init']
0 Paraphrase 0.8 {'input': 566, 'output': 10} 0.0 ['Paraphrase', 'init']
0 Paraphrase 0.6 {'input': 586, 'output': 10} 0.0 ['Paraphrase', 'init']
0 Paraphrase 0.6 {'input': 586, 'output': 10} 0.0 ['Paraphrase', 'init']
0 InduceInstructions 0.8 {'input': 926, 'output': 10} 0.0 ['InduceInstructions', 'init']
0 Paraphrase 0.3 {'input': 696, 'output': 17} 0.0 ['Paraphrase', 'init']
1 InduceInstructions 0.7 {'input': 706, 'output': 10} 0.0 ['InduceInstructions', 'Paraphrase', 'init']
1 Paraphrase 0.8 {'input': 566, 'output': 10} 0.0 ['Paraphrase', 'Paraphrase', 'init']
1 InduceInstructions 0.8 {'input': 586, 'output': 10} 0.0 ['InduceInstructions', 'InduceInstructions',
'init']
1 InduceInstructions 0.9 {'input': 1136, 'output': 10} 0.0 ['InduceInstructions', 'InduceInstructions',
'init']
1 Paraphrase 0.3 {'input': 516, 'output': 137} 0.0 ['Paraphrase', 'init']
1 InduceInstructions 0.7 {'input': 546, 'output': 10} 0.0 ['InduceInstructions', 'init']
1 InduceInstructions 0.7 {'input': 546, 'output': 10} 0.0 ['InduceInstructions', 'init']
1 Paraphrase 0.8 {'input': 566, 'output': 10} 0.0 ['Paraphrase', 'init']
2 InduceInstructions 0.8 {'input': 626, 'output': 10} 0.0 ['InduceInstructions', 'InduceInstructions',
'InduceInstructions', 'init']
2 InduceInstructions 0.8 {'input': 856, 'output': 10} 0.0 ['InduceInstructions', 'InduceInstructions',
'InduceInstructions', 'init']
2 Paraphrase 0.8 {'input': 566, 'output': 10} 0.0 ['Paraphrase', 'Paraphrase', 'Paraphrase', 'init']
2 Paraphrase 0.8 {'input': 566, 'output': 10} 0.0 ['Paraphrase', 'Paraphrase', 'Paraphrase', 'init']
2 InduceInstructions 0.7 {'input': 636, 'output': 10} 0.0 ['InduceInstructions', 'InduceInstructions',
'InduceInstructions', 'init']
2 Paraphrase 0.9 {'input': 586, 'output': 10} 0.0 ['Paraphrase', 'InduceInstructions',
'InduceInstructions', 'init']
2 Paraphrase 0.8 {'input': 566, 'output': 10} 0.0 ['Paraphrase', 'Paraphrase', 'init']
2 Paraphrase 0.8 {'input': 566, 'output': 10} 0.0 ['Paraphrase', 'Paraphrase', 'init']
Action stats:
action stats
------------------ -----------------------------
Paraphrase {'chosen': 14, 'improved': 0}
InduceInstructions {'chosen': 10, 'improved': 1}
Great! Our best prompt gets an accuracy of 0.9 on the training set. Let's see what it came up with.
print(prompt_optimizer.best_prompt)
Output(
child = StripWhitespace(
child = GenerateText(
child = MetaPrompt(
structure = [
0 : Paragraph(
content = 'Instructions: ',
id = None
),
1 : Paragraph(
content = 'The instruction given is to determine whether the second speaker's response indicates agreement or disagreement with the first speaker's statement. If the second speaker's response supports or aligns with the first speaker's statement, the output is "yes." If the second speaker's response contradicts or opposes the first speaker's statement, the output is "no."',
id = 'instructions'
),
2 : Paragraph(
content = '
',
id = None
),
3 : Paragraph(
content = 'Output labels: no, yes
',
id = None
),
4 : Paragraph(
content = InputData(
id_offset = 0,
name = None
),
id = None
),
5 : Paragraph(
content = 'Output: ',
id = None
)
],
render_as = 'raw',
data_formatter = PlainFormatter(
all_labels = [
0 : 'no',
1 : 'yes'
],
orient = 'item'
),
name = None,
seed = 0
),
name = None,
system_prompt = None,
history = None,
seed = 0,
randomness = 0,
max_tokens = None,
on_error = 'empty_result'
),
on_error = 'raise',
flatten = True
),
minibatch_size = 1,
on_error = 'empty_result'
)