Multi-Agent: Negotiation Arena

Introduction

This notebook walks through setting up and optimizing prompts for a trading game between two LLM players named Alice and Bob. The goal is to maximize the combined value of the resources in both players’ inventories.

Setup

First, we’ll import the necessary packages and set up our environment.

# Import necessary libraries
import os
from openai import OpenAI
import json

import opto.trace as trace
from opto.optimizers import OptoPrime
from autogen import config_list_from_json

config = config_list_from_json("OAI_CONFIG_LIST")
key = None
for c in config:
    if c['model'] == 'gpt-4-0125-preview':
        key = c['api_key']
        break
if key is None:
    raise Exception("No key found for gpt-4-0125-preview in the provided config file")

client = OpenAI(api_key=key)
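
The lookup above expects the loaded config to be a list of dictionaries, each with at least a `model` and an `api_key` field. As a minimal sketch of the same lookup on an in-memory list (the helper name `find_api_key` and the placeholder keys are illustrative assumptions, not part of the notebook):

```python
def find_api_key(config_list, model):
    """Return the api_key of the first entry whose model matches, else None."""
    for entry in config_list:
        if entry.get("model") == model:
            return entry.get("api_key")
    return None

# Hypothetical config contents; a real file would hold actual keys.
example_config = [
    {"model": "gpt-3.5-turbo", "api_key": "sk-placeholder-a"},
    {"model": "gpt-4-0125-preview", "api_key": "sk-placeholder-b"},
]

selected_key = find_api_key(example_config, "gpt-4-0125-preview")
missing_key = find_api_key(example_config, "some-other-model")
```

Returning `None` for a missing model leaves the decision to raise an error with the caller, which is what the setup cell does.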

Define Game Components

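Next, we’ll define the Trace nodes for the player names and prompts, along with the system prompt that states the rules of the game. Only the two player prompts are marked trainable, so they are the only values the optimizer can change.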

# Define nodes for player names and prompts
p1_name = trace.node('Alice', trainable=False)
p2_name = trace.node('Bob', trainable=False)
p1_prompt = trace.node('STOCKPILE THESE RESOURCES: N/A', trainable=True)
p2_prompt = trace.node('STOCKPILE THESE RESOURCES: N/A', trainable=True)

# Define system prompt
system_prompt = f"""
RULES of the TRADING GAME between two players named {p1_name.data} and {p2_name.data}.

Each player's inventory is private and consists of three resources, WOOD, STONE, and GOLD.
The higher the quantity of a resource a player has, the higher the value of that resource.
The value of a resource is determined by a scale that increases exponentially with quantity.
The goal of the game is to maximize the total value of the resources in all players' inventories (OVERALL SCORE).

The game is played in turns, with each player taking one action per turn.
Turns alternate between the two players, starting with {p1_name.data}.
Trading is the only way to exchange resources between players.
Players can choose to end the game when they think subsequent trades will be rejected or not beneficial.

Each player can do one of 4 actions: propose a trade, accept a trade, reject a trade or end the game.
Propose a trade: A proposed trade must barter the same quantity of one resource for another.
A player can only propose a trade if they have sufficient quantity of the resource they are trading away.
Accept a trade: A player can only accept a trade if they have sufficient quantity of the resource that they in turn are trading away.
Reject a trade: A player can reject a trade if they do not have sufficient quantity of the resource, or if it would lead to lower OVERALL SCORE.
End game: If both players select the end game action, the game ends and the overall value of both players' inventories are tallied up to produce the OVERALL SCORE.
NOTE: BOTH players must select the end game action during their respective turns for the game to end.

Each of the four actions must be formatted as a valid json object that can be parsed by python json.loads:
Example of proposing a trade = {{"action": "TRADE", "sell_resource": "WOOD", "buy_resource": "STONE", "quantity": 5}}
Example of accepting a trade = {{"action": "ACCEPT"}}
Example of rejecting a trade = {{"action": "REJECT"}}
Example of ending the game = {{"action": "END"}}
"""

# Initialize game state variables
p1_inventory = {'WOOD': 4, 'STONE': 3, 'GOLD': 2}
p2_inventory = {'WOOD': 1, 'STONE': 5, 'GOLD': 2}
proposed_trade = None
proposed_end = False
conversation = []
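
The rules require every action to be a JSON object that `json.loads` can parse. As a rough sketch of what checking such a reply could look like (the `validate_action` helper and its checks are assumptions for illustration; the notebook itself relies on the model returning well-formed JSON):

```python
import json

VALID_ACTIONS = {"TRADE", "ACCEPT", "REJECT", "END"}
RESOURCES = {"WOOD", "STONE", "GOLD"}

def validate_action(raw):
    """Parse a raw reply and return the action dict, or None if it is malformed."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(action, dict) or action.get("action") not in VALID_ACTIONS:
        return None
    if action["action"] == "TRADE":
        quantity = action.get("quantity")
        if (action.get("sell_resource") not in RESOURCES
                or action.get("buy_resource") not in RESOURCES
                or not isinstance(quantity, int)
                or quantity <= 0):
            return None
    return action

ok = validate_action('{"action": "TRADE", "sell_resource": "WOOD", "buy_resource": "STONE", "quantity": 5}')
bad_quotes = validate_action("{'action': 'ACCEPT'}")  # single quotes are not valid JSON
bad_action = validate_action('{"action": "PASS"}')    # not one of the four actions
```

Note that `json.loads` only accepts double-quoted strings, so a reply written with Python-style single quotes would be rejected by this check.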

Using bundle to wrap helper functions

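We’ll create helper functions to build the message for each player, parse a proposed trade, and apply an accepted trade. Only create_message is wrapped with trace.bundle; parse and accept_trade are plain Python functions.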

# Function to create a message for the LLM
@trace.bundle(trainable=False)
def create_message(player, prompt, previous_message=None):
    global p1_inventory
    global p2_inventory

    player_prompt = f'In the trading game, you are named {player}.\n'
    messages = [{'role': 'system', 'content': player_prompt}, {'role': 'system', 'content': prompt}]
    
    current_inventory = p1_inventory if player == "Alice" else p2_inventory
    inventory_message = f'Your inventory consists of {current_inventory["WOOD"]} WOOD, {current_inventory["STONE"]} STONE, and {current_inventory["GOLD"]} GOLD.'
    messages.append({'role': 'user', 'content': inventory_message})

    return messages

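# Function to parse a TRADE response into per-player inventory changes.
# Returns None if either player lacks the resources needed for the trade.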
def parse(player, response_json):
    global p1_inventory
    global p2_inventory

    sell_resource = response_json['sell_resource']
    buy_resource = response_json['buy_resource']
    quantity = response_json['quantity']
    if player == "Alice":
        if p1_inventory[sell_resource] < quantity:
            return None
        if p2_inventory[buy_resource] < quantity:
            return None
        return {"Alice": {sell_resource: -quantity, buy_resource: quantity}, 
                "Bob": {sell_resource: quantity, buy_resource: -quantity}}
    else:
        if p2_inventory[sell_resource] < quantity:
            return None
        if p1_inventory[buy_resource] < quantity:
            return None
        return {"Alice": {sell_resource: quantity, buy_resource: -quantity}, 
                "Bob": {sell_resource: -quantity, buy_resource: quantity}}

# Function to apply the currently proposed trade to both inventories
def accept_trade():
    global proposed_trade
    global p1_inventory
    global p2_inventory

    current_dict = proposed_trade["Alice"]
    for key in current_dict:
        p1_inventory[key] += current_dict[key]
    
    current_dict = proposed_trade["Bob"]
    for key in current_dict:
        p2_inventory[key] += current_dict[key]
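
To make the trade bookkeeping concrete, here is a side-effect-free sketch of the same idea: validate a proposal against both inventories, express it as signed deltas, then apply the deltas. The function names `trade_deltas` and `apply_deltas` are hypothetical and do not appear in the notebook, which uses global state instead.

```python
def trade_deltas(inventories, proposer, sell_resource, buy_resource, quantity):
    """Return per-player inventory changes for a trade, or None if unaffordable."""
    other = "Bob" if proposer == "Alice" else "Alice"
    if inventories[proposer][sell_resource] < quantity:
        return None
    if inventories[other][buy_resource] < quantity:
        return None
    return {
        proposer: {sell_resource: -quantity, buy_resource: quantity},
        other: {sell_resource: quantity, buy_resource: -quantity},
    }

def apply_deltas(inventories, deltas):
    """Return new inventories with the deltas applied; inputs are left untouched."""
    return {
        player: {res: qty + deltas[player].get(res, 0) for res, qty in inv.items()}
        for player, inv in inventories.items()
    }

start = {
    "Alice": {"WOOD": 4, "STONE": 3, "GOLD": 2},
    "Bob": {"WOOD": 1, "STONE": 5, "GOLD": 2},
}

# Alice offers 2 GOLD for 2 STONE: both sides can afford it.
deltas = trade_deltas(start, "Alice", "GOLD", "STONE", 2)
after = apply_deltas(start, deltas)

# Alice asks for 3 WOOD, but Bob only holds 1, so the proposal is invalid.
invalid = trade_deltas(start, "Alice", "STONE", "WOOD", 3)
```

Because every trade swaps equal quantities, the total number of items each player holds never changes; only the mix of resources does.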

Define Chat Function

Next, we define the function that queries the model for the current player’s action and updates the game state accordingly.

@trace.bundle(trainable=False)
def chat(player, message):
    global system_prompt
    global conversation
    global proposed_trade
    global proposed_end
    
    current_message = [{'role': 'system', 'content': system_prompt}] + message

    if len(conversation) > 0:
        current_message.append({'role': 'user', 'content': 'This is the transcript of the conversation so far.'})
        conversation_history = ""
        for i in conversation:
            conversation_history += f'{i["role"]} said: {i["content"]}\n'
        current_message.append({'role': 'user', 'content': conversation_history})

    # Named `completion` so the local variable does not shadow the chat function
    completion = client.chat.completions.create(
            model='gpt-4-0125-preview',
            messages=current_message,
            temperature=0,
            max_tokens=200,
            seed=42,
            response_format={ "type": "json_object" }
        )
    
    response = completion.choices[0].message.content
    response_json = json.loads(response)
    
    action = response_json['action']
    
    if action == 'END':
        if proposed_end:
            return 'TERMINATE'
        else:
            proposed_end = True
    elif action == 'REJECT':
        proposed_trade = None
        if proposed_end:
            proposed_end = False
    elif action == 'ACCEPT':
        if proposed_trade is not None:
            accept_trade()
            # Clear the executed trade so a later ACCEPT cannot apply it twice
            proposed_trade = None
        elif proposed_end:
            return 'TERMINATE'
    elif action == 'TRADE':
        proposed_trade = parse(player,response_json)
        if proposed_end:
            proposed_end = False
    
    return response
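
The action handling in this function amounts to a small state machine over two values: the pending trade and whether an END has been proposed. The sketch below isolates that logic without any model call. The `step` function is a hypothetical illustration, and it clears a pending trade once it is accepted so that it cannot be applied twice; that choice is an assumption of the sketch.

```python
def step(state, action, trade=None):
    """Advance (proposed_trade, proposed_end) by one action.

    Returns (new_state, terminated). `trade` is the already-validated trade
    for a TRADE action (None if validation failed).
    """
    proposed_trade, proposed_end = state
    if action == "END":
        if proposed_end:
            return state, True                  # second END in a row ends the game
        return (proposed_trade, True), False
    if action == "REJECT":
        return (None, False), False             # drops the pending trade and any END proposal
    if action == "ACCEPT":
        if proposed_trade is not None:
            return (None, proposed_end), False  # the trade would be executed here
        return state, proposed_end              # accepting a pending END also terminates
    if action == "TRADE":
        return (trade, False), False            # a new offer cancels any END proposal
    return state, False                         # unknown actions leave the state unchanged

offer = {"Alice": {"WOOD": 1, "STONE": -1}, "Bob": {"WOOD": -1, "STONE": 1}}

state = (None, False)
state, done_1 = step(state, "END")                 # Alice proposes to end
state, done_2 = step(state, "TRADE", trade=offer)  # Bob counters with an offer instead
state, done_3 = step(state, "REJECT")              # Alice rejects it
state, done_4 = step(state, "END")                 # Bob proposes to end
state, done_5 = step(state, "END")                 # Alice agrees, so the game is over
```

The walk-through shows why both players must choose END on consecutive turns: any TRADE or REJECT in between resets the END proposal.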

Define the end_game function

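This function calculates the final score based on the players’ inventories. Each resource is scored separately from a fixed value scale, and the OVERALL SCORE is the sum over both players.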

def end_game():
    global p1_inventory
    global p2_inventory
    
    # value_scale[n-1] is the value of holding n units of one resource;
    # quantities above 11 are capped at the last entry
    value_scale = [1, 2, 4, 7, 12, 20, 33, 54, 88, 143, 250]

    def inventory_value(inventory):
        # Sum the value of each resource held; a quantity of 0 contributes nothing
        total = 0
        for resource in ('WOOD', 'STONE', 'GOLD'):
            quantity = inventory[resource]
            if quantity > 0:
                total += value_scale[min(quantity, len(value_scale)) - 1]
        return total

    p1_value = inventory_value(p1_inventory)
    p2_value = inventory_value(p2_inventory)

    return p1_value + p2_value, p1_value, p2_value
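
As a worked example of the scoring, the sketch below recomputes the score of the starting inventories and an upper bound obtained by pooling each resource in a single inventory. It is a standalone re-implementation for illustration (the names `resource_value` and `inventory_value` are hypothetical). The upper bound matches the threshold of 73 used in the optimization loop.

```python
VALUE_SCALE = [1, 2, 4, 7, 12, 20, 33, 54, 88, 143, 250]

def resource_value(quantity):
    """Value of holding `quantity` units of one resource (capped at the last entry)."""
    if quantity <= 0:
        return 0
    return VALUE_SCALE[min(quantity, len(VALUE_SCALE)) - 1]

def inventory_value(inventory):
    """Total value of an inventory: the sum of its per-resource values."""
    return sum(resource_value(q) for q in inventory.values())

alice = {"WOOD": 4, "STONE": 3, "GOLD": 2}
bob = {"WOOD": 1, "STONE": 5, "GOLD": 2}

# Starting score: (7 + 4 + 2) + (1 + 12 + 2) = 13 + 15
start_score = inventory_value(alice) + inventory_value(bob)

# Upper bound: each resource pooled in a single inventory.
totals = {res: alice[res] + bob[res] for res in alice}          # 5 WOOD, 8 STONE, 4 GOLD
upper_bound = sum(resource_value(q) for q in totals.values())   # 12 + 54 + 7
```

Pooling gives the bound because each extra unit of a resource is worth at least as much as the previous one, so splitting a resource between two inventories never scores higher than concentrating it. Here `start_score` is 28 and `upper_bound` is 73.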

Optimize Prompts

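Finally, we use the optimizer to find better prompts for the players over five iterations. Each iteration plays one full game from the same starting inventories and passes the resulting score to the optimizer as feedback.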

# Initialize optimizer
# Reuse the config list loaded during setup
optimizer = OptoPrime([p1_prompt, p2_prompt], memory_size=0, config_list=config)

# Run optimization loop
for i in range(5):
    p1_inventory = {'WOOD': 4, 'STONE': 3, 'GOLD': 2}
    p2_inventory = {'WOOD': 1, 'STONE': 5, 'GOLD': 2}
    proposed_trade = None
    proposed_end = False
    conversation = []

    current_message = None
    current_player = p2_name
    while (current_message is None) or (current_message.data != 'TERMINATE'):
        current_player = p1_name if current_player == p2_name else p2_name
        current_prompt = p1_prompt if current_player == p1_name else p2_prompt
        message_prompt = create_message(current_player, current_prompt, current_message)
        current_message = chat(current_player, message_prompt)
        if current_message.data != 'TERMINATE':
            conversation.append({'role': current_player.data, 'content': current_message.data})
        
    result_value, p1_value, p2_value = end_game()
    feedback = (f'The game has ended. {p1_name.data} has inventory with value of {p1_value} and '
                f'{p2_name.data} has inventory with value of {p2_value}.\n')
    feedback += f'OVERALL SCORE: {result_value}'
    # 73 is the score when each resource is pooled in one inventory:
    # 5 WOOD + 8 STONE + 4 GOLD -> 12 + 54 + 7 on the value scale in end_game
    if result_value < 73:
        feedback += '\nOVERALL SCORE is less than optimal. Find better trades to increase the OVERALL SCORE.'

    print("ITERATION", i+1)
    print(p1_name.data, p1_prompt.data)
    print(p2_name.data, p2_prompt.data)
    print(feedback)

    optimizer.zero_feedback()
    optimizer.backward(current_message, feedback, visualize=False)
    optimizer.step(verbose=False)

ITERATION 1
Alice STOCKPILE THESE RESOURCES: N/A
Bob STOCKPILE THESE RESOURCES: N/A
The game has ended. Alice has inventory with value of 13 and Bob has inventory with value of 15.
OVERALL SCORE: 28
OVERALL SCORE is less than optimal. Find better trades to increase the OVERALL SCORE.
ITERATION 2
Alice STOCKPILE THESE RESOURCES: GOLD, STONE
Bob STOCKPILE THESE RESOURCES: GOLD, WOOD
The game has ended. Alice has inventory with value of 13 and Bob has inventory with value of 15.
OVERALL SCORE: 28
OVERALL SCORE is less than optimal. Find better trades to increase the OVERALL SCORE.
ITERATION 3
Alice STOCKPILE THESE RESOURCES: GOLD, WOOD
Bob STOCKPILE THESE RESOURCES: GOLD, STONE
The game has ended. Alice has inventory with value of 13 and Bob has inventory with value of 22.
OVERALL SCORE: 35
OVERALL SCORE is less than optimal. Find better trades to increase the OVERALL SCORE.
Cannot extract suggestion from LLM's response:
{
"reasoning": "The feedback indicates that the sum of values in the players' inventories at the end of the game is not optimal. The goal is to increase the OVERALL SCORE by making better trade decisions through the chat interactions modeled by the prompts and the responses in the chat variables (chat8, chat9, etc.). The trading decisions are based on the prompts 'STOCKPILE THESE RESOURCES: GOLD, WOOD' for Alice and 'STOCKPILE THESE RESOURCES: GOLD, STONE' for Bob. The trading actions 'TRADE', 'REJECT', and 'ACCEPT' suggest whether a proposed trade between Alice and Bob is successful or not. To optimize the overall score, we need to adjust the trading strategy, which could involve modifying the resources Alice and Bob are aiming to stockpile, to encourage more successful and beneficial trades. Since the only variables we can adjust are str2 and str3, which define the resources each player is trying to accumulate, changing these could potentially lead to better trade outcomes, increasing the overall value of the inventories. However, the instructions and feedback suggest that the strategy and prompts should be changed, rather than specific values. Without specific instructions on what values to change to what new values, there is no direct recommendation to improve the results other than considering changing the trading strategies.",
"answer": "",
"suggestion": {}
}
ITERATION 4
Alice STOCKPILE THESE RESOURCES: GOLD, WOOD
Bob STOCKPILE THESE RESOURCES: GOLD, STONE
The game has ended. Alice has inventory with value of 13 and Bob has inventory with value of 22.
OVERALL SCORE: 35
OVERALL SCORE is less than optimal. Find better trades to increase the OVERALL SCORE.
ITERATION 5
Alice STOCKPILE THESE RESOURCES: STONE, GOLD
Bob STOCKPILE THESE RESOURCES: WOOD, GOLD
The game has ended. Alice has inventory with value of 13 and Bob has inventory with value of 12.
OVERALL SCORE: 25
OVERALL SCORE is less than optimal. Find better trades to increase the OVERALL SCORE.
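
The log shows that the score does not improve monotonically across iterations. A small sketch that summarizes the run, with the scores copied by hand from the output above (the variable names are illustrative):

```python
# OVERALL SCORE per iteration, copied from the log above
scores = {1: 28, 2: 28, 3: 35, 4: 35, 5: 25}
threshold = 73  # value the optimization loop compares against

best_iteration = max(scores, key=scores.get)  # first iteration with the highest score
best_score = scores[best_iteration]
gap = threshold - best_score
```

In this run the best score, 35, appears in iterations 3 and 4, which leaves a gap of 38 to the threshold, and iteration 5 falls below the starting score of 28.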

You can now run each cell in this notebook step by step to walk through setting up and optimizing prompts for the trading game. Happy optimizing!