Skip to main content

One post tagged with "gemini"

View All Tags

· 11 min read
Mark Sze
Hrushikesh Dokala

agents

TL;DR

  • AutoGen has expanded integrations with a variety of cloud-based model providers beyond OpenAI.
  • Leverage models and platforms from Gemini, Anthropic, Mistral AI, Together.AI, and Groq for your AutoGen agents.
  • Utilise models specifically for chat, language, image, and coding.
  • LLM provider diversification can provide cost and resilience benefits.

In addition to the recently released AutoGen Google Gemini client, new client classes for Mistral AI, Anthropic, Together.AI, and Groq enable you to utilize over 75 different large language models in your AutoGen agent workflow.

These new client classes tailor AutoGen's underlying messages to each provider's unique requirements and remove that complexity from the developer, who can then focus on building their AutoGen workflow.

Using them is as simple as installing the client-specific library and updating your LLM config with the relevant api_type and model. We'll demonstrate how to use them below.

The community is continuing to enhance and build new client classes as cloud-based inference providers arrive. So, watch this space, and feel free to discuss or develop another one.

Benefits of choice

The need to use only the best models to overcome workflow-breaking LLM inconsistency has diminished considerably over the last 12 months.

These new classes provide access to the very largest trillion-parameter models from OpenAI, Google, and Anthropic, continuing to provide the most consistent and competent agent experiences. However, it's worth trying smaller models from the likes of Meta, Mistral AI, Microsoft, Qwen, and many others. Perhaps they are capable enough for a task, or sub-task, or even better suited (such as a coding model)!

Using smaller models will have cost benefits, but they also allow you to test models that you could run locally, allowing you to determine if you can remove cloud inference costs altogether or even run an AutoGen workflow offline.

On the topic of cost, these client classes also include provider-specific token cost calculations so you can monitor the cost impact of your workflows. With costs per million tokens as low as 10 cents (and some are even free!), cost savings can be noticeable.

Mix and match

How does Google's Gemini 1.5 Pro model stack up against Anthropic's Opus or Meta's Llama 3?

Now you have the ability to quickly change your agent configs and find out. If you want to run all three in the one workflow, AutoGen's ability to associate specific configurations to each agent means you can select the best LLM for each agent.

Capabilities

The common requirements of text generation and function/tool calling are supported by these client classes.

Multi-modal support, such as for image/audio/video, is an area of active development. The Google Gemini client class can be used to create a multimodal agent.

Tips

Here are some tips when working with these client classes:

  • Most to least capable - start with larger models and get your workflow working, then iteratively try smaller models.
  • Right model - choose one that's suited to your task, whether it's coding, function calling, knowledge, or creative writing.
  • Agent names - these cloud providers do not use the name field on a message, so be sure to use your agent's name in their system_message and description fields, as well as instructing the LLM to 'act as' them. This is particularly important for "auto" speaker selection in group chats as we need to guide the LLM to choose the next agent based on a name, so tweak select_speaker_message_template, select_speaker_prompt_template, and select_speaker_auto_multiple_template with more guidance.
  • Context length - as your conversation gets longer, models need to support larger context lengths, be mindful of what the model supports and consider using Transform Messages to manage context size.
  • Provider parameters - providers have parameters you can set such as temperature, maximum tokens, top-k, top-p, and safety. See each client class in AutoGen's API Reference or documentation for details.
  • Prompts - prompt engineering is critical in guiding smaller LLMs to do what you need. ConversableAgent, GroupChat, UserProxyAgent, and AssistantAgent all have customizable prompt attributes that you can tailor. Here are some prompting tips from Anthropic(+Library), Mistral AI, Together.AI, and Meta.
  • Help! - reach out on the AutoGen Discord or log an issue if you need help with or can help improve these client classes.

Now it's time to try them out.

Quickstart

Installation

Install the appropriate client based on the model you wish to use.

pip install pyautogen["mistral"] # for Mistral AI client
pip install pyautogen["anthropic"] # for Anthropic client
pip install pyautogen["together"] # for Together.AI client
pip install pyautogen["groq"] # for Groq client

Configuration Setup

Add your model configurations to the OAI_CONFIG_LIST. Ensure you specify the api_type to initialize the respective client (Anthropic, Mistral, or Together).

[
{
"model": "your anthropic model name",
"api_key": "your Anthropic api_key",
"api_type": "anthropic"
},
{
"model": "your mistral model name",
"api_key": "your Mistral AI api_key",
"api_type": "mistral"
},
{
"model": "your together.ai model name",
"api_key": "your Together.AI api_key",
"api_type": "together"
},
{
"model": "your groq model name",
"api_key": "your Groq api_key",
"api_type": "groq"
}
]

Usage

The [config_list_from_json](https://microsoft.github.io/autogen/docs/reference/oai/openai_utils/#config_list_from_json) function loads a list of configurations from an environment variable or a json file.

import autogen
from autogen import AssistantAgent, UserProxyAgent

config_list = autogen.config_list_from_json(
"OAI_CONFIG_LIST"
)

Construct Agents

Construct a simple conversation between a User proxy and an Assistant agent

user_proxy =  UserProxyAgent(
name="User_proxy",
code_execution_config={
"last_n_messages": 2,
"work_dir": "groupchat",
"use_docker": False, # Please set use_docker = True if docker is available to run the generated code. Using docker is safer than running the generated code directly.
},
human_input_mode="ALWAYS",
is_termination_msg=lambda msg: not msg["content"]
)

assistant = AssistantAgent(
name="assistant",
llm_config = {"config_list": config_list}
)

Start chat


user_proxy.intiate_chat(assistant, message="Write python code to print Hello World!")

NOTE: To integrate this setup into GroupChat, follow the tutorial with the same config as above.

Function Calls

Now, let's look at how Anthropic's Sonnet 3.5 is able to suggest multiple function calls in a single response.

This example is a simple travel agent setup with an agent for function calling and a user proxy agent for executing the functions.

One thing you'll note here is Anthropic's models are more verbose than OpenAI's and will typically provide chain-of-thought or general verbiage when replying. Therefore we provide more explicit instructions to functionbot to not reply with more than necessary. Even so, it can't always help itself!

Let's start with setting up our configuration and agents.

import os
import autogen
import json
from typing import Literal
from typing_extensions import Annotated

# Anthropic configuration, using api_type='anthropic'
anthropic_llm_config = {
"config_list":
[
{
"api_type": "anthropic",
"model": "claude-3-5-sonnet-20240620",
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"cache_seed": None
}
]
}

# Our functionbot, who will be assigned two functions and
# given directions to use them.
functionbot = autogen.AssistantAgent(
name="functionbot",
system_message="For currency exchange tasks, only use "
"the functions you have been provided with. Do not "
"reply with helpful tips. Once you've recommended functions "
"reply with 'TERMINATE'.",
is_termination_msg=lambda x: x.get("content", "") and (x.get("content", "").rstrip().endswith("TERMINATE") or x.get("content", "") == ""),
llm_config=anthropic_llm_config,
)

# Our user proxy agent, who will be used to manage the customer
# request and conversation with the functionbot, terminating
# when we have the information we need.
user_proxy = autogen.UserProxyAgent(
name="user_proxy",
system_message="You are a travel agent that provides "
"specific information to your customers. Get the "
"information you need and provide a great summary "
"so your customer can have a great trip. If you "
"have the information you need, simply reply with "
"'TERMINATE'.",
is_termination_msg=lambda x: x.get("content", "") and (x.get("content", "").rstrip().endswith("TERMINATE") or x.get("content", "") == ""),
human_input_mode="NEVER",
max_consecutive_auto_reply=10,
)

We define the two functions.

CurrencySymbol = Literal["USD", "EUR"]

def exchange_rate(base_currency: CurrencySymbol, quote_currency: CurrencySymbol) -> float:
if base_currency == quote_currency:
return 1.0
elif base_currency == "USD" and quote_currency == "EUR":
return 1 / 1.1
elif base_currency == "EUR" and quote_currency == "USD":
return 1.1
else:
raise ValueError(f"Unknown currencies {base_currency}, {quote_currency}")

def get_current_weather(location, unit="fahrenheit"):
"""Get the weather for some location"""
if "chicago" in location.lower():
return json.dumps({"location": "Chicago", "temperature": "13", "unit": unit})
elif "san francisco" in location.lower():
return json.dumps({"location": "San Francisco", "temperature": "55", "unit": unit})
elif "new york" in location.lower():
return json.dumps({"location": "New York", "temperature": "11", "unit": unit})
else:
return json.dumps({"location": location, "temperature": "unknown"})

And then associate them with the user_proxy for execution and functionbot for the LLM to consider using them.

@user_proxy.register_for_execution()
@functionbot.register_for_llm(description="Currency exchange calculator.")
def currency_calculator(
base_amount: Annotated[float, "Amount of currency in base_currency"],
base_currency: Annotated[CurrencySymbol, "Base currency"] = "USD",
quote_currency: Annotated[CurrencySymbol, "Quote currency"] = "EUR",
) -> str:
quote_amount = exchange_rate(base_currency, quote_currency) * base_amount
return f"{quote_amount} {quote_currency}"

@user_proxy.register_for_execution()
@functionbot.register_for_llm(description="Weather forecast for US cities.")
def weather_forecast(
location: Annotated[str, "City name"],
) -> str:
weather_details = get_current_weather(location=location)
weather = json.loads(weather_details)
return f"{weather['location']} will be {weather['temperature']} degrees {weather['unit']}"

Finally, we start the conversation with a request for help from our customer on their upcoming trip to New York and the Euro they would like exchanged to USD.

Importantly, we're also using Anthropic's Sonnet to provide a summary through the summary_method. Using summary_prompt, we guide Sonnet to give us an email output.

# start the conversation
res = user_proxy.initiate_chat(
functionbot,
message="My customer wants to travel to New York and "
"they need to exchange 830 EUR to USD. Can you please "
"provide them with a summary of the weather and "
"exchanged currently in USD?",
summary_method="reflection_with_llm",
summary_args={
"summary_prompt": """Summarize the conversation by
providing an email response with the travel information
for the customer addressed as 'Dear Customer'. Do not
provide any additional conversation or apologise,
just provide the relevant information and the email."""
},
)

After the conversation has finished, we'll print out the summary.

print(f"Here's the LLM summary of the conversation:\n\n{res.summary['content']}")

Here's the resulting output.

user_proxy (to functionbot):

My customer wants to travel to New York and they need to exchange 830 EUR
to USD. Can you please provide them with a summary of the weather and
exchanged currently in USD?

--------------------------------------------------------------------------------
functionbot (to user_proxy):

Certainly! I'd be happy to help your customer with information about the
weather in New York and the currency exchange from EUR to USD. Let's use
the available tools to get this information.

***** Suggested tool call (toolu_016wBUKVX2TToBaMqmiGvhav): weather_forecast *****
Arguments:
{"location": "New York"}
**********************************************************************************
***** Suggested tool call (toolu_01Nxjeew2BspfKdZ85on3XkP): currency_calculator *****
Arguments:
{"base_amount": 830, "base_currency": "EUR", "quote_currency": "USD"}
*************************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION weather_forecast...

>>>>>>>> EXECUTING FUNCTION currency_calculator...
user_proxy (to functionbot):

user_proxy (to functionbot):

***** Response from calling tool (toolu_016wBUKVX2TToBaMqmiGvhav) *****
New York will be 11 degrees fahrenheit
***********************************************************************

--------------------------------------------------------------------------------
user_proxy (to functionbot):

***** Response from calling tool (toolu_01Nxjeew2BspfKdZ85on3XkP) *****
913.0000000000001 USD
***********************************************************************

--------------------------------------------------------------------------------
functionbot (to user_proxy):

Thank you for providing the information. I can now give your customer a
summary of the weather in New York and the currency exchange from EUR to USD.

Weather in New York:
The current forecast for New York indicates that it will be 11 degrees
Fahrenheit. This is quite cold, so your customer should be prepared with
warm clothing for their trip.

Currency Exchange:
For 830 EUR, your customer will receive approximately 913 USD. This is based
on the current exchange rate provided by our currency calculator.

To summarize:
1. Weather in New York: 11°F (very cold)
2. Currency exchange: 830 EUR = 913 USD

Your customer should pack warm clothes for the cold weather in New York and
can expect to have about 913 USD for their trip after exchanging 830 EUR.

TERMINATE

--------------------------------------------------------------------------------
Here's the LLM summary of the conversation:

Certainly. I'll provide an email response to the customer with the travel
information as requested.

Dear Customer,

We are pleased to provide you with the following information for your
upcoming trip to New York:

Weather Forecast:
The current forecast for New York indicates a temperature of 11 degrees
Fahrenheit. Please be prepared for very cold weather and pack appropriate
warm clothing.

Currency Exchange:
We have calculated the currency exchange for you. Your 830 EUR will be
equivalent to approximately 913 USD at the current exchange rate.

We hope this information helps you prepare for your trip to New York. Have
a safe and enjoyable journey!

Best regards,
Travel Assistance Team

So we can see how Anthropic's Sonnet is able to suggest multiple tools in a single response, with AutoGen executing them both and providing the results back to Sonnet. Sonnet then finishes with a nice email summary that can be the basis for continued real-life conversation with the customer.

More tips and tricks

For an interesting chess game between Anthropic's Sonnet and Mistral's Mixtral, we've put together a sample notebook that highlights some of the tips and tricks for working with non-OpenAI LLMs. See the notebook here.