Local LLMs with LiteLLM & Ollama#

In this notebook we’ll create two agents, Joe and Cathy who like to tell jokes to each other. The agents will use locally running LLMs.

Follow the guide at https://microsoft.github.io/autogen/docs/topics/non-openai-models/local-litellm-ollama/ to understand how to install LiteLLM and Ollama.

We encourage going through the link, but if you’re in a hurry and using Linux, run these:

curl -fsSL https://ollama.com/install.sh | sh

ollama pull llama3.2:1b

pip install 'litellm[proxy]'
litellm --model ollama/llama3.2:1b

This will run the proxy server and it will be available at ‘http://0.0.0.0:4000/’.

To get started, let’s import some classes.

from dataclasses import dataclass

from autogen_core import (
    AgentId,
    DefaultTopicId,
    MessageContext,
    RoutedAgent,
    SingleThreadedAgentRuntime,
    default_subscription,
    message_handler,
)
from autogen_core.model_context import BufferedChatCompletionContext
from autogen_core.models import (
    AssistantMessage,
    ChatCompletionClient,
    SystemMessage,
    UserMessage,
)
from autogen_ext.models.openai import OpenAIChatCompletionClient

Set up out local LLM model client.

def get_model_client() -> OpenAIChatCompletionClient:  # type: ignore
    "Mimic OpenAI API using Local LLM Server."
    return OpenAIChatCompletionClient(
        model="llama3.2:1b",
        api_key="NotRequiredSinceWeAreLocal",
        base_url="http://0.0.0.0:4000",
        model_capabilities={
            "json_output": False,
            "vision": False,
            "function_calling": True,
        },
    )

Define a simple message class

@dataclass
class Message:
    content: str

Now, the Agent.

We define the role of the Agent using the SystemMessage and set up a condition for termination.

@default_subscription
class Assistant(RoutedAgent):
    def __init__(self, name: str, model_client: ChatCompletionClient) -> None:
        super().__init__("An assistant agent.")
        self._model_client = model_client
        self.name = name
        self.count = 0
        self._system_messages = [
            SystemMessage(
                content=f"Your name is {name} and you are a part of a duo of comedians."
                "You laugh when you find the joke funny, else reply 'I need to go now'.",
            )
        ]
        self._model_context = BufferedChatCompletionContext(buffer_size=5)

    @message_handler
    async def handle_message(self, message: Message, ctx: MessageContext) -> None:
        self.count += 1
        await self._model_context.add_message(UserMessage(content=message.content, source="user"))
        result = await self._model_client.create(self._system_messages + await self._model_context.get_messages())

        print(f"\n{self.name}: {message.content}")

        if "I need to go".lower() in message.content.lower() or self.count > 2:
            return

        await self._model_context.add_message(AssistantMessage(content=result.content, source="assistant"))  # type: ignore
        await self.publish_message(Message(content=result.content), DefaultTopicId())  # type: ignore

Set up the agents.

runtime = SingleThreadedAgentRuntime()

model_client = get_model_client()

cathy = await Assistant.register(
    runtime,
    "cathy",
    lambda: Assistant(name="Cathy", model_client=model_client),
)

joe = await Assistant.register(
    runtime,
    "joe",
    lambda: Assistant(name="Joe", model_client=model_client),
)

Let’s run everything!

runtime.start()
await runtime.send_message(
    Message("Joe, tell me a joke."),
    recipient=AgentId(joe, "default"),
    sender=AgentId(cathy, "default"),
)
await runtime.stop_when_idle()

# Close the connections to the model clients.
await model_client.close()

/tmp/ipykernel_1417357/2124203426.py:22: UserWarning: Resolved model mismatch: gpt-4o-2024-05-13 != ollama/llama3.1:8b. Model mapping may be incorrect.
  result = await self._model_client.create(self._system_messages + await self._model_context.get_messages())

Joe: Joe, tell me a joke.

Cathy: Here's one:

Why couldn't the bicycle stand up by itself?

(waiting for your reaction...)

Joe: *laughs* It's because it was two-tired! Ahahaha! That's a good one! I love it!

Cathy: *roars with laughter* HAHAHAHA! Oh man, that's a classic! I'm glad you liked it! The setup is perfect and the punchline is just... *chuckles* Two-tired! I mean, come on! That's genius! We should definitely add that one to our act!

Joe: I need to go now.