
Using AutoGen AgentChat with LangChain-based Custom Client and Hugging Face Models


Introduction

This notebook demonstrates how you can use LangChain's extensive support for language models (LLMs) to enable flexible use of a wide range of models in agent-based conversations in AutoGen.

What we’ll cover:

  1. Creating a custom model client that uses LangChain to load and interact with LLMs
  2. Configuring AutoGen to use our custom LangChain-based model
  3. Setting up AutoGen agents with the custom model
  4. Demonstrating a simple conversation using this setup

While we use a Hugging Face model in this example, the same approach can be applied to any LLM supported by LangChain, including models from OpenAI, Anthropic, or custom models. This integration opens up a wide range of possibilities for creating sophisticated, multi-model conversational agents using AutoGen.

Requirements


Some extra dependencies are needed for this notebook, which can be installed via pip:

pip install pyautogen torch transformers sentencepiece langchain-huggingface 

For more information, please refer to the installation guide.

NOTE: Depending on which model you use, you may need to adjust the agents' default prompts.
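For example, a minimal sketch of overriding the default system prompt is shown below; the agent name and prompt text are assumptions, and config_list_custom is defined later in this notebook:

# Hypothetical sketch: pass a custom system_message to override the default
# AssistantAgent prompt, which can help when working with smaller local models.
from autogen import AssistantAgent

assistant = AssistantAgent(
    "assistant",
    system_message="You are a concise coding assistant. Reply with runnable Python only.",
    llm_config={"config_list": config_list_custom},  # defined later in this notebook
)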

Setup and Imports

First, let’s import the necessary libraries and define our custom model client.

import json
import os
from types import SimpleNamespace

from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline

from autogen import AssistantAgent, UserProxyAgent, config_list_from_json

Create and configure the custom model

A custom model class can be created in many ways, but it needs to adhere to the ModelClient protocol and response structure, which is defined in client.py and shown below.

The response protocol has some minimum requirements, but can be extended to include any additional information that is needed. Message retrieval therefore can be customized, but needs to return a list of strings or a list of ModelClientResponseProtocol.Choice.Message objects.

class ModelClient(Protocol):
    """
    A client class must implement the following methods:
    - create must return a response object that implements the ModelClientResponseProtocol
    - cost must return the cost of the response
    - get_usage must return a dict with the following keys:
        - prompt_tokens
        - completion_tokens
        - total_tokens
        - cost
        - model

    This class is used to create a client that can be used by OpenAIWrapper.
    The response returned from create must adhere to the ModelClientResponseProtocol but can be extended however needed.
    The message_retrieval method must be implemented to return a list of str or a list of messages from the response.
    """

    RESPONSE_USAGE_KEYS = ["prompt_tokens", "completion_tokens", "total_tokens", "cost", "model"]

    class ModelClientResponseProtocol(Protocol):
        class Choice(Protocol):
            class Message(Protocol):
                content: Optional[str]

            message: Message

        choices: List[Choice]
        model: str

    def create(self, params) -> ModelClientResponseProtocol:
        ...

    def message_retrieval(
        self, response: ModelClientResponseProtocol
    ) -> Union[List[str], List[ModelClient.ModelClientResponseProtocol.Choice.Message]]:
        """
        Retrieve and return a list of strings or a list of Choice.Message from the response.

        NOTE: if a list of Choice.Message is returned, it currently needs to contain the fields of OpenAI's ChatCompletion Message object,
        since that is expected for function or tool calling in the rest of the codebase at the moment, unless a custom agent is being used.
        """
        ...

    def cost(self, response: ModelClientResponseProtocol) -> float:
        ...

    @staticmethod
    def get_usage(response: ModelClientResponseProtocol) -> Dict:
        """Return usage summary of the response using RESPONSE_USAGE_KEYS."""
        ...

Example of simple custom client

This follows the Hugging Face example for using Mistral's Open-Orca model.

For the response object, Python's SimpleNamespace is used to create a simple object for storing the response data, but any object that follows the ModelClientResponseProtocol can be used.

# custom client with custom model loader


class CustomModelClient:
    """Custom model client implementation for LangChain integration with AutoGen."""

    def __init__(self, config, **kwargs):
        """Initialize the CustomModelClient."""
        print(f"CustomModelClient config: {config}")
        self.device = config.get("device", "cpu")

        gen_config_params = config.get("params", {})
        self.model_name = config["model"]
        pipeline = HuggingFacePipeline.from_model_id(
            model_id=self.model_name,
            task="text-generation",
            pipeline_kwargs=gen_config_params,
            device=self.device,
        )
        self.model = ChatHuggingFace(llm=pipeline)
        print(f"Loaded model {config['model']} to {self.device}")

    def _to_chatml_format(self, message):
        """Convert an AutoGen message dict to a LangChain message object."""
        if message["role"] == "system":
            return SystemMessage(content=message["content"])
        if message["role"] == "assistant":
            return AIMessage(content=message["content"])
        if message["role"] == "user":
            return HumanMessage(content=message["content"])
        raise ValueError(f"Unknown message role: {message['role']}")

    def create(self, params):
        """Create a response using the model."""
        if params.get("stream", False) and "messages" in params:
            raise NotImplementedError("Local models do not support streaming.")

        num_of_responses = params.get("n", 1)
        response = SimpleNamespace()
        inputs = [self._to_chatml_format(m) for m in params["messages"]]
        response.choices = []
        response.model = self.model_name

        for _ in range(num_of_responses):
            outputs = self.model.invoke(inputs)
            text = outputs.content
            choice = SimpleNamespace()
            choice.message = SimpleNamespace()
            choice.message.content = text
            choice.message.function_call = None
            response.choices.append(choice)

        return response

    def message_retrieval(self, response):
        """Retrieve messages from the response."""
        return [choice.message.content for choice in response.choices]

    def cost(self, response) -> float:
        """Calculate the cost of the response."""
        response.cost = 0
        return 0

    @staticmethod
    def get_usage(response):
        """Get usage statistics."""
        return {}
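Before wiring the client into AutoGen, it can help to exercise it directly. The following is a hedged sketch, not part of the original notebook; the model id and generation parameters are assumptions, and the call will download the model on first use:

# Hypothetical smoke test: call the protocol methods by hand, outside of AutoGen.
test_config = {
    "model": "microsoft/Phi-3.5-mini-instruct",  # any HF chat model you have access to
    "device": -1,  # -1 selects CPU in the transformers pipeline convention
    "params": {"max_new_tokens": 50},
}
client = CustomModelClient(test_config)
response = client.create({"messages": [{"role": "user", "content": "Say hello."}], "n": 1})
print(client.message_retrieval(response))  # a list with one generated string
print(client.cost(response), client.get_usage(response))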

Set your API Endpoint

The config_list_from_json function loads a list of configurations from an environment variable or a json file.

It first looks for an environment variable of a specified name (“OAI_CONFIG_LIST” in this example), which needs to be a valid json string. If that variable is not found, it looks for a json file with the same name. It filters the configs by models (you can filter by other keys as well).

The json looks like the following:

[
    {
        "model": "gpt-4",
        "api_key": "<your OpenAI API key here>"
    },
    {
        "model": "gpt-4",
        "api_key": "<your Azure OpenAI API key here>",
        "base_url": "<your Azure OpenAI API base here>",
        "api_type": "azure",
        "api_version": "2024-02-01"
    },
    {
        "model": "gpt-4-32k",
        "api_key": "<your Azure OpenAI API key here>",
        "base_url": "<your Azure OpenAI API base here>",
        "api_type": "azure",
        "api_version": "2024-02-01"
    }
]

You can set the value of config_list in any way you prefer. Please refer to this notebook for full code examples of the different methods.
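For instance, a hedged alternative is to build the list directly in Python instead of going through an environment variable; the field values below simply mirror the custom-model config used in the next cell:

# Sketch: define config_list_custom directly instead of loading it from OAI_CONFIG_LIST.
config_list_custom = [
    {
        "model": "mistralai/Mistral-7B-Instruct-v0.2",
        "model_client_cls": "CustomModelClient",
        "device": 0,
        "n": 1,
        "params": {"max_new_tokens": 500, "temperature": 0.1, "do_sample": True},
    }
]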

Set the config for the custom model

You can add any parameters that are needed for loading the custom model to the same configuration list.

It is important to add the model_client_cls field and set it to a string that corresponds to the class name: "CustomModelClient".

os.environ["OAI_CONFIG_LIST"] = json.dumps(
    [
        {
            "model": "mistralai/Mistral-7B-Instruct-v0.2",
            "model_client_cls": "CustomModelClient",
            "device": 0,
            "n": 1,
            "params": {
                "max_new_tokens": 500,
                "top_k": 50,
                "temperature": 0.1,
                "do_sample": True,
                "return_full_text": False,
            },
        }
    ]
)
config_list_custom = config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={"model_client_cls": ["CustomModelClient"]},
)
import getpass

from huggingface_hub import login

# Mistral-7B-Instruct-v0.2 is a gated model, which requires an API token to access
login(token=getpass.getpass("Enter your HuggingFace API Token"))
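If you are running non-interactively (for example in CI), one option, sketched here, is to read the token from an environment variable instead of prompting; the variable name HF_TOKEN is an assumption, not something AutoGen requires:

# Sketch: avoid the interactive prompt by reading the token from an environment variable.
import os

from huggingface_hub import login

login(token=os.environ["HF_TOKEN"])  # HF_TOKEN is an assumed variable name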

Construct Agents

Construct a simple conversation between a user proxy and an assistant agent.

assistant = AssistantAgent("assistant", llm_config={"config_list": config_list_custom})
user_proxy = UserProxyAgent("user_proxy", code_execution_config=False)
[autogen.oai.client: 09-01 12:53:51] {484} INFO - Detected custom model client in config: CustomModelClient, model client can not be used until register_model_client is called.

Register the custom client class to the assistant agent

assistant.register_model_client(model_client_cls=CustomModelClient)
CustomModelClient config: {'model': 'microsoft/Phi-3.5-mini-instruct', 'model_client_cls': 'CustomModelClient', 'device': 0, 'n': 1, 'params': {'max_new_tokens': 100, 'top_k': 50, 'temperature': 0.1, 'do_sample': True, 'return_full_text': False}}
Loaded model microsoft/Phi-3.5-mini-instruct to 0
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:07<00:00,  3.51s/it]
user_proxy.initiate_chat(assistant, message="Write python code to print Hello World!")
user_proxy (to assistant):

Write python code to print Hello World!

--------------------------------------------------------------------------------
assistant (to user_proxy):

```python
# filename: hello_world.py

print("Hello World!")
```

To execute this code, save it in a file named `hello_world.py`. Then, open your terminal or command prompt, navigate to the directory containing the file, and run the following command:

```
python hello_world.py
```

The output should be:

```
Hello World!
```

If you encounter any errors,

--------------------------------------------------------------------------------
You are not running the flash-attention implementation, expect numerical differences.
ChatResult(chat_id=None, chat_history=[{'content': 'Write python code to print Hello World!', 'role': 'assistant', 'name': 'user_proxy'}, {'content': ' ```python\n# filename: hello_world.py\n\nprint("Hello World!")\n```\n\nTo execute this code, save it in a file named `hello_world.py`. Then, open your terminal or command prompt, navigate to the directory containing the file, and run the following command:\n\n```\npython hello_world.py\n```\n\nThe output should be:\n\n```\nHello World!\n```\n\nIf you encounter any errors,', 'role': 'user', 'name': 'assistant'}], summary=' ```python\n# filename: hello_world.py\n\nprint("Hello World!")\n```\n\nTo execute this code, save it in a file named `hello_world.py`. Then, open your terminal or command prompt, navigate to the directory containing the file, and run the following command:\n\n```\npython hello_world.py\n```\n\nThe output should be:\n\n```\nHello World!\n```\n\nIf you encounter any errors,', cost={'usage_including_cached_inference': {'total_cost': 0}, 'usage_excluding_cached_inference': {'total_cost': 0}}, human_input=['exit'])