This tutorial shows how to perform image chat with an agent using the OpenAIChatAgent as an example.
Note
To chat about images with an agent, the model behind the agent must support image input. Here is a partial list of models that support image input:
- gpt-4o
- gemini-1.5
- llava
- claude-3
- ...
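In this example, we use a model from the gpt-4o family as the backend model for the agent. Judging by its name, the LLMConfiguration.GetOpenAIGPT4o_mini() helper used in Step 3 returns a client for the gpt-4o-mini variant.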
Note
The complete code example can be found in Image_Chat_With_Agent.cs
Step 1: Install AutoGen
First, install the AutoGen package using the following command:
dotnet add package AutoGen
Step 2: Add Using Statements
using AutoGen.Core;
using AutoGen.OpenAI;
using AutoGen.OpenAI.Extension;
Step 3: Create an OpenAIChatAgent
var gpt4o = LLMConfiguration.GetOpenAIGPT4o_mini();
var agent = new OpenAIChatAgent(
    chatClient: gpt4o,
    name: "agent",
    systemMessage: "You are a helpful AI assistant")
    .RegisterMessageConnector() // convert OpenAI message to AutoGen message
    .RegisterPrintMessage();
Step 4: Prepare Image Message
In AutoGen, you can create an image message using either ImageMessage or MultiModalMessage. ImageMessage takes a single image as input, whereas MultiModalMessage lets you combine multiple modalities, such as text and images, in one message.
Here is how to create an image message using ImageMessage:
var backgroundImagePath = Path.Combine("resource", "images", "background.png");
var imageBytes = File.ReadAllBytes(backgroundImagePath);
var imageMessage = new ImageMessage(Role.User, BinaryData.FromBytes(imageBytes, "image/png"));
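The snippet above hard-codes the "image/png" media type. If your images come in several formats, one option is to derive the media type from the file extension. The following is a minimal sketch under that assumption: ImageData, GetMediaType, and Load are hypothetical names introduced here for illustration and are not part of AutoGen, and the extension table covers only a few common formats.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of AutoGen): picks a media type from the
// file extension and wraps the file bytes in a BinaryData instance.
public static class ImageData
{
    // Case-insensitive lookup so ".PNG" and ".png" are treated the same.
    private static readonly Dictionary<string, string> MediaTypes =
        new(StringComparer.OrdinalIgnoreCase)
        {
            [".png"] = "image/png",
            [".jpg"] = "image/jpeg",
            [".jpeg"] = "image/jpeg",
            [".gif"] = "image/gif",
            [".webp"] = "image/webp",
        };

    // Returns the media type for a known image extension, or throws
    // NotSupportedException for anything not in the table above.
    public static string GetMediaType(string path)
    {
        var extension = Path.GetExtension(path) ?? string.Empty;
        if (MediaTypes.TryGetValue(extension, out var mediaType))
        {
            return mediaType;
        }

        throw new NotSupportedException($"Unsupported image extension: '{extension}'");
    }

    // Reads the file and tags the bytes with the detected media type.
    public static BinaryData Load(string path)
    {
        var bytes = File.ReadAllBytes(path);
        return BinaryData.FromBytes(bytes, GetMediaType(path));
    }
}
```

With this helper, the image message could be created as new ImageMessage(Role.User, ImageData.Load(path)), where path is the image file path. Failing on unknown extensions is a deliberate choice in this sketch, so that a mislabeled file is caught before it is sent to the model.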
Here is how to create a multimodal message using MultiModalMessage:
var textMessage = new TextMessage(Role.User, "what's in the picture");
var multimodalMessage = new MultiModalMessage(Role.User, [textMessage, imageMessage]);
Step 5: Generate Response
To generate a response, use one of the SendAsync overloads. The following code shows how to generate a response from an image message:
var reply = await agent.SendAsync("what's in the picture", chatHistory: [imageMessage]);
// or use multimodal message to generate reply
reply = await agent.SendAsync(multimodalMessage);