This tutorial shows how to perform image chat with an agent using the Open
Note
To chat about an image with an agent, the model behind the agent must support image input. Here is a partial list of models that do:
- gpt-4o
- gemini-1.5
- llava
- claude-3
- ...
In this example, we are using the gpt-4o model as the backend model for the agent.
Note
The complete code example can be found in Image_Chat_With_Agent.cs
Step 1: Install AutoGen
First, install the AutoGen package using the following command:
dotnet add package AutoGen
Step 2: Add Using Statements
using AutoGen.Core;
using AutoGen.OpenAI;
using AutoGen.OpenAI.Extension;
Step 3: Create an OpenAIChatAgent
var gpt4o = LLMConfiguration.GetOpenAIGPT4o_mini();
var agent = new OpenAIChatAgent(
    chatClient: gpt4o,
    name: "agent",
    systemMessage: "You are a helpful AI assistant")
    .RegisterMessageConnector() // convert OpenAI message to AutoGen message
    .RegisterPrintMessage();
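The `LLMConfiguration` helper used above is not defined in this tutorial. If it is not available in your project, the chat client can probably be built directly with the OpenAI .NET SDK. The following is a minimal sketch under two assumptions that do not come from the tutorial: the API key is stored in an `OPENAI_API_KEY` environment variable, and `"gpt-4o"` is the model id you want.

```csharp
using System;
using AutoGen.OpenAI;
using OpenAI;

// Assumption: the API key lives in the OPENAI_API_KEY environment variable.
var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
    ?? throw new InvalidOperationException("Set the OPENAI_API_KEY environment variable first.");

// Assumption: "gpt-4o" is the model id; substitute any model that accepts image input.
var chatClient = new OpenAIClient(apiKey).GetChatClient("gpt-4o");

// Same constructor arguments as in Step 3, with the hand-built client in place of the helper.
var agent = new OpenAIChatAgent(
    chatClient: chatClient,
    name: "agent",
    systemMessage: "You are a helpful AI assistant");
```

The middleware calls from Step 3 (`RegisterMessageConnector` and `RegisterPrintMessage`) can then be chained onto this agent in the same way.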
Step 4: Prepare Image Message
In AutoGen, you can create an image message using either Image
Here is how to create an image message using Image
var backgroundImagePath = Path.Combine("resource", "images", "background.png");
var imageBytes = File.ReadAllBytes(backgroundImagePath);
var imageMessage = new ImageMessage(Role.User, BinaryData.FromBytes(imageBytes, "image/png"));
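The media type passed to `BinaryData.FromBytes` should match the actual image format, and the example above hard-codes `"image/png"`. When images come in several formats, one option is to derive the media type from the file extension. The sketch below uses a hypothetical helper, `GetImageMediaType`, which is not part of AutoGen, and the list of extensions is only an illustrative subset.

```csharp
using System;
using System.IO;

// Hypothetical helper (not an AutoGen API): maps a file extension to an image media type.
static string GetImageMediaType(string path) =>
    Path.GetExtension(path).ToLowerInvariant() switch
    {
        ".png" => "image/png",
        ".jpg" or ".jpeg" => "image/jpeg",
        ".gif" => "image/gif",
        ".webp" => "image/webp",
        var ext => throw new NotSupportedException($"Unsupported image extension: '{ext}'"),
    };

var imagePath = Path.Combine("resource", "images", "background.png");
var mediaType = GetImageMediaType(imagePath);
Console.WriteLine(mediaType); // prints "image/png"
```

The resulting string can then replace the literal in `BinaryData.FromBytes(imageBytes, mediaType)`.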
Here is how to create a multimodal message using Multi
var textMessage = new TextMessage(Role.User, "what's in the picture");
var multimodalMessage = new MultiModalMessage(Role.User, [textMessage, imageMessage]);
Step 5: Generate Response
To generate a response, you can use one of the overloaded methods of Send
var reply = await agent.SendAsync("what's in the picture", chatHistory: [imageMessage]);
// or use multimodal message to generate reply
reply = await agent.SendAsync(multimodalMessage);