This tutorial shows how to chat about images with an agent, using OpenAIChatAgent as an example.

Note

To chat about an image with an agent, the model behind the agent must support image input. The following is a partial list of models that do:

  • gpt-4o
  • gemini-1.5
  • llava
  • claude-3
  • ...

In this example, the agent uses a model from the gpt-4o family as its backend (gpt-4o-mini, judging by the GetOpenAIGPT4o_mini helper called in Step 3).

Note

The complete code example can be found in Image_Chat_With_Agent.cs

Step 1: Install AutoGen

First, install the AutoGen package using the following command:

dotnet add package AutoGen

Step 2: Add Using Statements

using AutoGen.Core;
using AutoGen.OpenAI;
using AutoGen.OpenAI.Extension;

Step 3: Create an OpenAIChatAgent

var gpt4o = LLMConfiguration.GetOpenAIGPT4o_mini();
var agent = new OpenAIChatAgent(
    chatClient: gpt4o,
    name: "agent",
    systemMessage: "You are a helpful AI assistant")
    .RegisterMessageConnector() // convert OpenAI message to AutoGen message
    .RegisterPrintMessage();
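
LLMConfiguration is not defined anywhere in these steps; it appears to be a helper from the sample project rather than a type shipped in the AutoGen package. A minimal sketch of an equivalent helper follows. It assumes the OpenAI .NET SDK is referenced and that the API key lives in an OPENAI_API_KEY environment variable; the class body and the variable name are assumptions, not the sample's confirmed implementation.

```csharp
using System;
using OpenAI;
using OpenAI.Chat;

// Hypothetical stand-in for the sample project's LLMConfiguration helper.
public static class LLMConfiguration
{
    public static ChatClient GetOpenAIGPT4o_mini()
    {
        // Assumption: the API key is supplied through this environment variable.
        var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
            ?? throw new InvalidOperationException("Please set the OPENAI_API_KEY environment variable.");

        // Build a chat client bound to the gpt-4o-mini model.
        return new OpenAIClient(apiKey).GetChatClient("gpt-4o-mini");
    }
}
```

Failing fast on a missing key surfaces a configuration problem at startup instead of on the first request to the model.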

Step 4: Prepare Image Message

In AutoGen, you can create an image message using either ImageMessage or MultiModalMessage. ImageMessage wraps a single image, whereas MultiModalMessage combines several messages of different modalities, such as text and images, into one message.

Here is how to create an image message using ImageMessage:

var backgroundImagePath = Path.Combine("resource", "images", "background.png");
var imageBytes = File.ReadAllBytes(backgroundImagePath);
// wrap the raw bytes together with their media type
var imageMessage = new ImageMessage(Role.User, BinaryData.FromBytes(imageBytes, "image/png"));
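
The snippet above hard-codes "image/png" as the media type. When image paths are not known in advance, the media type can be derived from the file extension instead. The following is a small sketch of that idea; ImageMediaType and FromPath are hypothetical names introduced for illustration, and whether a given format is actually accepted depends on the model behind the agent.

```csharp
using System;
using System.IO;

// Hypothetical helper: maps a file extension to the media type string
// that is passed as the second argument of BinaryData.FromBytes.
public static class ImageMediaType
{
    public static string FromPath(string path) =>
        Path.GetExtension(path).ToLowerInvariant() switch
        {
            ".png" => "image/png",
            ".jpg" or ".jpeg" => "image/jpeg",
            ".gif" => "image/gif",
            ".webp" => "image/webp",
            // Reject anything else instead of guessing a media type.
            var ext => throw new NotSupportedException($"Unsupported image extension '{ext}'."),
        };
}

// Usage with the types from this tutorial:
// var bytes = File.ReadAllBytes(path);
// var message = new ImageMessage(Role.User, BinaryData.FromBytes(bytes, ImageMediaType.FromPath(path)));
```

Throwing on an unknown extension keeps a mislabeled file from reaching the model with the wrong media type.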

Here is how to create a multimodal message using MultiModalMessage:

var textMessage = new TextMessage(Role.User, "what's in the picture");
var multimodalMessage = new MultiModalMessage(Role.User, [textMessage, imageMessage]);

Step 5: Generate Response

To generate a response, call one of the SendAsync overloads. The following code shows how to generate a response from an image message:

var reply = await agent.SendAsync("what's in the picture", chatHistory: [imageMessage]);
// or use multimodal message to generate reply
reply = await agent.SendAsync(multimodalMessage);
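
RegisterPrintMessage already writes each reply to the console. To use the reply text in your own code, you can inspect the returned message. The sketch below assumes that, with the message connector registered, a plain text answer comes back as a TextMessage; ReplyHelper and TryGetText are hypothetical names introduced for illustration.

```csharp
using AutoGen.Core;

// Hypothetical helper: extracts the text of a reply when it is a TextMessage.
public static class ReplyHelper
{
    // Returns the reply text, or null when the reply is not a plain text message.
    public static string? TryGetText(IMessage reply) =>
        reply is TextMessage text ? text.Content : null;
}

// Usage with the reply from this step:
// var answer = ReplyHelper.TryGetText(reply);
```

Returning null for other message types leaves it to the caller to decide how to handle replies that are not plain text.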
