Models#
In many cases, agents need access to LLM model services such as OpenAI, Azure OpenAI, or local models. Since there are many different providers with different APIs, autogen-core implements a protocol for model clients, and autogen-ext implements a set of model clients for popular model services. AgentChat can use these model clients to interact with model services.
This section provides a quick overview of available model clients. For more details on how to use them directly, please refer to Model Clients in the Core API documentation.
Note
See ChatCompletionCache for a caching wrapper to use with the following clients.
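For example, here is a minimal sketch of wrapping a client with the cache. It assumes the diskcache extra is installed (pip install "autogen-ext[diskcache]") and that model_client is any of the clients created below:

from autogen_ext.cache_store.diskcache import DiskCacheStore
from autogen_ext.models.cache import CHAT_CACHE_VALUE_TYPE, ChatCompletionCache
from diskcache import Cache

# Wrap any ChatCompletionClient; identical requests are answered from the
# on-disk cache instead of being sent to the model service again.
cache_store = DiskCacheStore[CHAT_CACHE_VALUE_TYPE](Cache("/tmp/model_cache"))
cached_client = ChatCompletionCache(model_client, cache_store)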
OpenAI#
To access OpenAI models, install the openai extension, which allows you to use the OpenAIChatCompletionClient.
pip install "autogen-ext[openai]"
You will also need to obtain an API key from OpenAI.
from autogen_ext.models.openai import OpenAIChatCompletionClient

openai_model_client = OpenAIChatCompletionClient(
    model="gpt-4o-2024-08-06",
    # api_key="sk-...", # Optional if you have an OPENAI_API_KEY environment variable set.
)
To test the model client, you can use the following code:
from autogen_core.models import UserMessage
result = await openai_model_client.create([UserMessage(content="What is the capital of France?", source="user")])
print(result)
CreateResult(finish_reason='stop', content='The capital of France is Paris.', usage=RequestUsage(prompt_tokens=15, completion_tokens=7), cached=False, logprobs=None)
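Any of these model clients can also be passed directly to an AgentChat agent. A minimal sketch, assuming autogen-agentchat is installed and reusing the client created above:

from autogen_agentchat.agents import AssistantAgent

# The same client instance can back an AgentChat agent.
agent = AssistantAgent("assistant", model_client=openai_model_client)
result = await agent.run(task="What is the capital of France?")
print(result.messages[-1].content)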
Note
You can use this client with models hosted on OpenAI-compatible endpoints; however, we have not tested this functionality. See OpenAIChatCompletionClient for more information.
Azure OpenAI#
Similarly, install the azure and openai extensions to use the AzureOpenAIChatCompletionClient.
pip install "autogen-ext[openai,azure]"
To use the client, you need to provide your deployment ID, Azure Cognitive Services endpoint, API version, and model capabilities. For authentication, you can either provide an API key or an Azure Active Directory (AAD) token credential.
The following code snippet shows how to use AAD authentication. The identity used must be assigned the Cognitive Services OpenAI User role.
from autogen_ext.models.openai import AzureOpenAIChatCompletionClient
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

# Create the token provider
token_provider = get_bearer_token_provider(DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")

az_model_client = AzureOpenAIChatCompletionClient(
    azure_deployment="{your-azure-deployment}",
    model="{model-name, such as gpt-4o}",
    api_version="2024-06-01",
    azure_endpoint="https://{your-custom-endpoint}.openai.azure.com/",
    azure_ad_token_provider=token_provider,  # Optional if you choose key-based authentication.
    # api_key="sk-...", # For key-based authentication.
)
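Once the placeholders are filled in with your deployment details, you can test the client the same way as the OpenAI client above:

from autogen_core.models import UserMessage

result = await az_model_client.create([UserMessage(content="What is the capital of France?", source="user")])
print(result)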
See the Azure OpenAI client documentation for more information on how to use the Azure client directly.
Azure AI Foundry#
Azure AI Foundry (previously known as Azure AI Studio) offers models hosted on Azure.
To use these models, use the AzureAIChatCompletionClient. You need to install the azure extra to use this client.
pip install "autogen-ext[azure]"
Below is an example of using this client with the Phi-4 model from GitHub Marketplace.
import os

from autogen_core.models import UserMessage
from autogen_ext.models.azure import AzureAIChatCompletionClient
from azure.core.credentials import AzureKeyCredential

client = AzureAIChatCompletionClient(
    model="Phi-4",
    endpoint="https://models.inference.ai.azure.com",
    # To authenticate with the model you will need to generate a personal access token (PAT) in your GitHub settings.
    # Create your PAT token by following instructions here: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens
    credential=AzureKeyCredential(os.environ["GITHUB_TOKEN"]),
    model_info={
        "json_output": False,
        "function_calling": False,
        "vision": False,
        "family": "unknown",
    },
)
result = await client.create([UserMessage(content="What is the capital of France?", source="user")])
print(result)
finish_reason='stop' content='The capital of France is Paris.' usage=RequestUsage(prompt_tokens=14, completion_tokens=8) cached=False logprobs=None
Ollama#
Ollama is a local model server that can run models locally on your machine.
Currently, we recommend using the OpenAIChatCompletionClient to interact with an Ollama server.
Note
Small local models are typically not as capable as larger cloud-hosted models. For some tasks they may not perform as well, and the output may be surprising.
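Before running the example below, make sure the Ollama server is running and that the model has been pulled locally, for example:

ollama pull llama3.2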
from autogen_core.models import UserMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient

model_client = OpenAIChatCompletionClient(
    model="llama3.2:latest",
    base_url="http://localhost:11434/v1",
    api_key="placeholder",
    model_info={
        "vision": False,
        "function_calling": True,
        "json_output": False,
        "family": "unknown",
    },
)
response = await model_client.create([UserMessage(content="What is the capital of France?", source="user")])
print(response)
finish_reason='unknown' content='The capital of France is Paris.' usage=RequestUsage(prompt_tokens=32, completion_tokens=8) cached=False logprobs=None
Gemini (experimental)#
Gemini currently offers an OpenAI-compatible API (beta).
So you can use the OpenAIChatCompletionClient with the Gemini API.
Note
While some model providers may offer OpenAI-compatible APIs, they may still have minor differences. For example, the finish_reason field may be different in the response.
from autogen_core.models import UserMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient

model_client = OpenAIChatCompletionClient(
    model="gemini-1.5-flash-8b",
    # api_key="GEMINI_API_KEY",
)
response = await model_client.create([UserMessage(content="What is the capital of France?", source="user")])
print(response)
finish_reason='stop' content='Paris\n' usage=RequestUsage(prompt_tokens=7, completion_tokens=2) cached=False logprobs=None thought=None
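The client recognizes common Gemini model names and configures the endpoint and model capabilities automatically. For a Gemini model it does not yet know about, you can supply the endpoint and model_info explicitly. A sketch, assuming Gemini's OpenAI-compatible endpoint URL below is still current:

from autogen_ext.models.openai import OpenAIChatCompletionClient

model_client = OpenAIChatCompletionClient(
    model="gemini-1.5-flash-8b",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    # api_key="GEMINI_API_KEY",
    model_info={
        # Adjust these capabilities to match the model you use.
        "vision": True,
        "function_calling": True,
        "json_output": True,
        "family": "unknown",
    },
)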
Semantic Kernel Adapter#
The SKChatCompletionAdapter allows you to use Semantic Kernel model clients as a ChatCompletionClient by adapting them to the required interface.
You need to install the relevant provider extras to use this adapter.
The list of extras that can be installed:
semantic-kernel-anthropic: Install this extra to use Anthropic models.
semantic-kernel-google: Install this extra to use Google Gemini models.
semantic-kernel-ollama: Install this extra to use Ollama models.
semantic-kernel-mistralai: Install this extra to use MistralAI models.
semantic-kernel-aws: Install this extra to use AWS models.
semantic-kernel-hugging-face: Install this extra to use Hugging Face models.
For example, to use Anthropic models, you need to install semantic-kernel-anthropic.
pip install "autogen-ext[semantic-kernel-anthropic]"
To use this adapter, you need to create a Semantic Kernel model client and pass it to the adapter. For example, to use the Anthropic model:
import os

from autogen_core.models import UserMessage
from autogen_ext.models.semantic_kernel import SKChatCompletionAdapter
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.anthropic import AnthropicChatCompletion, AnthropicChatPromptExecutionSettings
from semantic_kernel.memory.null_memory import NullMemory

sk_client = AnthropicChatCompletion(
    ai_model_id="claude-3-5-sonnet-20241022",
    api_key=os.environ["ANTHROPIC_API_KEY"],
    service_id="my-service-id",  # Optional; for targeting specific services within Semantic Kernel
)
settings = AnthropicChatPromptExecutionSettings(
    temperature=0.2,
)

anthropic_model_client = SKChatCompletionAdapter(
    sk_client, kernel=Kernel(memory=NullMemory()), prompt_settings=settings
)

# Call the model directly.
model_result = await anthropic_model_client.create(
    messages=[UserMessage(content="What is the capital of France?", source="User")]
)
print(model_result)
finish_reason='stop' content='The capital of France is Paris. It is also the largest city in France and one of the most populous metropolitan areas in Europe.' usage=RequestUsage(prompt_tokens=0, completion_tokens=0) cached=False logprobs=None
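Because the adapter implements the full ChatCompletionClient interface, it also supports streaming. A brief sketch reusing anthropic_model_client from above:

# Stream the response; string chunks arrive first, and the final item
# yielded is a CreateResult with the complete response.
async for chunk in anthropic_model_client.create_stream(
    messages=[UserMessage(content="What is the capital of France?", source="User")]
):
    if isinstance(chunk, str):
        print(chunk, end="", flush=True)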
Read more about the Semantic Kernel Adapter.