autogen_ext.models.semantic_kernel

class SKChatCompletionAdapter(sk_client: ChatCompletionClientBase, kernel: Kernel | None = None, prompt_settings: PromptExecutionSettings | None = None, model_info: ModelInfo | None = None, service_id: str | None = None)

Bases: ChatCompletionClient

SKChatCompletionAdapter is an adapter that allows using Semantic Kernel model clients as Autogen ChatCompletion clients. This makes it possible to seamlessly integrate Semantic Kernel connectors (e.g., Azure OpenAI, Google Gemini, Ollama, etc.) into Autogen agents that rely on a ChatCompletionClient interface.

By leveraging this adapter, you can:

  • Pass in a Kernel and any supported Semantic Kernel ChatCompletionClientBase connector.

  • Provide tools (via Autogen Tool or ToolSchema) for function calls during chat completion.

  • Stream responses or retrieve them in a single request.

  • Provide prompt settings to control the chat completion behavior either globally through the constructor or on a per-request basis through the extra_create_args dictionary (see the sketch below).
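
For example, a default temperature configured on the adapter can be overridden for a single request. This is a minimal sketch, not part of the examples below: it assumes model_client is an SKChatCompletionAdapter wrapping the Anthropic connector (as in the first example) and runs inside an async function:

from autogen_core.models import UserMessage
from semantic_kernel.connectors.ai.anthropic import AnthropicChatPromptExecutionSettings

# Override the adapter's default prompt settings for this request only.
override_settings = AnthropicChatPromptExecutionSettings(temperature=0.9)
result = await model_client.create(
    [UserMessage(content="Write a haiku about the sea.", source="user")],
    extra_create_args={"prompt_execution_settings": override_settings},
)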

The following extras can be installed:

  • semantic-kernel-anthropic: Install this extra to use Anthropic models.

  • semantic-kernel-google: Install this extra to use Google Gemini models.

  • semantic-kernel-ollama: Install this extra to use Ollama models.

  • semantic-kernel-mistralai: Install this extra to use MistralAI models.

  • semantic-kernel-aws: Install this extra to use AWS models.

  • semantic-kernel-hugging-face: Install this extra to use Hugging Face models.

Parameters:
  • sk_client (ChatCompletionClientBase) – The Semantic Kernel client to wrap (e.g., AzureChatCompletion, GoogleAIChatCompletion, OllamaChatCompletion).

  • kernel (Optional[Kernel]) – The Semantic Kernel instance to use for executing requests. If not provided, one must be passed in the extra_create_args for each request.

  • prompt_settings (Optional[PromptExecutionSettings]) – Default prompt execution settings to use. Can be overridden per request.

  • model_info (Optional[ModelInfo]) – Information about the model’s capabilities.

  • service_id (Optional[str]) – Optional service identifier.

Examples

Anthropic models with function calling:

pip install "autogen-ext[semantic-kernel-anthropic]"
import asyncio
import os

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.ui import Console
from autogen_core.models import ModelFamily, UserMessage
from autogen_ext.models.semantic_kernel import SKChatCompletionAdapter
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.anthropic import AnthropicChatCompletion, AnthropicChatPromptExecutionSettings
from semantic_kernel.memory.null_memory import NullMemory


async def get_weather(city: str) -> str:
    """Get the weather for a city."""
    return f"The weather in {city} is 75 degrees."


async def main() -> None:
    sk_client = AnthropicChatCompletion(
        ai_model_id="claude-3-5-sonnet-20241022",
        api_key=os.environ["ANTHROPIC_API_KEY"],
        service_id="my-service-id",  # Optional; for targeting specific services within Semantic Kernel
    )
    settings = AnthropicChatPromptExecutionSettings(
        temperature=0.2,
    )

    model_client = SKChatCompletionAdapter(
        sk_client,
        kernel=Kernel(memory=NullMemory()),
        prompt_settings=settings,
        model_info={
            "function_calling": True,
            "json_output": True,
            "vision": True,
            "family": ModelFamily.CLAUDE_3_5_SONNET,
        },
    )

    # Call the model directly.
    response = await model_client.create([UserMessage(content="What is the capital of France?", source="test")])
    print(response)

    # Create an assistant agent with the model client.
    assistant = AssistantAgent(
        "assistant", model_client=model_client, system_message="You are a helpful assistant.", tools=[get_weather]
    )
    # Call the assistant with a task.
    await Console(assistant.run_stream(task="What is the weather in Paris and London?"))


asyncio.run(main())

Google Gemini models with function calling:

pip install "autogen-ext[semantic-kernel-google]"
import asyncio
import os

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.ui import Console
from autogen_core.models import ModelFamily, UserMessage
from autogen_ext.models.semantic_kernel import SKChatCompletionAdapter
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.google.google_ai import (
    GoogleAIChatCompletion,
    GoogleAIChatPromptExecutionSettings,
)
from semantic_kernel.memory.null_memory import NullMemory


def get_weather(city: str) -> str:
    """Get the weather for a city."""
    return f"The weather in {city} is 75 degrees."


async def main() -> None:
    sk_client = GoogleAIChatCompletion(
        gemini_model_id="gemini-2.0-flash",
        api_key=os.environ["GEMINI_API_KEY"],
    )
    settings = GoogleAIChatPromptExecutionSettings(
        temperature=0.2,
    )

    kernel = Kernel(memory=NullMemory())

    model_client = SKChatCompletionAdapter(
        sk_client,
        kernel=kernel,
        prompt_settings=settings,
        model_info={
            "family": ModelFamily.GEMINI_2_0_FLASH,
            "function_calling": True,
            "json_output": True,
            "vision": True,
        },
    )

    # Call the model directly.
    model_result = await model_client.create(
        messages=[UserMessage(content="What is the capital of France?", source="User")]
    )
    print(model_result)

    # Create an assistant agent with the model client.
    assistant = AssistantAgent(
        "assistant", model_client=model_client, tools=[get_weather], system_message="You are a helpful assistant."
    )
    # Call the assistant with a task.
    stream = assistant.run_stream(task="What is the weather in Paris and London?")
    await Console(stream)


asyncio.run(main())

Ollama models:

pip install "autogen-ext[semantic-kernel-ollama]"
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_core.models import UserMessage
from autogen_ext.models.semantic_kernel import SKChatCompletionAdapter
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.ollama import OllamaChatCompletion, OllamaChatPromptExecutionSettings
from semantic_kernel.memory.null_memory import NullMemory


async def main() -> None:
    sk_client = OllamaChatCompletion(
        host="http://localhost:11434",
        ai_model_id="llama3.2:latest",
    )
    ollama_settings = OllamaChatPromptExecutionSettings(
        options={"temperature": 0.5},
    )

    model_client = SKChatCompletionAdapter(
        sk_client, kernel=Kernel(memory=NullMemory()), prompt_settings=ollama_settings
    )

    # Call the model directly.
    model_result = await model_client.create(
        messages=[UserMessage(content="What is the capital of France?", source="User")]
    )
    print(model_result)

    # Create an assistant agent with the model client.
    assistant = AssistantAgent("assistant", model_client=model_client)
    # Call the assistant with a task.
    result = await assistant.run(task="What is the capital of France?")
    print(result)


asyncio.run(main())
actual_usage() → RequestUsage
property capabilities: ModelInfo
count_tokens(messages: Sequence[Annotated[SystemMessage | UserMessage | AssistantMessage | FunctionExecutionResultMessage, FieldInfo(annotation=NoneType, required=True, discriminator='type')]], *, tools: Sequence[Tool | ToolSchema] = []) → int
async create(messages: Sequence[Annotated[SystemMessage | UserMessage | AssistantMessage | FunctionExecutionResultMessage, FieldInfo(annotation=NoneType, required=True, discriminator='type')]], *, tools: Sequence[Tool | ToolSchema] = [], json_output: bool | None = None, extra_create_args: Mapping[str, Any] = {}, cancellation_token: CancellationToken | None = None) → CreateResult

Create a chat completion using the Semantic Kernel client.

The extra_create_args dictionary can include two special keys (a usage sketch follows the list):

  1. “kernel” (optional):

    An instance of semantic_kernel.Kernel used to execute the request. If not provided in either the constructor or extra_create_args, a ValueError is raised.

  2. “prompt_execution_settings” (optional):

    An instance of a PromptExecutionSettings subclass corresponding to the underlying Semantic Kernel client (e.g., AzureChatPromptExecutionSettings, GoogleAIChatPromptExecutionSettings). If not provided, the adapter’s default prompt settings will be used.
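
A minimal sketch of passing both keys on a single request. It assumes adapter is an SKChatCompletionAdapter wrapping an Azure OpenAI connector, so the matching settings class is AzureChatPromptExecutionSettings; run it inside an async function:

from autogen_core.models import UserMessage
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import AzureChatPromptExecutionSettings
from semantic_kernel.memory.null_memory import NullMemory

result = await adapter.create(
    [UserMessage(content="What is 2 + 2?", source="user")],
    extra_create_args={
        # Kernel used to execute this request; required here if none was
        # passed to the constructor.
        "kernel": Kernel(memory=NullMemory()),
        # Per-request settings; the type must match the wrapped connector.
        "prompt_execution_settings": AzureChatPromptExecutionSettings(temperature=0.0),
    },
)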

Parameters:
  • messages – The list of LLM messages to send.

  • tools – The tools that may be invoked during the chat.

  • json_output – Whether the model is expected to return JSON.

  • extra_create_args – Additional arguments to control the chat completion behavior.

  • cancellation_token – Token allowing cancellation of the request.

Returns:

CreateResult – The result of the chat completion.

async create_stream(messages: Sequence[Annotated[SystemMessage | UserMessage | AssistantMessage | FunctionExecutionResultMessage, FieldInfo(annotation=NoneType, required=True, discriminator='type')]], *, tools: Sequence[Tool | ToolSchema] = [], json_output: bool | None = None, extra_create_args: Mapping[str, Any] = {}, cancellation_token: CancellationToken | None = None) → AsyncGenerator[str | CreateResult, None]

Create a streaming chat completion using the Semantic Kernel client.

The extra_create_args dictionary can include two special keys:

  1. “kernel” (optional):

    An instance of semantic_kernel.Kernel used to execute the request. If not provided in either the constructor or extra_create_args, a ValueError is raised.

  2. “prompt_execution_settings” (optional):

    An instance of a PromptExecutionSettings subclass corresponding to the underlying Semantic Kernel client (e.g., AzureChatPromptExecutionSettings, GoogleAIChatPromptExecutionSettings). If not provided, the adapter’s default prompt settings will be used.

Parameters:
  • messages – The list of LLM messages to send.

  • tools – The tools that may be invoked during the chat.

  • json_output – Whether the model is expected to return JSON.

  • extra_create_args – Additional arguments to control the chat completion behavior.

  • cancellation_token – Token allowing cancellation of the request.

Yields:

Union[str, CreateResult] – Either a string chunk of the response or a CreateResult containing function calls.
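
A minimal consumption sketch (assumes model_client is an SKChatCompletionAdapter constructed as in the examples above; run inside an async function). String chunks arrive as they are generated, and the final item is a CreateResult:

from autogen_core.models import CreateResult, UserMessage

async for chunk in model_client.create_stream(
    [UserMessage(content="Tell me a short story.", source="user")]
):
    if isinstance(chunk, str):
        print(chunk, end="", flush=True)  # partial text as it is generated
    elif isinstance(chunk, CreateResult):
        print()
        print(chunk.usage)  # final result, including token usage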

property model_info: ModelInfo
remaining_tokens(messages: Sequence[Annotated[SystemMessage | UserMessage | AssistantMessage | FunctionExecutionResultMessage, FieldInfo(annotation=NoneType, required=True, discriminator='type')]], *, tools: Sequence[Tool | ToolSchema] = []) → int
total_usage() → RequestUsage
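
The token-accounting helpers can be combined for simple budget checks (a sketch; assumes model_client is an SKChatCompletionAdapter constructed as in the examples above, and note that token counts are estimates):

from autogen_core.models import UserMessage

messages = [UserMessage(content="What is the capital of France?", source="user")]
print(model_client.count_tokens(messages))      # estimated tokens consumed by the prompt
print(model_client.remaining_tokens(messages))  # estimated tokens left in the model's context window
print(model_client.total_usage())               # cumulative RequestUsage across all requests so far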