{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Model Clients\n", "\n", "AutoGen provides a suite of built-in model clients for using ChatCompletion API.\n", "All model clients implement the {py:class}`~autogen_core.models.ChatCompletionClient` protocol class." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Currently there are three built-in model clients:\n", "* {py:class}`~autogen_ext.models.openai.OpenAIChatCompletionClient`\n", "* {py:class}`~autogen_ext.models.openai.AzureOpenAIChatCompletionClient`\n", "* {py:class}`~autogen_ext.models.azure.AzureAIChatCompletionClient`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## OpenAI\n", "\n", "To use the {py:class}`~autogen_ext.models.openai.OpenAIChatCompletionClient`, you need to install the `openai` extra." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [], "source": [ "# pip install \"autogen-ext[openai]\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You also need to provide the API key\n", "either through the environment variable `OPENAI_API_KEY` or through the `api_key` argument." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from autogen_core.models import UserMessage\n", "from autogen_ext.models.openai import OpenAIChatCompletionClient\n", "\n", "# Create an OpenAI model client.\n", "model_client = OpenAIChatCompletionClient(\n", " model=\"gpt-4o\",\n", " # api_key=\"sk-...\", # Optional if you have an API key set in the environment.\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can call the {py:meth}`~autogen_ext.models.openai.BaseOpenAIChatCompletionClient.create` method to create a\n", "chat completion request, and await for an {py:class}`~autogen_core.models.CreateResult` object in return." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The capital of France is Paris.\n" ] } ], "source": [ "# Send a message list to the model and await the response.\n", "messages = [\n", " UserMessage(content=\"What is the capital of France?\", source=\"user\"),\n", "]\n", "response = await model_client.create(messages=messages)\n", "\n", "# Print the response\n", "print(response.content)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "RequestUsage(prompt_tokens=15, completion_tokens=7)\n" ] } ], "source": [ "# Print the response token usage\n", "print(response.usage)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Azure OpenAI\n", "\n", "To use the {py:class}`~autogen_ext.models.openai.AzureOpenAIChatCompletionClient`, you need to provide\n", "the deployment id, Azure Cognitive Services endpoint, api version, and model capabilities.\n", "For authentication, you can either provide an API key or an Azure Active Directory (AAD) token credential." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [], "source": [ "# pip install \"autogen-ext[openai,azure]\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following code snippet shows how to use AAD authentication.\n", "The identity used must be assigned the [**Cognitive Services OpenAI User**](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/role-based-access-control#cognitive-services-openai-user) role." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from autogen_ext.models.openai import AzureOpenAIChatCompletionClient\n", "from azure.identity import DefaultAzureCredential, get_bearer_token_provider\n", "\n", "# Create the token provider\n", "token_provider = get_bearer_token_provider(DefaultAzureCredential(), \"https://cognitiveservices.azure.com/.default\")\n", "\n", "az_model_client = AzureOpenAIChatCompletionClient(\n", " azure_deployment=\"{your-azure-deployment}\",\n", " model=\"{model-name, such as gpt-4o}\",\n", " api_version=\"2024-06-01\",\n", " azure_endpoint=\"https://{your-custom-endpoint}.openai.azure.com/\",\n", " azure_ad_token_provider=token_provider, # Optional if you choose key-based authentication.\n", " # api_key=\"sk-...\", # For key-based authentication.\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{note}\n", "See [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/managed-identity#chat-completions) for how to use the Azure OpenAI client directly and for more information.\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Azure AI Foundry\n", "\n", "[Azure AI Foundry](https://learn.microsoft.com/en-us/azure/ai-studio/) (previously known as Azure AI Studio) offers models hosted on Azure.\n", "To use those models, use the {py:class}`~autogen_ext.models.azure.AzureAIChatCompletionClient`.\n", "\n", "You need to install the `azure` extra to use this client." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [], "source": [ "# pip install \"autogen-ext[openai,azure]\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below is an example of using this client with the Phi-4 model from [GitHub Marketplace](https://github.com/marketplace/models)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "finish_reason='stop' content='The capital of France is Paris.' 
usage=RequestUsage(prompt_tokens=14, completion_tokens=8) cached=False logprobs=None\n" ] } ], "source": [ "import os\n", "\n", "from autogen_core.models import UserMessage\n", "from autogen_ext.models.azure import AzureAIChatCompletionClient\n", "from azure.core.credentials import AzureKeyCredential\n", "\n", "client = AzureAIChatCompletionClient(\n", " model=\"Phi-4\",\n", " endpoint=\"https://models.inference.ai.azure.com\",\n", " # To authenticate with the model you will need to generate a personal access token (PAT) in your GitHub settings.\n", " # Create your PAT token by following instructions here: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens\n", " credential=AzureKeyCredential(os.environ[\"GITHUB_TOKEN\"]),\n", " model_info={\n", " \"json_output\": False,\n", " \"function_calling\": False,\n", " \"vision\": False,\n", " \"family\": \"unknown\",\n", " },\n", ")\n", "\n", "result = await client.create([UserMessage(content=\"What is the capital of France?\", source=\"user\")])\n", "print(result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Ollama\n", "\n", "You can use the {py:class}`~autogen_ext.models.openai.OpenAIChatCompletionClient` to interact with OpenAI-compatible APIs such as Ollama and Gemini (beta).\n", "The example below shows how to use a local model running on an [Ollama](https://ollama.com) server." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "finish_reason='unknown' content='The capital of France is Paris.' usage=RequestUsage(prompt_tokens=32, completion_tokens=8) cached=False logprobs=None\n" ] } ], "source": [ "from autogen_core.models import UserMessage\n", "from autogen_ext.models.openai import OpenAIChatCompletionClient\n", "\n", "model_client = OpenAIChatCompletionClient(\n", " model=\"llama3.2:latest\",\n", " base_url=\"http://localhost:11434/v1\",\n", " api_key=\"placeholder\",\n", " model_info={\n", " \"vision\": False,\n", " \"function_calling\": True,\n", " \"json_output\": False,\n", " \"family\": \"unknown\",\n", " },\n", ")\n", "\n", "response = await model_client.create([UserMessage(content=\"What is the capital of France?\", source=\"user\")])\n", "print(response)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Gemini (beta)\n", "\n", "The example below shows how to use the Gemini model via the {py:class}`~autogen_ext.models.openai.OpenAIChatCompletionClient`."
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "finish_reason='stop' content='Paris\\n' usage=RequestUsage(prompt_tokens=7, completion_tokens=2) cached=False logprobs=None thought=None\n" ] } ], "source": [ "from autogen_core.models import UserMessage\n", "from autogen_ext.models.openai import OpenAIChatCompletionClient\n", "\n", "model_client = OpenAIChatCompletionClient(\n", " model=\"gemini-1.5-flash-8b\",\n", " # api_key=\"GEMINI_API_KEY\",\n", ")\n", "\n", "response = await model_client.create([UserMessage(content=\"What is the capital of France?\", source=\"user\")])\n", "print(response)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Semantic Kernel Adapter\n", "\n", "The {py:class}`~autogen_ext.models.semantic_kernel.SKChatCompletionAdapter`\n", "allows you to use Semantic Kernel model clients as a\n", "{py:class}`~autogen_core.models.ChatCompletionClient` by adapting them to the required interface.\n", "\n", "You need to install the relevant provider extras to use this adapter.\n", "\n", "The list of extras that can be installed:\n", "\n", "- `semantic-kernel-anthropic`: Install this extra to use Anthropic models.\n", "- `semantic-kernel-google`: Install this extra to use Google Gemini models.\n", "- `semantic-kernel-ollama`: Install this extra to use Ollama models.\n", "- `semantic-kernel-mistralai`: Install this extra to use MistralAI models.\n", "- `semantic-kernel-aws`: Install this extra to use AWS models.\n", "- `semantic-kernel-hugging-face`: Install this extra to use Hugging Face models.\n", "\n", "For example, to use Anthropic models, you need to install `semantic-kernel-anthropic`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [], "source": [ "# pip install \"autogen-ext[semantic-kernel-anthropic]\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To use this adapter, you need to create a Semantic Kernel model client and pass it to the adapter.\n", "\n", "For example, to use an Anthropic model:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "finish_reason='stop' content='The capital of France is Paris. It is also the largest city in France and one of the most populous metropolitan areas in Europe.' 
usage=RequestUsage(prompt_tokens=0, completion_tokens=0) cached=False logprobs=None\n" ] } ], "source": [ "import os\n", "\n", "from autogen_core.models import UserMessage\n", "from autogen_ext.models.semantic_kernel import SKChatCompletionAdapter\n", "from semantic_kernel import Kernel\n", "from semantic_kernel.connectors.ai.anthropic import AnthropicChatCompletion, AnthropicChatPromptExecutionSettings\n", "from semantic_kernel.memory.null_memory import NullMemory\n", "\n", "sk_client = AnthropicChatCompletion(\n", " ai_model_id=\"claude-3-5-sonnet-20241022\",\n", " api_key=os.environ[\"ANTHROPIC_API_KEY\"],\n", " service_id=\"my-service-id\", # Optional; for targeting specific services within Semantic Kernel\n", ")\n", "settings = AnthropicChatPromptExecutionSettings(\n", " temperature=0.2,\n", ")\n", "\n", "anthropic_model_client = SKChatCompletionAdapter(\n", " sk_client, kernel=Kernel(memory=NullMemory()), prompt_settings=settings\n", ")\n", "\n", "# Call the model directly.\n", "model_result = await anthropic_model_client.create(\n", " messages=[UserMessage(content=\"What is the capital of France?\", source=\"User\")]\n", ")\n", "print(model_result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Read more about the [Semantic Kernel Adapter](../../../reference/python/autogen_ext.models.semantic_kernel.rst)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Streaming Response\n", "\n", "You can use the {py:meth}`~autogen_ext.models.openai.BaseOpenAIChatCompletionClient.create_stream` method to create a\n", "chat completion request with a streaming response." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Streamed responses:\n", "In the heart of an ancient forest, beneath the shadow of snow-capped peaks, a dragon named Elara lived secretly for centuries. Elara was unlike any dragon from the old tales; her scales shimmered with a deep emerald hue, each scale engraved with symbols of lost wisdom. The villagers in the nearby valley spoke of mysterious lights dancing across the night sky, but none dared venture close enough to solve the enigma.\n", "\n", "One cold winter's eve, a young girl named Lira, brimming with curiosity and armed with the innocence of youth, wandered into Elara’s domain. Instead of fire and fury, she found warmth and a gentle gaze. The dragon shared stories of a world long forgotten and in return, Lira gifted her simple stories of human life, rich in laughter and scent of earth.\n", "\n", "From that night on, the villagers noticed subtle changes—the crops grew taller, and the air seemed sweeter. Elara had infused the valley with ancient magic, a guardian of balance, watching quietly as her new friend thrived under the stars. And so, Lira and Elara’s bond marked the beginning of a timeless friendship that spun tales of hope whispered through the leaves of the ever-verdant forest.\n", "\n", "------------\n", "\n", "The complete response:\n", "In the heart of an ancient forest, beneath the shadow of snow-capped peaks, a dragon named Elara lived secretly for centuries. Elara was unlike any dragon from the old tales; her scales shimmered with a deep emerald hue, each scale engraved with symbols of lost wisdom. 
The villagers in the nearby valley spoke of mysterious lights dancing across the night sky, but none dared venture close enough to solve the enigma.\n", "\n", "One cold winter's eve, a young girl named Lira, brimming with curiosity and armed with the innocence of youth, wandered into Elara’s domain. Instead of fire and fury, she found warmth and a gentle gaze. The dragon shared stories of a world long forgotten and in return, Lira gifted her simple stories of human life, rich in laughter and scent of earth.\n", "\n", "From that night on, the villagers noticed subtle changes—the crops grew taller, and the air seemed sweeter. Elara had infused the valley with ancient magic, a guardian of balance, watching quietly as her new friend thrived under the stars. And so, Lira and Elara’s bond marked the beginning of a timeless friendship that spun tales of hope whispered through the leaves of the ever-verdant forest.\n", "\n", "\n", "------------\n", "\n", "The token usage was:\n", "RequestUsage(prompt_tokens=0, completion_tokens=0)\n" ] } ], "source": [ "messages = [\n", " UserMessage(content=\"Write a very short story about a dragon.\", source=\"user\"),\n", "]\n", "\n", "# Create a stream.\n", "stream = model_client.create_stream(messages=messages)\n", "\n", "# Iterate over the stream and print the responses.\n", "print(\"Streamed responses:\")\n", "async for response in stream: # type: ignore\n", " if isinstance(response, str):\n", " # A partial response is a string.\n", " print(response, flush=True, end=\"\")\n", " else:\n", " # The last response is a CreateResult object with the complete message.\n", " print(\"\\n\\n------------\\n\")\n", " print(\"The complete response:\", flush=True)\n", " print(response.content, flush=True)\n", " print(\"\\n\\n------------\\n\")\n", " print(\"The token usage was:\", flush=True)\n", " print(response.usage, flush=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{note}\n", "The last response in the streaming response is always the final response\n", "of the type {py:class}`~autogen_core.models.CreateResult`.\n", "```\n", "\n", "```{note}\n", "By default, the usage returned in a streaming response is set to zero values.\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Comparing the usage returned by the non-streaming `model_client.create(messages=messages)` above with the streaming `model_client.create_stream(messages=messages)`, we see a difference.\n", "The non-streaming response by default returns valid prompt and completion token usage counts,\n", "while the streamed response by default returns zero values.\n", "\n", "As documented in the OpenAI API reference, an additional parameter `stream_options` can be specified to return valid usage counts; see [stream_options](https://platform.openai.com/docs/api-reference/chat/create#chat-create-stream_options).\n", "Only set this when you are streaming, i.e., when using `create_stream`.\n", "\n", "To enable this in `create_stream`, set `extra_create_args={\"stream_options\": {\"include_usage\": True}}`.\n", "\n", "```{note}\n", "While other APIs such as LiteLLM also support this, it is not always guaranteed that it is fully supported or correct.\n", "```\n", "\n", "See the example below for how to use the `stream_options` parameter to return usage." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Streamed responses:\n", "In a lush, emerald valley hidden by towering peaks, there lived a dragon named Ember. 
Unlike others of her kind, Ember cherished solitude over treasure, and the songs of the stream over the roar of flames. One misty dawn, a young shepherd stumbled into her sanctuary, lost and frightened. \n", "\n", "Instead of fury, he was met with kindness as Ember extended a wing, guiding him back to safety. In gratitude, the shepherd visited yearly, bringing tales of his world beyond the mountains. Over time, a friendship blossomed, binding man and dragon in shared stories and laughter.\n", "\n", "As the years passed, the legend of Ember the gentle-hearted spread far and wide, forever changing the way dragons were seen in the hearts of many.\n", "\n", "------------\n", "\n", "The complete response:\n", "In a lush, emerald valley hidden by towering peaks, there lived a dragon named Ember. Unlike others of her kind, Ember cherished solitude over treasure, and the songs of the stream over the roar of flames. One misty dawn, a young shepherd stumbled into her sanctuary, lost and frightened. \n", "\n", "Instead of fury, he was met with kindness as Ember extended a wing, guiding him back to safety. In gratitude, the shepherd visited yearly, bringing tales of his world beyond the mountains. Over time, a friendship blossomed, binding man and dragon in shared stories and laughter.\n", "\n", "As the years passed, the legend of Ember the gentle-hearted spread far and wide, forever changing the way dragons were seen in the hearts of many.\n", "\n", "\n", "------------\n", "\n", "The token usage was:\n", "RequestUsage(prompt_tokens=17, completion_tokens=146)\n" ] } ], "source": [ "messages = [\n", " UserMessage(content=\"Write a very short story about a dragon.\", source=\"user\"),\n", "]\n", "\n", "# Create a stream.\n", "stream = model_client.create_stream(messages=messages, extra_create_args={\"stream_options\": {\"include_usage\": True}})\n", "\n", "# Iterate over the stream and print the responses.\n", "print(\"Streamed responses:\")\n", "async for response in stream: # type: ignore\n", " if isinstance(response, str):\n", " # A partial response is a string.\n", " print(response, flush=True, end=\"\")\n", " else:\n", " # The last response is a CreateResult object with the complete message.\n", " print(\"\\n\\n------------\\n\")\n", " print(\"The complete response:\", flush=True)\n", " print(response.content, flush=True)\n", " print(\"\\n\\n------------\\n\")\n", " print(\"The token usage was:\", flush=True)\n", " print(response.usage, flush=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Structured Output\n", "\n", "Structured output can be enabled by setting the `response_format` field in\n", "{py:class}`~autogen_ext.models.openai.OpenAIChatCompletionClient` and {py:class}`~autogen_ext.models.openai.AzureOpenAIChatCompletionClient` to\n", "a [Pydantic BaseModel](https://docs.pydantic.dev/latest/concepts/models/) class.\n", "\n", "```{note}\n", "Structured output is only available for models that support it. It also\n", "requires the model client to support it.\n", "Currently, the {py:class}`~autogen_ext.models.openai.OpenAIChatCompletionClient`\n", "and {py:class}`~autogen_ext.models.openai.AzureOpenAIChatCompletionClient`\n", "support structured output.\n", "```" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "I'm glad to hear that you're feeling happy! It's such a great emotion that can brighten your whole day. 
Is there anything in particular that's bringing you joy today? 😊\n", "happy\n" ] } ], "source": [ "from typing import Literal\n", "\n", "from pydantic import BaseModel\n", "\n", "\n", "# The response format for the agent as a Pydantic base model.\n", "class AgentResponse(BaseModel):\n", " thoughts: str\n", " response: Literal[\"happy\", \"sad\", \"neutral\"]\n", "\n", "\n", "# Create an agent that uses the OpenAI GPT-4o model with the custom response format.\n", "model_client = OpenAIChatCompletionClient(\n", " model=\"gpt-4o\",\n", " response_format=AgentResponse, # type: ignore\n", ")\n", "\n", "# Send a message list to the model and await the response.\n", "messages = [\n", " UserMessage(content=\"I am happy.\", source=\"user\"),\n", "]\n", "response = await model_client.create(messages=messages)\n", "assert isinstance(response.content, str)\n", "parsed_response = AgentResponse.model_validate_json(response.content)\n", "print(parsed_response.thoughts)\n", "print(parsed_response.response)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also use the `extra_create_args` parameter in the {py:meth}`~autogen_ext.models.openai.BaseOpenAIChatCompletionClient.create` method\n", "to set the `response_format` field so that the structured output can be configured for each request." ] },
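{ "cell_type": "markdown", "metadata": {}, "source": [ "For example, here is a minimal sketch of configuring structured output per request; it assumes that `extra_create_args` accepts the same Pydantic model class as the `response_format` constructor argument shown above, and it reuses the `AgentResponse` model from the previous cell:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A sketch of per-request structured output, reusing the AgentResponse model defined above.\n", "# Assumption: extra_create_args accepts the same Pydantic model class as the constructor's response_format.\n", "model_client = OpenAIChatCompletionClient(model=\"gpt-4o\")\n", "\n", "messages = [\n", " UserMessage(content=\"I am happy.\", source=\"user\"),\n", "]\n", "response = await model_client.create(\n", " messages=messages,\n", " extra_create_args={\"response_format\": AgentResponse}, # Structured output for this request only.\n", ")\n", "assert isinstance(response.content, str)\n", "parsed_response = AgentResponse.model_validate_json(response.content)\n", "print(parsed_response.response)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Caching Model Responses\n", "\n", "`autogen_ext` implements {py:class}`~autogen_ext.models.cache.ChatCompletionCache` that can wrap any {py:class}`~autogen_core.models.ChatCompletionClient`. Using this wrapper avoids incurring token usage when querying the underlying client with the same prompt multiple times.\n", "\n", "{py:class}`~autogen_ext.models.cache.ChatCompletionCache` uses a {py:class}`~autogen_core.CacheStore` protocol. 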
We have implemented some useful variants of {py:class}`~autogen_core.CacheStore` including {py:class}`~autogen_ext.cache_store.diskcache.DiskCacheStore` and {py:class}`~autogen_ext.cache_store.redis.RedisStore`.\n", "\n", "Here's an example of using `diskcache` for local caching:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [], "source": [ "# pip install -U \"autogen-ext[openai, diskcache]\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n" ] } ], "source": [ "import tempfile\n", "\n", "from autogen_core.models import UserMessage\n", "from autogen_ext.cache_store.diskcache import DiskCacheStore\n", "from autogen_ext.models.cache import CHAT_CACHE_VALUE_TYPE, ChatCompletionCache\n", "from autogen_ext.models.openai import OpenAIChatCompletionClient\n", "from diskcache import Cache\n", "\n", "\n", "async def main() -> None:\n", " with tempfile.TemporaryDirectory() as tmpdirname:\n", " # Initialize the original client\n", " openai_model_client = OpenAIChatCompletionClient(model=\"gpt-4o\")\n", "\n", " # Then initialize the CacheStore, in this case with diskcache.Cache.\n", " # You can also use redis like:\n", " # from autogen_ext.cache_store.redis import RedisStore\n", " # import redis\n", " # redis_instance = redis.Redis()\n", " # cache_store = RedisStore[CHAT_CACHE_VALUE_TYPE](redis_instance)\n", " cache_store = DiskCacheStore[CHAT_CACHE_VALUE_TYPE](Cache(tmpdirname))\n", " cache_client = ChatCompletionCache(openai_model_client, cache_store)\n", "\n", " response = await cache_client.create([UserMessage(content=\"Hello, how are you?\", source=\"user\")])\n", " print(response) # Should print response from OpenAI\n", " response = await cache_client.create([UserMessage(content=\"Hello, how are you?\", source=\"user\")])\n", " print(response) # Should print cached response\n", "\n", "\n", "# Use await main() rather than asyncio.run(main()), since the notebook already runs an event loop.\n", "await main()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inspecting `cache_client.total_usage()` (or `model_client.total_usage()`) before and after a cached response should yield identical counts.\n", "\n", "Note that the caching is sensitive to the exact arguments provided to `cache_client.create` or `cache_client.create_stream`, so changing `tools` or `json_output` arguments might lead to a cache miss." ] },
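{ "cell_type": "markdown", "metadata": {}, "source": [ "For example, here is a minimal sketch of such a check; it assumes a `cache_client` created at the top level (rather than inside `main()` as above):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A sketch of verifying that a cache hit adds no token usage.\n", "# Assumption: a cache_client as constructed above is in scope at the top level.\n", "usage_before = cache_client.total_usage()\n", "await cache_client.create([UserMessage(content=\"Hello, how are you?\", source=\"user\")])\n", "usage_after = cache_client.total_usage()\n", "\n", "# A cached response should add nothing to the totals.\n", "print(usage_after.prompt_tokens == usage_before.prompt_tokens)\n", "print(usage_after.completion_tokens == usage_before.completion_tokens)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Build Agent using Model Client\n", "\n", "Let's create a simple AI agent that can respond to messages using the ChatCompletion API."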
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from dataclasses import dataclass\n", "\n", "from autogen_core import MessageContext, RoutedAgent, SingleThreadedAgentRuntime, message_handler\n", "from autogen_core.models import ChatCompletionClient, SystemMessage, UserMessage\n", "from autogen_ext.models.openai import OpenAIChatCompletionClient\n", "\n", "\n", "@dataclass\n", "class Message:\n", " content: str\n", "\n", "\n", "class SimpleAgent(RoutedAgent):\n", " def __init__(self, model_client: ChatCompletionClient) -> None:\n", " super().__init__(\"A simple agent\")\n", " self._system_messages = [SystemMessage(content=\"You are a helpful AI assistant.\")]\n", " self._model_client = model_client\n", "\n", " @message_handler\n", " async def handle_user_message(self, message: Message, ctx: MessageContext) -> Message:\n", " # Prepare input to the chat completion model.\n", " user_message = UserMessage(content=message.content, source=\"user\")\n", " response = await self._model_client.create(\n", " self._system_messages + [user_message], cancellation_token=ctx.cancellation_token\n", " )\n", " # Return with the model's response.\n", " assert isinstance(response.content, str)\n", " return Message(content=response.content)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `SimpleAgent` class is a subclass of the\n", "{py:class}`autogen_core.RoutedAgent` class for the convenience of automatically routing messages to the appropriate handlers.\n", "It has a single handler, `handle_user_message`, which handles messages from the user. It uses the `ChatCompletionClient` to generate a response to the message.\n", "It then returns the response to the user, following the direct communication model.\n", "\n", "```{note}\n", "The `cancellation_token` of the type {py:class}`autogen_core.CancellationToken` is used to cancel\n", "asynchronous operations. It is linked to async calls inside the message handlers\n", "and can be used by the caller to cancel the handlers.\n", "A sketch of caller-side cancellation is shown at the end of this page.\n", "```" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Seattle is a vibrant city with a wide range of activities and attractions. Here are some fun things to do in Seattle:\n", "\n", "1. **Space Needle**: Visit this iconic observation tower for stunning views of the city and surrounding mountains.\n", "\n", "2. **Pike Place Market**: Explore this historic market where you can see the famous fish toss, buy local produce, and find unique crafts and eateries.\n", "\n", "3. **Museum of Pop Culture (MoPOP)**: Dive into the world of contemporary culture, music, and science fiction at this interactive museum.\n", "\n", "4. **Chihuly Garden and Glass**: Marvel at the beautiful glass art installations by artist Dale Chihuly, located right next to the Space Needle.\n", "\n", "5. **Seattle Aquarium**: Discover the diverse marine life of the Pacific Northwest at this engaging aquarium.\n", "\n", "6. **Seattle Art Museum**: Explore a vast collection of art from around the world, including contemporary and indigenous art.\n", "\n", "7. **Kerry Park**: For one of the best views of the Seattle skyline, head to this small park on Queen Anne Hill.\n", "\n", "8. **Ballard Locks**: Watch boats pass through the locks and observe the salmon ladder to see salmon migrating.\n", "\n", "9. 
**Ferry to Bainbridge Island**: Take a scenic ferry ride across Puget Sound to enjoy charming shops, restaurants, and beautiful natural scenery.\n", "\n", "10. **Olympic Sculpture Park**: Stroll through this outdoor park with large-scale sculptures and stunning views of the waterfront and mountains.\n", "\n", "11. **Underground Tour**: Discover Seattle's history on this quirky tour of the city's underground passageways in Pioneer Square.\n", "\n", "12. **Seattle Waterfront**: Enjoy the shops, restaurants, and attractions along the waterfront, including the Seattle Great Wheel and the aquarium.\n", "\n", "13. **Discovery Park**: Explore the largest green space in Seattle, featuring trails, beaches, and views of Puget Sound.\n", "\n", "14. **Food Tours**: Try out Seattle’s diverse culinary scene, including fresh seafood, international cuisines, and coffee culture (don’t miss the original Starbucks!).\n", "\n", "15. **Attend a Sports Game**: Catch a Seahawks (NFL), Mariners (MLB), or Sounders (MLS) game for a lively local experience.\n", "\n", "Whether you're interested in culture, nature, food, or history, Seattle has something for everyone to enjoy!\n" ] } ], "source": [ "# Create the runtime and register the agent.\n", "from autogen_core import AgentId\n", "\n", "runtime = SingleThreadedAgentRuntime()\n", "await SimpleAgent.register(\n", " runtime,\n", " \"simple_agent\",\n", " lambda: SimpleAgent(\n", " OpenAIChatCompletionClient(\n", " model=\"gpt-4o-mini\",\n", " # api_key=\"sk-...\", # Optional if you have an OPENAI_API_KEY set in the environment.\n", " )\n", " ),\n", ")\n", "# Start the runtime processing messages.\n", "runtime.start()\n", "# Send a message to the agent and get the response.\n", "message = Message(\"Hello, what are some fun things to do in Seattle?\")\n", "response = await runtime.send_message(message, AgentId(\"simple_agent\", \"default\"))\n", "print(response.content)\n", "# Stop the runtime processing messages.\n", "await runtime.stop()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above `SimpleAgent` always responds with a fresh context that contains only\n", "the system message and the user's latest message.\n", "We can use model context classes from {py:mod}`autogen_core.model_context`\n", "to make the agent \"remember\" previous conversations.\n", "See the [Model Context](./model-context.ipynb) page for more details." ] }
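, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, as mentioned in the note above, the `cancellation_token` can be used by the caller to cancel an in-flight request. Below is a minimal sketch of caller-side cancellation; it assumes the `SimpleAgent` registration from the cells above has been run and that the runtime can be restarted after `stop()`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A sketch of cancelling an in-flight request from the caller's side.\n", "# Assumptions: the SimpleAgent registration above has been run, and the runtime can be restarted.\n", "import asyncio\n", "\n", "from autogen_core import CancellationToken\n", "\n", "runtime.start()\n", "cancellation_token = CancellationToken()\n", "task = asyncio.create_task(\n", " runtime.send_message(\n", " Message(\"Write a very long essay about Seattle.\"),\n", " AgentId(\"simple_agent\", \"default\"),\n", " cancellation_token=cancellation_token,\n", " )\n", ")\n", "# Cancel before the handler finishes; awaiting the task should then raise CancelledError.\n", "cancellation_token.cancel()\n", "try:\n", " await task\n", "except asyncio.CancelledError:\n", " print(\"The request was cancelled.\")\n", "await runtime.stop()" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.7" } }, "nbformat": 4, "nbformat_minor": 2 }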