
9 posts tagged with "AutoGen"


· 4 min read

What are they doing?

One year ago, we launched AutoGen, a programming framework designed to build agentic AI systems. The release of AutoGen sparked massive interest within the developer community. As an early release, it provided us with a unique opportunity to engage deeply with users, gather invaluable feedback, and learn from a diverse range of use cases and contributions. By listening and engaging with the community, we gained insights into what people were building or attempting to build, how they were approaching the creation of agentic systems, and where they were struggling. This experience was both humbling and enlightening, revealing significant opportunities for improvement in our initial design, especially for power users developing production-level applications with AutoGen.

Through engagements with the community, we learned many lessons:

  • Developers value modular and reusable agents. For example, our built-in agents that could be directly plugged in or easily customized for specific use cases were particularly popular. At the same time, there was a desire for more customizability, such as integrating custom agents built using other programming languages or frameworks.
  • Chat-based agent-to-agent communication was an intuitive collaboration pattern, making it easy for developers to get started and involve humans in the loop. As developers began to employ agents in a wider range of scenarios, they sought more flexibility in collaboration patterns. For instance, developers wanted to build predictable, ordered workflows with agents, and to integrate them with new user interfaces that are not chat-based.
  • Although it was easy for developers to get started with AutoGen, debugging and scaling applications built with teams of agents proved more challenging.
  • There were many opportunities for improving code quality.

These learnings, along with many others from other agentic efforts across Microsoft, prompted us to take a step back and lay the groundwork for a new direction. A few months ago, we started dedicating time to distilling these learnings into a roadmap for the future of AutoGen. This led to the development of AutoGen 0.4, a complete redesign of the framework from the foundation up. AutoGen 0.4 embraces the actor model of computing to support distributed, highly scalable, event-driven agentic systems. This approach offers many advantages, such as:

  • Composability. Systems designed in this way are more composable, allowing developers to bring their own agents implemented in different frameworks or programming languages and to build more powerful systems using complex agentic patterns.
  • Flexibility. It allows for the creation of both deterministic, ordered workflows and event-driven or decentralized workflows, enabling customers to bring their own orchestration or integrate with other systems more easily. It also opens more opportunities for human-in-the-loop scenarios, both active and reactive.
  • Debugging and Observability. Event-driven communication moves message delivery away from agents to a centralized component, making it easier to observe and debug their activities regardless of agent implementation.
  • Scalability. An event-based architecture enables distributed and cloud-deployed agents, which is essential for building scalable AI services and applications.

Today, we are delighted to share our progress and invite everyone to collaborate with us and provide feedback to evolve AutoGen and help shape the future of multi-agent systems.

As the first step, we are opening a pull request into the main branch with the current state of development of 0.4. After approximately a week, we plan to merge this into main and continue development. There's still a lot left to do before 0.4 is ready for release though, so keep in mind this is a work in progress.

Starting in AutoGen 0.4, the project will have three main libraries:

  • Core - the building blocks for an event-driven agentic system.
  • AgentChat - a task-driven, high-level API built with core, including group chat, code execution, pre-built agents, and more. This is the most similar API to AutoGen 0.2 and will be the easiest API to migrate to.
  • Extensions - implementations of core interfaces and third-party integrations (e.g., Azure code executor and OpenAI model client).

AutoGen 0.2 is still available, developed and maintained out of the 0.2 branch. For everyone looking for a stable version, we recommend continuing to use 0.2 for the time being. It can be installed using:

pip install autogen-agentchat~=0.2

This new package name was used to align with the new packages that will come with 0.4: autogen-core, autogen-agentchat, and autogen-ext.

Lastly, we will be using GitHub Discussions as the official community forum for the new version and, going forward, for all discussions related to the AutoGen project. We look forward to meeting you there.

· 5 min read
Alex Reibman
Braelyn Boynton
AgentOps and AutoGen

TL;DR

  • AutoGen® offers detailed multi-agent observability with AgentOps.
  • AgentOps offers the best experience for developers building with AutoGen in just two lines of code.
  • Enterprises can now trust AutoGen in production with detailed monitoring and logging from AgentOps.

AutoGen is excited to announce an integration with AgentOps, the industry leader in agent observability and compliance. Back in February, Bloomberg declared 2024 the year of AI Agents. And it's true! We've seen AI transform from simplistic chatbots to autonomously making decisions and completing tasks on a user's behalf.

However, as with most new technologies, companies and engineering teams can be slow to develop processes and best practices. One part of the agent workflow we're betting on is the importance of observability. Letting your agents run wild might work for a hobby project, but if you're building enterprise-grade agents for production, it's crucial to understand where your agents are succeeding and failing. Observability isn't just an option; it's a requirement.

As agents evolve into even more powerful and complex tools, you should view them increasingly as tools designed to augment your team's capabilities. Agents will take on more prominent roles and responsibilities, take action, and provide immense value. However, this means you must monitor your agents the same way a good manager maintains visibility over their personnel. AgentOps offers developers observability for debugging and detecting failures. It provides the tools to monitor all the key metrics your agents use in one easy-to-read dashboard. Monitoring is more than just a “nice to have”; it's a critical component for any team looking to build and scale AI agents.

What is Agent Observability?

Agent observability, in its most basic form, allows you to monitor, troubleshoot, and clarify the actions of your agent during its operation. The ability to observe every detail of your agent's activity, right down to a timestamp, enables you to trace its actions precisely, identify areas for improvement, and understand the reasons behind any failures — a key aspect of effective debugging. Beyond enhancing diagnostic precision, this level of observability is integral for your system's reliability. Think of it as the ability to identify and address issues before they spiral out of control. Observability isn't just about keeping things running smoothly and maximizing uptime; it's about strengthening your agent-based solutions.

AI agent observability

Why AgentOps?

AutoGen has simplified the process of building agents, yet we recognized the need for an easy-to-use, native tool for observability. We've previously discussed AgentOps, and now we're excited to partner with AgentOps as our official agent observability tool. Integrating AgentOps with AutoGen simplifies your workflow and boosts your agents' performance through clear observability, ensuring they operate optimally. For more details, check out our AgentOps documentation.

Agent Session Replay

Enterprises and enthusiasts trust AutoGen as the leader in building agents. With our partnership with AgentOps, developers can now natively debug agents for efficiency and ensure compliance, providing a comprehensive audit trail for all of your agents' activities. AgentOps allows you to monitor LLM calls, costs, latency, agent failures, multi-agent interactions, tool usage, session-wide statistics, and more all from one dashboard.

By combining the agent-building capabilities of AutoGen with the observability tools of AgentOps, we're providing our users with a comprehensive solution that enhances agent performance and reliability. This collaboration establishes that enterprises can confidently deploy AI agents in production environments, knowing they have the best tools to monitor, debug, and optimize their agents.

The best part is that it only takes two lines of code. All you need to do is set an AGENTOPS_API_KEY in your environment (Get API key here: https://app.agentops.ai/account) and call agentops.init():

import os
import agentops

agentops.init(os.environ["AGENTOPS_API_KEY"])
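
To see the integration in context, here is a minimal sketch of a tracked AutoGen run. The agent setup mirrors the AutoGen 0.2 quickstart, and the closing end_session call is an assumed AgentOps helper for marking the session finished; adapt both to your own environment.

import os

import agentops
from autogen import AssistantAgent, UserProxyAgent, config_list_from_json

# Start an AgentOps session; subsequent AutoGen LLM calls are tracked automatically.
agentops.init(os.environ["AGENTOPS_API_KEY"])

# Hypothetical AutoGen 0.2-style setup; replace with your own config.
config_list = config_list_from_json(env_or_file="OAI_CONFIG_LIST")
assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    is_termination_msg=lambda m: "TERMINATE" in (m.get("content") or ""),
)

# The chat below shows up in the AgentOps dashboard with costs, latency, and agent events.
user_proxy.initiate_chat(assistant, message="Say hello, then reply TERMINATE.")

agentops.end_session("Success")  # assumed AgentOps helper to close out the session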

AgentOps's Features

AgentOps includes all the functionality you need to ensure your agents are suitable for real-world, scalable solutions.

AgentOps overview dashboard
  • Analytics Dashboard: The AgentOps Analytics Dashboard allows you to configure and assign agents and automatically track what actions each agent is taking simultaneously. When used with AutoGen, AgentOps is automatically configured for multi-agent compatibility, allowing users to track multiple agents across runs easily. Instead of a terminal-level screen, AgentOps provides a superior user experience with its intuitive interface.
  • Tracking LLM Costs: Cost tracking is natively set up within AgentOps and provides a rolling total. This allows developers to see and track their run costs and accurately predict future costs.
  • Recursive Thought Detection: One of the most frustrating aspects of agents is when they get trapped and perform the same task repeatedly for hours on end. AgentOps can identify when agents fall into infinite loops, ensuring efficiency and preventing wasteful computation.

AutoGen users also have access to the following features in AgentOps:

  • Replay Analytics: Watch step-by-step agent execution graphs.
  • Custom Reporting: Create custom analytics on agent performance.
  • Public Model Testing: Test your agents against benchmarks and leaderboards.
  • Custom Tests: Run your agents against domain-specific tests.
  • Compliance and Security: Create audit logs and detect potential threats, such as profanity and leaks of Personally Identifiable Information.
  • Prompt Injection Detection: Identify potential code injection and secret leaks.

Conclusion

By integrating AgentOps into AutoGen, we've given users everything they need to build production-grade agents, improve them, and track their performance to ensure they're doing exactly what you need them to do. Without it, you're operating blindly, unable to tell where your agents are succeeding or failing. AgentOps provides the observability tools needed to monitor, debug, and optimize your agents for enterprise-level performance. It offers everything developers need to scale their AI solutions, from cost tracking to recursive thought detection.

Did you find this note helpful? Would you like to share your thoughts, use cases, and findings? Please join our observability channel in the AutoGen Discord.

· 6 min read
Joshua Kim
Yishen Sun

FSM Group Chat

Finite State Machine (FSM) Group Chat allows the user to constrain agent transitions.

TL;DR

FSM Group Chat has recently been released; it allows the user to input a transition graph to constrain agent transitions. This is useful as the number of agents increases, because the number of possible transition pairs (N choose 2 combinations) grows quadratically, increasing the risk of sub-optimal transitions, which leads to wasted tokens and/or poor outcomes.

Possible use-cases for transition graph

  1. One-pass workflow, i.e., we want each agent to only have one pass at the problem, Agent A -> B -> C.
  2. Decision tree flow, like a decision tree, we start with a root node (agent), and flow down the decision tree with agents being nodes. For example, if the query is a SQL query, hand over to the SQL agent, else if the query is a RAG query, hand over to the RAG agent.
  3. Sequential Team Ops. Suppose we have a team of 3 developer agents, each responsible for a different GitHub repo. We also have a team of business analysts that discuss and debate the overall goal of the user. We could have the manager agent of the developer team speak to the manager agent of the business analysis team. That way, the discussions are more focused within each team, and better outcomes can be expected.

Note that we are not enforcing a directed acyclic graph; the user can specify the graph to be acyclic, but cyclic workflows can also be useful for iteratively working on a problem and layering additional analysis onto the solution.

Usage Guide

We have added two parameters allowed_or_disallowed_speaker_transitions and speaker_transitions_type.

  • allowed_or_disallowed_speaker_transitions: a dictionary with the type expectation of {Agent: [Agent]}. The key refers to the source agent, while the value(s) in the list refer to the target agent(s). If None, a fully connected graph is assumed.
  • speaker_transitions_type: a string, specifically one of ["allowed", "disallowed"]. We wanted the user to be able to supply a dictionary of either allowed or disallowed transitions to improve ease of use. In the code base, a disallowed transition dictionary is inverted into an allowed transition dictionary, allowed_speaker_transitions_dict (see the sketch below).
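
As a minimal sketch, the "disallowed" variant could be supplied as follows. The agents a, b, and c and the forbidden pairs are purely illustrative:

from autogen.agentchat import GroupChat

# a, b, c are hypothetical agents created elsewhere.
disallowed_transitions = {
    a: [c],  # a may never hand over directly to c
    c: [a],  # c may never hand over directly to a
}

group_chat = GroupChat(
    agents=[a, b, c],
    messages=[],
    allowed_or_disallowed_speaker_transitions=disallowed_transitions,
    speaker_transitions_type="disallowed",  # inverted internally into allowed_speaker_transitions_dict
)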

Application of the FSM Feature

Here is a quick demonstration of how to initiate an FSM-based GroupChat in the AutoGen framework. In this demonstration, we consider each agent as a state, and each agent speaks according to certain conditions. For example, User always initiates the task first, followed by Planner creating a plan. Then Engineer and Executor work alternately, with Critic intervening when necessary, and after Critic, only Planner should revise additional plans. Only one state is active at a time, and there are transition conditions between states. Therefore, a GroupChat can be well abstracted as a Finite-State Machine (FSM).

visualization

Usage

  1. Pre-requisites
pip install autogen[graph]
  2. Import dependencies

    from autogen.agentchat import GroupChat, AssistantAgent, UserProxyAgent, GroupChatManager
    from autogen.oai.openai_utils import config_list_from_dotenv
  3. Configure LLM parameters

    # Please feel free to change it as you wish
    config_list = config_list_from_dotenv(
        dotenv_file_path='.env',
        model_api_key_map={'gpt-4-1106-preview': 'OPENAI_API_KEY'},
        filter_dict={
            "model": {
                "gpt-4-1106-preview"
            }
        }
    )

    gpt_config = {
        "cache_seed": None,
        "temperature": 0,
        "config_list": config_list,
        "timeout": 100,
    }
  4. Define the task

    # describe the task
    task = """Add 1 to the number output by the previous role. If the previous number is 20, output "TERMINATE"."""
  5. Define agents

    # agents configuration
    engineer = AssistantAgent(
        name="Engineer",
        llm_config=gpt_config,
        system_message=task,
        description="""I am **ONLY** allowed to speak **immediately** after `Planner`, `Critic` and `Executor`.
    If the last number mentioned by `Critic` is not a multiple of 5, the next speaker must be `Engineer`.
    """
    )

    planner = AssistantAgent(
        name="Planner",
        system_message=task,
        llm_config=gpt_config,
        description="""I am **ONLY** allowed to speak **immediately** after `User` or `Critic`.
    If the last number mentioned by `Critic` is a multiple of 5, the next speaker must be `Planner`.
    """
    )

    executor = AssistantAgent(
        name="Executor",
        system_message=task,
        is_termination_msg=lambda x: x.get("content", "") and x.get("content", "").rstrip().endswith("FINISH"),
        llm_config=gpt_config,
        description="""I am **ONLY** allowed to speak **immediately** after `Engineer`.
    If the last number mentioned by `Engineer` is a multiple of 3, the next speaker can only be `Executor`.
    """
    )

    critic = AssistantAgent(
        name="Critic",
        system_message=task,
        llm_config=gpt_config,
        description="""I am **ONLY** allowed to speak **immediately** after `Engineer`.
    If the last number mentioned by `Engineer` is not a multiple of 3, the next speaker can only be `Critic`.
    """
    )

    user_proxy = UserProxyAgent(
        name="User",
        system_message=task,
        code_execution_config=False,
        human_input_mode="NEVER",
        llm_config=False,
        description="""
    Never select me as a speaker.
    """
    )
    1. Here, I have configured the system_message as "task" because every agent should know what it needs to do. In this example, each agent has the same task, which is to count in sequence.
    2. The most important point is the description parameter, where I have used natural language to describe the transition conditions of the FSM. Because the manager knows which agents are available next based on the constraints of the graph, I describe in the description field of each candidate agent when it can speak, effectively describing the transition conditions in the FSM.
  6. Define the graph

    graph_dict = {}
    graph_dict[user_proxy] = [planner]
    graph_dict[planner] = [engineer]
    graph_dict[engineer] = [critic, executor]
    graph_dict[critic] = [engineer, planner]
    graph_dict[executor] = [engineer]
    1. The graph here and the transition conditions mentioned above together form a complete FSM. Both are essential and cannot be missing.
    2. You can visualize the graph however you wish; one visualization is shown below

    visualization

  7. Define a GroupChat and a GroupChatManager

    agents = [user_proxy, engineer, planner, executor, critic]

    # create the groupchat
    group_chat = GroupChat(
        agents=agents,
        messages=[],
        max_round=25,
        allowed_or_disallowed_speaker_transitions=graph_dict,
        allow_repeat_speaker=None,
        speaker_transitions_type="allowed",
    )

    # create the manager
    manager = GroupChatManager(
        groupchat=group_chat,
        llm_config=gpt_config,
        is_termination_msg=lambda x: x.get("content", "") and x.get("content", "").rstrip().endswith("TERMINATE"),
        code_execution_config=False,
    )
  8. Initiate the chat

    # initiate the task
    user_proxy.initiate_chat(
        manager,
        message="1",
        clear_history=True
    )
  9. You may get the following output (ignorable warnings have been removed):

    User (to chat_manager):

    1

    --------------------------------------------------------------------------------
    Planner (to chat_manager):

    2

    --------------------------------------------------------------------------------
    Engineer (to chat_manager):

    3

    --------------------------------------------------------------------------------
    Executor (to chat_manager):

    4

    --------------------------------------------------------------------------------
    Engineer (to chat_manager):

    5

    --------------------------------------------------------------------------------
    Critic (to chat_manager):

    6

    --------------------------------------------------------------------------------
    Engineer (to chat_manager):

    7

    --------------------------------------------------------------------------------
    Critic (to chat_manager):

    8

    --------------------------------------------------------------------------------
    Engineer (to chat_manager):

    9

    --------------------------------------------------------------------------------
    Executor (to chat_manager):

    10

    --------------------------------------------------------------------------------
    Engineer (to chat_manager):

    11

    --------------------------------------------------------------------------------
    Critic (to chat_manager):

    12

    --------------------------------------------------------------------------------
    Engineer (to chat_manager):

    13

    --------------------------------------------------------------------------------
    Critic (to chat_manager):

    14

    --------------------------------------------------------------------------------
    Engineer (to chat_manager):

    15

    --------------------------------------------------------------------------------
    Executor (to chat_manager):

    16

    --------------------------------------------------------------------------------
    Engineer (to chat_manager):

    17

    --------------------------------------------------------------------------------
    Critic (to chat_manager):

    18

    --------------------------------------------------------------------------------
    Engineer (to chat_manager):

    19

    --------------------------------------------------------------------------------
    Critic (to chat_manager):

    20

    --------------------------------------------------------------------------------
    Planner (to chat_manager):

    TERMINATE

Notebook examples

More examples can be found in the notebook. The notebook includes more examples of possible transition paths such as (1) hub and spoke, (2) sequential team operations, and (3) think aloud and debate. It also uses the function visualize_speaker_transitions_dict from autogen.graph_utils to visualize the various graphs.
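
For instance, rendering the graph from the demonstration above might look like the following sketch, assuming the helper accepts the transitions dictionary and the list of agents:

from autogen.graph_utils import visualize_speaker_transitions_dict

# Draw the FSM defined by graph_dict over the five agents used above.
visualize_speaker_transitions_dict(graph_dict, agents)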

· 2 min read
Gagan Bansal
AutoAnny Logo

Anny is a Discord bot powered by AutoGen to help AutoGen's Discord server.

TL;DR

We are adding a new sample app called Anny -- a simple Discord bot powered by AutoGen that's intended to assist AutoGen developers. See samples/apps/auto-anny for details.

Introduction

Over the past few months, AutoGen has experienced rapid growth in its number of users and in the volume of community requests and feedback. However, accommodating this demand and feedback requires manually sifting through issues, PRs, and discussions on GitHub, as well as managing messages from AutoGen's 14000+ community members on Discord. There are many tasks that AutoGen's developer community has to perform every day; here are some common ones:

  • Answering questions
  • Recognizing and prioritizing bugs and features
  • Maintaining responsiveness for our incredible community
  • Tracking growth

This requires a significant amount of effort. Agentic workflows and interfaces promise immense value-added automation for many tasks, so we thought: why don't we use AutoGen to make our lives easier? So we're turning to automation to help us and allow us to focus on what's most critical.

Current Version of Anny

The current version of Anny is pretty simple -- it uses the Discord API and AutoGen to enable a bot that can respond to a set of commands.

For example, it supports commands like /heyanny help for command listing, /heyanny ghstatus for GitHub activity summary, /heyanny ghgrowth for GitHub repo growth indicators, and /heyanny ghunattended for listing unattended issues and PRs. Most of these commands use multiple AutoGen agents to accomplish these tasks.

To use Anny, please follow instructions in samples/apps/auto-anny.

It's Not Just for AutoGen

If you're an open-source developer managing your own project, you can probably relate to our challenges. We invite you to check out Anny and contribute to its development and roadmap.

· 6 min read
Olga Vrousgou

TL;DR

AutoGen now supports custom models! This feature empowers users to define and load their own models, allowing for a more flexible and personalized inference mechanism. By adhering to a specific protocol, you can integrate your custom model for use with AutoGen and respond to prompts any way needed by using any model/API call/hardcoded response you want.

NOTE: Depending on which model you use, you may need to adjust the agents' default prompts.

Quickstart

An interactive and easy way to get started is by following the notebook here, which loads a local model from HuggingFace into AutoGen and uses it for inference, and then making changes to the provided class.

Step 1: Create the custom model client class

To get started with using custom models in AutoGen, you need to create a model client class that adheres to the ModelClient protocol defined in client.py. The new model client class should implement these methods:

  • create(): Returns a response object that implements the ModelClientResponseProtocol (more details in the Protocol section).
  • message_retrieval(): Processes the response object and returns a list of strings or a list of message objects (more details in the Protocol section).
  • cost(): Returns the cost of the response.
  • get_usage(): Returns a dictionary with keys from RESPONSE_USAGE_KEYS = ["prompt_tokens", "completion_tokens", "total_tokens", "cost", "model"].

Example of a bare-bones dummy custom class:

from types import SimpleNamespace


class CustomModelClient:
    def __init__(self, config, **kwargs):
        print(f"CustomModelClient config: {config}")

    def create(self, params):
        num_of_responses = params.get("n", 1)

        # can create my own data response class
        # here using SimpleNamespace for simplicity
        # as long as it adheres to the ModelClientResponseProtocol

        response = SimpleNamespace()
        response.choices = []
        response.model = "model_name"  # should match the OAI_CONFIG_LIST registration

        for _ in range(num_of_responses):
            text = "this is a dummy text response"
            choice = SimpleNamespace()
            choice.message = SimpleNamespace()
            choice.message.content = text
            choice.message.function_call = None
            response.choices.append(choice)
        return response

    def message_retrieval(self, response):
        choices = response.choices
        return [choice.message.content for choice in choices]

    def cost(self, response) -> float:
        response.cost = 0
        return 0

    @staticmethod
    def get_usage(response):
        return {}

Step 2: Add the configuration to the OAI_CONFIG_LIST

The field that is necessary is setting model_client_cls to the name of the new class (as a string) "model_client_cls":"CustomModelClient". Any other fields will be forwarded to the class constructor, so you have full control over what parameters to specify and how to use them. E.g.:

{
    "model": "Open-Orca/Mistral-7B-OpenOrca",
    "model_client_cls": "CustomModelClient",
    "device": "cuda",
    "n": 1,
    "params": {
        "max_length": 1000
    }
}

Step 3: Register the new custom model to the agent that will use it

If a configuration with the field "model_client_cls":"<class name>" has been added to an Agent's config list, then the corresponding model with the desired class must be registered after the agent is created and before the conversation is initialized:

my_agent.register_model_client(model_client_cls=CustomModelClient, [other args that will be forwarded to CustomModelClient constructor])

The model_client_cls=CustomModelClient argument matches the one specified in the OAI_CONFIG_LIST, and CustomModelClient is the class that adheres to the ModelClient protocol (more details on the protocol below).

If the new model client is in the config list but not registered by the time the chat is initialized, then an error will be raised.
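
Putting the three steps together, a minimal sketch of the registration flow might look like this; the agent construction is illustrative, and config_list is assumed to contain the entry from Step 2:

from autogen import AssistantAgent

# config_list is assumed to include the entry with "model_client_cls": "CustomModelClient".
assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})

# Register the custom client after creating the agent and before initiating any chat.
assistant.register_model_client(model_client_cls=CustomModelClient)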

Protocol details

A custom model class can be created in many ways, but needs to adhere to the ModelClient protocol and response structure which is defined in client.py and shown below.

The response protocol is currently using the minimum required fields from the autogen codebase that match the OpenAI response structure. Any response protocol that matches the OpenAI response structure will probably be more resilient to future changes, but we are starting off with minimum requirements to make adoption of this feature easier.


class ModelClient(Protocol):
    """
    A client class must implement the following methods:
    - create must return a response object that implements the ModelClientResponseProtocol
    - cost must return the cost of the response
    - get_usage must return a dict with the following keys:
        - prompt_tokens
        - completion_tokens
        - total_tokens
        - cost
        - model

    This class is used to create a client that can be used by OpenAIWrapper.
    The response returned from create must adhere to the ModelClientResponseProtocol but can be extended however needed.
    The message_retrieval method must be implemented to return a list of str or a list of messages from the response.
    """

    RESPONSE_USAGE_KEYS = ["prompt_tokens", "completion_tokens", "total_tokens", "cost", "model"]

    class ModelClientResponseProtocol(Protocol):
        class Choice(Protocol):
            class Message(Protocol):
                content: Optional[str]

            message: Message

        choices: List[Choice]
        model: str

    def create(self, params) -> ModelClientResponseProtocol:
        ...

    def message_retrieval(
        self, response: ModelClientResponseProtocol
    ) -> Union[List[str], List[ModelClient.ModelClientResponseProtocol.Choice.Message]]:
        """
        Retrieve and return a list of strings or a list of Choice.Message from the response.

        NOTE: if a list of Choice.Message is returned, it currently needs to contain the fields of OpenAI's ChatCompletion Message object,
        since that is expected for function or tool calling in the rest of the codebase at the moment, unless a custom agent is being used.
        """
        ...

    def cost(self, response: ModelClientResponseProtocol) -> float:
        ...

    @staticmethod
    def get_usage(response: ModelClientResponseProtocol) -> Dict:
        """Return usage summary of the response using RESPONSE_USAGE_KEYS."""
        ...

Troubleshooting steps

If something doesn't work then run through the checklist:

  • Make sure you have followed the client protocol and client response protocol when creating the custom model class
    • create() method: ModelClientResponseProtocol must be followed when returning an inference response during create call.
    • message_retrieval() method: returns a list of strings or a list of message objects. If a list of message objects is returned, they currently must contain the fields of OpenAI's ChatCompletion Message object, since that is expected for function or tool calling in the rest of the codebase at the moment, unless a custom agent is being used.
    • cost() method: returns a float; if you don't care about cost tracking you can just return 0.
    • get_usage(): returns a dictionary, and if you don't care about usage tracking you can just return an empty dictionary {}.
  • Make sure you have a corresponding entry in the OAI_CONFIG_LIST and that that entry has the "model_client_cls":"<custom-model-class-name>" field.
  • Make sure you have registered the client using the corresponding config entry and your new class: agent.register_model_client(model_client_cls=<class-of-custom-model>, [other optional args])
  • Make sure that all of the custom models defined in the OAI_CONFIG_LIST have been registered.
  • Any other troubleshooting might need to be done in the custom code itself.

Conclusion

With the ability to use custom models, AutoGen now offers even more flexibility and power for your AI applications. Whether you've trained your own model or want to use a specific pre-trained model, AutoGen can accommodate your needs. Happy coding!

· 7 min read
Adam Fourney
Qingyun Wu

AutoGenBench

AutoGenBench is a standalone tool for evaluating AutoGen agents and workflows on common benchmarks.

TL;DR

Today we are releasing AutoGenBench - a tool for evaluating AutoGen agents and workflows on established LLM and agentic benchmarks.

AutoGenBench is a standalone command line tool, installable from PyPI, which handles downloading, configuring, running, and reporting supported benchmarks. AutoGenBench works best when run alongside Docker, since it uses Docker to isolate tests from one another.

Quick Start

Get started quickly by running the following commands in a bash terminal.

Note: You may need to adjust the path to the OAI_CONFIG_LIST, as appropriate.

export OAI_CONFIG_LIST=$(cat ./OAI_CONFIG_LIST)
pip install autogenbench
autogenbench clone HumanEval
cd HumanEval
cat README.md
autogenbench run --subsample 0.1 --repeat 3 Tasks/human_eval_two_agents.jsonl
autogenbench tabulate Results/human_eval_two_agents

Introduction

Measurement and evaluation are core components of every major AI or ML research project. The same is true for AutoGen. To this end, today we are releasing AutoGenBench, a standalone command line tool that we have been using to guide development of AutoGen. Conveniently, AutoGenBench handles: downloading, configuring, running, and reporting results of agents on various public benchmark datasets. In addition to reporting top-line numbers, each AutoGenBench run produces a comprehensive set of logs and telemetry that can be used for debugging, profiling, computing custom metrics, and as input to AgentEval. In the remainder of this blog post, we outline core design principles for AutoGenBench (key to understanding its operation); present a guide to installing and running AutoGenBench; outline a roadmap for evaluation; and conclude with an open call for contributions.

Design Principles

AutoGenBench is designed around three core design principles. Knowing these principles will help you understand the tool, its operation and its output. These three principles are:

  • Repetition: LLMs are stochastic, and in many cases, so too is the code they write to solve problems. For example, a Python script might call an external search engine, and the results may vary run-to-run. This can lead to variance in agent performance. Repetition is key to measuring and understanding this variance. To this end, AutoGenBench is built from the ground up with an understanding that tasks may be run multiple times, and that variance is a metric we often want to measure.

  • Isolation: Agents interact with their worlds in both subtle and overt ways. For example an agent may install a python library or write a file to disk. This can lead to ordering effects that can impact future measurements. Consider, for example, comparing two agents on a common benchmark. One agent may appear more efficient than the other simply because it ran second, and benefitted from the hard work the first agent did in installing and debugging necessary Python libraries. To address this, AutoGenBench isolates each task in its own Docker container. This ensures that all runs start with the same initial conditions. (Docker is also a much safer way to run agent-produced code, in general.)

  • Instrumentation: While top-line metrics are great for comparing agents or models, we often want much more information about how the agents are performing, where they are getting stuck, and how they can be improved. We may also later think of new research questions that require computing a different set of metrics. To this end, AutoGenBench is designed to log everything, and to compute metrics from those logs. This ensures that one can always go back to the logs to answer questions about what happened, run profiling software, or feed the logs into tools like AgentEval.

Installing and Running AutoGenBench

As noted above, isolation is a key design principle, and so AutoGenBench must be run in an environment where Docker is available (desktop or Engine). It will not run in GitHub codespaces, unless you opt for native execution (which is strongly discouraged). To install Docker Desktop see https://www.docker.com/products/docker-desktop/. Once Docker is installed, AutoGenBench can then be installed as a standalone tool from PyPI. With pip, installation can be achieved as follows:

pip install autogenbench

After installation, you must configure your API keys. As with other AutoGen applications, AutoGenBench will look for the OpenAI keys in the OAI_CONFIG_LIST file in the current working directory, or the OAI_CONFIG_LIST environment variable. This behavior can be overridden using a command-line parameter.

If you will be running multiple benchmarks, it is often most convenient to leverage the environment variable option. You can load your keys into the environment variable by executing:

export OAI_CONFIG_LIST=$(cat ./OAI_CONFIG_LIST)

A Typical Session

Once AutoGenBench and necessary keys are installed, a typical session will look as follows:

autogenbench clone HumanEval
cd HumanEval
cat README.md
autogenbench run --subsample 0.1 --repeat 3 Tasks/human_eval_two_agents.jsonl
autogenbench tabulate results/human_eval_two_agents

Where:

  • autogenbench clone HumanEval downloads and expands the HumanEval benchmark scenario.
  • cd HumanEval; cat README.md navigates to the benchmark directory, and prints the README (which you should always read!)
  • autogenbench run --subsample 0.1 --repeat 3 Tasks/human_eval_two_agents.jsonl runs a 10% subsample of the tasks defined in Tasks/human_eval_two_agents.jsonl. Each task is run 3 times.
  • autogenbench tabulate results/human_eval_two_agents tabulates the results of the run.

After running the above tabulate command, you should see output similar to the following:

                  Trial 0    Trial 1    Trial 2
Task Id           Success    Success    Success
-------------     ---------  ---------  ---------
HumanEval_107     False      True       True
HumanEval_22      True       True       True
HumanEval_43      True       True       True
HumanEval_88      True       True       True
HumanEval_14      True       True       True
HumanEval_157     True       True       True
HumanEval_141     True       True       True
HumanEval_57      True       True       True
HumanEval_154     True       True       True
HumanEval_153     True       True       True
HumanEval_93      False      True       False
HumanEval_137     True       True       True
HumanEval_143     True       True       True
HumanEval_13      True       True       True
HumanEval_49      True       True       True
HumanEval_95      True       True       True
-------------     ---------  ---------  ---------
Successes         14         16         15
Failures          2          0          1
Missing           0          0          0
Total             16         16         16

CAUTION: 'autogenbench tabulate' is in early preview.
Please do not cite these values in academic work without first inspecting and verifying the results in the logs yourself.

From this output we can see the results of the three separate repetitions of each task, and final summary statistics of each run. In this case, the results were generated via GPT-4 (as defined in the OAI_CONFIG_LIST that was provided), and used the TwoAgents template. It is important to remember that AutoGenBench evaluates specific end-to-end configurations of agents (as opposed to evaluating a model or cognitive framework more generally).

Finally, complete execution traces and logs can be found in the Results folder. See the AutoGenBench README for more details about command-line options and output formats. Each of these commands also offers extensive in-line help via:

  • autogenbench --help
  • autogenbench clone --help
  • autogenbench run --help
  • autogenbench tabulate --help

Roadmap

While we are announcing AutoGenBench, we note that it is very much an evolving project in its own right. Over the next few weeks and months we hope to:

  • Onboard many additional benchmarks beyond those shipping today
  • Greatly improve logging and telemetry
  • Introduce new core metrics including total costs, task completion time, conversation turns, etc.
  • Provide tighter integration with AgentEval and AutoGen Studio

For up-to-date tracking of our work items on this project, please see AutoGenBench Work Items.

Call for Participation

Finally, we want to end this blog post with an open call for contributions. AutoGenBench is still nascent, and has much opportunity for improvement. New benchmarks are constantly being published, and will need to be added. Everyone may have their own distinct set of metrics that they care most about optimizing, and these metrics should be onboarded. To this end, we welcome any and all contributions to this corner of the AutoGen project. If contributing is something that interests you, please see the contributor’s guide and join our Discord discussion in the #autogenbench channel!

· 3 min read
Olga Vrousgou

TL;DR

AutoGen 0.2.8 enhances operational safety by making 'code execution inside a Docker container' the default setting, focusing on informing users about its operations and empowering them to make informed decisions regarding code execution.

The new release introduces a breaking change where the use_docker argument is set to True by default in code executing agents. This change underscores our commitment to prioritizing security and safety in AutoGen.

Introduction

AutoGen has code-executing agents, usually defined as a UserProxyAgent, where code execution is by default ON. Until now, unless explicitly specified by the user, any code generated by other agents would be executed by code-execution agents locally, i.e. wherever AutoGen was being executed. If AutoGen happened to be run in a Docker container then the risks of running code were minimized. However, if AutoGen runs outside of Docker, it's easy, particularly for new users, to overlook code-execution risks.

AutoGen now executes any code inside a Docker container by default (unless execution is already happening inside a Docker container). It will launch a container from a Docker image (either user-provided or the default), execute the new code inside it, and then terminate the container, preparing for the next code execution cycle.

We understand that not everyone is concerned about this, especially when playing around with AutoGen for the first time. We have provided easy ways to turn this requirement off. But we believe that making sure the user is aware that code will be executed locally, and prompting them to think about the security implications of running code locally, is the right step for AutoGen.

Example

The example below shows the default behaviour: any code generated by the assistant agent and executed by the user_proxy agent will attempt to use a Docker container for execution. If Docker is not running, an error will be thrown. The user can then decide to activate Docker or opt in to local code execution.

from autogen import AssistantAgent, UserProxyAgent, config_list_from_json

config_list = config_list_from_json(env_or_file="OAI_CONFIG_LIST")
assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = UserProxyAgent("user_proxy", code_execution_config={"work_dir": "coding"})
user_proxy.initiate_chat(assistant, message="Plot a chart of NVDA and TESLA stock price change YTD.")
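
If you want to keep Docker-based execution but control the environment, use_docker can also be set to a specific image name rather than True; a minimal sketch, where the image tag is just a placeholder:

# Pin a specific image for code execution (image name is a placeholder).
user_proxy = UserProxyAgent(
    "user_proxy",
    code_execution_config={"work_dir": "coding", "use_docker": "python:3-slim"},
)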

To opt out of this default behaviour, there are a few options.

Disable code execution entirely

  • Set code_execution_config to False for each code-execution agent. E.g.:
user_proxy = autogen.UserProxyAgent(name="user_proxy", llm_config=llm_config, code_execution_config=False)

Run code execution locally

  • use_docker can be set to False in code_execution_config for each code-execution agent.
  • To set it for all code-execution agents at once: set AUTOGEN_USE_DOCKER to False as an environment variable.

E.g.:

user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    llm_config=llm_config,
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

Conclusion

AutoGen 0.2.8 improves code-execution safety and ensures that the user is properly informed of what AutoGen is doing and can make decisions around code execution.

· 9 min read
Adam Fourney

TL;DR

AutoGen 0.2.2 introduces a description field to ConversableAgent (and all subclasses), and changes GroupChat so that it uses agent descriptions rather than system_messages when choosing which agents should speak next.

This is expected to simplify GroupChat’s job, improve orchestration, and make it easier to implement new GroupChat or GroupChat-like alternatives.

If you are a developer, and things were already working well for you, no action is needed -- backward compatibility is ensured because the description field defaults to the system_message when no description is provided.

However, if you were struggling with getting GroupChat to work, you can now try updating the description field.

Introduction

As AutoGen matures and developers build increasingly complex combinations of agents, orchestration is becoming an important capability. At present, GroupChat and the GroupChatManager are the main built-in tools for orchestrating conversations between 3 or more agents. For orchestrators like GroupChat to work well, they need to know something about each agent so that they can decide who should speak and when. Prior to AutoGen 0.2.2, GroupChat relied on each agent's system_message and name to learn about each participating agent. This is likely fine when the system prompt is short and sweet, but can lead to problems when the instructions are very long (e.g., with the AssistantAgent), or non-existent (e.g., with the UserProxyAgent).

AutoGen 0.2.2 introduces a description field to all agents, and replaces the use of the system_message for orchestration in GroupChat and all future orchestrators. The description field defaults to the system_message to ensure backwards compatibility, so you may not need to change anything with your code if things are working well for you. However, if you were struggling with GroupChat, give setting the description field a try.

The remainder of this post provides an example of how using the description field simplifies GroupChat's job, provides some evidence of its effectiveness, and provides tips for writing good descriptions.

Example

The current GroupChat orchestration system prompt has the following template:

You are in a role play game. The following roles are available:

{self._participant_roles(agents)}.

Read the following conversation.
Then select the next role from {[agent.name for agent in agents]} to play. Only return the role.

Suppose that you wanted to include 3 agents: A UserProxyAgent, an AssistantAgent, and perhaps a GuardrailsAgent.

Prior to 0.2.2, this template would expand to:

You are in a role play game. The following roles are available:

assistant: You are a helpful AI assistant.
Solve tasks using your coding and language skills.
In the following cases, suggest python code (in a python coding block) or shell script (in a sh coding block) for the user to execute.
1. When you need to collect info, use the code to output the info you need, for example, browse or search the web, download/read a file, print the content of a webpage or a file, get the current date/time, check the operating system. After sufficient info is printed and the task is ready to be solved based on your language skill, you can solve the task by yourself.
2. When you need to perform some task with code, use the code to perform the task and output the result. Finish the task smartly.
Solve the task step by step if you need to. If a plan is not provided, explain your plan first. Be clear which step uses code, and which step uses your language skill.
When using code, you must indicate the script type in the code block. The user cannot provide any other feedback or perform any other action beyond executing the code you suggest. The user can't modify your code. So do not suggest incomplete code which requires users to modify. Don't use a code block if it's not intended to be executed by the user.
If you want the user to save the code in a file before executing it, put # filename: <filename> inside the code block as the first line. Don't include multiple code blocks in one response. Do not ask users to copy and paste the result. Instead, use 'print' function for the output when relevant. Check the execution result returned by the user.
If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes. If the error can't be fixed or if the task is not solved even after the code is executed successfully, analyze the problem, revisit your assumption, collect additional info you need, and think of a different approach to try.
When you find an answer, verify the answer carefully. Include verifiable evidence in your response if possible.
Reply "TERMINATE" in the end when everything is done.
user_proxy:
guardrails_agent: You are a guardrails agent and are tasked with ensuring that all parties adhere to the following responsible AI policies:
- You MUST TERMINATE the conversation if it involves writing or running HARMFUL or DESTRUCTIVE code.
- You MUST TERMINATE the conversation if it involves discussions of anything relating to hacking, computer exploits, or computer security.
- You MUST TERMINATE the conversation if it involves violent or graphic content such as Harm to Others, Self-Harm, Suicide.
- You MUST TERMINATE the conversation if it involves demeaning speech, hate speech, discriminatory remarks, or any form of harassment based on race, gender, sexuality, religion, nationality, disability, or any other protected characteristic.
- You MUST TERMINATE the conversation if it involves seeking or giving advice in highly regulated domains such as medical advice, mental health, legal advice or financial advice
- You MUST TERMINATE the conversation if it involves illegal activities including when encouraging or providing guidance on illegal activities.
- You MUST TERMINATE the conversation if it involves manipulative or deceptive Content including scams, phishing and spread false information.
- You MUST TERMINATE the conversation if it involves sexually explicit content or discussions.
- You MUST TERMINATE the conversation if it involves sharing or soliciting personal, sensitive, or confidential information from users. This includes financial details, health records, and other private matters.
- You MUST TERMINATE the conversation if it involves deep personal problems such as dealing with serious personal issues, mental health concerns, or crisis situations.
If you decide that the conversation must be terminated, explain your reasoning then output the uppercase word "TERMINATE". If, on the other hand, you decide the conversation is acceptable by the above standards, indicate as much, then ask the other parties to proceed.

Read the following conversation.
Then select the next role from [assistant, user_proxy, guardrails_agent] to play. Only return the role.

As you can see, this expanded prompt is quite confusing:

  • It is hard to make out where each agent's role-description ends
  • The word "You" appears numerous times, and refers to three separate agents (the GroupChatManager, the AssistantAgent, and the GuardrailsAgent)
  • It takes a lot of tokens!

Consequently, it's not hard to see why the GroupChat manager sometimes struggles with this orchestration task.

With AutoGen 0.2.2 onward, GroupChat instead relies on the description field. With a description field the orchestration prompt becomes:

You are in a role play game. The following roles are available:

assistant: A helpful and general-purpose AI assistant that has strong language skills, Python skills, and Linux command line skills.
user_proxy: A user that can run Python code or input command line commands at a Linux terminal and report back the execution results.
guardrails_agent: An agent that ensures the conversation conforms to responsible AI guidelines.

Read the following conversation.
Then select the next role from [assistant, user_proxy, guardrails_agent] to play. Only return the role.

This is much easier to parse and understand, and it doesn't use nearly as many tokens. Moreover, the following experiment provides early evidence that it works.

An Experiment with Distraction

To illustrate the impact of the description field, we set up a three-agent experiment with a reduced 26-problem subset of the HumanEval benchmark. Here, three agents were added to a GroupChat to solve programming problems. The three agents were:

  • Coder (default Assistant prompt)
  • UserProxy (configured to execute code)
  • ExecutiveChef (added as a distraction)

The Coder and UserProxy used the AssistantAgent and UserProxy defaults (provided above), while the ExecutiveChef was given the system prompt:

You are an executive chef with 28 years of industry experience. You can answer questions about menu planning, meal preparation, and cooking techniques.

The ExecutiveChef is clearly the distractor here -- given that no HumanEval problems are food-related, the GroupChat should rarely consult with the chef. However, when configured with GPT-3.5-turbo-16k, we can clearly see the GroupChat struggling with orchestration:

With versions prior to 0.2.2, using system_message:

  • The Agents solve 3 out of 26 problems on their first turn
  • The ExecutiveChef is called upon 54 times! (almost as much as the Coder at 68 times)

With version 0.2.2, using description:

  • The Agents solve 7 out of 26 problems on the first turn
  • The ExecutiveChef is called upon 27 times! (versus 84 times for the Coder)

Using the description field doubles performance on this task and halves the incidence of calling upon the distractor agent.

Tips for Writing Good Descriptions

Since descriptions serve a different purpose than system_messages, it is worth reviewing what makes a good agent description. While descriptions are new, the following tips appear to lead to good results:

  • Avoid using the 1st or 2nd person perspective. Descriptions should not contain "I" or "You", unless perhaps "You" is in reference to the GroupChat / orchestrator
  • Include any details that might help the orchestrator know when to call upon the agent
  • Keep descriptions short (e.g., "A helpful AI assistant with strong natural language and Python coding skills.").

The main thing to remember is that the description is for the benefit of the GroupChatManager, not for the Agent's own use or instruction.
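
Putting these tips into practice, setting descriptions might look like the following sketch (llm_config is assumed to be defined elsewhere):

from autogen import AssistantAgent, UserProxyAgent

# Short, third-person descriptions written for the GroupChatManager, not for the agents themselves.
assistant = AssistantAgent(
    name="assistant",
    llm_config=llm_config,
    description="A helpful AI assistant with strong natural language and Python coding skills.",
)

user_proxy = UserProxyAgent(
    name="user_proxy",
    code_execution_config={"work_dir": "coding"},
    description="A user that can run Python code and report back the execution results.",
)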

Conclusion

AutoGen 0.2.2 introduces a description field, which becomes the main way agents describe themselves to orchestrators like GroupChat. Since the description defaults to the system_message, there's nothing you need to change if you were already satisfied with how your group chats were working. However, we expect this feature to generally improve orchestration, so please consider experimenting with the description field if you are struggling with GroupChat or want to boost performance.

· 10 min read
Victor Dibia
Gagan Bansal
Saleema Amershi

AutoGen Studio Playground View: Solving a task with multiple agents that generate a pdf document with images.

AutoGen Studio: Solving a task with multiple agents that generate a pdf document with images.

TL;DR

To help you rapidly prototype multi-agent solutions for your tasks, we are introducing AutoGen Studio, an interface powered by AutoGen. It allows you to:

  • Declaratively define and modify agents and multi-agent workflows through a point and click, drag and drop interface (e.g., you can select the parameters of two agents that will communicate to solve your task).
  • Use our UI to create chat sessions with the specified agents and view results (e.g., view chat history, generated files, and time taken).
  • Explicitly add skills to your agents and accomplish more tasks.
  • Publish your sessions to a local gallery.

See the official AutoGen Studio documentation here for more details.

The AutoGen Studio code is open source here, and it can be installed via pip. Give it a try!

pip install autogenstudio

Introduction

The accelerating pace of technology has ushered us into an era where digital assistants (or agents) are becoming integral to our lives. AutoGen has emerged as a leading framework for orchestrating the power of agents. In the spirit of expanding this frontier and democratizing this capability, we are thrilled to introduce a new user-friendly interface: AutoGen Studio.

With AutoGen Studio, users can rapidly create, manage, and interact with agents that can learn, adapt, and collaborate. As we release this interface into the open-source community, our ambition is not only to enhance productivity but to inspire a level of personalized interaction between humans and agents.

Note: AutoGen Studio is meant to help you rapidly prototype multi-agent workflows and demonstrate an example of end user interfaces built with AutoGen. It is not meant to be a production-ready app.

Getting Started with AutoGen Studio

The following guide will help you get AutoGen Studio up and running on your system.

Configuring an LLM Provider

To get started, you need access to a language model. You can get this set up by following the steps in the AutoGen documentation here. Configure your environment with either OPENAI_API_KEY or AZURE_OPENAI_API_KEY.

For example, in your terminal, you would set the API key like this:

export OPENAI_API_KEY=<your_api_key>

You can also specify the model directly in the agent's configuration as shown below.

llm_config = LLMConfig(
    config_list=[{
        "model": "gpt-4",
        "api_key": "<azure_api_key>",
        "base_url": "<azure api base url>",
        "api_type": "azure",
        "api_version": "2024-02-01",
    }],
    temperature=0,
)

Installation

There are two ways to install AutoGen Studio - from PyPi or from source. We recommend installing from PyPi unless you plan to modify the source code.

  1. Install from PyPi

    We recommend using a virtual environment (e.g., conda) to avoid conflicts with existing Python packages. With Python 3.10 or newer active in your virtual environment, use pip to install AutoGen Studio:

    pip install autogenstudio
  2. Install from Source

    Note: This approach requires some familiarity with building interfaces in React.

    If you prefer to install from source, ensure you have Python 3.10+ and Node.js (version above 14.15.0) installed. Here's how you get started:

    • Clone the AutoGen Studio repository and install its Python dependencies:

      pip install -e .
    • Navigate to the samples/apps/autogen-studio/frontend directory, install dependencies, and build the UI:

      npm install -g gatsby-cli
      npm install --global yarn
      yarn install
      yarn build

    Windows users may need the alternative frontend build commands provided in the AutoGen Studio README.

Running the Application

Once installed, run the web UI by entering the following in your terminal:

autogenstudio ui --port 8081

This will start the application on the specified port. Open your web browser and go to http://localhost:8081/ to begin using AutoGen Studio.

Now that you have AutoGen Studio installed and running, you are ready to explore its capabilities, including defining and modifying agent workflows, interacting with agents and sessions, and expanding agent skills.

What Can You Do with AutoGen Studio?

The AutoGen Studio UI is organized into three high-level sections: Build, Playground, and Gallery.

Build

Specify Agents.

This section focuses on defining the properties of agents and agent workflows. It includes the following concepts:

Skills: Skills are functions (e.g., Python functions) that describe how to solve a task. In general, a good skill has a descriptive name (e.g. generate_images), extensive docstrings and good defaults (e.g., writing out files to disk for persistence and reuse). You can add new skills to AutoGen Studio via the provided UI. At inference time, these skills are made available to the assistant agent as they address your tasks.

AutoGen Studio Build View: View, add, or edit skills that an agent can leverage in addressing tasks.
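
As an illustration, a well-formed skill might look like the hypothetical function below (this is not one of the built-in example skills; the name and behavior are illustrative):

def save_markdown_report(content: str, filename: str = "report.md") -> str:
    """
    Save a markdown report to disk so it persists beyond the chat session.

    Args:
        content: The markdown text to write.
        filename: Output path; defaults to 'report.md' in the working directory.

    Returns:
        The path of the file that was written.
    """
    with open(filename, "w", encoding="utf-8") as f:
        f.write(content)
    return filename

The descriptive name, docstring, and file-writing default follow the guidance above, making it easier for the assistant agent to discover and reuse the skill at inference time.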

Agents: This provides an interface to declaratively specify the properties of an AutoGen agent (mirroring most of the members of the base AutoGen ConversableAgent class).

Agent Workflows: An agent workflow is a specification of a set of agents that can work together to accomplish a task. The simplest version of this is a setup with two agents – a user proxy agent (which acts on behalf of the user, e.g., executing the code the assistant writes and reporting the results) and an assistant that can address task requests (e.g., generating plans, writing code, evaluating responses, proposing error recovery steps, etc.). A more complex flow could be a group chat where even more agents work towards a solution.
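
For reference, this simple two-agent workflow corresponds roughly to the following plain AutoGen code (a minimal sketch; the model, API key, and task are placeholders):

import autogen

config_list = [{"model": "gpt-4", "api_key": "<your_api_key>"}]

# Assistant agent that plans, writes code, and proposes fixes.
assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={"config_list": config_list, "temperature": 0},
)

# User proxy agent that executes the assistant's code and reports the results.
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

user_proxy.initiate_chat(assistant, message="Plot a sine wave and save it to sine.png")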

Playground

AutoGen Studio Playground View: Agents collaborate and use available skills (e.g., image generation) to address a user task (generating a PDF).

The playground section is focused on interacting with agent workflows defined in the previous build section. It includes the following concepts:

Session: A session refers to a period of continuous interaction or engagement with an agent workflow, typically characterized by a sequence of activities or operations aimed at achieving specific objectives. It includes the agent workflow configuration as well as the interactions between the user and the agents. A session can be “published” to a “gallery”.

Chat View: A chat is a sequence of interactions between a user and an agent. It is a part of a session.

Gallery

This section is focused on sharing and reusing artifacts (e.g., workflow configurations, sessions, etc.).

AutoGen Studio comes with 3 example skills: fetch_profile, find_papers, generate_images. Please feel free to review the repo to learn more about how they work.

The AutoGen Studio API

While AutoGen Studio is a web interface, it is powered by an underlying Python API that is reusable and modular. Importantly, we have implemented an API where agent workflows can be declaratively specified (in JSON), loaded, and run. An example of the current API is shown below. Please consult the AutoGen Studio repo for more details.

import json
from autogenstudio import AutoGenWorkFlowManager, AgentWorkFlowConfig

# load an agent specification in JSON
agent_spec = json.load(open('agent_spec.json'))

# Create an AutoGen Workflow Configuration from the agent specification
agent_work_flow_config = AgentWorkFlowConfig(**agent_spec)

# Create a Workflow from the configuration
agent_work_flow = AutoGenWorkFlowManager(agent_work_flow_config)

# Run the workflow on a task
task_query = "What is the height of the Eiffel Tower?"
agent_work_flow.run(message=task_query)
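
The exact schema of agent_spec.json is defined by AgentWorkFlowConfig in the AutoGen Studio repo; the sketch below only illustrates the general shape of such a file, and all field names and values are illustrative assumptions rather than the authoritative schema:

import json

# Illustrative only: consult AgentWorkFlowConfig in the AutoGen Studio repo
# for the actual field names and required structure.
example_spec = {
    "name": "general_workflow",
    "type": "default",
    "sender": {
        "type": "userproxy",
        "config": {"name": "user_proxy", "human_input_mode": "NEVER"},
    },
    "receiver": {
        "type": "assistant",
        "config": {"name": "assistant", "llm_config": {"config_list": [{"model": "gpt-4"}]}},
    },
}

with open("agent_spec.json", "w") as f:
    json.dump(example_spec, f, indent=2)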

Road Map and Next Steps

As we continue to develop and refine AutoGen Studio, the road map below outlines an array of enhancements and new features planned for future releases. Here's what users can look forward to:

  • Complex Agent Workflows: We're working on integrating support for more sophisticated agent workflows, such as GroupChat, allowing for richer interaction between multiple agents or dynamic topologies.
  • Improved User Experience: This includes features like streaming intermediate model output for real-time feedback, better summarization of agent responses, and information on the cost of each interaction. We will also invest in improving the workflow for composing and reusing agents, and explore support for more interactive human-in-the-loop feedback to agents.
  • Expansion of Agent Skills: We will work towards improving the workflow for authoring, composing and reusing agent skills.
  • Community Features: Facilitation of sharing and collaboration within AutoGen Studio user community is a key goal. We're exploring options for sharing sessions and results more easily among users and contributing to a shared repository of skills, agents, and agent workflows.

Contribution Guide

We welcome contributions to AutoGen Studio. We recommend the following general steps to contribute to the project:

  • Review the overall AutoGen project contribution guide.
  • Please review the AutoGen Studio roadmap to get a sense of the current priorities for the project. Help is appreciated especially with Studio issues tagged with help-wanted.
  • Please initiate a discussion on the roadmap issue or a new issue to discuss your proposed contribution.
  • Please review the AutoGen Studio dev branch (https://github.com/microsoft/autogen/tree/autogenstudio) and use it as a base for your contribution. This way, your contribution will be aligned with the latest changes in the AutoGen Studio project.
  • Submit a pull request with your contribution!
  • If you are modifying AutoGen Studio in VS Code, note that it has its own devcontainer to simplify development. See the instructions in .devcontainer/README.md on how to use it.
  • Please use the tag studio for any issues, questions, and PRs related to Studio.

FAQ

Q: Where can I adjust the default skills, agent and workflow configurations? A: You can modify agent configurations directly from the UI or by editing the autogenstudio/utils/dbdefaults.json file, which is used to initialize the database.

Q: If I want to reset the entire conversation with an agent, how do I go about it? A: To reset your conversation history, you can delete the database.sqlite file. If you need to clear user-specific data, remove the relevant autogenstudio/web/files/user/<user_id_md5hash> folder.

Q: Is it possible to view the output and messages generated by the agents during interactions? A: Yes, you can view the generated messages in the debug console of the web UI, providing insights into the agent interactions. Alternatively, you can inspect the database.sqlite file for a comprehensive record of messages.

Q: Where can I find documentation and support for AutoGen Studio? A: We are constantly working to improve AutoGen Studio. For the latest updates, please refer to the AutoGen Studio Readme. For additional support, please open an issue on GitHub or ask questions on Discord.

Q: Can I use other models with AutoGen Studio? A: Yes. AutoGen standardizes on the OpenAI API format, and you can use any API server that offers an OpenAI-compliant endpoint. In the AutoGen Studio UI, each agent has an llm_config field where you can input your model endpoint details, including model name, API key, base URL, model type, and API version. For Azure OpenAI models, you can find these details in the Azure portal. Note that for Azure OpenAI, the model name is the deployment ID or engine, and the model type is "azure". For other OSS models, we recommend using a server such as vLLM to instantiate an OpenAI-compliant endpoint.
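
For instance, an agent's llm_config for a locally hosted, OpenAI-compatible endpoint (such as one served by vLLM) might look like the sketch below; the model name, key, and URL are placeholders:

llm_config = LLMConfig(
    config_list=[{
        "model": "<model_name_served_by_your_endpoint>",
        "api_key": "EMPTY",                      # local servers often accept any placeholder key
        "base_url": "http://localhost:8000/v1",  # OpenAI-compatible endpoint exposed by vLLM
    }],
    temperature=0,
)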

Q: The server starts but I can't access the UI. A: If you are running the server on a remote machine (or a local machine that fails to resolve localhost correctly), you may need to specify the host address. By default, the host address is set to localhost. You can specify the host address using the --host <host> argument. For example, to start the server on port 8081 with a host address that makes it accessible from other machines on the network, run the following command:

autogenstudio ui --port 8081 --host 0.0.0.0