# AutoGen Agents in Production: Observability and Evaluation

In this guide, we learn how to **monitor the internal steps (traces) of [Autogen agents](https://github.com/microsoft/autogen)** and how to **evaluate their performance** using [Langfuse](https://langfuse.com).

The guide covers the **online** and **offline** evaluation metrics that teams use to bring agents to production quickly and reliably.

**Why AI agent evaluation matters:**
- Debugging issues when tasks fail or produce suboptimal results
- Monitoring costs and performance in real time
- Improving reliability and safety through continuous feedback


## Step 1: Set Environment Variables

Get your Langfuse API keys by signing up for [Langfuse Cloud](https://cloud.langfuse.com/) or by [self-hosting Langfuse](https://langfuse.com/self-hosting).

_**Note:** If you self-host, you can use the [Terraform modules](https://langfuse.com/self-hosting/azure) to deploy Langfuse on Azure. Alternatively, you can deploy Langfuse on Kubernetes using the [Helm chart](https://langfuse.com/self-hosting/kubernetes-helm)._


In [5]:
import os

# Get keys for your project from the project settings page: https://cloud.langfuse.com
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..." 
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..." 
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com" # 🇪🇺 EU region
# os.environ["LANGFUSE_HOST"] = "https://us.cloud.langfuse.com" # 🇺🇸 US region

With the environment variables set, we can now initialize the Langfuse client. `get_client()` initializes the Langfuse client using the credentials provided in the environment variables; the cell below calls the `Langfuse()` constructor directly so that it can also pass the `blocked_instrumentation_scopes` option.
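
Before creating the client, it can help to confirm that the variables from the previous cell are actually set. The following is a minimal sketch using only the standard library; the helper name `missing_langfuse_vars` is hypothetical and not part of the Langfuse SDK.

```python
import os

# Variable names taken from the cell above; the helper itself is hypothetical.
REQUIRED_LANGFUSE_VARS = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_HOST")


def missing_langfuse_vars(env=None):
    """Return the required variable names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_LANGFUSE_VARS if not env.get(name)]


missing = missing_langfuse_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All Langfuse environment variables are set.")
```

A check like this tends to give a clearer error than a failed `auth_check()` when a key was simply never exported.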


In [6]:
from langfuse import Langfuse
 
# Filter out Autogen OpenTelemetry spans
langfuse = Langfuse(
    blocked_instrumentation_scopes=["autogen SingleThreadedAgentRuntime"]
)
 
# Verify connection
if langfuse.auth_check():
    print("Langfuse client is authenticated and ready!")
else:
    print("Authentication failed. Please check your credentials and host.")

Langfuse client is authenticated and ready!


## Step 2: Initialize the OpenLit Instrumentation

Now we initialize the [OpenLit](https://github.com/openlit/openlit) instrumentation. OpenLit automatically captures AutoGen operations and exports OpenTelemetry (OTel) spans to Langfuse.


In [7]:
import openlit
 
# Initialize OpenLIT instrumentation. The disable_batch flag is set to true to process traces immediately.
openlit.init(tracer=langfuse._otel_tracer, disable_batch=True, disabled_instrumentors=["mistral"])

## Step 3: Run Your Agent

Now we set up a multi-turn agent to test the instrumentation.


In [2]:
import os

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.azure import AzureAIChatCompletionClient
from azure.core.credentials import AzureKeyCredential
from autogen_agentchat.base import TaskResult

from autogen_agentchat.conditions import TextMentionTermination
from autogen_agentchat.teams import RoundRobinGroupChat

In [3]:
client = AzureAIChatCompletionClient(
    model="gpt-4o-mini",
    endpoint="https://models.inference.ai.azure.com",
    # To authenticate with the model you will need to generate a personal access token (PAT) in your GitHub settings.
    # Create your PAT token by following instructions here: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens
    credential=AzureKeyCredential(os.environ["GITHUB_TOKEN"]),
    model_info={
        "json_output": True,
        "function_calling": True,
        "vision": True,
        "family": "unknown",
        "structured_output": False
    },
)

In [8]:
# 🍴 Agent 1 – proposes ONE healthy meal idea each turn
meal_planner_agent = AssistantAgent(
    "meal_planner_agent",
    model_client=client,
    description="A seasoned meal-planning coach who suggests balanced meals.",
    system_message="""
    You are a Meal-Planning Assistant with a decade of experience helping busy people prepare meals.
    Goal: propose the single best meal (breakfast, lunch, or dinner) given the user's context.
    Each response must contain ONLY one complete meal idea (title + very brief component list), no extras.
    Keep it concise: skip greetings, chit-chat, and filler.
    """,
)

# 🥗 Agent 2 – checks nutritional quality & variety
nutritionist_agent = AssistantAgent(
    "nutritionist_agent",
    model_client=client,
    description="A registered dietitian ensuring meals meet nutritional standards.",
    system_message="""
    You are a Nutritionist focused on whole-food, macro-balanced eating.
    Evaluate the meal_planner_agent's recommendation.
    If the meal is nutritionally sound, sufficiently varied, and portion-appropriate, respond with 'APPROVE'.
    Otherwise, give high-level guidance on how to improve it (e.g. 'add a plant-based protein'); do NOT provide a full alternative recipe.
    """,
)

In [9]:
# ✅ Chat stops once the nutritionist says APPROVE
termination = TextMentionTermination("APPROVE")

# 🔄 Alternate turns between the two agents until termination
team = RoundRobinGroupChat(
    [meal_planner_agent, nutritionist_agent],
    termination_condition=termination,
)

# Example kickoff
user_input = "I'm looking for a quick, delicious dinner I can prep after work. I have 30 minutes and minimal clean-up is ideal."

In [None]:
with langfuse.start_as_current_span(name="create_meal_plan") as span:
    async for message in team.run_stream(task=user_input):
        if isinstance(message, TaskResult):
            print("Stop Reason:", message.stop_reason)
        else:
            print(message)

    span.update_trace(
        input=user_input,
        output=message.stop_reason,
    )

# Flush the trace to Langfuse for short-lived environments such as Jupyter Notebooks
langfuse.flush()

### Trace Structure

Langfuse records a **trace** that contains **spans**, each representing one step of the agent's logic. In this case, the trace contains the overall agent run plus sub-spans for:
- The meal planner agent
- The nutritionist agent

Inspecting these shows exactly where time was spent, how many tokens were used, and so on:

![Trace tree in Langfuse](https://langfuse.com/images/cookbook/example-autogen-evaluation/trace-tree.png)

_[Link to the trace](https://cloud.langfuse.com/project/cloramnkj0002jz088vzn1ja4/traces/dac2b33e7cd709e685ccf86a137ecc64)_


## Online Evaluation

Online evaluation means evaluating the agent in a live, real-world environment, that is, during actual use. It involves monitoring the agent's performance on real user interactions and continuously analyzing the outcomes.

### Common Metrics to Track in Production

1. **Costs**: The instrumentation captures token usage, which you can convert into approximate costs by assigning a price per token.
2. **Latency**: Observe how long each step, or the entire run, takes to complete.
3. **User feedback**: Users can give direct feedback (for example, thumbs up/down) to help refine or correct the agent.
4. **LLM-as-a-Judge**: Use a separate LLM to evaluate the agent's output in near real time (for example, checking for toxicity or correctness).

Below, we show examples of these metrics.


#### 1. Costs

The screenshot below shows usage for the `gpt-4o-mini` calls. This is useful for identifying costly steps and optimizing your agent.

![Costs](https://langfuse.com/images/cookbook/example-autogen-evaluation/gpt-4o-costs.png)

_[Link to the trace](https://cloud.langfuse.com/project/cloramnkj0002jz088vzn1ja4/traces/dac2b33e7cd709e685ccf86a137ecc64)_
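
To make the "price per token" idea concrete, here is a minimal sketch that turns token counts into an approximate cost. The `TokenPrice` class, the `estimate_cost` helper, and the prices themselves are placeholders for illustration, not real pricing; the token counts mirror the `RequestUsage(prompt_tokens=..., completion_tokens=...)` values that AutoGen prints for each message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Price in USD per one million tokens (placeholder values, not real pricing)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(prompt_tokens: int, completion_tokens: int, price: TokenPrice) -> float:
    """Approximate cost in USD of one model call from its token counts."""
    return (
        prompt_tokens * price.input_per_million
        + completion_tokens * price.output_per_million
    ) / 1_000_000


# Hypothetical price table; look up the current rates for your model and provider.
example_price = TokenPrice(input_per_million=1.0, output_per_million=2.0)

# (prompt_tokens, completion_tokens) pairs for two model calls.
calls = [(95, 30), (132, 4)]
total = sum(estimate_cost(p, c, example_price) for p, c in calls)
print(f"Estimated cost: ${total:.6f}")
```

Langfuse can perform this conversion for you when it knows the model's pricing; the sketch only shows the arithmetic behind it.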


#### 2. Latency

We can also see how long each step took to complete. In the example below, the entire run took about 3 seconds, which we can break down by step. This helps identify bottlenecks and optimize the agent.

![Latency](https://langfuse.com/images/cookbook/example-autogen-evaluation/agent-latency.png)

_[Link to the trace](https://cloud.langfuse.com/project/cloramnkj0002jz088vzn1ja4/traces/dac2b33e7cd709e685ccf86a137ecc64?display=timeline)_
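
If you only have the printed messages at hand, a rough per-step latency can also be derived from their `created_at` timestamps. The sketch below assumes that the gap between consecutive messages approximates the time the responding agent took; the helper name `step_latencies` is hypothetical, and the timestamps are copied from the printed output of the potato example later in this guide.

```python
from datetime import datetime, timezone


def step_latencies(timestamps):
    """Return the seconds elapsed between each pair of consecutive timestamps."""
    return [
        (later - earlier).total_seconds()
        for earlier, later in zip(timestamps, timestamps[1:])
    ]


# created_at of: user message, meal planner reply, nutritionist reply
created_at = [
    datetime(2025, 7, 2, 16, 20, 43, 732669, tzinfo=timezone.utc),
    datetime(2025, 7, 2, 16, 20, 45, 186423, tzinfo=timezone.utc),
    datetime(2025, 7, 2, 16, 20, 45, 581059, tzinfo=timezone.utc),
]

steps = ["meal_planner_agent", "nutritionist_agent"]
for step, seconds in zip(steps, step_latencies(created_at)):
    print(f"{step}: {seconds:.3f} s")
```

The span timings in Langfuse are more precise, since they measure each call directly rather than inferring it from message creation times.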


#### 3. User Feedback

If your agent is embedded in a user interface, you can record direct user feedback (such as a thumbs up/down button in a chat UI).


In [10]:
from langfuse import get_client
 
langfuse = get_client()
 
# Option 1: Use the yielded span object from the context manager
with langfuse.start_as_current_span(
    name="autogen-request-user-feedback-1") as span:
    
    async for message in team.run_stream(task="Create a meal with potatoes"):
        if isinstance(message, TaskResult):
            print("Stop Reason:", message.stop_reason)
        else:
            print(message)
 
    # Score using the span object
    span.score_trace(
        name="user-feedback",
        value=1,
        data_type="NUMERIC",
        comment="This was delicious, thank you"
    )
 
# Option 2: Use langfuse.score_current_trace() if still in context
with langfuse.start_as_current_span(name="autogen-request-user-feedback-2") as span:
    # ... Autogen execution ...

    async for message in team.run_stream(task="I am allergic to gluten."):
        if isinstance(message, TaskResult):
            print("Stop Reason:", message.stop_reason)
        else:
            print(message)
 
    # Score using current context
    langfuse.score_current_trace(
        name="user-feedback",
        value=1,
        data_type="NUMERIC"
    )

id='da068880-22ae-4f01-9f01-2bb231939089' source='user' models_usage=None metadata={} created_at=datetime.datetime(2025, 7, 2, 16, 20, 43, 732669, tzinfo=datetime.timezone.utc) content='Create a meal with potatoes' type='TextMessage'
id='ad937ce4-3534-493f-824b-ca9c226b5287' source='meal_planner_agent' models_usage=RequestUsage(prompt_tokens=95, completion_tokens=30) metadata={} created_at=datetime.datetime(2025, 7, 2, 16, 20, 45, 186423, tzinfo=datetime.timezone.utc) content='Potato and Spinach Frittata  \n- Eggs  \n- Potatoes  \n- Fresh spinach  \n- Onion  \n- Cheese (optional)  ' type='TextMessage'
id='50fd33c1-057f-49fe-afad-ee86d164296d' source='nutritionist_agent' models_usage=RequestUsage(prompt_tokens=132, completion_tokens=4) metadata={} created_at=datetime.datetime(2025, 7, 2, 16, 20, 45, 581059, tzinfo=datetime.timezone.utc) content='APPROVE' type='TextMessage'
Stop Reason: Text 'APPROVE' mentioned
id='e371de6c-e5fc-42c1-8eda-e5b8cd5accab' source='user' models_usage=None met

In [None]:
# Option 3: Use create_score() with trace ID (when outside context)
langfuse.create_score(
    trace_id="predefined_trace_id",
    name="user-feedback",
    value=1,
    data_type="NUMERIC",
    comment="This was correct, thank you"
)

User feedback is then captured in Langfuse:

![User feedback captured in Langfuse](https://langfuse.com/images/cookbook/example-autogen-evaluation/user-feedback.png)
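
As a small sketch of how a UI click might be turned into such a score, the helper below maps a thumbs up/down to the `name`, `value`, and `data_type` arguments used in the scoring calls above. The name `feedback_to_score` is hypothetical, and mapping thumbs down to `0` is an assumption; use whatever scale fits your analysis.

```python
def feedback_to_score(thumbs_up: bool, comment: str = "") -> dict:
    """Translate a thumbs-up/down click into keyword arguments for a score call."""
    score = {
        "name": "user-feedback",
        "value": 1 if thumbs_up else 0,  # assumed scale: 1 = positive, 0 = negative
        "data_type": "NUMERIC",
    }
    if comment:
        score["comment"] = comment
    return score


print(feedback_to_score(True, "This was delicious, thank you"))
```

The resulting dictionary could then be unpacked into one of the calls shown earlier, for example `span.score_trace(**feedback_to_score(True))`.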


#### 4. Automated LLM-as-a-Judge Scoring

LLM-as-a-Judge is another way to automatically evaluate your agent's output. You can set up a separate LLM call to assess the output's correctness, toxicity, style, or any other criterion you care about.

**Workflow**:
1. You define an **Evaluation Template**, for example: "Check whether the text is toxic."
2. You choose a model to act as the judge; in this case `gpt-4o-mini`, queried through Azure.
3. Each time your agent generates output, you pass that output to the "judge" LLM together with the template.
4. The judge LLM responds with a rating or label, which you log in your observability tool.
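
The workflow above can be sketched in a few lines. This is not how Langfuse implements its evaluators; it is a minimal illustration in which the template text, the `judge_output` helper, and the `fake_judge` stand-in (used so the code runs without a model) are all assumptions.

```python
from typing import Callable

# Hypothetical evaluation template; adapt the wording and labels to your own criteria.
TOXICITY_TEMPLATE = (
    "Check whether the following text is toxic.\n"
    "Answer with exactly one word: 'toxic' or 'not_toxic'.\n\n"
    "Text:\n{output}"
)

LABEL_VALUES = {"toxic": 1.0, "not_toxic": 0.0}


def judge_output(agent_output: str, call_judge: Callable[[str], str]) -> dict:
    """Send the agent output plus template to a judge model and parse its label."""
    prompt = TOXICITY_TEMPLATE.format(output=agent_output)
    raw = call_judge(prompt).strip().lower()
    label = raw if raw in LABEL_VALUES else "unparseable"
    # value is None when the judge did not return one of the expected labels
    return {"name": "toxicity", "label": label, "value": LABEL_VALUES.get(label)}


def fake_judge(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return "not_toxic"


print(judge_output("Potato and Spinach Frittata", fake_judge))
```

In practice `call_judge` would wrap a real model call, and the returned dictionary would be logged as a score on the trace.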

Example from Langfuse:

![LLM-as-a-Judge Evaluator](https://langfuse.com/images/cookbook/example-autogen-evaluation/evaluator.png)


In [12]:
picky_eater_task = "I am a picky eater and not sure if you find something for me."

with langfuse.start_as_current_span(name="autogen-request-user-feedback-2") as span:

    async for message in team.run_stream(task=picky_eater_task):
        if isinstance(message, TaskResult):
            print("Stop Reason:", message.stop_reason)
        else:
            print(message)

    # Record the task that was actually sent, not the earlier `user_input`
    span.update_trace(
        input=picky_eater_task,
        output=message.stop_reason,
    )

langfuse.flush()

id='eefc628d-502f-451a-8f70-be486f62f8c5' source='user' models_usage=None metadata={} created_at=datetime.datetime(2025, 7, 2, 16, 38, 29, 171393, tzinfo=datetime.timezone.utc) content='I am a picky eater and not sure if you find something for me.' type='TextMessage'
id='13b3e14b-bcf7-42a5-80d6-54b0c7be765e' source='meal_planner_agent' models_usage=RequestUsage(prompt_tokens=352, completion_tokens=27) metadata={} created_at=datetime.datetime(2025, 7, 2, 16, 38, 30, 433516, tzinfo=datetime.timezone.utc) content='Chicken Alfredo Pasta  \n- Gluten-free pasta  \n- Grilled chicken breast  \n- Heavy cream  \n- Parmesan cheese  \n- Garlic  ' type='TextMessage'
id='550f2dee-0e08-4bbd-b67f-991b467328f1' source='nutritionist_agent' models_usage=RequestUsage(prompt_tokens=386, completion_tokens=17) metadata={} created_at=datetime.datetime(2025, 7, 2, 16, 38, 31, 505173, tzinfo=datetime.timezone.utc) content='Consider incorporating some vegetables, like spinach or broccoli, to increase the nutrien

The response in the example below is judged as not toxic.

![LLM-as-a-Judge Evaluation Score](https://langfuse.com/images/cookbook/example-autogen-evaluation/llm-as-a-judge-score.png)


#### 5. Observability Metrics Overview

All of these metrics can be visualized together in dashboards. This lets you quickly see how your agent performs across many sessions and helps you track quality metrics over time.

![Observability metrics overview](https://langfuse.com/images/cookbook/example-autogen-evaluation/dashboard.png)


## Offline Evaluation

Online evaluation is essential for live feedback, but you also need **offline evaluation**: systematic checks before or during development. This helps maintain quality and reliability before changes are rolled out to production.


### Dataset Evaluation

In offline evaluation, you typically:
1. Have a benchmark dataset (with pairs of questions and expected answers)
2. Run your agent on that dataset
3. Compare the outputs to the expected answers, or use an additional scoring mechanism
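
The three steps above can be sketched without any external services. The toy dataset, the `toy_agent` stand-in, and the lenient substring comparison are all made up for illustration; a real evaluation would call your agent and would likely use a stronger scoring method, such as the LLM-as-a-Judge approach described earlier.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace for a lenient comparison."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())


def evaluate(dataset, agent):
    """Run `agent` on every item; return (accuracy, per-item results)."""
    results = []
    for item in dataset:
        answer = agent(item["question"])
        # Counts as correct when the expected answer appears in the agent's output
        correct = normalize(item["expected_answer"]) in normalize(answer)
        results.append({"question": item["question"], "answer": answer, "correct": correct})
    accuracy = sum(r["correct"] for r in results) / len(results) if results else 0.0
    return accuracy, results


# Toy reference set and stand-in agent, both made up for illustration.
toy_dataset = [
    {"question": "What is the capital of France?", "expected_answer": "Paris"},
    {"question": "What is 2 + 2?", "expected_answer": "4"},
]


def toy_agent(question: str) -> str:
    return "The capital of France is Paris." if "France" in question else "I am not sure."


accuracy, results = evaluate(toy_dataset, toy_agent)
print(f"Accuracy: {accuracy:.2f}")
```

Substring matching is deliberately simple here; it breaks down for paraphrased answers, which is one reason model-based scoring is often preferred.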

Below, we demonstrate this approach with the [q&a-dataset](https://huggingface.co/datasets/junzhang1207/search-dataset), which contains questions and expected answers.


In [16]:
import pandas as pd
from datasets import load_dataset
 
# Fetch search-dataset from Hugging Face
dataset = load_dataset("junzhang1207/search-dataset", split = "train")
df = pd.DataFrame(dataset)
print("First few rows of search-dataset:")
print(df.head())


First few rows of search-dataset:
                                     id  \
0  20caf138-0c81-4ef9-be60-fe919e0d68d4   
1  1f37d9fd-1bcc-4f79-b004-bc0e1e944033   
2  76173a7f-d645-4e3e-8e0d-cca139e00ebe   
3  5f5ef4ca-91fe-4610-a8a9-e15b12e3c803   
4  64dbed0d-d91b-4acd-9a9c-0a7aa83115ec   

                                            question  \
0                 steve jobs statue location budapst   
1  Why is the Battle of Stalingrad considered a t...   
2  In what year did 'The Birth of a Nation' surpa...   
3  How many Russian soldiers surrendered to AFU i...   
4   What event led to the creation of Google Images?   

                                     expected_answer       category       area  
0  The Steve Jobs statue is located in Budapest, ...           Arts  Knowledge  
1  The Battle of Stalingrad is considered a turni...   General News       News  
2  This question is based on a false premise. 'Th...  Entertainment       News  
3  About 300 Russian soldiers surrendered to t

Next, we create a dataset entity in Langfuse to track the runs. Then we add each item from the dataset to it.


In [17]:
from langfuse import Langfuse
langfuse = Langfuse()
 
langfuse_dataset_name = "qa-dataset_autogen-agent"
 
# Create a dataset in Langfuse
langfuse.create_dataset(
    name=langfuse_dataset_name,
    description="q&a dataset uploaded from Hugging Face",
    metadata={
        "date": "2025-03-21",
        "type": "benchmark"
    }
)

Dataset(id='cmcm7524d00kjad07s2cjwqcf', name='qa-dataset_autogen-agent', description='q&a dataset uploaded from Hugging Face', metadata={'date': '2025-03-21', 'type': 'benchmark'}, project_id='cloramnkj0002jz088vzn1ja4', created_at=datetime.datetime(2025, 7, 2, 16, 54, 7, 357000, tzinfo=datetime.timezone.utc), updated_at=datetime.datetime(2025, 7, 2, 16, 54, 7, 357000, tzinfo=datetime.timezone.utc))

In [18]:
df_25 = df.sample(25) # For this example, we upload only 25 dataset questions

for idx, row in df_25.iterrows():
    langfuse.create_dataset_item(
        dataset_name=langfuse_dataset_name,
        input={"text": row["question"]},
        expected_output={"text": row["expected_answer"]}
    )

![Dataset items in Langfuse](https://langfuse.com/images/cookbook/example-autogen-evaluation/example-dataset.png)


#### Running the Agent on the Dataset

First, we assemble a simple Autogen agent that answers questions using Azure OpenAI models.


In [8]:
import os
from dotenv import load_dotenv

from autogen_agentchat.agents import AssistantAgent
from autogen_core.models import UserMessage
from autogen_ext.models.azure import AzureAIChatCompletionClient
from azure.core.credentials import AzureKeyCredential
from autogen_core import CancellationToken
from autogen_agentchat.messages import TextMessage

In [None]:
load_dotenv()
client = AzureAIChatCompletionClient(
    model="gpt-4o",
    endpoint="https://models.inference.ai.azure.com",
    # To authenticate with the model you will need to generate a personal access token (PAT) in your GitHub settings.
    # Create your PAT token by following instructions here: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens
    credential=AzureKeyCredential(os.getenv("GITHUB_TOKEN")),
    max_tokens=5000,
    model_info={
        "json_output": True,
        "function_calling": False,
        "vision": False,
        "family": "unknown",
        "structured_output": True,
    },
)

result = await client.create([UserMessage(content="What is the capital of France?", source="user")])
print(result)

In [18]:
agent = AssistantAgent(
    name="assistant",
    model_client=client,
    tools=[],
    system_message="You are a participant in a quiz show and you are given a question. You need to create a short answer to the question.",
)

Next, we define a helper function, `my_agent()`.


In [19]:
async def my_agent(user_query: str):

    with langfuse.start_as_current_span(name="autogen-trace") as span:

        # Execute the agent response
        response = await agent.on_messages(
            [TextMessage(content=user_query, source="user")],
            cancellation_token=CancellationToken(),
        )

        span.update_trace(
            input=user_query,
            output=response.chat_message.content,
        )

    return str(response.chat_message.content)

# Test the function
await my_agent("What is the capital of France?")

'The capital of France is Paris.'

Finally, we loop over each dataset item, run the agent, and link the trace to the dataset item. If desired, we can also attach a quick evaluation score.


In [20]:
dataset_name = "qa-dataset_autogen-agent"
current_run_name = "dev_tasks_run-autogen_gpt-4.1" # Identifies this specific evaluation run
current_run_metadata={"model_provider": "Azure", "model": "gpt-4.1"}
current_run_description="Evaluation run for Autogen model on July 3rd"

dataset = langfuse.get_dataset(dataset_name)

for item in dataset.items:
    print(f"Running evaluation for item: {item.id} (Input: {item.input})")
 
    # Use the item.run() context manager
    with item.run(
        run_name=current_run_name,
        run_metadata=current_run_metadata,
        run_description=current_run_description
    ) as root_span: 
        # All subsequent langfuse operations within this block are part of this trace.
        generated_answer = await my_agent(user_query = item.input["text"])
    
    print("Generated Answer: ", generated_answer)
 
print(f"\nFinished processing dataset '{dataset_name}' for run '{current_run_name}'.")

langfuse.flush()

Running evaluation for item: 09810cc4-9992-4712-a3b2-7224da31776a (Input: {'text': 'In Hindu mythology, which deity is the Ganges river dolphin associated with?'})
Generated Answer:  In Hindu mythology, the Ganges river dolphin is associated with the deity Ganga.
Running evaluation for item: bb113f94-7723-47c6-8c34-59d883044514 (Input: {'text': 'What significant discovery did the LHCb collaboration report in 2015?'})
Generated Answer:  In 2015, the LHCb collaboration reported the discovery of pentaquark particles.
Running evaluation for item: 4d8ae54e-ceab-46d0-ad2c-6e8e223589a9 (Input: {'text': 'What is the MÄ\x81ori name for the red-crowned parakeet?'})
Generated Answer:  The Māori name for the red-crowned parakeet is kākāriki.
Running evaluation for item: 21e5a0d5-f619-4a73-868e-9955053b3e72 (Input: {'text': 'Who starred in the 1978 television film adaptation of Les MisÃ©rables?'})
Generated Answer:  Richard Jordan starred as Jean Valjean in the 1978 television film adaptation

You can repeat this process with different agent configurations, such as:
- Models (gpt-4o-mini, gpt-4.1, etc.)
- Prompts
- Tools (search vs. no search)
- Agent complexity (multi-agent vs. single agent)

Then compare them side by side in Langfuse. In this example, I ran the agent three times on the 25 dataset questions, using a different Azure OpenAI model for each run. As expected, the number of correctly answered questions improves with a larger model. The `correct_answer` score is created by an [LLM-as-a-Judge Evaluator](https://langfuse.com/docs/scores/model-based-evals) that is configured to judge the correctness of each answer against the sample answer given in the dataset.

![Dataset run overview](https://langfuse.com/images/cookbook/example-autogen-evaluation/dataset_runs.png)
![Dataset run comparison](https://langfuse.com/images/cookbook/example-autogen-evaluation/dataset-run-comparison.png)



---

**Disclaimer**:  
This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.
