pandas_ext Basics¶
This notebook explains the core pandas_ext flow in order: input, outputs, and practical benefits.
1. Setup¶
Configure authentication and set a default responses model.
In [1]:
import os
import pandas as pd
from pydantic import BaseModel, Field
from openaivec import pandas_ext

assert os.getenv("OPENAI_API_KEY") or os.getenv("AZURE_OPENAI_BASE_URL"), (
    "Set OPENAI_API_KEY or Azure OpenAI environment variables before running this notebook."
)

pandas_ext.set_responses_model("gpt-5.2")
2. Input: pandas DataFrame¶
The main input is a normal DataFrame or Series.
In [2]:
fruits_df = pd.DataFrame({
    "name": ["apple", "banana", "cherry", "orange"],
    "price_usd": [1.2, 0.8, 2.5, 1.1],
})
fruits_df
Out[2]:
| | name | price_usd |
|---|---|---|
| 0 | apple | 1.2 |
| 1 | banana | 0.8 |
| 2 | cherry | 2.5 |
| 3 | orange | 1.1 |
3. Output A: plain-text responses¶
Use .ai.responses() when you want one text output per row.
In [3]:
name_fr = fruits_df["name"].ai.responses(
    "Translate this fruit name to French.",
    batch_size=16,
    show_progress=True,
)
fruits_df.assign(name_fr=name_fr)
Processing batches: 0%| | 0/4 [00:00<?, ?item/s]
Out[3]:
| | name | price_usd | name_fr |
|---|---|---|---|
| 0 | apple | 1.2 | pomme |
| 1 | banana | 0.8 | banane |
| 2 | cherry | 2.5 | cerise |
| 3 | orange | 1.1 | orange |
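The `batch_size` argument controls how many rows are grouped into each request. A minimal sketch of that grouping idea, using plain pandas (this illustrates the concept only, not openaivec's internal batching logic):

```python
import pandas as pd

def iter_batches(series: pd.Series, batch_size: int):
    """Yield consecutive slices of at most batch_size rows (illustrative helper, not part of openaivec)."""
    for start in range(0, len(series), batch_size):
        yield series.iloc[start:start + batch_size]

names = pd.Series(["apple", "banana", "cherry", "orange"], name="name")

# With batch_size=2, four rows are grouped into two batches.
batches = list(iter_batches(names, batch_size=2))
print([b.tolist() for b in batches])  # [['apple', 'banana'], ['cherry', 'orange']]
```

Fewer, larger batches mean fewer round trips; smaller batches give finer-grained progress reporting.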
4. Output B: structured responses¶
Use response_format when you want typed, structured output.
In [4]:
class FruitFacts(BaseModel):
    family: str = Field(description="Botanical family name")
    color: str = Field(description="Typical fruit color")
    short_note: str = Field(description="Short one-line note")

facts = fruits_df["name"].ai.responses(
    "Return botanical family, typical color, and a short note for this fruit.",
    response_format=FruitFacts,
    batch_size=16,
    show_progress=True,
)
facts.head()
Processing batches: 0%| | 0/4 [00:00<?, ?item/s]
Out[4]:
0    family='Rosaceae' color='Red, green, or yellow...
1    family='Musaceae' color='Yellow (ripe), green ...
2    family='Rosaceae' color='Red to deep purple' s...
3    family='Rutaceae' color='Orange' short_note='A...
Name: name, dtype: object
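Each element of the resulting Series is a `FruitFacts` instance, so the schema is enforced by Pydantic. A minimal sketch of that validation behavior, independent of any API call:

```python
from pydantic import BaseModel, Field, ValidationError

class FruitFacts(BaseModel):
    family: str = Field(description="Botanical family name")
    color: str = Field(description="Typical fruit color")
    short_note: str = Field(description="Short one-line note")

# A well-formed payload parses into a typed object.
ok = FruitFacts.model_validate(
    {"family": "Rosaceae", "color": "Red", "short_note": "A pome fruit."}
)
print(ok.family)  # Rosaceae

# A payload missing required fields fails loudly instead of silently.
try:
    FruitFacts.model_validate({"family": "Rosaceae"})
except ValidationError as exc:
    print(len(exc.errors()))  # 2 (color and short_note are missing)
```

This is why `response_format` outputs are safe to feed into downstream pandas code: malformed responses raise instead of producing ragged rows.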
5. Expand structured output into columns¶
Use .ai.extract() to expand each Pydantic object into one column per field, prefixed with the Series name.
In [5]:
facts_df = facts.rename("facts").ai.extract()
fruits_df.join(facts_df)
Out[5]:
| | name | price_usd | facts_family | facts_color | facts_short_note |
|---|---|---|---|---|---|
| 0 | apple | 1.2 | Rosaceae | Red, green, or yellow | A pome fruit from Malus domestica, widely eate... |
| 1 | banana | 0.8 | Musaceae | Yellow (ripe), green (unripe) | An elongated berry from Musa species; typicall... |
| 2 | cherry | 2.5 | Rosaceae | Red to deep purple | A stone fruit (drupe) from Prunus species, kno... |
| 3 | orange | 1.1 | Rutaceae | Orange | A citrus hesperidium (Citrus × sinensis), priz... |
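Conceptually, the expansion above is like dumping each Pydantic model to a dict and prefixing the columns with the Series name. A manual sketch of that equivalent (illustrative only, not openaivec's implementation):

```python
import pandas as pd
from pydantic import BaseModel

class FruitFacts(BaseModel):
    family: str
    color: str
    short_note: str

# Stand-in for the Series returned by .ai.responses(..., response_format=FruitFacts).
facts = pd.Series(
    [
        FruitFacts(family="Rosaceae", color="Red", short_note="A pome fruit."),
        FruitFacts(family="Musaceae", color="Yellow", short_note="A berry."),
    ],
    name="facts",
)

# One column per model field, prefixed with the Series name.
expanded = pd.DataFrame([f.model_dump() for f in facts]).add_prefix("facts_")
print(expanded.columns.tolist())  # ['facts_family', 'facts_color', 'facts_short_note']
```

Using .ai.extract() saves this boilerplate and keeps the row index aligned for a clean join back onto the source DataFrame.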
6. Benefits¶
Main input

- A pandas Series or DataFrame
- One prompt, optionally with a response_format schema

Main outputs

- Plain-text outputs (.ai.responses)
- Structured typed outputs (response_format=...)
- Expandable columns via .ai.extract()

Why this is useful

- Keeps LLM processing inside familiar pandas workflows
- Supports structured validation with Pydantic models
- Provides batching controls (batch_size) for larger datasets