Intelligent Fill with the ai.fillna Method
This notebook walks through .ai.fillna() in order: the input it expects, the output it returns, and the practical benefits.
1. Setup
Configure authentication and choose a responses model.
In [1]:
import os
import pandas as pd
from openaivec import pandas_ext
# Fail early if no credentials are configured.
assert os.getenv("OPENAI_API_KEY") or os.getenv("AZURE_OPENAI_BASE_URL"), (
    "Set OPENAI_API_KEY or Azure OpenAI environment variables before running this notebook."
)
# Choose the responses model used by the pandas extension.
pandas_ext.set_responses_model("gpt-4.1-mini")
2. Input: DataFrame and target column
The input is a normal DataFrame with missing values in one target column.
In [2]:
products = pd.DataFrame({
    "product": ["Wireless Earbuds", "Yoga Mat", "Desk Lamp", "Water Bottle", "Running Shoes"],
    "category": ["Electronics", "Fitness", "Home", "Fitness", "Footwear"],
    "price_usd": [129.0, 35.0, 48.0, 20.0, 95.0],
    "marketing_copy": [
        "Immersive sound with all-day battery life",
        None,
        "Warm light for focused evening work",
        None,
        None,
    ],
})
products
Out[2]:
| | product | category | price_usd | marketing_copy |
|---|---|---|---|---|
| 0 | Wireless Earbuds | Electronics | 129.0 | Immersive sound with all-day battery life |
| 1 | Yoga Mat | Fitness | 35.0 | None |
| 2 | Desk Lamp | Home | 48.0 | Warm light for focused evening work |
| 3 | Water Bottle | Fitness | 20.0 | None |
| 4 | Running Shoes | Footwear | 95.0 | None |
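Before calling the fill, it can help to confirm which rows are actually missing a value in the target column. The snippet below is a small sketch using plain pandas on a trimmed copy of the frame above; the variable names `missing_mask`, `missing_count`, and `rows_to_fill` are illustrative and not part of openaivec.

```python
import pandas as pd

# Trimmed copy of the example frame: only the columns needed for the check.
products = pd.DataFrame({
    "product": ["Wireless Earbuds", "Yoga Mat", "Desk Lamp", "Water Bottle", "Running Shoes"],
    "marketing_copy": [
        "Immersive sound with all-day battery life",
        None,
        "Warm light for focused evening work",
        None,
        None,
    ],
})

# Boolean mask that is True where the target column has no value.
missing_mask = products["marketing_copy"].isna()

# How many rows need filling, and which index labels they have.
missing_count = int(missing_mask.sum())
rows_to_fill = products.index[missing_mask].tolist()

print(missing_count, rows_to_fill)
```

Rows outside `rows_to_fill` already have copy and should come back unchanged.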
3. Run intelligent fill
Call .ai.fillna() with the target column. The method fills only missing values in that column.
In [3]:
filled_products = products.ai.fillna(
    target_column_name="marketing_copy",
    batch_size=16,
    show_progress=True,
)
filled_products
Processing batches: 0%| | 0/3 [00:00<?, ?item/s]
Out[3]:
| | product | category | price_usd | marketing_copy |
|---|---|---|---|---|
| 0 | Wireless Earbuds | Electronics | 129.0 | Immersive sound with all-day battery life |
| 1 | Yoga Mat | Fitness | 35.0 | Non-slip surface for safe and comfortable work... |
| 2 | Desk Lamp | Home | 48.0 | Warm light for focused evening work |
| 3 | Water Bottle | Fitness | 20.0 | Durable and lightweight bottle to stay hydrate... |
| 4 | Running Shoes | Footwear | 95.0 | Lightweight and supportive design for optimal ... |
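As a rough mental model of what a context-aware fill does, the sketch below serializes the other columns of each incomplete row and hands that context to a generator function. It is not openaivec's implementation: `fill_missing_with_context` and `stub_generate` are hypothetical names, and the stub returns placeholder text instead of calling a model.

```python
import json
from typing import Callable

import pandas as pd


def fill_missing_with_context(
    df: pd.DataFrame, target: str, generate: Callable[[str], str]
) -> pd.DataFrame:
    """Fill missing values in `target` using the other columns of each row as context."""
    result = df.copy()  # leave the caller's frame untouched
    missing = result[target].isna()
    # Only rows with a missing target are visited; the target itself is dropped from the context.
    for idx, row in result.loc[missing].drop(columns=[target]).iterrows():
        context = json.dumps(row.to_dict(), ensure_ascii=False, default=str)
        result.at[idx, target] = generate(context)
    return result


def stub_generate(context: str) -> str:
    """Hypothetical stand-in for a model call: returns placeholder text."""
    record = json.loads(context)
    return f"Placeholder copy for {record['product']}"


products = pd.DataFrame({
    "product": ["Wireless Earbuds", "Yoga Mat", "Desk Lamp"],
    "category": ["Electronics", "Fitness", "Home"],
    "price_usd": [129.0, 35.0, 48.0],
    "marketing_copy": ["Immersive sound with all-day battery life", None, None],
})

filled = fill_missing_with_context(products, "marketing_copy", stub_generate)
print(filled["marketing_copy"].tolist())
```

Swapping `stub_generate` for a real model call is where the row context would start to matter; the surrounding bookkeeping stays the same.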
4. Output: before and after
The output keeps the same schema and index, with missing target values completed.
In [4]:
summary = (
    products[["product", "marketing_copy"]]
    .rename(columns={"marketing_copy": "before"})
    .assign(after=filled_products["marketing_copy"])
)
summary
Out[4]:
| | product | before | after |
|---|---|---|---|
| 0 | Wireless Earbuds | Immersive sound with all-day battery life | Immersive sound with all-day battery life |
| 1 | Yoga Mat | None | Non-slip surface for safe and comfortable work... |
| 2 | Desk Lamp | Warm light for focused evening work | Warm light for focused evening work |
| 3 | Water Bottle | None | Durable and lightweight bottle to stay hydrate... |
| 4 | Running Shoes | None | Lightweight and supportive design for optimal ... |
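The properties described in this section (same schema, same index, existing values untouched, no gaps left in the target) can also be checked programmatically. The helper below is a sketch in plain pandas; `check_fill_invariants` is a hypothetical name, and the `after` frame here is produced with an ordinary `fillna` placeholder so the example runs without an API key.

```python
import pandas as pd


def check_fill_invariants(before: pd.DataFrame, after: pd.DataFrame, target: str) -> None:
    """Raise AssertionError if `after` is not a clean fill of `before` on `target`."""
    # Same columns in the same order, and the same index.
    assert list(after.columns) == list(before.columns)
    assert after.index.equals(before.index)
    # Values that were already present must be unchanged.
    present = before[target].notna()
    assert after.loc[present, target].equals(before.loc[present, target])
    # No missing values may remain in the target column.
    assert after[target].notna().all()
    # Every other column must be identical.
    others = [c for c in before.columns if c != target]
    assert after[others].equals(before[others])


before = pd.DataFrame({
    "product": ["Yoga Mat", "Desk Lamp"],
    "marketing_copy": [None, "Warm light for focused evening work"],
})

# Stand-in for a model-generated fill, used only to exercise the checks.
after = before.copy()
after["marketing_copy"] = after["marketing_copy"].fillna("<generated copy>")

check_fill_invariants(before, after, "marketing_copy")
```

In the notebook, the same call would take `products` and `filled_products` as its first two arguments.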
5. Benefits
Main input
- A DataFrame
- One target column name that contains missing values
Main output
- A DataFrame with the same rows and columns
- Missing values in the target column filled with context-aware values
Why this is useful
- Uses row context from other columns instead of simple mean/mode rules
- Keeps the pandas workflow simple (df.ai.fillna(...))
- Supports batching controls (batch_size) for larger datasets
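To make the batching idea concrete, the sketch below splits a list of row labels into chunks of at most `batch_size` items. It only illustrates the general technique; how openaivec groups requests internally is not shown in this notebook, and `iter_batches` is a hypothetical helper.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Index labels of the rows with missing marketing copy in the example above.
rows_to_fill = [1, 3, 4]

small_batches = list(iter_batches(rows_to_fill, batch_size=2))
single_batch = list(iter_batches(rows_to_fill, batch_size=16))
print(small_batches, single_batch)
```

With only three missing rows, a batch size of 16 leaves everything in one chunk; the setting matters more as the number of missing rows grows.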