DuckDB Integration — Customer Survey Sentiment Analysis¶
This notebook demonstrates how to use openaivec with DuckDB to analyze customer survey responses entirely in SQL. We register an AI-powered UDF once, then use standard SQL queries to extract structured sentiment from raw text.
1. Setup¶
Configure the OpenAI client and import the DuckDB extension.
import os
import openaivec
from openaivec import duckdb_ext
assert os.getenv("OPENAI_API_KEY") or os.getenv("AZURE_OPENAI_BASE_URL"), (
"Set OPENAI_API_KEY or Azure OpenAI environment variables before running this notebook."
)
openaivec.set_responses_model("gpt-5.4")
2. Prepare Sample Data¶
Create a small CSV file simulating customer survey responses.
import csv
from pathlib import Path
rows = [
{"id": 1, "customer": "Alice", "response": "The product quality is excellent and delivery was fast."},
{"id": 2, "customer": "Bob", "response": "Terrible experience. The item arrived broken and support was unhelpful."},
{"id": 3, "customer": "Charlie", "response": "It's okay, nothing special. Meets basic expectations."},
{"id": 4, "customer": "Diana", "response": "Absolutely love it! Best purchase I've made this year."},
{"id": 5, "customer": "Eve", "response": "Shipping took forever and the packaging was damaged."},
{"id": 6, "customer": "Frank", "response": "Good value for money. Would recommend to friends."},
{"id": 7, "customer": "Grace", "response": "The product quality is excellent and delivery was fast."},
{"id": 8, "customer": "Heidi", "response": "Not worth the price. Very disappointed."},
]
path = Path("survey.csv")
with open(path, "w", newline="") as f:
writer = csv.DictWriter(f, fieldnames=["id", "customer", "response"])
writer.writeheader()
writer.writerows(rows)
print(f"Wrote {len(rows)} rows to {path}")
Wrote 8 rows to survey.csv
3. Define the Response Model¶
A Pydantic model defines the structured output. The UDF will return a
DuckDB STRUCT matching these fields — no JSON parsing needed in SQL.
from typing import Literal
from pydantic import BaseModel
class SurveyAnalysis(BaseModel):
sentiment: Literal["positive", "negative", "neutral"]
confidence: float
summary: str
4. Register the UDF¶
One call registers an AI-powered function visible in SQL. Internally it batches rows, deduplicates identical inputs, and runs concurrent async requests — all transparent to the SQL user.
import duckdb
conn = duckdb.connect()
duckdb_ext.register_responses_udf(
conn,
"analyze_sentiment",
instructions="""Analyze the customer survey response.
Return the overall sentiment (positive/negative/neutral),
a confidence score between 0 and 1, and a one-sentence summary.""",
response_format=SurveyAnalysis,
)
5. Analyze — Pure SQL¶
Read the CSV directly with DuckDB and call the UDF. The result columns are native DuckDB types (VARCHAR, DOUBLE), not JSON strings.
conn.sql("""
SELECT
id,
customer,
response,
analyze_sentiment(response).sentiment AS sentiment,
analyze_sentiment(response).confidence AS confidence,
analyze_sentiment(response).summary AS summary
FROM 'survey.csv'
""").df()
FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))
| id | customer | response | sentiment | confidence | summary | |
|---|---|---|---|---|---|---|
| 0 | 1 | Alice | The product quality is excellent and delivery ... | positive | 0.98 | The customer is very satisfied with the excell... |
| 1 | 2 | Bob | Terrible experience. The item arrived broken a... | negative | 0.99 | The customer had a very poor experience due to... |
| 2 | 3 | Charlie | It's okay, nothing special. Meets basic expect... | neutral | 0.90 | The customer feels the product is acceptable b... |
| 3 | 4 | Diana | Absolutely love it! Best purchase I've made th... | positive | 1.00 | The customer is अत्यंत satisfied and considers... |
| 4 | 5 | Eve | Shipping took forever and the packaging was da... | negative | 0.97 | The customer is unhappy with the very slow shi... |
| 5 | 6 | Frank | Good value for money. Would recommend to friends. | positive | 0.95 | The customer believes the product offers good ... |
| 6 | 7 | Grace | The product quality is excellent and delivery ... | positive | 0.98 | The customer is very satisfied with the excell... |
| 7 | 8 | Heidi | Not worth the price. Very disappointed. | negative | 0.98 | The customer feels the product was overpriced ... |
6. Aggregate in SQL¶
Since sentiment is a proper VARCHAR column, standard SQL aggregation just works.
conn.sql("""
WITH analyzed AS (
SELECT
customer,
analyze_sentiment(response).sentiment AS sentiment,
analyze_sentiment(response).confidence AS confidence
FROM 'survey.csv'
)
SELECT
sentiment,
COUNT(*) AS count,
ROUND(AVG(confidence), 2) AS avg_confidence
FROM analyzed
GROUP BY sentiment
ORDER BY count DESC
""").df()
| sentiment | count | avg_confidence | |
|---|---|---|---|
| 0 | positive | 4 | 0.98 |
| 1 | negative | 3 | 0.98 |
| 2 | neutral | 1 | 0.90 |
7. Filter Unhappy Customers¶
Find customers who left negative feedback with high confidence.
conn.sql("""
SELECT customer, response, analyze_sentiment(response).summary AS summary
FROM 'survey.csv'
WHERE analyze_sentiment(response).sentiment = 'negative'
AND analyze_sentiment(response).confidence > 0.7
""").df()
| customer | response | summary | |
|---|---|---|---|
| 0 | Bob | Terrible experience. The item arrived broken a... | The customer had a very poor experience due to... |
| 1 | Eve | Shipping took forever and the packaging was da... | The customer is unhappy with the very slow shi... |
| 2 | Heidi | Not worth the price. Very disappointed. | The customer feels the product was overpriced ... |
8. Export Results to Parquet¶
Save the analyzed data for downstream use — all in one SQL statement.
conn.sql("""
COPY (
SELECT
id, customer, response,
analyze_sentiment(response).sentiment AS sentiment,
analyze_sentiment(response).confidence AS confidence,
analyze_sentiment(response).summary AS summary
FROM 'survey.csv'
) TO 'survey_analyzed.parquet' (FORMAT PARQUET)
""")
print("Exported to survey_analyzed.parquet")
conn.sql("SELECT * FROM 'survey_analyzed.parquet' LIMIT 3").df()
Exported to survey_analyzed.parquet
| id | customer | response | sentiment | confidence | summary | |
|---|---|---|---|---|---|---|
| 0 | 1 | Alice | The product quality is excellent and delivery ... | positive | 0.98 | The customer is very satisfied with the excell... |
| 1 | 2 | Bob | Terrible experience. The item arrived broken a... | negative | 0.99 | The customer had a very poor experience due to... |
| 2 | 3 | Charlie | It's okay, nothing special. Meets basic expect... | neutral | 0.90 | The customer feels the product is acceptable b... |
9. Cleanup¶
import os
for f in ["survey.csv", "survey_analyzed.parquet"]:
if os.path.exists(f):
os.remove(f)
conn.close()
Summary¶
| Step | Code | What happens |
|---|---|---|
| Setup | register_responses_udf(conn, ...) |
Register AI function once |
| Analyze | SELECT analyze_sentiment(text).* FROM csv |
Pure SQL, structured output |
| Aggregate | GROUP BY sentiment |
Standard SQL — no JSON parsing |
| Export | COPY ... TO parquet |
One-liner export |
The key insight: openaivec's DuckDB integration turns LLM calls into first-class SQL functions with proper types, batching, deduplication, and async concurrency — all invisible to the SQL query.