DuckDB Integration — Customer Survey Sentiment Analysis¶

This notebook demonstrates how to use openaivec with DuckDB to analyze customer survey responses entirely in SQL. We register an AI-powered UDF once, then use standard SQL queries to extract structured sentiment from raw text.

1. Setup¶

Configure the OpenAI client and import the DuckDB extension.

In [1]:

Copied!

import os

import openaivec
from openaivec import duckdb_ext

assert os.getenv("OPENAI_API_KEY") or os.getenv("AZURE_OPENAI_BASE_URL"), (
    "Set OPENAI_API_KEY or Azure OpenAI environment variables before running this notebook."
)

openaivec.set_responses_model("gpt-5.4")
import os

import openaivec
from openaivec import duckdb_ext

assert os.getenv("OPENAI_API_KEY") or os.getenv("AZURE_OPENAI_BASE_URL"), (
    "Set OPENAI_API_KEY or Azure OpenAI environment variables before running this notebook."
)

openaivec.set_responses_model("gpt-5.4")

2. Prepare Sample Data¶

Create a small CSV file simulating customer survey responses.

In [2]:

Copied!





import csv
from pathlib import Path

rows = [
    {"id": 1, "customer": "Alice",   "response": "The product quality is excellent and delivery was fast."},
    {"id": 2, "customer": "Bob",     "response": "Terrible experience. The item arrived broken and support was unhelpful."},
    {"id": 3, "customer": "Charlie", "response": "It's okay, nothing special. Meets basic expectations."},
    {"id": 4, "customer": "Diana",   "response": "Absolutely love it! Best purchase I've made this year."},
    {"id": 5, "customer": "Eve",     "response": "Shipping took forever and the packaging was damaged."},
    {"id": 6, "customer": "Frank",   "response": "Good value for money. Would recommend to friends."},
    {"id": 7, "customer": "Grace",   "response": "The product quality is excellent and delivery was fast."},
    {"id": 8, "customer": "Heidi",   "response": "Not worth the price. Very disappointed."},
]

path = Path("survey.csv")
with open(path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "customer", "response"])
    writer.writeheader()
    writer.writerows(rows)

print(f"Wrote {len(rows)} rows to {path}")
import csv
from pathlib import Path

rows = [
    {"id": 1, "customer": "Alice",   "response": "The product quality is excellent and delivery was fast."},
    {"id": 2, "customer": "Bob",     "response": "Terrible experience. The item arrived broken and support was unhelpful."},
    {"id": 3, "customer": "Charlie", "response": "It's okay, nothing special. Meets basic expectations."},
    {"id": 4, "customer": "Diana",   "response": "Absolutely love it! Best purchase I've made this year."},
    {"id": 5, "customer": "Eve",     "response": "Shipping took forever and the packaging was damaged."},
    {"id": 6, "customer": "Frank",   "response": "Good value for money. Would recommend to friends."},
    {"id": 7, "customer": "Grace",   "response": "The product quality is excellent and delivery was fast."},
    {"id": 8, "customer": "Heidi",   "response": "Not worth the price. Very disappointed."},
]

path = Path("survey.csv")
with open(path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "customer", "response"])
    writer.writeheader()
    writer.writerows(rows)

print(f"Wrote {len(rows)} rows to {path}")

Wrote 8 rows to survey.csv

3. Define the Response Model¶

A Pydantic model defines the structured output. The UDF will return a DuckDB STRUCT matching these fields — no JSON parsing needed in SQL.

In [3]:

Copied!

from typing import Literal

from pydantic import BaseModel

class SurveyAnalysis(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float
    summary: str
from typing import Literal

from pydantic import BaseModel

class SurveyAnalysis(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float
    summary: str

4. Register the UDF¶

One call registers an AI-powered function visible in SQL. Internally it batches rows, deduplicates identical inputs, and runs concurrent async requests — all transparent to the SQL user.

In [4]:

Copied!





import duckdb

conn = duckdb.connect()

duckdb_ext.register_responses_udf(
    conn,
    "analyze_sentiment",
    instructions="""Analyze the customer survey response.
Return the overall sentiment (positive/negative/neutral),
a confidence score between 0 and 1, and a one-sentence summary.""",
    response_format=SurveyAnalysis,
)
import duckdb

conn = duckdb.connect()

duckdb_ext.register_responses_udf(
    conn,
    "analyze_sentiment",
    instructions="""Analyze the customer survey response.
Return the overall sentiment (positive/negative/neutral),
a confidence score between 0 and 1, and a one-sentence summary.""",
    response_format=SurveyAnalysis,
)

5. Analyze — Pure SQL¶

Read the CSV directly with DuckDB and call the UDF. The result columns are native DuckDB types (VARCHAR, DOUBLE), not JSON strings.

In [5]:

Copied!





conn.sql("""
    SELECT
        id,
        customer,
        response,
        analyze_sentiment(response).sentiment   AS sentiment,
        analyze_sentiment(response).confidence  AS confidence,
        analyze_sentiment(response).summary     AS summary
    FROM 'survey.csv'
""").df()
conn.sql("""
    SELECT
        id,
        customer,
        response,
        analyze_sentiment(response).sentiment   AS sentiment,
        analyze_sentiment(response).confidence  AS confidence,
        analyze_sentiment(response).summary     AS summary
    FROM 'survey.csv'
""").df()

FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))

Out[5]:

	id	customer	response	sentiment	confidence	summary
0	1	Alice	The product quality is excellent and delivery ...	positive	0.98	The customer is very satisfied with the excell...
1	2	Bob	Terrible experience. The item arrived broken a...	negative	0.99	The customer had a very poor experience due to...
2	3	Charlie	It's okay, nothing special. Meets basic expect...	neutral	0.90	The customer feels the product is acceptable b...
3	4	Diana	Absolutely love it! Best purchase I've made th...	positive	1.00	The customer is अत्यंत satisfied and considers...
4	5	Eve	Shipping took forever and the packaging was da...	negative	0.97	The customer is unhappy with the very slow shi...
5	6	Frank	Good value for money. Would recommend to friends.	positive	0.95	The customer believes the product offers good ...
6	7	Grace	The product quality is excellent and delivery ...	positive	0.98	The customer is very satisfied with the excell...
7	8	Heidi	Not worth the price. Very disappointed.	negative	0.98	The customer feels the product was overpriced ...

6. Aggregate in SQL¶

Since sentiment is a proper VARCHAR column, standard SQL aggregation just works.

In [6]:

Copied!





conn.sql("""
    WITH analyzed AS (
        SELECT
            customer,
            analyze_sentiment(response).sentiment  AS sentiment,
            analyze_sentiment(response).confidence AS confidence
        FROM 'survey.csv'
    )
    SELECT
        sentiment,
        COUNT(*)            AS count,
        ROUND(AVG(confidence), 2) AS avg_confidence
    FROM analyzed
    GROUP BY sentiment
    ORDER BY count DESC
""").df()
conn.sql("""
    WITH analyzed AS (
        SELECT
            customer,
            analyze_sentiment(response).sentiment  AS sentiment,
            analyze_sentiment(response).confidence AS confidence
        FROM 'survey.csv'
    )
    SELECT
        sentiment,
        COUNT(*)            AS count,
        ROUND(AVG(confidence), 2) AS avg_confidence
    FROM analyzed
    GROUP BY sentiment
    ORDER BY count DESC
""").df()

Out[6]:

	sentiment	count	avg_confidence
0	positive	4	0.98
1	negative	3	0.98
2	neutral	1	0.90

7. Filter Unhappy Customers¶

Find customers who left negative feedback with high confidence.

In [7]:

Copied!





conn.sql("""
    SELECT customer, response, analyze_sentiment(response).summary AS summary
    FROM 'survey.csv'
    WHERE analyze_sentiment(response).sentiment = 'negative'
      AND analyze_sentiment(response).confidence > 0.7
""").df()
conn.sql("""
    SELECT customer, response, analyze_sentiment(response).summary AS summary
    FROM 'survey.csv'
    WHERE analyze_sentiment(response).sentiment = 'negative'
      AND analyze_sentiment(response).confidence > 0.7
""").df()

Out[7]:

	customer	response	summary
0	Bob	Terrible experience. The item arrived broken a...	The customer had a very poor experience due to...
1	Eve	Shipping took forever and the packaging was da...	The customer is unhappy with the very slow shi...
2	Heidi	Not worth the price. Very disappointed.	The customer feels the product was overpriced ...

8. Export Results to Parquet¶

Save the analyzed data for downstream use — all in one SQL statement.

In [8]:

Copied!





conn.sql("""
    COPY (
        SELECT
            id, customer, response,
            analyze_sentiment(response).sentiment  AS sentiment,
            analyze_sentiment(response).confidence AS confidence,
            analyze_sentiment(response).summary    AS summary
        FROM 'survey.csv'
    ) TO 'survey_analyzed.parquet' (FORMAT PARQUET)
""")

print("Exported to survey_analyzed.parquet")
conn.sql("SELECT * FROM 'survey_analyzed.parquet' LIMIT 3").df()
conn.sql("""
    COPY (
        SELECT
            id, customer, response,
            analyze_sentiment(response).sentiment  AS sentiment,
            analyze_sentiment(response).confidence AS confidence,
            analyze_sentiment(response).summary    AS summary
        FROM 'survey.csv'
    ) TO 'survey_analyzed.parquet' (FORMAT PARQUET)
""")

print("Exported to survey_analyzed.parquet")
conn.sql("SELECT * FROM 'survey_analyzed.parquet' LIMIT 3").df()

Exported to survey_analyzed.parquet

Out[8]:

	id	customer	response	sentiment	confidence	summary
0	1	Alice	The product quality is excellent and delivery ...	positive	0.98	The customer is very satisfied with the excell...
1	2	Bob	Terrible experience. The item arrived broken a...	negative	0.99	The customer had a very poor experience due to...
2	3	Charlie	It's okay, nothing special. Meets basic expect...	neutral	0.90	The customer feels the product is acceptable b...

9. Cleanup¶

In [9]:

Copied!





import os

for f in ["survey.csv", "survey_analyzed.parquet"]:
    if os.path.exists(f):
        os.remove(f)
conn.close()
import os

for f in ["survey.csv", "survey_analyzed.parquet"]:
    if os.path.exists(f):
        os.remove(f)
conn.close()

Summary¶

Step	Code	What happens
Setup	`register_responses_udf(conn, ...)`	Register AI function once
Analyze	`SELECT analyze_sentiment(text).* FROM csv`	Pure SQL, structured output
Aggregate	`GROUP BY sentiment`	Standard SQL — no JSON parsing
Export	`COPY ... TO parquet`	One-liner export

The key insight: openaivec's DuckDB integration turns LLM calls into first-class SQL functions with proper types, batching, deduplication, and async concurrency — all invisible to the SQL query.