Token Count and Processing Time Considerations
Choosing an appropriate batch size can significantly impact the performance and stability of a system. An excessively small batch size can increase the overhead of API calls, leading to longer overall processing times. Conversely, an excessively large batch size can cause various issues, such as application timeouts.
How should we determine the appropriate batch size?
`openaivec` can determine the optimal batch size automatically with a simple algorithm when you specify `batch_size=None`.

In this document, we investigate the relationship between batch size and processing time using `BatchSizeSuggester` from `openaivec`.
We will use the `gpt-5-nano` model for the experiments in this document.
import pandas as pd
from openaivec import pandas_ext
# Note: These are internal APIs and not recommended for general use
from openaivec._optimize import BatchSizeSuggester
from openaivec._proxy import AsyncBatchingMapProxy
pandas_ext.responses_model("gpt-5-nano")
⚠️ Warning: Advanced Internal API Usage

This notebook demonstrates the use of internal APIs (`BatchSizeSuggester` and `AsyncBatchingMapProxy`) for performance analysis purposes. These APIs are not part of the public interface and may change in future versions.

For typical use cases, simply pass `batch_size=None` to the pandas `.ai` or `.aio` methods, which will automatically optimize the batch size without requiring direct interaction with these internal components.
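For reference, typical usage stays on the public accessors. The following is only a minimal sketch: based on the note above, it assumes the `.ai` responses method accepts a `batch_size` parameter; the exact method names and signatures in your installed version may differ.

```python
import pandas as pd
from openaivec import pandas_ext

pandas_ext.responses_model("gpt-5-nano")

df = pd.DataFrame({"id": list(range(100))})

# Hypothetical sketch: per the note above, batch_size=None asks openaivec
# to choose the batch size automatically. The parameter placement here is
# an assumption, not a confirmed signature.
names = df["id"].ai.responses(
    instructions="Generate a random English full name for the given employee ID.",
    batch_size=None,
)
```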
This is a low-level API that you are unlikely to use directly. `AsyncBatchingMapProxy` has a `batch_size` parameter which, when set to `None`, lets the `BatchSizeSuggester` determine the batch size automatically.

The fields of `BatchSizeSuggester` are the hyperparameters of batch size optimization, and we will use them for this performance investigation.
This configuration determines the batch size sequentially through the following steps (a standalone sketch of the update rule follows the list):

- The initial batch size is set to 10.
- As records accumulate, it takes the most recent `sample_size=3` samples and calculates their average processing time.
- If this average processing time is below `min_duration=60` seconds, it scales the batch size up by a factor of `1 + step_ratio`.
- Conversely, if the average processing time exceeds `max_duration=180` seconds, it scales the batch size down by a factor of `1 - step_ratio`.
- This process is repeated to find the optimal batch size.
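To make the update rule concrete, here is a minimal, standalone sketch of the same idea. It is not the library's implementation; the function name and the exact rounding behavior are illustrative assumptions.

```python
from statistics import mean

def suggest_batch_size(
    current_batch_size: int,
    recent_durations: list[float],  # seconds per batch, most recent first
    *,
    min_batch_size: int = 10,
    min_duration: float = 60.0,
    max_duration: float = 180.0,
    step_ratio: float = 0.2,
    sample_size: int = 3,
) -> int:
    """Illustrative sketch of the batch-size update rule described above."""
    if len(recent_durations) < sample_size:
        # Not enough samples yet: keep the current batch size.
        return current_batch_size

    avg = mean(recent_durations[:sample_size])

    if avg < min_duration:
        # Batches finish quickly: grow the batch size.
        new_size = int(current_batch_size * (1 + step_ratio))
    elif avg > max_duration:
        # Batches take too long: shrink the batch size.
        new_size = int(current_batch_size * (1 - step_ratio))
    else:
        new_size = current_batch_size

    return max(new_size, min_batch_size)

# Example: recent batches averaged well under 60 seconds, so the size grows.
print(suggest_batch_size(10, [4.8, 6.6, 7.5]))  # -> 12
```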
While a duration of 60 to 180 seconds is quite long for typical applications, we use these longer durations for performance analysis.
For example, in systems like Apache Spark, various processes have timeouts set to several minutes.
suggester = BatchSizeSuggester(
current_batch_size=10,
min_batch_size=10,
min_duration=60,
max_duration=180,
step_ratio=0.2,
sample_size=3
)
cache = AsyncBatchingMapProxy(
batch_size=None,
max_concurrency=8,
show_progress=True,
suggester=suggester
)
In this experiment, we will repeat a simple task many times: given an employee ID, generate a random name.

We will execute this task while the suggester adjusts the `batch_size`, recording the processing time of each batch.
await pd.DataFrame({"id": list(range(10000))}).aio.assign(
name=lambda df: df.aio.responses_with_cache(
instructions="""
For each employee ID, generate a random name in English, including both first and last names.
Just return the name without any additional text.
""",
temperature=None,
cache=cache,
)
)
|  | id | name |
|---|---|---|
| 0 | 0 | Oliver Reed |
| 1 | 1 | Amelia Hart |
| 2 | 2 | Benjamin Cole |
| 3 | 3 | Sophia Lane |
| 4 | 4 | Lucas Miles |
| ... | ... | ... |
| 9995 | 9995 | Lena Stone |
| 9996 | 9996 | Owen Miles |
| 9997 | 9997 | Cleo Bryant |
| 9998 | 9998 | Trevor Lane |
| 9999 | 9999 | Nadia Parks |

10000 rows × 2 columns
`BatchSizeSuggester` records the processing time and exceptions of each batch as follows. Let's aggregate this data to investigate the relationship between batch size and processing time.
pd.DataFrame(suggester._history)
|  | duration | batch_size | executed_at | exception |
|---|---|---|---|---|
| 0 | 4.774321 | 10 | 2025-08-13 10:24:57.245258+00:00 | None |
| 1 | 6.607769 | 10 | 2025-08-13 10:24:57.127725+00:00 | None |
| 2 | 7.508548 | 10 | 2025-08-13 10:24:57.247499+00:00 | None |
| 3 | 8.048153 | 10 | 2025-08-13 10:24:57.246682+00:00 | None |
| 4 | 8.557814 | 10 | 2025-08-13 10:24:57.246033+00:00 | None |
| ... | ... | ... | ... | ... |
| 187 | 25.374727 | 134 | 2025-08-13 10:30:41.271673+00:00 | None |
| 188 | 35.114657 | 151 | 2025-08-13 10:30:31.814291+00:00 | None |
| 189 | 33.780222 | 181 | 2025-08-13 10:30:34.815913+00:00 | None |
| 190 | 41.746972 | 151 | 2025-08-13 10:30:34.014817+00:00 | None |
| 191 | 37.831411 | 181 | 2025-08-13 10:30:39.196203+00:00 | None |

192 rows × 4 columns
pd.DataFrame(suggester._history).to_csv("suggester_history.csv", index=False)
pd.read_csv("suggester_history.csv").plot(
x="batch_size",
y="duration",
kind="scatter",
title="Batch Size vs Duration",
xlabel="Batch Size",
ylabel="Each Duration (seconds)",
)
(Figure: scatter plot "Batch Size vs Duration"; x-axis: Batch Size, y-axis: Each Duration in seconds)
The x-axis represents the batch size, and the y-axis represents the processing time of each batch. As expected, as the batch size increases, the processing time of each batch also increases.

However, the rate of increase appears to become more gradual as the batch size grows. Let's verify this in the next figure.
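Before moving to the per-item view, one way to quantify the fixed overhead directly (this fit is not part of the original analysis) is a simple least-squares line through the recorded history, assuming `suggester_history.csv` was written by the cell above:

```python
import numpy as np
import pandas as pd

history = pd.read_csv("suggester_history.csv")

# Least-squares fit: duration ≈ slope * batch_size + intercept.
# The intercept approximates the fixed per-call overhead; the slope is the
# marginal cost per additional item in a batch.
slope, intercept = np.polyfit(history["batch_size"], history["duration"], deg=1)

print(f"estimated per-call overhead: {intercept:.1f} s")
print(f"estimated marginal cost per item: {slope:.3f} s")
```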
pd.read_csv("suggester_history.csv").assign(
label = lambda df: df["batch_size"]
).pipe(
lambda df: df.groupby("label").sum(["duration", "batch_size"]).reset_index()
).assign(
unit_duration=lambda df: df["duration"] / df["batch_size"]
).plot(
x="label",
y="unit_duration",
kind="bar",
title="Batch Size vs Unit Duration",
xlabel="Batch Size",
ylabel="Average Duration per Item (seconds)",
)
(Figure: bar chart "Batch Size vs Unit Duration"; x-axis: Batch Size, y-axis: Average Duration per Item in seconds)
This figure shows the relationship between batch size and the average processing time per item.

At small batch sizes, increasing the batch size initially raises the per-item processing time, because the OpenAI API's prompt caching has not yet taken effect. As the batch size grows further, however, the average per-item processing time gradually decreases: each batch carries a fixed overhead per API call, so fewer, larger calls spread that overhead across more items.
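To see why the per-item time falls as batches grow, consider a simple cost model. The overhead and per-item constants below are assumed round numbers for illustration, not values measured in this run.

```python
# Illustrative cost model only: c and t are assumed values, not measurements.
c = 5.0   # fixed overhead per API call, in seconds
t = 0.2   # marginal processing time per item, in seconds

for n in (10, 50, 100, 200):
    total = c + t * n       # T(n) = c + t * n
    per_item = total / n    # T(n) / n = c / n + t, which shrinks as n grows
    print(f"batch_size={n:>3}  total={total:5.1f}s  per_item={per_item:.3f}s")
```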
`openaivec` provides a feature to limit the processing time of each batch, enabling stable data processing.