Token Count and Processing Time Considerations
Choosing an appropriate batch size can significantly impact the performance and stability of a system. An excessively small batch size can increase the overhead of API calls, leading to longer overall processing times. Conversely, an excessively large batch size can cause various issues, such as application timeouts.
How should we determine the appropriate batch size?
`openaivec` can determine the optimal batch size automatically with a simple algorithm when you specify `batch_size=None`.

In this document, we investigate the relationship between batch size and processing time using `BatchSizeSuggester` from `openaivec`.
We will use the `gpt-5-nano` model for the experiments in this document.
import pandas as pd
from openaivec import pandas_ext
# Note: These are internal APIs and not recommended for general use
from openaivec._optimize import BatchSizeSuggester
from openaivec._proxy import AsyncBatchingMapProxy
pandas_ext.responses_model("gpt-5-nano")
⚠️ Warning: Advanced Internal API Usage

This notebook demonstrates the use of internal APIs (`BatchSizeSuggester` and `AsyncBatchingMapProxy`) for performance analysis purposes. These APIs are not part of the public interface and may change in future versions.

For typical use cases, simply pass `batch_size=None` to the pandas `.ai` or `.aio` methods, which will automatically optimize the batch size without requiring direct interaction with these internal components.
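For reference, typical usage stays on the public accessors. The following is only a minimal sketch: based on the note above, it assumes the `.ai` responses method accepts a `batch_size` parameter; the exact method names and signatures in your installed version may differ.

```python
import pandas as pd
from openaivec import pandas_ext

pandas_ext.responses_model("gpt-5-nano")

df = pd.DataFrame({"id": list(range(100))})

# Hypothetical sketch: per the note above, batch_size=None asks openaivec
# to choose the batch size automatically. The parameter placement here is
# an assumption, not a confirmed signature.
names = df["id"].ai.responses(
    instructions="Generate a random English full name for the given employee ID.",
    batch_size=None,
)
```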
This is a low-level API that you are unlikely to use directly. `AsyncBatchingMapProxy` has a `batch_size` parameter which, when set to `None`, lets the `BatchSizeSuggester` determine the batch size automatically.

The fields of `BatchSizeSuggester` are the hyperparameters of batch size optimization, and we will use them for this performance investigation.
This configuration determines the batch size sequentially through the following steps (a standalone sketch of the update rule follows the list):

- The initial batch size is set to 10.
- As records accumulate, it takes the most recent `sample_size=3` samples and calculates their average processing time.
- If this average processing time is below `min_duration=60` seconds, it scales the batch size up by a factor of `1 + step_ratio`.
- Conversely, if the average processing time exceeds `max_duration=180` seconds, it scales the batch size down by a factor of `1 - step_ratio`.
- This process is repeated to find the optimal batch size.
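To make the update rule concrete, here is a minimal, standalone sketch of the same idea. It is not the library's implementation; the function name and the exact rounding behavior are illustrative assumptions.

```python
from statistics import mean

def suggest_batch_size(
    current_batch_size: int,
    recent_durations: list[float],  # seconds per batch, most recent first
    *,
    min_batch_size: int = 10,
    min_duration: float = 60.0,
    max_duration: float = 180.0,
    step_ratio: float = 0.2,
    sample_size: int = 3,
) -> int:
    """Illustrative sketch of the batch-size update rule described above."""
    if len(recent_durations) < sample_size:
        # Not enough samples yet: keep the current batch size.
        return current_batch_size

    avg = mean(recent_durations[:sample_size])

    if avg < min_duration:
        # Batches finish quickly: grow the batch size.
        new_size = int(current_batch_size * (1 + step_ratio))
    elif avg > max_duration:
        # Batches take too long: shrink the batch size.
        new_size = int(current_batch_size * (1 - step_ratio))
    else:
        new_size = current_batch_size

    return max(new_size, min_batch_size)

# Example: recent batches averaged well under 60 seconds, so the size grows.
print(suggest_batch_size(10, [4.8, 6.6, 7.5]))  # -> 12
```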
While a duration of 60 to 180 seconds is quite long for typical applications, we use these longer durations for performance analysis.
For example, in systems like Apache Spark, various processes have timeouts set to several minutes.
suggester = BatchSizeSuggester(
current_batch_size=10,
min_batch_size=10,
min_duration=60,
max_duration=180,
step_ratio=0.2,
sample_size=3
)
cache = AsyncBatchingMapProxy(
batch_size=None,
max_concurrency=8,
show_progress=True,
suggester=suggester
)
In this experiment, we will repeat a simple task many times: given an employee ID, generate a random name.

We will execute this task while the suggester adjusts the `batch_size`, recording the processing time of each batch.
await pd.DataFrame({"id": list(range(10000))}).aio.assign(
name=lambda df: df.aio.responses_with_cache(
instructions="""
For each employee ID, generate a random name in English, including both first and last names.
Just return the name without any additional text.
""",
temperature=None,
cache=cache,
)
)
|  | id | name |
|---|---|---|
| 0 | 0 | Oliver Reed |
| 1 | 1 | Amelia Hart |
| 2 | 2 | Benjamin Cole |
| 3 | 3 | Sophia Lane |
| 4 | 4 | Lucas Miles |
| ... | ... | ... |
| 9995 | 9995 | Lena Stone |
| 9996 | 9996 | Owen Miles |
| 9997 | 9997 | Cleo Bryant |
| 9998 | 9998 | Trevor Lane |
| 9999 | 9999 | Nadia Parks |

10000 rows × 2 columns
`BatchSizeSuggester` records the processing time and exceptions of each batch as follows. Let's aggregate this data to investigate the relationship between batch size and processing time.
pd.DataFrame(suggester._history)
|  | duration | batch_size | executed_at | exception |
|---|---|---|---|---|
| 0 | 4.774321 | 10 | 2025-08-13 10:24:57.245258+00:00 | None |
| 1 | 6.607769 | 10 | 2025-08-13 10:24:57.127725+00:00 | None |
| 2 | 7.508548 | 10 | 2025-08-13 10:24:57.247499+00:00 | None |
| 3 | 8.048153 | 10 | 2025-08-13 10:24:57.246682+00:00 | None |
| 4 | 8.557814 | 10 | 2025-08-13 10:24:57.246033+00:00 | None |
| ... | ... | ... | ... | ... |
| 187 | 25.374727 | 134 | 2025-08-13 10:30:41.271673+00:00 | None |
| 188 | 35.114657 | 151 | 2025-08-13 10:30:31.814291+00:00 | None |
| 189 | 33.780222 | 181 | 2025-08-13 10:30:34.815913+00:00 | None |
| 190 | 41.746972 | 151 | 2025-08-13 10:30:34.014817+00:00 | None |
| 191 | 37.831411 | 181 | 2025-08-13 10:30:39.196203+00:00 | None |

192 rows × 4 columns
pd.DataFrame(suggester._history).to_csv("suggester_history.csv", index=False)
pd.read_csv("suggester_history.csv").plot(
x="batch_size",
y="duration",
kind="scatter",
title="Batch Size vs Duration",
xlabel="Batch Size",
ylabel="Each Duration (seconds)",
)
(Figure: scatter plot "Batch Size vs Duration"; x-axis: Batch Size, y-axis: Each Duration in seconds)
The x-axis represents the batch size, and the y-axis represents the processing time of each batch. As expected, as the batch size increases, the processing time of each batch also increases.

However, the rate of increase appears to become more gradual as the batch size grows. Let's verify this in the next figure.
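Before moving to the per-item view, one way to quantify the fixed overhead directly (this fit is not part of the original analysis) is a simple least-squares line through the recorded history, assuming `suggester_history.csv` was written by the cell above:

```python
import numpy as np
import pandas as pd

history = pd.read_csv("suggester_history.csv")

# Least-squares fit: duration ≈ slope * batch_size + intercept.
# The intercept approximates the fixed per-call overhead; the slope is the
# marginal cost per additional item in a batch.
slope, intercept = np.polyfit(history["batch_size"], history["duration"], deg=1)

print(f"estimated per-call overhead: {intercept:.1f} s")
print(f"estimated marginal cost per item: {slope:.3f} s")
```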
pd.read_csv("suggester_history.csv").assign(
label = lambda df: df["batch_size"]
).pipe(
lambda df: df.groupby("label").sum(["duration", "batch_size"]).reset_index()
).assign(
unit_duration=lambda df: df["duration"] / df["batch_size"]
).plot(
x="label",
y="unit_duration",
kind="bar",
title="Batch Size vs Unit Duration",
xlabel="Batch Size",
ylabel="Average Duration per Item (seconds)",
)
(Figure: bar chart "Batch Size vs Unit Duration"; x-axis: Batch Size, y-axis: Average Duration per Item in seconds)
This figure shows the relationship between batch size and the average processing time per item.

At small batch sizes, increasing the batch size initially raises the per-item processing time, because the OpenAI API's prompt caching has not yet taken effect. As the batch size grows further, however, the average per-item processing time gradually decreases: each batch carries a fixed overhead per API call, so fewer, larger calls spread that overhead across more items.
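To see why the per-item time falls as batches grow, consider a simple cost model. The overhead and per-item constants below are assumed round numbers for illustration, not values measured in this run.

```python
# Illustrative cost model only: c and t are assumed values, not measurements.
c = 5.0   # fixed overhead per API call, in seconds
t = 0.2   # marginal processing time per item, in seconds

for n in (10, 50, 100, 200):
    total = c + t * n       # T(n) = c + t * n
    per_item = total / n    # T(n) / n = c / n + t, which shrinks as n grows
    print(f"batch_size={n:>3}  total={total:5.1f}s  per_item={per_item:.3f}s")
```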
`openaivec` provides a feature to limit the processing time of each batch, enabling stable data processing.