Deduplication & Deterministic Task IDs
Overview
By default, ai4s-jobq generates deterministic task IDs — identical tasks
always produce the same ID (the MD5 hash of the serialized task content).
When combined with Azure Service Bus duplicate detection, re-submitting the
same task within the detection window is silently ignored by the broker.
This is useful for fault-tolerant pipelines where a producer might retry enqueueing after a transient failure without knowing whether the first attempt succeeded.
How it works
When a task is pushed, its ID is computed as md5(serialized_payload).
The task ID is sent as the Service Bus MessageId.
If the queue has duplicate detection enabled, Service Bus drops any message whose MessageId was already seen within the detection window (7 days by default for queues created by ai4s-jobq).
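The steps above can be sketched in a few lines of Python. This is an illustration only, not the ai4s-jobq implementation: the canonical-JSON serialization, the `task_id` helper, and the in-memory `FakeBroker` are all assumptions made for the example, and the real library may serialize tasks differently.

```python
import hashlib
import json
from datetime import datetime, timedelta


def task_id(payload: dict) -> str:
    """Return the MD5 hex digest of a canonical serialization of the payload."""
    # Assumption: canonical JSON (sorted keys, fixed separators) stands in for
    # whatever serialization ai4s-jobq actually uses.
    serialized = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(serialized.encode("utf-8")).hexdigest()


class FakeBroker:
    """Toy stand-in for a queue that has duplicate detection enabled."""

    def __init__(self, window: timedelta = timedelta(days=7)):
        self.window = window
        self.seen = {}      # message_id -> time the ID was first accepted
        self.messages = []  # payloads that were actually enqueued

    def send(self, message_id: str, payload: dict, now: datetime) -> bool:
        first_seen = self.seen.get(message_id)
        if first_seen is not None and now - first_seen < self.window:
            return False  # duplicate inside the window: silently dropped
        self.seen[message_id] = now
        self.messages.append(payload)
        return True
```

Because the ID depends only on the payload, a producer that retries after a timeout sends the same MessageId again, and the second copy is dropped as long as it arrives inside the window.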
Queues created by ai4s-jobq v3.0+ have duplicate detection enabled
automatically. Existing queues created by earlier versions must be deleted
and recreated to gain this capability — Azure does not allow enabling duplicate
detection on an existing queue.
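A delete-and-recreate step might look roughly like the sketch below. It is not part of ai4s-jobq: `recreate_with_duplicate_detection` is a hypothetical helper, and `admin` is assumed to be an object exposing `get_queue`, `delete_queue`, and `create_queue` in the style of the azure-servicebus `ServiceBusAdministrationClient`. Any other settings ai4s-jobq applies when it creates a queue are not preserved here, so letting the library recreate the queue itself is likely the safer route.

```python
from datetime import timedelta


def recreate_with_duplicate_detection(
    admin,
    queue_name: str,
    window: timedelta = timedelta(days=7),
) -> bool:
    """Delete and recreate `queue_name` unless it already deduplicates.

    Returns True if the queue was recreated, False if it was left untouched.
    WARNING: deleting a queue discards every message still in it.
    """
    props = admin.get_queue(queue_name)
    if props.requires_duplicate_detection:
        return False  # nothing to do, the queue already deduplicates

    # Duplicate detection can only be chosen at creation time,
    # so the queue has to be dropped and created again.
    admin.delete_queue(queue_name)
    admin.create_queue(
        queue_name,
        requires_duplicate_detection=True,
        duplicate_detection_history_time_window=window,
    )
    return True
```

Drain or migrate any pending messages before running something like this, since the delete is irreversible.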
Configuration
| Parameter | CLI flag | Default | Description |
|---|---|---|---|
| | — | | When … |
| | | | Duration of the Service Bus duplicate detection history window. Passed as integer days on the CLI or as a … |
Set JOBQ_DETERMINISTIC_IDS=false if you intentionally want every push to
create a distinct message, even for identical payloads.
Mismatch warning
On startup, ai4s-jobq checks whether the queue’s duplicate-detection setting
is consistent with JOBQ_DETERMINISTIC_IDS and logs a warning if not:
Deterministic IDs enabled, but queue has no duplicate detection — IDs are deterministic but the broker won’t actually deduplicate. Delete and recreate the queue.
Queue has duplicate detection, but deterministic IDs are disabled — random UUIDs mean the broker will never see a duplicate. Either enable deterministic IDs or recreate the queue without duplicate detection.
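The two cases above amount to a simple consistency check, sketched here under stated assumptions. The function names, the logger name, and the rules for parsing JOBQ_DETERMINISTIC_IDS (unset means enabled; "false", "0", and "no" mean disabled) are hypothetical and may not match what ai4s-jobq actually does.

```python
import logging
import os
from typing import Optional

log = logging.getLogger("jobq.dedup_check")  # hypothetical logger name


def deterministic_ids_enabled(env=os.environ) -> bool:
    # Assumption: an unset variable means enabled (the documented default).
    value = env.get("JOBQ_DETERMINISTIC_IDS", "true").strip().lower()
    return value not in {"false", "0", "no"}


def mismatch_warning(deterministic_ids: bool, queue_has_dedup: bool) -> Optional[str]:
    """Return a warning text if the two settings disagree, else None."""
    if deterministic_ids and not queue_has_dedup:
        return "Deterministic IDs enabled, but queue has no duplicate detection"
    if queue_has_dedup and not deterministic_ids:
        return "Queue has duplicate detection, but deterministic IDs are disabled"
    return None


def check_on_startup(queue_has_dedup: bool, env=os.environ) -> Optional[str]:
    """Log the mismatch (if any) and return the message for the caller."""
    message = mismatch_warning(deterministic_ids_enabled(env), queue_has_dedup)
    if message:
        log.warning(message)
    return message
```

Only the two mixed combinations produce a warning; when both settings are on or both are off, the configuration is consistent and nothing is logged.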
Azure Storage Queue
Azure Storage Queues do not support duplicate detection at the broker level. Deterministic IDs are still generated (and can be useful for tracking), but deduplication must be handled application-side if needed.
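One way to handle deduplication application-side is to have the consumer skip task IDs it has already processed. The sketch below is a minimal illustration of that idea, not something ai4s-jobq provides: `make_idempotent` is a hypothetical helper, and the in-memory set only works for a single worker process. With several workers, the set would need to be replaced by a shared store.

```python
from typing import Callable, Set


def make_idempotent(
    handler: Callable[[dict], None],
    processed: Set[str],
) -> Callable[[str, dict], bool]:
    """Wrap `handler` so that each task ID is processed at most once."""

    def handle(task_id: str, payload: dict) -> bool:
        if task_id in processed:
            return False  # already handled: skip the duplicate
        handler(payload)
        # Recorded only after the handler returns, so a task whose handler
        # raised can still be retried later.
        processed.add(task_id)
        return True

    return handle
```

Here the deterministic task ID serves as the deduplication key, which is the same role it plays as MessageId on Service Bus.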