This document proposes a new D3D12 feature that allows developers to resolve query data on the CPU timeline rather than requiring GPU-based resolution, improving convenience and performance for common query scenarios.
D3D12’s current query system requires developers to explicitly resolve query data on the GPU timeline using ResolveQueryData() Command List operations. This approach, while providing fine-grained control, introduces several inconveniences:
This proposal introduces CPU-based query resolution APIs that allow the runtime/driver to resolve query data using the CPU after GPU execution completes, providing a more convenient and often more efficient alternative for common query scenarios.
Current D3D12 query workflow requires several steps:
This workflow is cumbersome for scenarios where developers simply want to read query results on the CPU after GPU work completes. Common use cases that would benefit from CPU resolution include:
The CPU Timeline Query Resolution feature introduces one new API:
The design allows developers to resolve query data on the CPU timeline once the GPU work that wrote the queries has completed. The method itself performs no GPU synchronization; ensuring that the relevant GPU work has finished before resolving is the application’s responsibility (see the Remarks under ID3D12Device::ResolveQueryData). Any post-processing of the query data is handled by the runtime and driver.
For drivers or hardware that do not support native CPU resolution, the runtime provides transparent fallback by automatically generating the necessary GPU-based resolution operations.
typedef enum D3D12_QUERY_HEAP_FLAGS
{
    D3D12_QUERY_HEAP_FLAG_NONE = 0,
    D3D12_QUERY_HEAP_FLAG_CPU_RESOLVE = 1,
} D3D12_QUERY_HEAP_FLAGS;
Members:
D3D12_QUERY_HEAP_FLAG_NONE - No special flags. The query heap can only be resolved using the traditional GPU-based ID3D12GraphicsCommandList::ResolveQueryData() method.
D3D12_QUERY_HEAP_FLAG_CPU_RESOLVE - Enables CPU-based query resolution for this query heap. When this flag is set, applications can use the ID3D12Device::ResolveQueryData() method to resolve query data directly on the CPU timeline after GPU execution completes.
Remarks:
Query heaps created without the D3D12_QUERY_HEAP_FLAG_CPU_RESOLVE flag cannot be used with the ID3D12Device::ResolveQueryData() method; the call returns E_INVALIDARG if this is attempted. Conversely, heaps created with the CPU resolve flag cannot be used with the Command List resolve API, ID3D12GraphicsCommandList::ResolveQueryData().
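The pairing rule in these remarks can be summarized as a small predicate. The following is a minimal sketch, not runtime code: the enum values mirror the flag values listed above, while `ResolvePath` and `IsResolvePathAllowed` are hypothetical names introduced only for illustration, and the types are stand-ins so the snippet compiles without the Windows SDK.

```cpp
#include <cstdint>

// Stand-ins mirroring the proposed D3D12_QUERY_HEAP_FLAGS values.
enum QueryHeapFlags : std::uint32_t {
    QUERY_HEAP_FLAG_NONE = 0,
    QUERY_HEAP_FLAG_CPU_RESOLVE = 1,
};

// Hypothetical tag for the two resolve entry points discussed above.
enum class ResolvePath {
    CommandList,  // ID3D12GraphicsCommandList::ResolveQueryData()
    Device,       // proposed ID3D12Device::ResolveQueryData()
};

// A heap may use the Device path only if it has the CPU resolve flag,
// and the Command List path only if it does not.
constexpr bool IsResolvePathAllowed(QueryHeapFlags flags, ResolvePath path) {
    return ((flags & QUERY_HEAP_FLAG_CPU_RESOLVE) != 0) ==
           (path == ResolvePath::Device);
}
```

Under this reading each heap has exactly one valid resolve path, fixed at creation time.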
Creates a query heap with the option for CPU-based resolves.
HRESULT CreateQueryHeap1(
    [in] const D3D12_QUERY_HEAP_DESC *pDesc,
    [in] D3D12_QUERY_HEAP_FLAGS Flags,
    [in] REFIID riid,
    [out] void **ppvHeap);
Parameters:
pDesc - A pointer to a D3D12_QUERY_HEAP_DESC structure that describes the query heap. This structure contains the query heap type, number of queries, and node mask for multi-adapter scenarios.
Flags - A D3D12_QUERY_HEAP_FLAGS value that specifies additional options for the query heap. Use D3D12_QUERY_HEAP_FLAG_CPU_RESOLVE to enable CPU-based query resolution with the ID3D12Device::ResolveQueryData() method.
riid - The globally unique identifier (GUID) for the query heap interface. This parameter is typically IID_ID3D12QueryHeap.
ppvHeap - A pointer to a memory block that receives a pointer to the query heap object. The type of interface returned depends on the riid parameter.
Return Value:
Remarks:
This method extends the original CreateQueryHeap() method by adding support for query heap creation flags. Query heaps created with this method are functionally identical to those created with CreateQueryHeap() when using D3D12_QUERY_HEAP_FLAG_NONE.
When the D3D12_QUERY_HEAP_FLAG_CPU_RESOLVE flag is specified, the runtime and driver may optimize the query heap allocation for CPU access. This can enable more efficient CPU-based query resolution but may have different memory characteristics compared to standard query heaps.
CPU-based query resolution method on a Device.
HRESULT ResolveQueryData(
    [in] ID3D12QueryHeap *pQueryHeap,
    [in] D3D12_QUERY_TYPE Type,
    [in] UINT StartIndex,
    [in] UINT NumQueries,
    [out] void* pResolvedQueryData);
Parameters:
pQueryHeap - Query heap containing the queries to resolve.
Type - Type of queries to resolve.
StartIndex - Index of the first query to resolve.
NumQueries - Number of consecutive queries to resolve.
pResolvedQueryData - A pointer to CPU memory that receives the resolved query data.
Return Value:
S_OK - Resolution completed successfully.
E_INVALIDARG - Invalid parameters.
Remarks:
This method resolves query data on the CPU timeline. Any required post-processing of the raw query data generated on the GPU timeline is performed on the CPU immediately by the runtime and/or driver before this method returns.
This method does not block on the GPU and performs no implicit synchronization. It does not wait for, flush, or otherwise track the GPU work that writes the queries being resolved. The runtime associates no implicit fence with the query heap and does not track which queue(s) may write to it.
It is the application’s responsibility to ensure that all GPU work which writes the queries in the range [StartIndex, StartIndex + NumQueries) has completed execution on the GPU before calling this method — for example, by waiting on an ID3D12Fence that the application signals on the command queue after submitting the command list(s) containing the corresponding EndQuery operations. If this method is called before that GPU work has completed, the data written to pResolvedQueryData is undefined.
This mirrors the existing GPU-timeline ID3D12GraphicsCommandList::ResolveQueryData: in both cases the application must guarantee the queries have been fully written before the resolve is performed. The only difference is that resolution and readback occur synchronously on the CPU timeline rather than being scheduled on the GPU timeline.
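The application-side ordering described above can be sketched as follows. This is an illustrative model under stated assumptions, not the real API: `FakeFence`, `FakeQueryHeap`, `ResolveTimestamps`, and `TryResolveAfterFence` are hypothetical stand-ins that compile without the Windows SDK. In a real application the completion check would typically use `ID3D12Fence::GetCompletedValue()` or `ID3D12Fence::SetEventOnCompletion()` on a fence signaled after the command lists containing the `EndQuery` calls.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a fence: holds the last value the GPU has reached.
struct FakeFence {
    std::uint64_t completed = 0;
    std::uint64_t GetCompletedValue() const { return completed; }
};

// Hypothetical stand-in for a timestamp query heap with CPU resolve enabled.
struct FakeQueryHeap {
    std::vector<std::uint64_t> raw;  // values "written by the GPU"
};

// Models the shape of the proposed device-level resolve for timestamps only.
// Returns false for an out-of-range request (E_INVALIDARG in the real API).
bool ResolveTimestamps(const FakeQueryHeap& heap, std::uint32_t start,
                       std::uint32_t count, std::uint64_t* out) {
    assert(out != nullptr);
    if (start > heap.raw.size() || count > heap.raw.size() - start) {
        return false;
    }
    for (std::uint32_t i = 0; i < count; ++i) {
        out[i] = heap.raw[start + i];
    }
    return true;
}

// Application-side guard: only resolve once the fence value signaled after
// the query-writing work has been reached. The resolve itself never waits.
bool TryResolveAfterFence(const FakeFence& fence,
                          std::uint64_t fenceValueForQueries,
                          const FakeQueryHeap& heap, std::uint32_t start,
                          std::uint32_t count, std::uint64_t* out) {
    if (fence.GetCompletedValue() < fenceValueForQueries) {
        return false;  // GPU work still pending; resolved data would be undefined
    }
    return ResolveTimestamps(heap, start, count, out);
}
```

The guard lives entirely in application code, which matches the statement above that the runtime associates no implicit fence with the query heap.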
The output buffer must be sized correctly for the query type and number of queries to resolve.
The query heap must have been created with the D3D12_QUERY_HEAP_FLAG_CPU_RESOLVE flag for this method to succeed.
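To make the output buffer sizing requirement concrete, the helper below computes the number of bytes needed for a resolve. It is a sketch under one explicit assumption that this document does not state: that CPU-resolved data uses the same per-query layouts as the existing GPU resolve (a UINT64 for occlusion, binary occlusion, and timestamp queries, the 11-field D3D12_QUERY_DATA_PIPELINE_STATISTICS structure, and the 2-field D3D12_QUERY_DATA_SO_STATISTICS structure). `QueryType`, `ResolvedQuerySize`, and `ResolvedBufferSize` are hypothetical names, and only a subset of query types is covered.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for a subset of D3D12_QUERY_TYPE.
enum class QueryType {
    Occlusion,
    BinaryOcclusion,
    Timestamp,
    PipelineStatistics,
    SOStatisticsStream0,
};

// Assumed bytes per resolved query, following the existing GPU resolve layouts.
constexpr std::size_t ResolvedQuerySize(QueryType type) {
    return type == QueryType::PipelineStatistics    ? 11 * sizeof(std::uint64_t)
           : type == QueryType::SOStatisticsStream0 ? 2 * sizeof(std::uint64_t)
                                                    : sizeof(std::uint64_t);
}

// Minimum size of the buffer passed as pResolvedQueryData.
constexpr std::size_t ResolvedBufferSize(QueryType type, std::uint32_t numQueries) {
    return ResolvedQuerySize(type) * numQueries;
}
```

For example, under this assumption four timestamp queries need a 32-byte buffer, while two pipeline statistics queries need 176 bytes.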
typedef enum D3D12DDI_QUERY_HEAP_FLAGS
{
    D3D12DDI_QUERY_HEAP_FLAG_NONE = 0,
    D3D12DDI_QUERY_HEAP_FLAG_CPU_RESOLVE = 1,
} D3D12DDI_QUERY_HEAP_FLAGS;
DDI for CreateQueryHeap1
typedef HRESULT (APIENTRY* PFND3D12DDI_CREATE_QUERY_HEAP_0119)(
    D3D12DDI_HDEVICE hDevice,
    _In_ CONST D3D12DDIARG_CREATE_QUERY_HEAP_0001* pCreate,
    D3D12DDI_QUERY_HEAP_FLAGS Flags,
    D3D12DDI_HQUERYHEAP hQueryHeap
    );
DDI for CPU-based query resolution.
typedef HRESULT (APIENTRY* PFND3D12DDI_RESOLVE_QUERY_DATA)(
    D3D12DDI_HDEVICE hDevice,
    D3D12DDI_HQUERYHEAP hQueryHeap,
    D3D12_QUERY_TYPE Type,
    UINT StartIndex,
    UINT NumQueries,
    void* pResolvedQueryData
    );
For drivers or hardware that do not support native CPU-based query resolution, the D3D12 runtime provides transparent fallback behavior to ensure the feature works universally across all D3D12-capable hardware.
When a query heap is created with the D3D12_QUERY_HEAP_FLAG_CPU_RESOLVE flag on hardware or drivers that lack native support:
Automatic GPU Resolution Injection: The runtime automatically injects GPU-based ResolveQueryData operations into command lists at Close() time for any query heaps that have been used and marked for CPU resolution.
Internal Buffer Management: The runtime creates and manages internal GPU-visible destination buffers to store the resolved query results. These buffers are sized appropriately for the query types and counts used.
Transparent Operation: Applications using ID3D12Device::ResolveQueryData() are unaware of the fallback - the method behaves identically whether using native support or runtime emulation.
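The injection step can be illustrated with a toy model. This is a sketch of one possible bookkeeping scheme, not the runtime's actual implementation: `FakeHeap`, `RecordedOp`, and `FakeCommandList` are hypothetical types, and the choice to coalesce each heap's usage into a single [min, max] index range is an assumption made here for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a query heap and its creation flag.
struct FakeHeap {
    int id;
    bool cpuResolve;  // created with the CPU resolve flag
};

// One recorded command: its name plus the query range it touches.
struct RecordedOp {
    std::string name;
    int heapId;
    std::uint32_t first;
    std::uint32_t count;
};

// Toy command list that tracks which CPU-resolve query indices were written
// and appends a GPU resolve for them when the list is closed.
class FakeCommandList {
public:
    void EndQuery(const FakeHeap& heap, std::uint32_t index) {
        assert(!closed_);
        ops_.push_back({"EndQuery", heap.id, index, 1});
        if (!heap.cpuResolve) {
            return;  // standard heaps are resolved explicitly by the application
        }
        auto it = used_.find(heap.id);
        if (it == used_.end()) {
            used_.emplace(heap.id, std::make_pair(index, index));
        } else {
            it->second.first = std::min(it->second.first, index);
            it->second.second = std::max(it->second.second, index);
        }
    }

    void Close() {
        // Fallback path: inject one resolve per CPU-resolve heap that was used.
        for (const auto& entry : used_) {
            const std::uint32_t first = entry.second.first;
            const std::uint32_t count = entry.second.second - first + 1;
            ops_.push_back({"ResolveQueryData", entry.first, first, count});
        }
        used_.clear();
        closed_ = true;
    }

    const std::vector<RecordedOp>& Ops() const { return ops_; }

private:
    std::vector<RecordedOp> ops_;
    // heap id -> [lowest, highest] query index written in this command list
    std::map<int, std::pair<std::uint32_t, std::uint32_t>> used_;
    bool closed_ = false;
};
```

In this model the injected resolve would target the runtime-managed internal buffer described above, which the device-level resolve then reads back on the CPU.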
| Version | Date | Description |
|---|---|---|
| 1.0 | August 2025 | Initial version |
| 1.1 | September 2025 | Remove TOP/BOP Timestamps from proposal |
| 1.2 | June 2026 | Clarify that CPU ResolveQueryData does not block on the GPU or perform implicit synchronization; the application must ensure query-writing GPU work has completed before resolving |