DirectX-Specs

D3D12 Indirect Drawing



Summary

This document describes D3D12 features needed to allow applications to generate command buffers on the GPU.

Motivation

Some game developers see significant performance advantages by moving scene-traversal and culling onto the GPU. This is hard to do with the D3D API because D3D requires command buffers to be generated by the CPU. This proposal contains additions to D3D12 which would allow a limited degree of GPU-based command buffer generation.


Detailed Design


Overview

A new API object is added to D3D12, the command signature. This object enables applications to specify:

At startup, an application would create a small set of command signatures. At runtime, the application would fill a buffer with commands (via whatever means that application chooses). The application would then use D3D12 command list APIs to set state (render target bindings, PSO, etc), and then use a command list API to cause the GPU to interpret the contents of the indirect argument buffer according to the format defined by a particular command signature.

For example, suppose an application wants a unique root constant to be specified per-draw call in the indirect argument buffer. The application would create a command signature that enables the indirect argument buffer to specify the following parameters per draw call:

The indirect argument buffer generated by the application would contain an array of fixed-size records. Each structure corresponds to 1 draw call. Each structure contains the drawing arguments, and the value of the root constant. The number of draw calls is specified in a separate GPU-visible buffer.

An example command buffer generated by the application would look like:

Command Buffer Format

Draw structure #1:
    RootConstant (RootParameterIndex=1)
    VertexCount
    InstanceCount
    StartVertexLocation
    StartInstanceLocation

Draw structure #2:
    RootConstant (RootParameterIndex=1)
    VertexCount
    InstanceCount
    StartVertexLocation
    StartInstanceLocation

Draw structure #3:
    RootConstant (RootParameterIndex=1)
    VertexCount
    InstanceCount
    StartVertexLocation
    StartInstanceLocation
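
As an illustration of the record layout described above, the following is a minimal CPU-side sketch of how an application might fill such a buffer. The `DrawRecord` type and the `BuildDrawRecords` helper are hypothetical names introduced only for this example; they are not part of the proposed API. The sketch also assumes one instance per draw and uses the draw index as the root constant value.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record type: one root constant followed by the four
// non-indexed draw arguments, tightly packed as 32-bit values.
struct DrawRecord {
    std::uint32_t RootConstant;            // value for root parameter index 1
    std::uint32_t VertexCountPerInstance;
    std::uint32_t InstanceCount;
    std::uint32_t StartVertexLocation;
    std::uint32_t StartInstanceLocation;
};
static_assert(sizeof(DrawRecord) == 20, "record must be tightly packed");

// Builds one record per draw call. As an arbitrary choice for this sketch,
// each draw receives its own index as the root constant.
std::vector<DrawRecord> BuildDrawRecords(const std::vector<std::uint32_t>& vertexCounts) {
    std::vector<DrawRecord> records;
    records.reserve(vertexCounts.size());
    for (std::size_t i = 0; i < vertexCounts.size(); ++i) {
        DrawRecord record{};
        record.RootConstant = static_cast<std::uint32_t>(i);
        record.VertexCountPerInstance = vertexCounts[i];
        record.InstanceCount = 1;          // assumption: no instancing
        record.StartVertexLocation = 0;
        record.StartInstanceLocation = 0;
        records.push_back(record);
    }
    return records;
}
```

The resulting array would be copied into a GPU-visible buffer, and the number of records would be written to the separate count buffer.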

Indirect Argument Buffer Structures

The following structures define how particular arguments appear in an indirect argument buffer. These structures do not appear in any D3D12 API. Applications use these definitions when writing to an indirect argument buffer (with the CPU or GPU).

typedef struct D3D12_DRAW_ARGUMENTS
{
    UINT VertexCountPerInstance;
    UINT InstanceCount;
    UINT StartVertexLocation;
    UINT StartInstanceLocation;
} D3D12_DRAW_ARGUMENTS;

typedef struct D3D12_DRAW_INDEXED_ARGUMENTS
{
    UINT IndexCountPerInstance;
    UINT InstanceCount;
    UINT StartIndexLocation;
    INT BaseVertexLocation;
    UINT StartInstanceLocation;
} D3D12_DRAW_INDEXED_ARGUMENTS;

typedef struct D3D12_DISPATCH_ARGUMENTS
{
    UINT ThreadGroupCountX;
    UINT ThreadGroupCountY;
    UINT ThreadGroupCountZ;
} D3D12_DISPATCH_ARGUMENTS;

typedef struct D3D12_VERTEX_BUFFER_VIEW
{
    D3D12_GPU_VIRTUAL_ADDRESS BufferLocation;
    UINT SizeInBytes;
    UINT StrideInBytes;
} D3D12_VERTEX_BUFFER_VIEW;

typedef struct D3D12_INDEX_BUFFER_VIEW
{
    D3D12_GPU_VIRTUAL_ADDRESS BufferLocation;
    UINT SizeInBytes;
    DXGI_FORMAT Format;
} D3D12_INDEX_BUFFER_VIEW;

typedef struct D3D12_CONSTANT_BUFFER_VIEW
{
    D3D12_GPU_VIRTUAL_ADDRESS BufferLocation;
    UINT SizeInBytes;
    UINT Padding;
} D3D12_CONSTANT_BUFFER_VIEW;
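
Because applications write these structures byte-for-byte, it can help to pin down their sizes. The sketch below mirrors several of the structures above with fixed-width types and checks the sizes at compile time. The type names are local stand-ins so that the example compiles without the D3D12 headers, and it assumes that `DXGI_FORMAT` occupies 32 bits and that the compiler applies natural alignment.

```cpp
#include <cstddef>
#include <cstdint>

// Local stand-ins (assumptions), not the real D3D12 declarations.
using GpuVirtualAddress = std::uint64_t;                   // D3D12_GPU_VIRTUAL_ADDRESS
enum class FormatStandIn : std::uint32_t { Unknown = 0 };  // placeholder for DXGI_FORMAT

struct DrawArguments {
    std::uint32_t VertexCountPerInstance;
    std::uint32_t InstanceCount;
    std::uint32_t StartVertexLocation;
    std::uint32_t StartInstanceLocation;
};

struct DrawIndexedArguments {
    std::uint32_t IndexCountPerInstance;
    std::uint32_t InstanceCount;
    std::uint32_t StartIndexLocation;
    std::int32_t  BaseVertexLocation;   // signed, unlike the other fields
    std::uint32_t StartInstanceLocation;
};

struct DispatchArguments {
    std::uint32_t ThreadGroupCountX;
    std::uint32_t ThreadGroupCountY;
    std::uint32_t ThreadGroupCountZ;
};

struct VertexBufferView {
    GpuVirtualAddress BufferLocation;
    std::uint32_t SizeInBytes;
    std::uint32_t StrideInBytes;
};

struct IndexBufferView {
    GpuVirtualAddress BufferLocation;
    std::uint32_t SizeInBytes;
    FormatStandIn Format_;
};

static_assert(sizeof(DrawArguments) == 16, "4 x 32-bit values");
static_assert(sizeof(DrawIndexedArguments) == 20, "5 x 32-bit values");
static_assert(sizeof(DispatchArguments) == 12, "3 x 32-bit values");
static_assert(sizeof(VertexBufferView) == 16, "64-bit address + 2 x 32-bit values");
static_assert(sizeof(IndexBufferView) == 16, "64-bit address + 2 x 32-bit values");
```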

Command Signature Creation

Applications use the following API to create a command signature.

typedef enum D3D12_INDIRECT_PARAMETER_TYPE
{
    D3D12_INDIRECT_PARAMETER_DRAW,
    D3D12_INDIRECT_PARAMETER_DRAW_INDEXED,
    D3D12_INDIRECT_PARAMETER_DISPATCH,
    D3D12_INDIRECT_PARAMETER_VERTEX_BUFFER_VIEW,
    D3D12_INDIRECT_PARAMETER_INDEX_BUFFER_VIEW,
    D3D12_INDIRECT_PARAMETER_CONSTANT,
    D3D12_INDIRECT_PARAMETER_CONSTANT_BUFFER_VIEW,
    D3D12_INDIRECT_PARAMETER_SHADER_RESOURCE_VIEW,
    D3D12_INDIRECT_PARAMETER_UNORDERED_ACCESS_VIEW,
} D3D12_INDIRECT_PARAMETER_TYPE;

typedef struct D3D12_INDIRECT_PARAMETER
{
    D3D12_INDIRECT_PARAMETER_TYPE Type;
    union
    {
        struct
        {
            UINT Slot;
        } VertexBuffer;

        struct
        {
            UINT RootParameterIndex;
            UINT DestOffsetIn32BitValues;
            UINT Num32BitValuesToSet;
        } Constant;

        struct
        {
            UINT RootParameterIndex;
        } ConstantBufferView;

        struct
        {
            UINT RootParameterIndex;
        } ShaderResourceView;

        struct
        {
            UINT RootParameterIndex;
        } UnorderedAccessView;
    };
} D3D12_INDIRECT_PARAMETER;

typedef struct D3D12_COMMAND_SIGNATURE
{
    // The number of bytes between each drawing structure
    UINT ByteStride;
    UINT ParameterCount;
    const D3D12_INDIRECT_PARAMETER* pParameters;
} D3D12_COMMAND_SIGNATURE;

HRESULT ID3D12Device::CreateCommandSignature(
    const D3D12_COMMAND_SIGNATURE* pDesc,
    ID3D12RootSignature* pRootSignature,
    REFIID riid, // Expected: ID3D12CommandSignature
    void** ppCommandSignature
);

The ordering of arguments within an indirect argument buffer is defined to exactly match the order of parameters specified in D3D12_COMMAND_SIGNATURE::pParameters. All of the arguments for 1 draw/dispatch call within an indirect argument buffer are tightly packed. However, applications are allowed to specify an arbitrary byte stride between draw/dispatch commands in an indirect argument buffer.
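
The tight-packing rule can be sketched as a small helper that sums the per-parameter sizes and compares the result with the byte stride. The `ParamType` and `Param` types and the function names below are hypothetical, introduced only for this illustration. The sizes come from the structures listed earlier in this document; root SRV/UAV parameters are left out because their in-buffer layout is not given here. The check that the stride must be at least the packed size is an assumption, not a rule stated by this specification.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical mirror of the parameter types whose layout is listed above.
enum class ParamType {
    Draw, DrawIndexed, Dispatch,
    VertexBufferView, IndexBufferView,
    Constant, ConstantBufferView
};

struct Param {
    ParamType Type;
    std::uint32_t Num32BitValuesToSet;  // only meaningful for Constant
};

// Size in bytes that one parameter occupies in the argument buffer.
std::uint32_t ParamByteSize(const Param& p) {
    switch (p.Type) {
        case ParamType::Draw:               return 16;
        case ParamType::DrawIndexed:        return 20;
        case ParamType::Dispatch:           return 12;
        case ParamType::VertexBufferView:   return 16;
        case ParamType::IndexBufferView:    return 16;
        case ParamType::Constant:           return 4 * p.Num32BitValuesToSet;
        case ParamType::ConstantBufferView: return 16;
    }
    return 0;
}

// Arguments are tightly packed, so the size of one command is a plain sum.
std::uint32_t PackedCommandSize(const std::vector<Param>& params) {
    std::uint32_t total = 0;
    for (const Param& p : params) {
        total += ParamByteSize(p);
    }
    return total;
}

// Assumption: a usable stride must cover at least one packed command.
bool IsStrideLargeEnough(std::uint32_t byteStride, const std::vector<Param>& params) {
    return byteStride >= PackedCommandSize(params);
}
```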

The root signature must be specified if and only if the command signature changes one of the root arguments.

For root SRV/UAV/CBV, the application specifies the size in bytes. The debug layer will validate the following restrictions on the sizes and addresses:

  1. CBV – Address and size must be a multiple of 256 bytes

  2. Raw UAV – Address and size must be a multiple of 4 bytes

  3. Typed UAV – Address and size must be a multiple of the UAV format size

  4. Structured UAV – Address and size must be a multiple of the structure byte stride (declared in the shader)

  5. SRV – Address and size must be a multiple of the SRV format size
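
The restrictions above all reduce to the same test with a different multiple. A minimal sketch of such a check follows; the function and constant names are hypothetical and this is not the debug layer's actual implementation. For typed and structured views the caller would pass the format size or the structure byte stride as the required multiple.

```cpp
#include <cstdint>

// Multiples taken from the list above.
constexpr std::uint64_t kCbvMultiple = 256;
constexpr std::uint64_t kRawUavMultiple = 4;

// True when value is a whole multiple of a non-zero 'multiple'.
bool IsMultipleOf(std::uint64_t value, std::uint64_t multiple) {
    return multiple != 0 && value % multiple == 0;
}

// Both the address and the size must satisfy the same restriction.
bool IsViewRangeValid(std::uint64_t address,
                      std::uint64_t sizeInBytes,
                      std::uint64_t requiredMultiple) {
    return IsMultipleOf(address, requiredMultiple) &&
           IsMultipleOf(sizeInBytes, requiredMultiple);
}
```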

A given command signature is either a graphics or a compute command signature. If a command signature contains a drawing operation, then it is a graphics command signature. Otherwise, the command signature must contain a dispatch operation, and it is a compute command signature.

Graphics command signatures only affect graphics root arguments. Likewise, compute command signatures only affect compute root arguments.
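
The classification rule can be expressed as a short function. The sketch below uses hypothetical `ParamType` and `SignatureKind` enumerations rather than the real API types. It follows the wording above literally: any drawing operation makes the signature a graphics signature, and it does not try to reject a signature that mixes draw and dispatch operations.

```cpp
#include <vector>

// Hypothetical mirror of D3D12_INDIRECT_PARAMETER_TYPE.
enum class ParamType {
    Draw, DrawIndexed, Dispatch,
    VertexBufferView, IndexBufferView,
    Constant, ConstantBufferView,
    ShaderResourceView, UnorderedAccessView
};

enum class SignatureKind { Graphics, Compute, Invalid };

SignatureKind ClassifySignature(const std::vector<ParamType>& params) {
    bool hasDraw = false;
    bool hasDispatch = false;
    for (ParamType type : params) {
        if (type == ParamType::Draw || type == ParamType::DrawIndexed) {
            hasDraw = true;
        } else if (type == ParamType::Dispatch) {
            hasDispatch = true;
        }
    }
    if (hasDraw) {
        return SignatureKind::Graphics;   // contains a drawing operation
    }
    if (hasDispatch) {
        return SignatureKind::Compute;    // otherwise it must contain a dispatch
    }
    return SignatureKind::Invalid;        // neither a draw nor a dispatch
}
```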


Example Command Signatures


Plain MultiDrawIndirect

In this example, the indirect argument buffer generated by the application holds an array of 36-byte structures. Each structure only contains the 5 parameters passed to DrawIndexedInstanced (plus padding).

The code to create the command signature description is:

// A single parameter: the indexed draw arguments
D3D12_INDIRECT_PARAMETER Args[1];
Args[0].Type = D3D12_INDIRECT_PARAMETER_DRAW_INDEXED;

// 20 bytes of arguments plus 16 bytes of padding per structure
D3D12_COMMAND_SIGNATURE ProgramDesc;
ProgramDesc.ByteStride = 36;
ProgramDesc.ParameterCount = 1;
ProgramDesc.pParameters = Args;

The layout of a single structure within an indirect argument buffer is:

   
Bytes 0:3 IndexCountPerInstance
Bytes 4:7 InstanceCount
Bytes 8:11 StartIndexLocation
Bytes 12:15 BaseVertexLocation
Bytes 16:19 StartInstanceLocation
Bytes 20:35 Padding
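
For applications that fill this buffer from the CPU, the layout above can be written as a plain structure with explicit padding. The `PaddedDrawIndexedRecord` type and the `RecordOffset` helper are hypothetical names used only in this sketch.

```cpp
#include <cstddef>
#include <cstdint>

// One 36-byte structure: 20 bytes of arguments followed by 16 bytes of padding.
struct PaddedDrawIndexedRecord {
    std::uint32_t IndexCountPerInstance;  // bytes 0:3
    std::uint32_t InstanceCount;          // bytes 4:7
    std::uint32_t StartIndexLocation;     // bytes 8:11
    std::int32_t  BaseVertexLocation;     // bytes 12:15
    std::uint32_t StartInstanceLocation;  // bytes 16:19
    std::uint32_t Padding[4];             // bytes 20:35
};
static_assert(sizeof(PaddedDrawIndexedRecord) == 36, "must match ByteStride");

// Byte offset of the record with the given index within the argument buffer.
std::size_t RecordOffset(std::size_t index) {
    return index * sizeof(PaddedDrawIndexedRecord);
}
```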

Root Constants + Vertex Buffers

In this example, each structure in an indirect argument buffer changes 2 root constants, changes 1 vertex buffer binding, and performs 1 non-indexed drawing operation. There is no padding between structures.

The code to create the command signature description is:

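D3D12_INDIRECT_PARAMETER Args[4];

// Two root constants, one 32-bit value each
// (destination offset 0 is assumed for this example)
Args[0].Type = D3D12_INDIRECT_PARAMETER_CONSTANT;
Args[0].Constant.RootParameterIndex = 2;
Args[0].Constant.DestOffsetIn32BitValues = 0;
Args[0].Constant.Num32BitValuesToSet = 1;
Args[1].Type = D3D12_INDIRECT_PARAMETER_CONSTANT;
Args[1].Constant.RootParameterIndex = 6;
Args[1].Constant.DestOffsetIn32BitValues = 0;
Args[1].Constant.Num32BitValuesToSet = 1;

// Vertex buffer binding for slot 3
Args[2].Type = D3D12_INDIRECT_PARAMETER_VERTEX_BUFFER_VIEW;
Args[2].VertexBuffer.Slot = 3;

// Non-indexed draw
Args[3].Type = D3D12_INDIRECT_PARAMETER_DRAW;

D3D12_COMMAND_SIGNATURE ProgramDesc;
ProgramDesc.ByteStride = 40;
ProgramDesc.ParameterCount = 4;
ProgramDesc.pParameters = Args;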

The layout of a single structure within the indirect argument buffer is:

   
Bytes 0:3 Data for root parameter index 2
Bytes 4:7 Data for root parameter index 6
Bytes 8:15 Virtual address of VB (64-bit)
Bytes 16:19 VB size
Bytes 20:23 VB stride
Bytes 24:27 VertexCountPerInstance
Bytes 28:31 InstanceCount
Bytes 32:35 StartVertexLocation
Bytes 36:39 StartInstanceLocation
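
The same 40-byte structure can be sketched on the CPU side by nesting the argument structures. The type names below are local stand-ins rather than the real D3D12 declarations, and the vertex buffer view fields follow the order of D3D12_VERTEX_BUFFER_VIEW as declared earlier in this document. The 64-bit address lands at byte 8, which is naturally aligned, so the sketch assumes the compiler inserts no hidden padding.

```cpp
#include <cstddef>
#include <cstdint>

// Local stand-ins for the argument structures defined earlier.
struct VertexBufferView {
    std::uint64_t BufferLocation;
    std::uint32_t SizeInBytes;
    std::uint32_t StrideInBytes;
};

struct DrawArguments {
    std::uint32_t VertexCountPerInstance;
    std::uint32_t InstanceCount;
    std::uint32_t StartVertexLocation;
    std::uint32_t StartInstanceLocation;
};

// Hypothetical record matching the command signature in this example.
struct ConstantsVbDrawRecord {
    std::uint32_t RootConstant2;    // bytes 0:3, root parameter index 2
    std::uint32_t RootConstant6;    // bytes 4:7, root parameter index 6
    VertexBufferView VertexBuffer;  // bytes 8:23, bound to slot 3
    DrawArguments Draw;             // bytes 24:39
};

static_assert(sizeof(ConstantsVbDrawRecord) == 40, "must match ByteStride");
static_assert(offsetof(ConstantsVbDrawRecord, VertexBuffer) == 8, "VB view offset");
static_assert(offsetof(ConstantsVbDrawRecord, Draw) == 24, "draw arguments offset");
```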

Validation

The runtime will validate the following:


Drawing

Applications perform indirect draws/dispatches via the following API:

void ID3D12CommandList::ExecuteIndirect(
    ID3D12CommandSignature* pCommandSignature,
    UINT MaxCommandCount,
    ID3D12Resource* pArgumentBuffer,
    UINT64 ArgumentBufferOffset,
    ID3D12Resource* pCountBuffer,
    UINT64 CountBufferOffset
);

There are 2 ways that command counts can be specified:

If pCountBuffer is not NULL, then MaxCommandCount specifies the maximum number of operations which will be performed. The actual number of operations to be performed is the minimum of this value and a 32-bit unsigned integer contained in pCountBuffer (at the byte offset specified by CountBufferOffset).

If pCountBuffer is NULL, then MaxCommandCount specifies the exact number of operations which will be performed.

The semantics of this API are defined with the following pseudo-code:

Non-NULL pCountBuffer:

// Read the command count out of the count buffer
UINT CommandCount = pCountBuffer->ReadUINT32(CountBufferOffset);
CommandCount = min(CommandCount, MaxCommandCount);

// Get pointer to the first command's arguments
BYTE* Arguments = pArgumentBuffer->GetBase() + ArgumentBufferOffset;

for (UINT CommandIndex = 0; CommandIndex < CommandCount; CommandIndex++)
{
    // Interpret the data contained in *Arguments
    // according to the command signature
    pCommandSignature->Interpret(Arguments);
    Arguments += pCommandSignature->GetByteStride();
}

NULL pCountBuffer:

// Get pointer to the first command's arguments
BYTE* Arguments = pArgumentBuffer->GetBase() + ArgumentBufferOffset;

for (UINT CommandIndex = 0; CommandIndex < MaxCommandCount; CommandIndex++)
{
    // Interpret the data contained in *Arguments
    // according to the command signature
    pCommandSignature->Interpret(Arguments);
    Arguments += pCommandSignature->GetByteStride();
}
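
The two pseudo-code variants can be combined into one runnable CPU-side simulation, which may be useful for unit-testing code that generates argument buffers. This is only a sketch of the semantics, not how a driver or the runtime implements the call. `SimulateExecuteIndirect` and `CommandInterpreter` are hypothetical names, buffers are modeled as byte vectors, and no bounds checking is performed.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// Callback standing in for "interpret according to the command signature".
using CommandInterpreter = std::function<void(const std::uint8_t*)>;

// Walks the argument buffer and returns the number of commands interpreted.
// Pass nullptr for countBuffer to model the CPU-specified count.
std::uint32_t SimulateExecuteIndirect(
    std::uint32_t byteStride,
    std::uint32_t maxCommandCount,
    const std::vector<std::uint8_t>& argumentBuffer,
    std::uint64_t argumentBufferOffset,
    const std::vector<std::uint8_t>* countBuffer,
    std::uint64_t countBufferOffset,
    const CommandInterpreter& interpret)
{
    std::uint32_t commandCount = maxCommandCount;
    if (countBuffer != nullptr) {
        // Read the 32-bit count and clamp it to MaxCommandCount.
        std::uint32_t bufferCount = 0;
        std::memcpy(&bufferCount, countBuffer->data() + countBufferOffset,
                    sizeof(bufferCount));
        commandCount = std::min(bufferCount, maxCommandCount);
    }

    const std::uint8_t* arguments = argumentBuffer.data() + argumentBufferOffset;
    for (std::uint32_t i = 0; i < commandCount; ++i) {
        interpret(arguments);     // one draw/dispatch worth of arguments
        arguments += byteStride;  // advance by the signature's byte stride
    }
    return commandCount;
}
```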

The debug layer will issue an error if either the count buffer or the argument buffer is not in the D3D12_RESOURCE_USAGE_INDIRECT_ARGUMENT state.

The core runtime will validate:

The debug layer will validate:

ID3D12CommandList::DrawInstancedIndirect and ID3D12CommandList::DrawIndexedInstancedIndirect are removed from the D3D12 API because they can be implemented with the features described here.


Bundles

ID3D12CommandList::ExecuteIndirect is allowed inside of bundle command lists only if all of the following are true:

  1. pCountBuffer is NULL (CPU-specified count only)

  2. The command signature contains exactly 1 operation. This implies that the command signature contains neither root argument changes nor VB/IB binding changes.


State leakage

ExecuteIndirect is defined to reset all bindings affected by the ExecuteIndirect to known values. In particular.

This enables drivers to easily track bindings. This is implemented by the D3D12 runtime by making a series of DDI calls after the ExecuteIndirect is called.


Obtaining buffer virtual addresses

A new API is added whereby an application can retrieve the GPU virtual address of a buffer.

typedef UINT64 D3D12_GPU_VIRTUAL_ADDRESS;

D3D12_GPU_VIRTUAL_ADDRESS ID3D12Resource::GetGPUVirtualAddress();

Applications are free to apply byte offsets to virtual addresses before placing them in an indirect argument buffer. Note that all of the D3D12 alignment requirements for VB/IB/CB still apply to the resulting GPU virtual address.
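
As an illustration of applying byte offsets while respecting alignment, the sketch below places constant buffers at 256-byte-aligned offsets from a base address, using the CBV restriction listed earlier in this document. `AlignUp` and `ConstantBufferAddress` are hypothetical helpers, and the base address is assumed to be 256-byte aligned already.

```cpp
#include <cstdint>

using GpuVirtualAddress = std::uint64_t;  // mirrors D3D12_GPU_VIRTUAL_ADDRESS

constexpr std::uint64_t kCbvAlignment = 256;  // CBV multiple stated earlier

// Rounds value up to the next multiple of alignment.
// Assumption: alignment is a non-zero power of two.
std::uint64_t AlignUp(std::uint64_t value, std::uint64_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Address of the constant buffer with the given index when buffers of
// sizeInBytes are laid out back to back in 256-byte-aligned slots.
GpuVirtualAddress ConstantBufferAddress(GpuVirtualAddress base,
                                        std::uint64_t index,
                                        std::uint64_t sizeInBytes) {
    const std::uint64_t slotSize = AlignUp(sizeInBytes, kCbvAlignment);
    return base + index * slotSize;
}
```

The base address would come from ID3D12Resource::GetGPUVirtualAddress, and the computed address is what the application would write into the indirect argument buffer.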

This API returns 0 for non-buffer resources.


Implementation Details

Both of the following implementations are acceptable:

Note that in the second approach, the hidden compute shader invocations associated with many ExecuteIndirect calls can be combined together. If a command list has no resource transition barrier to the D3D12_RESOURCE_USAGE_INDIRECT_ARGUMENT state, then it is safe to move all of the hidden compute shader invocations to the beginning of the command list.


GPU Validation

In order to achieve consistent behavior across machines, GPUs are expected to perform the following validation:

  1. The command count read from the count buffer is guaranteed to not exceed the MaxCommandCount specified in the ExecuteIndirect API call. This is achieved by having the GPU compute min(MaxCommandCount, CommandCount).

Test Plan


Runtime Functional Tests


Driver Conformance Tests