DirectX-Specs

D3D12 RS5 Render Passes -- Unified Spec

Version 1.0


Contents


Background

Render Passes are intended to help TBDR-based (and other) renderers improve GPU efficiency by reducing memory traffic to/from off-chip memory, by enabling applications to better identify resource rendering ordering requirements/data dependencies.

The name “Render Passes” may be changed in a future spec.

Optimization Goals

Goals

Allow applications to avoid unnecessary loads/stores of resources from/to main memory on TBDR architectures

This feature introduces the concept of Render Passes, which are intended to provide a central location for applications to indicate their data dependencies for a set of rendering operations.

These data dependencies are intended to allow drivers to inspect this data at bind/barrier time, and issue instructions that minimize resource loads/stores from/to main memory.

Allow TBDR architectures to opportunistically persistent resources in on-chip cache across Render Passes (even in separate Command Lists)

Case A: Reading/Writing One-to-One

A common rendering pattern is for an application to render to RTV A, then turn around and texture from that resource as SRV A at some time in future (while rendering to RTV B). For cases in which the writes to RTV B are reading from pixels ‘one-to-one’ (mapped to the identical location in SRV A), some architectures may be able to continue the current binning pass during the writes to RTV A, and avoid a flush to main memory (since the SRV A reads only have a dependency on the current tile).

A design goal is to enable these two passes to be coalesced, without a intervening flush to main memory.

[v0.08] Postponed to a future release, were not able to identify common ISV scenarios for this functionality.

Case B: Writes to the same Render Targets across multiple Command Lists

Another common rendering pattern is for the application to render to the same render target(s) across multiple command lists serially, even though the rendering commands are generated in parallel. This design seeks to allow drivers to avoid a flush to main memory on Command List boundaries, when the application knows it will resume rendering on the immediate succeeding Command List.

A design goal is to allow these two passes to be combined in a way that avoids an intervening flush to main memory.

Allow the new APIs to be used with existing drivers

The design seeks to allow new APIs to be run on existing drivers (not necessarily with any performance improvements), to ensure as wide-as-possible an install base for the APIs.

Design allows UMD to choose optimal rendering path without heavy CPU penalty

This feature should be aligned with the low CPU-overhead goals of D3D12, and should be designed in such a way to not significantly impact CPU usage for common rendering workloads (no more than ~20%).

Allow ISVs to verify proper use of the feature even on drivers that aren’t necessarily making behavior changes based on feature use

The debug layer should be able to help identify incorrect use of the feature even when running on a non-supporting driver. (e.g. how the DX11 debug layer clears a resource to a random color in response to a Discard call).


Non-goals


Summary of API changes

A Render Pass is defined by:


Render Pass output bindings

Render Target and Depth Buffer bindings are declared at the start of a Render Pass, directly in the ID3D12GraphicsCommandList::BeginRenderPass call.

The BeginRenderPass call accepts the lists of RTVs/DSV – either are optional, though at least one must be specified. All bindings are via CPU descriptors, and like OMSetRenderTargets they are ‘snapped’ at BeginRenderPass time from their respective (CPU) descriptor heaps.

RTVs/the DSV are not inherited into the Render Pass, they must be set. The RTVs/DSV set in BeginRenderPass the Render Pass are not propagated out to the Command List; they are in an undefined state following the Render Pass (the driver does not need to manually clear these to something well-defined).


Render Pass workloads

A Render Pass is a subset of commands within a Command List, a Command List may contain multiple Render Passes. The boundaries of a Render Pass are declared explicitly on the Command List via the BeginRenderPass/EndRenderPass Command List APIs.

Render Passes may not be nested, and they must be ended within the current Command List (they cannot straddle Command Lists, see below for optimizations designed to enable efficient multi-threaded Render Pass generation).

Writes from within a Render Pass are not ‘valid’ to be read until the end of the Render Pass. This precludes some types of barriers from within the Render Pass (e.g. barriering from RENDER_TARGET to SHADER_RESOURCE on the currently-bound render target), but some other barriers may be allowed. See sub-section below for more details.

The sole exception to the ‘no reads on writes that occurred within the render pass’ are the implicit reads that occur as part of depth-testing and render target blending.

Thus, the list of disallowed APIs within a Render Pass is:

The core runtime will remove the Command List if any of the above APIs are called during Command List recording (the UMD can assume those DDIs will never be called during a render pass).


Resource Barriers within a Render Pass

As stated earlier, an application may not read from or consume writes that occurred within the same render pass. This disallows certain barriers, for example from RENDER_TARGET to SHADER_RESOURCE on the currently-bound render target (and the debug layer will error to that effect). But, that same barrier on a render target that was written outside the current Render Pass is completely valid, as the writes should have all completed ahead of the Render Pass even starting.

So as a (crucial) optimization, given a conformant workload, the UMD is free to assume that the application is not depending on any writes within a Render Pass, and thus it is free to automatically move any barriers encountered in a Render Pass to the beginning of the Render Pass, where they can be coalesced (and not interfere with any tiling/binning operations). This should be valid, as all writes should have been finished by the time the Render Pass starts.

To give a more complete example, if a rendering engine has a ~DX11 resource binding design, and does barriers ‘on demand’ based on how resources are bound, when writing into a UAV near the end of the frame (and will be consumed in the next frame), it may happen to leave the resource in the UNORDERED_ACCESS state at the conclusion of the frame.

In the next frame, when the engine goes to bind the resource as an SRV, it will then find that the resource is not in the correct state, and issue a UNORDERED_ACCESS -> PIXEL_SHADER_RESOURCE barrier. If this barrier occurs within the Render Pass, the UMD is free to assume all writes occurred outside the Render Pass, and thus ‘move’ the barrier up to the start of the Render Pass – even if the UMD can’t prove that (given how UAV writes can’t be tracked with the bindless design we have).

Example non-conformant barriers:

Example conformant barriers:


Resource Access declaration

At BeginRenderPass time, the user must declare all Resources that are serving as RTVs/DSVs within that Render Pass, and specify their beginning/engine ‘access’ characteristics.

   
D3D12_RENDER_PASS_BEGINNING_ACCESS_TYPE_DISCARD, D3D12_RENDER_PASS_ENDING_ACCESS_TYPE_DISCARD,
D3D12_RENDER_PASS_BEGINNING_ACCESS_TYPE_PRESERVE, D3D12_RENDER_PASS_ENDING_ACCESS_TYPE_PRESERVE,
D3D12_RENDER_PASS_BEGINNING_ACCESS_TYPE_CLEAR, D3D12_RENDER_PASS_ENDING_ACCESS_TYPE_RESOLVE,
D3D12_RENDER_PASS_BEGINNING_ACCESS_TYPE_NO_ACCESS, D3D12_RENDER_PASS_ENDING_ACCESS_TYPE_NO_ACCESS

The “BEGINING” and “ENDING” enums must both be provided for all resources, and both are provided at BeginRenderPass time.


D3D12_RENDER_PASS_BEGINNING_ACCESS_DISCARD

BEGINNING_DISCARD signifies that the application does not have any dependency on the prior contents of the resource. A given implementation may return the previously-written contents, or it may return uninitialized data. However, reading from the resource must not produce a GPU hang, the read may ‘at worst’ only return undefined data.

A read is defined as a traditional UAV/SRV/CBV/VBV/IBV/IndirectArg binding/read, or as a blend/depth-testing-induced read.


D3D12_RENDER_PASS_BEGINNING_ACCESS_PRESERVE

BEGIN_PRESERVE signifies the application has a dependency on the prior contents of the resource, and the contents must be loaded from main memory.


D3D12_RENDER_PASS_BEGINNING_ACCESS_CLEAR

BEGIN_CLEAR signifies the application has a dependency on the resource being cleared to a specific (app-provided) color. This clear occurs whether or not the resource is interacted with any further in the Render Pass.

The API will allow the application to specify the clear values at BeginRenderPass time, via a D3D12_CLEAR_VALUE struct.


D3D12_RENDER_PASS_BEGINNING_ACCESS_NO_ACCESS

NO_ACCESS signifies the resource will not be read from or written to during the Render Pass. It is most expected to used to denote whether the depth/stencil plane for a DSV is not accessed.

Must be paired with ENDING_ACCESS_NO_ACCESS.


D3D12_RENDER_PASS_BEGINNING_ACCESS_RESUME_WRITING

BEGIN_RESUME_WRITING signifies that the application is resuming writing to a surface that was previously written to in the previous Render Pass and had the ENDING_ACCESS_SUSPEND_WRITING flag, and no intervening GPU work occurred between these two Render Passes.

It is spec’d that the writes in the ‘resuming’ Render Pass occur after the writes in the ‘suspending’ Render Pass.

The intent of this flag is to allow writes to the same Render Target to span multiple command lists, without flushing any on-chip caches.

RESUME_WRITING may resume from a Render Pass in a separate Command List, as long as the suspending/resuming Command Lists are executed (back to back) in the same ExecuteCommandLists group.

[v0.08] – In v0.08, suspend/resume was moved from being a per-view flag and moved to be a general command list flag. Same semantics (suspend/resumes must not have any intervening GPU work, and if they span command lists those command lists must be in the same ExecuteCommandLists call.)


D3D12_RENDER_PASS_BEGINNING_ACCESS_RESUME_LOCAL_READ

BEGINNING_ACCESS_RESUME_LOCAL_READ signifies that the application wants to read from a resource that was previously immediately written to by the GPU (i.e. still potentially in tile cache), and will be read from in a pixel-local fashion (i.e. reads will always be on the exact output pixel, or adjacent pixels).

The goal of this enum (paired with ENDING_ACCESS_SUSPEND_LOCAL_READ) is to allow resources to be read from (when possible) without issuing a flush between two subsequent Render Pass operations.

Specifically, the application warrants:

The kernel size of the read (how many surrounding pixels are needed) are specified at BeginRenderPass time, through the AdditionalWidth/AdditionalHeight fields on D3D12_RENDER_PASS_BEGINNING_ACCESS_RESUME_LOCAL_READ_PARAMETERS. The AdditionalWidth/AdditionalHeight parameters must match those on the previous ENDING_ACCESS_SUSPEND_LOCAL_READ.

[v0.08] Removed from API/DDI, may be added in future release in response to ISV usage scenarios.


D3D12_RENDER_PASS_ENDING_ACCESS_DISCARD

ENDING_ACCESS_DISCARD signifies the application will have no future dependency on the data written to the resource during this Render Pass (may be appropriate for a depth buffer where the depth buffer will never be textured from prior to future writes).


D3D12_RENDER_PASS_ENDING_ACCESS_PRESERVE

ENDING_ACCESS_PRESERVE signifies the application will have a dependency on the written contents of this resource in the future (and they must be preserved).


D3D12_RENDER_PASS_ENDING_ACCESS_RESOLVE

ENDING_ACCESS_RESOLVE allows the application to directly resolve a MSAA surface to a separate resource at the conclusion of the Render Pass (ideally while the MSAA contents are currently in the tile cache, for TBDRs).

The resolve dest is expected to be in the RESOLVE_DEST resource state at the time the Render Pass ends.

The resolve source will be left in its initial resource state at the time the Render Pass ends. Resolve operations sumbmitted by a render pass will not implicitly change the state of any resources.


D3D12_RENDER_PASS_ENDING_ACCESS_NO_ACCESS

ENDING_ACCESS_NO_ACCESS signifies the resource will not be read from or written to during the Render Pass. It is most expected to used to denote whether the depth/stencil plane for a DSV is not accessed.

Must be paired with BEGINNING_ACCESS_NO_ACCESS.


D3D12_RENDER_PASS_ENDING_ACCESS_SUSPEND_WRITING

See D3D12_RENDER_PASS_BEGINNING_ACCESS_RESUME_WRITING.

ENDING_ACCESS_SUSPEND_WRITING signifies that the application will continue writing to the resource in an immediately succeeding Render Pass (with no intervening GPU work).


D3D12_RENDER_PASS_ENDING_ACCESS_SUSPEND_LOCAL_READ

See D3D12_RENDER_PASS_BEGINNING_ACCESS_RESUME_LOCAL_READ.

Signifies the application has written to the resource, and it will read from the resource in the future in a one-to-one (plus optional gutter pixels) fashion (in the pixel shader, only the current pixel plus an optional number of surrounding pixels will be read from). Most importantly, this enum signifies that no GPU operations will occur between the SUSPEND and RESUME.


Render Pass Flags

typedef enum D3D12_RENDER_PASS_FLAGS
{
    D3D12_RENDER_PASS_FLAG_NONE = 0,
    D3D12_RENDER_PASS_FLAG_ALLOW_UAV_WRITES = 0x1,
    D3D12_RENDER_PASS_FLAG_SUSPENDING_PASS = 0x2,
    D3D12_RENDER_PASS_FLAG_RESUMING_PASS = 0x4
} D3D12_RENDER_PASS_FLAGS;

UAV writes within a Render Pass

UAV writes are permitted within a Render Pass, but the user must specifically indicate that they will be issuing UAV writes within the Render Pass with the ALLOW_UAV_WRITES flag, to let UMDs opt out of tiling if necessary.

UAV accesses must follow the earlier “writes are not valid to read until the end of the Render Pass”. Relatedly, UAV barriers are not permitted within a Render Pass.

UAV bindings (via root tables or root descriptors) are inherited into Render Passes, and are propagated out of Render Passes.


Suspend/Resume

In v0.08, applications now suspend/resume an entire render pass. Suspending/Resuming render passes must have identical views/access flags between the passes, and may not have any intervening GPU ops (draws, dispatches, discards, clears, copies, updatetilemappings, writebufferimmediates, queries, query resolves, ..) between the suspending/resuming render passes.

The intended use case is multi-threaded rendering, where say four CLs (each with their own render passes) could target the same render targets. When render passes are suspended/resumed across command lists, the command lists must be executed in the same ExecuteCommandLists call.

A render pass can be both resuming and suspending – in the multi-threaded example, CLs 2 and 3 would be resuming from 1 and 2 respectively, and suspending to 3 and 4 respectively.


Render Pass tiers/CheckFeatureSupport

Applications will be able to query the extent to which a UMD/HW efficiently supports Render Passes.

Though Render Passes will always function given the runtime’s mapping logic, this allows applicatinos (notably 11on12) to determine when it is (possibly) worth their while to issue their commands as Render Passes, and when it is definitely not a benefit (when the runtime is just mapping to the existing API surface).

There will be three tiers of support:

The UMD will report back these tiers to the runtime. The runtime will validate that UMDs which fill out the DDI table at least report back TIER_1, and at the same time will validate that UMDs which do not fill out the DDI table do not claim anything other than TIER_0 support (the runtime will fail device creation in this case).

This requirement will only be present for UMDs supporting a the DDI build version in which this change is made.


Open Issues


API Headers

// Beginning Access
typedef enum D3D12_RENDER_PASS_BEGINNING_ACCESS_TYPE
{
    D3D12_RENDER_PASS_BEGINNING_ACCESS_TYPE_DISCARD,
    D3D12_RENDER_PASS_BEGINNING_ACCESS_TYPE_PRESERVE,
    D3D12_RENDER_PASS_BEGINNING_ACCESS_TYPE_CLEAR,
    D3D12_RENDER_PASS_BEGINNING_ACCESS_TYPE_NO_ACCESS
} D3D12_RENDER_PASS_BEGINNING_ACCESS_TYPE;

typedef struct D3D12_RENDER_PASS_BEGINNING_ACCESS_CLEAR_PARAMETERS
{
    D3D12_CLEAR_VALUE ClearValue;
} D3D12_RENDER_PASS_BEGINNING_ACCESS_CLEAR_PARAMETERS;

typedef struct D3D12_RENDER_PASS_BEGINNING_ACCESS
{
    D3D12_RENDER_PASS_BEGINNING_ACCESS_TYPE Type;
    union
    {
        D3D12_RENDER_PASS_BEGINNING_ACCESS_CLEAR_PARAMETERS Clear;
    };
} D3D12_RENDER_PASS_BEGINNING_ACCESS;

// Ending Access
typedef enum D3D12_RENDER_PASS_ENDING_ACCESS_TYPE
{
    D3D12_RENDER_PASS_ENDING_ACCESS_TYPE_DISCARD,
    D3D12_RENDER_PASS_ENDING_ACCESS_TYPE_PRESERVE,
    D3D12_RENDER_PASS_ENDING_ACCESS_TYPE_RESOLVE,
    D3D12_RENDER_PASS_ENDING_ACCESS_TYPE_NO_ACCESS
} D3D12_RENDER_PASS_ENDING_ACCESS_TYPE;

typedef struct D3D12_RENDER_PASS_ENDING_ACCESS_RESOLVE_SUBRESOURCE_PARAMETERS
{
  UINT SrcSubresource;
  UINT DstSubresource;
  UINT DstX;
  UINT DstY;
  D3D12_RECT* pSrcRect;
} D3D12_RENDER_PASS_ENDING_ACCESS_RESOLVE_SUBRESOURCE_PARAMETERS;

typedef struct D3D12_RENDER_PASS_ENDING_ACCESS_RESOLVE_PARAMETERS
{
    ID3D12Resource* pSrcResource;
    ID3D12Resource* pDstResource;

    // Can be a subset of RT's array slices, but can't target
    // subresources that weren't part of RTV/DSV.
    UINT SubresourceCount;

    const D3D12_RENDER_PASS_ENDING_ACCESS_RESOLVE_SUBRESOURCE_PARAMETERS* pSubresourceParameters;

    DXGI_FORMAT Format;
    D3D12_RESOLVE_MODE ResolveMode;
    BOOL PreserveResolveSource; // New to v0.08
} D3D12_RENDER_PASS_ENDING_ACCESS_RESOLVE_PARAMETERS;

typedef struct D3D12_RENDER_PASS_ENDING_ACCESS
{
    D3D12_RENDER_PASS_ENDING_ACCESS_TYPE Type;
    union
    {
        D3D12_RENDER_PASS_ENDING_ACCESS_RESOLVE_PARAMETERS Resolve;
    };
} D3D12_RENDER_PASS_ENDING_ACCESS;

// Render Target Desc
typedef struct D3D12_RENDER_PASS_RENDER_TARGET_DESC
{
    D3D12_CPU_DESCRIPTOR_HANDLE cpuDescriptor;
    D3D12_RENDER_PASS_BEGINNING_ACCESS BeginningAccess;
    D3D12_RENDER_PASS_ENDING_ACCESS EndingAccess;
} D3D12_RENDER_PASS_RENDER_TARGET_DESC;

// Depth-Stencil Desc
typedef struct D3D12_RENDER_PASS_DEPTH_STENCIL_DESC
{
    D3D12_CPU_DESCRIPTOR_HANDLE cpuDescriptor;
    D3D12_RENDER_PASS_BEGINNING_ACCESS DepthBeginningAccess;
    D3D12_RENDER_PASS_BEGINNING_ACCESS StencilBeginningAccess;
    D3D12_RENDER_PASS_ENDING_ACCESS DepthEndingAccess;
    D3D12_RENDER_PASS_ENDING_ACCESS StencilEndingAccess;
} D3D12_RENDER_PASS_DEPTH_STENCIL_DESC;

// UAV Access Mode
typedef enum D3D12_RENDER_PASS_FLAGS
{
    D3D12_RENDER_PASS_FLAG_NONE = 0x0,
    D3D12_RENDER_PASS_FLAG_ALLOW_UAV_WRITES = 0x1,
    D3D12_RENDER_PASS_FLAG_SUSPENDING_PASS = 0x2,
    D3D12_RENDER_PASS_FLAG_RESUMING_PASS = 0x4
} D3D12_RENDER_PASS_UAV_ACCESS_FLAGS;

DEFINE_ENUM_FLAG_OPERATORS( D3D12_RENDER_PASS_FLAGS );

[uuid(9742FB99-3D7D-4572-977C-CED7D15EB709), object, local, pointer_default(unique)]
interface ID3D12GraphicsCommandList4 : ID3D12GraphicsCommandList3
{
    void BeginRenderPass(
        [annotation("_In_")]  UINT NumRenderTargets,
        [annotation("_In_reads_opt_(NumRenderTargets)")] const D3D12_RENDER_PASS_RENDER_TARGET_DESC* pRenderTargets,
        [annotation("_In_opt_")] const D3D12_RENDER_PASS_DEPTH_STENCIL_DESC* pDepthStencil,
        D3D12_RENDER_PASS_FLAGS  Flags
        );

    void EndRenderPass();
}

Runtime validation

The following validation will be added to the core runtime:

The following will be added to SDK Layers:


HLK Test Plan

11on12 will replace OMSetRenderTargets calls with BeginRenderPass/EndRenderPass as appropriate to test general rendering functionality with Render Passes.

For testing more specific functionality, the following will be exercised via the HLK:


History of changes

Date Version Description
8/7/2019 1.0 Updated public spec to match implementation regarding SetDescriptorHeaps
10/30/2018 0.09 Clarified inconsistency about what resource state render targets and depth buffers are left in after a render pass that ends with a resolve
5/15/2018 0.08 General API/DDI refactoring – runtime support for previous DDI version remains, but will be removed prior to feature complete. Suspend/Resume has been refactored to be at the pass level, rather than per-view. Suspend/Resume_LOCAL has been removed from the API/DDI, will be re-added if more ISV scenarios are identified in a future release. SetDescriptorHeaps no longer allowed during a render pass. Applications can now explicitly specify whether during a resolve the resolve source (e.g. the bound RT/DB) should be preserved or not. Minor API/DDI enum renaming to better follow D3D12 conventions
4/10/2018 0.07 Allow Resource Barriers that obey the “no read dependencies may be taken on a write that occurred during the Render Pass” rule. Add new API/DDI CAP for applications to know whether Render Passes are potentially more optimal on the hardware (e.g. whether the DDI table is even implemented), and for the driver to reflect whether UAV writes within the Render Pass can be implemented efficiently. Remove SOSetTargets from the list of disallowed APIs within a Render Pass, fill in some APIs that were missing from the list (notably copies). One new HLK test case
3/7/2018 0.06 Allow the Resolve-at-EndRenderPass operation to (optionally) target multiple subresources, earlier design assumed just a single subresource per RTV. Changes to D3D12_RENDER_PASS_ENDING_ACCESS_RESOLVE_SUBRESOURCE_PARAMETERS and D3D12_RENDER_PASS_ENDING_ACCESS_RESOLVE_PARAMETERS
2/16/2018 0.05 Address feedback: Remove Resource Barrier integration from BeginRenderPass, feedback is that it is likely not a significant win, and this simplifies the design. Can add back in if potential wins identified once up and running. Change ONE_TO_ONE_READ to LOCAL_READ, where user specifies kernel size for reads.
2/14/2018 0.04 Address feedback: Clear*View APIs are not allowed within a Render Pass. Clarified that Render Targets/Depth Buffer are in an undefined state following a Render Pass, don’t necessarily need to be cleanly unbound by UMD. Open issue if runtime enforces this, similar to how it unbinds RTs at start of Command List. Added following to list of open issues: Should SO be allowed within a Render Pass?
2/13/2018 0.03 Add API headers. Move access specification to sub-resource granularity.Reduce verbosity of UAV bindings.Add “ISV can validate proper usage of feature even if UMD does not support feature” design goal.
01/23/2018 0.02 Design simplification to only allow a single set of writable surfaces per Render Pass.
01/09/2018 0.01 Initial version