v1.00 2021-04-20

Shader Model 6.6 introduces support for quad-based derivative operations in compute shaders and, optionally, in mesh and amplification shaders.

Previously, these operations were unavailable in these shader stages because they require computations on 2x2 quads of adjacent values. Shaders without this explicit structure can now use these operations.

Because they lack render targets,
compute, mesh and amplification shaders
don’t have natural quads as pixel shaders do.
Instead, their quads are determined by the dimensions
of their `numthreads` attribute.

Derivative operations use values from neighboring threads
in a 2x2 quad (or grid) of threads
operating in lock-step.
In this document,
each thread within a quad is referred to as a lane.
In compute, mesh and amplification shaders,
the lanes that make up the 2x2 quad depend on `numthreads`.
Where `numthreads` has an X value divisible by 4
and Y and Z are both 1,
the quad layouts are determined according to 1D quad rules.
Where the `numthreads` X and Y values are both divisible by 2,
the quad layouts are determined according to 2D quad rules.
Using derivative operations in
any `numthreads` configuration not matching either of these
is invalid and will produce an error.
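
The selection rules above can be sketched as a small host-side classifier. This is only an illustration of the text, not compiler code: the names `QuadMode` and `classifyNumThreads` are hypothetical, and the sketch assumes that a zero dimension is simply treated as invalid.

```cpp
#include <cstdint>

// Hypothetical names for illustration; they do not come from the specification.
enum class QuadMode { Linear1D, Grid2D, Invalid };

// Classify a numthreads(x, y, z) configuration using the rules described above.
// Assumption: any zero dimension is rejected as invalid.
constexpr QuadMode classifyNumThreads(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    return (x == 0 || y == 0 || z == 0)     ? QuadMode::Invalid
         : (x % 4 == 0 && y == 1 && z == 1) ? QuadMode::Linear1D  // 1D quad rules
         : (x % 2 == 0 && y % 2 == 0)       ? QuadMode::Grid2D    // 2D quad rules
                                            : QuadMode::Invalid;  // derivatives not allowed
}
```

Under this sketch, `numthreads(32, 1, 1)` selects the 1D rules, `numthreads(8, 4, 1)` selects the 2D rules, and `numthreads(6, 1, 1)` matches neither.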

For both layouts, previously-established associations between quads and lane indices remain:

- quadID = WaveGetLaneIndex() / 4;
- quadIndex = WaveGetLaneIndex() % 4;

Note that quadID has little meaning on its own: its only guarantee is that all lanes with the same quadID value belong to the same quad. The quadIndex follows a Z-ordering beginning in the upper left, proceeding right and then starting over on the next row down:

quadIndex Values | |
---|---|
(0) | (1) |
(2) | (3) |

Where only the X dimension is greater than 1,
the 2x2 quad is based on the group index
as provided by the `SV_GroupIndex` parameter.

- X position in quad from left = `SV_GroupIndex & 1`
- Y position in quad from top = `(SV_GroupIndex & 2) >> 1`

The result is that lanes in the group form a sequence of serialized quads. Every four sequential lanes form an individual quad. The first and second lanes in the foursome form the upper left and upper right respectively. The third and fourth lanes form the lower left and lower right respectively. The lane after the fourth, if present, represents the upper left of the next quad. This is sometimes called Z-order.

To provide a deterministic mapping between `SV_GroupIndex` and the lanes within a quad, 1D quad ordering requires that threads are assigned lane indices in the same order within the quad as specified by the `SV_GroupIndex` parameter, such that:

For 1D quads: `SV_GroupThreadID.x % 4 == SV_GroupIndex % 4 == WaveGetLaneIndex() % 4`

For example,
a thread group defined by `[numthreads (32, 1, 1)]`
might contain a single wave containing 32 lanes that make up 8 quads.
The layout for the first quad would be as follows where (##)
represents the group index in decimal:

Quad 0 | |
---|---|
(00) | (01) |
(02) | (03) |

And a later quad would be:

Quad n | |
---|---|
(16) | (17) |
(18) | (19) |
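
As a rough host-side illustration of the 1D rules, the position formulas can be evaluated for any group index. The function names below are hypothetical, and `groupIndex` simply stands in for the `SV_GroupIndex` value a shader would receive.

```cpp
#include <cstdint>

// Position within the 2x2 quad under the 1D rules.
// groupIndex stands in for SV_GroupIndex; the function names are illustrative only.
constexpr std::uint32_t quadX1D(std::uint32_t groupIndex) { return groupIndex & 1u; }
constexpr std::uint32_t quadY1D(std::uint32_t groupIndex) { return (groupIndex & 2u) >> 1; }

// Z-order index within the quad, rebuilt from the position: row * 2 + column.
// By construction this equals groupIndex % 4.
constexpr std::uint32_t quadIndex1D(std::uint32_t groupIndex) {
    return quadY1D(groupIndex) * 2u + quadX1D(groupIndex);
}
```

For the later quad shown above, group index 17 lands in the upper right (x = 1, y = 0) and group index 18 in the lower left (x = 0, y = 1).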

Where the X and Y dimensions are divisible by 2,
the 2x2 quad is based on the x and y components
of group thread ID
as provided by the `SV_GroupThreadID` parameter.

- X position in quad from left = `SV_GroupThreadID.x & 1`
- Y position in quad from top = `SV_GroupThreadID.y & 1`

As a result, for every even x and y value within the thread group, a quad is made up of the values (x, y), (x+1, y), (x, y+1), (x+1, y+1). No assignment of quads to waves is defined, and no relation between the values of `SV_GroupIndex` and the return value of `WaveGetLaneIndex()` should be assumed.

While existing compute Quad* operations required the same association with lane indices, no mapping of quad lanes to `SV_GroupThreadID` was required. This new mapping is not required for shader models earlier than 6.6.

For example,
a thread group defined by `[numthreads (8,4,1)]`
might contain a single wave containing 32 lanes that make up 8 quads.
The layout for the first quad would be as follows where (#,#)
represents the corresponding (GroupThreadID.x, GroupThreadID.y) values:

Quad 0 | |
---|---|
(0,0) | (1,0) |
(0,1) | (1,1) |

Quad n | |
---|---|
(6,2) | (7,2) |
(6,3) | (7,3) |
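
The 2D rules can be sketched on the host in the same way. The `Coord` type and the function names are hypothetical, and `tid` stands in for the x and y components of `SV_GroupThreadID`. The sketch assumes the quadIndex Z-ordering described earlier (row * 2 + column) when enumerating quad members.

```cpp
#include <cstdint>

// Illustrative 2D coordinate; tid stands in for SV_GroupThreadID.xy.
struct Coord { std::uint32_t x; std::uint32_t y; };

// Position of a thread within its 2x2 quad under the 2D rules.
constexpr Coord quadPosition2D(Coord tid) { return Coord{ tid.x & 1u, tid.y & 1u }; }

// Upper-left thread of the quad containing tid (the even x and even y).
constexpr Coord quadOrigin2D(Coord tid) { return Coord{ tid.x & ~1u, tid.y & ~1u }; }

// Thread ID of the quad member with the given Z-order quadIndex (0..3).
constexpr Coord quadMember2D(Coord tid, std::uint32_t quadIndex) {
    return Coord{ quadOrigin2D(tid).x + (quadIndex & 1u),
                  quadOrigin2D(tid).y + (quadIndex >> 1) };
}
```

For thread (7, 3) in the `[numthreads (8,4,1)]` example, the quad origin is (6, 2) and the thread sits in the lower right of its quad.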

The added functions operate exactly as their existing counterparts do, except that they use the local quads as specified above.

These functions calculate the derivative in the x or y direction using coarse or fine calculations.

These functions take a varying `value` of type `T`.
The return value type must also be `T`.

```
T ddx(in T value)
T ddx_coarse(in T value)
T ddy(in T value)
T ddy_coarse(in T value)
T ddx_fine(in T value)
T ddy_fine(in T value)
```
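
To make the quad dependence concrete, the following host-side sketch emulates fine and coarse derivatives over the four values held by one quad, indexed by quadIndex. It is a simplified model rather than the hardware behavior: the type and function names are hypothetical, and the coarse variants are assumed to use the top row and left column of the quad, a choice this document does not specify.

```cpp
// Four per-lane values of one quad, indexed by quadIndex (Z-order). Illustrative only.
struct QuadValues { float v[4]; };

// Fine derivatives: each lane uses the difference along its own row or column.
constexpr float ddxFine(const QuadValues& q, unsigned quadIndex) {
    return q.v[(quadIndex & 2u) | 1u] - q.v[quadIndex & 2u];  // right minus left, same row
}
constexpr float ddyFine(const QuadValues& q, unsigned quadIndex) {
    return q.v[(quadIndex & 1u) | 2u] - q.v[quadIndex & 1u];  // bottom minus top, same column
}

// Coarse derivatives: one value shared by the whole quad.
// Assumption: the top row and left column are used.
constexpr float ddxCoarse(const QuadValues& q) { return q.v[1] - q.v[0]; }
constexpr float ddyCoarse(const QuadValues& q) { return q.v[2] - q.v[0]; }
```

With the quad values {1, 3, 10, 14}, the fine x derivative differs between the top row (2) and the bottom row (4), while the coarse x derivative in this model is 2 for every lane.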

The ability to calculate derivatives as above also allows the calculation of level of detail (LOD) values and enables the standard sampling operations that depend on LOD calculations. Previously, only sample operations that didn’t require derivative calculations were available.

Return type `R` is dependent on the texture content type.
`F` and `I` are float and integer values whose dimensions
depend on the dimensions of the texture type.

```
float TexObject::CalculateLevelOfDetail( in SamplerState sampler_state, in F pos )
float TexObject::CalculateLevelOfDetailUnclamped( in SamplerState sampler_state, in F pos )
R TexObject::Sample( in SamplerState sampler_state, in F location, in [I Offset])
R TexObject::SampleBias( in SamplerState sampler_state, in F location, float Bias, [I Offset])
float TexObject::SampleCmp( in SamplerComparisonState S, F location, float compare_value, [int Offset])
```

These functions enable the reading of
varying values from other lanes of the current quad
using explicit indices (`QuadReadLaneAt`)
or from a position relative to the current lane.
Unlike other entries here,
these must be supported on Shader Model 6.0.

These functions take a varying `value` of type `T`.
The return value type must also be `T`.

```
T QuadReadLaneAt( in T value, uint index)
T QuadReadAcrossDiagonal( in T value)
T QuadReadAcrossX( in T value)
T QuadReadAcrossY( in T value)
```
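
The lane selection performed by these functions can be modeled on the host as index arithmetic on the quadIndex. This is a sketch with hypothetical names, assuming the Z-order quadIndex layout described earlier and an `index` argument in the range 0 to 3.

```cpp
// Four per-lane values of one quad, indexed by quadIndex (Z-order). Illustrative only.
struct QuadLanes { float v[4]; };

// Read the value held by an explicitly indexed lane (index assumed to be 0..3).
constexpr float quadReadLaneAt(const QuadLanes& q, unsigned index) { return q.v[index]; }

// Read from the neighbor relative to the calling lane's quadIndex.
constexpr float quadReadAcrossX(const QuadLanes& q, unsigned quadIndex) {
    return q.v[quadIndex ^ 1u];  // other lane in the same row
}
constexpr float quadReadAcrossY(const QuadLanes& q, unsigned quadIndex) {
    return q.v[quadIndex ^ 2u];  // other lane in the same column
}
constexpr float quadReadAcrossDiagonal(const QuadLanes& q, unsigned quadIndex) {
    return q.v[quadIndex ^ 3u];  // diagonally opposite lane
}
```

In this model the lane with quadIndex 0 reads lane 1 across X, lane 2 across Y, and lane 3 across the diagonal.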

Derivative and derivative-dependent texture sample operations must be supported
in compute shaders
on devices that support `D3D_SHADER_MODEL_6_6`.

Derivative and derivative-dependent texture sample operations must be supported
in amplification and mesh shaders
on devices that support `D3D_SHADER_MODEL_6_6`
and have the `DerivativesInMeshAndAmplificationShadersSupported` capability.

The Quad Read Functions must be supported
on devices that support `D3D_SHADER_MODEL_6_0`
and report support for the `WaveOps` capability.
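
The three support rules above can be summarized as plain boolean logic. The sketch below is not D3D12 code: `Stage`, `DeviceCaps`, and the function names are hypothetical stand-ins for values an application would obtain from its own feature queries.

```cpp
// Hypothetical summary of queried device state; not a D3D12 type.
enum class Stage { Compute, Mesh, Amplification };

struct DeviceCaps {
    unsigned smMajor;   // highest supported shader model, major part
    unsigned smMinor;   // highest supported shader model, minor part
    bool waveOps;       // WaveOps capability
    bool derivativesInMeshAndAmplification;  // DerivativesInMeshAndAmplificationShadersSupported
};

constexpr bool supportsShaderModel(const DeviceCaps& c, unsigned wantMajor, unsigned wantMinor) {
    return c.smMajor > wantMajor || (c.smMajor == wantMajor && c.smMinor >= wantMinor);
}

// Derivative and derivative-dependent sample operations in the given stage.
constexpr bool derivativesSupported(Stage stage, const DeviceCaps& c) {
    return supportsShaderModel(c, 6, 6) &&
           (stage == Stage::Compute || c.derivativesInMeshAndAmplification);
}

// Quad Read Functions.
constexpr bool quadReadsSupported(const DeviceCaps& c) {
    return supportsShaderModel(c, 6, 0) && c.waveOps;
}
```

A Shader Model 6.6 device without the capability bit would, under this reading, allow derivatives in compute shaders but not in mesh or amplification shaders.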

Applications can query the availability
of the texture sample operations listed here
in mesh and amplification shaders
using `ID3D12Device::CheckFeatureSupport()`
passing `D3D12_FEATURE_D3D12_OPTIONS9` as the `Feature` parameter
and retrieving the `pFeatureSupportData` parameter
as a struct of type `D3D12_FEATURE_DATA_D3D12_OPTIONS9`.
The relevant part of this struct is defined below.

```
typedef enum D3D12_FEATURE {
...
D3D12_FEATURE_D3D12_OPTIONS9
} D3D12_FEATURE;
typedef struct D3D12_FEATURE_DATA_D3D12_OPTIONS9 {
...
BOOL DerivativesInMeshAndAmplificationShadersSupported;
} D3D12_FEATURE_DATA_D3D12_OPTIONS9;
```

`DerivativesInMeshAndAmplificationShadersSupported` is a boolean that specifies
whether derivative and derivative-dependent texture sample operations
are supported in the mesh and amplification shader stages.

- What restrictions should be placed on work group size?
- RESOLVED: The restrictions are principally on the wave size, which is not entirely under the control of the user. Mention is made of how work group size could impact the availability of active lanes for each quad, but details are left out.

- Should QuadRead* functions be included?
- RESOLVED: Yes. They were already available in these shader stages, but the definition of the quads they depend on wasn’t well specified.

- What device capabilities are required?
- RESOLVED: Any device that supports Shader Model 6.6 should be able to support derivative and sample operations in compute shaders and quad operations in all shaders. Derivative and sample operations in amplification and mesh shaders are supported if the appropriate capability bit is true. Quad Read functions should be supported on Shader Model 6.0.

- How should the quads be ordered?
- RESOLVED: Implementation dependent. This spec takes no position on the ordering of quads within a group. This gives implementations the most flexibility possible while still presenting a consistent ordering for threads within a single quad.

Version | Date | Description |
---|---|---|
1.00 | 20 Apr 2021 | Minor Edits for Publication |
0.11 | 25 Jan 2021 | Constrain linear quad mode to 1D NumThreads. Introduce new 2D quad mapping |
0.10 | 11 Jan 2021 | Switch to OPTIONS9 |
0.9 | 01 Dec 2020 | Revert to basing quads on group index |
0.8 | 24 Apr 2020 | Include ddx/ddy in the cap bit and rename accordingly. |
0.7 | 13 Apr 2020 | Use Wave Index to define the makeup of the quads |
0.6 | 09 Apr 2020 | Add Capability bit for Amplification/Mesh Shaders |
0.5 | 16 Mar 2020 | Clarify ascii diagram |
0.4 | 05 Mar 2020 | Respond to feedback. Spelling and caps, validation error |
0.3 | 02 Mar 2020 | Add ascii art for quads |
0.2 | 21 Feb 2020 | Simplify function descriptions. Expand on quad description. |
0.1 | 19 Feb 2020 | Initial draft |