0048 - Group Wave Index
| Status | Under Consideration |
|---|---|
| Authors | |
| Sponsor | |
- Planned Version: Shader Model 6.10
- Issues: #645
Introduction
The proposal is for a new shader construct:
- SV_GroupWaveIndex: the index of the wave in the thread group
Motivation
Compute, amplification, and mesh shader workloads consist of some number of thread groups. Each thread group contains some number of waves, and each wave contains some number of threads. Certain algorithms can be accelerated by specializing the work done by individual waves in a thread group.
Currently, developers on PC cannot efficiently determine which wave they’re in
within a thread group without resorting to unsafe workarounds like
uint groupWaveIndex = SV_GroupIndex / WaveGetLaneCount(), which is not
guaranteed to be correct across all hardware implementations and thread group
dimensions. This forces developers to either write divergent code paths for
different platforms or use slightly less efficient alternatives, for example
atomic operations on group shared memory.
Proposed solution
This proposal introduces SV_GroupWaveIndex, a new system-value semantic that
provides the index of the current wave within the thread group. This value
ranges from 0 to N-1, where N is the number of waves in the thread group.
Example Usage
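#define TILE_SIZE 16

[numthreads(TILE_SIZE, TILE_SIZE, 1)]
void ComputeMinMaxZ(
    uint2 tileID : SV_GroupID,
    uint waveIndex : SV_GroupWaveIndex)
{
    // LoadDepth is an application-defined helper (not shown)
    float z = LoadDepth(tileID, waveIndex);
    float minZ = WaveActiveMin(z);
    float maxZ = WaveActiveMax(z);

    // ... each wave publishes minZ/maxZ to group shared memory, indexed by waveIndex

    // Every thread in the group must reach the barrier, so it sits
    // outside the wave-specific branch
    GroupMemoryBarrierWithGroupSync();

    // Collaborate between waves using wave index
    if (waveIndex == 0)
    {
        // First wave performs final reduction
        // ... combine results from all waves
    }
}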
This solution enables:
- Wave-level collaboration: Different waves within a thread group can perform different tasks and coordinate their work efficiently.
- Portable code: A single code path works across all wave sizes without conditionals on WaveGetLaneCount().
Detailed design
HLSL Additions
Grammar
The SV_GroupWaveIndex semantic is added as a new system-value semantic:
SV_GroupWaveIndex : uint
This semantic can be applied to:
- Function parameters in applicable shader entry points
- Input structures for applicable shader entry points
Shader Stage Compatibility
SV_GroupWaveIndex is valid in compute, mesh, and amplification shaders.
Using this semantic in any other shader stage will result in a compilation
error.
Type Requirements
The SV_GroupWaveIndex semantic must be applied to a uint type. Using it
with any other type will result in a compilation error.
Value Range
The value of SV_GroupWaveIndex is in the range [0, N-1], where N is the total
number of waves in the thread group. The number of waves N is determined by:
N = ceil((numthreads.x * numthreads.y * numthreads.z) / WaveSize)
Here numthreads is specified by the [numthreads(x, y, z)] attribute and
WaveSize is the actual wave size used for shader execution, i.e. the value
returned by WaveGetLaneCount().
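As a rough illustration of the formula above, the wave count can be computed on the CPU with integer ceiling division. This is a minimal sketch, not part of the proposal: the function name groupWaveCount is hypothetical, and waveSize stands in for whatever WaveGetLaneCount() would report on the device.

```cpp
#include <cassert>
#include <cstdint>

// Number of waves in a thread group: ceil(threads / waveSize), in integer math.
// waveSize is an assumption standing in for the device's WaveGetLaneCount().
std::uint32_t groupWaveCount(std::uint32_t x, std::uint32_t y, std::uint32_t z,
                             std::uint32_t waveSize)
{
    assert(x > 0 && y > 0 && z > 0 && waveSize > 0);
    const std::uint32_t threads = x * y * z;      // total threads in the group
    return (threads + waveSize - 1) / waveSize;   // integer ceiling division
}
```

For example, a [numthreads(16, 16, 1)] group running at a wave size of 64 has 4 waves, so SV_GroupWaveIndex would take values 0 through 3. A group smaller than one wave still yields N = 1.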
Wave Ordering
The assignment of SV_GroupWaveIndex values to threads is
implementation-defined: the implementation decides how best to launch waves
for a given shader and the target hardware. However, the presence of
SV_GroupWaveIndex must not cause the implementation to violate existing wave
shape requirements, such as SM 6.6’s requirement that 4 consecutive threads
form a 2x2 quad in cases where the 2D thread group dimensions are divisible by 2.
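To see why deriving the wave index from SV_GroupIndex is not portable, the sketch below compares that guess against one hypothetical launch order that packs threads quad by quad. The quad-packed layout is purely an illustrative assumption and does not describe any particular hardware. Both function names are invented for this example.

```cpp
#include <cassert>
#include <cstdint>

// Wave index a shader would guess from SV_GroupIndex / WaveGetLaneCount().
std::uint32_t guessedWaveIndex(std::uint32_t x, std::uint32_t y,
                               std::uint32_t width, std::uint32_t waveSize)
{
    assert(waveSize > 0);
    const std::uint32_t groupIndex = y * width + x;  // flattened SV_GroupIndex (2D group)
    return groupIndex / waveSize;
}

// Wave index under a hypothetical launch order that fills waves with 2x2 quads,
// visiting quads in row-major order. Assumes an even width and a wave size
// that is a multiple of 4.
std::uint32_t quadPackedWaveIndex(std::uint32_t x, std::uint32_t y,
                                  std::uint32_t width, std::uint32_t waveSize)
{
    assert(width % 2 == 0 && waveSize > 0 && waveSize % 4 == 0);
    const std::uint32_t quadsPerRow = width / 2;
    const std::uint32_t quad = (y / 2) * quadsPerRow + (x / 2);
    const std::uint32_t slot = quad * 4 + (y % 2) * 2 + (x % 2);  // launch position
    return slot / waveSize;
}
```

In a 16x16 group with a wave size of 16, thread (0, 1) would be guessed to be in wave 1, but under this assumed layout it shares wave 0 with thread (0, 0). A shader that relied on the guess would misattribute work between waves.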
Interaction with Other Semantics
SV_GroupWaveIndex can be freely combined with other compute shader semantics:
- SV_GroupID: The 3D index of the thread group
- SV_GroupThreadID: The 3D index of the thread within the group
- SV_GroupIndex: The flattened linear index of the thread within the group
- SV_DispatchThreadID: The global thread ID across the dispatch
Example:
[numthreads(256, 1, 1)]
void CSMain(
    uint3 groupID : SV_GroupID,
    uint3 groupThreadID : SV_GroupThreadID,
    uint groupIndex : SV_GroupIndex,
    uint waveIndex : SV_GroupWaveIndex)
{
    // All semantics can be used together
}
Source Code Compatibility
This feature is purely additive and has no impact on existing HLSL source code.
Existing shaders that do not use SV_GroupWaveIndex are unaffected.
Interchange Format Additions
DXIL Additions
A new DXIL intrinsic is added to represent SV_GroupWaveIndex:
i32 dx.op.groupWaveIndex(i32) ; returns the wave index within the group
- Opcode: (to be assigned during implementation)
- Operand: i32 opcode constant
- Return Type: i32 (unsigned)
- Return Value: The index of the current wave within the thread group [0, N-1]
The intrinsic is lowered to hardware-specific instructions during backend compilation. The value is typically derived from hardware thread group and wave information available in the shader execution environment.
Metadata
No new metadata is required. The use of SV_GroupWaveIndex is indicated by the
presence of the dx.op.groupWaveIndex intrinsic in the shader.
Diagnostic Changes
New Errors
The following new compilation errors are introduced:
Invalid Shader Stage
- Error: error: SV_GroupWaveIndex is only valid in compute, mesh, and amplification shaders
- Occurs when: SV_GroupWaveIndex is used in other shader stages
Invalid Type
- Error: error: SV_GroupWaveIndex must be applied to a 'uint' type
- Occurs when: The semantic is applied to a type other than uint
Invalid Semantic Usage
- Error: error: system-value semantics cannot be used as outputs
- Occurs when: SV_GroupWaveIndex is used on an output parameter or return value
Validation Changes
DXIL validation is updated to verify:
- Shader Model Check: The dx.op.groupWaveIndex intrinsic is only valid in Shader Model 6.10 or later.
- Shader Stage Check: The intrinsic only appears in compute, amplification and mesh shaders.
- Well-formed Usage: The intrinsic is called with the correct signature (single i32 opcode operand, returns i32).
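The shader model and shader stage checks can be expressed as a small predicate. The sketch below is only a model of the rules listed above. The ShaderStage and ShaderModel types and the function name are hypothetical and are not taken from the DXIL validator.

```cpp
#include <cassert>

// Hypothetical stand-ins for the validator's view of a shader.
enum class ShaderStage { Vertex, Pixel, Geometry, Hull, Domain, Compute, Amplification, Mesh };

struct ShaderModel {
    unsigned majorVersion;
    unsigned minorVersion;
};

// Returns true when a call to dx.op.groupWaveIndex would pass the
// shader model and shader stage checks described above.
bool isGroupWaveIndexAllowed(ShaderStage stage, ShaderModel sm)
{
    assert(sm.majorVersion >= 6);  // DXIL only exists for Shader Model 6.0 and later
    const bool modelOk = sm.majorVersion > 6 ||
                         (sm.majorVersion == 6 && sm.minorVersion >= 10);
    const bool stageOk = stage == ShaderStage::Compute ||
                         stage == ShaderStage::Amplification ||
                         stage == ShaderStage::Mesh;
    return modelOk && stageOk;
}
```

Under this model a compute shader targeting 6.9 and a pixel shader targeting 6.10 are both rejected, matching the two validation failures listed under Validation Testing.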
Runtime Additions
Runtime Information
No additional runtime information needs to be communicated beyond what is
already provided in the PSV0 (Pipeline State Validation) data. The runtime does
not need to know whether a shader uses SV_GroupWaveIndex.
Device Capability
Shader Model Requirement: Shader Model 6.10 or later.
Hardware Support: As a required feature of SM 6.10, all devices supporting
this shader model must provide correct SV_GroupWaveIndex values.
Interaction with Wave Size:
- Works with fixed wave sizes (specified via [WaveSize(N)] attribute)
- Works with wave size ranges (specified via [WaveSize(min, max)] or [WaveSize(min, max, preferred)])
- Works with hardware-default wave sizes
Emulation:
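Implementations could theoretically emulate this by using group shared memory, for example:
groupshared uint g_waveId = 0;
...
uint groupWaveId = 0;
if (WaveIsFirstLane())
{
    // The first active lane claims the next free wave index
    InterlockedAdd(g_waveId, 1, groupWaveId);
}
// Broadcast from the first active lane, which is not necessarily lane 0
groupWaveId = WaveReadLaneFirst(groupWaveId);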
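The behavior of this emulation can be modeled on the CPU to show what it does and does not guarantee. The following is a sketch under stated assumptions: each wave is reduced to a single atomic increment, the arrival order of waves at the atomic is supplied by the caller, and both function names are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU model of the emulation: the first lane of each wave atomically increments a
// shared counter and the wave adopts the pre-increment value as its index.
// arrivalOrder lists hardware wave slots in the order they reach the atomic.
std::vector<std::uint32_t> emulateGroupWaveIds(const std::vector<std::uint32_t>& arrivalOrder)
{
    std::atomic<std::uint32_t> sharedCounter{0};      // models groupshared g_waveId
    std::vector<std::uint32_t> ids(arrivalOrder.size());
    for (std::uint32_t slot : arrivalOrder) {
        assert(slot < ids.size());
        ids[slot] = sharedCounter.fetch_add(1);       // models InterlockedAdd(g_waveId, 1, out)
    }
    return ids;
}

// True when ids is a permutation of 0..N-1: every wave has a unique index in range.
bool isDenseAndUnique(std::vector<std::uint32_t> ids)
{
    std::sort(ids.begin(), ids.end());
    for (std::size_t i = 0; i < ids.size(); ++i) {
        if (ids[i] != i) return false;
    }
    return true;
}
```

In this model the emulated indices always cover [0, N-1] without duplicates, but which wave receives which index depends on arrival order, so the assignment may differ from one dispatch to the next.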
Testing
Compiler Testing
DXIL Generation:
- Verify that SV_GroupWaveIndex generates the correct dx.op.groupWaveIndex intrinsic call
- Test with various thread group sizes and wave size specifications
- Confirm correct behavior when combined with other compute shader semantics
Diagnostic Testing
Error Conditions:
- Verify error when used in invalid shader stages
- Verify error when applied to non-uint types (int, float, uint2, etc.)
- Verify error when used as an output semantic
- Verify error in shader models earlier than 6.10
Validation Testing
DXIL Validator:
- Confirm validation failure when intrinsic appears in invalid shader stages
- Confirm validation failure when intrinsic appears in pre-6.10 shader models
- Verify acceptance of valid usage patterns
Execution Testing
Correctness Tests:
- Test with various thread group sizes: small (8), medium (256), large (1024)
- Test with different wave sizes: 4, 8, 16, 32, 64, 128
Wave Size Interaction:
- Test with fixed wave sizes using [WaveSize(N)]
- Test with wave size ranges using [WaveSize(min, max)]
- Test with wave size ranges with preferred size
Multi-Wave Collaboration:
- Test shaders where different waves perform different work
- Verify correct synchronization using GroupMemoryBarrier between waves
- Test scenarios with group shared memory accessed by different waves
Edge Cases:
- Single wave per thread group (numthreads ≤ wave size)
- Maximum waves per thread group
- Non-power-of-2 thread group sizes
- 1D, 2D, and 3D thread group configurations