0053 - Allow Early Pixel Culling
| Status | Under Consideration |
|---|---|
| Author | |
| Sponsor |
Introduction
This proposal introduces a new [allowearlypixelculling] attribute for HLSL
Pixel Shaders that enables implementations to perform early pixel culling
optimizations for shaders with side effects. Unlike the existing
[earlydepthstencil] attribute which provides deterministic early
depth/stencil testing, this new attribute allows non-deterministic culling
where occluded pixels may be skipped before shader execution. This is
particularly useful for shaders that use UAV writes or sampler feedback with
features like discard, SV_Depth, or coverage modification that prevent use
of [earlydepthstencil]. The attribute affects HLSL compilation, DXIL
encoding, and SPIR-V generation, enabling hardware-specific optimizations at
the compiler and driver level.
Motivation
Current HLSL implementations can perform early depth/stencil culling
optimizations to skip Pixel Shader invocations for occluded pixels. However,
when a Pixel Shader declares UAV access or sampler feedback operations, these
optimizations must be disabled to maintain deterministic behavior. This is
because the specification requires predictable control over Pixel Shader
invocation counts when external effects are present. The existing
[earlydepthstencil] attribute provides deterministic early testing but
cannot be used with shaders that modify depth, use discard, or modify
coverage.
Many real-world rendering scenarios do not require deterministic UAV write counts from occluded pixels. Virtual texture feedback systems only need to know which texture tiles are visible, not how many occluded fragments reference them. Sampler feedback streaming systems benefit from knowing which mip levels are needed for visible surfaces, but occluded geometry loading textures wastes bandwidth. Alpha-tested foliage rendering with UAV-based effects could benefit from culling without requiring expensive depth pre-passes.
Without a mechanism to express that non-deterministic culling is acceptable, developers must choose between suboptimal performance (all occluded pixels execute) or complex workarounds like multiple shader variants or manual depth pre-passes. A shader-level annotation allows the compiler and driver to enable early culling optimizations where appropriate while maintaining backwards compatibility for shaders requiring deterministic behavior.
Proposed solution
This proposal adds a new [allowearlypixelculling] attribute for HLSL Pixel
Shaders. When applied, implementations may perform early pixel culling
optimizations that skip shader execution for some occluded pixels, even when
the shader has side effects like UAV writes. The exact number and selection of
culled pixels is implementation-dependent, and applications opt into this
non-determinism explicitly.
The attribute enables use of shader features incompatible with
[earlydepthstencil], including discard, SV_Depth, SV_DepthGreaterEqual,
SV_DepthLessEqual, SV_Coverage, and SV_StencilRef. However, these two
attributes are mutually exclusive and cannot be combined on the same shader.
Feature Compatibility
The following table shows which Pixel Shader features are compatible with each early optimization attribute:
| Feature | [earlydepthstencil] | [allowearlypixelculling] |
|---|---|---|
discard | ✗ | ✓ |
SV_DepthGreaterEqual | ✗ | ✓ |
SV_DepthLessEqual | ✗ | ✓ |
SV_Depth | ✗ | ✓ |
SV_Coverage (output) | ✗ | ✓ |
SV_StencilRef | ✗ | ✓ |
| UAV Writes | ✓ (Deterministic) | ✓ (Non-Deterministic) |
| SamplerFeedback Writes | ✓ (Deterministic) | ✓ (Non-Deterministic) |
Developer’s note: When
[allowearlypixelculling]is used, implementations may perform early culling to skip some occluded pixels before shader execution. Depth/stencil writes occur based on the final shader outputs after the shader completes. For pixels that execute, if the shader discards the pixel, no depth/stencil write occurs; if the shader does not discard, depth/stencil testing and writes proceed normally. The only difference from the default mode is that some occluded pixels may be culled early and never execute the shader, which means their UAV/SamplerFeedback writes will not occur.
Developer’s note: Conservative depth outputs (
SV_DepthGreaterEqual/SV_DepthLessEqual) enable more effective early culling because they provide depth bounds guarantees that allow more aggressive optimization. Shaders that outputSV_Depthare unlikely to benefit from early culling optimizations, but the attribute is still allowed.
Example Usage
The following example demonstrates a Pixel Shader that uses discard for
alpha testing and writes to both UAVs and sampler feedback. This shader must
run in late depth mode due to the discard, but implementations may perform
early culling, meaning some occluded pixels may not execute:
RWTexture2D<float4> UAV0 : register(u0);
RWStructuredBuffer<uint> UAV1 : register(u1);
Texture2D<float4> TEX0 : register(t0);
SamplerState SMP0 : register(s0);
FeedbackTexture2D<SAMPLER_FEEDBACK_MIP_REGION_USED> FB0 : register(u2);
struct PSInput
{
uint2 xy : COORD;
};
[allowearlypixelculling]
float4 PSMain(in PSInput input) : SV_Target
{
float4 value = UAV0.Load(input.xy);
// Discard pixels based on loaded value
if (value.x < 0.0 || value.y < 0.0)
discard;
// UAV writes may be skipped for occluded pixels due to early culling
InterlockedAdd(UAV1[0], (uint)(value.x * value.y));
// Sampler feedback writes may be skipped for occluded pixels
FB0.WriteSamplerFeedback(TEX0, SMP0, value.xy);
return saturate(value);
}
Detailed design
HLSL Additions
A new attribute [allowearlypixelculling] is added to HLSL for Pixel Shader
entry points. The attribute has no parameters.
Syntax
[allowearlypixelculling]
ReturnType EntryPointName(Parameters) : SemanticList
{
// Pixel Shader body
}
Semantic Rules
- The attribute is valid only on Pixel Shader entry points
- The attribute is mutually exclusive with
[earlydepthstencil] - When present, the shader may use any combination of:
discardstatementSV_DepthoutputSV_DepthGreaterEqualoutputSV_DepthLessEqualoutputSV_CoverageoutputSV_StencilRefoutput- UAV reads and writes
- Sampler feedback operations
- The attribute requires Shader Model 6.X or later
Compilation Errors
The compiler shall issue an error if:
- The attribute is applied to a non-Pixel Shader entry point
- The attribute is combined with
[earlydepthstencil]on the same entry point - The attribute is used with a Shader Model earlier than 6.X
Non-Determinism Model
The presence of [allowearlypixelculling] relaxes the determinism guarantee
for Pixel Shader invocations. Implementations may skip execution for an
unspecified subset of occluded pixels, from none to all. The exact selection
is implementation-dependent and may vary between runs, hardware, or driver
versions.
For pixels that do execute:
- Depth/stencil testing occurs based on final shader outputs
- If the shader discards the pixel, no depth/stencil write occurs
- If the shader does not discard, depth/stencil testing and writes proceed normally
- All shader outputs and side effects occur as specified
For pixels that are culled early:
- The shader does not execute
- No UAV writes or sampler feedback operations occur
- No depth/stencil test or write occurs
This is analogous to existing non-determinism in vertex shader invocation counts due to vertex caching optimizations.
Author’s note: This attribute addresses non-determinism in whether a Pixel Shader executes due to occlusion. This is distinct from Rasterizer Ordered Views (ROVs), which address non-determinism in the ordering of concurrent UAV writes from overlapping pixels.
DXIL Changes
The [allowearlypixelculling] attribute is encoded in DXIL using the same
mechanism as the existing [earlydepthstencil] attribute. A shader flag bit
is added to the DXIL shader flags metadata to indicate the presence of this
attribute.
Shader Flag
A new shader flag bit AllowEarlyPixelCulling is added to the DXIL shader
flags enumeration. This flag is set when the [allowearlypixelculling]
attribute is present on the Pixel Shader entry point.
The flag is mutually exclusive with the ForceEarlyDepthStencil flag. DXIL
with both flags set is invalid.
Metadata Encoding
The shader flag appears in the DXIL module metadata:
!dx.shaderModel = !{!N}
!N = !{!"ps", i32 6, i32 X}
!dx.entryPoints = !{!M}
!M = !{void ()* @main, !"main", null, null, !O}
!O = !{i32 8, i32 P} ; Tag 8 = shader flags
; P includes AllowEarlyPixelCulling bit
The exact bit position shall be defined in the DXIL specification following established conventions for shader flag bits.
Shader Model Requirement
This flag is valid only in Shader Model 6.X and later. The DXIL validator shall reject modules with this flag in earlier shader models.
SPIR-V Changes
The [allowearlypixelculling] attribute maps to SPIR-V using the same
mechanism as the existing [earlydepthstencil] attribute.
Execution Mode
For Vulkan targets, the attribute is expressed using an appropriate SPIR-V execution mode on the fragment shader entry point. The specific execution mode shall follow the same pattern used for early depth/stencil testing.
If [earlydepthstencil] maps to ExecutionMode EarlyFragmentTests, then
[allowearlypixelculling] requires a corresponding execution mode or extension
to express non-deterministic early culling semantics.
Vulkan Extension Considerations
If no standard SPIR-V execution mode exists to express non-deterministic early culling, a vendor extension may be required. The extension should:
- Define an execution mode for fragment shaders
- Be compatible with depth output, discard, and coverage modification
- Indicate that fragment invocation counts are implementation-dependent
- Be supported by implementations that can benefit from early culling
Implementations that do not support the extension or optimization may execute all pixels deterministically with no functional difference.
Validation Changes
Compiler Validation
The DXC compiler shall validate:
[allowearlypixelculling]is only applied to Pixel Shader entry points[allowearlypixelculling]and[earlydepthstencil]are not combined- The Shader Model is 6.X or later when the attribute is used
Error messages shall clearly indicate the constraint violation.
DXIL Validator
The DXIL validator shall enforce:
- The
AllowEarlyPixelCullingshader flag is present only in Pixel Shaders - The
AllowEarlyPixelCullingandForceEarlyDepthStencilflags are not both set - The flag is present only in Shader Model 6.X or later
- The flag is properly encoded in shader metadata
Validation failures shall produce clear error messages indicating the specific constraint that was violated.
SPIR-V Validation
SPIR-V validation shall ensure:
- The execution mode is applied only to fragment shader entry points
- The execution mode is compatible with any Vulkan extensions in use
- Required capabilities or extensions are declared
Testing
Correct DXIL and SPIR-V codegen will be verified through DXC unit tests that
compile shaders with the [allowearlypixelculling] attribute and verify the
presence of the correct shader flag or execution mode in the generated code.
Tests will cover shaders using various combinations of features (discard,
depth outputs, UAV writes, sampler feedback) to ensure the flag is correctly
propagated. Validation errors will be tested through filecheck-based unit
tests that verify appropriate errors are produced when the attribute is
misused (wrong shader stage, combined with [earlydepthstencil], unsupported
shader model). These tests will ensure clear error messages guide developers
to correct usage. Codegen tests will verify that the compiler correctly
encodes the attribute in both DXIL metadata and SPIR-V execution modes,
following the established patterns for [earlydepthstencil].
Alternatives considered
An alternative approach would add a D3D12_PIPELINE_STATE_FLAG that enables
early pixel culling at PSO creation time rather than via a shader attribute.
This would allow the same shader to be used with different culling behaviors
in different render passes (depth pre-pass vs. main pass). However, this
approach has several drawbacks: it moves the decision from shader compile time
to runtime, reducing opportunities for compiler optimizations; it creates
ambiguity about when the optimization is active; and it complicates the shader
contract by making behavior dependent on external PSO state. The attribute
approach provides a clear, explicit shader-level contract that the compiler
can reason about during optimization, similar to how [earlydepthstencil]
works today. A hybrid approach where both the attribute and a PSO flag are
required could be considered in future work if runtime control proves
necessary.