HLSL Specifications

INF-0006 - Long Vector Execution Test Plan

Status: Accepted
Author
Sponsor
  • Impacted Projects: DXC

Introduction

This test plan covers testing of all HLSL intrinsics that can take long vectors as parameters. More specifically, it covers only the scenarios that get coverage from a graphics driver supporting DXIL.

These tests will verify that all DXIL opcodes and LLVM instructions which can be reached using valid HLSL in SM 6.9 can compile, run, and produce correct output when given long and native vector inputs. They will not verify that the generated DXIL is vectorized.

All tests are to be included in the HLK test binary which ships with the OS. This test binary is built only in the OS repo and is based on the ExecutionTests source code in the DXC repo. A script in the WinTools repo generates and annotates the HLK tests.

We break coverage down into five test categories.

  1. Implement DXIL OpCode tests:

    • At the bottom of this document there are tables containing all HLSL operators (more on those in ‘3. HLSL Operator tests’) and HLSL intrinsics that can be used with long vectors. The HLSL intrinsics tables have DXIL OpCode and LLVM Instruction columns, which list each intrinsic’s mapped DXIL OpCodes and LLVM instructions. All intrinsics have at least one DXIL OpCode or one LLVM instruction.

      Many intrinsics have trivial mappings. Atan is an example of an intrinsic with a trivial mapping. Other intrinsics have multiple DXIL OpCodes. Some intrinsics will use all listed DXIL OpCodes and/or LLVM instructions, while others will have additional logic which determines which OpCodes/Instructions are used. If an intrinsic relies on additional logic to determine which OpCodes/Instructions are used then the OpCode/Instructions will be enclosed in ‘[]’ brackets. The sign intrinsic is an example of an intrinsic with additional logic. If an OpCode/Instruction is not enclosed in ‘[]’ then it is used in all paths for that intrinsic.

  2. Implement LLVM Instruction tests:

    • These are the test cases for the LLVM Instructions listed in the table at the bottom of this document.
    • Because we will use HLSL intrinsics to get coverage for the DXIL OpCode tests, we expect to get most of the coverage needed for the LLVM Instruction tests as a side effect. After implementing the DXIL OpCode tests we should be able to do a coverage audit and amend test cases, or write simple additional ones, as needed.
    • Just as in ‘1. Implement DXIL OpCode tests’, some cases have multiple instructions listed, and ‘[]’ brackets are used in the same manner.
    • Additional OpCodes/Instructions are logic based (i.e., float or int specific).
  3. HLSL Operator tests:

    • HLSL Operators Table in this document lists the HLSL Operators which can take long vectors as arguments.
    • Many of these operators will get coverage by default in the DXIL OpCode tests. We will still audit coverage and amend test cases, or write simple additional ones, as needed.
  4. Standard loading and storing of long vectors

    • These could be covered in test categories 1 and 2. But I propose we break out individual cases to ensure that we have more granular coverage.
    • Ensure we have some basic tests doing standard loading/storing of long vectors across the types listed under ‘Buffer types to test’ and ‘Vector element data types to test’.
    • Additionally, the above buffer types and data types should be tested by loading from a ResourceDescriptorHeap.
  5. ‘Creative’ test cases:

Buffer types to test

  • Raw Buffers (Byte Address Buffers)
  • Structured Buffers (StructuredBuffer<T>)

Vector element data types to test

Testing will cover the following vector element data types:

  • bool, int16_t, uint16_t, int32_t, uint32_t, int64_t, uint64_t, float16_t, float32_t, float64_t, packed_int16_t, and packed_uint16_t.

Vector sizes and alignments to test

General sizes to test are in the range [3, 1024]. It is worth noting that the new form of rawBufferLoad will be updated to vectorize sizes < 5.

Test sizes

  • vector<TYPE, 3> : Testing one below previous vector limit. Early testing found some issues here so it was added.
  • vector<TYPE, 4> : Previous limit.
  • vector<TYPE, 5> : Testing one above previous vector limit.
  • vector<TYPE, 16> : This size of ‘vector’ previously only appeared as matrices.
  • vector<TYPE, 17> : Larger than any vector previously possible.
  • vector<TYPE, 35> : Arbitrarily picked.
  • vector<TYPE, 100> : Arbitrarily picked.
  • vector<TYPE, 256> : Arbitrarily picked.
  • vector<TYPE, 1024> : The new max size of a vector.
  • These sizes will be tested across the types listed under ‘Vector element data types to test’.
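
The size list above could be captured as a small table that test code iterates over. This is only a sketch: the names `kLongVectorTestSizes`, `kPreviousVectorLimit`, `kMaxVectorSize`, `IsLongVectorSize`, and `CountLongSizes` are hypothetical and do not come from the execution tests.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical constants; the plan does not prescribe how sizes are stored.
constexpr std::size_t kPreviousVectorLimit = 4;
constexpr std::size_t kMaxVectorSize = 1024;

// The nine vector sizes listed in the plan.
constexpr std::array<std::size_t, 9> kLongVectorTestSizes = {
    3, 4, 5, 16, 17, 35, 100, 256, 1024};

// True when a size exceeds the previous 4-element vector limit.
inline bool IsLongVectorSize(std::size_t n) {
  assert(n >= 1 && n <= kMaxVectorSize);
  return n > kPreviousVectorLimit;
}

// Counts how many of the listed test sizes exceed the previous limit.
inline std::size_t CountLongSizes() {
  std::size_t count = 0;
  for (std::size_t n : kLongVectorTestSizes)
    if (IsLongVectorSize(n)) ++count;
  return count;
}
```

Sizes 3 and 4 stay in the table on purpose, since the plan tests at and just below the previous limit as well as above it.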

Some notable alignment cases

  • 128-bit boundaries: Memory access for Shader Model 5.0 and earlier operates on 128-bit slots aligned on 128-bit boundaries. An example is vector<half, 7>, vector<half, 8>, and vector<half, 9>, which are 112 bits, 128 bits, and 144 bits respectively. This boundary will be tested with 32-bit and 64-bit sized values as well.

  • Most GPUs operate on at least 32 bits at once, so what happens if you use 16-bit values and an odd number of elements? Could accessing the last element expose issues where we could overwrite the next variable if it is assuming alignment?
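
The 128-bit boundary arithmetic above can be checked with a few helper functions. This is a minimal sketch; `kSlotBits`, `VectorBits`, `SlotsNeeded`, and `ElementsPerSlot` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Size of one legacy memory slot, in bits.
constexpr std::size_t kSlotBits = 128;

// Total size in bits of a vector with `count` elements of `elementBits` each.
inline std::size_t VectorBits(std::size_t count, std::size_t elementBits) {
  return count * elementBits;
}

// Number of 128-bit slots needed to hold the whole vector (rounded up).
inline std::size_t SlotsNeeded(std::size_t count, std::size_t elementBits) {
  return (VectorBits(count, elementBits) + kSlotBits - 1) / kSlotBits;
}

// Element count that exactly fills one 128-bit slot.
inline std::size_t ElementsPerSlot(std::size_t elementBits) {
  assert(elementBits > 0 && elementBits <= kSlotBits);
  return kSlotBits / elementBits;
}
```

With 16-bit elements the boundary falls at 8 elements, so 7, 8, and 9 straddle it. By the same arithmetic the boundary is at 4 elements for 32-bit values and 2 elements for 64-bit values.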

High level test design

  1. The tests will leverage the XML infrastructure used by the existing execution tests. That infrastructure already follows a two-file design pattern:

    • 1st XML: Used to define shader source code and metadata about that shader code. This XML file is parsed using a private class. This private class helps facilitate creation of D3D resources and execution of the shader.
    • 2nd XML: Describes metadata about the specific test cases. Used by the TAEF infrastructure for TAEF Data Driven Testing.
  2. Test inputs will be hard coded in a C++ header file. This was chosen over defining inputs in the second XML because it is cleaner and easier to parse for different data types. The C++ header method also avoids needing to repeat the data set in the XML for each individual test case. Inputs will use ‘value sets’ which will typically be much smaller than the desired vector test size. Values will be repeated cyclically until the vector is full. For example, a value set {1, 2, 3} used to populate a vector<int, 1024> will produce the pattern <1, 2, 3, 1, 2, 3, ...>, repeating the sequence until all 1024 elements are filled. This approach provides predictable test data while keeping input definitions manageable.

  3. Expected outputs are computed for each test case at run time.

  4. All new long vector test code is factored out into its own files.
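
The cyclic value-set fill and the run-time computation of expected outputs described in the list above can be sketched as follows. The helper names `FillFromValueSet` and `ComputeExpected` are hypothetical, and the actual execution test code may be organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fills a vector of `size` elements by cycling through `valueSet`.
template <typename T>
std::vector<T> FillFromValueSet(const std::vector<T>& valueSet,
                                std::size_t size) {
  assert(!valueSet.empty() && "value set must contain at least one value");
  std::vector<T> result;
  result.reserve(size);
  for (std::size_t i = 0; i < size; ++i)
    result.push_back(valueSet[i % valueSet.size()]);
  return result;
}

// Computes expected outputs at run time by applying a CPU reference
// operation to every input element.
template <typename T, typename Op>
std::vector<T> ComputeExpected(const std::vector<T>& inputs, Op op) {
  std::vector<T> expected;
  expected.reserve(inputs.size());
  for (const T& value : inputs)
    expected.push_back(op(value));
  return expected;
}
```

For a unary intrinsic, a test could call `FillFromValueSet<int>({1, 2, 3}, 1024)` to build the input and then pass a lambda implementing the reference operation to `ComputeExpected`. Floating-point comparisons against GPU results would additionally need a tolerance, which this sketch leaves out.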

Implementation phases

Do the test work in two simple phases.

  1. Implement and validate (locally against WARP) tests for all test categories.
  2. HLK related work:
    • Add a SM 6.9 HLK requirement. This includes updating the HLK requirements doc.
    • Update mm_annotate_shader_op_arith_table.py to annotate the new test cases with HLK GUIDs and requirements.
    • Add the new tests to the HLK playlist.

Shipping

Note that because DXC and the Agility SDK are both undocked from Windows it is our normal operating behavior for the HLK tests to become available with a later TBD OS release. The good news is that this doesn’t prevent the tests from being available much earlier in the DXC repo. It just means that they are simply TAEF tests in the DXC repo. An HLK test includes an extra level of infrastructure for test gating, selection, and result submission for WHQL signing of drivers.

  1. Tests will be shared privately with IHVs along with the latest DXC and latest Agility SDK for testing and validation. IHVs will also be able to build and run the tests from the public DXC repo themselves. If needed Microsoft can share further instructions when the tests are available.

  2. The tests will ship with the HLK at a TBD date in a later OS release.

Test Validation Requirements

The following statements must be true and validated for this work to be considered completed.

  • All new test cases pass when run locally against a WARP device
  • All new test cases must verify applicable outputs for correctness.
  • All new test cases are confirmed to be present in HLK Studio and selectable to be run when a target device satisfies the HLK ShaderModel 6.9 requirement.
  • All new tests/test cases are added to the official WHQL HLK playlist for the OS release that the HLK tests will ship with.
  • Tests will be annotated to show which DXIL OpCodes, LLVM Instructions, and HLSL operators they are intended to get coverage for.

Notes

  • Private test binaries/collateral will be shared with IHVs for validation purposes. This will enable IHVs to verify long vector functionality without waiting for an OS/HLK release.

HLSL-Operators

  • ✅ - Means there was an explicit test case implemented for the intrinsic.
  • ☑️ - Means the intrinsic gets coverage via other intrinsics. For example, ‘exp2’ just uses the DXIL OpCode for Exp.

HLSL Operators

These operators generate LLVM instructions which use vectors.

Operator table from Microsoft HLSL Operators

| Completed | Operator Name | Operator | Notes |
|---|---|---|---|
| | Addition | + | |
| | Subtraction | - | |
| | Multiplication | * | |
| | Additive and Multiplicative Operators | +, -, *, /, % | |
| | Array Operator | [i] | llvm:ExtractElementInst OR llvm:InsertElementInst |
| | Assignment Operators | =, +=, -=, *=, /=, %= | |
| | Bitwise Operators | ~, <<, >>, &, \|, ^, <<=, >>=, &=, \|=, ^= | Only valid on int and uint vectors |
| | Boolean Math Operators | &&, \|\|, ?: | |
| | Cast Operator | (type) | No direct operator, difference in GetElementPointer or load type |
| | Comparison Operators | <, >, ==, !=, <=, >= | |
| | Prefix or Postfix Operators | ++, -- | |
| | Unary Operators | !, -, + | |

Mappings of HLSL Intrinsics to DXIL OpCodes or LLVM Instructions

Trigonometry

| Completed | Intrinsic | DXIL OpCode | LLVM Instruction | Basic Op Type | Notes |
|---|---|---|---|---|---|
| | acos | Acos | | Unary | range: -1 to 1 |
| | asin | Asin | | Unary | range: -pi/2 to pi/2. Floating point types only. |
| | atan | Atan | | Unary | range: -pi/2 to pi/2. |
| | cos | Cos | | Unary | no range requirements. |
| | cosh | Hcos | | Unary | no range requirements. |
| | sin | Sin | | Unary | no range requirements. |
| | sinh | Hsin | | Unary | no range requirements. |
| | tan | Tan | | Unary | no range requirements. |
| | tanh | Htan | | Unary | no range requirements. |
| | atan2 | Atan | FDiv, FAdd, FSub, FCmpOLT, FCmpOEQ, FCmpOGE, FCmpOLT, And, Select | Unary | Not required. |

Math

| Completed | Intrinsic | DXIL OpCode | LLVM Instruction | Basic Op Type | Notes |
|---|---|---|---|---|---|
| | abs | [Imax], [Fabs] | | Unary | Imax for ints. Fabs for floats. |
| | ceil | Round_pi | | Unary | |
| | exp | Exp | | Unary | |
| | floor | Round_ni | | Unary | |
| | fma | Fma | | Ternary | All three inputs are of the same type. Any inputs that are long vectors must have the same number of dimensions. |
| | frac | Frc | | Unary | |
| | frexp | | FCmpUNE, SExt, BitCast, And, Add, AShr, SIToFP, Store, And, Or | Unary | Has a return value in addition to an output parameter. |
| ☑️ | ldexp | Exp | FMul | Binary | Not required. Covered by floating point multiplication and exp. |
| ☑️ | lerp | | FSub, FMul, FAdd | Ternary | Not required. FSub, FMul, and FAdd are all well covered. |
| | log | Log | FMul | Unary | All three inputs are of the same type. Any inputs that are long vectors must have the same number of dimensions. |
| | mad | IMad | | Ternary | |
| | max | IMax | | Binary | |
| | min | IMin | | Binary | |
| ☑️ | pow | [Log, Exp] | [FMul], [FDiv] | Binary | Not required. Ops well covered by other tests. |
| ☑️ | rcp | | FDiv | Unary | Not required. Covered by floating point division. |
| | round | Round_ne | | Unary | |
| | rsqrt | Rsqrt | | Unary | |
| | sign | | ZExt, Sub, [ICmpSLT], [FCmpOLT] | Unary | |
| | smoothstep | Saturate | FMul, FSub, FDiv | Ternary | |
| | sqrt | Sqrt | | Unary | |
| ☑️ | step | | FCmpOLT, Select | Binary | Not required. FCmpOLT covered by atan2 and sign. Select covered by explicit select test. |
| | trunc | Round_z | | Unary | |
| ☑️ | clamp | FMax, FMin, [UMax, UMin], [IMax, Imin] | | Ternary | Not required. Covered by min and max. |
| ☑️ | exp2 | Exp | | Unary | Not required. Covered by exp. |
| ☑️ | log10 | Log | FMul | Unary | Not required. Covered by log. |
| ☑️ | log2 | Log | | Unary | Not required. Covered by log. |
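
The sign row above lists ZExt and Sub on every path, with either ICmpSLT or FCmpOLT selected by element type. One reading consistent with that row is two less-than comparisons whose boolean results are widened and subtracted. The sketch below is a CPU reference under that assumption; it is not taken from the compiler, and `ReferenceSign` is a hypothetical name.

```cpp
#include <type_traits>

// Reference sign: (0 < x) - (x < 0), with each comparison result widened
// to int before the subtraction. Returns -1, 0, or 1.
template <typename T>
int ReferenceSign(T x) {
  static_assert(std::is_arithmetic<T>::value, "scalar element type expected");
  const int positive = (T(0) < x) ? 1 : 0;  // compare, then widen to int
  const int negative = (x < T(0)) ? 1 : 0;  // compare, then widen to int
  return positive - negative;
}
```

Applied element by element, this could produce the expected output for a long vector sign test for both integer and floating-point inputs.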

Float Ops

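| Completed | Intrinsic | DXIL OpCode | LLVM Instruction | Basic Op Type | Notes |
|---|---|---|---|---|---|
| | f16tof32 | LegacyF16ToF32 | | Unary | |
| | f32tof16 | LegacyF32ToF16 | | Unary | |
| | isfinite | IsFinite | | Unary | |
| | isinf | IsInf | | Unary | |
| | isnan | IsNan | | Unary | |
| | modf | Round_z | FSub, Store | Unary | Has a return value and an output value. |
| | fmod | FAbs, Frc | FDiv, FNeg, FCmpOGE, Select, FMul | Binary | |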
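
The modf row lists Round_z, FSub, and Store, which suggests a truncation toward zero, a subtraction, and a store to the output parameter. A CPU reference along those lines might look like the sketch below. `ModfResult` and `ReferenceModf` are hypothetical names, and the handling of infinities and NaN is not addressed here.

```cpp
#include <cmath>

// Holds both results of modf: the returned fractional part and the
// integral part that HLSL writes through the output parameter.
struct ModfResult {
  float fractional;
  float integral;
};

// Truncate toward zero, then subtract to get the fractional part.
inline ModfResult ReferenceModf(float x) {
  const float integral = std::trunc(x);
  return ModfResult{x - integral, integral};
}
```

Because modf has two outputs, a test would need to verify both the return value and the value stored to the output argument.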

Bitwise Ops

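| Completed | Intrinsic | DXIL OpCode | LLVM Instruction | Basic Op Type | Notes |
|---|---|---|---|---|---|
| | saturate | Saturate | | Unary | |
| | reversebits | Bfrev | | Unary | |
| | countbits | Countbits | | Unary | |
| | firstbithigh | FirstbitSHi | | Unary | |
| | firstbitlow | FirstbitLo | | Unary | |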
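
Expected values for countbits and reversebits can be computed on the CPU with plain loops. The sketch below assumes 32-bit unsigned elements; `ReferenceCountBits` and `ReferenceReverseBits` are hypothetical names.

```cpp
#include <cstdint>

// Counts the number of set bits in a 32-bit value.
inline std::uint32_t ReferenceCountBits(std::uint32_t value) {
  std::uint32_t count = 0;
  while (value != 0) {
    count += value & 1u;
    value >>= 1;
  }
  return count;
}

// Reverses the bit order of a 32-bit value (bit 0 becomes bit 31).
inline std::uint32_t ReferenceReverseBits(std::uint32_t value) {
  std::uint32_t result = 0;
  for (int bit = 0; bit < 32; ++bit) {
    result = (result << 1) | (value & 1u);
    value >>= 1;
  }
  return result;
}
```

Other element widths would need their own variants, since the reversed position of a bit depends on the element size.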

Logic Ops

| Completed | Intrinsic | DXIL OpCode | LLVM Instruction | Basic Op Type | Notes |
|---|---|---|---|---|---|
| | select | | Select, [ExtractElement, InsertElement] | Ternary | |
| | and | | And, [ExtractElement, InsertElement] | Binary | Not required. Covered by select. |
| | or | | Or, [ExtractElement, InsertElement] | Binary | Not required. Covered by select. |

Reductions

| Completed | Intrinsic | DXIL OpCode | LLVM Instruction | Basic Op Type | Notes |
|---|---|---|---|---|---|
| | all | | [FCmpUNE], [ICmpNE], [ExtractElement, And] | Unary | |
| | any | | [FCmpUNE], [ICmpNE], [ExtractElement, Or] | Unary | |
| | dot | | ExtractElement, Mul | Binary | |
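
Reductions collapse a whole vector into one scalar, so their expected values are computed over all elements rather than per element. A minimal CPU reference sketch follows; `ReferenceAll`, `ReferenceAny`, and `ReferenceDot` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True when every element compares not-equal to zero.
template <typename T>
bool ReferenceAll(const std::vector<T>& v) {
  bool result = true;
  for (const T& e : v) result = result && (e != T(0));
  return result;
}

// True when at least one element compares not-equal to zero.
template <typename T>
bool ReferenceAny(const std::vector<T>& v) {
  bool result = false;
  for (const T& e : v) result = result || (e != T(0));
  return result;
}

// Sum of element-wise products of two equally sized vectors.
template <typename T>
T ReferenceDot(const std::vector<T>& a, const std::vector<T>& b) {
  assert(a.size() == b.size());
  T sum = T(0);
  for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
  return sum;
}
```

For floating-point dot products the accumulation order on the GPU may differ from this loop, so a comparison would likely need a tolerance.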

Derivative and Quad Operations

| Completed | Intrinsic | DXIL OpCode | LLVM Instruction | Basic Op Type | Notes |
|---|---|---|---|---|---|
| | ddx | DerivCoarseX | | Unary | |
| | ddx_fine | DerivFineX | | Unary | |
| | ddy | DerivCoarseY | | Unary | |
| | ddy_fine | DerivFineY | | Unary | |
| | fwidth | QuadReadLaneAt | | Unary | |
| | QuadReadLaneAcrossX | QuadOp | | Unary | |
| | QuadReadLaneAcrossY | QuadOp | | Unary | Uses different QuadOp parameters leading to different behavior. |
| | QuadReadLaneAcrossDiagonal | QuadOp | | Unary | Uses different QuadOp parameters leading to different behavior. |
| | ddx_coarse | DerivCoarseX | | Unary | Not required. Covered by ddx. |
| | ddy_coarse | DerivCoarseY | | Unary | Not required. Covered by ddy. |

WaveOps

| Completed | Intrinsic | DXIL OpCode | LLVM Instruction | Basic Op Type | Notes |
|---|---|---|---|---|---|
| | WaveActiveBitAnd | WaveActiveBit | | Binary | |
| | WaveActiveBitOr | WaveActiveBit | | Binary | |
| | WaveActiveBitXor | WaveActiveBit | | Binary | |
| | WaveActiveProduct | WaveActiveOp | | Binary | |
| | WaveActiveSum | WaveActiveOp | | Binary | |
| | WaveActiveMin | WaveActiveOp | | Binary | |
| | WaveActiveMax | WaveActiveOp | | Binary | |
| | WaveMultiPrefixBitAnd | WaveMultiPrefixOp | | Binary | |
| | WaveMultiPrefixBitOr | WaveMultiPrefixOp | | Binary | |
| | WaveMultiPrefixBitXor | WaveMultiPrefixOp | | Binary | |
| | WaveMultiPrefixProduct | WaveMultiPrefixOp | | Binary | |
| | WaveMultiPrefixSum | WaveMultiPrefixOp | | Binary | |
| | WavePrefixSum | WavePrefixOp | | Binary | |
| | WavePrefixProduct | WavePrefixOp | | Binary | |
| | WaveReadLaneAt | WaveReadLaneAt | | Binary | |
| | WaveReadLaneFirst | WaveReadLaneFirst | | Unary | |
| | WaveActiveAllEqual | WaveActiveAllEqual | | Unary | |
| | WaveMatch | WaveMatch | | Unary | |
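
Wave operations on long vectors act per component across lanes. As an illustration, WavePrefixSum returns for each lane the sum of the values held by lower-indexed active lanes, excluding the lane's own value. The sketch below computes that on the CPU under the simplifying assumption that every lane is active; `ReferenceWavePrefixSum` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// lanes[i] is the input vector held by lane i. All lanes are assumed
// active and all vectors are assumed to have the same length.
template <typename T>
std::vector<std::vector<T>> ReferenceWavePrefixSum(
    const std::vector<std::vector<T>>& lanes) {
  std::vector<std::vector<T>> out;
  if (lanes.empty()) return out;
  std::vector<T> running(lanes[0].size(), T(0));
  for (const std::vector<T>& lane : lanes) {
    assert(lane.size() == running.size());
    out.push_back(running);  // exclusive: only lower lanes contribute
    for (std::size_t c = 0; c < lane.size(); ++c) running[c] += lane[c];
  }
  return out;
}
```

A real test would also have to account for the wave size reported by the device and for inactive lanes, which this sketch ignores.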

Type Casting Operations

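| Completed | Intrinsic | DXIL OpCode | LLVM Instruction | Basic Op Type | Notes |
|---|---|---|---|---|---|
| | asdouble | MakeDouble | | Binary | |
| | asfloat | | BitCast | Unary | |
| | asfloat16 | | BitCast | Unary | |
| | asint | | BitCast | Unary | |
| | asint16 | | BitCast | Unary | |
| | asuint (from double) | SplitDouble | | Binary | Returns void. Has two output arguments. Converts double to two uints. |
| | asuint (bitcast) | | BitCast | Unary | Bitcast from float/int to uint. |
| | asuint16 | | BitCast | Unary | |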
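
The asdouble and asuint (from double) rows are inverses of each other: one builds a double from a low and a high 32-bit word, the other splits a double into those two words. A CPU reference can be sketched with `memcpy`, assuming a 64-bit IEEE-754 `double` on the host. The function names mirror the DXIL OpCode names but are otherwise hypothetical.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes a 64-bit double");

// Mirrors asuint(double, out lowbits, out highbits): no return value,
// two output arguments.
inline void SplitDouble(double value, std::uint32_t& low,
                        std::uint32_t& high) {
  std::uint64_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  low = static_cast<std::uint32_t>(bits & 0xFFFFFFFFu);
  high = static_cast<std::uint32_t>(bits >> 32);
}

// Mirrors asdouble(lowbits, highbits).
inline double MakeDouble(std::uint32_t low, std::uint32_t high) {
  const std::uint64_t bits = (static_cast<std::uint64_t>(high) << 32) | low;
  double value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}
```

Round-tripping a value through `SplitDouble` and `MakeDouble` should reproduce the original bit pattern, which gives a simple consistency check between the two test cases.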