INF-0006 - Long Vector Execution Test Plan

Status	Accepted
Author	Alex Sepkowski
Sponsor	Alex Sepkowski

Impacted Projects: DXC

Introduction

This test plan covers testing all HLSL intrinsics that can take long vectors as parameters. More specifically, it only covers test scenarios which will get coverage from a graphics driver supporting DXIL.

These tests will verify that all DXIL opcodes and LLVM instructions which can be reached using valid HLSL in SM 6.9 can compile, run, and produce correct output when given long and native vector inputs. They will not verify that the generated DXIL is vectorized.

All tests are to be included in the HLK test binary which ships with the OS. This test binary is built only in the OS repo and is based on the ExecutionTests source code in the DXC repo. There is a script in the WinTools repo which generates and annotates the HLK tests.

We break coverage down into five test categories.

1. Implement DXIL OpCode tests:
- At the bottom of this document there are tables listing all HLSL operators (more on those in ‘3. HLSL Operator Tests’) and HLSL intrinsics that can be used with long vectors. The HLSL intrinsics tables have DXIL OpCode and LLVM Instruction columns, which list the DXIL OpCodes and LLVM instructions each intrinsic maps to. All intrinsics have at least one DXIL OpCode or one LLVM instruction.
  Many intrinsics have trivial mappings; atan is an example. Other intrinsics have multiple DXIL OpCodes. Some intrinsics use all listed DXIL OpCodes and/or LLVM instructions, while others have additional logic which determines which OpCodes/Instructions are used. In that case the OpCodes/Instructions are enclosed in ‘[]’ brackets. The sign intrinsic is an example of an intrinsic with additional logic. If an OpCode/Instruction is not enclosed in ‘[]’ then it is used in all paths for that intrinsic.
2. Implement LLVM Instruction tests:
- These are the test cases for the LLVM Instructions listed in the tables at the bottom of this document.
- Because we will use HLSL intrinsics to get coverage for the DXIL OpCode tests, we expect to get most of the coverage needed for the LLVM Instruction tests as well. After implementing the DXIL OpCode tests we should be able to do a coverage audit and amend test cases, or write simple additional ones, as needed.
- As in ‘1. Implement DXIL OpCode Tests’, some cases list multiple instructions, and ‘[]’ brackets are used in the same manner.
- Additional OpCodes/Instructions are logic based (i.e. float or int specific).
3. HLSL Operator tests:
- The HLSL Operators table in this document lists the HLSL operators which can take long vectors as arguments.
- Many of these operators can and will get coverage by default in the DXIL OpCode tests. But we will audit coverage and amend test cases, or write simple additional ones, as needed.
4. Standard loading and storing of long vectors:
- These could be covered in test categories 1 and 2, but I propose we break out individual cases to ensure that we have more granular coverage.
- Ensure we have some basic tests doing standard loading/storing of long vectors across the buffer types listed in ‘Buffer types to test’ and the element types listed in ‘Vector element data types to test’.
- Additionally, the above buffer types and data types should be tested by loading from a ResourceDescriptorHeap.
5. ‘Creative’ test cases:
- Sizes around alignments and boundaries. See ‘Vector sizes and alignments to test’.
- Odd (non even) number of elements in vector. See ‘Test sizes’.
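
As an illustration of the ‘[]’ bracket convention, the Math table lists sign with ZExt and Sub unbracketed and with [ICmpSLT] and [FCmpOLT] bracketed. The sketch below is a hypothetical CPU-side reference (the name SignReference is an assumption, not something from the DXC sources) showing how a single scalar definition takes the integer or the floating point comparison depending on the element type. It is not DXC's actual lowering.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical CPU-side reference for HLSL sign(); not DXC's lowering.
// The two comparisons are integer compares for integer T and floating point
// compares for floating point T, loosely mirroring the bracketed
// [ICmpSLT] / [FCmpOLT] alternatives. Converting each 0/1 result to int and
// subtracting loosely mirrors the unbracketed ZExt and Sub entries.
template <typename T>
constexpr int SignReference(T x) {
  static_assert(std::is_arithmetic<T>::value, "T must be an arithmetic type");
  return static_cast<int>(T(0) < x) - static_cast<int>(x < T(0));
}

// Small self-check showing the results for a few scalar inputs.
inline void CheckSignReference() {
  assert(SignReference(-3) == -1);
  assert(SignReference(0) == 0);
  assert(SignReference(2.5f) == 1);
  assert(SignReference(-0.0f) == 0);
}
```

A test for a bracketed intrinsic would presumably need inputs of both integer and floating point element types so that each bracketed path is exercised at least once.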

Buffer types to test

Raw Buffers (Byte Address Buffers)
Structured Buffers (StructuredBuffer<T>)

Vector element data types to test

Testing will cover the following vector element data types:

bool, int16_t, uint16_t, int32_t, uint32_t, int64_t, uint64_t, float16_t, float32_t, float64_t, packed_int16_t, and packed_uint16_t.

Vector sizes and alignments to test

General sizes to test are in the range [3, 1024]. It is worth noting that the new form of rawBufferLoad will be updated to vectorize sizes < 5.

Test sizes

vector<TYPE, 3> : Testing one below previous vector limit. Early testing found some issues here so it was added.
vector<TYPE, 4> : Previous limit.
vector<TYPE, 5> : Testing one above previous vector limit.
vector<TYPE, 16> : This size of ‘vector’ previously only appeared as matrices.
vector<TYPE, 17> : Larger than any vector previously possible.
vector<TYPE, 35> : Arbitrarily picked.
vector<TYPE, 100> : Arbitrarily picked.
vector<TYPE, 256> : Arbitrarily picked.
vector<TYPE, 1024> : The new max size of a vector.
These sizes will be tested across the element types listed in ‘Vector element data types to test’.

Some notable alignment cases

128 bit boundaries : Memory access for Shader Model 5.0 and earlier operates on 128-bit slots aligned on 128-bit boundaries. An example is vector<half, 7>, vector<half, 8>, and vector<half, 9>, which are 112 bits, 128 bits, and 144 bits respectively. This boundary will be tested with 32-bit and 64-bit sized values as well.
Most GPUs operate on at least 32 bits at once, so what happens if you use 16-bit values and an odd number of elements? Could accessing the last element expose issues where we overwrite the next variable if alignment is assumed?
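
The boundary arithmetic above can be sketched in C++ as follows. The helper names (VectorBits, CountsAroundBoundary, BoundaryCounts) are hypothetical and only illustrate how element counts just below, on, and just above a 128-bit boundary could be derived for a given element width.

```cpp
#include <cassert>
#include <cstddef>

// 128-bit slot size taken from the alignment note above.
constexpr std::size_t kSlotBits = 128;

// Total size in bits of a vector with `count` elements of `elemBits` bits.
constexpr std::size_t VectorBits(std::size_t elemBits, std::size_t count) {
  return elemBits * count;
}

// Element counts one below, exactly on, and one above a 128-bit boundary.
struct BoundaryCounts {
  std::size_t below;
  std::size_t at;
  std::size_t above;
};

// `multiple` selects which boundary: 1 means 128 bits, 2 means 256 bits, etc.
inline BoundaryCounts CountsAroundBoundary(std::size_t elemBits,
                                           std::size_t multiple = 1) {
  assert(elemBits > 0 && kSlotBits % elemBits == 0 && multiple > 0);
  const std::size_t at = (kSlotBits * multiple) / elemBits;
  return BoundaryCounts{at - 1, at, at + 1};
}

// The vector<half, 7/8/9> example from the text.
static_assert(VectorBits(16, 7) == 112, "one element below the boundary");
static_assert(VectorBits(16, 8) == 128, "exactly on the boundary");
static_assert(VectorBits(16, 9) == 144, "one element above the boundary");
```

For 32-bit elements the first boundary yields 3, 4, and 5 elements, which already appear in the test sizes list. For 64-bit elements the first boundary falls at 2 elements, so a later multiple may be the more interesting case.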

High level test design

The tests will leverage the XML infrastructure already used by the existing execution tests, which follows a two-file design pattern:
- 1st XML: Used to define shader source code and metadata about that shader code. This XML file is parsed using a private class. This private class helps facilitate creation of D3D resources and execution of the shader.
- 2nd XML: Describes metadata about the specific test cases. Used by the TAEF infrastructure for TAEF Data Driven Testing.
Test inputs will be hard coded in a C++ header file. This was chosen over defining inputs in the second XML file because it is cleaner and easier to parse for different data types. The C++ header approach also avoids repeating the data set in the XML for each individual test case. Inputs will use ‘value sets’ which will typically be much smaller than the desired vector test size. Values will be repeated cyclically until the vector is full. For example, a value set {1, 2, 3} used to populate a vector<int, 1024> will produce the pattern <1, 2, 3, 1, 2, 3, ...>, repeating the sequence until all 1024 elements are filled. This approach provides predictable test data while keeping input definitions manageable.
Expected outputs are computed for each test case at run time.
All new long vector test code is factored out into its own files.
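
A minimal sketch of the cyclic value set fill and of computing expected outputs at run time is shown below. The helper names (FillFromValueSet, ComputeExpectedUnary) are hypothetical and are not taken from the DXC sources; the actual test code may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds a test input of `size` elements by repeating
// the (typically much smaller) value set cyclically.
template <typename T>
std::vector<T> FillFromValueSet(const std::vector<T> &valueSet,
                                std::size_t size) {
  assert(!valueSet.empty() && "value set must contain at least one value");
  std::vector<T> out;
  out.reserve(size);
  for (std::size_t i = 0; i < size; ++i)
    out.push_back(valueSet[i % valueSet.size()]);
  return out;
}

// Hypothetical helper: computes expected outputs at run time by applying a
// scalar reference operation to every input element.
template <typename T, typename ScalarOp>
std::vector<T> ComputeExpectedUnary(const std::vector<T> &input, ScalarOp op) {
  std::vector<T> expected;
  expected.reserve(input.size());
  for (const T &value : input)
    expected.push_back(op(value));
  return expected;
}
```

For example, FillFromValueSet<int>({1, 2, 3}, 1024) yields 1024 elements beginning 1, 2, 3, 1, 2, 3, and the same value set can be reused unchanged for every vector size in the test sizes list.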

Implementation phases

Do the test work in two simple phases.

1. Implement and validate (locally against WARP) for all test categories.
2. HLK related work:

- Add a SM 6.9 HLK requirement. Includes updating the HLK requirements doc.
- Update mm_annotate_shader_op_arith_table.py to annotate the new test cases with HLK GUIDs and requirements.
- Add new tests to HLK playlist.

Shipping

Note that because DXC and the Agility SDK are both undocked from Windows it is our normal operating behavior for the HLK tests to become available with a later TBD OS release. The good news is that this doesn’t prevent the tests from being available much earlier in the DXC repo. It just means that they are simply TAEF tests in the DXC repo. An HLK test includes an extra level of infrastructure for test gating, selection, and result submission for WHQL signing of drivers.

Tests will be shared privately with IHVs along with the latest DXC and latest Agility SDK for testing and validation. IHVs will also be able to build and run the tests from the public DXC repo themselves. If needed Microsoft can share further instructions when the tests are available.
The tests will ship with the HLK at a TBD date in a later OS release.

Test Validation Requirements

The following statements must be true and validated for this work to be considered completed.

All new test cases pass when run locally against a WARP device.
All new test cases must verify applicable outputs for correctness.
All new test cases are confirmed to be present in HLK Studio and selectable to be run when a target device satisfies the HLK ShaderModel 6.9 requirement.
All new tests/test cases are added to the official WHQL HLK playlist for the OS release that the HLK tests will ship with.
Tests will be annotated to show which DXIL OpCodes, LLVM Instructions, and HLSL operators they are intended to get coverage for.

Notes

Private test binaries/collateral will be shared with IHVs for validation purposes. This will enable IHVs to verify long vector functionality without waiting for an OS/HLK release.

HLSL-Operators

✅ - Means there was an explicit test case implemented for the intrinsic.
☑️ - Means the intrinsic gets coverage via other intrinsics. For example, ‘exp2’ just uses the DXIL OpCode for Exp.

HLSL Operators

These operators generate LLVM instructions which use vectors.

Operator table from Microsoft HLSL Operators

Completed	Operator Name	Operator	Notes
✅	Addition	+
✅	Subtraction	-
✅	Multiplication	*
	Additive and Multiplicative Operators	+, -, *, /, %
	Array Operator	[i]	llvm:ExtractElementInst OR llvm:InsertElementInst
	Assignment Operators	=, +=, -=, *=, /=, %=
	Bitwise Operators	~, <<, >>, &, |, ^, <<=, >>=, &=, |=, ^=	Only valid on int and uint vectors
	Boolean Math Operators	&&, ||, ?:
	Cast Operator	(type)	No direct operator, difference in GetElementPointer or load type
	Comparison Operators	<, >, ==, !=, <=, >=
	Prefix or Postfix Operators	++, --
	Unary Operators	!, -, +

Mappings of HLSL Intrinsics to DXIL OpCodes or LLVM Instructions

Trigonometry

Completed	Intrinsic	DXIL OpCode	LLVM Instruction	Basic Op Type	Notes
✅	acos	Acos		Unary	range: -1 to 1
✅	asin	Asin		Unary	range: -pi/2 to pi/2. Floating point types only.
✅	atan	Atan		Unary	range: -pi/2 to pi/2.
✅	cos	Cos		Unary	no range requirements.
✅	cosh	Hcos		Unary	no range requirements.
✅	sin	Sin		Unary	no range requirements.
✅	sinh	Hsin		Unary	no range requirements.
✅	tan	Tan		Unary	no range requirements.
✅	tanh	Htan		Unary	no range requirements.
✅	atan2	Atan	FDiv, FAdd, FSub, FCmpOLT, FCmpOEQ, FCmpOGE, FCmpOLT, And, Select	Binary	Not required.

Math

Completed	Intrinsic	DXIL OpCode	LLVM Instruction	Basic Op Type	Notes
✅	abs	[IMax], [FAbs]		Unary	IMax for ints. FAbs for floats.
✅	ceil	Round_pi		Unary
✅	exp	Exp		Unary
✅	floor	Round_ni		Unary
	fma	Fma		Ternary	All three inputs are of the same type. Any inputs that are long vectors must have the same number of dimensions.
✅	frac	Frc		Unary
	frexp		FCmpUNE, SExt, BitCast, And, Add, AShr, SIToFP, Store, And, Or	Unary	Has a return value in addition to an output parameter.
☑️	ldexp	Exp	FMul	Binary	Not required. Covered by floating point multiplication and exp.
☑️	lerp		FSub, FMul, FAdd	Ternary	Not required. FSub, FMul, and FAdd are all well covered.
✅	log	Log	FMul	Unary
	mad	IMad		Ternary
✅	max	IMax		Binary
✅	min	IMin		Binary
☑️	pow	[Log, Exp]	[FMul] , [FDiv]	Binary	Not required. Ops well covered by other tests.
☑️	rcp		FDiv	Unary	Not required. Covered by floating point division.
✅	round	Round_ne		Unary
✅	rsqrt	Rsqrt		Unary
✅	sign		ZExt, Sub, [ICmpSLT], [FCmpOLT]	Unary
	smoothstep	Saturate	FMul, FSub, FDiv	Ternary
✅	sqrt	Sqrt		Unary
☑️	step		FCmpOLT, Select	Binary	Not required. FCmpOLT covered by atan2 and sign. Select covered by explicit select test.
✅	trunc	Round_z		Unary
☑️	clamp	FMax, FMin, [UMax, UMin], [IMax, IMin]		Ternary	Not required. Covered by min and max.
☑️	exp2	Exp		Unary	Not required. Covered by exp.
☑️	log10	Log	FMul	Unary	Not required. Covered by log.
☑️	log2	Log		Unary	Not required. Covered by log.

Float Ops

Intrinsic	DXIL OpCode	LLVM Instruction	Basic Op Type	Notes
f16tof32	LegacyF16ToF32		Unary
f32tof16	LegacyF32ToF16		Unary
isfinite	IsFinite		Unary
isinf	IsInf		Unary
isnan	IsNan		Unary
modf	Round_z	FSub, Store	Unary	Has a return value and an output value.
fmod	FAbs, Frc	FDiv, FNeg, FCmpOGE, Select, FMul	Binary

Bitwise Ops

Intrinsic	DXIL OpCode	Basic Op Type
saturate	Saturate	Unary
reversebits	Bfrev	Unary
countbits	Countbits	Unary
firstbithigh	FirstbitSHi	Unary
firstbitlow	FirstbitLo	Unary

Logic Ops

Intrinsic	LLVM Instruction	Basic Op Type	Notes
select	Select, [ExtractElement, InsertElement]	Ternary
and	And, [ExtractElement, InsertElement]	Binary	Not required. Covered by select.
or	Or, [ExtractElement, InsertElement]	Binary	Not required. Covered by select.

Reductions

Intrinsic	LLVM Instruction	Basic Op Type
all	[FCmpUNE], [ICmpNE] , [ExtractElement, And]	Unary
any	[FCmpUNE], [ICmpNE] , [ExtractElement, Or]	Unary
dot	ExtractElement, Mul	Binary

Derivative and Quad Operations

Intrinsic	DXIL OpCode	Basic Op Type	Notes
ddx	DerivCoarseX	Unary
ddx_fine	DerivFineX	Unary
ddy	DerivCoarseY	Unary
ddy_fine	DerivFineY	Unary
fwidth	QuadReadLaneAt	Unary
QuadReadLaneAcrossX	QuadOp	Unary
QuadReadLaneAcrossY	QuadOp	Unary	Uses different QuadOp parameters leading to different behavior.
QuadReadLaneAcrossDiagonal	QuadOp	Unary	Uses different QuadOp parameters leading to different behavior.
ddx_coarse	DerivCoarseX	Unary	Not required. Covered by ddx.
ddy_coarse	DerivCoarseY	Unary	Not required. Covered by ddy.

WaveOps

Intrinsic	DXIL OpCode	Basic Op Type
WaveActiveBitAnd	WaveActiveBit	Binary
WaveActiveBitOr	WaveActiveBit	Binary
WaveActiveBitXor	WaveActiveBit	Binary
WaveActiveProduct	WaveActiveOp	Binary
WaveActiveSum	WaveActiveOp	Binary
WaveActiveMin	WaveActiveOp	Binary
WaveActiveMax	WaveActiveOp	Binary
WaveMultiPrefixBitAnd	WaveMultiPrefixOp	Binary
WaveMultiPrefixBitOr	WaveMultiPrefixOp	Binary
WaveMultiPrefixBitXor	WaveMultiPrefixOp	Binary
WaveMultiPrefixProduct	WaveMultiPrefixOp	Binary
WaveMultiPrefixSum	WaveMultiPrefixOp	Binary
WavePrefixSum	WavePrefixOp	Binary
WavePrefixProduct	WavePrefixOp	Binary
WaveReadLaneAt	WaveReadLaneAt	Binary
WaveReadLaneFirst	WaveReadLaneFirst	Unary
WaveActiveAllEqual	WaveActiveAllEqual	Unary
WaveMatch	WaveMatch	Unary

Type Casting Operations

Completed	Intrinsic	DXIL OpCode	LLVM Instruction	Basic Op Type	Notes
✅	asdouble	MakeDouble		Binary
✅	asfloat		BitCast	Unary
✅	asfloat16		BitCast	Unary
✅	asint		BitCast	Unary
✅	asint16		BitCast	Unary
✅	asuint (from double)	SplitDouble		Binary	Converts double to two uints. Returns void. Has two output arguments.
✅	asuint (bitcast)		BitCast	Unary	Bitcast from float/int to uint
✅	asuint16		BitCast	Unary
