HLSL has supported vectors in a limited capacity (int3, float4, etc.), and these are scalarized in DXIL. Small vectors, while useful in a traditional graphics context, do not scale well as HLSL evolves into a more general-purpose language targeting graphics and compute. Notably, with the ubiquitous adoption of machine learning techniques, which are often expressed as vector-matrix operations, there is a need to support larger vector sizes in HLSL and to preserve these vector objects at the DXIL level so that implementations can take advantage of specialized hardware that accelerates vector operations.
Enable vectors of longer length in HLSL and preserve the vector type in DXIL.
vector<T, N>
Currently HLSL allows vector<T, N> name; where T is any scalar type and N, the number of components, is a positive integer less than or equal to 4. See the current definition here.
This proposal extends this support to longer vectors, that is, vectors with N greater than 4.
The default behavior of HLSL vectors is preserved for backward compatibility: omitting the last parameter N defaults to a 4-component vector, and the declaration vector name; declares a 4-component float vector, etc. More examples here.
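To make the defaulting rule concrete, here is a CPU-side C++ analogy rather than HLSL: a hypothetical `hlsl_vector` template whose default arguments (float, 4) mirror the HLSL behavior described above, with the upper bound of 4 on N removed. The type and alias names are illustrative only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical CPU stand-in for HLSL's vector<T, N> (not an HLSL API).
// The default template arguments mirror HLSL's defaults: omitting N gives
// 4 components, and omitting both gives a 4-component float vector.
template <typename T = float, std::size_t N = 4>
struct hlsl_vector {
    static_assert(N >= 1, "a vector needs at least one component");
    std::array<T, N> c{};  // components, value-initialized to zero
    static constexpr std::size_t size() { return N; }
};

// Existing HLSL limits N to 1..4; the proposal lifts the upper bound.
using float4_like = hlsl_vector<>;                // like `vector`
using int4_like = hlsl_vector<int>;               // like `vector<int>`
using uint8_like = hlsl_vector<unsigned int, 8>;  // like `vector<uint, 8>`
```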
The new vectors will be supported in all shader stages, including Node shaders. There are no control flow or wave uniformity requirements, but implementations may specify best practices for certain uses to achieve optimal performance.
Restrictions on the uses of vectors with N > 4
struct.
Constructing vectors
HLSL vectors can be constructed through initializer lists, constructor syntax, or assignment.
Examples:
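vector<uint, 5> vecA = {1, 2, 3, 4, 5};                    // initializer list
vector<uint, 6> vecB = vector<uint, 6>(6, 7, 8, 9, 0, 0);  // constructor syntax
uint4 initval = {0, 0, 0, 0};                               // existing 4-component vector
vector<uint, 8> vecC = {uint2(coord.xy), vecB};             // mixed list: 2 + 6 = 8 components (coord assumed in scope)
vector<uint, 6> vecD = vecB;                                // assignment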
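The initializer-list rule used for vecC can be modeled as concatenation: each element contributes its components in order, and the component counts must add up to the destination size (2 + 6 = 8). The C++ sketch below is a hypothetical CPU analogy; `concat` is an illustrative helper, not part of HLSL.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper modeling HLSL initializer-list flattening: the
// components of `a` come first, then those of `b`, and the result has
// exactly A + B components (for vecC: 2 + 6 = 8).
template <typename T, std::size_t A, std::size_t B>
std::array<T, A + B> concat(const std::array<T, A>& a, const std::array<T, B>& b) {
    std::array<T, A + B> out{};
    for (std::size_t i = 0; i < A; ++i) out[i] = a[i];
    for (std::size_t i = 0; i < B; ++i) out[A + i] = b[i];
    return out;
}
```

With `a` standing in for coord.xy and `b` for vecB, the result holds the two coordinate components followed by 6, 7, 8, 9, 0, 0. Because the return type fixes the size, assigning the result to an array of the wrong length fails at compile time in this model.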
Load and Store vectors from/to Buffers/Arrays
To load and store N-component vectors from ByteAddressBuffers, we use the LoadN and StoreN methods, which extend the existing Load/Store, Load2/Store2, Load3/Store3, and Load4/Store4 methods.
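// Load/Store from [RW]ByteAddressBuffers (signatures; N is the component count)
RWByteAddressBuffer myBuffer;
vector<uint, N> val = myBuffer.LoadN(uint StartOffsetInBytes);    // reads N consecutive uints
myBuffer.StoreN<T>(uint StartOffsetInBytes, vector<T, N> stVec);  // writes the N components of stVec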
// Load/Store from groupshared arrays
groupshared T inputArray[512];
groupshared T outputArray[512];
Load(vector<T,N> ldVec, groupshared inputArray, uint offsetInBytes);    // fills ldVec from inputArray
Store(vector<T,N> stVec, groupshared outputArray, uint offsetInBytes);  // writes stVec into outputArray
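A minimal CPU-side sketch of the addressing these signatures imply, written in C++ so the arithmetic can be checked outside a GPU toolchain. `ByteBufferModel`, the free functions `Load`/`Store`, and the alignment checks are assumptions for illustration: the proposal does not yet state alignment rules for N > 4, so the sketch assumes 4-byte-aligned buffer offsets (as the existing Load/Store methods require) and element-aligned groupshared offsets.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical CPU model of a [RW]ByteAddressBuffer: raw bytes addressed by
// byte offset. Only the addressing arithmetic is modeled.
struct ByteBufferModel {
    std::vector<unsigned char> bytes;
    explicit ByteBufferModel(std::size_t size) : bytes(size, 0) {}

    // Analogue of LoadN: reads N consecutive 32-bit uints starting at
    // startOffsetInBytes (assumed to be a multiple of 4).
    template <std::size_t N>
    std::array<std::uint32_t, N> LoadN(std::size_t startOffsetInBytes) const {
        assert(startOffsetInBytes % 4 == 0);
        assert(startOffsetInBytes + N * sizeof(std::uint32_t) <= bytes.size());
        std::array<std::uint32_t, N> out{};
        std::memcpy(out.data(), bytes.data() + startOffsetInBytes, N * sizeof(std::uint32_t));
        return out;
    }

    // Analogue of StoreN<T>: writes the N components of stVec contiguously.
    template <typename T, std::size_t N>
    void StoreN(std::size_t startOffsetInBytes, const std::array<T, N>& stVec) {
        assert(startOffsetInBytes % 4 == 0);
        assert(startOffsetInBytes + N * sizeof(T) <= bytes.size());
        std::memcpy(bytes.data() + startOffsetInBytes, stVec.data(), N * sizeof(T));
    }
};

// Hypothetical model of the groupshared Load: offsetInBytes is turned into
// an element index, assuming it is a multiple of sizeof(T).
template <typename T, std::size_t N, std::size_t Len>
void Load(std::array<T, N>& ldVec, const std::array<T, Len>& arr, std::size_t offsetInBytes) {
    assert(offsetInBytes % sizeof(T) == 0);
    const std::size_t first = offsetInBytes / sizeof(T);
    assert(first + N <= Len);  // stay inside the array
    for (std::size_t i = 0; i < N; ++i) ldVec[i] = arr[first + i];
}

// Hypothetical model of the groupshared Store, mirroring Load.
template <typename T, std::size_t N, std::size_t Len>
void Store(const std::array<T, N>& stVec, std::array<T, Len>& arr, std::size_t offsetInBytes) {
    assert(offsetInBytes % sizeof(T) == 0);
    const std::size_t first = offsetInBytes / sizeof(T);
    assert(first + N <= Len);  // stay inside the array
    for (std::size_t i = 0; i < N; ++i) arr[first + i] = stVec[i];
}
```

In this model a byte offset of 16 into a float array starts at element 4, which is why the groupshared variants take a byte offset yet still index a typed array.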
Operations on vectors
Support all HLSL intrinsics that are important as activation functions: fma, exp, log, tanh, atan, min, max, clamp, and step. Eventually, support all HLSL operators and math intrinsics that are currently enabled for vectors.
Refer to the HLSL specification for an exhaustive list of operators and intrinsics.
Note: Any mathematical operations missing from the above list but needed as activation functions for neural network computations will also be added.
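These intrinsics act per component, so extending them to N > 4 is conceptually a loop over components. The hypothetical C++ helpers below (`apply_each`, `vtanh`, `vclamp`, `vstep`, `vfma`) model that behavior on the CPU; `vstep` follows HLSL's `step(y, x)` convention of returning 1 where x >= y. They are illustrations of the semantics, not the DXIL lowering.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

template <std::size_t N>
using fvec = std::array<float, N>;  // stand-in for vector<float, N>

// Applies a scalar function independently to each component.
template <std::size_t N, typename F>
fvec<N> apply_each(const fvec<N>& v, F f) {
    fvec<N> out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = f(v[i]);
    return out;
}

template <std::size_t N>
fvec<N> vtanh(const fvec<N>& v) {
    return apply_each(v, [](float x) { return std::tanh(x); });
}

template <std::size_t N>
fvec<N> vclamp(const fvec<N>& v, float lo, float hi) {
    return apply_each(v, [=](float x) { return std::clamp(x, lo, hi); });
}

// Like HLSL step(y, x) with a scalar edge y: 1 where x >= y, else 0.
template <std::size_t N>
fvec<N> vstep(float y, const fvec<N>& x) {
    return apply_each(x, [=](float c) { return c >= y ? 1.0f : 0.0f; });
}

// Component-wise fused multiply-add: a * b + c.
template <std::size_t N>
fvec<N> vfma(const fvec<N>& a, const fvec<N>& b, const fvec<N>& c) {
    fvec<N> out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = std::fma(a[i], b[i], c[i]);
    return out;
}
```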
First-class debug support for HLSL vectors: emit llvm.dbg.declare and llvm.dbg.value intrinsics that tools can use to provide a better debugging experience. Open Issue: Handle both the scalarized and the vector DXIL paths.
TODO: Possible checks for DXIL vector support and tiered support.
Open Issue: Can implementations support vector DXIL?
Our original proposal introduced an opaque Cooperative Vector type to HLSL to limit the scope of the feature to small neural network evaluation and to contain the scope of testing. However, to align with HLSL's long-term roadmap of enabling generic vectors, it makes sense to use HLSL vectors rather than introduce a new data type, even if the initial implementation exposes only partial functionality.