High-Level Shader Language Specification
Working Draft

Introduction[Intro]

The High-Level Shader Language (HLSL) is the GPU programming language provided in conjunction with the DirectX runtime. Over many years its use has expanded to cover every major rendering API across all major development platforms. Despite its popularity and long history, HLSL has never had a formal language specification. This document seeks to change that.

HLSL draws heavy inspiration originally from ISO C and later from ISO C++, with additions specific to graphics and parallel computation programming. The language is also influenced to a lesser degree by other popular graphics and parallel programming languages.

HLSL has two reference implementations which this specification draws heavily from. The original reference implementation, FXC, has been in use since DirectX 9. The more recent reference implementation, DXC, has been the primary shader compiler since DirectX 12.

In writing this specification, preference is given to the language behavior of DXC over the behavior of FXC, although that can vary by context.

In very rare instances this spec will be aspirational, and may diverge from both reference implementation behaviors. This will only be done in instances where there is an intent to alter implementation behavior in the future. Since this document and the implementations are living sources, one or the other may be ahead in different regards at any point in time.

Scope[Intro.Scope]

This document specifies the requirements for implementations of HLSL. The HLSL specification is based on and highly influenced by the specifications for the C and C++ programming languages.

This document describes both the language grammar and semantics of HLSL and (in later sections) the standard library of data types used in shader programming.

Normative References[Intro.Refs]

The following referenced documents provide significant influence on this document and should be consulted when interpreting this standard.

ISO C, Programming languages - C
ISO C++, Programming languages - C++
DirectX Specifications, https://microsoft.github.io/DirectX-Specs/

Terms and definitions[Intro.Terms]

This document aims to use terms consistent with their definitions in ISO C and ISO C++. In cases where the definitions are unclear, or where this document diverges from ISO C and ISO C++, the definitions in this section, the remaining sections in this chapter, and the attached glossary ([main]) supersede other sources.

Common Definitions[Intro.Defs]

The following definitions are consistent between HLSL and the ISO C and ISO C++ specifications; they are included here for reader convenience.

Correct Data[Intro.Defs.CorrectData]

Data is correct if it represents values that have specified or unspecified but not undefined behavior for all the operations in which it is used. Data that is the result of undefined behavior is not correct, and may be treated as undefined.

Diagnostic Message[Intro.Defs.Diags]

An implementation-defined message belonging to a subset of the implementation’s output messages which communicates diagnostic information to the user.

Ill-formed Program[Intro.Defs.IllFormed]

A program that is not well-formed, for which the implementation is expected to return unsuccessfully and produce one or more diagnostic messages.

Implementation-defined Behavior[Intro.Defs.ImpDef]

Behavior of a well-formed program and correct data which may vary by the implementation, and the implementation is expected to document the behavior.

Implementation Limits[Intro.Defs.ImpLimits]

Restrictions imposed upon programs by the implementation of either the compiler or runtime environment. The compiler may seek to surface runtime-imposed limits to the user for improved user experience.

Undefined Behavior[Intro.Defs.Undefined]

Behavior of invalid program constructs or incorrect data for which this standard imposes no requirements, or does not sufficiently detail.

Unspecified Behavior[Intro.Defs.Unspecified]

Behavior of a well-formed program and correct data which may vary by the implementation, and the implementation is not expected to document the behavior.

Well-formed Program[Intro.Defs.WellFormed]

An HLSL program constructed according to the syntax rules, diagnosable semantic rules, and the One Definition Rule.

Runtime Implementation[Intro.Defs.Runtime]

A runtime implementation refers to a full-stack implementation of a software runtime that can facilitate the execution of HLSL programs. This broad definition includes libraries and device driver implementations. The HLSL specification does not distinguish between the user-facing programming interfaces and the vendor-specific backing implementation.

Runtime Targeting[Intro.Runtime]

HLSL emerged from the evolution of DirectX to grant greater control over GPU geometry and color processing. It gained popularity because it targeted a common hardware description which all conforming drivers were required to support. This common hardware description, called a Shader Model, is an integral part of the description for HLSL. Some HLSL features require specific Shader Model features, and are only supported by compilers when targeting those Shader Model versions or later.

SPMD Programming Model[Intro.Model]

HLSL uses a Single Program Multiple Data (SPMD) programming model where a program describes operations on a single element of data, but when the program executes it executes across more than one element at a time. This programming model is useful because GPUs are largely Single Instruction Multiple Data (SIMD) hardware architectures where each instruction natively executes across multiple data elements at the same time.
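As a rough illustration of the model described above, the following C++ sketch writes a program for a single element and lets a stand-in runtime apply it to every element. The names `scale_element` and `dispatch_all` are hypothetical and exist only for this example; real hardware would process several elements per instruction rather than looping over them one at a time.

```cpp
#include <cstddef>
#include <vector>

// The "program": describes the operation on one element only.
// (Hypothetical example, not an HLSL construct.)
inline float scale_element(float input) { return input * 2.0f; }

// Stand-in for a runtime: applies the single-element program to every
// element. On SIMD hardware several of these iterations would be carried
// out by one instruction.
inline std::vector<float> dispatch_all(const std::vector<float> &data) {
  std::vector<float> out(data.size());
  for (std::size_t lane = 0; lane < data.size(); ++lane)
    out[lane] = scale_element(data[lane]);
  return out;
}
```

The author of `scale_element` never mentions how many elements exist or how they are grouped; that decision belongs entirely to the caller, which is the essential property of the model.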

There are many different terms of art for describing the elements of a GPU architecture and the way they relate to the SPMD program model. In this document we will use the terms as defined in the following subsections.

SPMD Terminology[Intro.Model.Terms]

Host and Device[Intro.Model.Terms.HostDevice]

HLSL is a data-parallel programming language designed for programming auxiliary processors in a larger system. In this context the host refers to the primary processing unit that runs the application which in turn uses a runtime to execute HLSL programs on a supported device. There is no strict requirement that the host and device be different physical hardware, although they commonly are. The separation of host and device in this specification is useful for defining the execution and memory model as well as specific semantics of language constructs.

Lane[Intro.Model.Terms.Lane]

A lane represents a single computed element in an SPMD program. In a traditional programming model it would be analogous to a thread of execution, however it differs in one key way. In multi-threaded programming threads advance independently of each other. In SPMD programs, a group of lanes may execute instructions in lockstep because each instruction may be a SIMD instruction computing the results for multiple lanes simultaneously, or synchronizing execution across multiple lanes or waves. A lane has an associated lane state which denotes the execution status of the lane (1.6.1.7).

Wave[Intro.Model.Terms.Wave]

A grouping of lanes for execution is called a wave. The size of a wave is defined as the maximum number of active lanes the wave supports. Wave sizes vary by hardware architecture, and are required to be powers of two. The number of active lanes in a wave can be any value between one and the wave size.

Some hardware implementations support multiple wave sizes. There is no overall minimum wave size requirement, although some language features do have minimum lane count requirements.

HLSL is explicitly designed to run on hardware with arbitrary wave sizes. Hardware architectures may implement waves as Single Instruction Multiple Thread (SIMT) where each thread executes instructions in lockstep. This is not a requirement of the model. Some constructs in HLSL require synchronized execution. Such constructs will explicitly specify that requirement.

Quad[Intro.Model.Terms.Quad]

A quad is a subdivision of four lanes in a wave which are computing adjacent values. In pixel shaders a quad may represent four adjacent pixels and quad operations allow passing data between adjacent lanes. In compute shaders quads may be one or two dimensional depending on the workload dimensionality. Quad operations require four active lanes.

Threadgroup[Intro.Model.Terms.Group]

A grouping of lanes executing the same shader to produce a combined result is called a threadgroup. Threadgroups are independent of SIMD hardware specifications. The dimensions of a threadgroup are defined in three dimensions. The maximum extent along each dimension of a threadgroup, and the total size of a threadgroup are implementation limits defined by the runtime and enforced by the compiler. If a threadgroup’s size is not a whole multiple of the hardware wave size, the unused hardware lanes are implicitly inactive.

If a threadgroup size is smaller than the wave size, or if the threadgroup size is not an even multiple of the wave size, the remaining lanes are inactive lanes.
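The relationship between threadgroup size, wave size, and inactive lanes can be sketched with a little arithmetic. The C++ helpers below are hypothetical and assume that lanes are packed densely into waves; the actual assignment of lanes to waves is left to the implementation.

```cpp
#include <cstdint>

// True if the wave size is a power of two, as required of wave sizes.
constexpr bool is_valid_wave_size(std::uint32_t wave_size) {
  return wave_size != 0 && (wave_size & (wave_size - 1)) == 0;
}

// Total number of lanes in a threadgroup with the given three dimensions.
constexpr std::uint32_t threadgroup_size(std::uint32_t x, std::uint32_t y,
                                         std::uint32_t z) {
  return x * y * z;
}

// Waves needed when lanes are packed densely (an assumption of this sketch).
constexpr std::uint32_t waves_needed(std::uint32_t group_size,
                                     std::uint32_t wave_size) {
  return (group_size + wave_size - 1) / wave_size;
}

// Hardware lanes left inactive under the same dense-packing assumption.
constexpr std::uint32_t inactive_lanes(std::uint32_t group_size,
                                       std::uint32_t wave_size) {
  return waves_needed(group_size, wave_size) * wave_size - group_size;
}
```

For instance, a 10 x 5 x 1 threadgroup on hardware with a wave size of 32 occupies two waves under this assumption and leaves 14 hardware lanes inactive.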

Dispatch[Intro.Model.Terms.Dispatch]

A grouping of threadgroups which represents the full execution of an HLSL program and results in a completed result for all input data elements.

Lane States[Intro.Model.Terms.LaneState]

Lanes may be in four primary states: active, helper, inactive, and predicated off.

An active lane is enabled to perform computations and produce output results based on the initial launch conditions and program control flow.

A helper lane is a lane which would not be executed by the initial launch conditions except that its computations are required for adjacent pixel operations in pixel fragment shaders. A helper lane will execute all computations but will not perform writes to buffers, and any outputs it produces are discarded. Helper lanes may be required for lane-cooperative operations to execute correctly.

An inactive lane is a lane that is not executed by the initial launch conditions. This can occur if there are insufficient inputs to fill all lanes in the wave, or to reduce per-thread memory requirements or register pressure.

A predicated off lane is a lane that is not being executed due to program control flow. A lane may be predicated off when control flow for the lanes in a wave diverge and one or more lanes are temporarily not executing.
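The effect of divergent control flow on lane states can be modeled with a small C++ sketch. The `LaneState` enumeration and `enter_branch` function are hypothetical names, and the sketch assumes that both active and helper lanes follow control flow and may therefore be predicated off; inactive lanes are left untouched.

```cpp
#include <cstddef>
#include <vector>

enum class LaneState { Active, Helper, Inactive, PredicatedOff };

// Given the lane states on entry to a branch and each lane's branch
// condition, return the lane states while the taken side of the branch
// executes. Assumes `before` and `condition` have the same length.
inline std::vector<LaneState> enter_branch(const std::vector<LaneState> &before,
                                           const std::vector<bool> &condition) {
  std::vector<LaneState> during = before;
  for (std::size_t i = 0; i < before.size(); ++i) {
    const bool executing =
        before[i] == LaneState::Active || before[i] == LaneState::Helper;
    // A lane that was executing but fails the condition is temporarily
    // not executing: it is predicated off.
    if (executing && !condition[i])
      during[i] = LaneState::PredicatedOff;
  }
  return during;
}
```

When the branch ends and the lanes reconverge, the predicated off lanes would return to their previous state; that step is omitted here for brevity.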

The diagram below illustrates the state transitions between lane states:

spmd Execution Model[Intro.Model.Exec]

A runtime implementation shall provide an implementation-defined mechanism for defining a dispatch. A runtime shall manage hardware resources and schedule execution to conform to the behaviors defined in this specification in an implementation-defined way. A runtime implementation may sort the threadgroups of a dispatch into waves in an implementation-defined way. During execution no guarantees are made that all lanes in a wave are actively executing.

Wave, quad, and threadgroup operations require execution synchronization of applicable active and helper lanes as defined by the individual operation.

Optimization Restrictions[Intro.Model.Restrictions]

An optimizing compiler may not optimize code generation such that it changes the behavior of a well-formed program except in the presence of implementation-defined or unspecified behavior.

The presence of wave, quad, or threadgroup operations may further limit the valid transformations of a program. Specifically, control flow operations which result in changing which lanes, quads, or waves are actively executing are illegal in the presence of cooperative operations if the optimization alters the behavior of the program.

HLSL Memory Models[Intro.Memory]

Memory accesses for Shader Model 5.0 and earlier operate on 128-bit slots aligned on 128-bit boundaries. This was optimized for the common case in early shaders where data being processed on the GPU was usually 4-element vectors of 32-bit data types.

On modern hardware memory access restrictions are loosened: reads of 32-bit multiples are supported starting with Shader Model 5.1, and reads of 16-bit multiples are supported with Shader Model 6.0. Shader Model features are fully documented in the DirectX Specifications, and this document will not attempt to elaborate further.

Memory Spaces[Intro.Memory.Spaces]

HLSL programs manipulate data stored in four distinct memory spaces: thread, threadgroup, device and constant.

Thread Memory[Intro.Memory.Spaces.Thread]

Thread memory is local to the lane. It is the default memory space used to store local variables. Thread memory cannot be directly read from other threads without the use of intrinsics to synchronize execution and memory.

Threadgroup Memory[Intro.Memory.Spaces.Group]

Threadgroup memory is denoted in HLSL with the groupshared keyword. The underlying memory for any declaration annotated with groupshared is shared across an entire threadgroup. Reads and writes to threadgroup memory may occur in any order except as restricted by synchronization intrinsics or other memory annotations.

Device Memory[Intro.Memory.Spaces.Device]

Device memory is memory available to all lanes executing on the device. This memory may be read or written to by multiple threadgroups that are executing concurrently. Reads and writes to device memory may occur in any order except as restricted by synchronization intrinsics or other memory annotations. Some device memory may be visible to the host. Device memory that is visible to the host may have additional synchronization concerns for host visibility.

Constant Memory[Intro.Memory.Spaces.Constant]

Constant memory is similar to device memory in that it is available to all lanes executing on the device. Constant memory is read-only, and an implementation can assume that constant memory is immutable and cannot change during execution.

Lexical Conventions[Lex]

Unit of Translation[Lex.Translation]

The text of HLSL programs is collected in source and header files. The distinction between source and header files is social and not technical. An implementation will construct a translation unit from a single source file and any included source or header files referenced via the #include preprocessing directive conforming to the ISO C preprocessor specification.

An implementation may implicitly include additional sources as required to expose the HLSL library functionality as defined in (12).

Phases of Translation[Lex.Phases]

HLSL inherits the phases of translation from ISO C++, with minor alterations, specifically the removal of support for trigraph and digraph sequences. Below is a description of the phases.

1. Source files are characters that are mapped to the basic source character set in an implementation-defined manner.
2. Any sequence of backslash (\) immediately followed by a new line is deleted, resulting in splicing lines together.
3. Tokenization occurs and comments are isolated. If a source file ends in a partial comment or preprocessor token the program is ill-formed and a diagnostic shall be issued. Each comment block shall be treated as a single white-space character.
4. Preprocessing directives are executed, macros are expanded, pragma and other unary operator expressions are executed. Processing of #include directives results in all preceding steps being executed on the resolved file, and can continue recursively. Finally all preprocessing directives are removed from the source.
5. Character and string literal specifiers are converted into the appropriate character set for the execution environment.
6. Adjacent string literal tokens are concatenated.
7. White-space is no longer significant. Syntactic and semantic analysis occurs translating the whole translation unit into an implementation-defined representation.
8. The translation unit is processed to determine required instantiations, the definitions of the required instantiations are located, and the translation and instantiation units are merged. The program is ill-formed if any required instantiation cannot be located or fails during instantiation.
9. External references are resolved, library references linked, and all translation output is collected into a single output.
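The line-splicing step, the second phase listed above, is simple enough to sketch directly. The C++ function below is a hypothetical illustration, not taken from either reference implementation, and it assumes a new line is the single character `\n`.

```cpp
#include <cstddef>
#include <string>

// Deletes every backslash that is immediately followed by a new line,
// together with that new line, so the two physical lines are joined.
inline std::string splice_lines(const std::string &source) {
  std::string out;
  out.reserve(source.size());
  for (std::size_t i = 0; i < source.size(); ++i) {
    if (source[i] == '\\' && i + 1 < source.size() && source[i + 1] == '\n') {
      ++i; // also skip the new line that follows the backslash
      continue;
    }
    out.push_back(source[i]);
  }
  return out;
}
```

A backslash that is not directly followed by a new line is left in place, so escape sequences inside literals are unaffected by this phase.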

Character Sets[Lex.CharSet]

The basic source character set is a subset of the ASCII character set. The table below lists the valid characters and their ASCII values:

Hex ASCII Value	Character Name	Glyph or C Escape Sequence
0x09	Horizontal Tab	`\t`
0x0A	Line Feed	`\n`
0x0D	Carriage Return	`\r`
0x20	Space
0x21	Exclamation Mark	`!`
0x22	Quotation Mark	`"`
0x23	Number Sign	`#`
0x25	Percent Sign	`%`
0x26	Ampersand	`&`
0x27	Apostrophe	`'`
0x28	Left Parenthesis	`(`
0x29	Right Parenthesis	`)`
0x2A	Asterisk	`*`
0x2B	Plus Sign	`+`
0x2C	Comma	`,`
0x2D	Hyphen-Minus	`-`
0x2E	Full Stop	`.`
0x2F	Solidus	`/`
0x30 .. 0x39	Digit Zero .. Nine	`0 1 2 3 4 5 6 7 8 9`
0x3A	Colon	`:`
0x3B	Semicolon	`;`
0x3C	Less-than Sign	`<`
0x3D	Equals Sign	`=`
0x3E	Greater-than Sign	`>`
0x3F	Question Mark	`?`
0x41 .. 0x5A	Latin Capital Letter A .. Z	`A B C D E F G H I J K L M`
		`N O P Q R S T U V W X Y Z`
0x5B	Left Square Bracket	`[`
0x5C	Reverse Solidus	`\`
0x5D	Right Square Bracket	`]`
0x5E	Circumflex Accent	`^`
0x5F	Underscore	`_`
0x61 .. 0x7A	Latin Small Letter a .. z	`a b c d e f g h i j k l m`
		`n o p q r s t u v w x y z`
0x7B	Left Curly Bracket	`{`
0x7C	Vertical Line	`|`
0x7D	Right Curly Bracket	`}`

An implementation may allow source files to be written in alternate extended character sets as long as that set is a superset of the basic character set. The translation character set is an extended character set or the basic character set as chosen by the implementation.

Preprocessing Tokens[Lex.PPTokens]

preprocessing-token:
* header-name
* identifier
* pp-number
* character-literal
* string-literal
* preprocessing-op-or-punc
* each non-whitespace character from the translation character set that cannot be one of the above

Each preprocessing token that is converted to a token shall have the lexical form of a keyword, an identifier, a constant, a string literal or an operator or punctuator.

Preprocessing tokens are the minimal lexical elements of the language during translation phases 3 through 6 (2.2). Preprocessing tokens can be separated by whitespace in the form of comments, white space characters, or both. White space may appear within a preprocessing token only as part of a header name or between the quotation characters in a character constant or string literal.

Header name preprocessing tokens are only recognized within #include preprocessing directives, __has_include expressions, and implementation-defined locations within #pragma directives. In those contexts, a sequence of characters that could be either a header name or a string literal is recognized as a header name.

Tokens[Lex.Tokens]

token:
* identifier
* keyword
* literal
* operator-or-punctuator

There are four kinds of tokens: identifiers, keywords, literals, and operators or punctuators. All whitespace characters and comments are ignored except as they separate tokens.

Comments[Lex.Comments]

The characters /* start a comment which terminates with the characters */. The characters // start a comment which terminates at the next new line.

Header Names[Lex.Headers]

header-name:
* < h-char-sequence >
* " q-char-sequence "

h-char-sequence:
* h-char
* h-char-sequence h-char

h-char:
* any character in the translation character set except newline or >

q-char-sequence:
* q-char
* q-char-sequence q-char

q-char:
* any character in the translation character set except newline or "

Character sequences in header names are mapped to header files or external source file names in an implementation-defined way.

Preprocessing numbers[Lex.PPNumber]

pp-number:
* digit
* . digit
* pp-number ' digit
* pp-number ' non-digit
* pp-number e sign
* pp-number E sign
* pp-number p sign
* pp-number P sign
* pp-number .

Preprocessing numbers begin with a digit or period (.), and may be followed by valid identifier characters and floating point literal suffixes (e+, e-, E+, E-, p+, p-, P+, and P-). Preprocessing number tokens lexically include all integer-literal and floating-literal tokens.

Preprocessing numbers do not have types or values. Types and values are assigned to integer-literal, floating-literal, and vector-literal tokens on successful conversion from preprocessing numbers.

A preprocessing number cannot end in a period (.) if the immediate next token is a scalar-element-sequence (2.9.4). In this situation the pp-number token is truncated to end before the period².

Literals[Lex.Literals]

Literal Classifications[Lex.Literal.Kinds]

literal:
* integer-literal
* character-literal
* floating-literal
* string-literal
* boolean-literal
* vector-literal

Integer Literals[Lex.Literal.Int]

integer-literal:
* decimal-literal integer-suffix_opt
* octal-literal integer-suffix_opt
* hexadecimal-literal integer-suffix_opt

decimal-literal:
* nonzero-digit
* decimal-literal digit

octal-literal:
* 0
* octal-literal octal-digit

hexadecimal-literal:
* 0x hexadecimal-digit
* 0X hexadecimal-digit
* hexadecimal-literal hexadecimal-digit

nonzero-digit: one of
* 1 2 3 4 5 6 7 8 9

octal-digit: one of
* 0 1 2 3 4 5 6 7

hexadecimal-digit: one of
* 0 1 2 3 4 5 6 7 8 9
* a b c d e f
* A B C D E F

integer-suffix:
* unsigned-suffix long-suffix_opt
* long-suffix unsigned-suffix_opt

unsigned-suffix: one of
* u U

long-suffix: one of
* l L

An integer literal is an optional base prefix, a sequence of digits in the appropriate base, and an optional type suffix. An integer literal shall not contain a period or exponent specifier.

The type of an integer literal is the first of the corresponding list in the table below in which its value can be represented³.

Suffix	Decimal constant	Octal or hexadecimal constant
none	int32_t	int32_t
	int64_t	uint32_t
		int64_t
		uint64_t
u or U	uint32_t	uint32_t
	uint64_t	uint64_t
l or L	int64_t	int64_t
		uint64_t
Both u or U and l or L	uint64_t	uint64_t

If the specified value of an integer literal cannot be represented by any type in the corresponding list, the integer literal has no type and the program is ill-formed.
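The type-selection rule in the table can be expressed as a short decision procedure. The C++ sketch below is a hypothetical model (the names `Base`, `Suffix`, and `integer_literal_type` are invented here) and assumes the literal's digits have already been parsed into a 64-bit unsigned value.

```cpp
#include <cstdint>
#include <optional>
#include <string>

enum class Base { Decimal, OctalOrHex };
enum class Suffix { None, U, L, UL };

// Returns the first type in the table's list that can represent `value`,
// or std::nullopt when no listed type fits (the program is ill-formed).
inline std::optional<std::string>
integer_literal_type(std::uint64_t value, Base base, Suffix suffix) {
  const bool fits_i32 = value <= static_cast<std::uint64_t>(INT32_MAX);
  const bool fits_u32 = value <= static_cast<std::uint64_t>(UINT32_MAX);
  const bool fits_i64 = value <= static_cast<std::uint64_t>(INT64_MAX);
  const bool non_decimal = base == Base::OctalOrHex;
  switch (suffix) {
  case Suffix::None:
    if (fits_i32) return std::string("int32_t");
    if (non_decimal && fits_u32) return std::string("uint32_t");
    if (fits_i64) return std::string("int64_t");
    if (non_decimal) return std::string("uint64_t");
    return std::nullopt; // decimal value too large for int64_t
  case Suffix::U:
    return std::string(fits_u32 ? "uint32_t" : "uint64_t");
  case Suffix::L:
    if (fits_i64) return std::string("int64_t");
    if (non_decimal) return std::string("uint64_t");
    return std::nullopt; // decimal value too large for int64_t
  case Suffix::UL:
    return std::string("uint64_t");
  }
  return std::nullopt;
}
```

Under this model the decimal literal 2147483648 has type int64_t, while the hexadecimal literal 0x80000000, which has the same value, has type uint32_t because unsigned types appear in the octal and hexadecimal list.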

An implementation may support the integer suffixes ll and ull as equivalent to l and ul respectively.

Floating-point Literals[Lex.Literal.Float]

floating-literal:
* fractional-constant exponent-part_opt floating-suffix_opt
* digit-sequence exponent-part floating-suffix_opt

fractional-constant:
* digit-sequence_opt . digit-sequence
* digit-sequence .

exponent-part:
* e sign_opt digit-sequence
* E sign_opt digit-sequence

sign: one of
* + -

digit-sequence:
* digit
* digit-sequence digit

floating-suffix: one of
* h f l H F L

A floating literal is written either as a fractional-constant with an optional exponent-part and optional floating-suffix, or as an integer digit-sequence with a required exponent-part and optional floating-suffix.

The type of a floating literal is float, unless explicitly specified by a suffix. The suffixes h and H specify half, the suffixes f and F specify float, and the suffixes l and L specify double.⁴ If a value specified in the source is not in the range of representable values for its type, the program is ill-formed.
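The suffix rule above amounts to inspecting the last character of the literal. The following C++ sketch uses a hypothetical function name and assumes its argument is already a well-formed floating-literal; it does not validate the spelling or check the representable range.

```cpp
#include <string>

// Maps a well-formed floating-literal spelling to the name of its type.
// With no suffix the last character is a digit or a period, so the
// default case yields float.
inline std::string floating_literal_type(const std::string &literal) {
  if (literal.empty())
    return "float";
  switch (literal.back()) {
  case 'h':
  case 'H':
    return "half";
  case 'l':
  case 'L':
    return "double";
  default: // 'f', 'F', or no suffix at all
    return "float";
  }
}
```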

Vector Literals[Lex.Literal.Vector]

vector-literal:
* integer-literal . scalar-element-sequence
* floating-literal . scalar-element-sequence

scalar-element-sequence:
* scalar-element-sequence-x
* scalar-element-sequence-r

scalar-element-sequence-x:
* x
* scalar-element-sequence-x x

scalar-element-sequence-r:
* r
* scalar-element-sequence-r r

A vector-literal is an integer-literal or floating-literal followed by a period (.) and a scalar-element-sequence.

A scalar-element-sequence is a vector-swizzle-sequence where only the first vector element accessor is valid (x or r). A scalar-element-sequence is equivalent to a vector splat conversion performed on the integer-literal or floating-literal value (4.9).
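The splat behavior described above can be modeled in a few lines of C++. The helper names are hypothetical, the scalar value is fixed to `float` for simplicity, and no upper bound is placed on the sequence length because the grammar above does not state one.

```cpp
#include <optional>
#include <string>
#include <vector>

// True if `seq` is non-empty and consists only of 'x' or only of 'r',
// matching the scalar-element-sequence grammar.
inline bool is_scalar_element_sequence(const std::string &seq) {
  if (seq.empty())
    return false;
  const char first = seq.front();
  if (first != 'x' && first != 'r')
    return false;
  for (char c : seq)
    if (c != first)
      return false;
  return true;
}

// Models a vector literal such as 1.0.xxx: the scalar value is replicated
// once per element accessor. Returns std::nullopt for an invalid sequence.
inline std::optional<std::vector<float>>
splat_literal(float value, const std::string &seq) {
  if (!is_scalar_element_sequence(seq))
    return std::nullopt;
  return std::vector<float>(seq.size(), value);
}
```

Mixed sequences such as `xr` are rejected because the grammar derives a sequence from either the `x` form or the `r` form, never both.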

Basic Concepts[Basic]

HLSL inherits a significant portion of its language semantics from C and C++. Some of this is a result of intentional adoption of syntax early in the development of the language and some a side-effect of the Clang-based implementation of DXC.

This chapter includes a lot of definitions that are inherited from C and C++. Some are identical to C or C++, others are slightly different. HLSL is neither a subset nor a superset of C or C++, and cannot be simply described in terms of C or C++. This specification includes all necessary definitions for clarity.

Preamble[Basic.preamble]

An entity is a value, object, function, enumerator, type, class member, bit-field, template, template specialization, namespace, or pack.

A name is a use of an identifier (5.2.4), operator-function-id ([Overload.operator]), conversion-function-id (9.2), or template-id (10) that denotes any entity or label (6.1).

Every name that denotes an entity is introduced by a declaration. Every name that denotes a label is introduced by a labeled statement (6.1)⁵.

A variable is introduced by the declaration of a reference other than a non-static data member, or of an object. The variable’s name denotes the reference or object.

Whenever a name is encountered it is necessary to determine if the name denotes an entity that is a type or template. The process for determining if a name refers to a type or template is called name lookup.

Two names are the same name if:

they are identifiers comprised of the same character sequence, or
they are operator-function-ids formed with the same operator, or
they are conversion-function-ids formed with the same type, or
they are template-ids that refer to the same class or function.

This section matches isoCPP section [basic] except for the exclusion of goto and literal operators.

Declarations and definitions[Basic.Decl]

A declaration (7) may introduce one or more names into a translation unit or redeclare names introduced by previous declarations. If a declaration introduces names, it specifies the interpretation and attributes of these names. A declaration may also have effects such as:

verifying a static assertion (7),
use of attributes (7), and
controlling template instantiation (10.1).

A declaration is a definition unless:

it declares a function without specifying the function’s body (7.5),
it is a parameter declaration in a function declaration that does not specify the function’s body (7.5),
it is a global or namespace member declaration without the static specifier⁶,
it declares a static data member in a class definition,
it is a class name declaration,
it is a template parameter,
it is a typedef declaration (7),
it is an alias-declaration (7),
it is a using-declaration (7),
it is a static_assert-declaration (7),
it is an empty-declaration (7),
or it is a using-directive (7).

The two examples below are adapted from ISO C++ [basic.def]. All but one of the following are definitions:

int f(int x) { return x+1; }   // defines f and x
struct S { int a; int b; };    // defines S, S::a, and S::b
struct X {                     // defines X
  int x;                       // defines non-static member x
  static int y;                // declares static data member y
};
int X::y = 1;                  // defines X::y
enum { up, down };             // defines up and down
namespace N {                  // defines N
  int d;                       // declares N::d
  static int i;                // defines N::i
}

All of the following are declarations:

int a;                 // declares a
const int c;           // declares c
X anX;                 // declares anX
int f(int);            // declares f
struct S;              // declares S
typedef int Int;       // declares Int
using N::d;            // declares d
using Float = float;   // declares Float
cbuffer CB {           // does not declare CB
  int z;               // declares z
}
tbuffer TB {           // does not declare TB
  int w;               // declares w
}

One-Definition Rule[Basic.ODR]

The ISO C++ One-definition rule is adopted as defined in ISO C++ [basic.def.odr].

Scope[Basic.Scope]

Name Lookup[Basic.Lookup]

Program and linkage[Basic.Linkage]

A translation unit (2.1) is comprised of a sequence of declarations:

translation-unit:
* declaration-sequence_opt

A program is one or more translation units linked together. A program built from a single translation unit, bypassing a linking step, is called freestanding.

A program is said to be fully linked when it contains no unresolved external declarations and all exported declarations are entry point declarations (3.7). A program is said to be partially linked when it contains at least one unresolved external declaration or at least one exported declaration that is not an entry point.
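The two definitions above are exact complements of each other, which a small C++ model makes explicit. The `Program` and `ExportedDecl` structures are hypothetical simplifications that record only the facts the definitions depend on.

```cpp
#include <vector>

// One exported declaration; only its entry-point status matters here.
struct ExportedDecl {
  bool is_entry_point;
};

// Minimal model of a linked program.
struct Program {
  unsigned unresolved_externals = 0;
  std::vector<ExportedDecl> exports;
};

// Fully linked: no unresolved external declarations, and every exported
// declaration is an entry point declaration.
inline bool is_fully_linked(const Program &p) {
  if (p.unresolved_externals != 0)
    return false;
  for (const ExportedDecl &d : p.exports)
    if (!d.is_entry_point)
      return false;
  return true;
}

// Partially linked: at least one unresolved external declaration or at
// least one exported declaration that is not an entry point.
inline bool is_partially_linked(const Program &p) {
  return !is_fully_linked(p);
}
```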

An implementation may generate programs as fully linked or partially linked as requested by the user, and a runtime may allow fully linked or partially linked programs as the implementation allows.

A name has linkage if it can refer to the same entity as a name introduced by a declaration in another scope. If a variable, function, or another entity with the same name is declared in several scopes, but does not have sufficient linkage, then several instances of the entity are generated.

A name with no linkage may not be referred to by names from any other scope.
A name with internal linkage may be referred to by names from other scopes within the same translation unit.
A name with external linkage may be referred to by names from other scopes within the same translation unit, and by names from scopes of other translation units.
A name with program linkage may be referred to by names from other scopes within the same translation unit, by names from scopes of other translation units, by names from scopes of other programs, and by a runtime implementation.

When merging translation units through linking or generating a freestanding program, only names with program linkage must be retained in the final program.
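The four linkage levels form a strict ordering of visibility, which the following C++ sketch models. The enumerations, the `Name` structure, and both functions are hypothetical names introduced for illustration; the example identifiers in the usage note are likewise invented.

```cpp
#include <string>
#include <vector>

// Linkage levels, ordered from least to most visible.
enum class Linkage { None = 0, Internal = 1, External = 2, Program = 3 };

// Where a reference to the name originates, ordered to match Linkage.
enum class Origin {
  SameScope = 0,
  OtherScopeSameUnit = 1,
  OtherTranslationUnit = 2,
  OtherProgramOrRuntime = 3
};

// A name can be referred to from an origin when its linkage level is at
// least as wide as that origin.
inline bool can_refer(Linkage linkage, Origin origin) {
  return static_cast<int>(linkage) >= static_cast<int>(origin);
}

struct Name {
  std::string id;
  Linkage linkage;
};

// Names that must survive linking: only those with program linkage.
inline std::vector<std::string>
must_retain(const std::vector<Name> &names) {
  std::vector<std::string> kept;
  for (const Name &n : names)
    if (n.linkage == Linkage::Program)
      kept.push_back(n.id);
  return kept;
}
```

Given an entry point `main` with program linkage and a helper function with external linkage, only `main` is required to remain in the final program; an implementation is free to keep or discard the helper.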

Program Linkage[Basic.Linkage.Program]

Entities with program linkage can be referred to from other partially linked programs or a runtime implementation.

The following entities have program linkage:

entry point functions (3.7)
functions marked with the export keyword (7.7)
declarations contained within an export-declaration-group (7.7)

External Linkage[Basic.Linkage.External]

Entities with external linkage can be referred to from the scopes in the other translation units and enable linking between them.

The following entities in HLSL have external linkage:

global variables that are not marked static or groupshared ⁷
static data members of classes or template classes

Linkage of functions (including template functions) that are not entry points or marked with the export keyword is implementation-dependent.⁸

Internal Linkage[Basic.Linkage.Internal]

Entities with internal linkage can be referred to from all scopes in the current translation unit.

The following entities in HLSL have internal linkage:

global variables marked as static or groupshared
all entities declared in an unnamed namespace or a namespace within an unnamed namespace
enumerations
classes or template classes, their member functions, and nested classes and enumerations

No Linkage[Basic.Linkage.NoLinkage]

An entity with no linkage can be referred to only from the scope it is in.

Any of the following entities declared at function scope or block scopes derived from function scope have no linkage:

local variables
local classes and their member functions
other entities declared at function scope or block scopes derived from function scope, such as typedefs, enumerations, and enumerators

Start[Basic.Start]

A fully linked program shall contain one or more global functions, which are the designated starting points for the program. These global functions are called entry points, because they denote the location where execution inside the program begins.

Entry point functions have different requirements based on the target runtime and execution mode (3.7.1).

Parameters to entry functions and entry function return types must be of scalar, vector, or non-intangible class type (3.8). Scalar and vector parameters and return types must be annotated with semantic annotations (7.6.1). Class type input and output parameters must have all fields annotated with semantic annotations.

Execution Mode[Basic.Start.Mode]

A runtime may define a set of execution modes in an implementation-defined way. Each execution mode will have a set of implementation-defined rules which restrict available language functionality as appropriate for the execution mode.

Types[Basic.types]

The object representation of an object of type T is the sequence of N bytes taken up by the object of type T, where N equals sizeof(T)⁹. The object representation of an object may be different based on the memory space it is stored in (1.7.1).

The value representation of an object is the set of bits that hold the value of type T. Bits in the object representation that are not part of the value representation are padding bits.

An object type is a type that is not a function type, not a reference type, and not a void type.

A class type is a data type declared with either the class or struct keywords (9). A class type T may be declared as incomplete at one point in a translation unit via a forward declaration, and complete later with a full definition. The type T is the same type throughout the translation unit.

There are special implementation-defined types such as handle types, which fall into a category of standard intangible types. Intangible types are types that have no defined object representation or value representation; as such, the size is unknown at compile time.

A class type T is an intangible class type if it contains a base class or members of intangible class type, standard intangible type, or arrays of such types. Standard intangible types and intangible class types are collectively called intangible types (11).

An object type is an incomplete type if the compiler lacks sufficient information to determine the size of an object of type T, and it is not an intangible type. It is a complete type if the compiler has sufficient information to determine the size of an object of type T, or if the type is known to be an intangible type. An object may not be defined to have an incomplete type.

Arithmetic types (3.8.1), enumeration types, and cv-qualified versions of these types are collectively called scalar types.

Vectors of scalar types declared with the built-in vector<T,N> template are vector types. Vector lengths must be between 1 and 4 (i.e. 1 ≤ N ≤ 4).

Matrices of scalar types declared with the built-in matrix<T,N,M> template are matrix types. Matrix dimensions, N and M, must each be between 1 and 4 (i.e. 1 ≤ N ≤ 4 and 1 ≤ M ≤ 4).
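
The declarations below sketch how the built-in templates might be used under the limits stated above. The function and variable names are hypothetical.

```hlsl
void VectorAndMatrixTypes() {
  // Vector type with N = 3.
  vector<float, 3> Direction = {0.0, 1.0, 0.0};

  // Vector type with N = 4; int4 is a type alias of vector<int, 4>.
  vector<int, 4> Indices = {0, 1, 2, 3};

  // Matrix type with N = 2 and M = 3.
  matrix<float, 2, 3> Transform = {1.0, 0.0, 0.0,
                                   0.0, 1.0, 0.0};

  // Under the limits above, a length outside 1..4 would be rejected:
  // vector<float, 5> TooLong;
}
```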

Arithmetic Types[Basic.types.arithmetic]

There are three standard signed integer types: int16_t, int32_t, and int64_t. Each of the signed integer types is explicitly named for the size in bits of the type’s object representation. There is also the type alias int which is an alias of int32_t. There is one minimum precision signed integer type: min16int. The minimum precision signed integer type is named for the required minimum value representation size in bits. The object representation of min16int is int. The standard signed integer types and minimum precision signed integer type are collectively called signed integer types.

There are three standard unsigned integer types: uint16_t, uint32_t, and uint64_t. Each of the unsigned integer types is explicitly named for the size in bits of the type’s object representation. There is also the type alias uint which is an alias of uint32_t. There is one minimum precision unsigned integer type: min16uint. The minimum precision unsigned integer type is named for the required minimum value representation size in bits. The object representation of min16uint is uint. The standard unsigned integer types and minimum precision unsigned integer type are collectively called unsigned integer types.

The minimum precision signed integer types and minimum precision unsigned integer types are collectively called minimum precision integer types. The standard signed integer types and standard unsigned integer types are collectively called standard integer types. The signed integer types and unsigned integer types are collectively called integer types. Integer types inherit the object representation of integers defined in isoC23¹⁰. Integer types shall satisfy the constraints defined in isoCPP, section basic.fundamental.

There are three standard floating point types: half, float, and double. The float type is a 32-bit floating point type. The double type is a 64-bit floating point type. Both the float and double types have object representations as defined in IEEE754. The half type may be either 16-bit or 32-bit as controlled by implementation defined compiler settings. If half is 32-bit it will have an object representation as defined in IEEE754, otherwise it will have an object representation matching the binary16 format defined in IEEE754¹¹. There is one minimum precision floating point type: min16float. The minimum precision floating point type is named for the required minimum value representation size in bits. The object representation of min16float is float¹². The standard floating point types and minimum precision floating point type are collectively called floating point types.

Integer and floating point types are collectively called arithmetic types.

The void type is inherited from isoCPP, which defines it as having an empty set of values and being an incomplete type that can never be completed. The void type is used to signify the return type of a function that returns no value. Any expression can be explicitly converted to void.

Scalarized Type Compatibility[Basic.types.scalarized]

All types T have a scalarized representation, SR(T), which is a list of one or more types representing each scalar element of T.

Scalarized representations are determined as follows:

The scalarized representation of an array T[n] is SR(T_0), .. SR(T_n).
The scalarized representation of a vector vector<T,n> is T_0, .. T_n.
The scalarized representation of a matrix matrix<T,n,m> is T_0, .. T_(n × m).
The scalarized representation of a class type T, SR(T), is computed recursively as SR(T::base), SR(T::_0), .. SR(T::_n) where T::base is T’s base class if it has one, and T::_n represents the n non-static members of T.
The scalarized representation for an enumeration type is the underlying arithmetic type.
The scalarized representation for arithmetic types, intangible types, and any other type T is T.

Two types cv1 T1 and cv2 T2 are scalar-layout-compatible types if T1 and T2 are the same type or if the sequences of types defined by the scalarized representations SR(T1) and SR(T2) are identical.
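
To make the rules above concrete, the sketch below annotates a few declarations with the scalarized representation they would have under this section. The types Base, Derived, and Mixed are hypothetical.

```hlsl
struct Base { float X; };
struct Derived : Base { float2 YZ; };
struct Mixed { int I; float F; };

void ScalarizedExamples() {
  // SR(float3) = float, float, float
  float3 V = float3(1.0, 2.0, 3.0);

  // SR(float[3]) = SR(float), SR(float), SR(float) = float, float, float
  float A[3] = {1.0, 2.0, 3.0};

  // SR(Derived) = SR(Base), SR(float2) = float, float, float
  Derived D = {1.0, 2.0, 3.0};

  // SR(Mixed) = int, float
  Mixed M = {1, 2.0};
}
```

Under these rules float3, float[3], and Derived are scalar-layout-compatible with each other, while Mixed is not scalar-layout-compatible with float2 because the first element types differ.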

Lvalues and rvalues[Basic.lval]

Expressions are classified by the type(s) of values they produce. The valid types of values produced by expressions are:

An lvalue represents a function or object.
An rvalue represents a temporary object.
An xvalue (expiring value) represents an object near the end of its lifetime.
A cxvalue (casted expiring value) is an xvalue which, on expiration, assigns its value to a bound lvalue.
A glvalue is an lvalue, xvalue, or cxvalue.
A prvalue is an rvalue that is not an xvalue.

Standard Conversions[Conv]

hlsl inherits standard conversions similar to isoCPP. This chapter enumerates the full set of conversions. A standard conversion sequence is a sequence of standard conversions in the following order:

Zero or one conversion of either lvalue-to-rvalue, or array-to-pointer.
Zero or one conversion of integral conversion, floating point conversion, floating point-integral conversion, boolean conversion, derived-to-base-lvalue conversion, or flat conversion¹³.
Zero or one conversion of scalar-vector splat, or vector/matrix truncation¹⁴.
Zero or one qualification conversion.

Standard conversion sequences are applied to expressions, if necessary, to convert them to a required destination type.

Lvalue-to-rvalue conversion[Conv.lval]

A glvalue of a non-function type T can be converted to a prvalue. The program is ill-formed if T is an incomplete type. If the glvalue refers to an object that is not of type T and is not an object of a type derived from T, the program is ill-formed. If the glvalue refers to an object that is uninitialized, the behavior is undefined. Otherwise the prvalue is of type T.

If the glvalue refers to an array of type T, the prvalue will refer to a copy of the array, not memory referred to by the glvalue.

Array-to-pointer conversion[Conv.array]

An lvalue or rvalue of type T[] (unsized array) can be converted to a prvalue of type pointer to T¹⁵ ¹⁶.

Integral promotion[Conv.ipromote]

An integral promotion is a conversion of:

a glvalue of integer type other than bool to a cxvalue of integer type of higher conversion rank, or
a prvalue of integer type other than bool to a prvalue of integer type of higher conversion rank, or
a glvalue of type bool to a cxvalue of integer type, or
a prvalue of type bool to a prvalue of integer type.

Integer conversion ranks are defined in section 4.13.1.

A conversion is only a promotion if the destination type can represent all of the values of the source type.

Floating point promotion[Conv.fppromote]

A glvalue of a floating point type can be converted to a cxvalue of a floating point type of higher conversion rank, or a prvalue of a floating point type can be converted to a prvalue of a floating point type of higher conversion rank.

Floating point conversion ranks are defined in section [Conv.rank.float].

Integral conversion[Conv.iconv]

A glvalue of an integer type can be converted to a cxvalue of any other non-enumeration integer type. A prvalue of an integer type can be converted to a prvalue of any other integer type.

If the destination type is unsigned, integer conversion maintains the bit pattern of the source value in the destination type, truncating or extending the value to the destination type.

If the destination type is signed, the value is unchanged if the destination type can represent the source value. If the destination type cannot represent the source value, the result is implementation-defined.

If the source type is bool, the values true and false are converted to one and zero respectively.
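
A short sketch of how these rules play out for a few values. The function and variable names are hypothetical, and the comments state the result expected from the rules above rather than the observed output of any particular implementation.

```hlsl
void IntegralConversions() {
  // Unsigned destination of the same size: the bit pattern is kept,
  // so Wrapped holds 0xFFFFFFFF.
  int Negative = -1;
  uint Wrapped = Negative;

  // Unsigned destination of smaller size: the value is truncated
  // to the low 32 bits, so Low holds 1.
  uint64_t Wide = ((uint64_t)1 << 32) | 1;
  uint Low = (uint)Wide;

  // Signed destination that cannot represent the source value:
  // the result is implementation-defined.
  uint Big = 0x80000000u;
  int Signed = Big;

  // bool source: true converts to one, false converts to zero.
  bool Flag = true;
  int One = Flag;
}
```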

Floating point conversion[Conv.fconv]

A glvalue of a floating point type can be converted to a cxvalue of any other floating point type. A prvalue of a floating point type can be converted to a prvalue of any other floating point type.

If the source value can be exactly represented in the destination type, the conversion produces the exact representation of the source value. If the source value cannot be exactly represented, the conversion to a best-approximation of the source value is implementation defined.

Floating point-integral conversion[Conv.fpint]

A glvalue of floating point type can be converted to a cxvalue of integer type. A prvalue of floating point type can be converted to a prvalue of integer type. Conversion of floating point values to integer values truncates by discarding the fractional value. The behavior is undefined if the truncated value cannot be represented in the destination type.

A glvalue of integer type can be converted to a cxvalue of floating point type. A prvalue of integer type can be converted to a prvalue of floating point type. If the destination type can exactly represent the source value, the result is the exact value. If the destination type cannot exactly represent the source value, the conversion to a best-approximation of the source value is implementation defined.
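
The following sketch illustrates both directions of the conversion. Names are hypothetical, and an implementation may emit warnings for the implicit conversions shown.

```hlsl
void FloatIntegralConversions() {
  // Floating point to integer: the fractional value is discarded.
  float Positive = 3.9;
  float Negative = -3.9;
  int A = Positive; // 3
  int B = Negative; // -3

  // Integer to floating point: 2^24 is exactly representable
  // in a 32-bit float, so the result is the exact value.
  int Exact = 16777216;
  float F = Exact;

  // 2^24 + 1 is not exactly representable in a 32-bit float, so the
  // choice of best approximation is implementation defined.
  int Inexact = 16777217;
  float G = Inexact;
}
```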

Boolean conversion[Conv.bool]

A glvalue of arithmetic type can be converted to a cxvalue of boolean type. A prvalue of arithmetic or unscoped enumeration type can be converted to a prvalue of boolean type. A zero value is converted to false; all other values are converted to true.

Vector splat conversion[Conv.vsplat]

A glvalue of type T can be converted to a cxvalue of type vector<T,x> or a prvalue of type T can be converted to a prvalue of type vector<T,x>. The destination value is the source value replicated into each element of the destination.

A glvalue of type T can be converted to a cxvalue of type matrix<T,x,y> or a prvalue of type T can be converted to a prvalue of type matrix<T,x,y>. The destination value is the source value replicated into each element of the destination.
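
A minimal sketch of splat conversion to vector and matrix destinations. The names are hypothetical.

```hlsl
void SplatConversions() {
  float Value = 2.0;

  // Scalar to vector: V is (2.0, 2.0, 2.0).
  float3 V = Value;

  // Scalar to matrix: every element of M is 2.0.
  float2x2 M = Value;

  // A prvalue literal can be splatted the same way: Zero is (0, 0, 0, 0).
  int4 Zero = 0;
}
```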

Vector and matrix truncation conversion[Conv.vtrunc]

A prvalue of type vector<T,x> can be converted to a prvalue of type:

vector<T,y> only if y < x, or
T

The resulting value of vector truncation is comprised of elements [0..y), dropping elements [y..x).

A prvalue of type matrix<T,x,y> can be converted to a prvalue of type:

matrix<T,z,w> only if x ≥ z and y ≥ w,
vector<T,z> only if x ≥ z, or
T.

Matrix truncation is performed on each row and column dimension separately. The resulting value is comprised of vectors [0..z) which are each separately comprised of elements [0..w). Trailing vectors and elements are dropped.

Reducing the dimension of a vector to one (vector<T,1>) can produce either a single element vector or a scalar of type T. Reducing the rows of a matrix to one (matrix<T,x,1>) can produce either a single row matrix, a vector of type vector<T,x>, or a scalar of type T.
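
The truncation rules above can be sketched as follows. The conversions are written as explicit casts here because implementations may warn on implicit truncation; the names are hypothetical.

```hlsl
void TruncationConversions() {
  float4 V = float4(1.0, 2.0, 3.0, 4.0);

  // Vector truncation keeps elements [0..2): XY is (1.0, 2.0).
  float2 XY = (float2)V;

  // Truncation to a scalar keeps element 0: X is 1.0.
  float X = (float)V;

  float3x3 M = {1.0, 2.0, 3.0,
                4.0, 5.0, 6.0,
                7.0, 8.0, 9.0};

  // Matrix truncation keeps vectors [0..2), each with elements [0..2):
  // Sub is {1.0, 2.0} and {4.0, 5.0}.
  float2x2 Sub = (float2x2)M;
}
```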

Component-wise conversions[Conv.cwise]

A glvalue of type vector<T,x> can be converted to a cxvalue of type vector<V,x>, or a prvalue of type vector<T,x> can be converted to a prvalue of type vector<V,x>. The source value is converted by performing the appropriate conversion of each element of type T to an element of type V following the rules for standard conversions in chapter 4.

A glvalue of type matrix<T,x,y> can be converted to a cxvalue of type matrix<V,x,y>, or a prvalue of type matrix<T,x,y> can be converted to a prvalue of type matrix<V,x,y>. The source value is converted by performing the appropriate conversion of each element of type T to an element of type V following the rules for standard conversions in chapter 4.

Qualification conversion[Conv.qual]

A prvalue of type "cv1 T" can be converted to a prvalue of type "cv2 T" if type "cv2 T" is more cv-qualified than "cv1 T".

Conversion Rank[Conv.rank]

Every integer and floating point type has a defined conversion rank. These conversion ranks are used to differentiate between promotions and other conversions (see: [Conv.ipromote] and [Conv.fppromote]).

Integer Conversion Rank[Conv.rank.int]

No two signed integer types shall have the same conversion rank even if they have the same representation.
The rank of a signed integer type shall be greater than the rank of any signed integer type with a smaller size.
The rank of any unsigned integer type shall equal the rank of the corresponding signed integer type.
The rank of bool shall be less than the rank of all other standard integer types.
The rank of a minimum precision integer type shall be less than the rank of any other minimum precision integer type with a larger minimum value representation size.
The rank of a minimum precision integer type shall be less than the rank of all standard integer types.
For all integer types T1, T2, and T3: if T1 has greater rank than T2 and T2 has greater rank than T3, then T1 shall have greater rank than T3.

Floating Point Conversion Rank[Conv.rank.float]

The rank of half shall be greater than the rank of min16float.
The rank of float shall be greater than the rank of half.
The rank of double shall be greater than the rank of float.
For all floating point types T1, T2, and T3: if T1 has greater rank than T2 and T2 has greater rank than T3, then T1 shall have greater rank than T3.

Expressions[Expr]

This chapter defines the formulations of expressions and the behavior of operators when they are not overloaded. Only member operators may be overloaded¹⁷. Operator overloading does not alter the rules for operators defined by this standard.

An expression may also be an unevaluated operand when it appears in some contexts. An unevaluated operand is an expression which is not evaluated in the program¹⁸.

Whenever a glvalue appears in an expression that expects a prvalue, a standard conversion sequence is applied based on the rules in 4.

Usual Arithmetic Conversions[Expr.conv]

Binary operators for arithmetic and enumeration types require that both operands are of a common type. When the types do not match, the usual arithmetic conversions are applied to yield a common type. When usual arithmetic conversions are applied to vector operands they behave as component-wise conversions (4.11). The usual arithmetic conversions are:

If either operand is of scoped enumeration type no conversion is performed, and the expression is ill-formed if the types do not match.
If either operand is a vector<T,X>, vector truncation or scalar extension is performed with the following rules:
- If both vectors are of the same length, no dimension conversion is required.
- If one operand is a vector and the other operand is a scalar, the scalar is extended to a vector via a Splat conversion (4.9) to match the length of the vector.
- Otherwise, if both operands are vectors of different lengths, the vector of longer length is truncated to match the length of the shorter vector (4.10).
If either operand is of type double or vector<double, X>, the other operand shall be converted to match.
Otherwise, if either operand is of type float or vector<float, X>, the other operand shall be converted to match.
Otherwise, if either operand is of type half or vector<half, X>, the other operand shall be converted to match.
Otherwise, integer promotions are performed on each scalar or vector operand following the appropriate scalar or component-wise conversion (4).
- If both operands are of signed scalar or vector element types, or both are of unsigned scalar or vector element types, the operand of lesser integer conversion rank shall be converted to the type of the operand with greater rank.
- Otherwise, if the operand of unsigned scalar or vector element type is of greater rank than the operand of signed scalar or vector element type, the signed operand is converted to the type of the unsigned operand.
- Otherwise, if the operand of signed scalar or vector element type is able to represent all values of the operand of unsigned scalar or vector element type, the unsigned operand is converted to the type of the signed operand.
- Otherwise, both operands are converted to a scalar or vector type of the unsigned integer type corresponding to the type of the operand with signed integer scalar or vector element type.
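
The sketch below walks a few operand pairs through the rules above. Names are hypothetical, the comments describe the conversion each rule would select, and an implementation may warn on the implicit vector truncation.

```hlsl
void UsualArithmeticConversions() {
  float3 V3 = float3(1.0, 2.0, 3.0);
  float4 V4 = float4(1.0, 2.0, 3.0, 4.0);
  float S = 2.0;
  int I = 3;
  uint U = 4;

  // Vector and scalar: S is extended to float3 by a splat conversion.
  float3 A = V3 * S;

  // Vectors of different lengths: V4 is truncated to float3.
  float3 B = V4 + V3;

  // One operand is float: I is converted to float.
  float C = S + I;

  // int and uint have equal rank and int cannot represent every uint
  // value, so the final rule applies: both operands are converted to uint.
  uint D = I + U;
}
```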

Primary Expressions[Expr.Primary]

primary-expression:
* literal
* this
* ( expression )
* id-expression

Literals[Expr.Primary.Literal]

The type of a literal is determined based on the grammar forms specified in 2.9.1.

This[Expr.Primary.This]

The keyword this names a reference to the implicit object of non-static member functions. The this parameter is always a prvalue of non-cv-qualified type.¹⁹

A this expression shall not appear outside the declaration of a non-static member function.

Parenthesis[Expr.Primary.Paren]

An expression (E) enclosed in parentheses has the same type, result and value category as E without the enclosing parentheses. A parenthesized expression may be used in the same contexts with the same meaning as the same non-parenthesized expression.

Names[Expr.Primary.ID]

The grammar and behaviors of this section are almost identical to C/C++ with some subtractions (notably lambdas and destructors).

id-expression:
* unqualified-id
* qualified-id

Unqualified Identifiers[Expr.Primary.ID.Unqual]

unqualified-id:
* identifier
* operator-function-id
* conversion-function-id
* template-id

Qualified Identifiers[Expr.Primary.ID.Qual]

qualified-id:
* nested-name-specifier template_opt unqualified-id

nested-name-specifier:
* ::
* type-name ::
* namespace-name ::
* nested-name-specifier identifier ::
* nested-name-specifier template_opt simple-template-id ::

Postfix Expressions[Expr.Post]

postfix-expression:
* primary-expression
* postfix-expression [ expression ]
* postfix-expression [ braced-init-list ]
* postfix-expression ( expression-list_opt )
* simple-type-specifier ( expression-list_opt )
* typename-specifier ( expression_opt )
* simple-type-specifier braced-init-list
* typename-specifier braced-init-list
* postfix-expression . template_opt id-expression
* postfix-expression -> template_opt id-expression
* postfix-expression ++
* postfix-expression --

Subscript[Expr.Post.Subscript]

A postfix-expression followed by an expression in square brackets ([ ]) is a subscript expression. In an array subscript expression of the form E1[E2], E1 must either be a variable of array type T[], or an object of type T where T provides an overloaded implementation of operator[] (8).²⁰

Function Calls[Expr.Post.Call]

A function call may be a call to an ordinary function or to a member function. In a function call to an ordinary function, the postfix-expression must be an lvalue that refers to a function. In a function call to a member function, the postfix-expression will be an implicit or explicit class member access whose id-expression is a member function name.

When a function is called, each parameter shall be initialized with its corresponding argument. The order in which parameters are initialized is unspecified. ²¹

If the function is a non-static member function, the this argument shall be initialized to a reference to the object of the call as if cast by an explicit cast expression to an lvalue reference of the type that the function is declared as a member of.

Parameters are either input parameters, output parameters, or input/output parameters as denoted in the called function’s declaration (7.5). For all types of parameters the argument expressions are evaluated before the function call occurs.

Input parameters are passed by-value into a function. If an argument to an input parameter is of constant-sized array type, the array is copied to a temporary and the temporary value is converted to an address via array-to-pointer decay. If an argument is an unsized array type, the array lvalue directly decays via array-to-pointer decay. ²²

Arguments to output and input/output parameters must be lvalues. Output parameters are not initialized prior to the call; they are passed as an uninitialized cxvalue (3.9). An output parameter is only initialized explicitly inside the called function. It is undefined behavior to not explicitly initialize an output parameter before returning from the function in which it is defined. The cxvalue created from an argument to an input/output parameter is initialized through copy-initialization from the lvalue argument expression. Overload resolution shall occur on argument initialization as if the expression T Param = Arg were evaluated. In both cases, the cxvalue shall have the type of the parameter and the argument can be converted to that type through implicit or explicit conversion.

If an argument to an output or input/output parameter is a constant sized array, the array is copied to a temporary cxvalue following the same rules for any other data type. If an argument to an output or input/output parameter is an unsized array type, the array lvalue directly decays via array-to-pointer decay. An argument of a constant sized array of type T[N] can be converted to a cxvalue of an unsized array of type T[] through array-to-pointer decay. An unsized array of type T[] cannot be implicitly converted to a constant sized array of type T[N].

On expiration of the cxvalue, the value is assigned back to the argument lvalue expression using a resolved assignment expression as if the expression Arg = Param were written²³. The argument expression must be of a type or able to convert to a type that has defined copy-initialization to and assignment from the parameter type. The lifetime of the cxvalue begins at argument expression evaluation, and ends after the function returns. A cxvalue argument is passed by-address to the caller.

If the lvalue passed to an output or input/output parameter does not alias any other parameter passed to that function, an implementation may avoid the creation of excess temporaries by passing the address of the lvalue instead of creating the cxvalue.
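
The copy-in and copy-out behavior described above can be sketched with a pair of hypothetical functions, AddOne and SetFive. The comments describe the steps the rules above imply; an implementation may warn on the implicit conversions and, as noted, may skip the temporary when no aliasing is possible.

```hlsl
void AddOne(inout float X) { X += 1.0; }

void SetFive(out int Y) {
  // An output parameter must be explicitly initialized before returning.
  Y = 5;
}

int Caller() {
  int Count = 0;

  // Input/output parameter: a float cxvalue is copy-initialized as if
  // `float X = Count` were evaluated. When the cxvalue expires, it is
  // assigned back as if `Count = X` were written, so Count becomes 1.
  AddOne(Count);

  // Output parameter: the cxvalue is passed uninitialized and assigned
  // back to Result after the call, so Result becomes 5.
  int Result;
  SetFive(Result);

  return Count + Result; // 6
}
```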

When a function is called, any parameter of object type must have completely defined type, and any parameter of array of object type must have completely defined element type.²⁴ The lifetime of a parameter ends on return of the function in which it is defined.²⁵ Initialization and destruction of each parameter occurs within the context of the calling function.

The value of a function call is the value returned by the called function.

A function call is an lvalue if the result type is an lvalue reference type; otherwise it is a prvalue.

If a function call is a prvalue of object type, the type of the prvalue must be complete.

Statements[Stmt]

statement:
* labeled-statement
* attribute-specifier-sequence_opt expression-statement
* attribute-specifier-sequence_opt compound-statement
* attribute-specifier-sequence_opt iteration-statement
* attribute-specifier-sequence_opt selection-statement
* declaration-statement

Label Statements[Stmt.Label]

The optional attribute-specifier-sequence applies to the statement that immediately follows it.

Attributes[Stmt.Attr]

Unroll Attribute[Stmt.Attr.Unroll]

The [unroll] attribute is only valid when applied to iteration-statements. It is used to indicate that iteration-statements like for, while and do while can be unrolled. This attribute qualifier can be used to specify full unrolling or partial unrolling by a specified amount. This is a compiler hint and the compiler may ignore this directive.

The unroll attribute may optionally have an unroll factor represented as a single argument n that is an integer constant expression value greater than zero. If n is not specified, the compiler determines the unrolling factor for the loop. The [unroll] attribute cannot be applied to the same iteration-statement as the [loop] attribute.

Loop Attribute[Stmt.Attr.Loop]

The [loop] attribute tells the compiler to execute each iteration of the loop. In other words, it is a hint to indicate a loop should not be unrolled. Therefore it is not compatible with the [unroll] attribute.
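
A small sketch of the three forms described in this section. The function names are hypothetical; since these attributes are hints, the generated code may differ from what the attribute requests.

```hlsl
// Full unrolling requested; the compiler picks the factor.
float SumUnrolled(float Values[8]) {
  float Total = 0.0;
  [unroll]
  for (int I = 0; I < 8; ++I)
    Total += Values[I];
  return Total;
}

// Partial unrolling requested with an unroll factor of 4.
float SumPartiallyUnrolled(float Values[8]) {
  float Total = 0.0;
  [unroll(4)]
  for (int I = 0; I < 8; ++I)
    Total += Values[I];
  return Total;
}

// Unrolling discouraged; [loop] and [unroll] cannot be combined
// on the same iteration-statement.
float SumLooped(float Values[8]) {
  float Total = 0.0;
  [loop]
  for (int I = 0; I < 8; ++I)
    Total += Values[I];
  return Total;
}
```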

Declarations[Decl]

Preamble[Decl.Pre]

Declarations generally specify how names are to be interpreted. Declarations have the form

declaration-seq:
* declaration
* declaration-seq declaration

declaration:
* name-declaration
* special-declaration

name-declaration:
* ...

special-declaration:
* export-declaration-group
* ...

Specifiers[Decl.Spec]

General[Decl.Spec.General]

The specifiers that can be used in a declaration are

decl-specifier:
* function-specifier
* ...

Function specifiers[Decl.Spec.Fct]

A function-specifier can be used only in a function declaration.

function-specifier:
* export

The export specifier denotes that the function has program linkage (3.6.1).

The export specifier cannot be used on functions directly or indirectly within an unnamed namespace.

Functions with program linkage can also be specified in an export-declaration-group (7.7).

If a function is declared with an export specifier then all redeclarations of the same function must also use the export specifier or be part of an export-declaration-group (7.7).

Declarators[Decl.Decl]

Initializers[Decl.Init]

The process of initialization described in this section applies to all initializers regardless of the context.

initializer:
* brace-or-equal-initializer
* ( expression-list )

brace-or-equal-initializer:
* = initializer-clause
* braced-init-list

initializer-clause:
* assignment-expression
* braced-init-list

braced-init-list:
* { initializer-list ,_opt }
* { }

initializer-list:
* initializer-clause
* initializer-list , initializer-clause

Aggregate Initialization[Decl.Init.Agg]

An aggregate is a vector, matrix, array, or class.

The subobjects of an aggregate have a defined order. For vectors and arrays the order is increasing subscript order. For matrices it is increasing subscript order with the subscript nesting such that in the notation Mat[M][N], the ordering is Mat[0][0]...Mat[0][N]...Mat[M][0]...Mat[M][N]. For classes the order is base class, followed by member subobjects in declaration order.

A flattened ordering of subobjects can be produced by performing a depth-first traversal of the subobjects of an object following the defined subobject ordering.

Each braced initializer list is comprised of zero or more initializer-clause expressions, each of which is either another braced initializer list or an expression which generates a value that either is or can be implicitly converted to an rvalue. Each assignment-expression is an object, which may be a scalar or aggregate type. A flattened initializer sequence is constructed by a depth-first traversal over each assignment-expression in an initializer-list and performing a depth-first traversal accessing each subobject of the assignment-expression.

An initializer-list is a valid initializer if for each element E_n in the target object’s flattened ordering there is a corresponding initializer I_n in the flattened initializer sequence which can be implicitly converted to the element’s type.

An initializer-list is invalid if the flattened initializer sequence contains more or fewer elements than the target object’s flattened ordering, or if any initializer I_n cannot be implicitly converted to the corresponding element E_n’s type.
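
The flattening rules above can be illustrated with a hypothetical class type Pair, whose flattened ordering is float, float, int. Each comment lists the flattened initializer sequence the rules would produce.

```hlsl
struct Pair {
  float2 A;
  int B;
};

void AggregateInitialization() {
  float2 V = {1.0, 2.0};

  // Flattened initializer sequence: 1.0, 2.0, 3.
  Pair P1 = {1.0, 2.0, 3};

  // V is an aggregate and contributes two elements: V[0], V[1], 3.
  Pair P2 = {V, 3};

  // Nested braced lists are flattened the same way: 1.0, 2.0, 3.
  Pair P3 = {{1.0, 2.0}, 3};

  // An array of two Pair objects has six elements in its flattened
  // ordering; P1 contributes three, V two, and the literal one.
  Pair Ps[2] = {P1, V, 4};

  // Invalid: the flattened sequence has two elements but the target
  // object's flattened ordering has three.
  // Pair Bad = {1.0, 2.0};
}
```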

Function Definitions[Decl.Function]

Attributes[Decl.Attr]

Semantic Annotations[Decl.Attr.Semantic]

Entry Attributes[Decl.Attr.Entry]

Export Declarations[Decl.Export]

One or more functions with external linkage can also be specified in the form of

export-declaration-group:
* export { function-declaration-seq_opt }

function-declaration-seq:
* function-declaration function-declaration-seq_opt

The export specifier denotes that every function-declaration included in function-declaration-seq has external linkage (3.6.2).

The export-declaration-group declaration cannot appear directly or indirectly within an unnamed namespace.

Functions with external linkage can also be declared with an export specifier (7.2.2).

If a function is part of an export-declaration-group then all redeclarations of the same function must also be part of an export-declaration-group or be declared with an export specifier (7.2.2).
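
As a sketch that follows the grammar above, the two ways of exporting a function might be written as shown below. The function names are hypothetical, and support for the export-declaration-group form may vary between implementations.

```hlsl
// Export specifier on the declaration and on its redeclaration.
export float Square(float X);
export float Square(float X) { return X * X; }

// Export declaration group covering several function declarations.
export {
  float Cube(float X);
  float Quad(float X);
}

// Redeclarations of grouped functions are declared with an export
// specifier, which the redeclaration rule above permits.
export float Cube(float X) { return X * X * X; }
export float Quad(float X) { return X * X * X * X; }
```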

Overloading[Overload]

HLSL inherits much of its overloading behavior from C++. This chapter is extremely similar to isoCPP clause [over]. Notable differences exist around HLSL’s parameter modifier keywords, program entry points, and overload conversion sequence ranking.

When a single name is declared with two or more different declarations in the same scope, the name is overloaded. A declaration that declares an overloaded name is called an overloaded declaration. The set of overloaded declarations that declare the same overloaded name are that name’s overload set.

Only function and template declarations can be overloaded; variable and type declarations cannot be overloaded.

Overloadable Declarations[Overload.Decl]

This section specifies the cases in which a function declaration cannot be overloaded. Any program that contains an invalid overload set is ill-formed.

An overload set is invalid if:

Two or more declarations in the overload set differ only by return type.

int Yeet();
uint Yeet(); // ill-formed: decls differ only by return type
An overload set contains more than one member function declaration with the same parameter-type-list, and one of those declarations is a static member function declaration (9.1).

class Doggo {
  static void pet();
  void pet();       // ill-formed: static pet has the same parameter-type-list
  void pet() const; // ill-formed: static pet has the same parameter-type-list

  void wagTail();       // valid: no conflicting static declaration.
  void wagTail() const; // valid: no conflicting static declaration.

  static void bark(Doggo D);
  void bark();       // valid: static bark parameter-type-list is different
  void bark() const; // valid: static bark parameter-type-list is different
};
An overload set contains more than one entry function declaration (7.6.2).

void VS();
void VS(int); // valid: only one entry point.

[shader("vertex")]
void Entry();

[shader("compute")]
void Entry(int); // ill-formed: an overload set cannot have more than one entry function
An overload set contains more than one function declaration which only differ in parameter declarations of equivalent types.

void F(int4 I);
void F(vector<int, 4> I); // ill-formed: int4 is a type alias of vector<int, 4>

An overload set contains two or more function declarations that differ only in const specifiers.

void G(int);
void G(const int); // ill-formed: redeclaration of G(int)
void G(int) {}
void G(const int) {} // ill-formed: redefinition of G(int)

An overload set contains two or more function declarations that differ only in that corresponding parameters mismatch between out and inout.

void H(int);
void H(in int);    // valid: redeclaration of H(int)
void H(inout int); // valid: overloading between in and inout is allowed

void I(in int);
void I(out int); // valid: overloading between in and out is allowed

void J(out int);
void J(inout int); // ill-formed: Cannot overload based on out/inout mismatch
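
The parameter modifier rule above can be modelled as a small check. The following is a non-normative sketch written in C++ for illustration only; the names Modifier, Param, writesBack and conflictsOnOutInout are hypothetical and do not correspond to any compiler API. It assumes parameter types have already been reduced to canonical names with top-level const removed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a parameter: a canonical type name plus its HLSL
// parameter modifier. These names are illustrative only.
enum class Modifier { In, Out, InOut };

struct Param {
  std::string Type; // canonical type, top-level const already stripped
  Modifier Mod;
};

// Out and inout parameters are both written back to the caller, so this
// sketch treats them as the same kind for overloading purposes.
static bool writesBack(Modifier M) { return M != Modifier::In; }

// Returns true when two parameter lists cannot coexist in one overload set
// under the out/inout rule: the types match at every position, and the only
// modifier differences are between out and inout.
bool conflictsOnOutInout(const std::vector<Param> &A,
                         const std::vector<Param> &B) {
  if (A.size() != B.size())
    return false;
  bool SawMismatch = false;
  for (std::size_t I = 0; I < A.size(); ++I) {
    if (A[I].Type != B[I].Type)
      return false; // different types: a distinct overload
    if (A[I].Mod == B[I].Mod)
      continue;
    if (!(writesBack(A[I].Mod) && writesBack(B[I].Mod)))
      return false; // in vs. out or in vs. inout: a valid overload
    SawMismatch = true;
  }
  return SawMismatch; // identical lists are redeclarations, not conflicts
}
```

Under this model the declarations of J above conflict, while the declarations of H and I do not.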

Overload Resolution[Overload.Res]

Overload resolution is the process by which a function call is mapped to the best overloaded function declaration. Overload resolution uses a set of functions called the candidate set, and a list of expressions that comprise the argument list for the call.

Overload resolution selects the function to call in the following contexts²⁶:

invocation of a function named in a function call expression;
invocation of a function call operator on a class object named in function call syntax;
invocation of the operator referenced in an expression;
invocation of a user-defined conversion for copy-initialization of a class object;
invocation of a conversion function for initialization of an object of a nonclass type from an expression of class type.

In each of these contexts a unique method is used to construct the overload candidate set and argument expression list.

Candidate Functions and Argument Lists[Overload.Res.Sets]

isoCPP goes into a lot of detail in this section about how candidate functions and argument lists are selected for each context where overload resolution is performed. HLSL matches C++ for the contexts that HLSL inherits. For now, this section will be left as a stub, but HLSL inherits the following sections from C++:

[over.call.func]
[over.call.object]
[over.match.oper]
[over.match.copy]
[over.match.conv]

Viable Functions[Overload.Res.Viable]

Given the candidate set and argument expressions as determined by the relevant context (8.2.1), a subset of viable functions can be selected from the candidate set.

A function candidate F(P₀...P_m) is not a viable function for a call with argument list A₀...A_n if:

The function has fewer parameters than there are arguments in the argument list (m < n).
The function has more parameters than there are arguments in the argument list (m > n), and function parameters P_{n+1}...P_m do not all have default arguments.
For some argument A_i, no implicit conversion sequence exists that converts A_i to the type of the corresponding parameter P_i.
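
The three conditions above can be read as a filter over the candidate set. The following non-normative sketch, written in C++ for illustration, applies them to a single candidate. The names Parameter, ConvertibleFn, isViable and identityOnly are hypothetical, types are modelled as plain strings, and the existence of an implicit conversion sequence is delegated to a caller-supplied predicate rather than computed.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of a function parameter.
struct Parameter {
  std::string Type;
  bool HasDefault; // true when the parameter has a default argument
};

// Hypothetical predicate: true when an implicit conversion sequence exists
// from the argument type (first) to the parameter type (second).
using ConvertibleFn =
    std::function<bool(const std::string &, const std::string &)>;

// Toy predicate for experimentation: only the identity conversion exists.
bool identityOnly(const std::string &From, const std::string &To) {
  return From == To;
}

bool isViable(const std::vector<Parameter> &Params,
              const std::vector<std::string> &ArgTypes,
              const ConvertibleFn &Convertible) {
  // Fewer parameters than arguments: not viable.
  if (Params.size() < ArgTypes.size())
    return false;
  // Parameters beyond the last argument must all have default arguments.
  for (std::size_t I = ArgTypes.size(); I < Params.size(); ++I)
    if (!Params[I].HasDefault)
      return false;
  // Every argument needs an implicit conversion sequence to its parameter.
  for (std::size_t I = 0; I < ArgTypes.size(); ++I)
    if (!Convertible(ArgTypes[I], Params[I].Type))
      return false;
  return true;
}
```

A real implementation would replace identityOnly with the conversion rules of this chapter; the arity and default argument checks are independent of that choice.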

Best Viable Function[Overload.Res.Best]

For an overloaded call with arguments A₀...A_n, each viable function F(P₀...P_m) has a set of implicit conversion sequences ICS₀(F)...ICS_m(F) defining the conversion sequences for each argument A_i to the type of parameter P_i.

A viable function F is defined to be a better function than another viable function F′ if for all arguments ICS_i(F) is not a worse conversion sequence than ICS_i(F′), and:

for some argument j, ICS_j(F) is a better conversion than ICS_j(F′), or
in the context of an initialization by user-defined conversion, the conversion sequence from the return type of F to the destination type is a better conversion sequence than the conversion sequence from the return type of F′ to the destination type, or
F is a non-template function and F′ is a function template specialization, or
F and F′ are both function template specializations and F is more specialized than F′ according to function template partial ordering rules (10.2).

If there is one viable function that is a better function than all the other viable functions, it is the selected function; otherwise the call is ill-formed.
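
As a non-normative illustration, the selection rule can be sketched in C++. This sketch models only the first tie-breaking condition (a strictly better conversion for some argument) and ignores the user-defined conversion and template conditions. It also simplifies each implicit conversion sequence to an integer where a smaller value is better and equal values are indistinguishable. The names RankList, isBetter and selectBest are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// One entry per argument; a smaller value models a better conversion
// sequence. Every list is assumed to have the same length.
using RankList = std::vector<int>;

// True when F is a better function than G: never worse for any argument,
// and strictly better for at least one.
bool isBetter(const RankList &F, const RankList &G) {
  bool StrictlyBetterSomewhere = false;
  for (std::size_t I = 0; I < F.size(); ++I) {
    if (F[I] > G[I])
      return false;
    if (F[I] < G[I])
      StrictlyBetterSomewhere = true;
  }
  return StrictlyBetterSomewhere;
}

// Returns the index of the viable function that is better than all others,
// or -1 when no such function exists (the call is ill-formed).
int selectBest(const std::vector<RankList> &Viable) {
  for (std::size_t I = 0; I < Viable.size(); ++I) {
    bool BeatsAll = true;
    for (std::size_t J = 0; J < Viable.size() && BeatsAll; ++J)
      if (I != J && !isBetter(Viable[I], Viable[J]))
        BeatsAll = false;
    if (BeatsAll)
      return static_cast<int>(I);
  }
  return -1;
}
```

Two candidates that each win on a different argument produce -1, which corresponds to an ambiguous call.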

If the resolved overload is a function with multiple declarations, and if at least two of these declarations specify a default argument that made the function viable, the program is ill-formed.

void F(int X = 1);
void F(float Y = 2.0f);

void Fn() {
  F(1);    // Okay.
  F(3.0f); // Okay.
  F();     // Ill-formed.
}

Implicit Conversion Sequences[Overload.ICS]

An implicit conversion sequence is a sequence of conversions which converts a source value to a prvalue of destination type. In overload resolution the source value is the argument expression in a function call, and the destination type is the type of the corresponding parameter of the function being called.

When a parameter is a cxvalue, an inverted implicit conversion sequence is required to convert the parameter type back to the argument type for writing back to the argument expression lvalue. An inverted implicit conversion sequence must be a well-formed implicit conversion sequence where the source value is the implicit cxvalue of the parameter type, and the destination type is the argument expression’s lvalue type.

A well-formed implicit conversion sequence is either a standard conversion sequence, or a user-defined conversion sequence.

In the following contexts an implicit conversion sequence can only be a standard conversion sequence:

Argument conversion for a user-defined conversion function.
Copying a temporary for class copy-initialization.
When passing an initializer-list as a single argument.
Copy-initialization of a class by user-defined conversion.

An implicit conversion sequence models a copy-initialization, unless it is an inverted implicit conversion sequence, in which case it models an assignment. Any difference in top-level cv-qualification is handled by the copy-initialization or assignment, and does not constitute a conversion²⁷.

When the source value type and the destination type are the same, the implicit conversion sequence is an identity conversion, which signifies no conversion.

Only standard conversion sequences that do not create temporary objects are valid for implicit object parameters or for the left operand of assignment operators.

If no sequence of conversions can be found to convert a source value to the destination type, an implicit conversion sequence cannot be formed.

If several different sequences of conversions exist that convert the source value to the destination type, the implicit conversion sequence is defined to be the unique conversion sequence designated the ambiguous conversion sequence. For the purpose of ranking implicit conversion sequences, the ambiguous conversion sequence is treated as a user-defined sequence that is indistinguishable from any other user-defined conversion sequence. If overload resolution selects a function using the ambiguous conversion sequence as the best match for a call, the call is ill-formed.

Standard Conversion Sequences[Overload.ICS.SCS]

The conversions that comprise a standard conversion sequence and the composition of the sequence are defined in Chapter 4.

Each standard conversion is given a category and rank as defined in the table below:

Conversion	Category	Rank	Reference
No conversion	Identity	Exact Match	-
Lvalue-to-rvalue	Lvalue Transformation	Exact Match	4.1
Array-to-pointer	Lvalue Transformation	Exact Match	4.2
Qualification	Qualification Adjustment	Exact Match	4.12
Scalar splat (without conversion)	Scalar Extension	Extension	4.9
Integral promotion	Promotion	Promotion	4.5 & 4.13.1
Floating point promotion	Promotion	Promotion	4.6 & 4.13.2
Component-wise promotion	Promotion	Promotion	4.11
Scalar splat promotion	Scalar Extension Promotion	Promotion Extension	4.9
Integral conversion	Conversion	Conversion	4.5
Floating point conversion	Conversion	Conversion	4.6
Floating-integral conversion	Conversion	Conversion	4.7
Boolean conversion	Conversion	Conversion	4.8
Component-wise conversion	Conversion	Conversion	4.11
Scalar splat conversion	Scalar Extension Conversion	Conversion Extension	4.9
Vector truncation (without conversion)	Dimensionality Reduction	Truncation	4.10
Vector truncation promotion	Dimensionality Reduction Promotion	Promotion Truncation	4.10
Vector truncation conversion	Dimensionality Reduction Conversion	Conversion Truncation	4.10

If a scalar splat conversion occurs in a conversion sequence where all other conversions are Exact Match rank, the conversion is ranked as Extension. If a scalar splat occurs in a conversion sequence with a Promotion conversion, the conversion is ranked as Promotion Extension. If a scalar splat occurs in a conversion sequence with a Conversion conversion, the conversion is ranked as Conversion Extension.

If a vector truncation conversion occurs in a conversion sequence where all other conversions are Exact Match rank, the conversion is ranked as Truncation. If a vector truncation occurs in a conversion sequence with a Promotion conversion, the conversion is ranked as Promotion Truncation. If a vector truncation occurs in a conversion sequence with a Conversion conversion, the conversion is ranked as Conversion Truncation.

Otherwise, the rank of a conversion sequence is determined by considering the rank of each conversion.

Conversion sequence ranks are ordered from better to worse as:

Exact Match
Extension
Promotion
Promotion Extension
Conversion
Conversion Extension
Truncation
Promotion Truncation
Conversion Truncation
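
The rank ordering and the combination rules for scalar splat and vector truncation can be summarized in a small lookup. The following non-normative C++ sketch assumes a conversion sequence has already been reduced to two facts: the worst rank among its element-type conversions, and whether it changes dimensionality. The names Rank, ElementRank, Dimension, sequenceRank and isBetterRank are hypothetical.

```cpp
// Ranks in the order listed above: earlier enumerators are better.
enum class Rank {
  ExactMatch,
  Extension,
  Promotion,
  PromotionExtension,
  Conversion,
  ConversionExtension,
  Truncation,
  PromotionTruncation,
  ConversionTruncation
};

// Hypothetical inputs: the worst element-type conversion in the sequence,
// and the dimensionality change (if any) that the sequence performs.
enum class ElementRank { ExactMatch, Promotion, Conversion };
enum class Dimension { Unchanged, Splat, Truncation };

// Combines the element-type rank with the dimensionality change.
Rank sequenceRank(ElementRank E, Dimension D) {
  static const Rank Table[3][3] = {
      // Columns: ExactMatch, Promotion, Conversion
      {Rank::ExactMatch, Rank::Promotion, Rank::Conversion}, // Unchanged
      {Rank::Extension, Rank::PromotionExtension,
       Rank::ConversionExtension}, // Splat
      {Rank::Truncation, Rank::PromotionTruncation,
       Rank::ConversionTruncation}, // Truncation
  };
  return Table[static_cast<int>(D)][static_cast<int>(E)];
}

// True when rank A is strictly better than rank B.
bool isBetterRank(Rank A, Rank B) {
  return static_cast<int>(A) < static_cast<int>(B);
}
```

Note that under this ordering a scalar splat combined with a conversion (Conversion Extension) still ranks better than a vector truncation with no element conversion (Truncation).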

Comparing Implicit Conversion Sequences[Overload.ICS.Comparing]

A partial ordering of implicit conversion sequences exists based on defining relationships for better conversion sequence, and better conversion. If an implicit conversion sequence ICS(f) is a better conversion sequence than ICS(f′), then the inverse is also true: ICS(f′) is a worse conversion sequence than ICS(f). If ICS(f) is neither better nor worse than ICS(f′), the conversion sequences are indistinguishable conversion sequences.

A standard conversion sequence is always better than a user-defined conversion sequence.

Standard conversion sequences are ordered by their ranks. Two conversion sequences with the same rank are indistinguishable unless one of the following rules applies:

If class B is derived directly or indirectly from class A and class C is derived directly or indirectly from class B,
- binding of an expression of type C to a cxvalue of type B is better than binding an expression of type C to a cxvalue of type A,
- conversion of C to B is better than conversion of C to A,
- binding of an expression of type B to a cxvalue of type A is better than binding an expression of type C to a cxvalue of type A,
- conversion of B to A is better than conversion of C to A.
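
The first derivation rule has a direct analogue in C++, where a reference parameter plays roughly the role of a cxvalue. The following non-normative C++ sketch illustrates that analogue only; it is not HLSL, and the function names pick, callWithC and callWithA are hypothetical.

```cpp
struct A {};
struct B : A {};
struct C : B {};

// Each overload returns a tag naming the parameter type it bound to.
char pick(A &) { return 'A'; }
char pick(B &) { return 'B'; }

// Binding an object of type C to B& is better than binding it to A&,
// because B is the nearer base class.
char callWithC() {
  C Obj;
  return pick(Obj); // returns 'B'
}

// An object of type A can only bind to A&.
char callWithA() {
  A Obj;
  return pick(Obj); // returns 'A'
}
```

Both overloads are viable for the call in callWithC, and the conversions have the same rank; the derivation rule is what makes the call unambiguous.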

Classes[Classes]

Static Members[Classes.Static]

Conversions[Classes.Conversions]

Templates[Template]

Template Instantiation[Template.Inst]

Partial Ordering of Function Templates[Template.Func.Order]

Intangible Types[Intangible]

Runtime[Runtime]

The preprocessor is inherited from C++ 11 with no grammar extensions. It is specified here only for completeness.↩︎
This grammar formulation is not context-free and requires an LL(2) parser.↩︎
This behavior matches isoC but is reduced in scope because HLSL has fewer data types.↩︎
This substantially deviates from the implementations in fxc and dxc, but is consistent with the official documentation and the behavior of GLSL. It is also substantially simpler to implement and more regular than the existing behaviors.↩︎
HLSL does not have goto, and labeled statements are only valid within switch statements.↩︎
Global variable declarations are implicitly constant and external in HLSL.↩︎
These are not really linked with other translation units but rather their values are loaded indirectly based on cbuffer mapping.↩︎
In DXC today functions that are not entry points or exported have internal linkage by default. This can be overridden by the -default-linkage compiler option.↩︎
sizeof(T) returns the size of the object as-if it’s stored in device memory, and determining the size if it’s stored in another memory space is not possible.↩︎
C23 adopts two’s complement as the object representation for integer types.↩︎
IEEE-754 only defines a binary encoding for 16-bit floating point values; it does not fully specify the behavior of such types.↩︎
This means that, when stored to memory, objects of type min16float are stored as binary32 as defined in IEEE-754.↩︎
This differs from C++ with the addition of flat conversion.↩︎
C++ does not support dimension altering conversions for scalar, vector or matrix types.↩︎
HLSL does not support grammar for specifying pointer or reference types; however, they are used in the type system and must be described in language rules.↩︎
Array-to-pointer conversion of constant sized arrays is not supported.↩︎
This will change in the future, but this document assumes current behavior.↩︎
The operand to sizeof(...) is a good example of an unevaluated operand. In the code sizeof(Foo()), the call to Foo() is never evaluated in the program.↩︎
HLSL Specs Proposal 0007 proposes adopting C++-like syntax and semantics for cv-qualified this references.↩︎
HLSL does not support the base address of a subscript operator being the expression inside the braces, which is valid in C and C++.↩︎
Today DXC, when targeting DXIL, matches the Microsoft C++ ABI and evaluates argument expressions right-to-left, while SPIR-V generation matches the Itanium ABI and evaluates argument expressions left-to-right. There are good arguments for unifying these behaviors, and arguments for keeping them different.↩︎
This results in input parameters of unsized arrays being modifiable by a function.↩︎
The argument expression is not re-evaluated after the call, so any side effects of evaluating the argument expression occur only before the call.↩︎
HLSL output and input/output parameters are passed by value, so they must also have complete type.↩︎
As stated above cxvalue parameters are passed-by-address, so the expiring parameter is the reference to the address, not the cxvalue. The cxvalue expires in the caller.↩︎
DXC only supports overload resolution for function calls and invocation of operators during expressions. Clang will support all contexts listed.↩︎
"Top-level" cv-qualification refers to the qualification of the value. This means a parameter of type T can be initialized by an argument of type const T. This does not mean that a parameter of type inout T can be initialized with an argument of type const T, because there is no valid inverted conversion sequence to assign back to a value of type const T.↩︎