2 Lexical Conventions [Lex]

2.1 Unit of Translation [Lex.Translation]

The text of HLSL programs is collected in source and header files. The distinction between source and header files is social, not technical. An implementation constructs a translation unit from a single source file and any source or header files it includes through the #include preprocessing directive, which conforms to the ISO C preprocessor specification.

An implementation may implicitly include additional sources as required to expose the HLSL library functionality defined in (12).

2.2 Phases of Translation [Lex.Phases]

HLSL inherits the phases of translation from ISO C++ with minor alterations, specifically the removal of support for trigraph and digraph sequences. The phases are described below.

1. Source file characters are mapped to the basic source character set in an implementation-defined manner.
2. Each sequence of a backslash (\) immediately followed by a new line is deleted, splicing the adjacent lines together.
3. The source is tokenized into preprocessing tokens and comments are isolated. If a source file ends in a partial comment or a partial preprocessing token, the program is ill-formed and a diagnostic shall be issued. Each comment shall be treated as a single white-space character.
4. Preprocessing directives are executed, macros are expanded, and pragma and other unary operator expressions are executed. Processing an #include directive applies all preceding phases to the resolved file, and this can continue recursively. Finally, all preprocessing directives are removed from the source.
5. Character and string literal specifiers are converted into the appropriate character set for the execution environment.
6. Adjacent string literal tokens are concatenated.
7. White space is no longer significant. Syntactic and semantic analysis translates the whole translation unit into an implementation-defined representation.
8. The translation unit is processed to determine required instantiations, the definitions of the required instantiations are located, and the translation and instantiation units are merged. The program is ill-formed if any required instantiation cannot be located or fails during instantiation.
9. External references are resolved, library references are linked, and all translation output is collected into a single output.
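Phases 2 and 3 can be sketched in Python as follows. This is a simplified illustration and not a conforming preprocessor. The names `splice_lines` and `replace_comments` are hypothetical. String and character literals are skipped only so that comment markers inside them are left alone, and digit separators inside pp-numbers are not handled.

```python
def splice_lines(src):
    """Phase 2: delete every backslash that is immediately followed by a new line."""
    return src.replace("\\\n", "")


def replace_comments(src):
    """Phase 3 (partial): replace each comment with a single space character.

    Raises SyntaxError if the source ends inside a /* ... */ comment.
    """
    out = []
    i, n = 0, len(src)
    while i < n:
        c = src[i]
        if src.startswith("//", i):
            end = src.find("\n", i)
            i = n if end == -1 else end  # the new line itself is kept
            out.append(" ")
        elif src.startswith("/*", i):
            end = src.find("*/", i + 2)
            if end == -1:
                raise SyntaxError("source file ends in a partial comment")
            i = end + 2
            out.append(" ")
        elif c in "\"'":
            # Copy a string or character literal verbatim, honoring escapes.
            j = i + 1
            while j < n and src[j] != c and src[j] != "\n":
                j += 2 if src[j] == "\\" else 1
            out.append(src[i:j + 1])
            i = j + 1
        else:
            out.append(c)
            i += 1
    return "".join(out)
```

Because splicing runs first, a backslash at the end of a `//` comment line extends the comment onto the next line.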

2.3 Character Sets [Lex.CharSet]

The basic source character set is a subset of the ASCII character set. The table below lists the valid characters and their ASCII values:

Hex ASCII Value	Character Name	Glyph or C Escape Sequence
0x09	Horizontal Tab	`\t`
0x0A	Line Feed	`\n`
0x0D	Carriage Return	`\r`
0x20	Space
0x21	Exclamation Mark	`!`
0x22	Quotation Mark	`"`
0x23	Number Sign	`#`
0x25	Percent Sign	`%`
0x26	Ampersand	`&`
0x27	Apostrophe	`'`
0x28	Left Parenthesis	`(`
0x29	Right Parenthesis	`)`
0x2A	Asterisk	`*`
0x2B	Plus Sign	`+`
0x2C	Comma	`,`
0x2D	Hyphen-Minus	`-`
0x2E	Full Stop	`.`
0x2F	Solidus	`/`
0x30 .. 0x39	Digit Zero .. Nine	`0 1 2 3 4 5 6 7 8 9`
0x3A	Colon	`:`
0x3B	Semicolon	`;`
0x3C	Less-than Sign	`<`
0x3D	Equals Sign	`=`
0x3E	Greater-than Sign	`>`
0x3F	Question Mark	`?`
0x41 .. 0x5A	Latin Capital Letter A .. Z	`A B C D E F G H I J K L M N O P Q R S T U V W X Y Z`
0x5B	Left Square Bracket	`[`
0x5C	Reverse Solidus	`\`
0x5D	Right Square Bracket	`]`
0x5E	Circumflex Accent	`^`
0x5F	Underscore	`_`
0x61 .. 0x7A	Latin Small Letter a .. z	`a b c d e f g h i j k l m n o p q r s t u v w x y z`
0x7B	Left Curly Bracket	`{`
0x7C	Vertical Line	`|`
0x7D	Right Curly Bracket	`}`

An implementation may allow source files to be written in an alternate extended character set, provided that set is a superset of the basic source character set. The translation character set is either an extended character set or the basic source character set, as chosen by the implementation.
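The following Python sketch illustrates the character table. It checks whether a source string uses only the basic source character set exactly as listed there. The names `BASIC_SOURCE_CHARSET` and `first_invalid_char` are hypothetical and do not belong to any HLSL toolchain. An implementation that accepts an extended character set would test against its own superset instead.

```python
# The 94 characters listed in the basic source character set table.
BASIC_SOURCE_CHARSET = frozenset(
    "\t\n\r !\"#%&'()*+,-./0123456789:;<=>?"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_"
    "abcdefghijklmnopqrstuvwxyz{|}"
)


def first_invalid_char(source):
    """Return (index, char) of the first character outside the basic set, or None."""
    for i, ch in enumerate(source):
        if ch not in BASIC_SOURCE_CHARSET:
            return i, ch
    return None
```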

2.4 Preprocessing Tokens [Lex.PPTokens]

preprocessing-token:
header-name
identifier
pp-number
character-literal
string-literal
preprocessing-op-or-punc
each non-whitespace character from the translation character set that cannot be one of the above

Each preprocessing token that is converted to a token shall have the lexical form of a keyword, an identifier, a constant, a string literal or an operator or punctuator.

Preprocessing tokens are the minimal lexical elements of the language during translation phases 3 through 6 (2.2). Preprocessing tokens can be separated by white space in the form of comments, white-space characters, or both. White space may appear within a preprocessing token only as part of a header name or between the quotation characters in a character literal or string literal.

Header name preprocessing tokens are only recognized within #include preprocessing directives, __has_include expressions, and implementation-defined locations within #pragma directives. In those contexts, a sequence of characters that could be either a header name or a string literal is recognized as a header name.

2.5 Tokens [Lex.Tokens]

token:
identifier
keyword
literal
operator-or-punctuator

There are four kinds of tokens: identifiers, keywords, literals, and operators or punctuators. All white-space characters and comments are ignored except as they separate tokens.

2.6 Comments [Lex.Comments]

The characters /* start a comment, which terminates with the characters */. The characters // start a comment, which terminates at the next new line.

2.7 Header Names [Lex.Headers]

header-name:
< h-char-sequence >
" q-char-sequence "

h-char-sequence:
h-char
h-char-sequence h-char

h-char:
any character in the translation character set except newline or >

q-char-sequence:
q-char
q-char-sequence q-char

q-char:
any character in the translation character set except newline or "

Character sequences in header names are mapped to header files or external source file names in an implementation-defined way.
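Here is a minimal Python sketch of recognizing a header-name. It assumes the caller has already established that it is in one of the contexts where header names are recognized, such as the operand of #include. `lex_header_name` is a hypothetical helper. It rejects an empty character sequence, because both h-char-sequence and q-char-sequence require at least one character. It does not map the name to a file, since that mapping is implementation-defined.

```python
def lex_header_name(text, i):
    """Lex a header-name starting at text[i].

    Returns (kind, name, end_index) or None if no header-name starts there.
    """
    if i >= len(text) or text[i] not in '<"':
        return None
    close = ">" if text[i] == "<" else '"'
    j = i + 1
    # h-char / q-char: anything except a new line or the closing delimiter.
    while j < len(text) and text[j] not in (close, "\n"):
        j += 1
    if j >= len(text) or text[j] == "\n" or j == i + 1:
        return None  # unterminated, or an empty character sequence
    kind = "h-char-sequence" if close == ">" else "q-char-sequence"
    return kind, text[i + 1:j], j + 1
```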

2.8 Preprocessing Numbers [Lex.PPNumber]

pp-number:
digit
. digit
pp-number digit
pp-number non-digit
pp-number ' digit
pp-number ' non-digit
pp-number e sign
pp-number E sign
pp-number p sign
pp-number P sign
pp-number .

Preprocessing numbers begin with a digit, or with a period (.) followed by a digit. They may be followed by valid identifier characters, periods, and the floating-point exponent sign sequences e+, e-, E+, E-, p+, p-, P+, and P-. Preprocessing number tokens lexically include all integer-literal and floating-literal tokens.

Preprocessing numbers do not have types or values. Types and values are assigned to integer-literal, floating-literal, and vector-literal tokens on successful conversion from preprocessing numbers.

A preprocessing number cannot end in a period (.) if the immediate next token is a scalar-element-sequence (2.9.4). In this situation the pp-number token is truncated to end before the period².
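The following Python sketch shows one way a lexer might scan a pp-number, including the truncation rule for a period that precedes a scalar-element-sequence. It assumes ASCII source and treats any letter, digit, or underscore as an identifier character. `lex_pp_number` and `is_ident_char` are hypothetical names, and the code is not taken from any particular compiler.

```python
import re

DIGITS = "0123456789"
SCALAR_ELEMENT_SEQUENCE = re.compile(r"x+|r+")


def is_ident_char(c):
    """Letters, digits, and underscore (ASCII only, for this sketch)."""
    return c.isascii() and (c.isalnum() or c == "_")


def lex_pp_number(src, i):
    """Return the index one past the end of the pp-number that starts at src[i]."""
    n = len(src)
    if src[i] in DIGITS:
        j = i + 1
    elif src[i] == "." and i + 1 < n and src[i + 1] in DIGITS:
        j = i + 2
    else:
        raise ValueError("not the start of a pp-number")
    while j < n:
        c = src[j]
        if c in "eEpP" and j + 1 < n and src[j + 1] in "+-":
            j += 2  # exponent sign sequence, e.g. e+ or P-
        elif c == "'" and j + 1 < n and is_ident_char(src[j + 1]):
            j += 2  # digit separator followed by a digit or non-digit
        elif c == ".":
            # Find the identifier-like run that would be the next token.
            k = j + 1
            while k < n and is_ident_char(src[k]):
                k += 1
            if SCALAR_ELEMENT_SEQUENCE.fullmatch(src[j + 1:k]):
                break  # truncate before the period: a swizzle-like sequence follows
            j += 1
        elif is_ident_char(c):
            j += 1
        else:
            break
    return j
```

With this rule, `1.xxx` scans as the pp-number `1` followed by `.xxx`, while `1.0f` stays a single pp-number.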

2.9 Literals [Lex.Literals]

2.9.1 Literal Classifications [Lex.Literal.Kinds]

literal:
integer-literal
character-literal
floating-literal
string-literal
boolean-literal
vector-literal

2.9.2 Integer Literals [Lex.Literal.Int]

integer-literal:
decimal-literal integer-suffix_opt
octal-literal integer-suffix_opt
hexadecimal-literal integer-suffix_opt

decimal-literal:
nonzero-digit
decimal-literal digit

octal-literal:
0
octal-literal octal-digit

hexadecimal-literal:
0x hexadecimal-digit
0X hexadecimal-digit
hexadecimal-literal hexadecimal-digit

nonzero-digit: one of
1 2 3 4 5 6 7 8 9

octal-digit: one of
0 1 2 3 4 5 6 7

hexadecimal-digit: one of
0 1 2 3 4 5 6 7 8 9
a b c d e f
A B C D E F

integer-suffix:
unsigned-suffix long-suffix_opt
long-suffix unsigned-suffix_opt

unsigned-suffix: one of
u U

long-suffix: one of
l L

An integer literal is an optional base prefix, a sequence of digits in the appropriate base, and an optional type suffix. An integer literal shall not contain a period or exponent specifier.

The type of an integer literal is the first of the corresponding list in the table below in which its value can be represented³.

Suffix	Decimal constant	Octal or hexadecimal constant
none	int32_t, int64_t	int32_t, uint32_t, int64_t, uint64_t
u or U	uint32_t, uint64_t	uint32_t, uint64_t
l or L	int64_t	int64_t, uint64_t
Both u or U and l or L	uint64_t	uint64_t

If the specified value of an integer literal cannot be represented by any type in the corresponding list, the integer literal has no type and the program is ill-formed.

An implementation may support the integer suffixes ll and ull as equivalent to l and ul respectively.
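The type selection rule and the table above can be expressed as an in-order search over candidate types. This Python sketch uses the hypothetical name `integer_literal_type`. It assumes the usual two's-complement ranges for `int32_t`, `uint32_t`, `int64_t`, and `uint64_t`, and it does not model the optional `ll` and `ull` suffixes.

```python
import re

# (type name, minimum, maximum) for the types named in the table.
INT32 = ("int32_t", -2**31, 2**31 - 1)
UINT32 = ("uint32_t", 0, 2**32 - 1)
INT64 = ("int64_t", -2**63, 2**63 - 1)
UINT64 = ("uint64_t", 0, 2**64 - 1)

# Candidate lists keyed by (normalized suffix, is_decimal), in table order.
CANDIDATES = {
    ("", True): [INT32, INT64],
    ("", False): [INT32, UINT32, INT64, UINT64],
    ("u", True): [UINT32, UINT64],
    ("u", False): [UINT32, UINT64],
    ("l", True): [INT64],
    ("l", False): [INT64, UINT64],
    ("ul", True): [UINT64],
    ("ul", False): [UINT64],
}

INTEGER_RE = re.compile(r"(?P<hex>0[xX][0-9a-fA-F]+)|(?P<dec>[1-9][0-9]*)|(?P<oct>0[0-7]*)")


def integer_literal_type(text):
    """Return the type of an integer literal, or raise ValueError if it is ill-formed."""
    body = text.rstrip("uUlL")
    suffix = text[len(body):].lower()
    if suffix not in ("", "u", "l", "ul", "lu"):
        raise ValueError(f"invalid integer suffix in {text!r}")
    m = INTEGER_RE.fullmatch(body)
    if m is None:
        raise ValueError(f"not an integer-literal: {text!r}")
    if m.group("hex"):
        value, is_decimal = int(body[2:], 16), False
    elif m.group("dec"):
        value, is_decimal = int(body, 10), True
    else:
        value, is_decimal = int(body, 8), False
    key = ("ul" if suffix in ("ul", "lu") else suffix, is_decimal)
    for name, lo, hi in CANDIDATES[key]:
        if lo <= value <= hi:
            return name
    raise ValueError(f"{text!r} cannot be represented by any permitted type")
```

For example, `2147483648` is `int64_t` because decimal literals without a suffix never become unsigned, whereas `0x80000000` is `uint32_t`.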

2.9.3 Floating-point Literals [Lex.Literal.Float]

floating-literal:
fractional-constant exponent-part_opt floating-suffix_opt
digit-sequence exponent-part floating-suffix_opt

fractional-constant:
digit-sequence_opt . digit-sequence
digit-sequence .

exponent-part:
e sign_opt digit-sequence
E sign_opt digit-sequence

sign: one of
+ -

digit-sequence:
digit
digit-sequence digit

floating-suffix: one of
h f l H F L

A floating literal is written either as a fractional-constant with an optional exponent-part and optional floating-suffix, or as an integer digit-sequence with a required exponent-part and optional floating-suffix.

The type of a floating literal is float, unless explicitly specified by a suffix. The suffixes h and H specify half, the suffixes f and F specify float, and the suffixes l and L specify double.⁴ If a value specified in the source is not in the range of representable values for its type, the program is ill-formed.
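This Python sketch classifies a floating literal using the grammar and suffix rules above. `floating_literal_type` is a hypothetical name. The range check only detects overflow, meaning a value that rounds past the largest finite value of its type. It uses Python's `struct` half (`e`) and single (`f`) formats, and the host `float` for double. Underflow to zero is not treated as an error.

```python
import math
import re
import struct

FLOAT_RE = re.compile(
    r"(?:(?:[0-9]*\.[0-9]+|[0-9]+\.)(?:[eE][+-]?[0-9]+)?"  # fractional-constant exponent-part_opt
    r"|[0-9]+[eE][+-]?[0-9]+)"                             # digit-sequence exponent-part
    r"([hflHFL]?)"                                         # floating-suffix_opt
)

SUFFIX_TYPES = {"": "float", "f": "float", "h": "half", "l": "double"}
PACK_FORMATS = {"half": "<e", "float": "<f"}


def floating_literal_type(text):
    """Return 'half', 'float', or 'double', or raise ValueError if ill-formed."""
    m = FLOAT_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a floating-literal: {text!r}")
    type_name = SUFFIX_TYPES[m.group(1).lower()]
    value = float(text[:m.start(1)])  # the literal without its suffix
    try:
        if math.isinf(value):
            raise OverflowError  # too large even for double
        fmt = PACK_FORMATS.get(type_name)
        if fmt is not None:
            struct.pack(fmt, value)  # raises OverflowError if it rounds past the max
    except OverflowError:
        raise ValueError(f"{text!r} is out of range for {type_name}") from None
    return type_name
```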

2.9.4 Vector Literals [Lex.Literal.Vector]

vector-literal:
integer-literal . scalar-element-sequence
floating-literal . scalar-element-sequence

scalar-element-sequence:
scalar-element-sequence-x
scalar-element-sequence-r

scalar-element-sequence-x:
x
scalar-element-sequence-x x

scalar-element-sequence-r:
r
scalar-element-sequence-r r

A vector-literal is an integer-literal or floating-point literal followed by a period (.) and a scalar-element-sequence.

A scalar-element-sequence is a vector-swizzle-sequence in which only the first vector element accessor (x or r) is valid. A vector-literal is equivalent to a vector splat conversion (4.11) of the integer-literal or floating-literal value, producing a vector with one element per accessor in the scalar-element-sequence.
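This Python sketch splits a vector-literal into its scalar literal and the length of its scalar-element-sequence. That length is the element count of the resulting splat. The name `split_vector_literal` is hypothetical. The scalar patterns are simplified versions of the integer-literal and floating-literal grammars above; for example, the `ll` suffix extension is not accepted.

```python
import re

# Simplified integer-literal and floating-literal patterns (with suffixes).
INT = r"(?:0[xX][0-9a-fA-F]+|[1-9][0-9]*|0[0-7]*)(?:[uU][lL]?|[lL][uU]?)?"
FLOAT = (
    r"(?:(?:[0-9]*\.[0-9]+|[0-9]+\.)(?:[eE][+-]?[0-9]+)?|[0-9]+[eE][+-]?[0-9]+)"
    r"[hfHFlL]?"
)
VECTOR_LITERAL_RE = re.compile(rf"(?P<scalar>{FLOAT}|{INT})\.(?P<elems>x+|r+)")


def split_vector_literal(text):
    """Return (scalar literal text, splat element count) for a vector-literal."""
    m = VECTOR_LITERAL_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a vector-literal: {text!r}")
    return m.group("scalar"), len(m.group("elems"))
```

For example, `1.0f.rr` yields `("1.0f", 2)`, which is a splat of `1.0f` into a two-element vector. `1.xy` is rejected because `xy` accesses more than the first element.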