2 Lexical Conventions[Lex]

2.1 Unit of Translation[Lex.Translation]

The text of HLSL programs is collected in source and header files. The distinction between source and header files is social, not technical. An implementation constructs a translation unit from a single source file and any source or header files included via the #include preprocessing directive, which conforms to the ISO C preprocessor specification.

An implementation may implicitly include additional sources as required to expose the HLSL library functionality as defined in (13).

2.2 Phases of Translation[Lex.Phases]

HLSL inherits the phases of translation from ISO C++, with minor alterations, specifically the removal of support for trigraph and digraph sequences. The phases are described below.

  1. Source file characters are mapped to the basic source character set in an implementation-defined manner.

  2. Any sequence of backslash (\) immediately followed by a new line is deleted, resulting in splicing lines together.

  3. Tokenization occurs and comments are isolated. If a source file ends in a partial comment or preprocessor token the program is ill-formed and a diagnostic shall be issued. Each comment block shall be treated as a single white-space character.

  4. Preprocessing directives are executed, macros are expanded, pragma and other unary operator expressions are executed. Processing of #include directives results in all preceding steps being executed on the resolved file, and can continue recursively. Finally all preprocessing directives are removed from the source.

  5. Character and string literal specifiers are converted into the appropriate character set for the execution environment.

  6. Adjacent string literal tokens are concatenated.

  7. White-space is no longer significant. Syntactic and semantic analysis occurs, translating the whole translation unit into an implementation-defined representation.

  8. The translation unit is processed to determine required instantiations, the definitions of the required instantiations are located, and the translation and instantiation units are merged. The program is ill-formed if any required instantiation cannot be located or fails during instantiation.

  9. External references are resolved, library references linked, and all translation output is collected into a single output.
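As an illustration of phase 2 above, the following is a minimal C++ sketch of line splicing. The function name `splice_lines` is hypothetical, and treating a carriage return plus line feed pair as a single new line is an assumption of this sketch rather than something the text above specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Phase 2 sketch: delete every backslash that is immediately followed by a
// new line, together with that new line, so the two lines are joined.
// Assumption: "\r\n" counts as one new line; the text above does not say.
std::string splice_lines(std::string_view src) {
    std::string out;
    out.reserve(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == '\\' && i + 1 < src.size()) {
            if (src[i + 1] == '\n') {
                i += 1;  // skip the backslash; the loop increment skips '\n'
                continue;
            }
            if (src[i + 1] == '\r' && i + 2 < src.size() && src[i + 2] == '\n') {
                i += 2;  // skip backslash and '\r'; the increment skips '\n'
                continue;
            }
        }
        out.push_back(src[i]);
    }
    return out;
}
```

A backslash that is not followed by a new line, including one at the very end of the input, is left in place.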

2.3 Character Sets[Lex.CharSet]

The basic source character set is a subset of the ASCII character set. The table below lists the valid characters and their ASCII values:

Hex ASCII Value   Character Name                   Glyph or C Escape Sequence
0x09              Horizontal Tab                   \t
0x0A              Line Feed                        \n
0x0D              Carriage Return                  \r
0x20              Space
0x21              Exclamation Mark                 !
0x22              Quotation Mark                   "
0x23              Number Sign                      #
0x25              Percent Sign                     %
0x26              Ampersand                        &
0x27              Apostrophe                       '
0x28              Left Parenthesis                 (
0x29              Right Parenthesis                )
0x2A              Asterisk                         *
0x2B              Plus Sign                        +
0x2C              Comma                            ,
0x2D              Hyphen-Minus                     -
0x2E              Full Stop                        .
0x2F              Solidus                          /
0x30 .. 0x39      Digit Zero .. Nine               0 1 2 3 4 5 6 7 8 9
0x3A              Colon                            :
0x3B              Semicolon                        ;
0x3C              Less-than Sign                   <
0x3D              Equals Sign                      =
0x3E              Greater-than Sign                >
0x3F              Question Mark                    ?
0x41 .. 0x5A      Latin Capital Letter A .. Z      A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0x5B              Left Square Bracket              [
0x5C              Reverse Solidus                  \
0x5D              Right Square Bracket             ]
0x5E              Circumflex Accent                ^
0x5F              Underscore                       _
0x61 .. 0x7A      Latin Small Letter a .. z        a b c d e f g h i j k l m n o p q r s t u v w x y z
0x7B              Left Curly Bracket               {
0x7C              Vertical Line                    |
0x7D              Right Curly Bracket              }

An implementation may allow source files to be written in alternate extended character sets as long as that set is a superset of the basic character set. The translation character set is an extended character set or the basic character set as chosen by the implementation.
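The table above can be expressed as a handful of range checks. The following C++ sketch is illustrative only: the names `is_basic_source_char` and `first_non_basic_char` are hypothetical, and the ranges are taken directly from the rows as listed in this section.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// True when `c` appears in the basic source character set table above.
constexpr bool is_basic_source_char(unsigned char c) {
    return c == 0x09 || c == 0x0A || c == 0x0D  // tab, line feed, carriage return
        || (c >= 0x20 && c <= 0x23)             // space ! " #
        || (c >= 0x25 && c <= 0x3F)             // % through ?
        || (c >= 0x41 && c <= 0x5F)             // A through _
        || (c >= 0x61 && c <= 0x7D);            // a through }
}

// Index of the first character outside the basic set, or nullopt if every
// character belongs to it. An implementation that accepts an extended
// character set would use a wider predicate here.
std::optional<std::size_t> first_non_basic_char(std::string_view src) {
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (!is_basic_source_char(static_cast<unsigned char>(src[i]))) {
            return i;
        }
    }
    return std::nullopt;
}
```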

2.4 Preprocessing Tokens[Lex.PPTokens]

preprocessing-token:
header-name
identifier
pp-number
character-literal
string-literal
preprocessing-op-or-punc
each non-whitespace character from the translation character set that cannot be one of the above

Each preprocessing token that is converted to a token shall have the lexical form of a keyword, an identifier, a constant, a string literal or an operator or punctuator.

Preprocessing tokens are the minimal lexical elements of the language during translation phases 3 through 6 (2.2). Preprocessing tokens can be separated by whitespace in the form of comments, white space characters, or both. White space may appear within a preprocessing token only as part of a header name or between the quotation characters in a character constant or string literal.

Header name preprocessing tokens are only recognized within #include preprocessing directives, __has_include expressions, and implementation-defined locations within #pragma directives. In those contexts, a sequence of characters that could be either a header name or a string literal is recognized as a header name.

2.5 Tokens[Lex.Tokens]

token:
identifier
keyword
literal
operator-or-punctuator

There are five kinds of tokens: identifiers, keywords, literals, operators, and punctuators. All whitespace characters and comments are ignored except as they separate tokens.

2.6 Comments[Lex.Comments]

The characters /* start a comment which terminates with the characters */. The characters // start a comment which terminates at the next new line.
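The following C++ sketch shows one way the two comment forms could be recognized and, as translation phase 3 requires, each replaced by a single white-space character. The function name `replace_comments` is hypothetical. Copying string and character literals through unchanged is an assumption made so that comment introducers inside quotes are not misread; it is not a full tokenizer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Writes `src` to `out` with every comment replaced by one space.
// Returns false if the input ends inside an unterminated block comment,
// the case in which the program is ill-formed.
bool replace_comments(std::string_view src, std::string& out) {
    out.clear();
    std::size_t i = 0;
    while (i < src.size()) {
        const char c = src[i];
        if (c == '"' || c == '\'') {
            // Copy a quoted literal verbatim, honoring backslash escapes.
            out.push_back(src[i++]);
            while (i < src.size() && src[i] != c && src[i] != '\n') {
                if (src[i] == '\\' && i + 1 < src.size()) out.push_back(src[i++]);
                out.push_back(src[i++]);
            }
            if (i < src.size() && src[i] == c) out.push_back(src[i++]);
        } else if (c == '/' && i + 1 < src.size() && src[i + 1] == '*') {
            // Block comment: search for the terminator after the opener.
            const std::size_t end = src.find("*/", i + 2);
            if (end == std::string_view::npos) return false;
            out.push_back(' ');
            i = end + 2;
        } else if (c == '/' && i + 1 < src.size() && src[i + 1] == '/') {
            // Line comment: runs up to, but not including, the new line.
            const std::size_t end = src.find('\n', i + 2);
            out.push_back(' ');
            i = (end == std::string_view::npos) ? src.size() : end;
        } else {
            out.push_back(src[i++]);
        }
    }
    return true;
}
```

The new line that ends a line comment is kept in the output, since it still separates lines for later phases.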

2.7 Header Names[Lex.Headers]

header-name:
< h-char-sequence >
" q-char-sequence "

h-char-sequence:
h-char
h-char-sequence h-char

h-char:
any character in the translation character set except newline or >

q-char-sequence:
q-char
q-char-sequence q-char

q-char:
any character in the translation character set except newline or "

Character sequences in header names are mapped to header files or external source file names in an implementation-defined way.
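The header-name grammar above can be checked with a few comparisons. The C++ sketch below is illustrative; `header_name_chars` is a hypothetical helper that returns the h-char-sequence or q-char-sequence of a well-formed header name. How that sequence maps to a file is implementation-defined and is not modeled.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Returns the character sequence between the delimiters when `tok` has the
// lexical form of a header-name, otherwise nullopt.
std::optional<std::string_view> header_name_chars(std::string_view tok) {
    if (tok.size() < 3) return std::nullopt;  // delimiters plus at least one char
    char close = '\0';
    if (tok.front() == '<') {
        close = '>';
    } else if (tok.front() == '"') {
        close = '"';
    } else {
        return std::nullopt;
    }
    if (tok.back() != close) return std::nullopt;
    const std::string_view inner = tok.substr(1, tok.size() - 2);
    // h-char and q-char exclude new line and the closing delimiter.
    if (inner.find(close) != std::string_view::npos ||
        inner.find('\n') != std::string_view::npos) {
        return std::nullopt;
    }
    return inner;
}
```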

2.8 Preprocessing Numbers[Lex.PPNumber]

pp-number:
digit
. digit
pp-number digit
pp-number non-digit
pp-number e sign
pp-number E sign
pp-number p sign
pp-number P sign
pp-number .

Preprocessing numbers begin with a digit, or with a period (.) followed by a digit, and may be followed by valid identifier characters, periods, and signed exponent markers (e+, e-, E+, E-, p+, p-, P+, and P-). Preprocessing number tokens lexically include all integer-literal and floating-literal tokens.

Preprocessing numbers do not have types or values. Types and values are assigned to integer-literal, floating-literal, and vector-literal tokens on successful conversion from preprocessing numbers.

A preprocessing number cannot end in a period (.) if the immediate next token is a scalar-element-sequence (2.9.4). In this situation the pp-number token is truncated to end before the period.
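The pp-number grammar above can be read as a greedy scan. The C++ sketch below is a rough illustration: `scan_pp_number` is a hypothetical name, non-digit is assumed to mean a letter or underscore (it is not defined in this section), and the truncation rule for a trailing period before a scalar-element-sequence is deliberately not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

static bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Assumption: non-digit means an ASCII letter or underscore.
static bool is_nondigit(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

// Returns the length of the longest pp-number at the start of `s`,
// or 0 if `s` does not begin with one.
std::size_t scan_pp_number(std::string_view s) {
    std::size_t i = 0;
    if (!s.empty() && is_digit(s[0])) {
        i = 1;  // pp-number: digit
    } else if (s.size() >= 2 && s[0] == '.' && is_digit(s[1])) {
        i = 2;  // pp-number: . digit
    } else {
        return 0;
    }
    while (i < s.size()) {
        const char c = s[i];
        const bool exp = c == 'e' || c == 'E' || c == 'p' || c == 'P';
        if (exp && i + 1 < s.size() && (s[i + 1] == '+' || s[i + 1] == '-')) {
            i += 2;  // pp-number e sign, E sign, p sign, P sign
        } else if (is_digit(c) || is_nondigit(c) || c == '.') {
            i += 1;  // pp-number digit, non-digit, or period
        } else {
            break;
        }
    }
    return i;
}
```

The scan accepts many sequences that are not valid literals, which matches the statement that preprocessing numbers have no type or value until they are converted.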

2.9 Literals[Lex.Literals]

2.9.1 Literal Classifications[Lex.Literal.Kinds]

literal:
integer-literal
character-literal
floating-literal
string-literal
boolean-literal
vector-literal

2.9.2 Integer Literals[Lex.Literal.Int]

integer-literal:
decimal-literal integer-suffix_opt
octal-literal integer-suffix_opt
hexadecimal-literal integer-suffix_opt

decimal-literal:
nonzero-digit
decimal-literal digit

octal-literal:
0
octal-literal octal-digit

hexadecimal-literal:
0x hexadecimal-digit
0X hexadecimal-digit
hexadecimal-literal hexadecimal-digit

nonzero-digit: one of
1 2 3 4 5 6 7 8 9

octal-digit: one of
0 1 2 3 4 5 6 7

hexadecimal-digit: one of
0 1 2 3 4 5 6 7 8 9
a b c d e f
A B C D E F

integer-suffix:
unsigned-suffix long-suffix_opt
long-suffix unsigned-suffix_opt

unsigned-suffix: one of
u U

long-suffix: one of
l L

An integer literal is an optional base prefix, a sequence of digits in the appropriate base, and an optional type suffix. An integer literal shall not contain a period or exponent specifier.
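The structure described above (base prefix, digits, suffix) can be sketched in C++ as follows. The `IntegerLiteral` struct and `parse_integer_literal` function are hypothetical names for illustration. The optional ll and ull suffixes are not handled, and a value that does not fit in 64 bits is simply rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical result type for this sketch.
struct IntegerLiteral {
    int base;             // 8, 10, or 16
    std::uint64_t value;  // value of the digit sequence
    bool is_unsigned;     // u or U suffix present
    bool is_long;         // l or L suffix present
};

// Value of `c` as a hexadecimal digit, or -1 if it is not one.
static int digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

std::optional<IntegerLiteral> parse_integer_literal(std::string_view text) {
    if (text.empty()) return std::nullopt;
    IntegerLiteral lit{10, 0, false, false};
    std::size_t i = 0;
    if (text.size() >= 2 && text[0] == '0' && (text[1] == 'x' || text[1] == 'X')) {
        lit.base = 16;
        i = 2;  // digits start after the prefix
    } else if (text[0] == '0') {
        lit.base = 8;  // the leading 0 is itself an octal digit
    } else if (text[0] < '1' || text[0] > '9') {
        return std::nullopt;
    }
    const std::size_t first_digit = i;
    const auto base = static_cast<std::uint64_t>(lit.base);
    for (; i < text.size(); ++i) {
        const int d = digit_value(text[i]);
        if (d < 0 || d >= lit.base) break;
        const auto digit = static_cast<std::uint64_t>(d);
        if (lit.value > (UINT64_MAX - digit) / base) return std::nullopt;  // overflow
        lit.value = lit.value * base + digit;
    }
    if (i == first_digit) return std::nullopt;  // prefix with no digits
    // Suffix: at most one of u/U and one of l/L, in either order.
    for (; i < text.size(); ++i) {
        const char c = text[i];
        if ((c == 'u' || c == 'U') && !lit.is_unsigned) {
            lit.is_unsigned = true;
        } else if ((c == 'l' || c == 'L') && !lit.is_long) {
            lit.is_long = true;
        } else {
            return std::nullopt;  // period, exponent, or repeated suffix
        }
    }
    return lit;
}
```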

The type of an integer literal is the first of the corresponding list in the table below in which its value can be represented.

Suffix                   Decimal constant     Octal or hexadecimal constant
none                     int32_t, int64_t     int32_t, uint32_t, int64_t, uint64_t
u or U                   uint32_t, uint64_t   uint32_t, uint64_t
l or L                   int64_t              int64_t, uint64_t
Both u or U and l or L   uint64_t             uint64_t

If the specified value of an integer literal cannot be represented by any type in the corresponding list, the integer literal has no type and the program is ill-formed.
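The type-selection table above amounts to walking a short candidate list. The C++ sketch below illustrates this; the `IntType` enum and `integer_literal_type` function are hypothetical names, and the literal's value is assumed to have already been parsed into a 64-bit unsigned integer.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

enum class IntType { Int32, UInt32, Int64, UInt64 };

// Picks the first type in the applicable list that can represent `value`.
// `decimal` is true for decimal literals and false for octal or hexadecimal.
// Returns nullopt when no listed type fits, i.e. the program is ill-formed.
std::optional<IntType> integer_literal_type(std::uint64_t value, bool decimal,
                                            bool has_u, bool has_l) {
    const bool fits_i32 = value <= 0x7FFFFFFFull;
    const bool fits_u32 = value <= 0xFFFFFFFFull;
    const bool fits_i64 = value <= 0x7FFFFFFFFFFFFFFFull;

    if (has_u && has_l) return IntType::UInt64;
    if (has_u) return fits_u32 ? IntType::UInt32 : IntType::UInt64;
    if (has_l) {
        if (fits_i64) return IntType::Int64;
        if (!decimal) return IntType::UInt64;
        return std::nullopt;
    }
    // No suffix: unsigned candidates exist only for octal and hexadecimal.
    if (fits_i32) return IntType::Int32;
    if (!decimal && fits_u32) return IntType::UInt32;
    if (fits_i64) return IntType::Int64;
    if (!decimal) return IntType::UInt64;
    return std::nullopt;
}
```

For example, the value 0x80000000 selects int64_t when written in decimal but uint32_t when written in hexadecimal, because only the octal and hexadecimal list contains uint32_t.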

An implementation may support the integer suffixes ll and ull as equivalent to l and ul respectively.

2.9.3 Floating-point Literals[Lex.Literal.Float]

floating-literal:
fractional-constant exponent-part_opt floating-suffix_opt
digit-sequence exponent-part floating-suffix_opt

fractional-constant:
digit-sequence_opt . digit-sequence
digit-sequence .

exponent-part:
e sign_opt digit-sequence
E sign_opt digit-sequence

sign: one of
+ -

digit-sequence:
digit
digit-sequence digit

floating-suffix: one of
h f l H F L

A floating literal is written either as a fractional-constant with an optional exponent-part and optional floating-suffix, or as an integer digit-sequence with a required exponent-part and optional floating-suffix.

The type of a floating literal is float, unless explicitly specified by a suffix. The suffixes h and H specify half, the suffixes f and F specify float, and the suffixes l and L specify double. If a value specified in the source is not in the range of representable values for its type, the program is ill-formed.
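The grammar and suffix rules above can be combined into a small recognizer. The C++ sketch below is illustrative; `FloatType` and `classify_floating_literal` are hypothetical names. It checks only the lexical form and the suffix, and does not model the range check on the value.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

enum class FloatType { Half, Float, Double };

// Advances `i` past a run of decimal digits; true if at least one was seen.
static bool skip_digits(std::string_view s, std::size_t& i) {
    const std::size_t start = i;
    while (i < s.size() && s[i] >= '0' && s[i] <= '9') ++i;
    return i > start;
}

// Returns the literal's type if `s` matches floating-literal, else nullopt.
std::optional<FloatType> classify_floating_literal(std::string_view s) {
    std::size_t i = 0;
    const bool int_digits = skip_digits(s, i);
    bool fractional = false;
    if (i < s.size() && s[i] == '.') {
        ++i;
        const bool frac_digits = skip_digits(s, i);
        if (!int_digits && !frac_digits) return std::nullopt;  // a lone "."
        fractional = true;
    } else if (!int_digits) {
        return std::nullopt;
    }
    bool has_exponent = false;
    if (i < s.size() && (s[i] == 'e' || s[i] == 'E')) {
        ++i;
        if (i < s.size() && (s[i] == '+' || s[i] == '-')) ++i;
        if (!skip_digits(s, i)) return std::nullopt;  // exponent needs digits
        has_exponent = true;
    }
    // Without a period, the exponent-part is required.
    if (!fractional && !has_exponent) return std::nullopt;

    FloatType type = FloatType::Float;  // default when no suffix is present
    if (i < s.size()) {
        switch (s[i]) {
            case 'h': case 'H': type = FloatType::Half; ++i; break;
            case 'f': case 'F': type = FloatType::Float; ++i; break;
            case 'l': case 'L': type = FloatType::Double; ++i; break;
            default: break;
        }
    }
    if (i != s.size()) return std::nullopt;  // trailing characters
    return type;
}
```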

2.9.4 Vector Literals[Lex.Literal.Vector]

vector-literal:
integer-literal . scalar-element-sequence
floating-literal . scalar-element-sequence

scalar-element-sequence:
scalar-element-sequence-x
scalar-element-sequence-r

scalar-element-sequence-x:
x
scalar-element-sequence-x x

scalar-element-sequence-r:
r
scalar-element-sequence-r r

A vector-literal is an integer-literal or floating-literal followed by a period (.) and a scalar-element-sequence.

A scalar-element-sequence is a vector-swizzle-sequence where only the first vector element accessor is valid (x or r). A scalar-element-sequence is equivalent to a vector splat conversion performed on the integer-literal or floating-literal value (4.10).
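The splat behaviour described above can be sketched in C++ as follows. The names `scalar_element_count` and `expand_vector_literal` are hypothetical, the scalar is modeled as a double for simplicity, and no upper bound on the sequence length is enforced because this section does not state one.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Number of elements named by a scalar-element-sequence, or nullopt if
// `seq` is empty, uses an accessor other than x or r, or mixes the two.
std::optional<std::size_t> scalar_element_count(std::string_view seq) {
    if (seq.empty() || (seq[0] != 'x' && seq[0] != 'r')) return std::nullopt;
    for (const char c : seq) {
        if (c != seq[0]) return std::nullopt;
    }
    return seq.size();
}

// Models the vector splat: one copy of the scalar per accessor.
std::optional<std::vector<double>> expand_vector_literal(double scalar,
                                                         std::string_view seq) {
    const auto n = scalar_element_count(seq);
    if (!n) return std::nullopt;
    return std::vector<double>(*n, scalar);
}
```

Under these assumptions, a literal written as 1.5 followed by .rr expands to a two-element vector holding 1.5 in each element.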