reshape_map

Macro reshape_map 

Source
macro_rules! reshape_map {
    ($($any: tt)*) => { ... };
}
Expand description

Maps a multi-dimensional local index and thread layout into a reshaped global index. The user specifies the dimensions of the logical index (local + thread ids) and the target tensor dimensions, and the macro maps the logical index onto the corresponding element of the target multi-dimensional tensor. We have:

  • Logical index — e.g., (threadId, local_access_id) in CUDA.
  • Target tensor dimensions — e.g., a tensor of shape $(\hat{D}_0, \hat{D}_1, …, \hat{D}_{N+M-1})$

We want to map the logical linear or multi-dimensional index onto the corresponding tensor element, possibly in a flattened memory layout. To access the tensor properly, developers perform a linear2vec operation to convert thread_id to $tids[D_{N+M-1}][…][D_N]$ and local_access_id to $lids[D_{N-1}][…][D_0]$. The reshape_map! macro generates a MapReshape struct that reshapes the local thread ID and index to a shape, then linearizes the combination according to a specified layout or weights to obtain a global access index.
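The linear2vec step described above can be sketched in plain Rust (the function names here are illustrative assumptions, not the crate's API): decompose a linear id into per-dimension coordinates, low → high, and recompose it.

```rust
// Hypothetical sketch of linear2vec: split a linear id into per-dimension
// coordinates (low -> high) for dims D_0..D_{K-1}, and the inverse operation.
fn linear2vec(mut id: usize, dims: &[usize]) -> Vec<usize> {
    dims.iter()
        .map(|&d| {
            let c = id % d; // coordinate in this dimension
            id /= d;        // carry the rest to the next (higher) dimension
            c
        })
        .collect()
}

fn vec2linear(coords: &[usize], dims: &[usize]) -> usize {
    // id = sum_k coords[k] * prod_{j<k} dims[j]
    coords
        .iter()
        .zip(dims)
        .rev()
        .fold(0, |acc, (&c, &d)| acc * d + c)
}

fn main() {
    // thread id 5 in a [2, 2, 2] software-defined thread shape
    let v = linear2vec(5, &[2, 2, 2]);
    assert_eq!(v, vec![1, 0, 1]); // 5 = 1 + 0*2 + 1*4
    assert_eq!(vec2linear(&v, &[2, 2, 2]), 5);
    println!("{:?}", v);
}
```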

§Macro Signature

The macro takes dimension sizes, a layout permutation, and an optional offset that together specify the new layout.

gpu::reshape_map!(
    [($local_id_dim, $tensor_dim?),*]  // local id + corresponding tensor dimension
    |
    [($thread_id_dim, $tensor_dim?),*] // thread id + corresponding tensor dimension
    =>
    layout: [-?i1, -?i2, ...]          // permutation of dimensions in the new layout
    offset: $offset                // optional offset (default 0)
)
  • lid_tensor_Dims: array expression specifying the shape of the local index and the array shape corresponding to each local dimension:
    [(D_0, TD_0), (D_1, TD_1), ..., (D_{N-1}, TD_{N-1})]
  • tid_tensor_Dims: array expression specifying the shape of the thread index and the array shape corresponding to each thread dimension:
    $[(D_N, TD_N), …, (D_{N+M-1}, TD_{N+M-1})]$
  • When $TD_k$ is omitted, it is assumed to be $D_k$.
  • permutation: permutation of dimensions in the new layout:
    $[p_0, p_1, …, p_{N+M-1}]$ (low → high dimension)
    • $0 \le p_k < N$ for index dimensions
    • $N \le p_k < N+M$ for thread dimensions
    • Must be a valid constant permutation, e.g., [0, 1, 2, 3] or [i0, i1, t0, t1]
    • If omitted, defaults to $[0, 1, …, N+M-1]$
    • Negative index -k is allowed to indicate $TD_k - 1 - id_k$ instead of $id_k$
    • -0 is allowed
    • p_k can be a constant literal or a readable name such as t<k-N> or i<k> for thread and index dimensions, respectively
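As an illustrative check of the rules above: a layout is valid when its entries, ignoring sign, form a permutation of 0..N+M. A minimal sketch assuming plain integer entries (the macro itself validates this at compile time and also accepts the named t<k>/i<k> form; the helper name is hypothetical):

```rust
// Hypothetical validity check: entries must cover 0..ndims exactly once;
// a negative entry -k only flips the reversed flag for dimension k.
fn is_valid_layout(layout: &[i64], ndims: usize) -> bool {
    if layout.len() != ndims {
        return false;
    }
    let mut seen = vec![false; ndims];
    for &p in layout {
        // note: -0 is not representable in i64, so this sketch cannot model
        // reversing dimension 0; the macro handles -0 syntactically
        let k = p.unsigned_abs() as usize;
        if k >= ndims || seen[k] {
            return false; // index out of range or duplicate
        }
        seen[k] = true;
    }
    true
}

fn main() {
    assert!(is_valid_layout(&[2, 0, 1], 3));  // Example 4's layout
    assert!(is_valid_layout(&[0, -1, 2], 3)); // Example 5: -1 reverses dim 1
    assert!(!is_valid_layout(&[1, 2, 3], 3)); // index out of range
    assert!(!is_valid_layout(&[0, 0, 1], 3)); // duplicate indices
}
```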

§Behavior

  • Translates linear_thread_id and local_id to software-defined multi-dimensional thread IDs:
    $lid_k = (lid / \prod_{j=0}^{k-1} D_j) \bmod D_k$, so that $lid = \sum_{k=0}^{N-1} lid_k \prod_{j=0}^{k-1} D_j$
    $tid_k = (tid / \prod_{j=0}^{k-1} D_{N+j}) \bmod D_{N+k}$, giving $tid = (tid_0, tid_1, …, tid_{M-1})$ (low → high)
  • Merges thread IDs with index IDs:
    $id = (lid_0, …, lid_{N-1}, tid_0, …, tid_{M-1})$ (low → high). The logical index thus has dimension $idxDim = [D_0, D_1, …, D_{N+M-1}]$
  • Treats array as shape:
    $\hat{D} = [TD_0, TD_1, …, TD_{N+M-1}]$
  • If $TD_k \ne D_k$, some threads or indices are skipped.
    Valid access range:
    $0 \le id_k < \min(D_k, TD_k)$
  • Applies the reversed flag to each index: $id'_k$ = if reversed_k { $TD_k - 1 - id_k$ } else { $id_k$ }
  • Global ID: $globalid = \sum_{k=0}^{N+M-1} id'_{p_k} \prod_{j=0}^{k-1} TD_{p_j}$
  • Accesses the array in permuted order: $arr[id'_{p_{N+M-1}}]…[id'_{p_1}][id'_{p_0}]$
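The steps above can be reproduced as a runnable sketch (the function and parameter names are illustrative assumptions, not the crate's API): decompose lid and tid, merge them low → high, apply reversal, then linearize through the permutation with TD strides.

```rust
// Hypothetical sketch of the global-id computation described above.
fn global_id(
    lid: usize,
    tid: usize,
    d: &[usize],       // D_0..D_{N+M-1}: N local dims, then M thread dims
    td: &[usize],      // TD_0..TD_{N+M-1}
    n_local: usize,    // N
    perm: &[usize],    // p_0..p_{N+M-1} (low -> high)
    reversed: &[bool], // per-dimension reversal flags
) -> usize {
    // id = (lid_0, .., lid_{N-1}, tid_0, .., tid_{M-1})
    let mut id = Vec::with_capacity(d.len());
    let (mut l, mut t) = (lid, tid);
    for (k, &dk) in d.iter().enumerate() {
        let v = if k < n_local { &mut l } else { &mut t };
        id.push(*v % dk);
        *v /= dk;
    }
    // id'_k = TD_k - 1 - id_k when reversed_k is set
    let idp: Vec<usize> = id
        .iter()
        .zip(reversed)
        .enumerate()
        .map(|(k, (&i, &r))| if r { td[k] - 1 - i } else { i })
        .collect();
    // globalid = sum_k id'_{p_k} * prod_{j<k} TD_{p_j}
    let (mut g, mut stride) = (0, 1);
    for &p in perm {
        g += idp[p] * stride;
        stride *= td[p];
    }
    g
}

fn main() {
    // Example 3: [3] | [2, 2] with layout [i0, t1, t0] = [0, 2, 1]
    let (d, td) = ([3, 2, 2], [3, 2, 2]);
    // thread 1 (t0 = 1, t1 = 0) touches globals 6, 7, 8 per the example table
    let g: Vec<usize> = (0..3)
        .map(|l| global_id(l, 1, &d, &td, 1, &[0, 2, 1], &[false; 3]))
        .collect();
    assert_eq!(g, vec![6, 7, 8]);
    println!("{:?}", g);
}
```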

§Safety

  • Users cannot create MapReshape instances outside the macro without unsafe {}.
  • Sizes must be non-zero; a size of 0 triggers a runtime error and should be treated as a functional error.
    This does not violate the race-free guarantees.
  • Guarantees safe mapping for valid permutations to ensure race-free chunking.

§Examples

See more tests in chunk_scope::test_reshape_map.

§Example 1: No permutation

Similar to MapLinear(3) when num_thread = 4.

gpu::reshape_map!([3] | [4]  => layout: [0, 1]);

which is equivalent to

gpu::reshape_map!([3] | [4] => layout: [i0, t0]);
// local index shape: [3]
// thread shape: [4]
// Access: arr[tid0][idx0]
// access -> tid: [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]

§Example 2: Permutation swaps a thread id and local_id

Similar to MapLinear(1) when num_thread = 4, arr.len = 12.

gpu::reshape_map!([3] | [4] => layout: [1, 0]);

which is equivalent to

gpu::reshape_map!([3] | [4] => layout: [t0, i0]);
// local index shape: [3]
// thread shape: [4]
// array shape: arr[4][3]
// Access: arr[idx0][tid0]
// access -> tid: [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]

§Example 3: Software-defined thread dimension > 1

gpu::reshape_map!([3] | [2, 2] => layout: [0, 2, 1]);

which is equivalent to

gpu::reshape_map!([3] | [2, 2] => layout: [i0, t1, t0]);
// local index shape: [3]
// thread shape: [2, 2]
// Access: arr[tid0][tid1][idx0]
// access -> tid: [0, 0, 0, 2, 2, 2, 1, 1, 1, 3, 3, 3]

§Example 4: Swap tid and extra_id, thread dimension > 1

gpu::reshape_map!([3] | [2, 2] => layout: [2, 0, 1]);
gpu::reshape_map!([3] | [2, 2] => layout: [t1, i0, t0]);
// local index shape: [3]
// thread shape: [2, 2]
// array shape: arr[2][2][3]
// Access: arr[tid0][idx0][tid1]
// access -> tid: [0, 2, 0, 2, 0, 2, 1, 3, 1, 3, 1, 3]

§Example 5: Reverse a thread dimension

gpu::reshape_map!([3] | [2, 2] => layout: [0, -1, 2]);
gpu::reshape_map!([3] | [2, 2] => layout: [i0, -t0, t1]);
// local index shape: [3]
// thread shape: [2, 2]
// Access: arr[tid1][(max_tid0 - tid0)][idx0]

§Example 6: Skip some threads by setting a smaller new size

gpu::reshape_map!([3] | [2, (2, 1)] => layout: [0, 1, 2]);
gpu::reshape_map!([3] | [2, (2, 1)] => layout: [i0, t0, t1]);
// local index shape: [3]
// thread shape: [2, 2]
// array shape: arr[1][2][3]
// Access: arr[tid1][tid0][idx0]
// valid access range:
// 0 <= lid0 < 3
// 0 <= tid0 < 2
// 0 <= tid1 < 1
// access -> tid: [0, 0, 0, 1, 1, 1, _, _, _, _, _, _]

§Example 7: Skip some data by setting a larger size

gpu::reshape_map!([(3, 4)] | [2, 2] => layout: [0, 1, 2]);
gpu::reshape_map!([(3, 4)] | [2, 2] => layout: [i0, t0, t1]);
// local index shape: [3]
// thread shape: [2, 2]
// array shape: arr[2][2][4]
// Access: arr[tid1][tid0][idx0]
// access -> tid: [0, 0, 0, _, 1, 1, 1, _, 2, 2, 2, _, 3, 3, 3, _, ...]
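The skipping rule in Examples 6 and 7 can be sketched as follows (a hypothetical helper assuming the identity layout, not the crate's API). It builds the access → tid tables shown above, with None standing in for the skipped `_` slots:

```rust
// Hypothetical sketch: for every (tid, lid) pair, decompose low -> high over
// D, drop the pair if any coordinate falls outside its TD (a skipped access),
// and otherwise record which tid touches each global slot.
fn access_table(d: &[usize], td: &[usize], n_local: usize) -> Vec<Option<usize>> {
    let total: usize = td.iter().product();
    let mut table = vec![None; total];
    let n_lid: usize = d[..n_local].iter().product();
    let n_tid: usize = d[n_local..].iter().product();
    for tid in 0..n_tid {
        'next: for lid in 0..n_lid {
            let (mut l, mut t, mut g, mut stride) = (lid, tid, 0, 1);
            for (k, &dk) in d.iter().enumerate() {
                let v = if k < n_local { &mut l } else { &mut t };
                let id_k = *v % dk;
                *v /= dk;
                if id_k >= td[k] {
                    continue 'next; // outside the valid access range: skipped
                }
                g += id_k * stride;
                stride *= td[k];
            }
            table[g] = Some(tid);
        }
    }
    table
}

fn main() {
    // Example 7: [(3, 4)] | [2, 2], identity layout; -1 marks a skipped slot
    let table = access_table(&[3, 2, 2], &[4, 2, 2], 1);
    let shown: Vec<i64> = table.iter().map(|&s| s.map_or(-1, |t| t as i64)).collect();
    assert_eq!(shown, vec![0, 0, 0, -1, 1, 1, 1, -1, 2, 2, 2, -1, 3, 3, 3, -1]);
    println!("{:?}", shown);
}
```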

§Invalid Examples

§Example 1: Invalid permutation (index out of range)

gpu::reshape_map!([2] | [2, 3] => layout: [1, 2, 3]);

§Example 2: Invalid permutation (duplicate indices)

gpu::reshape_map!([2] | [2, 3] => layout: [0, 0, 1]);
gpu::reshape_map!([2] | [2, 3] => layout: [t0, t0, i0]);

§Example 3: Invalid thread dimension (<1)

gpu::reshape_map!([2, 3] | [] => layout: [0, 1, 2]);

§Example 4: Invalid index dimension (<1)

gpu::reshape_map!( [] | [2, 2] => layout: [0, 1]);