Accera v1.2 Reference

`accera.Plan.bind(mapping)`

Available only for targets that can execute a grid of work (such as GPUs). The `bind` function binds dimensions of the iteration space to axes of the target-specific grid (such as `v100.GridUnit.BLOCK_X`, `v100.GridUnit.THREAD_X`, or `v100.GridUnit.WARP_X` on an NVIDIA GPU).

Arguments

| argument | description | type/default |
| --- | --- | --- |
| `mapping` | Mapping of indices to GPU block, warp, or thread identifiers. | dict of `Index` to target-specific identifiers |

Examples

Bind the `i`, `j`, and `k` indices to the `BLOCK_X`, `THREAD_X`, and `THREAD_Y` grid axes of an NVIDIA V100, respectively.

# Assumes `import accera as acc` and an existing `plan` whose indices are i, j, k.
v100 = acc.Target(acc.Target.Model.NVIDIA_V100)
plan.bind({
    i: v100.GridUnit.BLOCK_X,
    j: v100.GridUnit.THREAD_X,
    k: v100.GridUnit.THREAD_Y
})
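
The effect of this binding can be pictured with a small plain-Python emulation. This is only a conceptual sketch, not Accera code: `run_bound` is a hypothetical helper, and it assumes unit-step indices so that each index value equals its grid coordinate. On a real GPU the grid coordinates exist concurrently; here they are enumerated serially.

```python
from itertools import product

def run_bound(shape, body):
    """Emulate a bound 3-D iteration space (hypothetical helper).

    Each (i, j, k) point is handled by exactly one
    (BLOCK_X, THREAD_X, THREAD_Y) grid coordinate.
    """
    n_i, n_j, n_k = shape
    # Enumerate every grid coordinate; a GPU would run these in parallel.
    for block_x, thread_x, thread_y in product(range(n_i), range(n_j), range(n_k)):
        # The binding: i -> BLOCK_X, j -> THREAD_X, k -> THREAD_Y.
        i, j, k = block_x, thread_x, thread_y
        body(i, j, k)

visited = []
run_bound((4, 8, 2), lambda i, j, k: visited.append((i, j, k)))
print(len(visited))  # 64
```

Every point of the 4x8x2 iteration space is visited exactly once, which is the property a binding preserves: the loops disappear, and the grid coordinates take over the role of the loop indices.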

In some cases, such as tensorization, where assigning individual threads to their data can be non-trivial, it may be simpler to bind iteration-space indices to warps (NVIDIA) or waves (AMD) in the x and y dimensions rather than to threads. This also raises the level of abstraction: instead of each thread computing independently, a group of threads (a warp) works collaboratively on a larger computational problem, as is common with warp-synchronous primitives such as CUDA's WMMA API. For example:

# Assumes `import accera as acc` and an existing `plan` whose indices are i, j, ii, jj.
v100 = acc.Target(acc.Target.Model.NVIDIA_V100)
plan.bind({
    i: v100.GridUnit.BLOCK_X,
    j: v100.GridUnit.BLOCK_Y,
    ii: v100.GridUnit.WARP_Y,
    jj: v100.GridUnit.WARP_X
})

In this case, a warp/wave of threads is assigned to each unique combination of (`ii`, `jj`) in the iteration space. The spatial arrangement of the warps over their data is defined by the ranges of these indices. For example, if both `ii` and `jj` have the range [0, 32) with a step size of 16, there are 4 warps in total (2 in the x-dimension and 2 in the y-dimension) covering a 32x32 data region.
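
The warp count in this example can be checked with a few lines of plain Python. This is a sketch of the arithmetic only, not Accera code: `warps_along` is a hypothetical helper, and the warp size of 32 threads is an assumption that holds for NVIDIA GPUs.

```python
def warps_along(begin, end, step):
    """Number of distinct index values in [begin, end) for the given step."""
    return len(range(begin, end, step))

WARP_SIZE = 32  # assumption: threads per warp on NVIDIA GPUs

warps_y = warps_along(0, 32, 16)  # ii is bound to WARP_Y
warps_x = warps_along(0, 32, 16)  # jj is bound to WARP_X
total_warps = warps_x * warps_y
print(warps_x, warps_y, total_warps)  # 2 2 4

# Origin of the 16x16 tile handled by each warp, as (ii, jj).
tiles = [(ii, jj) for ii in range(0, 32, 16) for jj in range(0, 32, 16)]
print(tiles)  # [(0, 0), (0, 16), (16, 0), (16, 16)]

# Threads per block implied by this warp layout.
print(total_warps * WARP_SIZE)  # 128
```

Each of the four warps owns one 16x16 tile, and together the tiles cover the 32x32 region without overlap.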

Last update: 2023-04-17