Module cg
- Block
- Grid
- ReduxAdd
- ReduxAnd
- ReduxMax
- ReduxMin
- ReduxOr
- ReduxXor
- Thread
- ThreadWarpTile
  Similar to a thread block tile in a GPU kernel, but with SIZE <= warp size (e.g., 32 for NVIDIA GPUs).
  The stride controls how a tile's lanes are spread across the warp: stride = 1 gives contiguous lanes, while a larger stride interleaves them.
  If SIZE = 8 and stride = 4, the clusters are:
  [0, 4, 8, 12, 16, 20, 24, 28]
  [1, 5, 9, 13, 17, 21, 25, 29]
  [2, 6, 10, 14, 18, 22, 26, 30]
  [3, 7, 11, 15, 19, 23, 27, 31]
  If SIZE = 8 and stride = 1, the clusters are:
  [0, 1, 2, 3, 4, 5, 6, 7]
  [8, 9, 10, 11, 12, 13, 14, 15]
  [16, 17, 18, 19, 20, 21, 22, 23]
  [24, 25, 26, 27, 28, 29, 30, 31]
- CGOperations
- WarpReduceOp
- _redux_sync
- _shuffle
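The two ThreadWarpTile layouts above can be reproduced with a small host-side Rust sketch. It assumes a 32-lane warp and a general lane-to-cluster rule inferred only from those two examples: lane `l` goes to cluster `(l / (SIZE * stride)) * stride + l % stride`, at rank `(l / stride) % SIZE` within that cluster. `tile_position` and `clusters` are hypothetical helpers for illustration, not part of this module's API.

```rust
/// Assumed warp width (32 lanes, as on NVIDIA GPUs).
const WARP_SIZE: usize = 32;

/// Hypothetical helper: returns (cluster index, rank within cluster) for `lane`.
/// ASSUMPTION: lanes are interleaved by `lane % stride` inside each block of
/// `size * stride` consecutive lanes; this rule is inferred from the two documented examples.
fn tile_position(lane: usize, size: usize, stride: usize) -> (usize, usize) {
    let span = size * stride; // lanes shared by `stride` interleaved clusters
    let cluster = (lane / span) * stride + lane % stride;
    let rank = (lane / stride) % size;
    (cluster, rank)
}

/// Hypothetical helper: lists every cluster as its lane ids, ordered by rank.
fn clusters(size: usize, stride: usize) -> Vec<Vec<usize>> {
    // Only layouts that tile the warp exactly are supported by this sketch.
    assert!(size <= WARP_SIZE && WARP_SIZE % (size * stride) == 0, "unsupported layout");
    let mut out = vec![vec![0; size]; WARP_SIZE / size];
    for lane in 0..WARP_SIZE {
        let (c, r) = tile_position(lane, size, stride);
        out[c][r] = lane;
    }
    out
}

fn main() {
    for stride in [4, 1] {
        println!("SIZE = 8, stride = {stride}:");
        for cluster in clusters(8, stride) {
            println!("{:?}", cluster);
        }
    }
}
```

Running `main` prints the same eight rows listed for ThreadWarpTile. Other SIZE/stride combinations follow the assumed rule, which may differ from what the module actually does.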