pub struct ThreadWarpTile<const SIZE: usize = 32, const STRIDE: usize = 1>;
Similar to a thread block tile in a GPU kernel, but SIZE must be <= the warp size (e.g., 32 for NVIDIA GPUs).
If SIZE = 8 and STRIDE = 4, the clusters are:
[0, 4, 8, 12, 16, 20, 24, 28]
[1, 5, 9, 13, 17, 21, 25, 29]
[2, 6, 10, 14, 18, 22, 26, 30]
[3, 7, 11, 15, 19, 23, 27, 31]
If SIZE = 8 and STRIDE = 1, the clusters are:
[0, 1, 2, 3, 4, 5, 6, 7]
[8, 9, 10, 11, 12, 13, 14, 15]
[16, 17, 18, 19, 20, 21, 22, 23]
[24, 25, 26, 27, 28, 29, 30, 31]
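The lane grouping in the two SIZE = 8 examples can be modelled on the host. The sketch below is only an illustration: `tile_lanes` is a hypothetical helper, not part of the crate, and its indexing rule is an assumption chosen to reproduce the two listed cases. How the crate groups lanes for other SIZE/STRIDE combinations is not stated here.

```rust
/// Hypothetical helper (not part of the crate): lists the lanes of one warp
/// that fall into each tile. Assumes SIZE * STRIDE divides the warp size.
fn tile_lanes(size: u32, stride: u32, warp_size: u32) -> Vec<Vec<u32>> {
    assert!(size > 0 && stride > 0, "size and stride must be non-zero");
    // One "block" holds `stride` interleaved tiles of `size` lanes each.
    let block = size * stride;
    assert!(warp_size % block == 0, "size * stride must divide the warp size");

    let num_tiles = warp_size / size;
    let mut tiles = vec![Vec::new(); num_tiles as usize];
    for lane in 0..warp_size {
        // Assumed rule: lanes that share `lane % stride` inside the same
        // block belong to the same tile.
        let tile = (lane / block) * stride + lane % stride;
        tiles[tile as usize].push(lane);
    }
    tiles
}
```

Under that assumed rule, `tile_lanes(8, 4, 32)` yields the four strided clusters listed above and `tile_lanes(8, 1, 32)` yields the four contiguous ones.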
Implementations
impl<const SIZE: usize, const STRIDE: usize> ThreadWarpTile<SIZE, STRIDE>
let warp = gpu::cg::ThreadWarpTile::<16>;
let size = warp.size();
ⓘlet warp = gpu::cg::ThreadWarpTile::<3>;
let size = warp.size();
pub const CHECKED_SIZE: u32
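The two examples above differ only in SIZE (16 versus 3), and the second is flagged ⓘ in the rendered docs. One way a constant like CHECKED_SIZE could reject a bad SIZE at compile time is an assertion inside an associated const. The sketch below is a guess at the pattern, not the crate's code: `CheckedTile` is a hypothetical stand-in, and the rule "power of two, at most 32" is an assumption.

```rust
/// Hypothetical stand-in for the real tile type; not the crate's definition.
pub struct CheckedTile<const SIZE: usize>;

impl<const SIZE: usize> CheckedTile<SIZE> {
    /// Evaluated at compile time for each SIZE that is actually used.
    /// Assumed rule: SIZE is a power of two and no larger than 32.
    pub const CHECKED_SIZE: u32 = {
        assert!(
            SIZE.is_power_of_two() && SIZE <= 32,
            "SIZE must be a power of two no larger than 32"
        );
        SIZE as u32
    };

    /// Reading the constant forces the check for this particular SIZE.
    pub const fn size(&self) -> u32 {
        Self::CHECKED_SIZE
    }
}
```

With this pattern, `CheckedTile::<16>.size()` compiles and returns 16, while calling `size()` on `CheckedTile::<3>` fails to build because the assertion fails during constant evaluation.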
impl<const SIZE: usize> ThreadWarpTile<SIZE, 1>
Implements a simple flexible warp tile with STRIDE = 1.
pub const BASE_THREAD_MASK: u32
pub const LANE_MASK: u32
pub const SHIFT_COUNT: u32
pub const fn size(&self) -> u32
pub fn meta_group_size(&self) -> u32
pub fn subgroup_id(&self) -> u32
pub fn thread_mask(&self) -> u32
E.g., when SIZE = 8:
lane_id -> mask
0 -> 0xff
1 -> 0xff
8 -> 0xff00
9 -> 0xff00
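The lane-to-mask mapping above can be reproduced with a few bit operations. The following is a minimal host-side sketch under stated assumptions: `tile_thread_mask` is a hypothetical free function, SIZE is assumed to be a power of two no larger than 32, and the tiles are contiguous (STRIDE = 1). How the crate combines BASE_THREAD_MASK, LANE_MASK and SHIFT_COUNT internally is not shown in these docs.

```rust
/// Hypothetical helper (not part of the crate): mask of all lanes that share
/// a contiguous tile with `lane_id`.
fn tile_thread_mask(size: u32, lane_id: u32) -> u32 {
    // Assumptions: power-of-two tile size, 32-lane warp.
    assert!(size.is_power_of_two() && size <= 32);
    assert!(lane_id < 32);

    // `size` low bits set; special-cased because `1 << 32` would overflow.
    let base = if size == 32 { u32::MAX } else { (1u32 << size) - 1 };
    // First lane of the tile that contains `lane_id`.
    let first_lane = lane_id & !(size - 1);
    base << first_lane
}
```

For SIZE = 8 this gives 0xff for lanes 0 and 1, and 0xff00 for lanes 8 and 9, matching the table above.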
pub fn nvcc_redux_sync<Op: NvvmReduxSyncKind<T>, T>(&self, _op: Op, value: T) -> T
impl<const SIZE: usize, const STRIDE: usize> ThreadWarpTile<SIZE, STRIDE>
pub fn _subgroup_reduce<T>(_value: T, _op: &'static str) -> T
Reduce by hardware-defined warp.
For now, only the i32 and u32 types are supported.
pub fn subgroup_reduce<Op, T>(self, _op: Op, value: T) -> T
where
    Op: SubGroupReduceKind<T>,
Reduce by software-defined warp.
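To make the idea of a per-tile reduction concrete, here is a host-side model in which every lane ends up holding the reduced value of its own tile. Everything in it is an assumption made for illustration: `tile_reduce_sum` is a hypothetical function, the operation is fixed to a wrapping sum, tiles are contiguous (STRIDE = 1), and the XOR butterfly is a common way to structure such reductions rather than something these docs confirm the crate uses.

```rust
/// Hypothetical host-side model (not the crate's implementation): returns,
/// for each of the 32 lanes, the sum of the values held by its tile.
fn tile_reduce_sum(values: [u32; 32], size: usize) -> [u32; 32] {
    // Assumptions: power-of-two tile size, 32-lane warp, contiguous tiles.
    assert!(size.is_power_of_two() && size <= 32);

    let mut cur = values;
    let mut offset = size / 2;
    while offset > 0 {
        // Snapshot the previous round so all lanes read consistent inputs.
        let prev = cur;
        for lane in 0..32 {
            // XOR with an offset smaller than `size` stays inside the tile.
            cur[lane] = prev[lane].wrapping_add(prev[lane ^ offset]);
        }
        offset /= 2;
    }
    cur
}
```

With lane `i` holding the value `i` and `size = 8`, lanes 0 to 7 all end up with 28 and lanes 8 to 15 with 92, so each tile reduces independently of its neighbours.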
Trait Implementations
impl<const SIZE: usize> BuildChunkScope<Thread> for ThreadWarpTile<SIZE>
Warp -> Thread
type CS = Warp2ThreadScope<SIZE>
fn build_chunk_scope(&self, _to: Thread) -> Warp2ThreadScope<SIZE>
impl<const SIZE: usize> BuildChunkScope<ThreadWarpTile<SIZE>> for Block
type CS = Block2WarpScope<SIZE>
fn build_chunk_scope(&self, _to: ThreadWarpTile<SIZE>) -> Block2WarpScope<SIZE>
impl<const SIZE: usize> BuildChunkScope<ThreadWarpTile<SIZE>> for Grid
type CS = Grid2WarpScope<SIZE>
fn build_chunk_scope(&self, _to: ThreadWarpTile<SIZE>) -> Grid2WarpScope<SIZE>
impl<const SIZE: usize> CGOperations for ThreadWarpTile<SIZE>
fn thread_rank(&self) -> u32
impl<const SIZE: usize, const STRIDE: usize> Clone for ThreadWarpTile<SIZE, STRIDE>
fn clone(&self) -> ThreadWarpTile<SIZE, STRIDE>
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl<const SIZE: usize, T, Op> WarpReduceOp<T, Op> for ThreadWarpTile<SIZE, 1>
where
    Op: NvvmReduxSyncKind<T> + ReduxKind,
Ideally, this impl would use SubGroupReduceKind, but MLIR support is limited.
Users who need SubGroupReduceKind can call subgroup_reduce directly.