Struct ThreadWarpTile 

Source
pub struct ThreadWarpTile<const SIZE: usize = 32, const STRIDE: usize = 1>;

Similar to a thread block tile in a GPU kernel, but SIZE must be <= the warp size (e.g., 32 on NVIDIA GPUs).

If SIZE = 8 and STRIDE = 4, the clusters are:
    [0, 4, 8, 12, 16, 20, 24, 28]
    [1, 5, 9, 13, 17, 21, 25, 29]
    [2, 6, 10, 14, 18, 22, 26, 30]
    [3, 7, 11, 15, 19, 23, 27, 31]

If SIZE = 8 and STRIDE = 1, the clusters are:
    [0, 1, 2, 3, 4, 5, 6, 7]
    [8, 9, 10, 11, 12, 13, 14, 15]
    [16, 17, 18, 19, 20, 21, 22, 23]
    [24, 25, 26, 27, 28, 29, 30, 31]
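The two layouts above can be reproduced with a small host-side sketch. The lane-to-cluster mapping below is an assumption chosen because it matches both documented examples; it is not taken from this crate's source. The names WARP_SIZE, cluster_of, and clusters are hypothetical.

```rust
/// Assumed warp width (32 lanes, as on NVIDIA GPUs).
const WARP_SIZE: usize = 32;

/// Hypothetical mapping from a lane index to its cluster index.
/// Each run of `size * stride` consecutive lanes holds `stride`
/// interleaved clusters of `size` lanes each.
fn cluster_of(lane: usize, size: usize, stride: usize) -> usize {
    let span = size * stride;
    (lane / span) * stride + lane % stride
}

/// Lists the lanes of every cluster in one warp, in ascending lane order.
fn clusters(size: usize, stride: usize) -> Vec<Vec<usize>> {
    let mut out = vec![Vec::new(); WARP_SIZE / size];
    for lane in 0..WARP_SIZE {
        out[cluster_of(lane, size, stride)].push(lane);
    }
    out
}
```

Under this assumption, clusters(8, 4) yields the four strided clusters and clusters(8, 1) yields the four contiguous ones.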

Implementations§

Source§

impl<const SIZE: usize, const STRIDE: usize> ThreadWarpTile<SIZE, STRIDE>

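// SIZE = 16 (STRIDE defaults to 1)
let warp = gpu::cg::ThreadWarpTile::<16>;
let size = warp.size();

// SIZE = 3 is not a power of two and does not divide a 32-lane warp evenly
let warp = gpu::cg::ThreadWarpTile::<3>;
let size = warp.size();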
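The description only states that SIZE must not exceed the warp size. As a sketch of how such a bound could be checked at compile time with const generics, the stand-in type Tile below (hypothetical, not this crate's ThreadWarpTile) rejects invalid sizes when size() is instantiated. The power-of-two requirement is an extra assumption, not something the source confirms.

```rust
/// Hypothetical stand-in for a warp tile; not the crate's type.
#[derive(Clone, Copy)]
struct Tile<const SIZE: usize>;

impl<const SIZE: usize> Tile<SIZE> {
    /// Evaluated at compile time for each SIZE that is actually used.
    /// Assumption: a valid tile is a power of two no larger than 32.
    const VALID: () = assert!(SIZE > 0 && SIZE <= 32 && SIZE.is_power_of_two());

    /// Returns the tile size; referencing VALID forces the check.
    const fn size(&self) -> u32 {
        let _ = Self::VALID;
        SIZE as u32
    }
}
```

With this sketch, Tile::<16> reports a size of 16, while calling size() on Tile::<3> would be rejected at compile time.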
Source§

impl<const SIZE: usize> ThreadWarpTile<SIZE, 1>

Implements a simple flexible warp tile with STRIDE = 1.

Source

pub const BASE_THREAD_MASK: u32

Source

pub const LANE_MASK: u32

Source

pub const SHIFT_COUNT: u32

Source

pub const fn size(&self) -> u32

Source

pub fn meta_group_size(&self) -> u32

Source

pub fn subgroup_id(&self) -> u32

Source

pub fn thread_mask(&self) -> u32

E.g., when SIZE = 8, lane_id -> mask:
    0 -> 0xff
    1 -> 0xff
    8 -> 0xff00
    9 -> 0xff00
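The mapping above is consistent with shifting a SIZE-bit mask to the first lane of the caller's tile. The following is a minimal host-side sketch under that assumption (STRIDE = 1 only); base_mask and thread_mask are hypothetical free functions, and nothing here is taken from the crate's implementation or its BASE_THREAD_MASK constant.

```rust
/// Mask with the low `size` bits set (hypothetical helper).
/// `size == 32` is handled separately because `1u32 << 32` would overflow.
fn base_mask(size: u32) -> u32 {
    if size >= 32 { u32::MAX } else { (1u32 << size) - 1 }
}

/// Mask of all lanes in the contiguous tile that contains `lane_id`.
/// Assumes `lane_id < 32` and that `size` is a power of two <= 32.
fn thread_mask(lane_id: u32, size: u32) -> u32 {
    // Round the lane index down to the first lane of its tile.
    let first_lane = lane_id / size * size;
    base_mask(size) << first_lane
}
```

For SIZE = 8 this gives 0xff for lanes 0 to 7 and 0xff00 for lanes 8 to 15, matching the documented values.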

Source

pub fn nvcc_redux_sync<Op: NvvmReduxSyncKind<T>, T>(&self, _op: Op, value: T) -> T

Source§

impl<const SIZE: usize, const STRIDE: usize> ThreadWarpTile<SIZE, STRIDE>

Source

pub fn _subgroup_reduce<T>(_value: T, _op: &'static str) -> T

Reduces across the hardware-defined warp. Currently only i32 and u32 are supported.

Source

pub fn subgroup_reduce<Op, T>(self, _op: Op, value: T) -> T
where Op: SubGroupReduceKind<T>,

Reduces across the software-defined warp tile.
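The source does not say how the reduction is carried out. One common technique for reducing within a tile of SIZE lanes is a butterfly of XOR exchanges, which the host-side simulation below illustrates for a sum over a contiguous tile (STRIDE = 1). The function tile_reduce_sum is hypothetical and models the lanes as array slots; it is a sketch of the general idea, not this crate's implementation.

```rust
/// Simulates a butterfly sum reduction over one 32-lane warp.
/// Every lane ends up holding the sum of the `size` lanes in its tile.
/// Assumes `size` is a power of two <= 32.
fn tile_reduce_sum(values: [i32; 32], size: usize) -> [i32; 32] {
    let mut cur = values;
    let mut offset = size / 2;
    while offset > 0 {
        // Snapshot the previous round so every lane reads pre-update values,
        // as lanes exchanging registers in lockstep would.
        let prev = cur;
        for lane in 0..32 {
            // XOR with an offset below `size` never leaves the aligned tile.
            cur[lane] = prev[lane].wrapping_add(prev[lane ^ offset]);
        }
        offset /= 2;
    }
    cur
}
```

With lane i holding the value i and size = 8, lanes 0 to 7 all end with 28 and lanes 8 to 15 all end with 92.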

Trait Implementations§

Source§

impl<const SIZE: usize> BuildChunkScope<Thread> for ThreadWarpTile<SIZE>

Warp -> Thread

Source§

impl<const SIZE: usize> BuildChunkScope<ThreadWarpTile<SIZE>> for Block

Source§

impl<const SIZE: usize> BuildChunkScope<ThreadWarpTile<SIZE>> for Grid

Source§

impl<const SIZE: usize> CGOperations for ThreadWarpTile<SIZE>

Source§

impl<const SIZE: usize, const STRIDE: usize> Clone for ThreadWarpTile<SIZE, STRIDE>

Source§

fn clone(&self) -> ThreadWarpTile<SIZE, STRIDE>

Returns a duplicate of the value.
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.
Source§

impl<const SIZE: usize, T, Op> WarpReduceOp<T, Op> for ThreadWarpTile<SIZE, 1>
where Op: NvvmReduxSyncKind<T> + ReduxKind,

Ideally, this impl would be bounded on SubGroupReduceKind, but MLIR support is limited. Users who need it can call subgroup_reduce directly.

Source§

fn redux(&self, op: Op, value: T) -> T

Source§

impl<const SIZE: usize> WarpReduceOp<f32, ReduxAdd> for ThreadWarpTile<SIZE>

Source§

fn redux(&self, _op: ReduxAdd, value: f32) -> f32

Source§

impl<const SIZE: usize> WarpReduceOp<f32, ReduxMax> for ThreadWarpTile<SIZE>

Source§

fn redux(&self, _op: ReduxMax, value: f32) -> f32

Source§

impl<const SIZE: usize, const STRIDE: usize> Copy for ThreadWarpTile<SIZE, STRIDE>

Auto Trait Implementations§

§

impl<const SIZE: usize, const STRIDE: usize> Freeze for ThreadWarpTile<SIZE, STRIDE>

§

impl<const SIZE: usize, const STRIDE: usize> RefUnwindSafe for ThreadWarpTile<SIZE, STRIDE>

§

impl<const SIZE: usize, const STRIDE: usize> Send for ThreadWarpTile<SIZE, STRIDE>

§

impl<const SIZE: usize, const STRIDE: usize> Sync for ThreadWarpTile<SIZE, STRIDE>

§

impl<const SIZE: usize, const STRIDE: usize> Unpin for ThreadWarpTile<SIZE, STRIDE>

§

impl<const SIZE: usize, const STRIDE: usize> UnwindSafe for ThreadWarpTile<SIZE, STRIDE>

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T> HostToDev<T> for T

Source§

fn convert(self) -> T

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.