ChunkScope

Trait ChunkScope 

pub trait ChunkScope: PrivateTraitGuard + Clone {
    type FromScope: SyncScope;
    type ToScope: SyncScope;

    // Required methods
    fn thread_ids() -> [u32; 6];
    fn global_dim<D: DimType>() -> u32;
    fn global_id<D: DimType>(thread_ids: [u32; 6]) -> u32;

    // Provided methods
    fn global_id_x(thread_ids: [u32; 6]) -> u32 { ... }
    fn global_id_y(thread_ids: [u32; 6]) -> u32 { ... }
    fn global_id_z(thread_ids: [u32; 6]) -> u32 { ... }
    fn global_dim_x() -> u32 { ... }
    fn global_dim_y() -> u32 { ... }
    fn global_dim_z() -> u32 { ... }
}

This trait describes a chunking scope: a transition from a larger scope to a smaller one along the hierarchy Grid -> Cluster -> Block -> Warp -> Thread.

§Primitive Chunk Scopes

Cluster is mostly left out for now because hardware support is limited. We currently consider the following six scope transitions:

  • Grid2Block: chunking from Grid scope to Block scope.
  • Block2Thread: chunking from Block scope to Thread scope.
  • Grid2Thread: chunking from Grid scope to Thread scope.
  • Grid2Warp: chunking from Grid scope to Warp scope.
  • Block2Warp: chunking from Block scope to Warp scope.
  • Warp2Thread: chunking from Warp scope to Thread scope.

§Chained Scope and Chained Map

In addition to primitive chunk scopes, we provide ChainedScope, which allows chaining two scopes together. ChainedScope is always used together with ChainedMap to chain two mapping strategies.

For example, Grid2Block + Block2Thread is similar to Grid2Thread but differs slightly in usage.

§When should I use Primitive Scopes instead of Chained Scopes?

Chained scopes and mappings should not be overused.

You do not always need to chunk from a scope to the one directly below it. For example, use Grid2Warp directly instead of Grid2Block + Block2Warp if the intermediate chunking strategy does not matter much.

This is especially true when access patterns switch between scopes, e.g., from grid to warp for read and then from warp to thread for write.

§When should I use Chained Scopes?

ChainedScope is useful for:

  • Removing some threads from work, and
  • Applying different mapping strategies at different levels.

§Examples

§Direct chunk: Grid -> Thread

The following example is similar to chunk_mut(…) with MapLinear.

use gpu::MapLinear;
use gpu::chunk_scope::{build_chunk_scope, Grid, Thread};

fn kernel(input: gpu::GpuGlobal<[f32]>) {
    // Chunk directly from Grid scope to Thread scope.
    let g2t = build_chunk_scope(Grid, Thread);
    let _ = input.chunk_to_scope(g2t, MapLinear::new(2));
}

§Flexible chunking: Grid -> Warp -> Thread

With Grid2Warp + Warp2Thread, only lane 0 of each warp may access elements.

use gpu::MapLinear;
use gpu::chunk_scope::{build_chunk_scope, Grid, Thread, ThreadWarpTile};

fn kernel(input: gpu::GpuGlobal<[f32]>) {
    let warp = ThreadWarpTile::<32>;
    let g2w = build_chunk_scope(Grid, warp);
    let w2t = build_chunk_scope(warp, Thread);
    input
        .chunk_to_scope(g2w, MapLinear::new(2))
        .chunk_to_scope(w2t, MapLinear::new(2));
}
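
One way to read the example above, assuming MapLinear::new(2) hands each unit two contiguous elements (an assumption, not confirmed by this page): each warp receives two elements, and lane 0's own two-element chunk already covers them, so lanes 1 to 31 are left with empty ranges. The CPU-side sketch below reproduces that arithmetic. The helpers linear_chunk and grid_warp_thread are hypothetical and are not part of the crate.

```rust
use std::ops::Range;

/// Hypothetical linear map (assumption): unit `id` owns `width` contiguous
/// elements of `parent`, clipped to the parent's bounds.
fn linear_chunk(parent: Range<usize>, id: usize, width: usize) -> Range<usize> {
    let start = (parent.start + id * width).min(parent.end);
    let end = (start + width).min(parent.end);
    start..end
}

/// Two-step chunking on the CPU: Grid -> Warp, then Warp -> Thread.
/// Returns the index range that lane `lane` of warp `warp_id` would own.
fn grid_warp_thread(
    len: usize,
    warp_id: usize,
    lane: usize,
    warp_width: usize,
    lane_width: usize,
) -> Range<usize> {
    // Step 1: the warp's share of the whole buffer.
    let warp_chunk = linear_chunk(0..len, warp_id, warp_width);
    // Step 2: the lane's share of the warp's chunk; may be empty.
    linear_chunk(warp_chunk, lane, lane_width)
}
```

Under these assumptions, grid_warp_thread(64, 3, 0, 2, 2) yields 6..8, while every other lane of warp 3 gets an empty range. Giving the warp a wider chunk than the lane, for example a warp width of 64 with a lane width of 2, would keep all 32 lanes busy.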

§Invalid scope transitions will be rejected

The chunk_to_scope API checks scope transitions at compile time: the FromScope of each chunk scope must match the scope produced by the previous step.

use gpu::MapLinear;
use gpu::chunk_scope::{build_chunk_scope, Grid, Block, Thread, ThreadWarpTile};

fn kernel(input: gpu::GpuGlobal<[f32]>) {
    let warp = ThreadWarpTile::<32>;
    let g2w = build_chunk_scope(Grid, warp);
    let b2t = build_chunk_scope(Block, Thread);
    // This should not compile, as the scope transition is invalid.
    // Type mismatch resolving `<Block2ThreadScope as ChunkScope>::FromScope == ThreadWarpTile`
    input.chunk_to_scope(g2w, MapLinear::new(2))
         .chunk_to_scope(b2t, MapLinear::new(2));
}
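
The rejection above can be modeled with an associated-type equality bound. The following is a minimal, self-contained sketch of that pattern and not the crate's code: View, Transition, Scope, and the marker types are hypothetical stand-ins that mirror only the FromScope and ToScope associated types of ChunkScope.

```rust
use std::marker::PhantomData;

// Hypothetical scope markers (assumption: stand-ins for the real scope types).
trait Scope { const NAME: &'static str; }
struct Grid;
struct Block;
struct Warp;
struct Thread;
impl Scope for Grid { const NAME: &'static str = "Grid"; }
impl Scope for Block { const NAME: &'static str = "Block"; }
impl Scope for Warp { const NAME: &'static str = "Warp"; }
impl Scope for Thread { const NAME: &'static str = "Thread"; }

/// Simplified stand-in for ChunkScope: only the two associated types.
trait Transition {
    type FromScope: Scope;
    type ToScope: Scope;
}
struct Grid2Warp;
struct Warp2Thread;
struct Block2Thread;
impl Transition for Grid2Warp { type FromScope = Grid; type ToScope = Warp; }
impl Transition for Warp2Thread { type FromScope = Warp; type ToScope = Thread; }
impl Transition for Block2Thread { type FromScope = Block; type ToScope = Thread; }

/// A view tagged with the scope it currently belongs to.
struct View<S: Scope> { _scope: PhantomData<S> }

impl<S: Scope> View<S> {
    fn new() -> Self { View { _scope: PhantomData } }

    /// Only accepts a transition whose FromScope equals the current scope S.
    fn chunk_to_scope<T>(self, _transition: T) -> View<T::ToScope>
    where
        T: Transition<FromScope = S>,
    {
        View::new()
    }

    fn scope_name(&self) -> &'static str { S::NAME }
}

/// Grid -> Warp -> Thread type-checks because each step starts where the
/// previous one ended.
fn valid_chain() -> &'static str {
    View::<Grid>::new()
        .chunk_to_scope(Grid2Warp)
        .chunk_to_scope(Warp2Thread)
        .scope_name()
}

// Rejected at compile time in this sketch: the view is at Warp scope, but
// Block2Thread::FromScope is Block.
// View::<Grid>::new().chunk_to_scope(Grid2Warp).chunk_to_scope(Block2Thread);
```

In this model the check costs nothing at run time, since the scope lives only in the type parameter.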

TODO: ToScope can be leveraged for static analysis of memory access patterns. It may be used to check required synchronization scopes.

Required Associated Types§

type FromScope: SyncScope

type ToScope: SyncScope

Required Methods§

fn thread_ids() -> [u32; 6]

fn global_dim<D: DimType>() -> u32

fn global_id<D: DimType>(thread_ids: [u32; 6]) -> u32
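
The signatures above do not say how a global ID is derived. As a rough illustration for a Grid -> Thread scope, the sketch below uses the usual CUDA-style formula, block index times block dimension plus thread index. Everything in it is an assumption: the layout of the six-element array (thread x, y, z followed by block x, y, z), the AXIS constant on DimType, the DimX, DimY, and DimZ markers, and the fixed launch dimensions are all hypothetical and may differ from the crate.

```rust
/// Hypothetical axis selector (assumption: the real DimType may look different).
trait DimType { const AXIS: usize; }
struct DimX;
struct DimY;
struct DimZ;
impl DimType for DimX { const AXIS: usize = 0; }
impl DimType for DimY { const AXIS: usize = 1; }
impl DimType for DimZ { const AXIS: usize = 2; }

// Hypothetical launch configuration, fixed here so the sketch runs on the CPU.
const BLOCK_DIM: [u32; 3] = [128, 2, 1];
const GRID_DIM: [u32; 3] = [4, 3, 1];

/// Assumed layout: [thread.x, thread.y, thread.z, block.x, block.y, block.z].
fn global_id<D: DimType>(thread_ids: [u32; 6]) -> u32 {
    let thread = thread_ids[D::AXIS];
    let block = thread_ids[3 + D::AXIS];
    block * BLOCK_DIM[D::AXIS] + thread
}

/// Total number of threads along one axis of the grid.
fn global_dim<D: DimType>() -> u32 {
    GRID_DIM[D::AXIS] * BLOCK_DIM[D::AXIS]
}

/// Convenience wrapper in the style of the provided methods.
fn global_id_x(thread_ids: [u32; 6]) -> u32 {
    global_id::<DimX>(thread_ids)
}
```

With these assumed dimensions, thread 5 of block 2 along x has global ID 2 * 128 + 5 = 261, and the grid spans 4 * 128 = 512 threads along x.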

Provided Methods§

fn global_id_x(thread_ids: [u32; 6]) -> u32

Provided methods.

fn global_id_y(thread_ids: [u32; 6]) -> u32

fn global_id_z(thread_ids: [u32; 6]) -> u32

fn global_dim_x() -> u32

fn global_dim_y() -> u32

fn global_dim_z() -> u32

Dyn Compatibility§

This trait is not dyn compatible.

In older versions of Rust, dyn compatibility was called "object safety", so this trait is not object safe.

Implementors§