Module prelude

Re-exports

pub use crate::chunk::GlobalGroupChunk;
pub use crate::chunk::GlobalThreadChunk;
pub use crate::chunk::chunk_mut;
pub use crate::sync::sync_threads;
pub use crate::vector::Float2;
pub use crate::vector::Float4;
pub use crate::vector::Float8;
pub use crate::vector::VecFlatten;
pub use crate::vector::VecTypeTrait;

Macros

nvptx_to_target_asm
reshape_map
Maps a multi-dimensional local index and thread layout into a reshaped global index. The user specifies the dimensions of the logical index (local index plus thread ids) and the dimensions of the target tensor; the logical index is then mapped to the corresponding position in the multi-dimensional target tensor. We have:
reshape_map_macro
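As a rough illustration of the reshape_map idea, the following sketch flattens a logical index (assumed here to be ordered as local index, then thread id) in row-major order and re-expands it over the target tensor shape. The names flatten, unflatten and reshape_index are hypothetical, and the sketch assumes both shapes hold the same number of elements; the real macro's index ordering and bounds handling may differ.

```rust
/// Row-major flatten: the last dimension varies fastest.
fn flatten(index: &[usize], dims: &[usize]) -> usize {
    assert_eq!(index.len(), dims.len());
    index.iter().zip(dims).fold(0, |acc, (&i, &d)| {
        assert!(i < d, "index {i} out of bounds for dimension of size {d}");
        acc * d + i
    })
}

/// Inverse of `flatten` for the shape `dims`.
fn unflatten(mut linear: usize, dims: &[usize]) -> Vec<usize> {
    let mut index = vec![0; dims.len()];
    for k in (0..dims.len()).rev() {
        index[k] = linear % dims[k];
        linear /= dims[k];
    }
    index
}

/// Hypothetical reshape: flatten the logical (local index, thread id)
/// coordinate, then re-expand it over the target tensor shape.
fn reshape_index(logical: &[usize], logical_dims: &[usize], target_dims: &[usize]) -> Vec<usize> {
    assert_eq!(
        logical_dims.iter().product::<usize>(),
        target_dims.iter().product::<usize>(),
        "logical and target shapes must hold the same number of elements"
    );
    unflatten(flatten(logical, logical_dims), target_dims)
}

fn main() {
    // Logical shape: 2 local iterations x 4 threads; target tensor: 4 x 2.
    let target = reshape_index(&[1, 3], &[2, 4], &[4, 2]);
    println!("{target:?}");
}
```

With a logical shape of 2 local iterations by 4 threads and a 4 x 2 target, the logical coordinate (1, 3) has linear offset 7 and lands at target coordinate [3, 1].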

Structs

DimX
DimY
DimZ
DynamicSharedAlloc
Dynamic GPU shared memory allocation.
GpuGlobal
Used to distinguish different memory spaces in GPU programming. GpuGlobal represents the global memory space; see shared::GpuShared for the shared memory space. When chunking or atomic operations are needed, GpuGlobal is owned by the chunk or atomic struct, which ensures that the user cannot access the data except through chunk or atomic operations.
GpuShared
Static GPU shared memory. NVCC always aligns shared memory to 16 bytes, so we also align to 16 bytes here.
Map2D
This mapping strategy is useful when we want to reshape a 1D array into a 2D array and then distribute elements to threads one at a time until all elements are consumed. It creates a new, non-contiguous partition for each thread.
MapContinuousLinear
MapLinearWithDim
Linear mapping for a 1D array. N is the number of thread dimensions and width is the chunking window. The array is divided into chunks that are assigned across threads until all elements are covered.
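The ownership rule described for GpuGlobal and the alignment described for GpuShared can be imitated on the host with ordinary Rust. In this sketch, global_mem::Global, global_mem::ThreadChunk and Shared16 are hypothetical stand-ins, not the crate's types: the global handle's field is private, so the only way to reach the data is through a chunk that takes the handle by value, and repr(align(16)) places the shared buffer on a 16-byte boundary.

```rust
/// Stand-in for a global-memory handle: the field is private to `global_mem`,
/// so code outside that module can only reach the data through a chunk.
mod global_mem {
    pub struct Global<'a, T> {
        data: &'a mut [T],
    }

    impl<'a, T> Global<'a, T> {
        pub fn new(data: &'a mut [T]) -> Self {
            Global { data }
        }
    }

    /// A chunk takes the handle by value and exposes only the `len`
    /// elements starting at `thread_id * len`.
    pub struct ThreadChunk<'a, T> {
        inner: Global<'a, T>,
        start: usize,
        len: usize,
    }

    impl<'a, T> ThreadChunk<'a, T> {
        pub fn new(inner: Global<'a, T>, thread_id: usize, len: usize) -> Self {
            ThreadChunk { inner, start: thread_id * len, len }
        }

        /// Returns `None` for indices outside this thread's chunk.
        pub fn get_mut(&mut self, i: usize) -> Option<&mut T> {
            if i < self.len {
                self.inner.data.get_mut(self.start + i)
            } else {
                None
            }
        }
    }
}

/// Stand-in for a static shared buffer, forced to 16-byte alignment.
#[repr(C, align(16))]
struct Shared16<T>(T);

fn main() {
    let mut buf = vec![0u32; 8];
    let mut chunk = global_mem::ThreadChunk::new(global_mem::Global::new(&mut buf), 1, 4);
    *chunk.get_mut(0).unwrap() = 7; // thread 1, local index 0 -> buf[4]
    assert!(chunk.get_mut(4).is_none()); // outside thread 1's chunk
    drop(chunk);
    println!("buf = {:?}", buf);

    let shared = Shared16([0u8; 40]);
    println!(
        "align = {}, size = {}",
        std::mem::align_of_val(&shared),
        std::mem::size_of_val(&shared)
    );
}
```

Because the size of a type is always a multiple of its alignment, the 40-byte buffer is padded to 48 bytes.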
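One way to read the Map2D and MapLinearWithDim descriptions is as the same dealing scheme with different window sizes. The sketch below flattens the thread ids to a single dimension (so it ignores N) and uses a hypothetical chunked_indices helper: width > 1 gives contiguous windows dealt round by round, while width = 1 gives the strided, non-contiguous partition described for Map2D. This layout is an assumption, not the crate's implementation.

```rust
/// Hypothetical sketch: indices of a 1D array of `len` elements owned by
/// thread `thread` out of `num_threads`, when the array is cut into windows
/// of `width` elements dealt to threads in turn, wrapping around until every
/// element is covered.
fn chunked_indices(len: usize, num_threads: usize, width: usize, thread: usize) -> Vec<usize> {
    assert!(width > 0 && thread < num_threads);
    let mut out = Vec::new();
    // This thread's first window; each later window is one full round
    // (num_threads * width elements) further along.
    let mut start = thread * width;
    while start < len {
        let end = (start + width).min(len);
        out.extend(start..end);
        start += num_threads * width;
    }
    out
}

fn main() {
    // width = 2: contiguous windows, as in the linear mapping.
    for t in 0..2 {
        println!("width 2, thread {t}: {:?}", chunked_indices(10, 2, 2, t));
    }
    // width = 1: one element per thread per round, i.e. a strided,
    // non-contiguous partition like the one described for Map2D.
    for t in 0..4 {
        println!("width 1, thread {t}: {:?}", chunked_indices(10, 4, 1, t));
    }
}
```

For 10 elements, 2 threads and width 2, thread 0 owns [0, 1, 4, 5, 8, 9] and thread 1 owns [2, 3, 6, 7].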

Traits

CacheStreamLoadStore
DimType
DynamicSharedAllocBuilder
This trait is implemented for kernel config struct to provide dynamic shared memory allocation.
GPUDeviceFloatIntrinsics
This trait provides the device intrinsics for floating-point types that are not defined by Rust core::intrinsics.
HostToDev
Exposes the convert function to users. This trait is sealed to prevent arbitrary implementations: only types that implement HostToDevPrivateSeal can implement it, so only safe conversions are allowed and the host-to-device interface remains safe.
PushPrintfArg
SafeGpuConfig
A safe wrapper for GPUConfig providing compile-time and runtime validation.
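The sealing described for HostToDev follows the standard Rust sealed-trait pattern. A minimal sketch, with the hypothetical names Seal and ToDevice standing in for HostToDevPrivateSeal and HostToDev, and an identity convert in place of a real host-to-device conversion:

```rust
mod host_to_dev {
    // `private` is not exported, so code outside `host_to_dev` cannot name
    // `Seal` and therefore cannot implement it.
    mod private {
        pub trait Seal {}
        impl Seal for u32 {}
        impl Seal for f32 {}
    }

    /// Public trait whose supertrait is the private seal: only the types
    /// listed in `private` can implement it.
    pub trait ToDevice: private::Seal + Copy {
        /// Placeholder for a real host-to-device conversion; here it just
        /// returns the value unchanged.
        fn convert(self) -> Self {
            self
        }
    }

    impl ToDevice for u32 {}
    impl ToDevice for f32 {}
}

use host_to_dev::ToDevice;

fn main() {
    println!("{} {}", 5u32.convert(), 1.5f32.convert());
    // `impl ToDevice for i64 {}` written outside `host_to_dev` would not
    // compile: `i64` does not implement `Seal`, and `Seal` cannot be named
    // there to add that impl.
}
```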
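How SafeGpuConfig divides its checks is not shown on this page. As a sketch under stated assumptions, a hypothetical LaunchConfig can validate a const-generic block size at compile time (through an associated const that panics during evaluation) and a grid size at runtime. The 1024-threads-per-block bound is the CUDA per-block limit; a real config would have to check the product of all three block dimensions rather than a single value.

```rust
/// Hypothetical launch config: the block size is a const generic so it can be
/// checked at compile time, while the grid size is checked at runtime.
struct LaunchConfig<const BLOCK: u32> {
    grid: u32,
}

impl<const BLOCK: u32> LaunchConfig<BLOCK> {
    // Evaluated when `new` is instantiated for a concrete BLOCK: a block of 0
    // or more than 1024 threads fails the build.
    const BLOCK_OK: () = assert!(BLOCK > 0 && BLOCK <= 1024, "invalid block size");

    fn new(grid: u32) -> Result<Self, String> {
        // Referencing the associated const forces the compile-time check.
        let () = Self::BLOCK_OK;
        // Runtime check for a value only known when the program runs.
        if grid == 0 {
            return Err("grid size must be non-zero".to_string());
        }
        Ok(LaunchConfig { grid })
    }

    fn total_threads(&self) -> u64 {
        u64::from(self.grid) * u64::from(BLOCK)
    }
}

fn main() {
    let cfg = LaunchConfig::<256>::new(4).expect("valid config");
    println!("{} threads", cfg.total_threads());
    assert!(LaunchConfig::<256>::new(0).is_err());
    // `LaunchConfig::<2048>::new(1)` would be rejected when the program is built.
}
```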

Functions

block_dim
block_id
dim
global_id
grid_dim
printf
thread_id
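The index functions above are listed without signatures. Assuming the usual CUDA convention for a 1D launch (global id = block id * block dim + thread id), a hypothetical host emulation shows why every thread receives a distinct global id. The names global_id_1d and all_global_ids are illustrative only and plain u32 values are assumed.

```rust
/// Hypothetical 1D emulation: global id = block id * block dim + thread id.
fn global_id_1d(block_id: u32, block_dim: u32, thread_id: u32) -> u32 {
    block_id * block_dim + thread_id
}

/// Enumerates the global ids of every thread in a 1D launch of
/// `grid_dim` blocks with `block_dim` threads each.
fn all_global_ids(grid_dim: u32, block_dim: u32) -> Vec<u32> {
    let mut ids = Vec::new();
    for block_id in 0..grid_dim {
        for thread_id in 0..block_dim {
            ids.push(global_id_1d(block_id, block_dim, thread_id));
        }
    }
    ids
}

fn main() {
    // 3 blocks of 4 threads cover the ids 0..12 exactly once.
    let ids = all_global_ids(3, 4);
    println!("{ids:?}");
}
```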

Type Aliases

MapLinear
TODO: deprecate MapLinear in favor of the reshape_map! macro.

Attribute Macros

attr
Adds GPU attributes to the kernel function, e.g. #[gpu::attr(nvvm_launch_bound(256, 1, 1, 2))].
cuda_kernel
This attribute generates a host wrapper around a kernel function, allowing it to be launched from the host. The kernel function itself is the original function, together with its Config. The generated host function is placed in mod #kname {pub fn launch(…)}.
device
host
kernel