Maps a multi-dimensional local index and thread layout into a reshaped global index.
The idea is that the user specifies the dimensions of the logical index (local index plus thread ids) and the dimensions of the target tensor; the logical index is then mapped to the corresponding position in the multi-dimensional target tensor.
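A minimal sketch of this mapping, assuming a row-major layout; the helper names `linearize`, `delinearize`, and `map_index` are illustrative, not part of the actual API:

```rust
/// Flatten a multi-dimensional logical index (local index + thread ids)
/// into a single linear offset, row-major.
fn linearize(index: &[usize], dims: &[usize]) -> usize {
    index.iter().zip(dims).fold(0, |acc, (i, d)| acc * d + i)
}

/// Re-expand a linear offset into coordinates of the target tensor.
fn delinearize(mut flat: usize, dims: &[usize]) -> Vec<usize> {
    let mut out = vec![0; dims.len()];
    for k in (0..dims.len()).rev() {
        out[k] = flat % dims[k];
        flat /= dims[k];
    }
    out
}

/// Map a logical index, defined over `logical_dims`, to the corresponding
/// multi-dimensional position in a tensor of shape `target_dims`.
fn map_index(logical: &[usize], logical_dims: &[usize], target_dims: &[usize]) -> Vec<usize> {
    delinearize(linearize(logical, logical_dims), target_dims)
}
```

For example, logical index `[1, 2]` over logical dims `[4, 8]` flattens to offset 10, which lands at position `[0, 10]` in a `[2, 16]` target tensor.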
Used to distinguish different memory spaces in GPU programming.
GpuGlobal represents global memory space.
See shared::GpuShared for shared memory space.
When chunking or atomic operations are needed, the GpuGlobal is owned by the
chunk or atomic struct. This ensures that the user cannot access the data
except through chunk or atomic operations.
This mapping strategy is useful when we want to reshape a 1D array into a 2D
array and then distribute elements to threads one at a time, round-robin,
until all elements are consumed. It gives each thread a non-contiguous
(strided) partition.
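The round-robin distribution above can be sketched as follows; the function name `cyclic_partition` and the flattened thread count are illustrative assumptions:

```rust
/// Element `i` of a 1D array goes to thread `i % num_threads`, so each
/// thread owns a strided, non-contiguous slice of indices.
fn cyclic_partition(len: usize, num_threads: usize, thread: usize) -> Vec<usize> {
    (thread..len).step_by(num_threads).collect()
}
```

With 7 elements and 3 threads, thread 0 owns indices `[0, 3, 6]` and thread 1 owns `[1, 4]`: non-contiguous partitions, as described.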
Linear mapping for a 1D array.
N is the number of thread dimensions.
width is the chunking window: the array is divided into chunks of width
elements, dealt out across the threads until all elements are covered.
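A sketch of this chunked mapping, under the assumption that windows of `width` elements are dealt to threads round-robin; `n_threads` stands in for the N thread dimensions flattened into a single count, and `chunked_partition` is a hypothetical name:

```rust
/// The 1D array is split into windows of `width` elements; windows are
/// assigned to threads round-robin until every element is covered.
fn chunked_partition(len: usize, n_threads: usize, width: usize, thread: usize) -> Vec<usize> {
    let mut out = Vec::new();
    let mut start = thread * width;
    while start < len {
        // The final window may be truncated at the end of the array.
        let end = (start + width).min(len);
        out.extend(start..end);
        start += n_threads * width;
    }
    out
}
```

With 10 elements, 2 threads, and `width = 3`, thread 0 owns `[0, 1, 2, 6, 7, 8]` and thread 1 owns `[3, 4, 5, 9]` (its second window is truncated).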
Exposes the convert function to users.
This trait is sealed to prevent arbitrary implementations: only types that
implement HostToDevPrivateSeal can implement it. This restricts the trait to
conversions known to be safe, guaranteeing a safe host-to-device interface.
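A sketch of the sealed-trait pattern described above. The names `HostToDevPrivateSeal` mirrors the text, while `HostToDev`, `convert`'s signature, and `HostBuffer` are illustrative assumptions:

```rust
mod private {
    /// The seal lives in a private module, so downstream crates
    /// cannot implement it (and thus cannot implement the public trait).
    pub trait HostToDevPrivateSeal {}
}

/// Public trait exposing the conversion; only sealed types may implement it,
/// so no unsound host-to-device conversion can be added from outside.
pub trait HostToDev: private::HostToDevPrivateSeal {
    fn convert(&self) -> Vec<u8>;
}

/// Illustrative host-side type that is allowed to cross to the device.
pub struct HostBuffer(pub Vec<f32>);

impl private::HostToDevPrivateSeal for HostBuffer {}

impl HostToDev for HostBuffer {
    fn convert(&self) -> Vec<u8> {
        // Serialize as little-endian bytes, a stand-in for the real transfer.
        self.0.iter().flat_map(|x| x.to_le_bytes()).collect()
    }
}
```

Because `private::HostToDevPrivateSeal` is unreachable outside this crate, the compiler rejects any external `impl HostToDev for T`, which is exactly the guarantee the doc comment claims.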
This attribute generates a host wrapper around a kernel function, allowing it to be launched from the host.
The kernel function itself is the original function, taking a Config parameter.
The generated host function lives in mod #kname { pub fn launch(…) }.
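A hypothetical sketch of what such an expansion might look like for a kernel named `add_one`; the `Config` fields, the kernel body (a CPU stand-in for the device code), and `launch`'s exact signature are all assumptions, not the macro's real output:

```rust
/// Illustrative launch configuration passed to the kernel.
pub struct Config {
    pub grid: usize,
    pub block: usize,
}

/// The "original function with Config": a CPU stand-in for the device
/// kernel body, where each simulated thread handles one element.
fn add_one_kernel(cfg: &Config, data: &mut [i32]) {
    let threads = cfg.grid * cfg.block;
    for i in 0..data.len().min(threads) {
        data[i] += 1;
    }
}

/// The generated `mod #kname { pub fn launch(…) }` wrapper.
pub mod add_one {
    use super::{add_one_kernel, Config};

    /// Host-side entry point that forwards to the kernel.
    pub fn launch(cfg: Config, data: &mut [i32]) {
        add_one_kernel(&cfg, data);
    }
}
```

The point of the wrapper is that host code only ever calls `add_one::launch(cfg, …)`, keeping the kernel body itself out of the host-facing API.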