# Accera v1.2 Reference

## accera.MMAShape
The following table shows the matrix multiplication parameters associated with each enum value, for the supported data types, for a single pass. For example, a single pass of the M32xN32xK2_B1 operation takes input matrices of dimensions [32x2] (A) and [2x32] (B) and produces a matrix multiplication result of dimensions [32x32] (C). These passes can then be composed to perform matrix multiplication of larger matrices.
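The composition of passes can be sketched in plain Python (no Accera required; this only illustrates the arithmetic, not the generated GPU code). Each "pass" below multiplies a 32x2 slice of A by a 2x32 slice of B and accumulates into the 32x32 result, mirroring the M32xN32xK2_B1 shape; iterating over K in steps of 2 composes the passes into a full multiplication:

```python
# Illustrative sketch of composing M32xN32xK2_B1 passes: each pass
# consumes a [32x2] slice of A and a [2x32] slice of B, accumulating
# into the [32x32] result C. K here is arbitrary (a multiple of the
# pass depth 2), chosen only for the example.
M, N, K = 32, 32, 8
A = [[float(i + k) for k in range(K)] for i in range(M)]
B = [[float(k * j % 5) for j in range(N)] for k in range(K)]
C = [[0.0] * N for _ in range(M)]

for k0 in range(0, K, 2):          # one pass per 2-wide K slice
    for i in range(M):
        for j in range(N):
            for k in range(k0, k0 + 2):
                C[i][j] += A[i][k] * B[k][j]

# The composed passes produce the ordinary matrix product.
ref = [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(N)]
       for i in range(M)]
assert C == ref
```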
More information about the corresponding Matrix Arithmetic Instructions (MAI) can be found here.
| accera.MMAShape | MFMA Instruction | M, N, K | Input Type (ScalarType) | Output Type (ScalarType) | Compute Type (C++) |
|---|---|---|---|---|---|
| M64xN64xK1_B4 | V_MFMA_F32_16x16x1F32 | 64, 64, 1 | float32 | float32 | float |
| M64xN64xK1_B2 | V_MFMA_F32_32x32x1F32 | 64, 64, 1 | float32 | float32 | float |
| M32xN32xK2_B1 | V_MFMA_F32_32x32x2F32 | 32, 32, 2 | float32 | float32 | float |
| M16xN16xK4_B1 | V_MFMA_F32_16x16x4F32 | 16, 16, 4 | float32 | float32 | float |
| M64xN64xK2_B4 | V_MFMA_F32_16X16X2BF16 | 64, 64, 2 | bfloat16 | bfloat16/float32 | float |
| M64xN64xK2_B2 | V_MFMA_F32_32X32X2BF16 | 64, 64, 2 | bfloat16 | bfloat16/float32 | float |
| M32xN32xK4_B1 | V_MFMA_F32_32X32X4BF16 | 32, 32, 4 | bfloat16 | bfloat16/float32 | float |
| M16xN16xK8_B1 | V_MFMA_F32_16X16X8BF16 | 16, 16, 8 | bfloat16 | bfloat16/float32 | float |
| M64xN64xK4_B4 | V_MFMA_F32_16x16x4F16 | 64, 64, 4 | float16 | float16/32 | float |
| M64xN64xK4_B4 | V_MFMA_I32_16X16X4I8 | 64, 64, 4 | int8 | int8/16/32 | int |
| M64xN64xK4_B2 | V_MFMA_F32_32x32x4F16 | 64, 64, 4 | float16 | float16/32 | float |
| M64xN64xK4_B2 | V_MFMA_I32_32X32X4I8 | 64, 64, 4 | int8 | int8/16/32 | int |
| M32xN32xK8_B1 | V_MFMA_F32_32x32x8F16 | 32, 32, 8 | float16 | float16/32 | float |
| M32xN32xK8_B1 | V_MFMA_I32_32X32X8I8 | 32, 32, 8 | int8 | int8/16/32 | int |
| M16xN16xK16_B1 | V_MFMA_F32_16x16x16F16 | 16, 16, 16 | float16 | float16/32 | float |
| M16xN16xK16_B1 | V_MFMA_I32_16X16X16I8 | 16, 16, 16 | int8 | int8/16/32 | int |
The following table shows the corresponding parameters for Nvidia tensor core targets, which have no associated MFMA instruction:

| accera.MMAShape | M, N, K | Input Type (ScalarType) | Output Type (ScalarType) | Compute Type (C++) |
|---|---|---|---|---|
| M16xN16xK8_B1 | 16, 16, 8 | float32 | float32 | tf32* |
| M16xN16xK16_B1 | 16, 16, 16 | float16 | float16/32 | float |
| M16xN16xK16_B1 | 16, 16, 16 | bfloat16 | float32 | float |
| M16xN16xK16_B1 | 16, 16, 16 | u/int8 | int32 | int |
| M32xN8xK16_B1 | 32, 8, 16 | float16 | float16/32 | float |
| M32xN8xK16_B1 | 32, 8, 16 | bfloat16 | float32 | float |
| M32xN8xK16_B1 | 32, 8, 16 | u/int8 | int32 | int |
| M8xN32xK16_B1 | 8, 32, 16 | float16 | float16/32 | float |
| M8xN32xK16_B1 | 8, 32, 16 | bfloat16 | float32 | float |
| M8xN32xK16_B1 | 8, 32, 16 | u/int8 | int32 | int |
*TensorFloat-32 (TF32) is a floating-point format introduced in the Nvidia Ampere architecture to accelerate FP32 performance. Information about it can be found here and in more detail in the architecture whitepaper. In this mode, multiplication is performed in TF32 precision and accumulation happens in FP32 precision.
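TF32 keeps float32's 8-bit exponent but only 10 mantissa bits (versus float32's 23). The reduced precision can be illustrated in pure Python by dropping the 13 low-order mantissa bits of a float32 value (a simplified sketch that truncates rather than rounds to nearest, which is what real hardware would typically do):

```python
import struct

def to_tf32(x: float) -> float:
    """Approximate TF32 precision: truncate the float32 mantissa
    from 23 bits down to TF32's 10 bits (sign and 8-bit exponent
    are unchanged). Simplified: truncation, not round-to-nearest."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= ~((1 << 13) - 1)   # zero the 13 low-order mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# 1/3 is not exactly representable; with fewer mantissa bits the
# TF32 value is a coarser approximation than the float32 value.
fp32_third = struct.unpack("<f", struct.pack("<f", 1 / 3))[0]
tf32_third = to_tf32(1 / 3)
assert tf32_third != fp32_third
assert abs(tf32_third - 1 / 3) < 1e-3
```

This is why the table lists the compute type as tf32* only for the multiply: products are formed at TF32 precision, while the accumulation into C retains full FP32 precision.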