# Accera v1.2 Reference

## accera.MMAShape

The following tables show the matrix multiplication parameters associated with the different enum values, for different data types, for a single pass. For example, a single pass of the M32xN32xK2_B1 operation takes input matrices of dimensions [32x2] (A) and [2x32] (B) and produces a matrix multiplication result of dimensions [32x32] (C). These operations can then be composed to perform matrix multiplication of larger matrices.
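As a plain-NumPy illustration of this composition (this is not the Accera API; the pass loop and dimensions are assumptions chosen for the example), a [32x64] by [64x32] multiplication can be built from 32 accumulated passes of the [32x2] x [2x32] shape used by M32xN32xK2_B1:

```python
import numpy as np

# Example dimensions: C[32x32] = A[32x64] @ B[64x32],
# composed from K-chunks of size 2 (the K of M32xN32xK2_B1).
M, N, K, pass_k = 32, 32, 64, 2

rng = np.random.default_rng(0)
A = rng.standard_normal((M, K)).astype(np.float32)
B = rng.standard_normal((K, N)).astype(np.float32)

# Accumulate one [32x2] @ [2x32] partial product per pass.
C = np.zeros((M, N), dtype=np.float32)
for k0 in range(0, K, pass_k):
    C += A[:, k0:k0 + pass_k] @ B[k0:k0 + pass_k, :]

# The composed passes reproduce the full matrix product.
assert np.allclose(C, A @ B, atol=1e-4)
```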

More information about the corresponding Matrix Arithmetic Instructions (MAI) can be found here.

### Supported MMA shapes and their compatible types for AMD targets

| accera.MMAShape | MFMA Instruction | M, N, K | Input Type (ScalarType) | Output Type (ScalarType) | Compute Type (C++) |
|---|---|---|---|---|---|
| M64xN64xK1_B4 | V_MFMA_F32_16x16x1F32 | 64, 64, 1 | float32 | float32 | float |
| M64xN64xK1_B2 | V_MFMA_F32_32x32x1F32 | 64, 64, 1 | float32 | float32 | float |
| M32xN32xK2_B1 | V_MFMA_F32_32x32x2F32 | 32, 32, 2 | float32 | float32 | float |
| M16xN16xK4_B1 | V_MFMA_F32_16x16x4F32 | 16, 16, 4 | float32 | float32 | float |
| M64xN64xK2_B4 | V_MFMA_F32_16X16X2BF16 | 64, 64, 2 | bfloat16 | bfloat16/float32 | float |
| M64xN64xK2_B2 | V_MFMA_F32_32X32X2BF16 | 64, 64, 2 | bfloat16 | bfloat16/float32 | float |
| M32xN32xK4_B1 | V_MFMA_F32_32X32X4BF16 | 32, 32, 4 | bfloat16 | bfloat16/float32 | float |
| M16xN16xK8_B1 | V_MFMA_F32_16X16X8BF16 | 16, 16, 8 | bfloat16 | bfloat16/float32 | float |
| M64xN64xK4_B4 | V_MFMA_F32_16x16x4F16 | 64, 64, 4 | float16 | float16/32 | float |
| M64xN64xK4_B4 | V_MFMA_I32_16X16X4I8 | 64, 64, 4 | int8 | int8/16/32 | int |
| M64xN64xK4_B2 | V_MFMA_F32_32x32x4F16 | 64, 64, 4 | float16 | float16/32 | float |
| M64xN64xK4_B2 | V_MFMA_I32_32X32X4I8 | 64, 64, 4 | int8 | int8/16/32 | int |
| M32xN32xK8_B1 | V_MFMA_F32_32x32x8F16 | 32, 32, 8 | float16 | float16/32 | float |
| M32xN32xK8_B1 | V_MFMA_I32_32X32X8I8 | 32, 32, 8 | int8 | int8/16/32 | int |
| M16xN16xK16_B1 | V_MFMA_F32_16x16x16F16 | 16, 16, 16 | float16 | float16/32 | float |
| M16xN16xK16_B1 | V_MFMA_I32_16X16X16I8 | 16, 16, 16 | int8 | int8/16/32 | int |
### Supported MMA shapes and their compatible types for Nvidia targets

| accera.MMAShape | M, N, K | Input Type (ScalarType) | Output Type (ScalarType) | Compute Type (C++) |
|---|---|---|---|---|
| M16xN16xK8_B1 | 16, 16, 8 | float32 | float32 | tf32* |
| M16xN16xK16_B1 | 16, 16, 16 | float16 | float16/32 | float |
| M16xN16xK16_B1 | 16, 16, 16 | bfloat16 | float32 | float |
| M16xN16xK16_B1 | 16, 16, 16 | u/int8 | int32 | int |
| M32xN8xK16_B1 | 32, 8, 16 | float16 | float16/32 | float |
| M32xN8xK16_B1 | 32, 8, 16 | bfloat16 | float32 | float |
| M32xN8xK16_B1 | 32, 8, 16 | u/int8 | int32 | int |
| M8xN32xK16_B1 | 8, 32, 16 | float16 | float16/32 | float |
| M8xN32xK16_B1 | 8, 32, 16 | bfloat16 | float32 | float |
| M8xN32xK16_B1 | 8, 32, 16 | u/int8 | int32 | int |

*TensorFloat-32 is a floating-point format introduced in the Nvidia Ampere architecture to accelerate FP32 performance. An overview can be found here, with more detail in the architecture whitepaper. In this mode, multiplication is performed in TF32 precision while accumulation happens in FP32 precision.
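A minimal sketch of what the reduced multiplication precision means (plain NumPy; the helper name is hypothetical, and this emulates TF32 storage by truncating float32's 23-bit mantissa to TF32's 10 bits, whereas real hardware may round rather than truncate):

```python
import numpy as np

def truncate_to_tf32(x: np.ndarray) -> np.ndarray:
    """Emulate TF32 storage by zeroing the low 13 of float32's
    23 mantissa bits, leaving TF32's 10-bit mantissa.
    (Hypothetical helper; hardware may round, not truncate.)"""
    bits = x.astype(np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFFE000)).view(np.float32)

# 2**-20 falls below TF32's 10-bit mantissa and is lost...
assert truncate_to_tf32(np.array([1.0 + 2.0**-20], dtype=np.float32))[0] == 1.0
# ...while 2**-10 is still exactly representable.
assert truncate_to_tf32(np.array([1.0 + 2.0**-10], dtype=np.float32))[0] == np.float32(1.0 + 2.0**-10)
```

Because accumulation stays in FP32, only the per-element products see this truncation, which is why TF32 matmul accuracy typically lands close to FP32 for well-scaled data.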


Last update: 2023-04-17