# Accera v1.2 Reference

## accera.MMAShape
The following table shows the matrix multiplication parameters associated with each enum value, for the supported data types, for a single pass. For example, a single pass of the M32xN32xK2_B1 operation takes input matrices of dimensions [32x2] (A) and [2x32] (B) and produces a matrix multiplication result of dimensions [32x32] (C). These passes can then be composed to perform matrix multiplication of larger matrices.
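The composition of passes can be sketched in plain Python (no Accera required; this only illustrates the arithmetic, not the generated GPU code). Each "pass" below multiplies a 32x2 slice of A by a 2x32 slice of B and accumulates into the 32x32 result, mirroring the M32xN32xK2_B1 shape; iterating over K in steps of 2 composes the passes into a full multiplication:

```python
# Illustrative sketch of composing M32xN32xK2_B1 passes: each pass
# consumes a [32x2] slice of A and a [2x32] slice of B, accumulating
# into the [32x32] result C. K here is arbitrary (a multiple of the
# pass depth 2), chosen only for the example.
M, N, K = 32, 32, 8
A = [[float(i + k) for k in range(K)] for i in range(M)]
B = [[float(k * j % 5) for j in range(N)] for k in range(K)]
C = [[0.0] * N for _ in range(M)]

for k0 in range(0, K, 2):          # one pass per 2-wide K slice
    for i in range(M):
        for j in range(N):
            for k in range(k0, k0 + 2):
                C[i][j] += A[i][k] * B[k][j]

# The composed passes produce the ordinary matrix product.
ref = [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(N)]
       for i in range(M)]
assert C == ref
```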
More information about the corresponding Matrix Arithmetic Instructions (MAI) can be found here.
| accera.MMAShape | MFMA Instruction | M, N, K | Input Type (ScalarType) | Output Type (ScalarType) | Compute Type (C++) |
|---|---|---|---|---|---|
| M64xN64xK1_B4 | V_MFMA_F32_16x16x1F32 | 64, 64, 1 | float32 | float32 | float |
| M64xN64xK1_B2 | V_MFMA_F32_32x32x1F32 | 64, 64, 1 | float32 | float32 | float |
| M32xN32xK2_B1 | V_MFMA_F32_32x32x2F32 | 32, 32, 2 | float32 | float32 | float |
| M16xN16xK4_B1 | V_MFMA_F32_16x16x4F32 | 16, 16, 4 | float32 | float32 | float |
| M64xN64xK2_B4 | V_MFMA_F32_16X16X2BF16 | 64, 64, 2 | bfloat16 | bfloat16/float32 | float |
| M64xN64xK2_B2 | V_MFMA_F32_32X32X2BF16 | 64, 64, 2 | bfloat16 | bfloat16/float32 | float |
| M32xN32xK4_B1 | V_MFMA_F32_32X32X4BF16 | 32, 32, 4 | bfloat16 | bfloat16/float32 | float |
| M16xN16xK8_B1 | V_MFMA_F32_16X16X8BF16 | 16, 16, 8 | bfloat16 | bfloat16/float32 | float |
| M64xN64xK4_B4 | V_MFMA_F32_16x16x4F16 | 64, 64, 4 | float16 | float16/32 | float |
| M64xN64xK4_B4 | V_MFMA_I32_16X16X4I8 | 64, 64, 4 | int8 | int8/16/32 | int |
| M64xN64xK4_B2 | V_MFMA_F32_32x32x4F16 | 64, 64, 4 | float16 | float16/32 | float |
| M64xN64xK4_B2 | V_MFMA_I32_32X32X4I8 | 64, 64, 4 | int8 | int8/16/32 | int |
| M32xN32xK8_B1 | V_MFMA_F32_32x32x8F16 | 32, 32, 8 | float16 | float16/32 | float |
| M32xN32xK8_B1 | V_MFMA_I32_32X32X8I8 | 32, 32, 8 | int8 | int8/16/32 | int |
| M16xN16xK16_B1 | V_MFMA_F32_16x16x16F16 | 16, 16, 16 | float16 | float16/32 | float |
| M16xN16xK16_B1 | V_MFMA_I32_16X16X16I8 | 16, 16, 16 | int8 | int8/16/32 | int |
The following table shows the corresponding parameters for Nvidia tensor core targets, which have no associated MFMA instruction:

| accera.MMAShape | M, N, K | Input Type (ScalarType) | Output Type (ScalarType) | Compute Type (C++) |
|---|---|---|---|---|
| M16xN16xK8_B1 | 16, 16, 8 | float32 | float32 | tf32* |
| M16xN16xK16_B1 | 16, 16, 16 | float16 | float16/32 | float |
| M16xN16xK16_B1 | 16, 16, 16 | bfloat16 | float32 | float |
| M16xN16xK16_B1 | 16, 16, 16 | u/int8 | int32 | int |
| M32xN8xK16_B1 | 32, 8, 16 | float16 | float16/32 | float |
| M32xN8xK16_B1 | 32, 8, 16 | bfloat16 | float32 | float |
| M32xN8xK16_B1 | 32, 8, 16 | u/int8 | int32 | int |
| M8xN32xK16_B1 | 8, 32, 16 | float16 | float16/32 | float |
| M8xN32xK16_B1 | 8, 32, 16 | bfloat16 | float32 | float |
| M8xN32xK16_B1 | 8, 32, 16 | u/int8 | int32 | int |
*TensorFloat-32 (TF32) is a floating-point format introduced in the Nvidia Ampere architecture to accelerate FP32 performance. Information about it can be found here and in more detail in the architecture whitepaper. In this mode, multiplication is performed in TF32 precision and accumulation happens in FP32 precision.
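TF32 keeps float32's 8-bit exponent but only 10 mantissa bits (versus float32's 23). The reduced precision can be illustrated in pure Python by dropping the 13 low-order mantissa bits of a float32 value (a simplified sketch that truncates rather than rounds to nearest, which is what real hardware would typically do):

```python
import struct

def to_tf32(x: float) -> float:
    """Approximate TF32 precision: truncate the float32 mantissa
    from 23 bits down to TF32's 10 bits (sign and 8-bit exponent
    are unchanged). Simplified: truncation, not round-to-nearest."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= ~((1 << 13) - 1)   # zero the 13 low-order mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# 1/3 is not exactly representable; with fewer mantissa bits the
# TF32 value is a coarser approximation than the float32 value.
fp32_third = struct.unpack("<f", struct.pack("<f", 1 / 3))[0]
tf32_third = to_tf32(1 / 3)
assert tf32_third != fp32_third
assert abs(tf32_third - 1 / 3) < 1e-3
```

This is why the table lists the compute type as tf32* only for the multiply: products are formed at TF32 precision, while the accumulation into C retains full FP32 precision.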