Accera Tutorials

Tutorial	Description
Hello Matrix Multiplication	Start here if you are completely new to Accera and would like to learn more about the workflow
Optimized Matrix Multiplication	Once you understand the basics, we'll look at how to optimize matrix multiplication for a specific hardware target
Cross Compilation for Raspberry Pi 3	After you know how to generate code for the host target, we'll look at how to generate code for other targets
[GPU] Hello Matrix Multiplication	We'll look at how to apply the basic concepts to GPU targets
[GPU] Tensorized Matrix Multiplication	Explains the basic usage of Tensor Cores on GPU
[GPU] Multi-Pass Tensorized MatMul with Pass Fusion	Shows how pass fusion can be used to control register usage of input data
[GPU] Tensorized MatMul with Caching	Explores shared memory and register caching techniques on GPU
[GPU] Tensorized MatMul with Element-wise Op fusion	Extends MatMul by fusing element-wise operations that run before or after the matrix multiplication
[GPU] Multi-Block Tensorized MatMul with different Scheduling Policies	Explains tradeoffs between register usage and memory I/O, and their performance impact

Last update: 2023-04-17