
Accera Tutorials

| Tutorial | Description |
|---|---|
| Hello Matrix Multiplication | Start here if you are completely new to Accera and would like to learn more about the workflow |
| Optimized Matrix Multiplication | Once you understand the basics, we'll look at how to optimize matrix multiplication for a specific hardware target |
| Cross Compilation for Raspberry Pi 3 | After you know how to generate code for the host target, we'll look at how to generate code for other targets |
| [GPU] Hello Matrix Multiplication | We'll look at how to apply the basic concepts to GPU targets |
| [GPU] Tensorized Matrix Multiplication | Explains the basic usage of Tensor Cores on GPU |
| [GPU] Multi-Pass Tensorized MatMul with Pass Fusion | Shows how pass fusion can be used to control register usage of input data |
| [GPU] Tensorized MatMul with Caching | Explores shared memory and register caching techniques on GPU |
| [GPU] Tensorized MatMul with Element-wise Op Fusion | Extends MatMul by fusing element-wise operations that run before and after the multiplication |
| [GPU] Multi-Block Tensorized MatMul with Different Scheduling Policies | Explains the tradeoffs between register usage and memory I/O, and their performance impact |

Last update: 2023-04-17