Accera Tutorials
Tutorial | Description |
---|---|
Hello Matrix Multiplication | Start here if you are completely new to Accera and would like to learn more about the workflow |
Optimized Matrix Multiplication | Once you understand the basics, we'll look at how to optimize matrix multiplication for a specific hardware target |
Cross Compilation for Raspberry Pi 3 | After you know how to generate code for the host target, we'll look at how to generate code for other targets |
[GPU] Hello Matrix Multiplication | We'll look at how to apply the basic concepts for GPU targets |
[GPU] Tensorized Matrix Multiplication | Explains the basic usage of Tensor Cores on GPUs |
[GPU] Multi-Pass Tensorized MatMul with Pass Fusion | Shows how pass fusion can be used to control register usage of input data |
[GPU] Tensorized MatMul with Caching | Explores shared memory and register caching techniques on GPU |
[GPU] Tensorized MatMul with Element-wise Op fusion | Extends the tensorized MatMul by fusing element-wise operations that run before and after the multiplication |
[GPU] Multi-Block Tensorized MatMul with different Scheduling Policies | Explains tradeoffs between register usage and memory I/O, and their performance impact |

Last update: 2023-04-17