Cutlass Tiled Copy
Copy is all you need.
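Understanding the semantics of make_tiled_copy. The core lies in two things: the tiler and layout_tv. Conclusion first: the atom is used to tile the tv layout, and the tiler is used to tile the target tensor. The two resulting layouts are then composed into a new tv layout that describes tile-wise access to tv; the layout of v can guarantee that it satisfies...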
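The essence of cutlass cute copy
Essence: src_ptr and dst_ptr are stepped through a for loop over a logical index; each layout maps that logical index to a physical index, which is then used for the memory access.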
template <class SrcEngine, class SrcLayout, class DstEngine, class DstLayout> CU...
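The loop described above can be sketched in plain C++ without the CuTe headers. This is only an illustration under stated assumptions: `Layout2D` and `copy_via_layouts` are hypothetical names, not CuTe APIs, and the layout is restricted to a 2D shape with two strides. The logical index is unpacked with the first mode varying fastest, which is assumed here to mirror CuTe's convention.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a layout: a 2D shape plus strides (in elements).
struct Layout2D {
    std::size_t rows, cols;              // shape
    std::size_t row_stride, col_stride;  // strides

    std::size_t size() const { return rows * cols; }

    // Map a logical 1D index to a physical offset.
    // The logical index is unpacked with the row coordinate varying fastest.
    std::size_t operator()(std::size_t idx) const {
        std::size_t i = idx % rows;  // row coordinate
        std::size_t j = idx / rows;  // column coordinate
        return i * row_stride + j * col_stride;
    }
};

// Copy element by element: the loop runs over logical indices only,
// and each layout decides where that element lives in memory.
template <class T>
void copy_via_layouts(const T* src, const Layout2D& src_layout,
                      T* dst, const Layout2D& dst_layout) {
    assert(src_layout.size() == dst_layout.size());
    for (std::size_t i = 0; i < src_layout.size(); ++i) {
        dst[dst_layout(i)] = src[src_layout(i)];
    }
}
```

For instance, copying a 2x3 row-major buffer `{0,1,2,3,4,5}` (strides 3 and 1) into a column-major destination (strides 1 and 2) leaves the destination memory as `{0,3,1,4,2,5}`: the loop itself never changes, only the two index mappings do.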
Tensor Core MMA instruction tutorial
Reference: https://zhuanlan.zhihu.com/p/1892346599864238276
Taking mma.m8n8k4 as an example
A warp executing mma.m8n8k4 with .f16 floating point type will compute 4 MMA operations of shape .m8n8k4.
A ...
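As a rough illustration of the "4 MMA operations per warp" statement, the sketch below maps a lane id to the MMA it takes part in. It assumes the thread grouping given in the PTX ISA documentation for mma.m8n8k4 with .f16 (lanes 0-3 and 16-19 form the first group, 4-7 and 20-23 the second, and so on); the function and constant names are hypothetical and the group index is 0-based.

```cpp
#include <cassert>

// Shape of one mma.m8n8k4 operation.
constexpr int kM = 8;
constexpr int kN = 8;
constexpr int kK = 4;

// Number of independent MMA operations computed by one warp (.f16 case).
constexpr int kMmaPerWarp = 4;

// Multiply-accumulate count of a single m8n8k4 operation.
constexpr int kFmaPerMma = kM * kN * kK;

// 0-based index of the MMA operation that a given lane contributes to.
// Assumed grouping: lanes {0-3, 16-19} -> 0, {4-7, 20-23} -> 1,
// {8-11, 24-27} -> 2, {12-15, 28-31} -> 3.
inline int mma_m8n8k4_group(int lane) {
    assert(lane >= 0 && lane < 32);
    return (lane % 16) / 4;
}
```

Under this grouping each MMA is shared by 8 threads, and a full warp performs `kMmaPerWarp * kFmaPerMma` multiply-accumulates per instruction.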