CUTLASS Tiled Copy
Copy is all you need.
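Understanding the semantics of make_tiled_copy. The core lies in two things: the tiler and layout_tv. Conclusion first: the atom is used to tile the TV (thread-value) layout, and the tiler is used to tile the target tensor. Finally, these two layouts are combined into a new TV layout that describes tile-wise thread/value access; the layout of v is guaranteed to satisfy...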
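The conclusion above can be illustrated without CUTLASS itself. The following is a minimal Python sketch, not CuTe's API: `Layout2`, `tiled_tv`, `coverage` and all sizes (an 8x16 column-major tensor, 4x8 tiles, 8 threads holding 4 values each) are hypothetical, and the TV layout is written by hand instead of being derived from a copy atom. It composes a per-tile TV layout with a tile-to-global layout, `tile(tv(t, v))`, shifts the result to each tile's origin, and checks that every element is copied exactly once.

```python
# Minimal sketch of the make_tiled_copy idea with plain (shape):(stride) layouts.
# This is NOT the CuTe API: Layout2 and every size below are hypothetical.

class Layout2:
    """Rank-2 layout (s0, s1):(d0, d1); flat index i -> coord (i % s0, i // s0)."""

    def __init__(self, shape, stride):
        self.shape, self.stride = shape, stride

    def size(self):
        return self.shape[0] * self.shape[1]

    def __call__(self, *coord):
        if len(coord) == 1:  # flat index, split colexicographically like CuTe
            i = coord[0]
            coord = (i % self.shape[0], i // self.shape[0])
        return coord[0] * self.stride[0] + coord[1] * self.stride[1]


M, N = 8, 16          # global tensor, column-major (leading dimension M)
TM, TN = 4, 8         # tiler: each tile is TM x TN
THREADS, VALS = 8, 4  # one pass: 8 threads x 4 values each

# TV layout inside a tile: (thread t, value v) -> flat column-major tile index.
# Thread t owns column t of the tile and value v is row v: index = v + TM * t.
tv = Layout2((THREADS, VALS), (TM, 1))
# Tile layout: flat tile index k -> offset of that element in the global tensor.
tile = Layout2((TM, TN), (1, M))


def tiled_tv(tm, tn, t, v):
    """Global offset touched by thread t, value v, in tile (tm, tn)."""
    origin = tm * TM + (tn * TN) * M  # top-left element of tile (tm, tn)
    return origin + tile(tv(t, v))    # composition tile(tv(.)), then shift


def coverage():
    """How many times each global element is touched across all tiles."""
    hits = [0] * (M * N)
    for tn in range(N // TN):
        for tm in range(M // TM):
            for t in range(THREADS):
                for v in range(VALS):
                    hits[tiled_tv(tm, tn, t, v)] += 1
    return hits


if __name__ == "__main__":
    print("thread 1, tile (0,0):", [tiled_tv(0, 0, 1, v) for v in range(VALS)])
    print("every element copied exactly once:", all(h == 1 for h in coverage()))
```

Running it prints the four contiguous offsets thread 1 touches in the first tile and confirms full, non-overlapping coverage; a hand-written `tv` whose image is not a permutation of the tile would make the coverage check fail.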
Tensor Core MMA Instruction Tutorial
Reference: https://zhuanlan.zhihu.com/p/1892346599864238276
Using mma.m8n8k4 as an example
A warp executing mma.m8n8k4 with .f16 floating point type will compute 4 MMA operations of shape .m8n8k4.
A...
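The quoted PTX sentence means one warp-wide mma.m8n8k4 is really four independent 8x8x4 MMAs. A minimal Python sketch of that arithmetic, assuming the quad-pair grouping from the PTX ISA documentation (quad-pair q = lanes 4q..4q+3 plus 4q+16..4q+19); the helper names are hypothetical, and the per-thread element counts are derived here by dividing each matrix evenly over a quad-pair's 8 threads, so check them against the PTX fragment-layout figures before relying on them.

```python
# Sketch: how one warp-wide mma.m8n8k4 (.f16) splits into 4 independent 8x8x4 MMAs.
# Assumes the PTX quad-pair grouping: quad-pair q = lanes 4q..4q+3 and 4q+16..4q+19.

WARP_SIZE = 32
M, N, K = 8, 8, 4


def quadpair_of(lane):
    """Quad-pair index (0..3) that a lane belongs to."""
    return (lane % 16) // 4


def lanes_of(qp):
    """The 8 lanes forming quad-pair qp."""
    return [lane for lane in range(WARP_SIZE) if quadpair_of(lane) == qp]


def per_thread_elements():
    """Elements of A, B and C/D per thread if a quad-pair splits them evenly."""
    threads = WARP_SIZE // 4  # 8 threads per quad-pair
    return {"A": M * K // threads, "B": K * N // threads, "CD": M * N // threads}


if __name__ == "__main__":
    for qp in range(4):
        print(f"quad-pair {qp}: lanes {lanes_of(qp)}")
    print("elements per thread:", per_thread_elements())
    print("multiply-adds per warp-wide instruction:", 4 * M * N * K)
```

The last line simply counts 4 x 8 x 8 x 4 = 1024 multiply-adds for the whole warp, one 8x8x4 product per quad-pair.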