@shmsong shmsong commented Aug 10, 2022

This PR starts building out matmul CTA tiling support, prioritizing the Ampere variants for now.

This PR also includes an inline bank conflict check utility that reports bank conflicts in shared memory accesses.
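
For readers unfamiliar with the idea, here is a minimal host-side sketch of what a bank conflict check computes. It is not the utility from this PR. It assumes the common NVIDIA shared memory model of 32 banks that are each 4 bytes wide, one 32-bit access per thread, and that threads reading the same word are served by a broadcast. The names `bankConflictWays` and `columnAccess` are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Assumed hardware model (not taken from the PR): 32 banks, 4 bytes each.
constexpr int kNumBanks = 32;
constexpr int kBankWidthBytes = 4;

// Returns the worst-case conflict degree of one warp-wide access.
// 1 means conflict-free; N means some bank must serve N distinct words.
int bankConflictWays(const std::vector<std::int64_t>& byte_addresses) {
  assert(byte_addresses.size() <= 32);  // at most one address per thread in a warp
  std::array<std::set<std::int64_t>, kNumBanks> words_per_bank;
  for (std::int64_t addr : byte_addresses) {
    assert(addr >= 0);
    const std::int64_t word = addr / kBankWidthBytes;
    // Identical words in the same bank are broadcast, so a set deduplicates them.
    words_per_bank[word % kNumBanks].insert(word);
  }
  std::size_t worst = 1;
  for (const auto& words : words_per_bank) {
    worst = std::max(worst, words.size());
  }
  return static_cast<int>(worst);
}

// Hypothetical helper: byte addresses used when the 32 threads of a warp read
// column `col` of a 32x32 float tile, optionally with an XOR swizzle applied
// to the column index (physical column = col ^ row).
std::vector<std::int64_t> columnAccess(int col, bool xor_swizzle) {
  std::vector<std::int64_t> addrs;
  for (int row = 0; row < 32; ++row) {
    const int physical_col = xor_swizzle ? (col ^ row) : col;
    addrs.push_back(
        (static_cast<std::int64_t>(row) * 32 + physical_col) * kBankWidthBytes);
  }
  return addrs;
}
```

Under these assumptions, `bankConflictWays(columnAccess(0, false))` reports a 32-way conflict because every row starts in bank 0, while the swizzled variant spreads the 32 rows over 32 different banks.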

Remaining TODO:

  • add asymmetric swizzle operators to support more irregular tile sizes.

shmsong and others added 30 commits July 11, 2022 22:15
@zasdfgbnm zasdfgbnm changed the base branch from devel to rebase-matmul_swizzle_gen March 20, 2023 17:56
@zasdfgbnm zasdfgbnm merged commit c8c8cd7 into rebase-matmul_swizzle_gen Mar 20, 2023
@zasdfgbnm zasdfgbnm deleted the matmul_swizzle_gen branch March 20, 2023 17:57
zasdfgbnm added a commit to NVIDIA/Fuser that referenced this pull request Mar 21, 2023
While working on csarofeen/pytorch#1900, I found that the expr
simplifier sometimes assigns the wrong dtype to a value. This is because
the expr simplifier is written loosely, assuming that the dtypes of all
`Val`s are the same. However, because we are putting pointer types into
the expression, we need to be more careful; otherwise we will get wrong
kernel code like:
```
__half* ptr1 = threadIdx.x * 128;
__half* ptr2 = ptr1 + 256 + T1.data;
```
This PR changes all passes in the expr simplifier to infer the dtype of
each value from its inputs.
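
To illustrate what inferring the dtype from the inputs can mean, here is a minimal sketch. It is not the expr simplifier's code: the `DType` enum, `inferMul`, `inferAdd`, and `exampleFromText` are hypothetical names, and the rules are the usual C++ pointer arithmetic rules reduced to two types.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical, simplified set of dtypes; not the project's real DataType.
enum class DType { Int, HalfPtr };

// The product of two integers is an integer; multiplying a pointer is rejected.
DType inferMul(DType lhs, DType rhs) {
  if (lhs == DType::HalfPtr || rhs == DType::HalfPtr) {
    throw std::invalid_argument("cannot multiply a pointer");
  }
  return DType::Int;
}

// A sum is a pointer only if exactly one operand is a pointer.
DType inferAdd(DType lhs, DType rhs) {
  const bool lhs_ptr = lhs == DType::HalfPtr;
  const bool rhs_ptr = rhs == DType::HalfPtr;
  if (lhs_ptr && rhs_ptr) {
    throw std::invalid_argument("cannot add two pointers");
  }
  return (lhs_ptr || rhs_ptr) ? DType::HalfPtr : DType::Int;
}

// Walks through the expressions from the snippet above, one operation at a time.
void exampleFromText() {
  const DType offset = inferMul(DType::Int, DType::Int);  // threadIdx.x * 128
  assert(offset == DType::Int);                           // an index, not a __half*
  const DType shifted = inferAdd(offset, DType::Int);     // ... + 256
  assert(shifted == DType::Int);
  const DType address = inferAdd(shifted, DType::HalfPtr);  // ... + T1.data
  assert(address == DType::HalfPtr);  // only the final sum is a pointer
}
```

With rules like these, the intermediate value `threadIdx.x * 128` is typed as an integer, and only the sum that includes `T1.data` is typed as a pointer.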

- [x] TODO: update all the failing `assertCUDAKernel` tests😵‍💫😵‍💫😵‍💫😵‍💫
3 participants