Expr simplifier: simplification passes for matmul #2275
…into distribute-divmod
@naoyam This is ready for review
Why is this legal? Is threadIdx.x assumed to be less than 8?
}

BinaryOp* toDivModOp(Expr* expr) {
  if (auto bop = dynamic_cast<BinaryOp*>(expr)) {
nit: Most of the functions have this pattern of nested conditional branches, where we could reduce indentation levels by negating the condition and exiting early.
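For reference, a minimal sketch of the early-exit style this nit describes. The `Expr`, `BinaryOp`, and `OpKind` types below are stand-ins written for this illustration, and the function bodies are assumed; they are not the project's real classes or the actual body of `toDivModOp`.

```cpp
// Stand-in types for illustration only; not the project's real IR classes.
struct Expr {
  virtual ~Expr() = default;
};

enum class OpKind { Add, Div, Mod };

struct BinaryOp : Expr {
  explicit BinaryOp(OpKind k) : kind(k) {}
  OpKind kind;
};

// Nested style: each successful check adds one indentation level.
BinaryOp* toDivModOpNested(Expr* expr) {
  if (auto bop = dynamic_cast<BinaryOp*>(expr)) {
    if (bop->kind == OpKind::Div || bop->kind == OpKind::Mod) {
      return bop;
    }
  }
  return nullptr;
}

// Early-exit style: negate each condition and return immediately,
// so the main path stays at a single indentation level.
BinaryOp* toDivModOpEarlyExit(Expr* expr) {
  auto bop = dynamic_cast<BinaryOp*>(expr);
  if (bop == nullptr) {
    return nullptr;
  }
  if (bop->kind != OpKind::Div && bop->kind != OpKind::Mod) {
    return nullptr;
  }
  return bop;
}
```

Both versions return the same result for every input; only the control-flow layout differs.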
changed most of them
Oh, sorry. I meant to say
LGTM
Warning: this PR contains #2258 and #2273. Please review this PR after I have merged these two PRs and rebased this PR.
This PR adds a few more passes that simplify matmul indexing well. The newly added passes are cancelDivMod, distributeDivisibleDivMod, and distributeMul. The most helpful pass for matmul is distributeDivisibleDivMod. It simplifies indices like:
into
which helps remove the data dependency on i1, so that the index can be hoisted outside of the i1 loop.
Example matmul kernel code
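Note that the block below is not the generated matmul kernel. It is a minimal sketch of the kind of arithmetic identity that distributing div/mod over a divisible term can rely on, assuming non-negative operands and a positive divisor. The function names are hypothetical and written for this illustration; they are not the simplifier's API, and the pass itself works on IR expressions rather than on concrete integers.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not part of the simplifier.
// Assumes a >= 0, k >= 0, c > 0.

// Undistributed forms: the whole sum goes through div/mod.
int64_t divBefore(int64_t a, int64_t k, int64_t c) { return (a + k * c) / c; }
int64_t modBefore(int64_t a, int64_t k, int64_t c) { return (a + k * c) % c; }

// Distributed forms: k * c is divisible by c, so it leaves the division
// as k and drops out of the modulo entirely.
int64_t divAfter(int64_t a, int64_t k, int64_t c) { return a / c + k; }
int64_t modAfter(int64_t a, int64_t /*k*/, int64_t c) { return a % c; }

// Brute-force check of both identities over a small non-negative range.
bool identitiesHold(int64_t limit) {
  for (int64_t c = 1; c <= limit; ++c) {
    for (int64_t a = 0; a <= limit; ++a) {
      for (int64_t k = 0; k <= limit; ++k) {
        if (divBefore(a, k, c) != divAfter(a, k, c)) {
          return false;
        }
        if (modBefore(a, k, c) != modAfter(a, k, c)) {
          return false;
        }
      }
    }
  }
  return true;
}
```

In the distributed forms, `a / c` and `a % c` no longer depend on `k`, so if `k` were a loop variable those terms could be computed once outside that loop. Calling `identitiesHold(16)` checks both identities for all operands up to 16.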
Command:
Kernel diff compare (this PR + #1900) vs #1900 alone
https://www.diffchecker.com/pcxGCQkn
Matmul perf benchmark
Compare (this PR + #1900) vs #1900 alone
Command:
Before:
After: