Patch sync insertion for redundant predicated writes #1684
launch_params_.gdimy() * launch_params_.gdimz(),
"Wanted to launch a cooperative kernel, however the number of blocks is greater than ",
"what can be resident on the GPU at once. Need: ",
launch_params_.gdimx() * launch_params_.gdimy() * launch_params_.gdimz(),
Unrelated formatting.
tv0->computeAt(tv3, 0);
tv1->computeAt(tv3, 0);
Are these meant to do something?
Not really. Just making sure every CA parameter has a value. I vaguely remember that we didn't have a default behavior without any CA setting, but that was a long while ago.
What I mentioned in the MMA PR was that when we have a chain of redundant exprs, I was wondering if each would be synchronized. I added a variation of the test to see what happens, and here's the generated code:
The point I was making is that the first sync is redundant. I guess this is not common, so I think it's fine as is for now, but I wanted to clarify my concern for future optimization.
Please remove or merge the added test as you'd like. I just wanted to demonstrate the case.
Thanks for the fix.
Yes. I was planning on handling this in a follow-up, i.e. the case where a redundant write has a use chain that contains other redundant writes. So T0 -> T2 is a redundant chain and T2 -> T3 isn't. I think these vertical redundant chains are not too bad to handle. I would need to think a bit more about whether there are pathological horizontal patterns; in the worst case we probably arrive at sub-optimal code without reconsidering expr ordering.
38b27fa to 9637c58
This PR is a quick patch for redundant predicate sync insertion.
A sync is needed for a redundant parallel type unless every use chain of the redundantly written value in smem/gmem arrives at redundant-write consumers of the same parallel type.
This PR patches the insertion so that all redundant writes are sync'ed to avoid race conditions that may happen in devel TOT.
Detection of the cases where a sync is not needed for redundant types will come in a follow-up.
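To make the difference between this patch and the planned follow-up concrete, here is a minimal C++ sketch of the two policies. It is only an illustration under assumptions: `Expr`, `ParallelType`, `syncRequiredConservative`, and `syncRequiredPrecise` are hypothetical names that do not come from the code base, and the precise rule only inspects direct consumers rather than full use chains.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model of an expression in the kernel IR.
// None of these names are taken from the actual code base.
enum class ParallelType { TIDx, TIDy, TIDz };

struct Expr {
  std::string output_name;        // tensor written by this expr, e.g. "T2"
  bool writes_smem_or_gmem;       // output lives in shared or global memory
  bool is_redundant_write;        // write is predicated on a redundant parallel type
  ParallelType redundant_type;    // only meaningful if is_redundant_write is true
  std::vector<const Expr*> consumers;  // direct uses of the output
};

// Policy of this patch (as described above): every redundant write to
// smem/gmem is followed by a sync, regardless of who consumes it.
bool syncRequiredConservative(const Expr& producer) {
  return producer.is_redundant_write && producer.writes_smem_or_gmem;
}

// Assumed shape of the follow-up: the sync can be skipped when every
// consumer is itself a redundant write of the same parallel type.
// Simplification: only direct consumers are checked here.
bool syncRequiredPrecise(const Expr& producer) {
  if (!syncRequiredConservative(producer)) {
    return false;
  }
  for (const Expr* consumer : producer.consumers) {
    const bool same_redundancy = consumer->is_redundant_write &&
        consumer->redundant_type == producer.redundant_type;
    if (!same_redundancy) {
      return true;  // a non-redundant reader needs to see the written value
    }
  }
  return false;
}
```

Applied to the chain from the review discussion (T0 -> T2 redundant, T2 -> T3 not), the conservative policy syncs after both redundant writes, while the precise rule would keep only the sync after T2. That first sync is the redundant one pointed out in the review.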