[MLIR][XeGPU] Add lowering from transfer_read/transfer_write to load_gather/store_scatter #152429
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter.
Overall looks good. A few minor comments.
LGTM modulo all other pending comments.
```cpp
for (int64_t dim : vectorShape) {
  auto stepType = VectorType::get({dim}, rewriter.getIndexType());
  auto stepOp = vector::StepOp::create(rewriter, loc, stepType);
  stepVectors.push_back(stepOp);
}
```
consider using llvm::map_to_vector
Changed. Not sure this is better, though? The lambda function has lots of side effects, like IR creation and writing to stepVectors; it is not a pure function.
LGTM
```cpp
Value localOffsets = broadcasted[0];
for (size_t i = 1; i < broadcasted.size(); ++i) {
  localOffsets =
      arith::AddIOp::create(rewriter, loc, localOffsets, broadcasted[i]);
}
```
nit: braces are not necessary.
A few final touches and it should be good to go
Otherwise, looks good 👍 Thanks for all the tweaks
Lowering transfer_read/transfer_write to load_gather/store_scatter when the target uArch doesn't support load_nd/store_nd. The high-level steps: