This repository was archived by the owner on Aug 1, 2025. It is now read-only.


desertfire
Contributor

No description provided.

@desertfire desertfire changed the title [inductor] Rework #956 to avoid a perf regression [inductor] Revise #956 to avoid a perf regression Aug 24, 2022
Contributor

@jansel jansel left a comment


What was the issue?

    index, mask = self.indexing(index)
    if is_index_0 and "tl.zeros" not in index:
        # Need dense_indexing when index == 0
        index = f"{index} + tl.zeros({self.dense_size_str()}, tl.int32)"


Why do you need this?
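For context, the quoted snippet handles a load whose index is the constant 0. In that case the index expression is a scalar, and adding tl.zeros of the block shape broadcasts it into a block-shaped tensor of offsets. The sketch below reproduces only that string-building step in plain Python; the function name densify_index and the example block-shape string "[XBLOCK, RBLOCK]" are hypothetical, and the real output of self.dense_size_str() may differ.

```python
def densify_index(index: str, is_index_0: bool, dense_size_str: str) -> str:
    """Hypothetical stand-alone version of the quoted snippet.

    If the index is the constant 0 and has not already been broadcast,
    append a tl.zeros(...) term so the generated Triton expression has
    the full block shape instead of being a scalar.
    """
    if is_index_0 and "tl.zeros" not in index:
        # Need dense indexing when index == 0
        index = f"{index} + tl.zeros({dense_size_str}, tl.int32)"
    return index


# Example with an assumed block-shape string for a 2D (x, r) tile.
print(densify_index("0", True, "[XBLOCK, RBLOCK]"))
print(densify_index("x0", False, "[XBLOCK, RBLOCK]"))
```

The "tl.zeros" not in index guard makes the helper idempotent: calling it twice on the same index does not append a second zeros term.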

Comment on lines +655 to 657
if need_dense and not have_dense:
mask = dense_mask
index_str = f"{index_str} + tl.zeros({self.dense_size_str()}, tl.int32)"


Suggested change
    - if need_dense and not have_dense:
    -     mask = dense_mask
    -     index_str = f"{index_str} + tl.zeros({self.dense_size_str()}, tl.int32)"
    + if need_dense and not have_dense or index == 0:
    +     index_str = f"{index_str} + tl.zeros({self.dense_size_str()}, tl.int32)"
    +     if index == 0:
    +         mask = ["None"]
    +     else:
    +         mask = dense_mask

and remove the changes to load. If index is 0, you don't need a mask at all; a single-element load is always valid.
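The suggested branch can be exercised in isolation. The following is a minimal sketch, assuming the mask is a list of mask-variable names (as the suggestion's mask = ["None"] implies) and replacing the index == 0 comparison with a plain boolean; the function select_index_and_mask and its parameters are hypothetical, not the actual inductor API.

```python
from typing import List, Tuple


def select_index_and_mask(
    index_str: str,
    index_is_zero: bool,
    need_dense: bool,
    have_dense: bool,
    dense_mask: List[str],
    mask: List[str],
    dense_size_str: str,
) -> Tuple[str, List[str]]:
    """Hypothetical stand-alone version of the suggested change."""
    # `and` binds tighter than `or`: (need_dense and not have_dense) or index == 0
    if (need_dense and not have_dense) or index_is_zero:
        index_str = f"{index_str} + tl.zeros({dense_size_str}, tl.int32)"
        if index_is_zero:
            # A single-element load at offset 0 is always in bounds,
            # so no mask is needed.
            mask = ["None"]
        else:
            mask = dense_mask
    return index_str, mask


# Index 0: broadcast to the block shape, mask dropped entirely.
print(select_index_and_mask("0", True, False, False,
                            ["xmask", "rmask"], ["xmask"], "[XBLOCK, RBLOCK]"))
```

Dropping the mask for the index-0 case is the key point of the review comment: the load no longer carries the dense mask at all.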

@desertfire
Contributor Author

What was the issue?

The perf regression happened because my previous change caused some tl.load calls to no longer be promoted out of reduction loops: they fail this check

    and "rmask" not in mask

which a load must pass before it can be promoted.
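Only the and "rmask" not in mask condition is quoted here; the other conditions the real code uses when deciding whether to move a load out of the reduction loop are not shown. The hypothetical sketch below isolates that single test, again assuming the mask is a list of mask-variable names. The reasoning it encodes is presumably that a load masked by rmask depends on the reduction index and therefore has to stay inside the loop.

```python
from typing import List


def mask_allows_hoisting(mask: List[str]) -> bool:
    """Hypothetical reduction of the quoted check.

    Returns True when the mask does not mention rmask. Every other
    condition the real code checks before promoting a load is omitted.
    """
    return "rmask" not in mask


# A dense mask that includes rmask keeps the load inside the loop,
# while an x-only mask or no mask at all passes this check.
for m in (["xmask", "rmask"], ["xmask"], ["None"]):
    print(m, mask_allows_hoisting(m))
```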

I am actually surprised that perf measurement on AWS is stable enough to correctly capture a 3% regression. I will look into how stable the numbers are on CI and consider adding perf checks to the CI tasks.
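One possible shape for such a CI perf check is sketched below. It is entirely hypothetical and not something the PR adds: it compares median run times so a single noisy run does not dominate, uses a 3% threshold to match the regression size mentioned above, and reports the coefficient of variation so the run-to-run noise can be compared against that threshold before trusting the result.

```python
import statistics
from typing import Sequence


def is_regression(baseline_s: Sequence[float], candidate_s: Sequence[float],
                  threshold: float = 0.03) -> bool:
    """Hypothetical check: flag a slowdown larger than `threshold`.

    Inputs are repeated run times in seconds; medians are compared.
    """
    base = statistics.median(baseline_s)
    cand = statistics.median(candidate_s)
    return (cand - base) / base > threshold


def relative_noise(samples: Sequence[float]) -> float:
    """Coefficient of variation (stdev / mean) of repeated runs."""
    return statistics.stdev(samples) / statistics.mean(samples)


baseline = [1.00, 1.01, 0.99]
candidate = [1.05, 1.04, 1.06]
print(is_regression(baseline, candidate), relative_noise(baseline))
```

If the measured noise is close to or above the threshold, a single comparison like this would produce false alarms, which is why checking stability on CI comes first.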

@desertfire desertfire merged commit d9d47d5 into main Aug 24, 2022
desertfire added a commit that referenced this pull request Aug 25, 2022
Summary: #981 introduced a misaligned address bug related to how a tl.load from index 0 should be written in Triton.