Skip to content

Conversation

zasdfgbnm
Copy link
Collaborator

@zasdfgbnm zasdfgbnm commented Feb 21, 2023

This PR adds a new lowering pass: loop rotation.

Loop rotation is a loop transformation that transforms the code like below

for i in range(n):
  statement1(i)
  statement2(i)
  statement3(i)
  statement4(i)

into something like:

if 0 < n:
  for i = 0:
    statement1(i)
    statement2(i)
for i in range(n):
  statement3(i)
  statement4(i)
  if i + 1 < n:
    statement1(i+1)
    statement2(i+1)

Right now, because existing predicates should already cover all illegal access, I am not materializing the 0 < n and i + 1 < n predicates, so I am actually generating:

for i = 0:
  statement1(i)
  statement2(i)
for i in range(n):
  statement3(i)
  statement4(i)
  if True:
    statement1(i+1)
    statement2(i+1)

Loop rotation is conceptually very simple regarding index and predicate lowering, and their changes are mostly mechanical. For the trivial loop for i = 0, our index and predicate lowering should already handle it correctly. For the exprs inside the if i + 1 < n block, the only change needed is to replace all the loop->index() with loop->index() + 1, which requires me to detect the if i + 1 < n block and mark its body as "rotated" during index and predicate lowering.

In order to use loop rotation, users can do fusion.rotateLoop(tv, dim, {tv1, tv2}), then during lowering, the loop tv->axis(dim) will be rotated, and the expressions allocating and computing tv1 and tv2 will be the rotated exprs.

@zasdfgbnm zasdfgbnm marked this pull request as ready for review February 22, 2023 22:32
@zasdfgbnm zasdfgbnm requested a review from naoyam February 22, 2023 22:37
Copy link
Collaborator

@naoyam naoyam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still working on, but some initial comments

@naoyam
Copy link
Collaborator

naoyam commented Feb 27, 2023

I get this build error (clang 13):

/home/nmaruyama/pytorch/debug2/third_party/nvfuser/csrc/ir_cloner.h:67:10: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
        [this](auto&... x) { return std::make_tuple<Ts...>(clone(x)...); },
         ^
/home/nmaruyama/pytorch/debug2/third_party/nvfuser/csrc/ir_cloner.h:49:22: note: in instantiation of function template specialization 'nvfuser::IrCloner::clone<nvfuser::TensorView *, long, std::unordered_set<nvfuser::Statement *>>' requested here
      copy.push_back(clone(p));
                     ^
/home/nmaruyama/pytorch/debug2/third_party/nvfuser/csrc/fusion.cpp:83:40: note: in instantiation of function template specialization 'nvfuser::IrCloner::clone<std::tuple<nvfuser::TensorView *, long, std::unordered_set<nvfuser::Statement *>>>' requested here
  to->loop_rotation_param_ = ir_cloner.clone(from->loop_rotation_param_);
                                       ^
1 error generated.

@zasdfgbnm
Copy link
Collaborator Author

I get this build error (clang 13):

/home/nmaruyama/pytorch/debug2/third_party/nvfuser/csrc/ir_cloner.h:67:10: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
        [this](auto&... x) { return std::make_tuple<Ts...>(clone(x)...); },
         ^
/home/nmaruyama/pytorch/debug2/third_party/nvfuser/csrc/ir_cloner.h:49:22: note: in instantiation of function template specialization 'nvfuser::IrCloner::clone<nvfuser::TensorView *, long, std::unordered_set<nvfuser::Statement *>>' requested here
      copy.push_back(clone(p));
                     ^
/home/nmaruyama/pytorch/debug2/third_party/nvfuser/csrc/fusion.cpp:83:40: note: in instantiation of function template specialization 'nvfuser::IrCloner::clone<std::tuple<nvfuser::TensorView *, long, std::unordered_set<nvfuser::Statement *>>>' requested here
  to->loop_rotation_param_ = ir_cloner.clone(from->loop_rotation_param_);
                                       ^
1 error generated.

I think this is a bug of clang, but I changed the code to this->clone instead of clone. Hope this will make clang happy.

@naoyam
Copy link
Collaborator

naoyam commented Feb 28, 2023

I get this build error (clang 13):

/home/nmaruyama/pytorch/debug2/third_party/nvfuser/csrc/ir_cloner.h:67:10: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
        [this](auto&... x) { return std::make_tuple<Ts...>(clone(x)...); },
         ^
/home/nmaruyama/pytorch/debug2/third_party/nvfuser/csrc/ir_cloner.h:49:22: note: in instantiation of function template specialization 'nvfuser::IrCloner::clone<nvfuser::TensorView *, long, std::unordered_set<nvfuser::Statement *>>' requested here
      copy.push_back(clone(p));
                     ^
/home/nmaruyama/pytorch/debug2/third_party/nvfuser/csrc/fusion.cpp:83:40: note: in instantiation of function template specialization 'nvfuser::IrCloner::clone<std::tuple<nvfuser::TensorView *, long, std::unordered_set<nvfuser::Statement *>>>' requested here
  to->loop_rotation_param_ = ir_cloner.clone(from->loop_rotation_param_);
                                       ^
1 error generated.

I think this is a bug of clang, but I changed the code to this->clone instead of clone. Hope this will make clang happy.

Yeah, looks like so. I don't see any error with the change.

Comment on lines 25 to 26
inlineMost();
fusion.rotateLoop(tv4, -1, {tv1, tv2});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks inconsistent to me. Both are a fusion-wide operation, and one implicitly uses the fusion from a FusionGuard, but another is a method of Fusion. I'd say just rotateLoop(tv4, -1, {tv1, tv2}); would align well with the existing APIs.

@naoyam
Copy link
Collaborator

naoyam commented Feb 28, 2023

Just curious, can loop rotation be nested? Not that I think it's important.

@zasdfgbnm
Copy link
Collaborator Author

Just curious, can loop rotation be nested? Not that I think it's important.

I don't think so. To support nested loop rotation, we need to carefully handle the registerReplace and registerInsertBefore to make sure it does not change the abandoned main loop.

@naoyam
Copy link
Collaborator

naoyam commented Feb 28, 2023

Just curious, can loop rotation be nested? Not that I think it's important.

I don't think so. To support nested loop rotation, we need to carefully handle the registerReplace and registerInsertBefore to make sure it does not change the abandoned main loop.

OK. Maybe add a comment about it?

@zasdfgbnm
Copy link
Collaborator Author

Just curious, can loop rotation be nested? Not that I think it's important.

I don't think so. To support nested loop rotation, we need to carefully handle the registerReplace and registerInsertBefore to make sure it does not change the abandoned main loop.

OK. Maybe add a comment about it?

Actually, it is pretty easy to support it with very little extra work: I can just rotate one loop each pass.

I updated this PR to support this in 67f7df3

Copy link
Collaborator

@naoyam naoyam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving now. Thanks for the updates. This is still not something I like, #2500 (comment), but it's a question on code style, and it shouldn't block the PR.

@zasdfgbnm zasdfgbnm merged commit 591b8ae into devel Mar 1, 2023
@zasdfgbnm zasdfgbnm deleted the loop-rotation-submit branch March 1, 2023 18:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants