
Conversation

zasdfgbnm (Collaborator)

So that it has the same size as the rfactor domain.

Comment on lines -114 to 127
" Send/Receive Val {T1_l[ iS3{i0}, iS4{i2}, iS5{i3} ]} from cluster 0 to cluster 2\n"
" AggregateVal representing Val T1_l[ iS3{i0}, iS4{i2}, iS5{i3} ] on cluster 2\n"
" AggregateExpr representing Cluster 2.Inputs={T1_l[ iS3{i0}, iS4{i2}, iS5{i3} ], }. Outputs={T5_l[ rS12{i2}, iS13{i3} ], }.\n"
" AggregateVal representing Val T5_l[ rS12{i2}, iS13{i3} ] on cluster 2\n"
" Send/Receive Val {T5_l[ rS12{i2}, iS13{i3} ]} from cluster 2 to cluster 3\n"
" AggregateVal representing Val T5_l[ rS12{i2}, iS13{i3} ] on cluster 3\n"
" Send/Receive Val {T1_l[ iS3{i0}, iS4{i2}, iS5{i3} ]} from cluster 0 to cluster 1\n"
" AggregateVal representing Val T1_l[ iS3{i0}, iS4{i2}, iS5{i3} ] on cluster 1\n"
" AggregateExpr representing Cluster 1.Inputs={T1_l[ iS3{i0}, iS4{i2}, iS5{i3} ], }. Outputs={T3_l[ rS8{i2}, iS9{i3} ], }.\n"
" AggregateVal representing Val T3_l[ rS8{i2}, iS9{i3} ] on cluster 1\n"
" Send/Receive Val {T3_l[ rS8{i2}, iS9{i3} ]} from cluster 1 to cluster 3\n"
" AggregateVal representing Val T3_l[ rS8{i2}, iS9{i3} ] on cluster 3\n"
" AggregateExpr representing Cluster 3.Inputs={T5_l[ rS12{i2}, iS13{i3} ], T3_l[ rS8{i2}, iS9{i3} ], }. Outputs={T6_g[ iS14{i3} ], }.\n"
" Send/Receive Val {T1_l[ iS3{i0}, iS4{i2}, iS5{i3} ]} from cluster 0 to cluster 2\n"
" AggregateVal representing Val T1_l[ iS3{i0}, iS4{i2}, iS5{i3} ] on cluster 2\n"
" AggregateExpr representing Cluster 2.Inputs={T1_l[ iS3{i0}, iS4{i2}, iS5{i3} ], }. Outputs={T5_l[ rS12{i2}, iS13{i3} ], }.\n"
" AggregateVal representing Val T5_l[ rS12{i2}, iS13{i3} ] on cluster 2\n"
" Send/Receive Val {T5_l[ rS12{i2}, iS13{i3} ]} from cluster 2 to cluster 3\n"
" AggregateVal representing Val T5_l[ rS12{i2}, iS13{i3} ] on cluster 3\n"
" AggregateExpr representing Cluster 3.Inputs={T3_l[ rS8{i2}, iS9{i3} ], T5_l[ rS12{i2}, iS13{i3} ], }. Outputs={T6_g[ iS14{i3} ], }.\n"
"}\n"
zasdfgbnm (Collaborator, Author)

I don't know why this changed, or whether the change is OK; it looks like it is just a reordering.

Collaborator

@samnordmann will look into it. I think it's fine to change it for now.

@zasdfgbnm, please update the test if necessary.

zasdfgbnm (Collaborator, Author) commented Mar 10, 2023

I am marking this PR as draft because I need to wait for #2561 and update the Python frontend. But I ran the C++ and TorchScript tests, and they pass, so this PR should be ready for review.

I ran the tests with

git revert --no-commit 3b85308a8e303d0df43c2d3cac1edba87dde2e49

@zasdfgbnm zasdfgbnm requested review from jjsjann123 and naoyam March 10, 2023 02:42
zasdfgbnm (Collaborator, Author)

Marking as ready for review; the Python frontend has been updated.

@zasdfgbnm zasdfgbnm marked this pull request as ready for review March 10, 2023 05:24
naoyam (Collaborator) left a comment

LGTM. Left a couple of minor comments.

jjsjann123 (Collaborator) left a comment

A minor comment, and potentially a bug (I'm speculating).

@@ -425,12 +416,14 @@ class VectorizeValidator : public OptInDispatch {

// Contiguity is based on rfactor domain.
IterDomain* last_root_dim = nullptr;
size_t last_root_dim_pos;
Collaborator

last_root_dim_pos is uninitialized. If we have a tensor that is entirely size-1 (broadcast), then when we evaluate *tv->domain()->contiguity().at(last_root_dim_pos), that would be a segfault/UB.

zasdfgbnm (Collaborator, Author)

I think this is avoided by the if (last_root_dim == nullptr) check before the *tv->domain()->contiguity().at(last_root_dim_pos).
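For context, a minimal, self-contained sketch of the pattern under discussion; the types and the helper function here are stand-ins, not the real VectorizeValidator code, and contiguity entries are assumed to be std::optional<bool> as in this PR:

```cpp
// Sketch only: IterDomain and lastRootDimIsContiguous are hypothetical
// stand-ins. The point is that last_root_dim_pos is only read when
// last_root_dim was set, so the nullptr guard keeps the otherwise-meaningless
// position from ever being used for an all-broadcast (all size-1) tensor.
#include <cstddef>
#include <optional>
#include <vector>

struct IterDomain {
  bool broadcast = false;
  bool isBroadcast() const { return broadcast; }
};

bool lastRootDimIsContiguous(
    const std::vector<IterDomain>& root_domain,
    const std::vector<std::optional<bool>>& contiguity) {
  const IterDomain* last_root_dim = nullptr;
  size_t last_root_dim_pos = 0;  // meaningful only once last_root_dim is set

  // Find the innermost non-broadcast root dimension.
  for (size_t i = 0; i < root_domain.size(); ++i) {
    if (!root_domain[i].isBroadcast()) {
      last_root_dim = &root_domain[i];
      last_root_dim_pos = i;
    }
  }

  // The guard discussed above: for an all-broadcast tensor we return here and
  // never index contiguity with last_root_dim_pos.
  if (last_root_dim == nullptr) {
    return false;
  }
  return *contiguity.at(last_root_dim_pos);
}
```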

@@ -6,6 +6,7 @@
#include <torch/csrc/jit/ir/ir.h>
#include <type.h>
#include <array>
#include <optional>
zasdfgbnm (Collaborator, Author)

@naoyam I fixed a compilation error:

/home/gaoxiang/nvfuser7/third_party/nvfuser/csrc/executor_kernel_arg.h:282:18: error: ‘optional’ in namespace ‘std’ does not name a template type
  282 |       const std::optional<KernelIndexMode>& index_mode = std::nullopt);
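A small, self-contained illustration of that failure mode (stand-in names, not the actual nvFuser declaration): a header that uses std::optional / std::nullopt in a default argument needs to include <optional> itself, which is what the added include above does.

```cpp
#include <iostream>
#include <optional>  // without this: "'optional' in namespace 'std' does not name a template type"

// Stand-in enum; the real KernelIndexMode lives in nvFuser.
enum class KernelIndexMode { INT32, INT64 };

// Roughly the shape of the declaration that failed (executor_kernel_arg.h:282):
// std::optional used in a default argument.
void setIndexMode(
    const std::optional<KernelIndexMode>& index_mode = std::nullopt) {
  std::cout << (index_mode.has_value() ? "explicit" : "default")
            << " index mode\n";
}

int main() {
  setIndexMode();                        // uses the std::nullopt default
  setIndexMode(KernelIndexMode::INT32);  // explicit value
}
```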

Comment on lines +81 to +82
executor_ptr->compileFusion(
fusion_from_cluster.get(), args, launch_params, {});
zasdfgbnm (Collaborator, Author) Mar 10, 2023

This is another compilation error fix:

/home/gaoxiang/nvfuser7/third_party/nvfuser/csrc/multidevice/multidevice_runtime.cpp:81:30: error: no matching function for call to ‘nvfuser::FusionExecutor::compileFusion(std::unique_ptr<nvfuser::Fusion>::pointer, nvfuser::KernelArgumentHolder&, nvfuser::LaunchParams&)’
   81 |   executor_ptr->compileFusion(fusion_from_cluster.get(), args, launch_params);
      |   ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /home/gaoxiang/nvfuser7/third_party/nvfuser/csrc/kernel_cache.h:4,
                 from /home/gaoxiang/nvfuser7/third_party/nvfuser/csrc/fusion_segmenter.h:5,
                 from /home/gaoxiang/nvfuser7/third_party/nvfuser/csrc/multidevice/multidevice_runtime.cpp:2:
/home/gaoxiang/nvfuser7/third_party/nvfuser/csrc/executor.h:62:8: note: candidate: ‘void nvfuser::FusionExecutor::compileFusion(nvfuser::Fusion*, const nvfuser::KernelArgumentHolder&, const nvfuser::LaunchParams&, nvfuser::CompileParams)’
   62 |   void compileFusion(
      |        ^~~~~~~~~~~~~
/home/gaoxiang/nvfuser7/third_party/nvfuser/csrc/executor.h:62:8: note:   candidate expects 4 arguments, 3 provided
/home/gaoxiang/nvfuser7/third_party/nvfuser/csrc/executor.h:71:8: note: candidate: ‘void nvfuser::FusionExecutor::compileFusion(nvfuser::Fusion*, const c10::ArrayRef<c10::IValue>&, const nvfuser::LaunchParams&, nvfuser::CompileParams)’
   71 |   void compileFusion(
      |        ^~~~~~~~~~~~~
/home/gaoxiang/nvfuser7/third_party/nvfuser/csrc/executor.h:73:40: note:   no known conversion for argument 2 from ‘nvfuser::KernelArgumentHolder’ to ‘const c10::ArrayRef<c10::IValue>&’
   73 |       const at::ArrayRef<c10::IValue>& inputs = {},
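As a side note, a self-contained sketch of the overload issue reported above (stub types, not the real FusionExecutor API): the KernelArgumentHolder overload has no default for its last parameter, so the three-argument call is rejected, while passing {} (as in the diff above) value-initializes the CompileParams argument and compiles.

```cpp
// Stub types standing in for the nvFuser classes named in the error above.
#include <vector>

struct Fusion {};
struct KernelArgumentHolder {};
struct LaunchParams {};
struct CompileParams {};
struct IValue {};

struct FusionExecutor {
  // Candidate 1: the last parameter has no default, so all four arguments
  // are required.
  void compileFusion(Fusion*, const KernelArgumentHolder&,
                     const LaunchParams&, CompileParams) {}
  // Candidate 2: expects a list of IValue, so a KernelArgumentHolder second
  // argument is not convertible.
  void compileFusion(Fusion*, const std::vector<IValue>& = {},
                     const LaunchParams& = {}, CompileParams = {}) {}
};

int main() {
  FusionExecutor executor;
  Fusion fusion;
  KernelArgumentHolder args;
  LaunchParams launch_params;
  // executor.compileFusion(&fusion, args, launch_params);   // error: no match
  executor.compileFusion(&fusion, args, launch_params, {});  // OK
}
```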

Collaborator

Thanks. I guess I didn't see it as I had USE_DISTRIBUTED=0.

@zasdfgbnm zasdfgbnm merged commit 9eb4c20 into devel Mar 10, 2023
@zasdfgbnm zasdfgbnm deleted the contiguity-none branch March 10, 2023 22:03
jjsjann123 (Collaborator)

The build fix came in just in time~~ Thanks for that.
