-
Notifications
You must be signed in to change notification settings - Fork 7
Change contiguity into std::vector<c10::optional<bool>>
#2569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
" Send/Receive Val {T1_l[ iS3{i0}, iS4{i2}, iS5{i3} ]} from cluster 0 to cluster 2\n" | ||
" AggregateVal representing Val T1_l[ iS3{i0}, iS4{i2}, iS5{i3} ] on cluster 2\n" | ||
" AggregateExpr representing Cluster 2.Inputs={T1_l[ iS3{i0}, iS4{i2}, iS5{i3} ], }. Outputs={T5_l[ rS12{i2}, iS13{i3} ], }.\n" | ||
" AggregateVal representing Val T5_l[ rS12{i2}, iS13{i3} ] on cluster 2\n" | ||
" Send/Receive Val {T5_l[ rS12{i2}, iS13{i3} ]} from cluster 2 to cluster 3\n" | ||
" AggregateVal representing Val T5_l[ rS12{i2}, iS13{i3} ] on cluster 3\n" | ||
" Send/Receive Val {T1_l[ iS3{i0}, iS4{i2}, iS5{i3} ]} from cluster 0 to cluster 1\n" | ||
" AggregateVal representing Val T1_l[ iS3{i0}, iS4{i2}, iS5{i3} ] on cluster 1\n" | ||
" AggregateExpr representing Cluster 1.Inputs={T1_l[ iS3{i0}, iS4{i2}, iS5{i3} ], }. Outputs={T3_l[ rS8{i2}, iS9{i3} ], }.\n" | ||
" AggregateVal representing Val T3_l[ rS8{i2}, iS9{i3} ] on cluster 1\n" | ||
" Send/Receive Val {T3_l[ rS8{i2}, iS9{i3} ]} from cluster 1 to cluster 3\n" | ||
" AggregateVal representing Val T3_l[ rS8{i2}, iS9{i3} ] on cluster 3\n" | ||
" AggregateExpr representing Cluster 3.Inputs={T5_l[ rS12{i2}, iS13{i3} ], T3_l[ rS8{i2}, iS9{i3} ], }. Outputs={T6_g[ iS14{i3} ], }.\n" | ||
" Send/Receive Val {T1_l[ iS3{i0}, iS4{i2}, iS5{i3} ]} from cluster 0 to cluster 2\n" | ||
" AggregateVal representing Val T1_l[ iS3{i0}, iS4{i2}, iS5{i3} ] on cluster 2\n" | ||
" AggregateExpr representing Cluster 2.Inputs={T1_l[ iS3{i0}, iS4{i2}, iS5{i3} ], }. Outputs={T5_l[ rS12{i2}, iS13{i3} ], }.\n" | ||
" AggregateVal representing Val T5_l[ rS12{i2}, iS13{i3} ] on cluster 2\n" | ||
" Send/Receive Val {T5_l[ rS12{i2}, iS13{i3} ]} from cluster 2 to cluster 3\n" | ||
" AggregateVal representing Val T5_l[ rS12{i2}, iS13{i3} ] on cluster 3\n" | ||
" AggregateExpr representing Cluster 3.Inputs={T3_l[ rS8{i2}, iS9{i3} ], T5_l[ rS12{i2}, iS13{i3} ], }. Outputs={T6_g[ iS14{i3} ], }.\n" | ||
"}\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why this changed, and I don't know if this change is OK, it seems it is just some order change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@samnordmann will look into it. I think it's fine to change it for now.
@zasdfgbnm, please update the test if necessary.
I am marking this PR as draft because I need to wait #2561 and update the python frontend. But I ran C++ and torchscript tests, and they pass. So this test should be ready for review. I run tests with git revert --no-commit 3b85308a8e303d0df43c2d3cac1edba87dde2e49 |
Marking as ready-for-review, python frontend updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Left a couple of minor comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment and potentially a bug (I'm speculating)
@@ -425,12 +416,14 @@ class VectorizeValidator : public OptInDispatch { | |||
|
|||
// Contiguity is based on rfactor domain. | |||
IterDomain* last_root_dim = nullptr; | |||
size_t last_root_dim_pos; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uninitialized last_root_dim_pos
. If we have a tensor that's full size-1 (broadcasted), then we are *tv->domain()->contiguity().at(last_root_dim_pos)
, that would be a segfault/UB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will be avoided by the if (last_root_dim == nullptr)
before the *tv->domain()->contiguity().at(last_root_dim_pos)
@@ -6,6 +6,7 @@ | |||
#include <torch/csrc/jit/ir/ir.h> | |||
#include <type.h> | |||
#include <array> | |||
#include <optional> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@naoyam I fixed a compilation error:
/home/gaoxiang/nvfuser7/third_party/nvfuser/csrc/executor_kernel_arg.h:282:18: error: ‘optional’ in namespace ‘std’ does not name a template type
282 | const std::optional<KernelIndexMode>& index_mode = std::nullopt);
executor_ptr->compileFusion( | ||
fusion_from_cluster.get(), args, launch_params, {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another compilation error fix:
/home/gaoxiang/nvfuser7/third_party/nvfuser/csrc/multidevice/multidevice_runtime.cpp:81:30: error: no matching function for call to ‘nvfuser::FusionExecutor::compileFusion(std::unique_ptr<nvfuser::Fusion>::pointer, nvfuser::KernelArgumentHolder&, nvfuser::LaunchParams&)’
81 | executor_ptr->compileFusion(fusion_from_cluster.get(), args, launch_params);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /home/gaoxiang/nvfuser7/third_party/nvfuser/csrc/kernel_cache.h:4,
from /home/gaoxiang/nvfuser7/third_party/nvfuser/csrc/fusion_segmenter.h:5,
from /home/gaoxiang/nvfuser7/third_party/nvfuser/csrc/multidevice/multidevice_runtime.cpp:2:
/home/gaoxiang/nvfuser7/third_party/nvfuser/csrc/executor.h:62:8: note: candidate: ‘void nvfuser::FusionExecutor::compileFusion(nvfuser::Fusion*, const nvfuser::KernelArgumentHolder&, const nvfuser::LaunchParams&, nvfuser::CompileParams)’
62 | void compileFusion(
| ^~~~~~~~~~~~~
/home/gaoxiang/nvfuser7/third_party/nvfuser/csrc/executor.h:62:8: note: candidate expects 4 arguments, 3 provided
/home/gaoxiang/nvfuser7/third_party/nvfuser/csrc/executor.h:71:8: note: candidate: ‘void nvfuser::FusionExecutor::compileFusion(nvfuser::Fusion*, const c10::ArrayRef<c10::IValue>&, const nvfuser::LaunchParams&, nvfuser::CompileParams)’
71 | void compileFusion(
| ^~~~~~~~~~~~~
/home/gaoxiang/nvfuser7/third_party/nvfuser/csrc/executor.h:73:40: note: no known conversion for argument 2 from ‘nvfuser::KernelArgumentHolder’ to ‘const c10::ArrayRef<c10::IValue>&’
73 | const at::ArrayRef<c10::IValue>& inputs = {},
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I guess I didn't see it as I had USE_DISTRIBUTED=0
.
The build fix comes in just about time~~ Thanks for that |
So that it has the same size as rfactor domain