🐛 Describe the bug

Recent changes in PR #1800 made it possible for multi-output siblings to run out of sync after the computeAt pass. Here is a minimal repro:
```cpp
TEST_F(NVFuserTest, FusionInlineRepro_CUDA) {
  Fusion fusion;
  FusionGuard fg(&fusion);

  TensorView* tv0 = makeContigTensor(2);
  fusion.addInput(tv0);

  auto tv1 = set(tv0);
  auto tvs = Welford(tv1, {1});
  auto tvo = set(tvs.var_sum);
  fusion.addOutput(tvo);

  tvo->split(0, 16);
  tvo->axis(1)->parallelize(ParallelType::Unroll);

  tv0->computeAt(tvo, -1, ComputeAtMode::BestEffort);

  fusion.printMath();

  TORCH_INTERNAL_ASSERT(
      tvs.var_sum->getComputeAtPosition() == tvs.avg->getComputeAtPosition());
}
```
Snippet from the output:
```
T2_l[ iS21{( ceilDiv(i1, 16) )}, iS22{16}, rS5{i2} ] ca_pos( 2 ) produce_pos( 2)(Avg),
T3_l[ iS13{( ceilDiv(i1, 16) )}, iS14{16}, rS7{i2} ] ca_pos( 1 ) produce_pos( 2)(Var),
T4_l[ iS19{( ceilDiv(i1, 16) )}, iS20{16}, rS9{i2} ] ca_pos( 2 ) produce_pos( 2)(Count)
  = Welford ( T1_l[ iS15{( ceilDiv(i1, 16) )}, iS16{16}, iS3{i2} ] ca_pos( 2 )(Avg), allreduce = 0 )
```
Note the difference in ca_pos across the three outputs.

The siblings being out of sync is one problem; moreover, a ca_pos of 2 should not be allowed here at all, as it would lead to confusion at the expression sorting stage.
A possible fix, I'm guessing, would be to incorporate getMaxPosAll of all siblings when propagating among them: https://github.com/csarofeen/pytorch/blob/devel/torch/csrc/jit/codegen/cuda/inline_propagator.cpp#L253
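For illustration, a rough sketch of what I mean (not a drop-in patch; I'm assuming getMaxPosAll(tv) from the linked MaxPosCalculator is callable here and that sibling outputs are reachable via tv->definition()->outputs()):

```cpp
// Sketch only: take the most conservative (smallest) max inline position
// across all sibling outputs of a multi-output expression such as Welford,
// so that avg / var_sum / N always end up with the same ca_pos.
size_t getMaxPosAllSiblings(MaxPosCalculator& calc, TensorView* tv) {
  size_t max_pos = calc.getMaxPosAll(tv);
  if (tv->definition() != nullptr) {
    for (auto sibling :
         ir_utils::filterByType<TensorView>(tv->definition()->outputs())) {
      // Includes tv itself; min with its own value is a no-op.
      max_pos = std::min(max_pos, calc.getMaxPosAll(sibling));
    }
  }
  return max_pos;
}
```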
Versions

TO BE UPDATED