Commit 65af1a4

Inserting sync for redundant parallel types is already done at the beginning of this loop. (#2023)

1 parent 6ac74d1

File tree

1 file changed: +0 −20 lines


torch/csrc/jit/codegen/cuda/lower_sync_information.cpp

Lines changed: 0 additions & 20 deletions
```diff
@@ -410,26 +410,6 @@ void SyncMap::build(Fusion* fusion) {
         }
       }
 
-      // If same parallel type and mapped, no need for syncs unless
-      // producer is in smem, producer parallel type is a thread
-      // dimension, and consumer concretizes the dimension. This sync is
-      // due to the redundant predicate omission in lower thread
-      // predicate.
-      auto redundant_preds = GpuLower::current()
-                                 ->threadPredMap()
-                                 .getPredicateInfo(producer)
-                                 .redundant_types;
-
-      if (p_id->isBroadcast() &&
-          GpuLower::current()->concretizedBroadcastDomains()->isConcretized(
-              p_id) &&
-          producer->getMemoryType() == MemoryType::Shared &&
-          redundant_preds.hasTID()) {
-        redundant_preds.clearAllBID();
-        raw_dims |= redundant_preds;
-        continue;
-      }
-
       // When the producer axis is a broadcast, it is not really
       // parallelized unless thread-predicated and concretized
       if (isParallelTypeThread(producer_ptype) && p_id->isBroadcast() &&
```

0 commit comments
