Some misc swizzle changes #2138

zasdfgbnm · 2022-10-29T01:09:51Z

The biggest change in this PR is to the index lowering of swizzle. Currently, in the devel branch, lowering a swizzle in index_compute.cpp still creates a swizzle op, which codegen.cpp then replaces with strings like "Xor", "ZShape", etc. that name functions defined in swizzle.cu. This PR changes the lowering of swizzle as follows:

- The definition of swizzle is pulled from swizzle.cu into ops/swizzle.{h, cpp} as composite operators.
  - Swizzle is no longer different from other composite ops like dropout, layer_norm, etc.
    - We can apply swizzle directly to TensorViews to obtain new TensorViews, which makes it easier to test and experiment with. See FusionSwizzleExample*, which uses PyTorch's advanced indexing to visualize the memory layout of a swizzled tensor.
    - The bank conflict checker now works with swizzle.
  - swizzle.cu will be cleaned up in a later PR.
  - The special handling of the swizzle op in codegen.cpp is also deprecated and will be removed in a follow-up PR.
- In index_compute.cpp, instead of creating a swizzle op in the index math, lowering calls these composite operators in swizzle.h to generate the index math.
- In the future, I will add swizzle support to the transpose scheduler. getTransposeHeuristics will use the bank conflict checker to pick the best swizzle strategy for each shared memory buffer.
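To illustrate why treating swizzle as an ordinary composite operator is convenient, here is a minimal sketch, not the PR's code: a single templated definition yields concrete indices when given `int`s and an index expression when given a symbolic type, mirroring in spirit how one swizzle definition can be applied to TensorViews and also used to build index math. `SymIndex`, `xorSwizzle` and `cyclicShift` are hypothetical names, and the formulas (`y' = x ^ y`, `y' = (x + y) % size_y`) are assumptions, not copied from ops/swizzle.cpp.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Toy symbolic index: records expression text instead of a value.
// Hypothetical stand-in for an index Val built during lowering.
struct SymIndex {
  std::string expr;
};

SymIndex operator^(const SymIndex& a, const SymIndex& b) {
  return {"(" + a.expr + " ^ " + b.expr + ")"};
}

SymIndex operator+(const SymIndex& a, const SymIndex& b) {
  return {"(" + a.expr + " + " + b.expr + ")"};
}

SymIndex operator%(const SymIndex& a, int b) {
  return {"(" + a.expr + " % " + std::to_string(b) + ")"};
}

// Assumed XOR swizzle: x is kept, y becomes x ^ y.
template <typename T>
std::pair<T, T> xorSwizzle(T x, T y) {
  return {x, x ^ y};
}

// Assumed cyclic shift: x is kept, y becomes (x + y) mod size_y.
template <typename T>
std::pair<T, T> cyclicShift(T x, T y, int size_y) {
  assert(size_y > 0);
  return {x, (x + y) % size_y};
}
```

With ints, `xorSwizzle(3, 5)` yields `(3, 6)`; with `SymIndex` inputs named `i0` and `i1`, the same call yields the expression text `"(i0 ^ i1)"`.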

Besides that, I also:

- modified VectorizeValidator to check both X and Y, so that a vectorized swizzled ID is rejected. See [MatMul] Prolog build out, adding automatic swizzle generator for a few tile sizes #1900 (comment)
- added a check to disallow data swizzles on global tensors. Currently, global indexing simply ignores data swizzles.
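The two new checks can be sketched as follows. This is a simplified, hypothetical model: `IdInfo`, `SwizzleInfo`, `MemKind`, `SwizzleKind`, `validateVectorize` and `validateSwizzleMemory` are invented for illustration and are not nvFuser's classes or the actual VectorizeValidator.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical, simplified model of the two checks; these types are not
// nvFuser's IterDomain, SwizzleMode or MemoryType.
enum class MemKind { Global, Shared, Local };
enum class SwizzleKind { Data, Loop };

struct IdInfo {
  bool vectorized = false;
};

// A 2D swizzle produces two output IDs, called X and Y here.
struct SwizzleInfo {
  const IdInfo* out_x;
  const IdInfo* out_y;
  SwizzleKind kind;
};

// Rule 1: reject vectorization of either swizzle output. Checking only one
// of X and Y would let a vectorized swizzled ID through on the other.
void validateVectorize(const SwizzleInfo& s) {
  assert(s.out_x != nullptr && s.out_y != nullptr);
  if (s.out_x->vectorized || s.out_y->vectorized) {
    throw std::runtime_error("vectorizing a swizzled ID is not supported");
  }
}

// Rule 2: reject data swizzles on global tensors instead of letting global
// indexing ignore them.
void validateSwizzleMemory(const SwizzleInfo& s, MemKind mem) {
  if (s.kind == SwizzleKind::Data && mem == MemKind::Global) {
    throw std::runtime_error("data swizzle on a global tensor is not supported");
  }
}
```

Throwing at validation time, rather than producing a kernel with a swizzle silently dropped, matches the intent of the FusionSwizzleVectorize_CUDA and FusionDataSwizzleGlobal_CUDA tests, which expect an exception.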

Performance was checked against #2022; there is no perf regression.

zasdfgbnm · 2022-10-29T07:29:58Z

torch/csrc/jit/codegen/cuda/test/test_gpu3.cpp

      "TV4 is not redundantly used but not detected.");
 }

-// Test a basic swizzle pattern


All swizzle tests in this file are moved to test_gpu_swizzle.cpp

zasdfgbnm · 2022-10-29T07:30:36Z

torch/csrc/jit/codegen/cuda/test/test_gpu_swizzle.cpp

+using namespace torch::jit::fuser::cuda;
+
+// Test a basic swizzle pattern
+TEST_F(NVFuserTest, FusionSimpleSwizzle0_CUDA) {


Moved from test_gpu3.cpp with a trivial modification.

zasdfgbnm · 2022-10-29T07:30:43Z

torch/csrc/jit/codegen/cuda/test/test_gpu_swizzle.cpp

+}
+
+// Test swizzle inlining
+TEST_F(NVFuserTest, FusionSimpleSwizzle1_CUDA) {


Moved from test_gpu3.cpp without modification.

zasdfgbnm · 2022-10-29T07:30:59Z

torch/csrc/jit/codegen/cuda/test/test_gpu_swizzle.cpp

+// Test sync insertion and memory check in parallelized swizzles.
+//  In this test, data is parallel written into smem in zcurve
+//   pattern and then read out and output to global mem unswizzled.
+TEST_F(NVFuserTest, FusionSimpleSwizzle2_CUDA) {


Moved from test_gpu3.cpp without modification.

zasdfgbnm · 2022-10-29T07:31:30Z

torch/csrc/jit/codegen/cuda/test/test_gpu_swizzle.cpp

+}
+
+// Test BestEffortReplay behavior with swizzle op
+TEST_F(NVFuserTest, FusionSwizzleMapping_CUDA) {


Moved from test_gpu3.cpp without modification.

zasdfgbnm · 2022-10-29T07:31:37Z

torch/csrc/jit/codegen/cuda/test/test_gpu_swizzle.cpp

+}
+
+// Test a basic loop swizzle pattern
+TEST_F(NVFuserTest, FusionLoopSwizzle0_CUDA) {


Moved from test_gpu3.cpp without modification.

zasdfgbnm · 2022-10-29T07:31:43Z

torch/csrc/jit/codegen/cuda/test/test_gpu_swizzle.cpp

+}
+
+// Outer block zshape pattern
+TEST_F(NVFuserTest, FusionLoopSwizzle1_CUDA) {


Moved from test_gpu3.cpp without modification.

zasdfgbnm · 2022-10-29T07:31:53Z

torch/csrc/jit/codegen/cuda/test/test_gpu_swizzle.cpp

+}
+
+// Test assertion in unsupported pattern: non-leaf loop swizzle.
+TEST_F(NVFuserTest, FusionLoopSwizzleCheck0_CUDA) {


Moved from test_gpu3.cpp without modification.

zasdfgbnm · 2022-10-29T07:32:00Z

torch/csrc/jit/codegen/cuda/test/test_gpu_swizzle.cpp

+}
+
+// Test assertion in unsupported pattern: half-inlined loop swizzle.
+TEST_F(NVFuserTest, FusionLoopSwizzleCheck1_CUDA) {


Moved from test_gpu3.cpp without modification.

zasdfgbnm · 2022-10-29T07:32:17Z

torch/csrc/jit/codegen/cuda/test/test_gpu_swizzle.cpp

+  ASSERT_ANY_THROW(fe.compileFusion(&fusion));
+}
+
+TEST_F(NVFuserTest, FusionSwizzleVectorize_CUDA) {


This is a new test; please review.

zasdfgbnm · 2022-10-29T07:32:24Z

torch/csrc/jit/codegen/cuda/test/test_gpu_swizzle.cpp

+  ASSERT_ANY_THROW(GpuLower lower(&fusion));
+}
+
+TEST_F(NVFuserTest, FusionTransposeBankConflictSwizzle1_CUDA) {


This is a new test; please review.
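Why swizzling helps the transpose case that this test targets can be seen with a toy bank conflict count. The sketch assumes 32 shared memory banks of 4-byte words, a 32 x 32 float tile, and a warp in which thread t reads element (t, col) of one column; it ignores same-word broadcasts and is not the nvFuser bank conflict checker. `columnReadConflictWays`, `rowMajor` and `xorSwizzled` are hypothetical helpers.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <functional>

// Toy model: 32 banks of 4-byte words, one float per thread, 32 threads.
// Same-word broadcasts are not modeled. Not the nvFuser checker.
constexpr int kBanks = 32;
constexpr int kTile = 32;  // 32 x 32 float tile in shared memory

// Maps a logical (row, col) element to its word offset in shared memory.
using Layout = std::function<int(int, int)>;

int rowMajor(int row, int col) { return row * kTile + col; }

// XOR-swizzled layout: the column is XORed with the row. Since kTile is a
// power of two, this is a permutation within each row.
int xorSwizzled(int row, int col) { return row * kTile + (row ^ col); }

// Thread t reads logical element (t, col), i.e. one column of the tile, as
// on the read side of a transpose. Returns the largest number of threads
// that hit a single bank: 1 means conflict-free, 32 means fully serialized.
int columnReadConflictWays(const Layout& layout, int col) {
  assert(0 <= col && col < kTile);
  std::array<int, kBanks> hits{};
  int worst = 0;
  for (int t = 0; t < kTile; ++t) {
    int bank = layout(t, col) % kBanks;
    worst = std::max(worst, ++hits[bank]);
  }
  return worst;
}
```

For every column, the row-major layout sends all 32 reads to the same bank (bank = col), while the XOR layout spreads them over 32 distinct banks (bank = t ^ col).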

zasdfgbnm · 2022-10-29T07:32:46Z

torch/csrc/jit/codegen/cuda/test/test_gpu_swizzle.cpp

+  }
+}
+
+TEST_F(NVFuserTest, FusionDataSwizzleGlobal_CUDA) {


This is a new test; please review.

zasdfgbnm · 2022-10-29T07:33:03Z

torch/csrc/jit/codegen/cuda/test/test_gpu_swizzle.cpp

+
+} // namespace
+
+TEST_F(NVFuserTest, FusionSwizzleExampleZShape_CUDA) {


This is a new test; please review.

Nice way to test the swizzle ops!

zasdfgbnm · 2022-10-29T07:33:09Z

torch/csrc/jit/codegen/cuda/test/test_gpu_swizzle.cpp

+  TORCH_CHECK(at::allclose(input, unswizzled));
+}
+
+TEST_F(NVFuserTest, FusionSwizzleExampleXor_CUDA) {


This is a new test; please review.

zasdfgbnm · 2022-10-29T07:33:15Z

torch/csrc/jit/codegen/cuda/test/test_gpu_swizzle.cpp

+  TORCH_CHECK(at::allclose(input, unswizzled));
+}
+
+TEST_F(NVFuserTest, FusionSwizzleExampleCyclicShift_CUDA) {


This is a new test; please review.
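The FusionSwizzleExample* tests apply a swizzle to a tensor, inspect the resulting layout, and check that unswizzling gives back the input. A plain C++ analogue with `std::vector` might look like this, assuming the CyclicShift formula `y' = (x + y) % size_y`; `applySwizzle`, `undoSwizzle`, `iota2D` and `cyclicShift` are hypothetical helpers, not nvFuser or ATen functions.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<int>>;
// A swizzle maps a logical (x, y) to the (x, y) where it is stored.
using Swizzle = std::function<std::pair<int, int>(int, int)>;

// Assumed CyclicShift formula: y' = (x + y) % size_y.
Swizzle cyclicShift(int size_y) {
  return [size_y](int x, int y) { return std::make_pair(x, (x + y) % size_y); };
}

// rows x cols matrix whose element (x, y) holds x * cols + y.
Matrix iota2D(int rows, int cols) {
  Matrix m(rows, std::vector<int>(cols));
  for (int x = 0; x < rows; ++x)
    for (int y = 0; y < cols; ++y) m[x][y] = x * cols + y;
  return m;
}

// Scatter: element (x, y) of `in` is written to swizzle(x, y).
Matrix applySwizzle(const Matrix& in, const Swizzle& swizzle) {
  const int rows = static_cast<int>(in.size());
  const int cols = static_cast<int>(in[0].size());
  Matrix out(rows, std::vector<int>(cols));
  for (int x = 0; x < rows; ++x) {
    for (int y = 0; y < cols; ++y) {
      std::pair<int, int> to = swizzle(x, y);
      assert(0 <= to.first && to.first < rows);
      assert(0 <= to.second && to.second < cols);
      out[to.first][to.second] = in[x][y];
    }
  }
  return out;
}

// Gather: reading the swizzled buffer at swizzle(x, y) recovers (x, y).
Matrix undoSwizzle(const Matrix& swizzled, const Swizzle& swizzle) {
  const int rows = static_cast<int>(swizzled.size());
  const int cols = static_cast<int>(swizzled[0].size());
  Matrix out(rows, std::vector<int>(cols));
  for (int x = 0; x < rows; ++x) {
    for (int y = 0; y < cols; ++y) {
      std::pair<int, int> from = swizzle(x, y);
      out[x][y] = swizzled[from.first][from.second];
    }
  }
  return out;
}
```

With `cyclicShift(4)` on a 4 x 4 `iota2D` input, row x of the swizzled result is row x of the input rotated right by x positions, and `undoSwizzle` restores the input exactly, which is the kind of exact comparison `at::equal` performs in the real tests.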

naoyam

Overall looks very good. One question is whether we need to expose the new swizzle functions, as they seem to be used only as part of the lowering. Do you expect users to call them directly as well?

torch/csrc/jit/codegen/cuda/arith.h

torch/csrc/jit/codegen/cuda/ops/swizzle.h

torch/csrc/jit/codegen/cuda/test/test_gpu_swizzle.cpp

naoyam · 2022-10-29T18:59:00Z

torch/csrc/jit/codegen/cuda/test/test_gpu_swizzle.cpp

+
+} // namespace
+
+TEST_F(NVFuserTest, FusionSwizzleExampleZShape_CUDA) {


Nice way to test the swizzle ops!

naoyam

LGTM

zasdfgbnm added 14 commits October 28, 2022 14:47

- fix lower validation · 86f28b8
- vectorize throw test · c0ed092
- add cpp definition of swizzles · 9c9c0c9
- cpp_div · 09aec5a
- new test file · 2c89190
- disable global data swizzle · 36d63dd
- lower swizzle to expr · acc220e
- fix · ae1e128
- swizzle bank conflict · 4c16818
- FusionSwizzleExample · a5d6c5e
- TODO · 9556f5f
- comment · 1264c15
- some comments · 9fa281c
- new line · 8ae7435

zasdfgbnm marked this pull request as ready for review October 29, 2022 07:06

zasdfgbnm changed the title ~~[WIP][Not ready for review] Swizzle changes~~ Some misc swizzle changes Oct 29, 2022

zasdfgbnm requested review from csarofeen and naoyam October 29, 2022 07:42

naoyam reviewed Oct 29, 2022

zasdfgbnm added 4 commits October 29, 2022 16:46

- pull swizzles out of ops · f812530
- only expose dispatchSwizzle and dispatchUnSwizzle · aeb5670
- message for test failing FusionTransposeBankConflictSwizzle1_CUDA · 9947796
- use at::equal instead of at::allclose · 5fe2e41

zasdfgbnm requested a review from naoyam October 30, 2022 00:04

zasdfgbnm mentioned this pull request Oct 30, 2022

Swizzle cleanups #2142

Merged

format

59c66b2

naoyam approved these changes Oct 30, 2022

zasdfgbnm merged commit 292ebef into devel Oct 30, 2022

zasdfgbnm deleted the swizzle-changes branch October 30, 2022 16:59

