forked from pytorch/pytorch
Minor update on cp.async code generation. #1901
Merged
Changes from all 54 commits:

6f5ba21  shmsong    use custom propagator in ampere TN
2329caf  shmsong    add tile ordering utilities
121af43  shmsong    initial matmul scheduler implementation
f958c53  shmsong    use matmul scheduler prototype on ampere and turing test cases
397f74c  shmsong    extend to support Volta
00d9a57  shmsong    minor cleanup
d7035aa  shmsong    comment cleanup
9ffc61d  shmsong    minor fix
ed0f525  shmsong    add fragment iteration and use it in matmul scheduler
c972116  shmsong    use scheduler params for tests
d12a90f  shmsong    fragment support in double buffer
c306b9b  shmsong    add register double buffering test cases
63f561f  shmsong    clean up custom transform propagator
3d47c1f  shmsong    Merge remote-tracking branch 'origin/devel' into matmul_propagator
29f88c7  shmsong    rebase fix
d029b9f  shmsong    comment
5ac053f  shmsong    move bounded selector to common area
b51d247  shmsong    Add logic to handle fake boundary tensors in selection.
aba5087  shmsong    naming and comment
426c381  shmsong    remove unused parameters from mma node
6d4f377  shmsong    remove unnecessary parameters from mma ir node
5e1f41f  shmsong    rename scheduling variables
1960da9  shmsong    change accumulator tv interface
3a411c2  shmsong    Update torch/csrc/jit/codegen/cuda/scheduler/utils.h
8f2e4da  shmsong    PR feedback
eef3a97  shmsong    Merge branch 'matmul_propagator' of https://github.com/csarofeen/pyto…
6ad2967  shmsong    pipe through parallel type position
65c8f0a  shmsong    Merge remote-tracking branch 'origin/devel' into matmul_propagator
cd03b00  shmsong    Revert "fragment support in double buffer"
380dd66  shmsong    Merge branch 'matmul_propagator' into fragment_iter
6ce6ff6  shmsong    use cache op to handle double buffer input
62f09fc  shmsong    add more comment in matmul scheduler
538aa8b  shmsong    more comments
91f44fd  shmsong    comment fix
75d51a5  shmsong    Merge remote-tracking branch 'origin/devel' into fragment_iter
546844a  shmsong    rebase fix
ca55194  shmsong    add inline pred for cpasync
2b6f447  shmsong    Merge remote-tracking branch 'origin/devel' into speculative_index
41c221a  shmsong    minor cleanup
214f2a2  shmsong    add inlining test in unit
99e4d4c  shmsong    add option to dump ptx
da45d51  shmsong    Merge remote-tracking branch 'origin/devel' into speculative_index
c4a8739  shmsong    rebase fix
7f42537  naoyam     Fix missing thread predicates
93124a3  zasdfgbnm  Merge branch 'devel' of github.com:csarofeen/pytorch into speculative…
ebeb201  zasdfgbnm  fix merge
cde6e4d  zasdfgbnm  fix merge
022c443  zasdfgbnm  format
c90b90f  zasdfgbnm  Merge branch 'devel' of github.com:csarofeen/pytorch into speculative…
3417e8e  zasdfgbnm  Merge branch 'speculative_index' of github.com:csarofeen/pytorch into…
52099e0  zasdfgbnm  cleanup
7d6e28d  zasdfgbnm  Merge branch 'devel' of github.com:csarofeen/pytorch into speculative…
1f8ecba  zasdfgbnm  cleanup clone
0742e7e  zasdfgbnm  fix
Diff (FusionSimpleAmperePipeline_CUDA removed from the test file):

@@ -1434,42 +1434,6 @@ TEST_F(NVFuserTest, FusionSimplePWise_CUDA) {
   TORCH_CHECK(output_ref.equal(output));
 }

-TEST_F(NVFuserTest, FusionSimpleAmperePipeline_CUDA) {
-  Fusion fusion;
-  FusionGuard fg(&fusion);
-
-  // requires ampere+ GPU
-  if (!deviceMajorMinorCheck(8)) {
-    GTEST_SKIP() << "skipping tests on pre-AMPERE GPUs";
-    return;
-  }
-
-  auto tv0 = makeContigTensor(1);
-
-  fusion.addInput(tv0);
-
-  auto tv1 = set(tv0);
-
-  fusion.addOutput(tv1);
-
-  auto tv_cache = tv0->cacheAfter(LoadStoreOpType::CpAsync);
-  tv_cache->setMemoryType(MemoryType::Shared);
-
-  tv1->split(0, 16);
-  tv0->computeAt(tv1, 1);
-
-  tv_cache->circularBuffer(10);
-
-  auto options = at::TensorOptions().dtype(at::kFloat).device(at::kCUDA, 0);
-  at::Tensor input1 = at::randn({255}, options);
-
-  FusionExecutor fe;
-  fe.compileFusion(&fusion, {input1});
-  auto cg_outputs = fe.runFusion({input1});
-
-  testValidate(&fusion, cg_outputs, {input1}, {input1}, __LINE__, __FILE__);
-}
-
 TEST_F(NVFuserTest, FusionSimplePWiseDtypeComplex_CUDA) {
   Fusion fusion;
   FusionGuard fg(&fusion);

Review comment on the removed test: moved to the new file to keep this file below 10k
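The test above stages the cp.async copy through shared memory with tv_cache->circularBuffer(10): ten shared-memory slots are rotated so that asynchronous loads run ahead of the compute that consumes them. The slot-rotation arithmetic can be sketched in Python; the function name and the (tile, slot) event layout are illustrative assumptions, not nvfuser's actual codegen or API:

```python
def circular_buffer_schedule(num_tiles, depth):
    """Model which tile is loaded into / computed from which slot.

    With `depth` buffer slots, iteration i computes from slot i % depth
    while issuing the async load of tile i + depth - 1 into slot
    (i + depth - 1) % depth. Returns (prologue_loads, main_loop_events).
    """
    # Prologue: issue the first depth-1 async copies before the main loop,
    # so compute never waits on an in-flight load.
    prologue_loads = [(t, t % depth) for t in range(min(depth - 1, num_tiles))]

    events = []
    for i in range(num_tiles):
        next_tile = i + depth - 1
        # No more loads to issue once we run past the last tile (epilogue).
        load = (next_tile, next_tile % depth) if next_tile < num_tiles else None
        # Read the slot that was filled depth-1 iterations earlier.
        compute = (i, i % depth)
        events.append((load, compute))
    return prologue_loads, events

# 16 tiles pipelined through 10 slots, matching circularBuffer(10).
pro, ev = circular_buffer_schedule(num_tiles=16, depth=10)
```

Each main-loop iteration pairs one load with one compute, so the copy latency of up to nine outstanding tiles is hidden behind useful work; the tail iterations issue no loads and simply drain the buffer.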