@naoyam naoyam commented Mar 13, 2023

persistent_use_of_buffer is accumulated over all resolution points, so recomputation for each persistent use should be done only after the accumulation is complete.

Currently, recomputation and replaceVal can be performed redundantly. For example, on A100 this happens with NvFuserScheduler_BatchNorm_fp32/64/32/256.
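
To illustrate the ordering issue, here is a minimal standalone sketch, not the actual nvFuser code. Only the name persistent_use_of_buffer comes from this PR. The `Uses` alias, both function names, and the string tensor names are hypothetical, and the counter merely stands in for the recomputation plus replaceVal work. It assumes the redundancy arises from processing the accumulated set inside the loop over resolution points.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in: the persistent uses contributed by one resolution point.
using Uses = std::vector<std::string>;

// Before: recompute inside the loop. Uses accumulated at earlier resolution
// points are visited again on every later iteration.
int recomputeWhileAccumulating(const std::vector<Uses>& resolution_points) {
  std::set<std::string> persistent_use_of_buffer;
  int recompute_count = 0;
  for (const auto& uses : resolution_points) {
    persistent_use_of_buffer.insert(uses.begin(), uses.end());
    for (const auto& use : persistent_use_of_buffer) {
      (void)use;          // placeholder for recomputation + replaceVal
      ++recompute_count;
    }
  }
  return recompute_count;
}

// After: finish the accumulation first, then recompute once per use.
int recomputeAfterAccumulating(const std::vector<Uses>& resolution_points) {
  std::set<std::string> persistent_use_of_buffer;
  for (const auto& uses : resolution_points) {
    persistent_use_of_buffer.insert(uses.begin(), uses.end());
  }
  int recompute_count = 0;
  for (const auto& use : persistent_use_of_buffer) {
    (void)use;            // placeholder for recomputation + replaceVal
    ++recompute_count;
  }
  return recompute_count;
}
```

With three resolution points contributing {"t1", "t2"}, {"t2", "t3"}, and {"t4"}, the first version does the work 9 times and the second 4 times, once per distinct use.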

@naoyam naoyam requested a review from csarofeen March 13, 2023 21:40

@csarofeen csarofeen left a comment

LGTM

@naoyam naoyam merged commit e8557ec into devel Mar 14, 2023
@naoyam naoyam deleted the fix_buffer_projection branch March 14, 2023 00:50
naoyam added a commit that referenced this pull request Mar 14, 2023
jjsjann123 added a commit to NVIDIA/Fuser that referenced this pull request Mar 14, 2023
…s. (#4)

Cherry-picking from: csarofeen/pytorch#2576

Author: Naoya Maruyama [email protected]
Date: Mon Mar 13 17:50:01 2023 -0700

persistent_use_of_buffer is accumulated over all the resolution points. (#2576)

Recomputation for each persistent use should be done after the
accumulation is done.

Currently, recomputation and replaceVal can be done redundantly. For
example, on A100, that happens with NvFuserScheduler_BatchNorm_fp32/64/32/256.

Co-authored-by: Naoya Maruyama <[email protected]>
jacobhinkle pushed a commit to jacobhinkle/Fuser that referenced this pull request Mar 15, 2023
…s. (NVIDIA#4)

Cherry-picking from: csarofeen/pytorch#2576

Author: Naoya Maruyama [email protected]
Date: Mon Mar 13 17:50:01 2023 -0700

persistent_use_of_buffer is accumulated over all the resolution points. (NVIDIA#2576)

Recomputation for each persistent use should be done after the
accumulation is done.

Currently, recomputation and replaceVal can be done redundantly. For
example, on A100, that happens with NvFuserScheduler_BatchNorm_fp32/64/32/256.

Co-authored-by: Naoya Maruyama <[email protected]>